Latency and Throughput as key components of network performance

We have recently added another transit feed to our New York PoP, with a declared aim to bring down latency between London and New York to sub 70ms. We are more than happy to be able to state that current latencies between London Telehouse and New York are now around 67ms. An update to our latency overview has been posted here as well: https://worralorrasurfa.castlegem.co.uk/whmcs/knowledgebase.php?action=displayarticle&id=43.

With that, we want to explain the essentials of latency and throughput a bit.

Network latency in general states how long it takes for a packet to travel the distance between its source and destination. Network throughput, however, defines how much of your data you can send in one go (per time unit). Latency and througput are usually not directly related, unless we are in a situation where a link becomes saturated (upon which throughput will decrease, and latencies will most likely increase), and different applications or purposes require varying degrees of quality in terms of latency and throughput.

For example, if you want to manage a Linux server via ssh from home, you would like to see small latencies: you want to see what you type right away and not have to wait for ages for the characters to appear on your screen on the shell. Latency here is key, but throughput is not that important: ssh does not need enormous amounts of bandwidth. Now, video streaming is something different. If you want to watch youtube videos, you want the videos to come down your internet connection as smooth as if you were watching TV at home. In this case you need decent throughput, i.e. a lot of data per time unit, but latency is not that much of an issue here: it wont matter much if your video starts after 1 or 2 seconds, just as long as it is smooth.

Currently, we see emphasis on small latencies increasing. While this has always been a big concern for us due to the nature of our clients (a real lot of them are traders who require superb latencies to the exchanges), throughput used to be the decisive parameter for internet connections. Part of this shift in emphasis, we believe, is caused by the fact that nowadays most typical internet applications live very well with bandwidths available.

How can we measure latency and throughput? For latencies, ping, traceroute, and mtr are excellent friends. We wrote about these in a previous post, but let’s go into some examples:

ping

ping, put simply, checks the connectivity between source and destination:

# ping HOSTNAME
PING HOSTNAME (IP) 56(84) bytes of data.
64 bytes from gw-castlegem.init7.net (IP): icmp_seq=1 ttl=60 time=66.8 ms
64 bytes from gw-castlegem.init7.net (IP): icmp_seq=2 ttl=60 time=66.8 ms
64 bytes from gw-castlegem.init7.net (IP): icmp_seq=3 ttl=60 time=66.8 ms
64 bytes from gw-castlegem.init7.net (IP): icmp_seq=4 ttl=60 time=66.8 ms
64 bytes from gw-castlegem.init7.net (IP): icmp_seq=5 ttl=60 time=66.8 ms
64 bytes from gw-castlegem.init7.net (IP): icmp_seq=6 ttl=60 time=66.8 ms

We can see that the latency between our host (a London Telehouse server) and the destination (one of our routers in New York) is pretty much 66.8ms. ping takes different arguments such as the size of the packets, or the number of packets to be sent, etc. The manpage (man ping) will give you details.

Traceroute

traceroute will not only check the latency between the source and destination, but will also show latencies (and thus possible issues) on the way there:

# traceroute HOSTNAME
traceroute to HOSTNAME(IP), 30 hops max, 60 byte packets
 1  ... (...)  0.419 ms  0.463 ms  0.539 ms
 2  40ge1-3.core1.lon2.he.net (195.66.224.21)  10.705 ms  10.706 ms  10.422 ms
 3  100ge1-1.core1.nyc4.he.net (72.52.92.166)  67.176 ms  67.189 ms  67.174 ms
 4  10ge9-7.core1.sjc2.he.net (184.105.213.197)  141.010 ms  140.897 ms  140.928 ms
 5  10ge1-2.core1.fmt2.he.net (72.52.92.73)  136.597 ms  136.746 ms  136.885 ms
 6  ....castlegem.co.uk (IP)  136.855 ms  136.437 ms  136.635 ms

As we can see, we get rather stable latencies throughout all the way from London to California. Large variations in the latencies on the way are not necessarily an indication for issues yet, though, as long as the destination latencies are still smooth and regular. Possible reasons for deviations on the way to your destination could be routers rate limiting their replies or, in worse case, routers or networks indeed being congested (we will get to measuring throughput shortly).

MTR

mtr can in a way be considered the combination of ping and traceroute. It displays the network path packets travel, and it keeps doing that by sending packet after packet.

HOSTNAME (0.0.0.0)                                                                   Fri Mar  7 09:51:28 2014
Keys:  Help   Display mode   Restart statistics   Order of fields   quit
                                                                                         Packets               Pings
 Host                                                                                  Loss%   Snt   Last   Avg  Best  Wrst StDev
 1. IP                                                                                  0.0%    28    0.3   1.9   0.3  41.4   7.7
 2. vl365-globalcrossing-peer.jump.net.uk                                               0.0%    28    0.3   5.8   0.3  64.3  16.6
 3. po7-20G.ar4.CHI2.gblx.net                                                           0.0%    28  259.2 114.3  89.3 259.2  57.0
 4. DESTINATION                                                                         0.0%    28   91.8  91.9  91.6  94.5   0.6

We can see that hop #3 has a large standard deviation, but latency to the destination is very consistent. In our case, this is from London to Chicago. Hop #3 simply seems to rate limit these probing packets, hence has a larger latency, or/and is busy doing other things than talking to us. It would not be uncommon to see packet loss on the routers either – this is fine and also due to rate limiting mechanisms – just as long as the destination latency is still consistent, i.e. no packet loss, and no extreme deviations.

That is all good – but how do we check throughput? There are several makeshift means to measure throughput, they range from timing browser requests on the command line (such as time lynx -source http://www.google.com/ > /dev/null) to using ftp with hashmarks on and the more common wget http://HOST/testfile. These will all give you a cursory glimpse into how fast you can download data from a destination to computer. There is, however, a very nice tool called iperf that does this job in a very professional manner.

iperf

iperf can measure throughput between two network locations, and it can give you a good idea of bottlenecks when used in combination with traceroute or mtr. The drawback of iperf is that you not only need a client, but also a server to connect to. iperf is thus primarily indeed more of a professional tool, i.e. something set up between providers or commercial clients and their providers to sort out potential issues, define SLAs, etc.

There is an excellent introductory article on iperf from 2007, which we are happy to link to here: http://www.enterprisenetworkingplanet.com/netos/article.php/3657236/Measure-Network-Performance-with-iperf.htm.

Example output, both from the server and client side, can be seen below:

# ./iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local IPx port 5001 connected with IPy port 59508
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.1 sec   566 MBytes   472 Mbits/sec
# ./iperf -c HOSTNAME -t 10
------------------------------------------------------------
Client connecting to HOSTNAME, TCP port 5001
TCP window size: 23.2 KByte (default)
------------------------------------------------------------
[  3] local IPy port 59508 connected with IPx port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   566 MBytes   474 Mbits/sec

Here we conclude our brief overview and hope that some of you will find it useful indeed!

NTP Amplification Attack – the gist of it

With recent DDOS attacks increasingly using NTP as an attack vector, and one of Cloudflare’s clients recently having been hit with a DDOS attack just short of 400gbps, we believe it is necessary to summarise what’s been going on, how such attacks are made possible at all, and what the community, and providers can do to prevent or mitigate such attacks as best possible.

A concise overview by means of a CERT alert can be found here: https://www.us-cert.gov/ncas/alerts/TA14-013A.

Essentially, an attacker send a certain command to a vulnerable NTP server, using a spoofed source address. The command itself is very short and produces very little traffic. The response, however, is a lot larger, besides the response is going to be sent back to the spoofed source address. This response is typically about 206 times larger than the initial request – hence the name amplification – a very effective means to quickly fill up even very powerful internet pipes.

Cloudflare published a very interesting article as well, giving a quick overview about the most recent attack and the technology behind it: http://blog.cloudflare.com/technical-details-behind-a-400gbps-ntp-amplification-ddos-attack.

The recommended course of action here is to secure your NTP server (cf.https://isc.sans.edu/diary/NTP+reflection+attack/17300) , as well as ensure that spoofed packets do not leave your network. Sample procedures are explained at BCP38.info.

iftop – or where’s my server’s bandwidth going?!

During the past weeks we gave a small introduction to UNIX and Linux commands that may be nice to have at hand when it comes to administrating a server from the command shell, making some quick changes, or generally assisting a sysadmin with her every day tasks.

Today we want to have a look at iftop – a small program that allows you to check what your dedicated or virtual private server is doing in terms of internet traffic: where packets go to, and where they come from.

This is useful when you want to investigate some process or virtual machine hogging bandwidth on a server, or when you see unsual traffic patterns from your monitoring systems.

The syntax as such is very simple, for a start it should be sufficient to run

# /usr/sbin/iftop -i eth1 -p -P

from the shell (you will typically need root privileges). The -i switch lets you specify which interface to listen on, -p runs iftop in promiscuous mode (necessary for some virtualisation architectures), and -P shows portnumbers/services in addition to hosts.

On a standard CentOS install, iftop needs extra repositories to be installed (or to be compiled from source), and you will need (n)curses and libpcap packages installed as well.

 

Additional and in-depth information can be found here:
http://www.ex-parrot.com/pdw/iftop/ (author, source code)
http://www.cyberciti.biz/faq/centos-fedora-redhat-install-iftop-bandwidth-monitoring-tool/ (overview, examples)
http://sickbits.net/iftop-finding-traffic-hogs/ (overview, examples)

 

IOPS and RAID considerations

IOPS (input/output operations per second) are still – maybe even more so than ever – the most prominent and important metric to measure storage performance. With SSD technology finding its way into affordable, mainstream server solutions, providers are eager to outdo each other offering ever higher IOPS dedicated servers and virtual private servers.

While SSD based servers will perform vastly better than SATA or SAS based ones, especially for random I/O, the type of storage alone isn’t everything. Vendors will often quote performance figures using lab conditions only, i.e. the best possible environment for their own technology. In reality, however, we are facing different conditions – several clients competing for I/O, as well as a wide ranging mix of random reads and writes along with sequential I/O (imagine 20 VPS doing dd bs=1M count=128 if=/dev/zero of=test conv=fdatasync).

Since most providers won’t offer their servers without RAID storage, let’s have a look at how RAID setups impact IOPS then. Read operations will usually not incur any penalty since they can use any disk in the array (total theoretical read IOPS available therefore being the sum of the individual disks’ read IOPS), whereas the same is not true for write operations as we can see from the following table:

RAID level Backend Write IOPS per incoming write request
0 1
1 2
5 4
6 6
10 2

We can see that RAID 0 offers the best write IOPS performance – a single incoming write request will equate to a single backend write request – but we also know that RAID 0 bears the risk of total array loss in case a single disk fails. RAID 1 and 10, the latter being providers’ typical or most advertised choice, offers a decent tradeoff – 2 backend writes per single incoming write. RAID 5 and RAID 6, with their additional, robust setup, bear the largest penalty.

When calculating the effective IOPS, thus, keep in mind the write penalty individual RAID setups come with.

The effective IOPS performance of your array can be estimated using the following formula:

IOPSeff = n * IOPSdisk / ( R% + W% * FRAID )

with n being the number of disks in the array, R and W being the read and write percentage, and F being the RAID write factor tabled above.

We can also calculate the total IOPS performance needed based on an effective IOPS workload and a given RAID setup:

IOPStotal = ( IOPSeff * R% ) + ( IOPSeff * W% )

So if we need 500 effective IOPS, and expect around 25% read, and 75% write operations in a RAID 10 setup, we’d need:

500 * 0.25 + 500 * 0.75 * 2 =  875 total IOPS

i.e. our array would have to support at least 875 total, theoretical IOPS. How many disks/drives does this equate to? Today’s solid state drives will easily be able to handle that, but what about SATA or SAS based RAID arrays? A typical SAS 10k hard disk drive will give you around 100-140 IOPS. That means we will need 8 SAS 10k drives to achieve our desired IOPS performance.

Conclusion:
All RAID levels except RAID 0 have significant impact on your storage array’s IOPS performance. The decision about which RAID level to use is therefore not only a question about redundancy or data protection, but also about resulting performance for your application’s needs:

  1. Evaluate your application’s performance requirements;
  2. Evaluate your application’s redundacy needs;
  3. Decide which RAID setup to use;
  4. Calculate the resulting IOPS performance necessary;

 

Sources:

Calculate IOPS in a storage array by Scott Lowe, TechRepublic, 2/2010
Getting the hang of IOPS by Symantec, 6/2012

 

 

 

Checking connectivity

There are various tools to measure and check connectivity of your dedicated server and virtual private server. Below we will give an overview over the most common ones, along with their most widespread use.

  1. ping
    ping is probably the most well known tool to check whether a server is up or not. A ping is a small packet of traffic sent from the originating machine to the destination machine which expects a so called echo reply to see whether the destination host is up and running and responding. The typical Linux syntax is:
    ping [-c INTEGER] [-n] [-q] HOSTNAME | IP address
    with -c followed by the number of packets to send, -n for numeric (IP address only – no dns resolution), and -q for quiet output so that only the summary lines will be displayed. The output will display how long it takes for each packet (or the packets on average) to travel back and forth between the source and destination host (round trip time). Large deviations in the min/avg/max values may indicate network congestion, whereas significant packet loss may indicate general network outages or congestion to a point where the network is simply too overloaded to allow anything else through and just drop packets instead. A 100% packet loss may, however, not necessarily indicate that the destination host is dead – it may simply be that the destination server is blocking ICMP ping packets via its firewall.
  2. traceroute
    traceroute is another useful tool that displays the route packets take from the originating host to the destination machine. It also displays round trip times, and can be used to identify potential issues on the way to the machine as well. It is important to understand that firewalls and routers are able to filter and deny these probing packets as well, so a non responding host may not necessarily be down, just as with ping. The typical Linux syntax is
    traceroute [-n] HOST | IP address
  3. mtr
    mtr can be seen as the combination of ping and traceroute – it displays not only the way packets travel down the line from the source to the destination, but also displays min/avg/max round trip statistics and packet loss. mtr is very helpful in determining network congestions or anomalies. The typical Linux syntax is
    mtr [-n] HOST | IP address

When would you typically use these tools:

  • when a host that is normally up can suddenly no longer be reached;
  • when you notice anomalies like slow network, packet loss, etc.;
  • when you want to prove that things are working OK on your end;

 

RAID

RAID is intended to keep your dedicated servers or your virtual private server (VPS) alive and your data redundant in case of single (or more) disk failures – allowing you to replace faulty hardware in the case of disk failure.

Our own opinion is that RAID is always worth the extra cost – it usually saves you from a lot of trouble when things go wrong. There are two main options to decide between when you want a RAID setup, these are software and hardware RAID. In the first case, your main CPU/memory take over the part of ensuring your desired RAID level, in the latter, you have extra (costly) hardware to handle that part of your machine.

Software RAID has advantages such as being cheaper and not subjecting you to vendor lock-in, and – in some cases – even outperforms a hardware RAID with today’s fast CPUs. Nevertheless, hardware RAID offers features a software RAID setup cannot, for example hot swapping disks, or write back caching if you have a BBU.

This post is not about the pros and cons of software vs. hardware RAID, however. Essentially, we want to present the four most common setups for data redundancy and integrity – RAID 1, RAID 5, RAID 6, and Raid 10 – in a concise summary.

RAID 1 is all about disk mirroring. You team up two identical disks, and they form a mirror, all your data is kept twice. You can lose one disk and still go on running your server. Of course, the storage efficiency is rather low – out of 2x2TB you only get 2TB in total.

RAID 5 is another very common setup. It needs at least 3 disks, and in a nutshell, you can lose one disk before things start getting sinister for your server. That gives you moderate storage efficiency – in a 3x2TB setup you get around 4TB in total, in a 4x2TB you get something close to 6TB in total.

RAID 6 could be seen as a further development of RAID 5, in laymen’s terms. Here you need at least 4 disks, and you can afford 2 disks going down before your disk array suffers data loss. The storage efficiency is worse than with RAID 5, but typically better than with RAID 1 since both RAID 5 and RAID 6 allow for more than just 3 or 4 disks to be used.

And finally, RAID 10 is a mix of RAID 0 (stripes over several disks) in combination with RAID 1 (mirroring). This gives the same capacity as RAID 1, as well as the same redundancy level, but requires at least 4 disks to work and is generally more expensive than RAID 5 or 6 compared to their capacity.

In terms of performance, RAID 10 generally outperforms the other RAIDs in terms of write speed. This difference becomes smaller with more disks in the array, but still, in your typical 4 disk setup, RAID 10 is fastest for writes, and RAID 1 is typically faster for writes than RAID 5 or 6 as well. In terms of read performance, RAID 1 lags behind the other options, whereas RAID 5, 6, and 10 can be considered pretty similar and vary depending on the applications and I/O going on.

Overall, if you don’t need much storage and want a cheap redundant solution, choose RAID 1, it offers enough performance for everyday’s applications as well. If you need more redundancy, but do not have write intensive applications, then RAID 5 or RAID 6 are fine. DB-intensive (really intensive in terms of writes) applications should consider RAID 10, however. The increase in write performance is well worth the extra cost, but pay attention to the number of disks in the array – the more disks in a RAID 5 or RAID 6 array, the better write performance becomes.

Traffic and bandwidth, revisited

Today, I read a thread in a feedback forum:

http://www.webhostingtalk.com/archive/index.php/t-1052232.html

There is a lot of talk and fuss about what is legitimate use for these 150TB plans, whether download sites or CDNs are allowed, what constitutes a CDN, and what does not, etc.

The entire thread is one single credo for our own traffic policy – as long as it is legal, use it for whatever you want, we reserve the traffic for you, end of story. Yes, this comes at a price, but there is no smallprint. You won’t be capped to a 10mbps port if you exceed your bandwidth allowance, you are not expected to use your traffic perfectly evenly across the entire month all the time, go have some spikes! This is what we call fair – we do not use freerider tactics to give a small group of large traffic users an advantage the majority of negligible traffic users are paying.

Put it the other way round: Assume an ISP has 100gbps of bandwidth available, and sells off single servers with 100TB usage each. That would give the following equation, roughly:

100 x 1000 x 300GB x => 30000 TB / 100 TB = 300 servers

A typical server with such deals will cost you GBP 100 per month, x 300 means the company is reaping 30,000 GBP per month in turnovers before the bandwidth is being oversold.

With 30,000 per month, they will have to cover their infrastructure costs, all staff costs, and all opportunity costs. Even if the company only had one single employee (the owner), this would never pay. So, how do they do it? Quite simple: overselling fully aware that 99% of users will never even come anywhere close to these magic numbers of 100TB per month or more. And for that final per cent, they will (and do, as we see) apply their T&C smallprint, and make a couple of exceptions for those who shout too loud. In the end, there are two winners: the ISP using such practices, and the shouters. The rest, the majority, pays for these.

Often you will also find terms such as <insert large TB number here> OR 100mbps unmetered. 100mbps unmetered will give you 30TB of traffic per month. Why, then, can you choose between options that are SO much unlike each other? 100, 150 TB per month on a gbps will cost the same as 100mbps unmetered? This simply doesn’t add up.

Also, such contracts typically come with a 6 month nicely reduced fee, but then you will be charged the full monthly price – for something you might not even need. If you know you are never going to use 150TB, why pay for them to cover the losses the ISP makes from the small number of customers who actually do use them? Usually, after the initial contract period, these machines could be obtained considerably cheaper if you only paid for the traffic you actually need instead of having to drag that cost around like a ball and chain around your ankle.

Bottom line: again, be careful, ask questions. These T&C are all legit – not nice maybe, but legit – and you need to wade through the smallprint in order to understand what you can expect from these ISPs, and what you cannot.

Memory consumption

(this post will also appear in our dedicated servers blog)

Nowadays, CPU performance is something that is pretty easy to achieve with comparably little investment. Intel’s L/X34xx, E/L/X56xx, or the latest E3 12xx(L) CPUs as well as AMD’s new Opteron 41xx series will cover most requirements one could come up with for a startup site, even with a considerable number of users online at the same time.

A fact, however, that is often neglected, is memory consumption. Memory is key and crucial to your website performing well and supporting a large number of users online at the same time.

Let us consider a typical LAMP environment, i.e. you are running Linux, Apache, MySQL, and PHP. The operating system will not need much overhead, typically a couple hundred MB should be enough, MySQL will need what you give to it (large buffers, high number of connections, etc. increase memory usage, naturally), and PHP is a double-edged sword in many cases anyway. One of the largest memory hog in moderately busy machines is the webserver itself – most often Apache. Even if carefully tuned, and going to extreme setups with either prefork or worker, you can end up anywhere between 5 to 15 MB per connection. Depending on the number and sort of modules you need, this will vary a lot – lightweight servers will be closer to the lower end, but an Apache having everything enabled is more likely to end up with 15MB of RAM per connection – there are alternatives to Apache, and most control panels using Apache will go for the prefork MPM, and unless you want to recompile and configure on your own, this is basically what you have to live with.

Let’s assume 10MB per connection, an average value. Imagine you want to be able to serve 500 simultaneous connections – with 10MB per connection, you are going to need 5GB of RAM for the webserver alone! If a lot of these connections are database intensive, then you might be coming close to 8GB of RAM usage already: so in order to have some safety margin, you should be going for 12GB or 16GB of RAM in your dedicated server (depending on CPU architecture and memory channels) or virtual private server.

Even on a site that is not really busy, with maybe 50 connections at the same time, you should be aware of overall memory consumption: 50 x 10MB, plus OS overhead, MySQL, and PHP will most likely add up to 2GB of memory as well, so here you should be going for 4GB already to have some growth potential.

What happens if the machine runs out of connections and/or memory, and eventually also out of swap space?

  • your site will respond more slowly;
  • your site may stop processing requests altogether;
  • your site may throw errors;
  • your site might become inconsistent: nothing is worse than an partially processed request that gets through to the database backend at random only;
  • your site may become unreachable;

A dedicated server will be able to handle such problems somewhat better than a virtual private server – due to its architecture and swap space available. The latter often does not have any swap space (there are virtualisation architectures where this is not true, however, such as KVM, for example) and will be hit full swing instead of being able to mitigate the crisis arising from low memory.

You want to avoid such scenarios – for the sake of your business and your customers. Do not save on memory – better have some spare than have none left.