Latency and Throughput as key components of network performance

We have recently added another transit feed to our New York PoP, with the declared aim of bringing latency between London and New York below 70 ms. We are happy to report that current latencies between London Telehouse and New York are now around 67 ms. An updated latency overview has been posted here as well: https://worralorrasurfa.castlegem.co.uk/whmcs/knowledgebase.php?action=displayarticle&id=43.

With that, we want to explain the essentials of latency and throughput a bit.

Network latency, in general, states how long it takes for a packet to travel from its source to its destination. Network throughput, by contrast, defines how much data you can send per unit of time. Latency and throughput are usually not directly related, unless a link becomes saturated (in which case throughput will decrease and latencies will most likely increase). Different applications or purposes require varying degrees of quality in terms of latency and throughput.

For example, if you want to manage a Linux server via ssh from home, you would like to see small latencies: you want to see what you type right away and not have to wait for ages for the characters to appear on your screen in the shell. Latency here is key, but throughput is not that important: ssh does not need enormous amounts of bandwidth. Video streaming is different. If you want to watch YouTube videos, you want the videos to come down your internet connection as smoothly as if you were watching TV at home. In this case you need decent throughput, i.e. a lot of data per unit of time, but latency is not that much of an issue: it won't matter much if your video starts after 1 or 2 seconds, as long as it plays smoothly.
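To put rough numbers on the two cases above, here is a minimal sketch. It assumes a very simplified model (delivery time = one-way latency + size / throughput) and ignores handshakes, protocol overhead and packet loss. The packet size, chunk size, latency and link speed are hypothetical figures chosen only for illustration.

```python
def delivery_time_s(size_bytes, one_way_latency_s, throughput_bps):
    """Rough time until the last byte arrives.

    Simplified model: propagation delay plus time to push the bits
    onto the link. Handshakes, overhead and loss are ignored.
    """
    return one_way_latency_s + (size_bytes * 8) / throughput_bps


# Hypothetical figures: 35 ms one-way latency, 10 Mbit/s link.
LATENCY_S = 0.035
LINK_BPS = 10_000_000

keystroke = delivery_time_s(100, LATENCY_S, LINK_BPS)          # one small ssh packet
video_chunk = delivery_time_s(5_000_000, LATENCY_S, LINK_BPS)  # 5 MB of video

print(f"keystroke:   {keystroke * 1000:.2f} ms")
print(f"video chunk: {video_chunk:.3f} s")
```

In this model the keystroke time is almost entirely latency, while the video chunk time is almost entirely determined by throughput.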

Currently, we see the emphasis on small latencies increasing. While latency has always been a big concern for us due to the nature of our clients (many of them are traders who require superb latencies to the exchanges), throughput used to be the decisive parameter for internet connections. Part of this shift in emphasis, we believe, is caused by the fact that most typical internet applications nowadays work very well with the bandwidths available.

How can we measure latency and throughput? For latencies, ping, traceroute, and mtr are excellent friends. We wrote about these in a previous post, but let’s go into some examples:

ping

ping, put simply, checks the connectivity between source and destination:

# ping HOSTNAME
PING HOSTNAME (IP) 56(84) bytes of data.
64 bytes from gw-castlegem.init7.net (IP): icmp_seq=1 ttl=60 time=66.8 ms
64 bytes from gw-castlegem.init7.net (IP): icmp_seq=2 ttl=60 time=66.8 ms
64 bytes from gw-castlegem.init7.net (IP): icmp_seq=3 ttl=60 time=66.8 ms
64 bytes from gw-castlegem.init7.net (IP): icmp_seq=4 ttl=60 time=66.8 ms
64 bytes from gw-castlegem.init7.net (IP): icmp_seq=5 ttl=60 time=66.8 ms
64 bytes from gw-castlegem.init7.net (IP): icmp_seq=6 ttl=60 time=66.8 ms

We can see that the latency between our host (a London Telehouse server) and the destination (one of our routers in New York) is pretty much 66.8ms. ping takes different arguments such as the size of the packets, or the number of packets to be sent, etc. The manpage (man ping) will give you details.
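If you want to process ping results in a script rather than read them by eye, the following is a minimal sketch. It assumes the Linux ping reply format shown above (a `time=... ms` field per reply); the function names are our own and purely illustrative.

```python
import re
import statistics

# Three reply lines in the format shown above.
SAMPLE = """\
64 bytes from gw-castlegem.init7.net (IP): icmp_seq=1 ttl=60 time=66.8 ms
64 bytes from gw-castlegem.init7.net (IP): icmp_seq=2 ttl=60 time=66.8 ms
64 bytes from gw-castlegem.init7.net (IP): icmp_seq=3 ttl=60 time=66.8 ms
"""

TIME_RE = re.compile(r"time=([\d.]+) ms")


def ping_times_ms(text):
    """Extract every round-trip time (in ms) from ping output."""
    return [float(m.group(1)) for m in TIME_RE.finditer(text)]


def summarise(times):
    """Return count, min, average, max and spread of the samples."""
    return {
        "count": len(times),
        "min": min(times),
        "avg": statistics.mean(times),
        "max": max(times),
        "stdev": statistics.pstdev(times),
    }


summary = summarise(ping_times_ms(SAMPLE))
print(summary)
```

A spread close to zero, as in this sample, is what a stable path looks like.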

Traceroute

traceroute will not only check the latency between the source and destination, but will also show latencies (and thus possible issues) on the way there:

# traceroute HOSTNAME
traceroute to HOSTNAME(IP), 30 hops max, 60 byte packets
 1  ... (...)  0.419 ms  0.463 ms  0.539 ms
 2  40ge1-3.core1.lon2.he.net (195.66.224.21)  10.705 ms  10.706 ms  10.422 ms
 3  100ge1-1.core1.nyc4.he.net (72.52.92.166)  67.176 ms  67.189 ms  67.174 ms
 4  10ge9-7.core1.sjc2.he.net (184.105.213.197)  141.010 ms  140.897 ms  140.928 ms
 5  10ge1-2.core1.fmt2.he.net (72.52.92.73)  136.597 ms  136.746 ms  136.885 ms
 6  ....castlegem.co.uk (IP)  136.855 ms  136.437 ms  136.635 ms

As we can see, we get rather stable latencies all the way from London to California. Large variations in the latencies on the way are not necessarily an indication of issues, though, as long as the destination latencies are still smooth and regular. Possible reasons for deviations on the way to your destination could be routers rate limiting their replies or, in the worst case, routers or networks actually being congested (we will get to measuring throughput shortly).
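To see where along the path the latency is added, one can average the three probes per hop and look at the increase from one hop to the next. The following is a minimal sketch that assumes the Linux traceroute line format shown above; the helper names are hypothetical, and hops 2 to 5 of the trace above serve as sample data.

```python
import re

SAMPLE = """\
 2  40ge1-3.core1.lon2.he.net (195.66.224.21)  10.705 ms  10.706 ms  10.422 ms
 3  100ge1-1.core1.nyc4.he.net (72.52.92.166)  67.176 ms  67.189 ms  67.174 ms
 4  10ge9-7.core1.sjc2.he.net (184.105.213.197)  141.010 ms  140.897 ms  140.928 ms
 5  10ge1-2.core1.fmt2.he.net (72.52.92.73)  136.597 ms  136.746 ms  136.885 ms
"""

HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)")   # hop number and host name
RTT_RE = re.compile(r"([\d.]+) ms")         # each probe's round-trip time


def parse_hops(text):
    """Return a list of (hop_number, host, average_rtt_ms)."""
    hops = []
    for line in text.splitlines():
        head = HOP_RE.match(line)
        rtts = [float(v) for v in RTT_RE.findall(line)]
        if head and rtts:
            hops.append((int(head.group(1)), head.group(2), sum(rtts) / len(rtts)))
    return hops


def largest_jump(hops):
    """Return (hop_number, host, increase_ms) for the biggest rise over the previous hop."""
    jumps = [(cur[0], cur[1], cur[2] - prev[2]) for prev, cur in zip(hops, hops[1:])]
    return max(jumps, key=lambda j: j[2])


hops = parse_hops(SAMPLE)
for number, host, avg in hops:
    print(f"{number:2d}  {host:30s} {avg:8.3f} ms")
print("largest jump:", largest_jump(hops))
```

For this trace the two big increases are the Atlantic crossing (London to New York) and the hop from New York to the US west coast, which is what geography would lead us to expect.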

MTR

mtr can in a way be considered the combination of ping and traceroute. It displays the network path packets travel, and it keeps doing that by sending packet after packet.

HOSTNAME (0.0.0.0)                                                                   Fri Mar  7 09:51:28 2014
Keys:  Help   Display mode   Restart statistics   Order of fields   quit
                                                                                         Packets               Pings
 Host                                                                                  Loss%   Snt   Last   Avg  Best  Wrst StDev
 1. IP                                                                                  0.0%    28    0.3   1.9   0.3  41.4   7.7
 2. vl365-globalcrossing-peer.jump.net.uk                                               0.0%    28    0.3   5.8   0.3  64.3  16.6
 3. po7-20G.ar4.CHI2.gblx.net                                                           0.0%    28  259.2 114.3  89.3 259.2  57.0
 4. DESTINATION                                                                         0.0%    28   91.8  91.9  91.6  94.5   0.6

We can see that hop #3 has a large standard deviation, but latency to the destination is very consistent. In our case, this is from London to Chicago. Hop #3 simply seems to rate limit these probing packets, and/or is busy doing other things than talking to us, hence the larger latency. It would not be uncommon to see packet loss on the intermediate routers either. This is fine and also due to rate limiting mechanisms, as long as the destination itself still shows consistent latency, i.e. no packet loss and no extreme deviations.
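The rule of thumb above (judge the path by the destination, treat noisy intermediate hops as informational) can be sketched as follows. The thresholds are hypothetical and would need tuning for a real network; the figures are taken from the mtr output above.

```python
from typing import NamedTuple


class Hop(NamedTuple):
    host: str
    loss_pct: float
    avg_ms: float
    stdev_ms: float


def assess(hops, max_loss_pct=1.0, max_stdev_ms=5.0):
    """Return (destination_ok, noisy_intermediate_hosts).

    Only the last hop decides whether the path is healthy. Intermediate
    hops exceeding the (hypothetical) thresholds are merely listed.
    """
    *intermediate, dest = hops
    noisy = [
        h.host
        for h in intermediate
        if h.loss_pct > max_loss_pct or h.stdev_ms > max_stdev_ms
    ]
    dest_ok = dest.loss_pct <= max_loss_pct and dest.stdev_ms <= max_stdev_ms
    return dest_ok, noisy


# Loss%, Avg and StDev columns from the mtr run above.
path = [
    Hop("IP", 0.0, 1.9, 7.7),
    Hop("vl365-globalcrossing-peer.jump.net.uk", 0.0, 5.8, 16.6),
    Hop("po7-20G.ar4.CHI2.gblx.net", 0.0, 114.3, 57.0),
    Hop("DESTINATION", 0.0, 91.9, 0.6),
]

dest_ok, noisy = assess(path)
print("destination healthy:", dest_ok)
print("noisy intermediate hops:", noisy)
```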

That is all good, but how do we check throughput? There are several makeshift ways to measure it, ranging from timing browser requests on the command line (such as time lynx -source http://www.google.com/ > /dev/null) to using ftp with hash marks on, or the more common wget http://HOST/testfile. These will all give you a cursory glimpse of how fast you can download data from a destination to your computer. There is, however, a very nice tool called iperf that does this job in a much more professional manner.
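The makeshift approach boils down to reading a stream to the end and dividing the bytes received by the time taken. Here is a minimal sketch of that idea; the function name and the `clock` parameter are our own, and the demo reads from an in-memory buffer so that it runs without a network.

```python
import io
import time


def measure_throughput(stream, chunk_size=64 * 1024, clock=time.perf_counter):
    """Read `stream` to the end; return (bytes_read, seconds, mbit_per_s)."""
    start = clock()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
    elapsed = clock() - start
    mbit_per_s = (total * 8 / 1_000_000) / elapsed if elapsed > 0 else float("inf")
    return total, elapsed, mbit_per_s


# Local demo: an in-memory buffer stands in for a download.
total, elapsed, rate = measure_throughput(io.BytesIO(b"\0" * 1_000_000))
print(f"read {total} bytes in {elapsed:.6f} s")

# For a real download, any object with a read() method works, e.g.
# (HOST and testfile are placeholders, as in the wget example):
#   import urllib.request
#   with urllib.request.urlopen("http://HOST/testfile") as resp:
#       print(measure_throughput(resp))
```

Like the other makeshift methods, this measures a single download at one point in time, so the result depends on the remote server and on whatever else is using the link.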

iperf

iperf can measure throughput between two network locations, and it can give you a good idea of bottlenecks when used in combination with traceroute or mtr. The drawback of iperf is that you need not only a client, but also a server to connect to. iperf is thus primarily a professional tool, i.e. something set up between providers, or between commercial clients and their providers, to sort out potential issues, define SLAs, etc.

There is an excellent introductory article on iperf from 2007, which we are happy to link to here: http://www.enterprisenetworkingplanet.com/netos/article.php/3657236/Measure-Network-Performance-with-iperf.htm.

Example output, both from the server and client side, can be seen below:

# ./iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local IPx port 5001 connected with IPy port 59508
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.1 sec   566 MBytes   472 Mbits/sec

# ./iperf -c HOSTNAME -t 10
------------------------------------------------------------
Client connecting to HOSTNAME, TCP port 5001
TCP window size: 23.2 KByte (default)
------------------------------------------------------------
[  3] local IPy port 59508 connected with IPx port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec   566 MBytes   474 Mbits/sec
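The Transfer and Bandwidth columns can be cross-checked against each other. The sketch below assumes that iperf counts MBytes in units of 1024*1024 bytes and Mbits in units of 10^6 bits, which is consistent with the figures above. Since the displayed interval and transfer are rounded, the recomputed values only match the report approximately.

```python
def mbits_per_sec(mbytes, seconds):
    """Convert a transfer in MBytes (1024*1024 bytes) over `seconds`
    into Mbit/s (10**6 bits per second)."""
    return mbytes * 1024 * 1024 * 8 / seconds / 1_000_000


client = mbits_per_sec(566, 10.0)  # client side: 566 MBytes in 10.0 s
server = mbits_per_sec(566, 10.1)  # server side: 566 MBytes in 10.1 s

print(f"client: {client:.1f} Mbit/s (reported: 474)")
print(f"server: {server:.1f} Mbit/s (reported: 472)")
```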

Here we conclude our brief overview and hope that some of you will find it useful indeed!

iftop – or where’s my server’s bandwidth going?!

During the past weeks we gave a small introduction to UNIX and Linux commands that may be nice to have at hand when it comes to administering a server from the command shell, making some quick changes, or generally assisting a sysadmin with her everyday tasks.

Today we want to have a look at iftop – a small program that allows you to check what your dedicated or virtual private server is doing in terms of internet traffic: where packets go to, and where they come from.

This is useful when you want to investigate some process or virtual machine hogging bandwidth on a server, or when you see unusual traffic patterns in your monitoring systems.

The syntax as such is very simple. For a start, it should be sufficient to run

# /usr/sbin/iftop -i eth1 -p -P

from the shell (you will typically need root privileges). The -i switch lets you specify which interface to listen on, -p runs iftop in promiscuous mode (necessary for some virtualisation architectures), and -P shows port numbers/services in addition to hosts.

On a standard CentOS install, iftop is only available from extra repositories (or has to be compiled from source), and you will need the (n)curses and libpcap packages installed as well.

Additional and in-depth information can be found here:
http://www.ex-parrot.com/pdw/iftop/ (author, source code)
http://www.cyberciti.biz/faq/centos-fedora-redhat-install-iftop-bandwidth-monitoring-tool/ (overview, examples)
http://sickbits.net/iftop-finding-traffic-hogs/ (overview, examples)