You can use the Linux traceroute command to spot the slow leg of a network packet’s journey and troubleshoot sluggish network connections. We’ll show you how!
How traceroute Works
When you appreciate how traceroute works, it makes understanding the results much easier. The more complicated the route a network packet has to take to reach its destination, the harder it is to pinpoint where any slowdowns might be occurring.
A small organization’s local area network (LAN) might be relatively simple. It’ll probably have at least one server and a router or two. The complexity increases on a wide area network (WAN) that communicates between different locations or via the internet. Your network packet then encounters (and is forwarded and routed by) a lot of hardware, like routers and gateways.
The metadata headers on a data packet describe its length, where it came from, where it’s going, the protocol it’s using, and so on. The specification of the protocol defines the header. If you can identify the protocol, you can determine the start and end of each field in the header and read the metadata.
traceroute uses the TCP/IP suite of protocols and, by default, sends User Datagram Protocol (UDP) packets. The IP header contains the Time to Live (TTL) field, which contains an eight-bit integer value. Despite what the name suggests, it represents a count, not a duration.
A packet travels from its origin to its destination via a series of routers. Each time the packet arrives at a router, the router decrements the TTL counter. If the TTL value ever reaches one, the next router that receives the packet decrements the value and notices it’s now zero. The packet is then discarded and not forwarded to the next hop of its journey because it has “timed out.”
The router sends an Internet Control Message Protocol (ICMP) Time Exceeded message back to the origin of the packet to let it know the packet timed out. The Time Exceeded message contains the original header and the first 64 bits of the original packet’s data. This is defined on page six of Request for Comments 792.
If traceroute sends out a packet with the TTL value set to one, the packet will only get as far as the first router before it’s discarded. traceroute will receive an ICMP Time Exceeded message from the router, and it can record the time it took for the round trip.
It then repeats the exercise with the TTL set to two, which will fail after two hops.
traceroute increases the TTL to three and tries again. This process repeats until the destination is reached or the maximum number of hops (30, by default) is tested.
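The probing loop described above can be sketched without touching the network. This is only a simulation under stated assumptions: simulate_trace is a hypothetical helper, not part of traceroute, and it pretends every router answers. Its first argument is the hop number of the destination, and its second is the maximum number of hops to try (30, matching the default mentioned above).

```shell
# Simulate traceroute's probing loop (no packets are sent).
# simulate_trace is a hypothetical helper:
#   $1 = hop number of the destination
#   $2 = maximum number of hops to try (default 30)
simulate_trace() {
    destination_hop=$1
    max_hops=${2:-30}
    ttl=1
    while [ "$ttl" -le "$max_hops" ]; do
        if [ "$ttl" -lt "$destination_hop" ]; then
            # The TTL reaches zero at router number $ttl, which discards
            # the packet and answers with an ICMP Time Exceeded message.
            echo "ttl=$ttl: Time Exceeded from router $ttl"
        else
            # The TTL was large enough for the packet to arrive.
            echo "ttl=$ttl: reply from destination"
            return 0
        fi
        ttl=$((ttl + 1))
    done
    echo "gave up after $max_hops hops"
    return 1
}

# Example: a destination that is four hops away.
simulate_trace 4
```

Each pass through the loop stands in for one set of probes. The real tool also times every round trip, which this sketch leaves out.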
Some Routers Don’t Play Nicely
Some routers have bugs. They try to forward packets with a TTL of zero instead of discarding them and raising an ICMP time exceeded message.
According to Cisco, some Internet Service Providers (ISPs) rate-limit the number of ICMP messages their routers relay.
traceroute has a default timeout for replies of five seconds. If it doesn’t receive a response within those five seconds, the attempt is abandoned. This means responses from very slow routers are ignored.
traceroute was already installed on Fedora 31 but has to be installed on Manjaro 18.1 and Ubuntu 18.04. To install traceroute on Manjaro, use the following command:
sudo pacman -Sy traceroute
To install traceroute on Ubuntu, use the following command:
sudo apt-get install traceroute
As we covered above, traceroute's purpose is to elicit a response from the router at each hop from your computer to the destination. Some might be tight-lipped and give nothing away, while others will probably spill the beans with no qualms.
As an example, we’ll run a traceroute to the Blarney Castle website in Ireland, home of the famous Blarney Stone. Legend has it if you kiss the Blarney Stone you’ll be blessed with the “gift of the gab.” Let’s hope the routers we encounter along the way are suitably garrulous.
We type the following command:
traceroute blarneycastle.ie
The first line gives us the following info:
- The destination and its IP address.
- The number of hops traceroute will try before giving up.
- The size of the UDP packets we’re sending.
All of the other lines contain information about one of the hops. Before we dig into the details, though, we can see there are 11 hops between our computer and the Blarney Castle website. Hop 11 also tells us that we reached our destination.
The format of each hop line is as follows:
- The name of the device or, if the device doesn’t identify itself, the IP address.
- The IP address.
- The time it took round trip for each of the three tests. If an asterisk is here, it means there wasn’t a response for that test. If the device doesn’t respond at all, you’ll see three asterisks, and no device name or IP address.
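To make the hop line format concrete, here is a minimal sketch that counts answered and unanswered tests per hop. hop_summary is a hypothetical helper name, and the sample lines are made-up data using documentation-only addresses, not output from a real run. It assumes each successful test is printed as a number followed by "ms" and each missing response as an asterisk.

```shell
# hop_summary (hypothetical helper): read traceroute output on stdin and
# print, for each hop, how many tests were answered and how many were lost.
hop_summary() {
    awk '$1 ~ /^[0-9]+$/ {          # hop lines start with the hop number
        answered = 0; lost = 0
        for (i = 2; i <= NF; i++) {
            if ($i == "*") lost++              # no response for this test
            else if ($i == "ms") answered++    # one timing was printed
        }
        printf "hop %d: answered=%d lost=%d\n", $1, answered, lost
    }'
}

# Made-up sample output for illustration only.
hop_summary <<'EOF'
traceroute to example.test (192.0.2.10), 30 hops max, 60 byte packets
 1  gw.example.test (192.0.2.1)  0.524 ms  0.477 ms  0.443 ms
 2  * * *
 3  198.51.100.7 (198.51.100.7)  9.120 ms *  9.310 ms
EOF
```

With real output, you could pipe the command straight into the helper, for example: traceroute blarneycastle.ie | hop_summary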
Let’s review what we’ve got below:
- Hop 1: The first port of call (no pun intended) is the DrayTek Vigor Router on the local network. This is how our UDP packets leave the local network and get on the internet.
- Hop 2: This device didn’t respond. Perhaps it was configured never to send ICMP packets. Or, perhaps it did respond but was too slow, so traceroute timed out.
- Hop 3: A device responded, but we didn’t get its name, only the IP address. Note there’s an asterisk in this line, which means we didn’t get a response to all three requests. This could indicate packet loss.
- Hops 4 and 5: More anonymous hops.
- Hop 6: There’s a lot of text here because a different remote device handled each of our three UDP requests. The (rather long) names and IP addresses for each device were printed. This can happen when you encounter a “richly populated” network on which there’s a lot of hardware to handle high volumes of traffic. This hop is within one of the largest ISPs in the U.K. So, it would be a minor miracle if the same piece of remote hardware handled our three connection requests.
- Hop 7: This is the hop our UDP packets made as they left the ISP’s network.
- Hop 8: Again, we get an IP address but not the device name. All three tests returned successfully.
- Hops 9 and 10: Two more anonymous hops.
- Hop 11: We’ve arrived at the Blarney Castle website. The castle is in Cork, Ireland, but, according to IP address geolocation, the website is in London.
So, it was a mixed bag. Some devices played ball, some responded but didn’t tell us their names, and others remained completely anonymous.
However, we did get to the destination, we know it’s 11 hops away, and the round-trip times we recorded for the journey were 13.773 and 14.715 milliseconds.
Hiding Device Names
As we’ve seen, sometimes including device names leads to a cluttered display. To make it easier to see the data, you can use the -n (no mapping) option.
To do this with our example, we type the following:
traceroute -n blarneycastle.ie
This makes it easier to pick out large numbers for round-trip timings that could indicate a bottleneck.
Hop 3 is starting to look a little suspect. Last time, it only responded twice, and this time, it only responded once. In this scenario, it’s out of our control, of course.
However, if you were investigating your corporate network, it would be worth it to dig a little deeper into that node.
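Picking out large round-trip timings by eye gets tedious on long routes, so a small filter can help. The following is a sketch under assumptions: slow_hops is a hypothetical helper, the 100-millisecond default threshold is arbitrary, and the filter expects each timing to be printed as a number followed by "ms", as in the -n output described above.

```shell
# slow_hops (hypothetical helper): read traceroute output on stdin and
# print only the hops whose worst round-trip time exceeds a threshold.
#   $1 = threshold in milliseconds (default 100, an arbitrary choice)
slow_hops() {
    awk -v limit="${1:-100}" '$1 ~ /^[0-9]+$/ {
        worst = -1                      # stays -1 if the hop never answered
        for (i = 3; i <= NF; i++)
            if ($i == "ms" && $(i - 1) + 0 > worst)
                worst = $(i - 1) + 0    # the number just before "ms"
        if (worst > limit + 0)
            printf "hop %d: worst %.3f ms\n", $1, worst
    }'
}

# Possible usage (not run here because it needs network access):
#   traceroute -n blarneycastle.ie | slow_hops 20
```

Hops that never respond are skipped rather than flagged, because three asterisks say nothing about timing.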
Setting the traceroute Timeout Value
Perhaps if we extend the default timeout period (five seconds), we’ll get more responses. To do this, we’ll use the -w (wait time) option to change it to seven seconds. (Note this is a floating-point number.)
We type the following command:
traceroute -w 7.0 blarneycastle.ie
That didn’t make much of a difference, so the missing responses probably aren’t just slow replies timing out. It’s likely the anonymous hops are being purposefully secretive.
Setting the Number of Tests
By default, traceroute sends three UDP packets to each hop. We can use the -q (number of queries) option to adjust this up or down.
To speed up the traceroute test, we type the following to reduce the number of UDP probe packets we send to one:
traceroute -q 1 blarneycastle.ie
This sends a single probe to each hop.
Setting the Initial TTL Value
We can set the initial value of TTL to something other than one, and skip some hops. Usually, the TTL values are set to one for the first set of tests, two for the next set of tests, and so on. If we set it to five, the first test will attempt to get to hop five and skip hops one through four.
Because we know the Blarney Castle website is 11 hops from this computer, we type the following to go straight to Hop 11:
traceroute -f 11 blarneycastle.ie
That gives us a nice, condensed report on the state of the connection to the destination.
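The options covered so far can be combined in one command. As a sketch, the hypothetical wrapper below (quick_trace and the DRY_RUN variable are illustrative names, and the two-second wait is an arbitrary choice) hides device names, sends one probe per hop, and optionally starts at a later hop. Setting DRY_RUN prints the command instead of running it, so you can check it first.

```shell
# quick_trace (hypothetical wrapper): a fast, compact trace.
#   $1 = destination host
#   $2 = first hop to test (default 1, i.e. start from the beginning)
# Set DRY_RUN to any non-empty value to print the command instead of
# running it.
quick_trace() {
    host=$1
    first_hop=${2:-1}
    # -n: no name lookups, -q 1: one probe per hop,
    # -w 2: wait two seconds for each reply, -f: initial TTL value
    set -- traceroute -n -q 1 -w 2 -f "$first_hop" "$host"
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Show the command that would be used to jump straight to hop 11.
DRY_RUN=1 quick_trace blarneycastle.ie 11
```

Sending a single probe per hop keeps the load on the network low, at the cost of less information about intermittent packet loss.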
traceroute is a great tool to investigate network routing, check connection speeds, or identify bottlenecks. Windows also has a tracert command that functions similarly.
However, you don’t want to bombard unknown devices with torrents of UDP packets, so be wary of including traceroute in scripts or unattended jobs.
The load traceroute can place on a network might adversely impact its performance. Unless you’re in a fix-it-now kind of situation, you might want to use it outside of normal business hours.