How to Troubleshoot High Latency with the Traceroute Command

High latency in a network can have different causes, for example congestion or device faults. Latency is usually measured as the Round-Trip Time (RTT), also called Round-Trip Delay (RTD), which is defined as the time it takes for an Echo message (ping) to reach the destination plus the time for its reply to arrive back at the source.

Two main tools are usually used to measure this network delay: ping and traceroute. Ping reports whether a destination is reachable, along with statistics about packet loss and RTT, while traceroute shows all the hops in the routing path from source to destination, along with the RTT measured from the source to each hop. When troubleshooting high latency, keep in mind the geographical distance between the hops, since longer distances mean longer RTT. As soon as you identify the problematic hops, you can ping between them directly and check the routers themselves in order to find the problem.
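To put the effect of distance into numbers, here is a minimal Python sketch that estimates the lowest RTT a link of a given length can have. It assumes the signal travels through optical fiber at roughly 200,000 km/s (about two thirds of the speed of light in vacuum) and ignores queuing, processing and detours in the cable route, so real values will be higher. The function name and the sample distances are only illustrative.

```python
# Assumption: signal speed in optical fiber is roughly 200,000 km/s.
FIBER_SPEED_KM_PER_S = 200_000


def min_rtt_ms(distance_km: float) -> float:
    """Lower bound on the RTT in milliseconds, from propagation delay only."""
    # Round trip = twice the one-way distance; convert seconds to ms.
    return 2 * distance_km * 1000 / FIBER_SPEED_KM_PER_S


if __name__ == "__main__":
    # Illustrative distances, not taken from any real path.
    for km in (100, 1_000, 6_000):
        print(f"{km:>5} km -> at least {min_rtt_ms(km):.1f} ms RTT")
```

If the RTT jump between two hops is close to this lower bound for their distance, the delay is probably just geography and not a fault.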

A traceroute can show where in the network (at which hop in the routing path) the delay exists or starts. Such an example is shown in the traceroute output below:

[Figure: traceroute output showing high latency]

This output shows that the delay probably starts at hop 2, since the RTT increases significantly between hop 1 and hop 2. Between hops 2 and 3, however, the difference is relatively small, which hints that the problem exists only between 10.10.0.2 and 10.11.0.2.
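The same reading of the output can be sketched in Python: walk through the hops and report the first one where the RTT rises sharply compared to the previous hop. The RTT values, the address of hop 3 and the 50 ms threshold below are hypothetical and are not taken from the output above.

```python
from typing import List, Optional, Tuple

Hop = Tuple[int, str, float]  # (hop number, address, RTT in ms)


def first_latency_jump(hops: List[Hop],
                       threshold_ms: float = 50.0) -> Optional[Tuple[int, str, float]]:
    """Return (hop number, address, RTT increase) for the first hop whose RTT
    exceeds the previous hop's RTT by at least threshold_ms, else None."""
    previous_rtt = 0.0  # the source itself counts as 0 ms
    for number, address, rtt in hops:
        increase = rtt - previous_rtt
        if increase >= threshold_ms:
            return number, address, increase
        previous_rtt = rtt
    return None


# Hypothetical measurements for illustration only.
SAMPLE_HOPS = [
    (1, "10.10.0.2", 2.0),
    (2, "10.11.0.2", 180.0),
    (3, "10.12.0.2", 185.0),  # made-up address for hop 3
]

if __name__ == "__main__":
    print(first_latency_jump(SAMPLE_HOPS))
```

A single slow hop followed by normal values usually means that router is just slow to answer traceroute probes. Only a rise that carries on to all later hops, as in the sample data, points to real delay on the path.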

One way to continue troubleshooting this issue is to access the router in hop 1 (with address 10.10.0.2) and ping the router in hop 2 (address 10.11.0.2) in order to identify whether there is high latency in the link between them, for example due to congestion (in such cases, packet loss might appear as well). It is also usually good to double-check the status and configuration of the router interfaces, and the router logs.
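When the ping is run from a Linux host, the summary lines can be checked by a small script. The sketch below assumes the summary format printed by the common Linux ping ("N packets transmitted, N received, X% packet loss" and "rtt min/avg/max/mdev = ..."). Router operating systems print different formats, so the patterns would need adjusting there. The sample text uses made-up numbers.

```python
import re
from typing import Dict, Optional

# Patterns for the Linux-style ping summary (assumed format).
LOSS_RE = re.compile(
    r"(\d+) packets transmitted, (\d+) (?:packets )?received.*?([\d.]+)% packet loss"
)
RTT_RE = re.compile(
    r"min/avg/max/(?:mdev|stddev) = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms"
)


def parse_ping_summary(text: str) -> Dict[str, Optional[float]]:
    """Extract packet loss (%) plus average and maximum RTT (ms) from ping output.
    Values stay None when the corresponding line is not found."""
    result: Dict[str, Optional[float]] = {
        "loss_pct": None, "avg_ms": None, "max_ms": None,
    }
    loss = LOSS_RE.search(text)
    if loss:
        result["loss_pct"] = float(loss.group(3))
    rtt = RTT_RE.search(text)
    if rtt:
        result["avg_ms"] = float(rtt.group(2))
        result["max_ms"] = float(rtt.group(3))
    return result


# Hypothetical ping summary for illustration only.
SAMPLE = """--- 10.11.0.2 ping statistics ---
10 packets transmitted, 8 received, 20% packet loss, time 9012ms
rtt min/avg/max/mdev = 175.201/181.442/190.310/4.718 ms
"""

if __name__ == "__main__":
    print(parse_ping_summary(SAMPLE))
```

High average RTT together with packet loss on this single link would support the congestion theory.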

However, if no problem is found in that link, then it could be the tricky case of asymmetric routing, where the routing path from source to destination is different from the return path (from destination back to source). That could mean that the delay shown towards the destination appears because the geographical distance in the return path is very long (longer than in the forward path). Otherwise, if the geographical distance is not that long, there may be network delay or performance issues in the return path, e.g. congestion or faults.

So if asymmetric routing is suspected, a traceroute in the other direction is needed (from the destination IP address to the source IP address). That traceroute shows the return path along with the RTT per hop, so you can confirm whether there is indeed asymmetric routing and what seems to be the problem (long geographical distance or network issues).
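Once both traceroutes are available, the comparison itself is simple, as this Python sketch shows. It assumes that each hop has already been mapped to a router name, because a router often answers from a different interface address in each direction, so the raw IP addresses of the two traces rarely match. The router names R1 to R5 are hypothetical.

```python
from typing import List, Sequence


def is_asymmetric(forward: Sequence[str], reverse: Sequence[str]) -> bool:
    """True if the return path is not simply the forward path walked backwards."""
    return list(forward) != list(reversed(reverse))


def return_only_routers(forward: Sequence[str], reverse: Sequence[str]) -> List[str]:
    """Routers that appear only in the return path, in the order they are crossed."""
    seen = set(forward)
    return [router for router in reverse if router not in seen]


# Hypothetical paths for illustration only.
FORWARD = ["R1", "R2", "R3", "R4"]   # source -> destination
REVERSE = ["R4", "R3", "R5", "R1"]   # destination -> source

if __name__ == "__main__":
    print("asymmetric:", is_asymmetric(FORWARD, REVERSE))
    print("only in return path:", return_only_routers(FORWARD, REVERSE))
```

The routers that show up only in the return path are the ones to check next for long distance or network problems.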

These kinds of issues are quite common in the BGP routing world of the Internet, where various operators and carriers from any part of the world are involved in a path, each with different routing settings. Note that asymmetric routing is not a “fault” in itself, but it makes it more difficult to detect the actual fault in the network.

The example below shows an asymmetric routing case with high latency caused by the return path:

[Figure: network diagram of asymmetric routing with delay in the return path]

A traceroute in the forward direction (from 10.20.0.2 to 10.50.0.2) produces the following output:

[Figure: traceroute output showing delay]

In this output, it seems that there is delay at hop 3. You can start troubleshooting, as mentioned above, by pinging across the link between the affected hops to check whether the problem exists there. If there seem to be no obvious network issues, then the possibility of asymmetric routing needs to be considered.

In this case, a traceroute in the opposite direction is necessary in order to check whether there is indeed asymmetric routing. In this example, a traceroute back would show that the router in hop 3 routes differently towards the source router (its next hop is not the previous router in hop 2, but another router/network). The problem is then either that the geographical distance of the return path is long enough to cause high latency, or that the distance is fine but there are network problems in that path (in which case you will need to check the affected hops again for network problems).

Note: Any router in the forward path can have different routing towards the source, which causes asymmetric routing. In that case the delay would appear or start somewhere in the middle of the path.

To summarize, different factors can cause delay and it is always good to check for network performance issues in the actual links and/or keep in mind the possibility of asymmetric routing.

About TelcoNotes

IP & VoIP networking

Posted on March 7, 2014, in IP Routing. 2 Comments.

  1. Hi, great article. I am having trouble with a network. Speed tests give consistently high results, but websites continually time out (even high profile sites like BBC, Google etc). I tried doing a traceroute as you suggest here, but while it doesn’t list any high results, it takes ages to move between the first 5 hops. For a few hops it just lists asterisks like this: * * *. What would this mean?

  2. Hi John,

    Those asterisks are an indication that those devices are configured to filter or not respond to traceroute packets. But traffic is allowed to pass. It’s just that they are not showing you their actual IP addresses. This is mostly done for security.
