How to Troubleshoot High Latency with the Traceroute Command
High network latency can have several causes, such as congestion or device faults. Latency is usually measured as the Round-Trip Time (RTT), also called Round-Trip Delay (RTD): the time it takes for an Echo message (ping) to reach the destination plus the time for its reply to arrive back at the source.
Two main tools are used to measure this delay: ping and traceroute. Ping reports whether a destination is reachable, along with packet loss and RTT statistics, while traceroute lists every hop in the routing path from source to destination together with the RTT to each hop. When troubleshooting high latency, keep in mind the geographical distance between hops, since longer distances mean longer RTT. Once you have identified the problematic hops, you can ping directly between them and check the routers involved to find the problem.
For instance, a traceroute can show where in the network (which hop in the routing path) the delay exists or starts. Such an example is shown in the traceroute output below:
This output suggests that the delay starts at hop 2, since the RTT increases significantly between hop 1 and hop 2. Between hop 2 and hop 3, however, the difference is relatively small, which hints that the problem exists only between 10.10.0.2 and 10.11.0.2.
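The reasoning above (look for the first large RTT increase between consecutive hops) can be sketched in Python. This is a minimal illustration, not part of any traceroute tool: the function name `first_latency_jump`, the 50 ms threshold, the `hop-3` label, and all RTT values are assumptions made up for the example.

```python
from typing import List, Optional, Tuple

Hop = Tuple[str, float]  # (address or label, RTT in milliseconds)


def first_latency_jump(
    hops: List[Hop], threshold_ms: float = 50.0
) -> Optional[Tuple[str, str, float]]:
    """Return (previous_hop, hop, increase_ms) for the first RTT increase
    between consecutive hops that exceeds threshold_ms, or None."""
    for (prev_addr, prev_rtt), (addr, rtt) in zip(hops, hops[1:]):
        increase = rtt - prev_rtt
        if increase > threshold_ms:
            return prev_addr, addr, increase
    return None


# Hypothetical RTT values; only the two addresses come from the text above.
sample_hops = [("10.10.0.2", 2.0), ("10.11.0.2", 180.0), ("hop-3", 183.0)]

if __name__ == "__main__":
    print(first_latency_jump(sample_hops))
```

With these made-up values, the function points at the segment between 10.10.0.2 and 10.11.0.2, which matches the manual reading of the output. The threshold is a judgment call and would need tuning for the distances involved.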
One way to continue troubleshooting is to access the router at hop 1 (address 10.10.0.2) and ping the router at hop 2 (address 10.11.0.2) to see whether the link between them has high latency, for example due to congestion (in such cases, packet loss might appear as well). It is also usually good to double-check the status and configuration of the router interfaces, and the router logs.
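When the ping is run from a host that prints a Linux-style summary, the packet loss and RTT figures can be pulled out for comparison, as in the sketch below. It assumes the common Linux `ping` summary format; router operating systems usually print a different summary, so the regular expressions would need adapting. The function name `parse_ping_summary` and the sample text are hypothetical.

```python
import re
from typing import Dict


def parse_ping_summary(text: str) -> Dict[str, float]:
    """Extract packet loss (%) and min/avg/max RTT (ms) from a
    Linux-style ping summary. Raises ValueError on other formats."""
    loss = re.search(r"([\d.]+)% packet loss", text)
    rtt = re.search(r"=\s*([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+)\s*ms", text)
    if loss is None or rtt is None:
        raise ValueError("unrecognized ping summary format")
    return {
        "loss_pct": float(loss.group(1)),
        "min_ms": float(rtt.group(1)),
        "avg_ms": float(rtt.group(2)),
        "max_ms": float(rtt.group(3)),
    }


# Illustrative summary text, not captured from a real device.
SAMPLE_SUMMARY = (
    "5 packets transmitted, 4 received, 20% packet loss, time 4006ms\n"
    "rtt min/avg/max/mdev = 150.112/180.345/210.778/22.510 ms\n"
)

if __name__ == "__main__":
    print(parse_ping_summary(SAMPLE_SUMMARY))
```

A high average RTT combined with non-zero packet loss on a single link is the pattern the paragraph above associates with congestion.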
However, if no problem is found on that link, it could be the tricky case of asymmetric routing, where the routing path from source to destination differs from the return path (from destination back to source). The delay shown towards the destination may then be caused by a return path that covers a much longer geographical distance than the forward path. If the return path is not that long, there may instead be delay or performance issues on it, e.g. congestion or faults.
So if you suspect asymmetric routing, you need a traceroute in the other direction (from the destination IP address to the source IP address). That traceroute shows the return path along with the RTT per hop, so you can confirm whether the routing is indeed asymmetric and what the problem seems to be (long geographical distance or network issues).
These kinds of issues are quite common in the BGP routing world of the Internet, where a path can involve various operators and carriers from any part of the world, each with different routing settings. Note that asymmetric routing is not a “fault” in itself; it just makes the actual fault in the network more difficult to detect.
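Comparing the two traceroutes can be sketched as follows. The example assumes each hop has already been mapped to a router name, because the same router typically answers from a different interface address in each direction, so raw IP addresses rarely match. The router names `R1` to `R5` and both function names are hypothetical.

```python
from typing import List


def is_asymmetric(forward: List[str], reverse: List[str]) -> bool:
    """True if the return path does not visit the forward path's
    routers in exactly the opposite order."""
    return forward != list(reversed(reverse))


def divergent_routers(forward: List[str], reverse: List[str]) -> List[str]:
    """Routers that appear in only one of the two directions."""
    return sorted(set(forward) ^ set(reverse))


# Hypothetical paths: the return path goes through R5 instead of R2.
forward_path = ["R1", "R2", "R3", "R4"]
return_path = ["R4", "R3", "R5", "R1"]

if __name__ == "__main__":
    print(is_asymmetric(forward_path, return_path))
    print(divergent_routers(forward_path, return_path))
```

The routers reported by `divergent_routers` are the ones worth checking first, since they carry traffic in only one direction.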
The example below shows an asymmetric routing case with high latency caused by the return path:
A traceroute in the forward direction (from 10.20.0.2 to 10.50.0.2) produces the following output:
In this output, there seems to be delay at hop 3. You can start troubleshooting, as mentioned above, by pinging across the link between the affected hops to check whether the problem exists there. If there seem to be no obvious network issues, then the possibility of asymmetric routing needs to be considered.
In this case, a traceroute in the opposite direction is needed to check whether the routing is indeed asymmetric. In this example, the traceroute back would show that the router at hop 3 uses a different route to the source router: its next hop is not the previous router at hop 2, but another router/network. The cause is then either that the geographical distance of the return path is long enough to produce high latency, or that the distance is fine but there are network problems on that path (in which case you will need to check the affected hops again for network problems).
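To judge whether distance alone could explain an observed RTT, a rough lower bound can be computed from the propagation delay. The sketch below assumes signals travel through fiber at about 200,000 km/s (roughly two thirds of the speed of light in vacuum) and ignores queuing, processing, and the fact that cables rarely follow a straight line, so real RTTs will be higher. The function name `min_rtt_ms` is made up for this example.

```python
# Assumption: propagation speed in optical fiber, roughly 2/3 of c.
SPEED_IN_FIBER_KM_PER_S = 200_000


def min_rtt_ms(distance_km: float) -> float:
    """Theoretical minimum round-trip time in milliseconds for a
    one-way path length of distance_km."""
    one_way_s = distance_km / SPEED_IN_FIBER_KM_PER_S
    return 2 * one_way_s * 1000


if __name__ == "__main__":
    for km in (100, 1_000, 10_000):
        print(f"{km:>6} km -> at least {min_rtt_ms(km):.1f} ms RTT")
```

Under this assumption, every 1,000 km of path adds about 10 ms of RTT. If the measured RTT is far above the bound for the known path length, network problems on the path are the more likely explanation.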
Note: Any router in the forward path can have different routing towards the source, which causes asymmetric routing. The delay would then appear or start somewhere in the middle of the path.
To summarize, different factors can cause delay, so it is always good to check for network performance issues on the actual links and to keep in mind the possibility of asymmetric routing.