Visualizing Traffic over ECMP and LAG Paths

Posted on October 14, 2015

We focus a lot on external routing topics, such as BGP. To switch things up, today we’ll spend some time looking at internal forwarding decisions and how they shape traffic paths. Equal-Cost Multi-Path (ECMP) is an internal routing strategy where packets are distributed over multiple paths, each with the same “cost.” It is commonly used at Layer 3, for IP networks, though it can also be used at other layers, such as Layer 2.

ECMP: Two Paths Diverge on a Network

You can think of ECMP a bit like the famous Robert Frost line: “Two roads diverged in a wood, and I took the one less traveled by.” In essence, multiple links diverge on a network and, sometimes, you want packets to also take the less traveled path. Assuming, of course, that each of the paths gets you to your destination and takes a similar number of hops. Voilà, ECMP.

ECMP is meant to maintain an equal share of traffic on each path while minimizing the out-of-order delivery and Path MTU issues that arise when packets from one conversation take different paths. To achieve this, ECMP typically uses the IP 5-tuple (source IP, destination IP, protocol number, source port, destination port) to define a flow: a sequence of packets that will stay on the same path. The network device hashes the 5-tuple to choose the next-hop address and sends every packet in that flow the same way.
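To make the hashing step concrete, here is a minimal Python sketch of 5-tuple next-hop selection. It is an illustration only: real routers use vendor-specific hash functions and field selections, so the use of SHA-256, the function name pick_next_hop and the example addresses are all assumptions made for this example.

```python
import hashlib
import ipaddress
import struct
from collections import Counter


def pick_next_hop(src_ip, dst_ip, proto, src_port, dst_port, next_hops):
    """Map a flow's 5-tuple to one of several equal-cost next hops.

    SHA-256 is a stand-in here; actual devices use their own hash functions.
    """
    key = struct.pack(
        "!4s4sBHH",
        ipaddress.IPv4Address(src_ip).packed,
        ipaddress.IPv4Address(dst_ip).packed,
        proto,
        src_port,
        dst_port,
    )
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:4], "big") % len(next_hops)
    return next_hops[index]


# Hypothetical next hops and flow, using private and documentation addresses.
NEXT_HOPS = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
flow = ("192.0.2.10", "198.51.100.20", 6, 49152, 443)  # protocol 6 = TCP

# Every packet of the same flow hashes to the same next hop.
first = pick_next_hop(*flow, NEXT_HOPS)
same_every_time = all(pick_next_hop(*flow, NEXT_HOPS) == first for _ in range(5))

# Many flows that differ only in source port spread across the next hops.
spread = Counter(
    pick_next_hop("192.0.2.10", "198.51.100.20", 6, port, 443, NEXT_HOPS)
    for port in range(49152, 49152 + 4000)
)
print(first, same_every_time)
print(spread)
```

Because the next hop depends only on the 5-tuple, packets within a flow stay in order on one path, while the population of flows as a whole is shared across all of the paths.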

You’ll find ECMP most often in carrier core networks as well as in data center leaf-spine network segments. Understanding its implications for traffic flows is important as you look at services flowing across the network.

Link Aggregation and Multi-Path Protocols

ECMP is not the only multipath option around. Link Aggregation Groups (LAG) combine parallel connections for greater bandwidth and redundancy. LAG has a variety of vendor-neutral as well as proprietary schemes (e.g., EtherChannel, Aggregated Ethernet). These implementations often perform per-flow load balancing using the same 5-tuple data as ECMP. You’ll find LAG implemented in many of the same places as ECMP, such as carrier core networks and data center edges. Tradeoffs between ECMP and LAG include speed of failure detection, scalability, supported network protocols and ease of use. Newer options include Multipath TCP (MPTCP), which performs load balancing at the transport layer, enabling multiple subflows that take different paths for the same traffic flow.

Troubleshooting Traffic over ECMP and LAG

Typically, making sense of path traces that traverse ECMP or LAG is a pain. A traceroute might be hard to decipher, or it might not exercise all of the equal-cost paths. Working out which interfaces are connected to which can be challenging in a complex network topology. And often traceroutes don’t even complete, as happened in the example here. Figure 1 shows a traceroute to Yahoo from my laptop on the Cogent network.

Figure 1: Traceroute to Yahoo, with multiple nodes per hop, indicating the use of ECMP / LAG.

So how can we make sense of this sort of output?

In ThousandEyes, we collect data on ECMP and LAG-based paths by actively probing the network with multiple rounds of packets to map out the different paths. For our Network tests, this is typically three packet streams per agent; monitoring from more agents increases the probability that you will see all of the available paths. We vary the 5-tuple fields in the headers of these packet streams so that each stream forms a distinct flow that may travel along a different path where ECMP is in place.
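The effect of sending more streams can be sketched with a small simulation. This is not how ThousandEyes probes are implemented; it is a toy model in which a hypothetical function, simulated_path, stands in for a router's ECMP hash, and each stream differs only in its source port. The probability calculation assumes every stream lands on a path uniformly and independently, which real hash functions do not guarantee.

```python
import hashlib
from math import comb


def simulated_path(src_port, n_paths, dst_port=443):
    """Hypothetical stand-in for a router's ECMP hash: returns a path index."""
    key = f"192.0.2.10|198.51.100.20|6|{src_port}|{dst_port}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big") % n_paths


def discover_paths(n_streams, n_paths, base_port=49152):
    """Send n_streams flows with different source ports; collect the paths seen."""
    return {simulated_path(base_port + i, n_paths) for i in range(n_streams)}


def prob_all_paths_seen(n_streams, n_paths):
    """Chance that n_streams uniform, independent flows cover all n_paths.

    Inclusion-exclusion over the set of paths that were missed.
    """
    return sum(
        (-1) ** i * comb(n_paths, i) * (1 - i / n_paths) ** n_streams
        for i in range(n_paths + 1)
    )


# More streams (for example, from more agents) raise the odds of full coverage.
for streams in (3, 6, 12, 24):
    found = discover_paths(streams, n_paths=4)
    print(streams, sorted(found), round(prob_all_paths_seen(streams, 4), 3))
```

Under this model, three streams can never reveal more than three paths, so a four-way ECMP split only becomes fully visible once streams from additional vantage points are combined.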

We then visualize multiple paths within our Path Visualization; this helps network engineers make sense of a complex topology more easily and base decisions on a much more complete picture of it. I’ll demonstrate the difference between an ICMP-based path trace and a TCP-based path trace. Figures 2 and 3 compare the same route between our Las Vegas, NV agent and the Yahoo Quincy, WA data center.

Figure 2: ICMP-based Path Visualization shows just one ECMP path within the Yahoo transit network.
Figure 3: TCP-based Path Visualization shows many ECMP paths within Cogent (left), the Yahoo transit network (center) and the Yahoo Quincy, WA data center (green nodes).

As you can see, ECMP can be found in many network segments, including carrier core (Cogent’s Los Angeles POP), Yahoo’s transit network (100GigE Juniper routers) and Yahoo’s data center (leaf and spine).

Visualizing Your Networks

Having difficulty troubleshooting network paths that traverse ECMP or LAG? Check out Network tests and Path Visualization. And these do more than just ECMP. You’ll also see VoIP QoS, MPLS links, Path MTU and TCP MSS information as well as loss and latency metrics. You can check out ECMP paths in your network or your carriers’ networks by signing up for a free ThousandEyes Lite account.