Troubleshooting Path MTU and TCP MSS Problems

Posted by on March 18, 2014

In this article I’ll do a deep dive into some of the not-so-obvious capabilities of Deep Path Analysis, focusing on troubleshooting Path MTU and TCP MSS measurements. These two parameters are important in determining how application streams are split into different packets so that network interfaces across a path can forward them, and end-hosts are able to piece them together, both at the network (MTU) and transport (MSS) layers.

Maximum Transmission Unit and Path MTU

The Maximum Transmission Unit (MTU) of a network interface is the maximum packet size (in bytes) that the interface is able to forward. For example, the MTU for Ethernet v2 is 1500 bytes, but with Jumbo Frames the MTU can go up to 9000 bytes. A larger MTU has benefits: fewer packets are needed to transfer the same amount of data, so per-packet processing is reduced. However, bigger packets take longer to transmit onto the wire, so the minimum delay between packets of a flow increases. This is bad for real-time applications such as voice.
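
To put rough numbers on this trade-off, here is a small back-of-the-envelope sketch. It assumes a 1 MB transfer, 40 bytes of IPv4 + TCP headers per packet and a 100 Mbit/s link, and it ignores Ethernet framing overhead. All of these values are illustrative, not measurements.

```python
import math

IP_TCP_HEADERS = 40  # IPv4 (20) + TCP (20) headers, no options (assumption)

def packets_needed(total_bytes: int, mtu: int) -> int:
    """Number of packets needed to carry total_bytes of application data."""
    payload_per_packet = mtu - IP_TCP_HEADERS
    return math.ceil(total_bytes / payload_per_packet)

def serialization_delay_us(packet_bytes: int, link_bps: float) -> float:
    """Microseconds needed to clock one packet onto a link of link_bps bit/s."""
    return packet_bytes * 8 / link_bps * 1e6

LINK_BPS = 100e6      # hypothetical 100 Mbit/s link
TRANSFER = 1_000_000  # hypothetical 1 MB transfer

for mtu in (1500, 9000):
    print(mtu, packets_needed(TRANSFER, mtu), round(serialization_delay_us(mtu, LINK_BPS), 1))
# 1500 685 120.0
# 9000 112 720.0
```

Jumbo frames cut the packet count by roughly a factor of six, but each packet also occupies the link six times longer, and that is the extra wait a voice packet queued behind it would see.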

The Path MTU (PMTU) between two end-hosts is the minimum MTU of all the interfaces used to forward packets between them. End-hosts use a standard method called Path MTU Discovery (PMTUD) to determine the PMTU of a connection. Both the IPv4 and IPv6 standards impose a lower limit on the path MTU (summarized in the table below). IP standards also define a minimum datagram size that all hosts must be prepared to accept, which for IPv4 is 576 bytes. This means that every IPv4 end-host on the Internet must be able to reassemble datagrams of up to 576 bytes, even though in practice hosts handle much larger ones (at least 1500 bytes).

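Media                    | MTU (bytes)    | Min. datagram size at end-hosts (bytes)
-------------------------|----------------|----------------------------------------
Internet IPv4 Path MTU   | At least 68    | 576
Internet IPv6 Path MTU   | At least 1280  | 1500
Ethernet v2              | 1500           | -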
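
The PMTU definition above is simply a minimum over the links of a path. The sketch below assumes the per-link MTUs are already known (in practice they are not, which is why PMTUD exists). The 1476-byte link stands for a GRE-over-IPv4 tunnel (1500 bytes minus 24 bytes of outer IP and GRE headers) and is used purely as an example.

```python
# Lower limits on the path MTU from the table above (bytes).
MIN_PATH_MTU = {4: 68, 6: 1280}

def path_mtu(link_mtus: list[int]) -> int:
    """PMTU = smallest MTU of all links on the path."""
    if not link_mtus:
        raise ValueError("a path needs at least one link")
    return min(link_mtus)

def violates_minimum(pmtu: int, ip_version: int) -> bool:
    """True if the PMTU is below the standard's lower limit."""
    return pmtu < MIN_PATH_MTU[ip_version]

# Hypothetical path: Ethernet, a GRE-over-IPv4 tunnel, Ethernet.
links = [1500, 1476, 1500]
pmtu = path_mtu(links)
print(pmtu, violates_minimum(pmtu, 6))  # 1476 False
```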

IP Fragmentation and Path MTU

In IPv4, routers in the middle of a path are able to fragment IP datagrams if the DF (Don’t Fragment) flag in the IP header is not set. Packets are split into smaller fragments that need to be reassembled at the receiving host. However, IP fragmentation should be avoided whenever possible because of its drawbacks:

  • IP fragments are often dropped by firewalls, load balancers and stateful inspection devices. These middle boxes often need to see the entire datagram before they can forward it, which forces them to keep state; dropping fragments protects them against DDoS attacks that exploit this. A recently disclosed DNS cache-poisoning attack also exploits IP fragmentation.
  • IP fragmentation can cause excessive retransmissions when fragments encounter packet loss, because TCP must retransmit all of the fragments (the whole segment) to recover from the loss of a single fragment.
  • When link aggregation (LAG) or equal-cost multi-path (ECMP) routing is used, traffic is typically balanced across interfaces using fields from the 5-tuple. With fragmentation, only the first fragment carries the full 5-tuple (the transport ports are missing from later fragments), so subsequent fragments can be sent over different interfaces, arrive before the first fragment, and be dropped by some security devices.
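
The sketch below shows how an IPv4 router splits a datagram when DF is clear, and why the load-balancing problem in the last bullet arises: only the fragment at offset 0 contains the TCP/UDP header, so only that fragment exposes the port numbers a 5-tuple hash needs. It assumes a 20-byte IP header with no options, assumes the transport header fits in the first fragment, and models only the fields relevant here.

```python
from dataclasses import dataclass

IPV4_HEADER = 20  # bytes; assumes no IP options

@dataclass
class Fragment:
    offset: int           # byte position of this fragment's data in the original payload
    length: int           # data bytes carried (excluding the IPv4 header)
    more_fragments: bool  # value of the MF flag
    has_ports: bool       # True only if the TCP/UDP header (and its ports) is inside

def fragment_ipv4(payload_len: int, mtu: int) -> list[Fragment]:
    """Split an IPv4 payload (transport header + data) to fit `mtu`, DF clear.

    All fragments but the last carry a multiple of 8 data bytes, because the
    Fragment Offset field counts 8-byte units.
    """
    if mtu < 68:
        raise ValueError("IPv4 links must support an MTU of at least 68 bytes")
    max_data = (mtu - IPV4_HEADER) // 8 * 8
    fragments = []
    offset = 0
    while offset < payload_len:
        length = min(max_data, payload_len - offset)
        last = offset + length >= payload_len
        fragments.append(Fragment(offset, length, not last, has_ports=(offset == 0)))
        offset += length
    return fragments

for frag in fragment_ipv4(3000, 1500):  # e.g. a 3000-byte UDP datagram
    print(frag)
```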

To avoid fragmentation, end-hosts typically run PMTUD for each destination by processing ICMP “Packet Too Big” messages (the default on Linux) and adjusting the maximum TCP segment size (MSS) accordingly. The TCP MSS is advertised by each end of a TCP connection to signal the largest TCP segment it can receive. By default, most hosts advertise their TCP MSS as the local MTU minus the IP and TCP headers (40 bytes for IPv4 and 60 bytes for IPv6), so a common value for TCP MSS is 1500 - 40 = 1460 bytes. However, this does not prevent IP fragmentation, since some paths have a PMTU lower than 1500 bytes, e.g. paths through GRE tunnels. Another (perhaps more efficient) technique is MSS clamping, where middle boxes rewrite the MSS option in the SYN packets of TCP connections passing through them. In any case, there are three main problem cases that can happen:
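
The MSS rules above reduce to simple arithmetic. This is a minimal sketch assuming IP and TCP headers without options. `clamp_mss` is a hypothetical helper that illustrates the idea of MSS clamping (a middle box lowering the MSS in a SYN so segments fit its own egress MTU), not any particular vendor's implementation, and the 1400-byte egress MTU is just an example tunnel value.

```python
# IP + TCP header sizes without options (bytes), as described above.
HEADERS = {4: 20 + 20, 6: 40 + 20}

def default_mss(local_mtu: int, ip_version: int) -> int:
    """MSS a host typically advertises: local MTU minus IP and TCP headers."""
    return local_mtu - HEADERS[ip_version]

def clamp_mss(syn_mss: int, egress_mtu: int, ip_version: int) -> int:
    """Hypothetical MSS-clamping step: a middle box lowers the MSS option in a
    passing SYN so that full-sized segments fit through its egress link."""
    return min(syn_mss, egress_mtu - HEADERS[ip_version])

mss = default_mss(1500, 4)           # 1460, the common value
print(mss, clamp_mss(mss, 1400, 4))  # 1460 1360
```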

  • Oversized TCP MSS: Whenever the TCP MSS plus headers is greater than the PMTU, the sender receives ICMP “Packet Too Big” messages and adjusts its maximum datagram size. Until the ICMP message arrives, multiple TCP packets can be dropped, causing TCP retransmissions and extra delays.
  • Undersized PMTU: For UDP, the sender performs IP fragmentation every time it needs to send a datagram larger than the PMTU. DNS is a typical case. If the PMTU is below 576 bytes, initial DNS packets will be lost until the sender receives the ICMP “Packet Too Big” message, and it’s up to the application to detect the loss and retransmit. After receiving the ICMP packet, Linux updates its per-destination cache of PMTU results, which can be inspected with the command “ip route show cache”. The retransmitted datagram is then fragmented to fit the smaller PMTU.
  • PMTUD failure: In IPv6, fragmentation is only done by the end-hosts, which means that end-hosts must be able to run PMTUD to a destination and size their IP packets accordingly. If PMTUD fails under IPv6, which typically occurs when ICMP messages are filtered (PMTU black holes), you are about to enter a world of pain, since your IPv6 traffic will be dropped on the floor without further notice. Failed PMTUD is therefore a big problem for IPv6 connections. In IPv4 a similar problem happens when the Don’t Fragment flag is set (the default in Linux).
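
To make the difference between working PMTUD and a black hole concrete, here is a toy simulation in the spirit of classic PMTUD: the sender sends packets with DF set, and the router in front of the first too-small link either reports that link's MTU in an ICMP message or, when ICMP is filtered, stays silent. It is a simplified model (no per-destination caching, timeouts or retransmissions), and the per-link MTUs are chosen purely for illustration.

```python
from typing import Optional

def run_pmtud(link_mtus: list[int], start_size: int, icmp_filtered: bool = False) -> Optional[int]:
    """Simulate PMTUD with the DF bit set.

    Returns the discovered PMTU, or None if the sender never learns why its
    packets vanish (a PMTU black hole).
    """
    size = start_size
    while True:
        # First link on the path that is too small for the current packet size.
        bottleneck = next((mtu for mtu in link_mtus if mtu < size), None)
        if bottleneck is None:
            return size       # the packet made it end to end
        if icmp_filtered:
            return None       # dropped silently: black hole
        size = bottleneck     # shrink to the MTU reported in the ICMP message

print(run_pmtud([1500, 1400, 1280], 1500))                      # 1280
print(run_pmtud([1500, 1400, 1280], 1500, icmp_filtered=True))  # None
```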

Protocol | Undersized PMTU | Oversized TCP MSS | Undersized TCP MSS | PMTUD failure
---------|-----------------|-------------------|--------------------|--------------
IPv4 | PMTU < 576 bytes: IP fragmentation for UDP packets; critical for DNS servers | MSS > PMTU - 40 bytes: the first TCP packet triggers an ICMP “Can’t Fragment” message | MSS < 1380 bytes (except DNS): generates extra packets, increasing per-packet processing | Large packets that have the DF bit set can potentially always be dropped (PMTU black holes)
IPv6 | PMTU < 1280 bytes: violates IETF standards; packets will be dropped | MSS > PMTU - 60 bytes: the first TCP packet triggers an ICMP “Packet Too Big” message | MSS < 1220 bytes (except DNS): generates extra packets, increasing per-packet processing | Large packets can potentially always be dropped (PMTU black holes)
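
The thresholds in this table can be read as a simple checklist. The function below is a hypothetical helper that applies them to a single measurement. It does not model the "except DNS" exception for undersized MSS, it takes `pmtu=None` to mean that PMTUD failed, and the example values are made up.

```python
from typing import Optional

# Thresholds taken from the table above, per IP version (bytes).
HEADERS = {4: 40, 6: 60}      # IP + TCP headers without options
MIN_PMTU = {4: 576, 6: 1280}
MIN_MSS = {4: 1380, 6: 1220}  # the table's "except DNS" exception is not modeled

def flag_issues(ip_version: int, pmtu: Optional[int], mss: int) -> list[str]:
    """Return the problem cases from the table that apply to one measurement.

    pmtu=None means Path MTU Discovery failed (e.g. ICMP filtered).
    """
    issues = []
    if pmtu is None:
        issues.append("PMTUD failure")
    else:
        if pmtu < MIN_PMTU[ip_version]:
            issues.append("undersized PMTU")
        if mss > pmtu - HEADERS[ip_version]:
            issues.append("oversized TCP MSS")
    if mss < MIN_MSS[ip_version]:
        issues.append("undersized TCP MSS")
    return issues

print(flag_issues(6, 1280, 1300))  # ['oversized TCP MSS']
print(flag_issues(4, 1500, 1460))  # []
print(flag_issues(4, None, 1460))  # ['PMTUD failure']
```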

PMTU and MSS in Path Visualization

Whenever you go to the Path Visualization view in our platform, you can check the PMTU and MSS values that each agent is using when connecting to a target, as shown in Figure 1.


Figure 1: Properly configured path: TCP MSS + headers is less than the Path MTU.

In Figure 1, you can see the San Francisco agent is working with an MSS of 1380 bytes, and the result of PMTUD is 1500 bytes. This is the normal scenario where PMTU > MSS+headers. However, we detect several cases where either PMTU, MSS, or the combination of the two might create problems. Let’s look at some of these cases.

Case #1: Oversized TCP MSS

Figure 2 shows a case where the TCP MSS + headers is actually higher than the Path MTU. In the figure, all 3 IPv6 agents are using a combined TCP MSS of 1440 bytes, meaning the minimum of the MSS sent by the server and the MSS of the agent is 1440 bytes. This means that the client can send packets as large as 1500 bytes (1440 + 60 bytes of headers) to the server. But given that the Path MTU is actually 1400 bytes, the client will receive a “Packet Too Big” ICMP message and reduce its maximum segment size to 1340 bytes (1400 - 60). As you can imagine, this process is not very efficient and contributes to performance degradation in the long run.


Figure 2: Oversized TCP MSS leads to performance degradation.
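
The arithmetic of Case #1 can be written out as a short sketch. It assumes IPv6 and TCP headers without options or extension headers. Since the figure only shows the combined 1440-byte value, the individual agent and server MSS values below are hypothetical.

```python
IPV6_TCP_HEADERS = 40 + 20  # IPv6 + TCP headers, no extension headers or options

def negotiated_mss(agent_mss: int, server_mss: int) -> int:
    """The 'combined' MSS: the smaller of the two advertised values."""
    return min(agent_mss, server_mss)

def mss_after_packet_too_big(reported_mtu: int) -> int:
    """Segment size that fits the MTU reported in an ICMPv6 Packet Too Big."""
    return reported_mtu - IPV6_TCP_HEADERS

mss = negotiated_mss(1440, 1440)          # hypothetical split; only 1440 is shown
largest_packet = mss + IPV6_TCP_HEADERS   # 1500 bytes
path_mtu = 1400
if largest_packet > path_mtu:
    # The first full-sized segment is dropped at the 1400-byte link and must be resent.
    mss = mss_after_packet_too_big(path_mtu)
print(largest_packet, mss)  # 1500 1340
```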

Case #2: Failed Path MTU Discovery

Figure 3 shows a case of a ThousandEyes agent in Boston that is not able to perform PMTUD over an IPv4 route. This case is problematic because it means ICMP “Can’t Fragment” messages are being dropped, and for paths with low PMTU, packets are just going to be dropped if the Don’t Fragment bit is set.


Figure 3: Path MTU discovery failed, which can lead to packet loss.

Case #3: Pinpointing Links with Low MTU

Figure 4 shows how Path Visualization can be used to pinpoint the link where the MTU drops (here to 1400 bytes) and that generated an ICMP Packet Too Big message. This information is very useful for detecting tunnel entry points, e.g. IP-in-IP, GRE and IPsec tunnels.


Figure 4: Link highlighted in blue with undersized MTU of 1400 bytes.
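
Conceptually, pinpointing the low-MTU link amounts to walking the path hop by hop and noting where the MTU decreases. The sketch below assumes per-hop MTU values are already available. The hop names and values are hypothetical, and this is not a description of how Path Visualization computes them.

```python
def find_mtu_drops(hops: list[tuple[str, int]]) -> list[tuple[str, int, int]]:
    """hops: (hop name, MTU of the link leaving that hop), in path order.

    Returns (hop, old MTU, new MTU) for every hop whose outgoing link has a
    smaller MTU than the link before it, e.g. a tunnel entry point.
    """
    drops = []
    for (_, prev_mtu), (name, mtu) in zip(hops, hops[1:]):
        if mtu < prev_mtu:
            drops.append((name, prev_mtu, mtu))
    return drops

path = [
    ("agent", 1500),
    ("edge-router", 1500),
    ("tunnel-entry", 1400),  # e.g. GRE or IPsec encapsulation starts here
    ("tunnel-exit", 1500),
    ("target", 1500),
]
print(find_mtu_drops(path))  # [('tunnel-entry', 1500, 1400)]
```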

Getting Started with PMTU and MSS Visualization

PMTU and MSS information is essential to troubleshoot enterprise networks, particularly to understand performance issues surrounding VPNs, IPv6 and tunneling. As more networks roll out IPv6 and jumbo-capable Ethernet, PMTU will increasingly become a metric that network operators will want to review. And PMTU path visibility can also help with more mundane problems, such as when links have the wrong MTU set. Start with our free version of ThousandEyes today to gain visibility into PMTU and MSS with Path Visualization.
