A Very Simple Model for TCP Throughput

Posted by on July 24, 2013

The Transmission Control Protocol (TCP) is probably the most popular communication protocol on the Internet today. TCP is a complex protocol, and understanding throughput behavior is hard, especially when dealing with lossy environments such as wireless networks. In this article, we look at one such model and show that it does accurately capture TCP throughput behavior even under lossy situations, typical of wireless networks. This demonstrates that it is possible to better understand and infer network characteristics such as throughputs that are directly influenced by TCP behavior.

A simple model for TCP throughput

TCP throughput, which is the rate that data is successfully delivered over a TCP connection, is an important metric to measure the quality of a network connection. Generally speaking, TCP throughput is measured as:

Throughput=\frac{WindowSize}{RTT}

In 1997, Matthew Mathis et al., came up with a model for the TCP throughput, which takes into account some link characteristics, such as the Maximum Segment Size (MSS), Round-Trip Time (RTT) and packet loss. Their derivation is taken from the analysis of how TCP behaves under loss during congestion avoidance phase. A TCP sender usually sends a batch of packets before receiving any acknowledgement from the receiver. Of course if the sender sends too many packets too fast they are going to be dropped because of congestion in intermediate queues.

To minimize this, TCP has a flow control mechanism that slows down the sender’s transmission rate if the devices in the path are not able to cope with the offered rate. A congestion window (cwnd) allows the sender to have at most cwnd unacknowledged bytes at any given time, effectively controlling the rate at which data is sent. There are different algorithms to control the behavior of congestion window, the most popular of which are New Reno and CUBIC. In steady-state, cwnd grows linearly one packet per RTT and when there’s a loss event (e.g., three duplicate ACKs) cwnd is set to half its value, as indicated in the figure below, extracted from the original Mathis paper.

TCP window evolution under periodic loss

Assuming the path has a packet loss of probability p, the sender will be able to send an average of 1/p packets before a packet loss. Under this model cwnd never exceeds a maximum W because at approximately 1/p packets a new packet loss makes cwnd drop to half of its value again. Hence, the total data delivered at every cycle of 1/p packets, is given by 1/p=(\frac{w}{2})^2+\frac{1}{2}(\frac{w}{2})^2=\frac{3}{8}W^2 (i.e., the area under sawtooth). Therefore, MSS\times\frac{3}{8}W^2 bytes are sent at every cycle of RTT\times \frac{W}{2}, which solving W=\sqrt{\frac{8}{3p}} we arrive at the following throughput:

T=\frac{MSS\times \frac{3}{8}W^2}{RTT\times \frac{W}{2}} = \frac{MSS/p}{RTT\sqrt{\frac{2}{3p}}}

Collecting all the constant terms in C=\sqrt{\frac{3}{2}}, we get:

 T=\frac{MSS\times C}{RTT \times\sqrt{p}}

This constant C lumps together several terms that are typically constant for a given combination of TCP implementation, ACK strategy (delayed vs non-delayed), and loss mechanism.

A simple validation

The following is an experiment between two hosts to validate the Mathis TCP throughput model. Host A sends traffic for 30 seconds to host B on a link configured to emulate different packet loss probabilities.

We used netem utility to set up the network interface with a constant latency of 100 ms (50 ms in each host) and a loss probability from 1% to 10% (on all packets leaving host A).

Iperf was used to measure the throughput from host A to host B by transmitting data during 30 seconds (-t) and using different congestion window algorithms: CUBIC (default on linux kernel since version 2.6.19) and New Reno (the more traditional scheme).

tcp model throughput vs loss blog post
Figure 2: Comparison of estimated throughput with loss in different models.

The graph above compares the estimated throughput (from the Mathis model) with the actual measured values under different packet loss scenarios. To filter out any outliers each data point is the median of five repeated experiments.

The graph shows that the model greatly approximates to the actual experimental measurements and that TCP throughput is very sensitive to loss. Loss is determinant because loss events regularly cap cwnd, the sender will never send more than W bytes at every RTT, effectively limiting its maximum throughput. So, for an average of 1% of packet loss, the TCP throughput will be restricted to 1,400 kbps (because at approximately every 100 packets a packet is lost and cwnd drops to half).

The shape of the curves also reveal that loss dramatically affects TCP throughput, especially for low values. For instance, the throughput drops to half from 0.1% to 0.4% of packet loss.

The Mathis model is particularly appropriate to estimate the TCP throughput in network paths with regular packet loss, which happens in steady state when there’s a constant rate of cross traffic in the path. If no congestion is observed, cwnd steadily increases until a corrupted packet is detected (e.g., wireless environment) or the receiver advertises an insufficient receive buffer. The model also works in lossy wireless links with a uniform loss probability.

Loss is not a constant characteristic of a network path. Instead, it varies according to different network conditions, such as the physical capacity of the links (and in particular of the bottleneck link) or network congestion, which delimits the available bandwidth. We talk more about network capacity and available bandwidth in a follow up post.

Processing...
  • (I sense a ‘TCP coded’ promo). On that subject and the ‘new’ (old FEC concept that ‘climbed’ up a layer) solutions: http://www.extremetech.com/computing/138424-increasing-wireless-network-speed-by-1000-by-replacing-packets-with-algebra

    • Btw, nice name :) (that’s partially why I landed here in the first place)

  • Srinivas

    Nice analysis, a few things come to mind

    1) RTT is never a constant so a constant 100 ms is never really found in the real world.
    2) Use of iperf does not mimic really network connections or workloads
    3) A single network connection does not mimic actual network conditions, e.g. TCP Fairness for one.
    4) Finally, you rarely see applications unless it is ftp like, setup a single connection and use it for download all the bytes using it.

    A real test of throughput would need multiple connections, varied RTT and losses to get a decent approximation. Netem does not allow that but there is Dummynet and also a project Tmix that will allow you get to replay traces with actual RTTs and losses to approximate this better.

    • Right, RTTs are never strictly constant. Several things can cause RTTs to change and fluctuate, such as routing changes, cross traffic, etc. However, we can say that a link has an average RTT, as well as loss or jitter. These are natural characteristics of the link and making these generalizations is very common and necessary to model real life scenarios.

      The same rationale applies to using iperf with a single TCP connection. Iperf is able to continuously send data over a TCP (or UDP) connection and measure its accumulated throughput. Because of this ability, it can overcome TCP’s slow start and reach congestion avoidance phase in a more reproducible way.

      You are correct in that real network conditions often contain multiple flows, but we wanted to focus this experiment on modeling TCP behavior in a lossy environment, as opposed to one where flows are competing for maximum available capacity. Using multiple network connections would simply congest the pipe to a point where the bottleneck would no longer be caused by the loss, but rather by the physical available bandwidth.

  • I think this is one of thhe most vital information for me.
    And i’m satisfied studying your article. But hould statement on ssome basic things, The site style is wonderful, tthe articles is actually
    great : D. Goood task, cheers

  • i find it helpful to think about the nonlinear sensitivity between these variables. thanks for helping get precise language around it all