[e2e] Why do we need TCP flow control (rwnd)?

Tue Jul 1 12:36:06 PDT 2008

I think of this kind of stuff as *real* network research, Fred.   Good 
stuff.

Fred Baker wrote:
> On Jul 1, 2008, at 9:03 AM, David P. Reed wrote:
>> From studying actual congestive collapses, one can figure out how to 
>> prevent them.
>
> OK, glad to hear it. I apologize for the form of the data I will 
> offer; it is of the rawest type. But you may find it illuminating. 
> Before I start, please understand that the network I am about to 
> discuss had a very serious problem at one time and has fixed it since. 
> So while the charts are a great example of a bad issue, this 
> discussion should not reflect negatively on the present network in 
> question.
>
> The scenario is a network in Africa that connected at the time to the 
> great wide world via VSAT. It is a university, and at the time had 
> O(20K) students behind a pair of links from two companies, one at 512 
> KBPS and one at 1 MBPS. I was there in 2004 and had a file (my annual 
> performance review) that I needed to upload. The process took all day 
> and even at the end of it failed to complete. I wondered why, whipped 
> out that great little bit of Shareware named PingPlotter (which I 
> found very useful back when I ran on a Windows system) and took this 
> picture:
>
>     ftp://ftpeng.cisco.com/fred/collapse/ams3-dmzbb-gw1.cisco.com.gif
>
> The second column from the left is the loss rate; as you can see, it 
> was between 30 and 40%. The little red lines across the bottom mark 
> individual losses, and show that they were ongoing.
>
> Based on this data, I convinced the school to increase its bandwidth 
> by an order of magnitude. It had the same two links, but now at 5 and 
> 10 MBPS. Six months later I repeated the experiment but from the other 
> end, this time not using PingPlotter because I had changed computers 
> and PingPlotter doesn't run on my Mac (wah!). The difference between
> the raw file and the "edited" file, both linked below, is 22 data
> points that were clear outliers. In this experiment I ran
> simultaneous pings to the system's two addresses, and in so doing
> measured the ping RTT on both the 5 and 10 MBPS paths. You will see
> very clearly that the 10 MBPS path was not overloaded and didn't
> experience significant queuing delays (although the satellite delay
> is pretty obvious), while the 5 MBPS path was heavily loaded
> throughout the day and many samples were in the 2000 ms ballpark.
>
>     ftp://ftpeng.cisco.com/fred/collapse/Makerere-April-4-2005-edited.pdf
>     ftp://ftpeng.cisco.com/fred/collapse/Makerere-April-4-2005.pdf
>
> The delay distribution, for all that high delay on the 5 MBPS path, is 
> surprisingly similar to what one finds on any other link. Visually, it 
> could be confused with a Poisson distribution.
>
>     ftp://ftpeng.cisco.com/fred/collapse/Makerere-April-4-2005-delay-distribution.pdf
>
> Looking at it in log-linear, however, the difference between the two 
> links becomes pretty obvious. The overprovisioned link looks pretty 
> normal, but the saturated link has a clear bimodal behavior. When it's 
> not all that busy, delays are nominal, but it has a high density 
> around 2000 ms RTT and a scattering in between. When it is saturated - 
> which it is much of the day - TCP is driving to the cliff, and the 
> link's timing reflects the fact.
>
>     ftp://ftpeng.cisco.com/fred/collapse/Makerere-April-4-2005-log-linear-delay-distribution.pdf
>
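
For anyone wanting to repeat the log-linear view on their own ping data, 
here is a minimal sketch. It assumes "log-linear" means a log-scaled 
count axis against a linear delay axis, which is a guess about the 
chart and not something the message states. The function names, the 
100 ms bin width, and the sample values are all made up for 
illustration; they are not the Makerere measurements.

```python
import math
from collections import Counter


def linear_histogram(rtts_ms, bin_width_ms=100):
    """Count RTT samples in fixed-width bins, keyed by each bin's lower edge."""
    counts = Counter()
    for rtt in rtts_ms:
        counts[int(rtt // bin_width_ms) * bin_width_ms] += 1
    return dict(sorted(counts.items()))


def log_counts(histogram):
    """Map each bin to log10(count), the value a log-scaled count axis plots."""
    return {edge: math.log10(n) for edge, n in histogram.items()}


# Hypothetical samples: a dominant mode near the satellite RTT, a thin
# scattering in between, and a much smaller second mode near 2000 ms.
samples = [620] * 9000 + [1100] * 3 + [1500] * 3 + [2000] * 40

hist = linear_histogram(samples)
for edge, value in log_counts(hist).items():
    print(f"{edge:5d} ms  count={hist[edge]:5d}  log10(count)={value:.2f}")
```

On a linear count axis a bin of 40 next to a bin of 9000 all but 
disappears; on the log axis the two differ by a little over two 
decades, so a second mode of that size becomes easy to see.
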
> A sample space of one is an example, not a study - data, not 
> information. But I think the example, coupled with our knowledge of 
> queuing theory and general experience, supports four comments:
>
> (1) there ain't nothin' quite like having enough bandwidth. If the 
> offered load vastly exceeds capacity, nobody gets anything done. This 
> is the classic congestive collapse scenario as predicted in RFCs 896 
> and 970. A review of Nagle's game theory discussion in 970 is 
> illuminating.
>
> (2) there ain't nothin' quite like having enough bandwidth. In a 
> statistical network, if the offered load approximates capacity, delay 
> is maximized, and loss (which is the extreme case of delay) erodes the 
> network's effectiveness.
>
> (3) TCP's congestion control algorithms seek to maximize throughput, 
> but will work with whatever capacity they find available. If a link is 
> in a congestive collapse scenario, increasing capacity by an order of 
> magnitude results in TCP being released to increase its windows and, 
> through the "fast retransmit" heuristic, recover from occasional 
> losses in stride. It will do so, and the result will be to use the 
> available capacity regardless of what it is.
>
> (4) congestion control algorithms that tune to the cliff obtain no 
> better throughput than algorithms that tune to the knee. That is by 
> definition: both the knee and the cliff maximize throughput, but the 
> cliff also maximizes queue depth at the bottleneck. Hence, algorithms 
> that tune to the knee are no worse for the individual end system, but 
> better for the network and the aggregate of its users. The difference 
> between a link that is overprovisioned and one on which offered load 
> approximates capacity is that on one TCP moves data freely while on 
> the other TCP works around the fragility in the network to provide 
> adequate service in the face of performance issues.
>
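
Comment (2), that delay climbs steeply as offered load approaches 
capacity, can be put in rough numbers with the textbook M/M/1 queue, 
where mean time in the system is S / (1 - rho) for mean service time S 
and utilization rho. This is only a sketch: the M/M/1 model, the 
function name, the 1500-byte packet size, and reading "512 KBPS" as 
512,000 bits per second are all assumptions, not anything taken from 
the measurements above. A real link has a finite buffer, so past some 
point the growth in delay turns into loss.

```python
def mm1_mean_delay(service_time_s, utilization):
    """Mean time in system (queueing plus service) for an M/M/1 queue.

    service_time_s: mean time to transmit one packet, in seconds.
    utilization:    offered load divided by capacity, in [0, 1).
    """
    if not 0 <= utilization < 1:
        raise ValueError("M/M/1 has a steady state only for 0 <= utilization < 1")
    return service_time_s / (1.0 - utilization)


# Assumed figures: a 1500-byte packet on a 512,000 bit/s link.
service_time = 1500 * 8 / 512_000  # seconds to serialize one packet

for rho in (0.5, 0.8, 0.9, 0.99):
    delay_ms = mm1_mean_delay(service_time, rho) * 1000
    print(f"utilization {rho:.2f}: mean delay {delay_ms:.1f} ms")
```

The model multiplies the per-packet time by 2 at 50% load, by 10 at 90% 
and by 100 at 99%, which is the shape of the argument for having 
enough bandwidth.
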