[e2e] congestion collapse definition

David P. Reed dpreed at reed.com
Tue Sep 8 15:05:05 PDT 2009


Folks, I tend not to use the term "congestion collapse", though it is in 
common use in the Internet community.

The phenomenon I've been describing in the other thread on this list 
about AT&T's 3G data access network configuration should, if I'm correct 
(as I'm pretty sure I am), probably be called "congestion collapse", or 
else we need a new term.

The phenomenon observed in Comcast's debacle with DOCSIS upstream 
buffering should be called by the same term - again, buffering is 
allowed to build on a shared queue carrying diverse traffic, without 
providing any feedback that can be recognized by TCP's rate control 
loop, leading to positive feedback and uncontrolled delay.

If I look at Wikipedia, for example, the definition of congestion 
collapse there says that it is characterized by large buffering 
delays AND lost packets.  However, in the Comcast and AT&T cases here, 
the queues get so obnoxiously long (5-10 seconds of delay) that users 
presumably give up running apps long before packet *loss* sets in due 
to overflow.
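
To put rough numbers on the 5-10 second figure: in a FIFO queue, the 
delay a newly arriving packet sees is the number of bytes queued ahead 
of it divided by the rate at which the link drains.  The sketch below 
just works that arithmetic.  The 1 Mbit/s bottleneck rate is an assumed 
value chosen for illustration, not a measured figure for either network, 
and the function names are hypothetical.

```python
def queueing_delay_seconds(queued_bytes: float, drain_rate_bps: float) -> float:
    """Delay seen by a packet arriving behind `queued_bytes` in a FIFO queue."""
    return queued_bytes * 8 / drain_rate_bps


def buffer_bytes_for_delay(delay_seconds: float, drain_rate_bps: float) -> float:
    """Bytes that must be sitting in the queue to produce `delay_seconds`."""
    return delay_seconds * drain_rate_bps / 8


# ASSUMPTION: a 1 Mbit/s bottleneck link, picked only to make the numbers concrete.
ASSUMED_LINK_BPS = 1_000_000

if __name__ == "__main__":
    for delay in (5, 10):
        needed = buffer_bytes_for_delay(delay, ASSUMED_LINK_BPS)
        print(f"{delay} s of delay needs about {needed / 1000:.0f} kB queued")
```

Under that assumed rate, seconds of delay correspond to hundreds of 
kilobytes or more of standing queue, all of which can accumulate without 
a single packet being dropped as long as the buffer is at least that big.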

This appears to be because all the TCP stacks are doing their job: new 
connections slow-start, then additive increase (AI) ramps up at a rate 
that is gradual enough (and over short enough connections) that the huge 
buffers can stabilize at the point where human pain is the congestion 
control algorithm.
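
A toy model can illustrate why delay shows up long before loss when the 
buffer is huge.  What follows is a minimal sketch, not a model of either 
network: it assumes a single long-lived flow (the real case involves many 
short ones), a fluid approximation in which any window beyond the 
bandwidth-delay product sits in the bottleneck queue, slow start up to 
the point the pipe is full, and then one packet of additive increase per 
round trip.  All names and parameter values are hypothetical.

```python
def simulate_window_growth(capacity_pps, base_rtt_s, buffer_pkts, max_time_s):
    """Grow one sender's window until the buffer overflows or time runs out.

    Returns the time of the first overflow loss (None if there was none)
    and the queueing delay at the moment the simulation stopped.
    """
    bdp = capacity_pps * base_rtt_s      # packets that fit "in the pipe"
    cwnd = 1.0                           # congestion window, in packets
    t = 0.0
    while t < max_time_s:
        queue = max(0.0, cwnd - bdp)     # excess window waits in the buffer
        if queue > buffer_pkts:          # overflow: first loss-based signal
            return {"loss_time_s": t,
                    "queue_delay_s": buffer_pkts / capacity_pps}
        rtt = base_rtt_s + queue / capacity_pps   # RTT inflates with the queue
        t += rtt
        # ASSUMPTION: slow start (doubling) until the pipe is full, then +1 per RTT.
        cwnd = cwnd * 2 if cwnd < bdp else cwnd + 1
    queue = min(max(0.0, cwnd - bdp), buffer_pkts)
    return {"loss_time_s": None, "queue_delay_s": queue / capacity_pps}


if __name__ == "__main__":
    # Hypothetical link: 100 packets/s, 100 ms base RTT, observed for 10 minutes.
    for buffer_pkts in (10, 1000):
        result = simulate_window_growth(100, 0.1, buffer_pkts, 600)
        print(buffer_pkts, result)
```

With the small buffer the sender hits overflow almost immediately and 
gets the loss signal it is designed to react to.  With the large buffer 
the same sender runs for the whole ten minutes without a loss while the 
queueing delay climbs into seconds, which is the regime where the user, 
not TCP, ends up doing the backing off.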

Human pain was the load control algorithm in early overloaded 
time-sharing systems.  On the original Multics system, people realized 
that in the middle of the day it was *foolish* to start a program that 
ran more than one second, because priority given to line editors over 
compute jobs meant that compute jobs would NEVER complete (unless one 
did an obscure thing called "quit-starting" the program to interact once 
a second by stopping and restarting the compile - some hackers rigged up 
terminals to automatically send interrupt/restart commands once per 
second to get their work done, but the rest of us coders worked mostly 
between 11 pm and 6 am).

Of course, another part of fixing AT&T's problem is to fix the 
*upstream* capacity of the network.  The bottleneck wouldn't occur if 
the output queue of the bottleneck router could drain as fast as users 
can generate demand.


Back to my question: should this phenomenon be included in "congestion 
collapse" (I believe so), or should we invent a new, more specific name 
("Buffer Madness", perhaps)?