[e2e] Open the floodgate - back to 1st principles

Mon Apr 26 08:49:15 PDT 2004

David,
  We might be in "violent agreement" on this point.  There are two 
different issues.  (1) That the large buffers are there and (2) how 
Reno/AIMD uses them.
  And, in a way, it all comes down to your phrase "absorb transient 
shocks".  I think that the TCP experts of the early 1990s were exactly 
asking for a delay-bandwidth product worth of buffer per port in order to 
absorb transient shocks, namely the transient shock designed into AIMD/Reno 
by having it continue to grow its cwnd until something breaks.  Thus they 
fully intended to fill that buffer during the peak phase of the Reno/AIMD 
sawtooth.  And they hoped that Reno/AIMD could recover as the buffer was 
drained.
  I do not recall the details of this idea as it was developed in the early 
1990s and hope a knowledgeable veteran will step up and correct and/or 
complete this story.
  But I'm pretty sure that, even if this idea did work well in the early 
1990s, it no longer works.  On what passes for a high-speed wide-area 
path today, it takes many minutes for Reno/AIMD to recover, and the 
conventional delay-bandwidth product does not suffice.
  And, in the meantime, the designed-in behavior of growing cwnd until 
something breaks does have some negative environmental impact.
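  A back-of-the-envelope check of the "many minutes" figure, as a minimal 
Python sketch.  It assumes the textbook Reno model (cwnd is halved on a 
loss, then grows by one segment per round trip).  The link rates, RTTs, 
1500-byte segment size, and the function name are illustrative 
assumptions, not measurements.

```python
def reno_recovery_time_s(rate_bps, rtt_s, mss_bytes=1500):
    """Seconds for Reno to regrow cwnd from W/2 back to W after one loss.

    Assumes W is the delay-bandwidth product in segments and that cwnd
    grows by one segment per RTT (textbook congestion avoidance).
    """
    window_segments = rate_bps * rtt_s / (mss_bytes * 8)
    rtts_to_recover = window_segments / 2  # one segment regained per RTT
    return rtts_to_recover * rtt_s


# Hypothetical paths: (label, bottleneck rate in bit/s, round-trip time in s)
paths = [
    ("1 Gb/s, 70 ms RTT", 1e9, 0.070),
    ("10 Gb/s, 100 ms RTT", 10e9, 0.100),
]

for label, rate, rtt in paths:
    minutes = reno_recovery_time_s(rate, rtt) / 60
    print(f"{label}: about {minutes:.1f} minutes to recover")
```

  Under those assumptions the first path needs roughly 3.4 minutes and the 
second more than an hour, since recovery time grows with rate times RTT 
squared.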
  Regards,
        -- Guy

--On Sunday, April 25, 2004 23:57:05 -0400 "David P. Reed" 
<dpreed at reed.com> wrote:

> Guy -
>> I am honestly not sure if this rule of thumb is being remembered
>> correctly  or if router designers examine it critically.
>
> I'm not doubting that the extra buffer memory in routers is useful, but
> my point was who decided that it should run filled rather than
> essentially empty?
>
> That's like saying that all capacitors in a circuit should be charged to
> their breakdown voltage, or all highways should be filled with bumper to
> bumper traffic to optimize the morning commute.
>
> The whole point of buffer memory in routers is to absorb transient
> shocks, which can only be done if they are near empty.
>
> At 10 gigabits/second, the United States is about 150 million bits (20
> megabytes) wide.  A fully pipelined path with its bottleneck link being
> 10 gigabits/sec would have no more than about 40 megabytes of buffer
> memory occupied in steady state (2 x the number of bytes in transit),
> divided by the number of routers on the path; of course if you send
> packets that are too large, that increases the amount of buffering needed
> to achieve the bottleneck goodput (why users think that performance
> increases with packet size is an interesting thing).   Any sustained
> "cross traffic" involving a bottleneck link would reduce the memory
> needed for the path, requiring less occupancy to sustain the maximum
> obtainable goodput.  Bursty source traffic would also for the same reason
> reduce the total memory needed to achieve the best achievable goodput.
>
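The arithmetic in the paragraph above can be reproduced with a short 
Python sketch.  The 10 Gb/s rate and the 150 million bits in flight come 
from the text; the names are hypothetical, and decimal megabytes (10^6 
bytes) are an assumption.

```python
LINK_RATE_BPS = 10e9            # bottleneck rate quoted in the text
ONE_WAY_BITS_IN_FLIGHT = 150e6  # "width" of the United States at that rate


def bits_to_megabytes(bits):
    """Convert bits to decimal megabytes (assumed: 1 MB = 10**6 bytes)."""
    return bits / 8 / 1e6


# One-way propagation delay implied by the two quoted figures.
one_way_delay_ms = ONE_WAY_BITS_IN_FLIGHT / LINK_RATE_BPS * 1e3

one_way_mb = bits_to_megabytes(ONE_WAY_BITS_IN_FLIGHT)
round_trip_mb = 2 * one_way_mb  # "2 x the number of bytes in transit"

print(f"implied one-way delay: {one_way_delay_ms:.0f} ms")
print(f"one-way bytes in flight: {one_way_mb:.2f} MB")
print(f"steady-state ceiling:    {round_trip_mb:.2f} MB")
```

This gives 18.75 MB one way and 37.5 MB for the round trip, which matches 
the rounded "20 megabytes" and "about 40 megabytes" figures.
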
> The point of those big buffers is solely to absorb transients (when a
> link breaks briefly, or a burst of "cross traffic" appears and
> disappears), but if the sources don't slow down quickly, all that happens
> is that you've introduced a sustained backlog clog (essentially a
> sustained traffic jam) that grows monotonically as long as the load is
> maintained, because the outflow can not run any faster than its standard
> rate.
>
> The longer it takes for the source to hear about congestion that it can
> help resolve, the more dramatic a slowdown that source will have to do in
> order to prevent massive discarding of traffic in the network.   So
> building up big piles of traffic in network buffers that is not needed to
> achieve full pipelining only argues for AIMD-like full-brakes-on
> responses.
>
> Worse yet, when the buffers are kept full (or become full), a higher and
> higher proportion of "congestion signals" get returned to senders that
> are near the end of their connections, and thus cannot have much of an
> impact on reducing the congestion.   If all I have left to retransmit is
> a few bytes, I can cause very little of the built-up load to go away -
> the connections that sustain the assumed high load will be among the
> newer ones, that are just getting started.   They won't have sent much,
> so there is no reason for them to see any problems that will encourage
> them to start holding back.
>
> The whole point of RED and ECN (which are *brilliant*, elegant concepts)
> is to provide *early* signals that traffic is accumulating in those
> buffers, long before they are full.
>
> And it's also clear from control theory (and has been demonstrated) that
> when the drops (or ECN bits) are used to signal congestion, the right
> packets to drop (or set the ECN bits on) are the ones at the *head* of
> the congested outgoing router queue, not the tail.   That provides a much
> quicker signal of congestion, which leads to a quicker response, and
> smoother control of the application level.   This may seem
> "counterintuitive" but the bug is in the intuition - arising from the
> fact that humans tend to have difficulty thinking at the systems level.
>
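The head-versus-tail point can be illustrated with a deliberately 
simplified Python sketch.  It models a single FIFO queue drained at a 
fixed rate and estimates how long it takes for the gap left by a drop to 
leave the router.  Propagation delay is ignored because it is the same in 
both cases.  The queue length, packet size, link rate, and function name 
are assumptions chosen for illustration.

```python
def signal_delay_s(queue_len_pkts, pkt_bytes, rate_bps, drop_at):
    """Time until the gap left by a dropped packet exits the queue.

    drop_at="tail": the arriving packet is discarded, so the gap sits
                    behind every packet already queued.
    drop_at="head": the packet at the front is discarded, so the gap is
                    exposed as soon as the next packet is transmitted.
    """
    service_time = pkt_bytes * 8 / rate_bps  # time to send one packet
    if drop_at == "head":
        return service_time
    if drop_at == "tail":
        return queue_len_pkts * service_time
    raise ValueError("drop_at must be 'head' or 'tail'")


# Assumed scenario: 1000 queued packets of 1500 bytes on a 1 Gb/s link.
tail = signal_delay_s(1000, 1500, 1e9, "tail")
head = signal_delay_s(1000, 1500, 1e9, "head")
print(f"tail drop: signal leaves the router after {tail * 1e3:.1f} ms")
print(f"head drop: signal leaves the router after {head * 1e6:.1f} us")
```

In this model the tail-drop signal is delayed by the full queueing delay 
(12 ms here), while the head-drop signal leaves after one packet time 
(12 microseconds), so the gap grows with the depth of the standing queue.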