[e2e] Are Packet Trains / Packet Bursts a Problem in TCP?

David P. Reed dpreed at reed.com
Wed Sep 27 13:09:56 PDT 2006

Fred - reasona ble response, but I take issue with your statement that 
the goal is "full use ot the bottleneck without overburdening it".   
Little's Lemma would suggest that idea is unreasonable.

In the OS community, we used to have a rule of thumb that operating 
above 50% of the service rate of almost any multiplexed resource is 
problematic, unless you have a very precise and completely predictable 
and constant load.  And Odlyzko has shown that enterprises are happy to 
run at about 10% capacity in return for lowering other, much more 
significant costs (such as jitter, latency, unreliability).

Perhaps the real message is that obsession with the fantasy that such a 
goal is achievable has led to incredibly silly research efforts like the 
I2 hotrod compettions.

Fred Baker wrote:
> On Sep 26, 2006, at 10:51 AM, Detlef Bosau wrote:
>> But what I´m curious about in all (sender) pacing approaches is: 
>> Where does the pacing rate stem from? Typically, it´s obtained from 
>> one of the numerous TCP formulae (Padhye, Mathis...), but these are 
>> first (admittedly rough) estimates and second need some measurements 
>> to have an initial value set.
>> Moreover, widely deployed pacing would practically replace the 
>> currently used AIMD mechanism for achieving a fair share, because the 
>> currently used AIMD probing leads to "microbursts", which means that 
>> the TCP sender _exceeds_ its fair share of rate for short periods of 
>> time. This doesn´t matter as TCP is not rate controlled.  And the 
>> final rate is corrected by ACK clocking.
>> How does this work with sender pacing, i.e. putting a leaky bucket  
>> or flow shaper or something like that in the TCP sender?
> The big problem in pacing is determining the rate available at the 
> bottleneck.
> Let's assume a truly perfect case - the bottleneck in the network has 
> a total capacity of N bps, and currently there are M sessions using 
> it. The M sessions, by magic, are operating in perfect TDM fashion - 
> as the last bit of one message departs, the first bit of the next 
> message becomes ready to send, and no queue ever forms.
> Is there room for one more bit on the wire? Well, if this was a TDM 
> network, no; something would get dropped. But it's not. So we have a 
> new sender that decides to start a TCP session and sends a SYN. What 
> happens is that his packet collides with some other packet, and a 
> queue one packet deep results. The new session gets very little 
> information about rate out of that - there is clearly capacity for at 
> least one SYN per RTT available, but he gets no idea what he is 
> competing with. But the other M sessions get a teensy bit of 
> information - their Acks get delayed a few nanoseconds, and as a 
> result they send their next data a few nanoseconds later. Their 
> average RTT becomes some function of sizeof(SYN)/M longer, and as a 
> result they each slow down if ever so slightly.
> Having no information about all this, the new session probes further, 
> sending one or more data segments. Let's assume that it sends several 
> (if it doesn't do so in the next RTT, it will do so in some subsequent 
> one), and that the receiver replies with two or more Acks. The sender 
> can infer that the capacity at the bottleneck is no smaller than
>            Bits acknowledged only by second Ack
>      -------------------------------------------------
>      Interval between arrivals of first and second Ack
> If its K segments were transmitted by the bottleneck back to back, 
> this would be a measure of the rate of the bottleneck (Keshev). If 
> packets from some other segments were interleaved, this would give the 
> rate at the bottleneck less the capacity used by those interleaving 
> segments. It won't EXCEED the rate at the bottleneck, but in this 
> example it seriously overestimates the capacity unused by other 
> sessions and therefore available to the new session at the bottleneck.
> Let's assume that it picked some rate and paced traffic to that rate. 
> It could do that calculation on every Ack it received, and feed those 
> into the moving average. There are a couple of possible scenarios, the 
> most benign of which include "the other sessions all slow down so that 
> this traffic all goes through at that rate" and "all other sessions 
> cease, so that the bottleneck is ONLY carrying this session's 
> traffic". In both of these relatively benign cases, the above 
> calculation says that the rate available is the same number. From the 
> sender's perspective, is that because that is the rate available at 
> the bottleneck? Or is it because that is the rate at which it sent its 
> probing traffic? Should it probe at a higher rate? If so, at what rate?
> So the new session has to decide to what extent it believes this new 
> pice of data. Maybe it simply jumps to that new rate. I'll leave the 
> analysis to the reader. Or may it includes that new rate estimate in 
> some form of moving average. If it does things just right, it will 
> ramp up its speed at the rate that the sum  of the competing sessions 
> slow down, and all will be cool. Anyone care to lay odds that it would 
> be "just right"?
> Now, change the scenario. The bottleneck is completely unused. The new 
> session comes up, and is trying to infer what capacity is available, 
> following the procedures it followed to get things "just right". Guess 
> what? It asymptotically approaches fully using the link, but takes one 
> heck of a long time to get there.
> The big problem in pacing is determining the rate available at the 
> bottleneck. And by the way, bottlenecks would have to have heap big 
> magic to operate in anything resembling a TDM fashion.
> Operational experience in the ATM world gives us another side of this. 
> If you give an ATM VC a certain rate and engineer that rate through 
> the network, ATM works a lot like TDM works - it uses the rate 
> authorized, and doesn't over-use it. If you screw up the engineering, 
> ATM acts like it is screwed up.
> Hence, I think that what the research, logic, and operational 
> experience tell us is that while pacing can be useful to break down 
> big bursts, in the end pacing doesn't achieve a what the Internet 
> community has been trying to achieve for a long time, which is full 
> use of the bottleneck without overburdening it. We can achieve 
> engineered use, like ATM, or we can from time to time over-burden a 
> bottleneck, but we can't avoid both.
> So I will argue that using pacing for events, like spreading out the 
> effect of a sequence of lost Acks, can be helpful, but in the end  the 
> procedures we use in New Reno etc are superior in the general case. 
> What we can do to improve those procedures might be related to RCP or 
> to some of the work that is being done on other-than-loss-triggered 
> procedures; we'll see. But pacing is not a silver bullet.

More information about the end2end-interest mailing list