[e2e] Are Packet Trains / Packet Bursts a Problem in TCP?

Fred Baker fred at cisco.com
Thu Sep 28 08:43:43 PDT 2006


That is of course mostly true. I'll quibble on Little's result: it  
says (in the immortal words of the wikipedia) "The average number of  
customers in a stable system (over some time interval) is equal to  
their average arrival rate, multiplied by their average time in the  
system." What you're referring to in this context is the set of  
formulae related to delay, variance, etc., in non-deterministic
statistical systems, each of which is inversely proportional to some
variation on (one minus the utilization), a term that approaches zero
as utilization approaches one. But your observation that those
equations tend to infinity at full utilization is correct.
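
To put numbers on it: here is a deliberately crude sketch, assuming a
plain M/M/1 queue (a gross simplification of any real router), of how
the (one minus the utilization) term drives mean delay and queue depth
toward infinity as utilization approaches 100%:

    # M/M/1 mean time in system: W = 1/(mu - lambda) = (1/mu)/(1 - rho),
    # where rho = lambda/mu is the utilization.
    # Little's result itself is just L = lambda * W.
    service_rate = 10000.0            # packets per second, illustrative
    for utilization in (0.5, 0.8, 0.9, 0.95, 0.99, 0.999):
        arrival_rate = utilization * service_rate
        mean_delay = 1.0 / (service_rate - arrival_rate)   # seconds
        mean_queue = arrival_rate * mean_delay             # L = lambda * W
        print("rho=%.3f  W=%9.3f ms  L=%9.1f packets"
              % (utilization, mean_delay * 1000, mean_queue))

The absolute numbers mean nothing; the shape is the point. Each step
closer to full utilization multiplies both delay and queue depth.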

The point is that while a network operator values headroom in the  
average case, as you aptly note, the network user values response  
time. Now, if he says he wants to look at a ten megabyte file (think  
YouTube) and values response time, his proxy (TCP) needs to maximize  
throughput in order to minimize elapsed time. Hence, TCP values  
throughput, and it does so because its user values response time.  
Take a good look at the various congestion management algorithms used  
in TCPs over the years, and what they have all tried to do is  
maximize throughput, detect the point where utilization at the  
bottleneck approaches 100% (whether by measuring loss resulting from  
going over the top or by measuring the increase in delay that you  
point us to), and then back off a little bit. TCP seeks to maximize  
throughput.
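
In caricature, every one of those algorithms reduces to the same loop.
This sketch is not any particular TCP; the constants are illustrative
(roughly NewReno-like on loss, roughly Vegas-like on delay):

    def update_cwnd(cwnd, loss_detected, rtt_sample, base_rtt,
                    delay_margin=0.05):
        # One round trip's worth of congestion control, in caricature.
        if loss_detected:
            # We went over the top at the bottleneck: back off hard.
            return max(cwnd / 2.0, 1.0)
        if rtt_sample > base_rtt * (1.0 + delay_margin):
            # Queues are building, so the bottleneck is at or near 100%:
            # delay-based variants back off here instead of waiting for loss.
            return max(cwnd - 1.0, 1.0)
        # Otherwise keep probing for more throughput.
        return cwnd + 1.0

Loss-based TCPs take only the first branch; delay-based ones add the
second. Either way, the algorithm is hunting for the knee of the curve
and trying to sit just below it.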

Something else that is very important in this context is the matter  
of time scale. When a network operator measures capacity used, he  
most commonly points to an MRTG trace. MRTG samples SNMP counters  
every 300 seconds and plots the deltas between their values. You no  
doubt recall T-shirts from a decade ago or more that said something  
about "same day service in a nanosecond world". MRTG traces, is as  
useful as they are in monitoring trends, are not very useful in  
telling us about individual TCPs. The comparison is a little like  
putting a bump counter in Times Square, reading it at a random time  
of day once a month, and making deep remarks about the behavior of  
traffic at rush hour. They don't give you that data.
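
An MRTG-style view is really nothing more than this (the traffic
pattern below is invented purely to illustrate the averaging):

    # A 300-second counter delta reports only the average rate over the
    # interval. A link that is idle for 290 seconds and saturated for 10
    # shows up as roughly 3% utilized on the graph.
    link_capacity_bps = 1e9                       # 1 Gb/s, illustrative
    per_second_bits = [0.0] * 290 + [link_capacity_bps] * 10
    counter_delta = sum(per_second_bits)          # what SNMP hands MRTG
    average_bps = counter_delta / 300.0
    print("5-minute average: %4.1f%% of capacity"
          % (100.0 * average_bps / link_capacity_bps))
    print("peak one-second load: %4.1f%% of capacity"
          % (100.0 * max(per_second_bits) / link_capacity_bps))

During those ten seconds every TCP crossing the link saw a full queue,
and the monthly graph will never show it.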

I'd encourage, as an alternative, a look at
http://www.ieee-infocom.org/2004/Papers/37_4.PDF. It examines real
traffic in the Sprint network from a few years ago and analyzes its
behavior, and it makes three important observations:
  - With 90% confidence, POP-POP variation in delay within the network
    is less than one ms.
  - Also with 90% confidence, POP-POP delay variation spikes to ~10 ms
    frequently.
  - On occasion (six times in the study), POP-POP delay varies by as
    much as 100 ms, and in so doing follows a pattern whose
    characteristics are consistent with the intersection of high rate
    data streams, not measurement error or router misbehavior as some
    have suggested.

Note that these are POP-POP, not CPE-CPE; the study says nothing  
about access networks or access links. It talks about the part of the  
network that Sprint engineered.

They don't show what the 10 ms spikes look like, but I will  
conjecture that they are smaller versions of the six samples they did  
display, and similarly suggest the momentary intersection of high  
rate data streams.

What this tells me, coupled with my knowledge of routers and various  
applications, is that even within a large and well engineered ISP  
network (Sprint's is among the best in the world, IMHO) there are  
times when total throughput on a link hits 100% and delay ramps up.  
If they are running delay-sensitive or loss-sensitive services, it  
will be wise on their part to put in simple queuing mechanisms such  
as those suggested in draft-ietf-tsvwg-diffserv-class-aggr-00.txt to  
stabilize the service during such events. The overall effect will be  
negligible, but it will materially help in maintaining the stability  
of routing and of real time services. Simply adding bandwidth helps
immeasurably, but measurement tells me that it is not a final solution.
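
To show how little machinery that takes, here is a toy dequeue
discipline in the spirit of that kind of treatment aggregation. The
three class names and the strict ordering are my illustration, not
what the draft specifies:

    from collections import deque

    # A handful of aggregate queues served in a fixed order: real-time
    # traffic first, then elastic traffic, then everything else.
    queues = {"realtime": deque(), "elastic": deque(), "default": deque()}
    service_order = ["realtime", "elastic", "default"]

    def enqueue(packet, aggregate="default"):
        queues[aggregate].append(packet)

    def dequeue():
        # Strict priority across a few aggregates: during a transient
        # 100%-utilization event the real-time aggregate still sees a
        # short queue, while the elastic traffic absorbs the delay.
        for name in service_order:
            if queues[name]:
                return queues[name].popleft()
        return None

The details matter less than the effect: during delay events like the
ones in the Sprint data, something this simple keeps routing and real
time services stable while TCP sorts itself out.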



On Sep 27, 2006, at 5:45 PM, David P. Reed wrote:

> The point of Little's Lemma is that the tradeoff for using the full  
> bottleneck bandwidth is asymptotically infinite delay, delay  
> variance, and other statistics.
>
> If you value utilization but not response time, of course you can  
> fill the pipes.
> But end users value response time almost always much higher than   
> twice the throughput.
>
> And of course you can give delay-free service to a small percentage  
> of traffic by prioritization, but all that does is make the  
> asymptotic growth of delay and delay variance for the rest of the  
> traffic even worse.
>
>
> Fred Baker wrote:
>>
>> On Sep 27, 2006, at 3:34 PM, Detlef Bosau wrote:
>>
>>> Wouldn't this suggest (and I had a short glance at Fred's answer  
>>> and perhaps he might contradict here) that we intentionally drop  
>>> the goal of achieving full load in favour of a "load-dependent  
>>> ECN" mechanism, i.e. when the load of a link exceeds a certain  
>>> limit, say 50% of the bandwidth, any passing packets are stamped  
>>> with a forward congestion notification? Thus, we would keep the  
>>> throughput at a limit we cannot exceed anyway, but limit the  
>>> incoming traffic so that queues can fulfill their purpose, i.e.  
>>> interleave the flows and buffer out asynchronous traffic.
>>
>> I certainly encouraged Sally et al to publish RFC 3168, and yes I  
>> would agree that something other than a loss-triggered approach  
>> has real value, especially at STM-n speeds where the difference  
>> between "nominal delay due to queuing" and "loss" is pretty sharp.  
>> I don't think I would pick "50%"; it would be at a higher rate.
>>
>> But that actually says about the same thing. Change the mechanism  
>> for detecting when the application isn't going to get a whole lot  
>> more bandwidth even if it jumps the window up by an order of  
>> magnitude, but allow it to maximize throughput and minimize loss  
>> in a way that is responsive to signals from the network.
>>


