[e2e] Are Packet Trains / Packet Bursts a Problem in TCP?
jheffner at psc.edu
Sat Sep 30 10:10:18 PDT 2006
rick jones wrote:
>> It's obvious that doing hardware TSO for say an 8 K data chunk, i.e. 5
>> segments, provides great savings in terms of CPU cycles and I/O bus
>> utilization spent on transmitting TCP streams. But what is the
>> gain for going up from 8 K / 5 segment TSO to 64 K / 44 segment bursts,
>> knowing that bursts that large clearly could be a double-edged sword?
> One of those wonderful "it depends" sorts of things. If you have
> zero-copy, the diminishing returns are farther out than if there is still
> a non-trivial per-byte cost (e.g. from copying from user to kernel). CKO
> and zero-copy address per-byte costs, TSO the per-packet.
> And as the TSO offload is increased, ACK processing becomes a greater
> and greater percentage of the CPU overhead, which gives rise to
> incentives for ACK avoidance heuristics :)
>> In other words, how large a TSO burst could be considered tolerable or
>> justified - roughly where should we draw the line?
> There is no line - at least not in the sense that we can say
> categorically "N burst good, M burst bad" because it will vary with the
> conditions. I'm content to let the admin have control over the maximum
> size of the TSO offload, the routers drop packets and/or set ECN, and a
> congestion control algorithm to keep things from melting and leave it at
> that.
I like to think of bursts as time rather than packets or bytes. I
believe this to be a relatively scalable way to think about it.
While it's true that there's really no absolute way to know that any
particular burst size is okay or not, I think it's likely the case that
a sub-millisecond burst is okay, and one that lasts tens of milliseconds
is likely not. This gets into the range where buffers in a lot of
production hardware can be overflowed, humans can notice the jitter, and
it's likely to cause significant RTT spikes, especially if a few occur
at the same time.
Especially in the case of TSO, there's close to zero benefit to sending
TSO segments that are longer than 1ms of bottleneck wire time, as the
TCP processing load for that packet rate is going to be well under 0.1%
on a modern machine. Incidentally, I cooked up a patch last week (I
should send it in to netdev soon) to limit TSO segment size on slow connections
based on packet time, not just fraction of the window size.
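
To make the idea of a time-based limit concrete, here is a minimal sketch. It is not the patch mentioned above, only an illustration of capping a TSO burst at a wire-time budget. The function name, the 1460-byte MSS, the 64 KB hardware limit, and the assumption that the bottleneck rate is already known are all hypothetical.

```python
def max_tso_bytes(rate_bps, mss=1460, max_time_us=1000, hw_limit=65536):
    """Largest TSO burst, in whole segments, that fits in max_time_us of
    wire time at rate_bps. All parameters are illustrative assumptions;
    a real stack would have to estimate the bottleneck rate somehow."""
    # Bytes that can be serialized within the time budget (integer math).
    budget = rate_bps * max_time_us // (8 * 1_000_000)
    # Never exceed what the hardware offload accepts.
    budget = min(budget, hw_limit)
    # Round down to whole segments, but always allow at least one.
    segments = max(1, budget // mss)
    return segments * mss


if __name__ == "__main__":
    for rate in (10_000_000, 100_000_000, 1_000_000_000):
        print(rate, max_tso_bytes(rate))
```

Under these assumptions the full 44-segment burst is only reached at GigE; at 100 Mb/s the cap works out to 8 segments, and at 10 Mb/s it degenerates to a single segment.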
At GigE speeds, I haven't seen in practice that a 64k burst (about
0.5ms) is ever a problem.
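
The "about 0.5ms" figure can be checked with a quick calculation. This is just a sketch of the arithmetic: it counts payload bytes only and ignores header and Ethernet framing overhead, and the function name is made up for illustration.

```python
def burst_time_ms(burst_bytes, rate_bps):
    """Wire time of a back-to-back burst, in milliseconds.
    Ignores per-packet header and framing overhead (an assumption)."""
    return burst_bytes * 8 / rate_bps * 1000


if __name__ == "__main__":
    burst = 64 * 1024  # a 64 KB TSO burst
    for name, rate in (("GigE", 1e9), ("100 Mb/s", 100e6), ("10 Mb/s", 10e6)):
        print(f"{name}: {burst_time_ms(burst, rate):.2f} ms")
```

The same 64 KB burst that takes roughly half a millisecond at GigE takes about 5 ms at 100 Mb/s and over 50 ms at 10 Mb/s, which is why thinking in time rather than bytes scales across link speeds.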