[e2e] TCP offload (was part of Protocols breaking the end-to-end argument)

rick jones perfgeek at mac.com
Mon Oct 26 08:12:58 PDT 2009

On Oct 26, 2009, at 5:53 AM, William Allen Simpson wrote:
> TCP Checksum Offload (shouldn't that be TCO or CSO?)

It wouldn't be the former for it is used for UDP as well. As for CSO  
vs CKO, a (stinking?) rose by any other name I suppose.  The feature  
got named CKO, I'm content to leave it as such.

> TCP Segment Offload (TSO) -- large TCP segments are broken into smaller
> ones -- wouldn't be a problem where the stack always feeds the chip
> properly-sized PMTU segments.

I may have misunderstood your wording, but if TCP has already  
segmented, there wouldn't be much if any offloading.  The stack hands  
the chip everything it needs to know to make properly-sized segments  
on each large send. (IIRC Solaris experimented with Multi Data  
Transmit (MDT - a joy of a search term...) where they did have TCP do  
all the segmentation and all it did was pass a list of multiple  
segments in one go (not unlike packet trains in say HP-UX 8.07 IP  
fragmentation and elsestacks I suspect) but that "poor man's TSO" I  
don't think went very far even though it could give a little boost  
over a non-TSO-capable NIC.  "All" the NICs can do TSO now. (TSO  
itself is sometimes referred to as "poor man's Jumbo Frames" and we  
would circle-back to a de jure MTU that has remained unchanged for  

> For routing a LAN jumbogram into a WAN,
> that's broken!  The drivers had better be smart enough to honor "Don't
> Fragment" (DF), even though that technically only applies to IP.  Best to
> turn it off for all routed packets.  Does your implementation?

I do not have my own TCP/IP stack :)  I interact to varying degrees
with the stacks of others.  Based on that experience, the decision to
do TSO is made on a send-by-send basis.  TCP sets up the send to be
either TSO or non-TSO, and the driver does the appropriate thing to the
packet descriptor(s) to inform the NIC.  While I cannot say that I've
gone looking for the code in Linux and elsewhere, unless IP tries to
set it up on a routed datagram, and I do not believe it does, TSO will
not be applied as the datagram leaves via the egress interface.
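
A minimal sketch of the division of labor described above, assuming a
simplified model in which a "send" is just a starting sequence number,
a payload, and an MSS.  The names Segment, tso_segment and send are
hypothetical and do not come from any particular stack or driver; the
point is only that the segments on the wire are the same either way,
while the number of descriptors the host prepares differs per send.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    seq: int        # sequence number of the first payload byte
    payload: bytes


def tso_segment(start_seq: int, data: bytes, mss: int) -> List[Segment]:
    """Cut one large send into MSS-sized segments (what a TSO NIC does)."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    segments = []
    for offset in range(0, len(data), mss):
        chunk = data[offset:offset + mss]
        # TCP sequence numbers wrap modulo 2**32.
        segments.append(Segment((start_seq + offset) % 2**32, chunk))
    return segments


def send(start_seq: int, data: bytes, mss: int,
         use_tso: bool) -> Tuple[int, List[Segment]]:
    """Return (descriptors prepared by the host, segments on the wire).

    use_tso is decided per send (hypothetical flag): with TSO the host
    hands over one large descriptor and the NIC segments; without it
    the host prepares one descriptor per segment.
    """
    wire = tso_segment(start_seq, data, mss)
    if not wire:
        return 0, wire
    descriptors = 1 if use_tso else len(wire)
    return descriptors, wire
```

For example, a 4000-byte send with an MSS of 1460 yields three segments
in both modes, but one descriptor with TSO and three without.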

> TCP Large Receive Offload (LRO) -- small TCP segments are combined into
> larger ones -- is an unmitigated disaster.  The sender has no ability to
> turn it off, and no idea that it's happening.  Assuming it leaves
> SYN-bearing segments untouched, I'd still think that breaks almost every
> existing Ack-bearing TCP option.

You must really like the HP-UX and Solaris (and any other
Mentat-derived stack's) ACK avoidance heuristics :)  Another example of
customer LAN/MAN needs/desires coming up against what is felt to be
necessary for the big-I Internet.  IIRC the Solaris stack does attempt
to make a distinction between local and remote when deciding to (not)  
apply the ACK avoidance heuristic.  Both have mechanisms to evolve up  
to their levels of avoidance and devolve back to the chapter-and-verse  
ack-every-other behaviour suggested by the RFCs in the presence of  
anomalies.  Both can be controlled completely (on, off, degree) by the  
system administrator.
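
The evolve/devolve behaviour can be illustrated with a toy model.  This
is only a sketch under stated assumptions: the class AckAvoidance, its
parameters, and the "grow by one after each clean ACK" rule are
hypothetical and are not taken from HP-UX, Solaris, or any Mentat code.
It shows a receiver starting at ack-every-other, stretching the
interval while nothing goes wrong, and falling straight back on an
anomaly, with administrator knobs for on/off and degree.

```python
class AckAvoidance:
    """Toy ACK-avoidance heuristic (hypothetical, for illustration)."""

    BASELINE = 2  # ack every other segment

    def __init__(self, enabled: bool = True,
                 max_segments_per_ack: int = 8) -> None:
        # Administrator controls: on/off and the maximum degree.
        self.enabled = enabled
        self.max_segments_per_ack = max(self.BASELINE, max_segments_per_ack)
        self.segments_per_ack = self.BASELINE
        self._unacked = 0

    def on_segment(self, anomaly: bool = False) -> bool:
        """Process one arriving segment; return True if an ACK goes out."""
        if anomaly:
            # Devolve: e.g. out-of-order data.  ACK now, reset to baseline.
            self.segments_per_ack = self.BASELINE
            self._unacked = 0
            return True
        self._unacked += 1
        if self._unacked < self.segments_per_ack:
            return False
        self._unacked = 0
        # Evolve: stretch the interval after each clean ACK, up to the cap.
        if self.enabled and self.segments_per_ack < self.max_segments_per_ack:
            self.segments_per_ack += 1
        return True
```

With the heuristic disabled the model simply acknowledges every second
segment; enabled, the gap between ACKs widens until the configured cap
is reached or an anomaly resets it.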

> In either of the latter cases, I don't see how PAWS Timestamps or the MD5
> Authentication Option would ever work.

PAWS Timestamps need-not (should not?) be unique from segment to  
segment, only from window to window or transmission to retransmission  
yes?  So, on the sending side, since the host TCP is very much in  
control, if a sequence of N segments would have a PAWS increment in  
the middle, TCP can split the large send into two at that point.
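
A minimal sketch of that split, assuming the host can tell which
timestamp value each prospective segment would carry.  The function
split_large_send and its input (one TSval per segment) are hypothetical
and chosen only for illustration; each returned run is a candidate
large send whose segments all share one timestamp.

```python
from itertools import groupby
from typing import List, Tuple


def split_large_send(tsvals: List[int]) -> List[Tuple[int, int, int]]:
    """Group consecutive segments that would carry the same timestamp.

    tsvals[i] is the timestamp value segment i would carry.  Returns a
    list of (first_segment_index, segment_count, tsval) tuples, one per
    large send to hand to the NIC.
    """
    runs = []
    index = 0
    for tsval, group in groupby(tsvals):
        count = sum(1 for _ in group)
        runs.append((index, count, tsval))
        index += count
    return runs
```

For instance, five segments where the timestamp ticks after the third
become two large sends: one of three segments and one of two.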

I do not know if GRO (or the card-based LRO) does the opposite on the
way in, but I could easily see them (though I have not actually
looked) checking and asking "is this timestamp the same as the
previous?" when making coalescing decisions.
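
Such a check might look like the following sketch.  It is not a
description of what Linux GRO or any NIC's LRO actually does; RxSegment,
can_coalesce and coalesce are hypothetical names, and the only rules
modeled are "contiguous in sequence space" and "same timestamp as the
previous segment".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RxSegment:
    seq: int                 # sequence number of the first payload byte
    payload: bytes
    tsval: Optional[int]     # timestamp option value, None if absent


def can_coalesce(prev: RxSegment, nxt: RxSegment) -> bool:
    """Merge only in-order segments carrying the same timestamp."""
    contiguous = (prev.seq + len(prev.payload)) % 2**32 == nxt.seq
    return contiguous and prev.tsval == nxt.tsval


def coalesce(segments: List[RxSegment]) -> List[RxSegment]:
    """Combine runs of mergeable segments into larger ones."""
    merged: List[RxSegment] = []
    for seg in segments:
        if merged and can_coalesce(merged[-1], seg):
            last = merged[-1]
            merged[-1] = RxSegment(last.seq, last.payload + seg.payload,
                                   last.tsval)
        else:
            # Copy so the caller's segments are never modified.
            merged.append(RxSegment(seg.seq, seg.payload, seg.tsval))
    return merged
```

Under these rules a timestamp change ends the current merged segment,
so the stack above still sees every distinct timestamp value in order.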

rick jones
Wisdom teeth are impacted, people are affected by the effects of events
