[e2e] TCP offload (was part of Protocols breaking the end-to-end argument)
rick jones
perfgeek at mac.com
Mon Oct 26 08:12:58 PDT 2009
On Oct 26, 2009, at 5:53 AM, William Allen Simpson wrote:
> TCP Checksum Offload (shouldn't that be TCO or CSO?)
It wouldn't be the former for it is used for UDP as well. As for CSO
vs CKO, a (stinking?) rose by any other name, I suppose. The feature
got named CKO, and I'm content to leave it as such.
> TCP Segment Offload (TSO) -- large TCP segments are broken into smaller
> ones -- wouldn't be a problem where the stack always feeds the chip
> properly-sized PMTU segments.
I may have misunderstood your wording, but if TCP has already
segmented, there wouldn't be much if any offloading. With TSO, the stack
hands the chip everything it needs to know to make properly-sized
segments on each large send. (IIRC Solaris experimented with Multi Data
Transmit (MDT - a joy of a search term...), where TCP did all the
segmentation and simply passed a list of multiple segments down in one
go (not unlike packet trains in, say, HP-UX 8.07 IP fragmentation, and
elsestacks I suspect), but I don't think that "poor man's TSO" went
very far, even though it could give a little boost over a
non-TSO-capable NIC.) "All" the NICs can do TSO now. (TSO itself is
sometimes referred to as "poor man's Jumbo Frames", and we would
circle back to a de jure MTU that has remained unchanged for
decades....)
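To make the division of labour concrete, here is a minimal Python sketch of what the NIC does with one large send under TSO: cut the payload into MSS-sized pieces and advance the sequence number for each. The name `tso_segment` and the (seq, data) tuple layout are made up for illustration; real hardware also replicates and fixes up the headers and checksums, which is left out here.

```python
def tso_segment(payload: bytes, start_seq: int, mss: int):
    """Split one large send into (seq, data) segments of at most mss bytes."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    segments = []
    for offset in range(0, len(payload), mss):
        chunk = payload[offset:offset + mss]
        # TCP sequence numbers wrap at 2**32.
        segments.append(((start_seq + offset) % 2**32, chunk))
    return segments
```

If TCP had already cut the data down to MSS-sized pieces, each call would return a single segment and there would be nothing left to offload.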
> For routing a LAN jumbogram into a WAN,
> that's broken! The drivers had better be smart enough to honor "Don't
> Fragment" (DF), even though that technically only applies to IP. Best to
> turn it off for all routed packets. Does your implementation?
I do not have my own TCP/IP stack :) I interact to varying degrees
with the stacks of others. Based on that experience, the decision to
do TSO is made on a send-by-send basis. TCP sets up the send to be
either TSO or non-TSO, and the driver does the appropriate thing to the
packet descriptor(s) to inform the NIC. While I cannot say that I've
gone looking for the code in Linux and elsewhere, unless IP tries to set
it up on a routed datagram, and I do not believe it does, TSO will not
be applied as the datagram leaves via the egress interface.
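A rough sketch of that per-send decision, in Python. The function and parameter names are hypothetical and not taken from any real stack; it only models the two points above: TCP decides per send, and a forwarded datagram never passes through the local TCP, so nothing marks it for TSO.

```python
def should_use_tso(send_len, mss, nic_has_tso, locally_originated):
    """Decide, for one send, whether to mark it for TSO (illustrative only)."""
    # Routed datagrams are not set up by the local TCP, and a NIC without
    # TSO cannot segment, so neither case is offloaded in this model.
    if not locally_originated or not nic_has_tso:
        return False
    # Offload only matters when the send spans more than one segment.
    return send_len > mss
```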
> TCP Large Receive Offload (LRO) -- small TCP segments are combined into
> larger ones -- is an unmitigated disaster. The sender has no ability to
> turn it off, and no idea that it's happening. Assuming it leaves
> SYN-bearing segments untouched, I'd still think that breaks almost every
> existing Ack-bearing TCP option.
You must really like the HP-UX and Solaris (and any other
Mentat-derived stack's) ACK avoidance heuristics :) Another example of
customer LAN/MAN needs/desires coming up against what is felt to be
necessary for the big-I Internet. IIRC the Solaris stack does attempt
to make a distinction between local and remote when deciding to (not)
apply the ACK avoidance heuristic. Both have mechanisms to evolve up
to their levels of avoidance and devolve back to the chapter-and-verse
ack-every-other behaviour suggested by the RFCs in the presence of
anomalies. Both can be controlled completely (on, off, degree) by the
system administrator.
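For a feel of the evolve/devolve behaviour, here is a toy Python model. It is emphatically not the HP-UX or Solaris algorithm; the class name, the one-step growth rule, and the `max_interval` knob are all assumptions made for illustration. It starts at ack-every-other, stretches the interval while traffic is clean, and drops straight back on an anomaly.

```python
class AckAvoidance:
    """Toy model: ACK every `interval` segments, backing off on anomalies."""

    BASELINE = 2  # ack-every-other, as the RFCs suggest

    def __init__(self, enabled=True, max_interval=8):
        self.enabled = enabled  # administrator on/off switch
        self.max_interval = max(max_interval, self.BASELINE)  # "degree"
        self.interval = self.BASELINE
        self._unacked = 0

    def on_segment(self, anomaly=False):
        """Return True if an ACK should be sent for this arriving segment."""
        if anomaly:
            # Loss, reordering, etc.: devolve and ACK immediately.
            self.interval = self.BASELINE
            self._unacked = 0
            return True
        self._unacked += 1
        if self._unacked < self.interval:
            return False
        self._unacked = 0
        if self.enabled and self.interval < self.max_interval:
            self.interval += 1  # evolve toward fewer ACKs
        return True
```

With `enabled=False` the model never leaves the ack-every-other baseline.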
> In either of the latter cases, I don't see how PAWS Timestamps or the MD5
> Authentication Option would ever work.
PAWS Timestamps need not (should not?) be unique from segment to
segment, only from window to window or transmission to retransmission,
yes? So, on the sending side, since the host TCP is very much in
control, if a sequence of N segments would have a PAWS timestamp
increment in the middle, TCP can split the large send into two at that
point.
I do not know if GRO (or the card-based LRO) does the opposite on the
way in, but I could easily imagine (though I have not actually seen)
them checking and asking "is this timestamp the same as the previous
one?" when making coalescing decisions.
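Both halves of that idea can be sketched in a few lines of Python. This is a guess at the logic, not a description of what any TSO, GRO, or LRO implementation actually does; the (seq, length, tsval) tuple and both function names are invented for the example.

```python
from itertools import groupby


def split_by_timestamp(segments):
    """Send side: group consecutive (seq, length, tsval) segments by tsval.

    Each group shares one timestamp and could go down as one large send.
    """
    return [list(group) for _, group in groupby(segments, key=lambda s: s[2])]


def can_coalesce(prev, nxt):
    """Receive side: merge only in-order segments with identical timestamps."""
    prev_seq, prev_len, prev_ts = prev
    next_seq, _, next_ts = nxt
    contiguous = (prev_seq + prev_len) % 2**32 == next_seq
    return contiguous and prev_ts == next_ts
```

A run of three segments whose timestamp ticks over before the third would become two large sends on the way out, and the receiver would stop coalescing at the same boundary.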
rick jones
Wisdom teeth are impacted, people are affected by the effects of events