[e2e] Protocols breaking the end-to-end argument

Sat Oct 24 09:24:23 PDT 2009

On Oct 24, 2009, at 5:06 AM, William Allen Simpson wrote:

> rick jones wrote:
>> Perhaps he is referring to chips which provide TCP/Transport  
>> Segmentation Offload - aka TSO - the functionality that allows the  
>> stack to hand the chip a chunk of data > the MTU, along with the  
>> initial TCP/IP headers and the connection's on-the-wire MSS, and  
>> then have the chip otherwise statelessly segment that larger chunk  
>> of data into MSS-sized segments for transmission on the wire/fibre/ 
>> etc.
> It is indeed.  Since the hardware driver is unaware of many things,
> such as path MTU, this is one of its serious impediments.

WRT PathMTU, the implementations with which I am familiar have the  
stack telling the NIC the on-the-wire size (what I tend to call the  
effective MSS) to use on each "large send" where that effective MSS is  
updated based on PathMTU information as/if it arrives.
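
To make that division of labour concrete, here is a deliberately simplified model of a single "large send". The stack supplies a header template, the payload, and an effective MSS it recomputes from PathMTU. The NIC only replicates the header and advances the sequence number. The names (`HeaderTemplate`, `effective_mss`, `tso_segment`) are hypothetical, headers are assumed to be option-free IPv4/TCP (20 bytes each), and real NICs also fix up checksums, IP IDs and flags, which this sketch omits.

```python
from dataclasses import dataclass, replace

IPV4_HDR = 20  # assumption: IPv4 header without options
TCP_HDR = 20   # assumption: TCP header without options


@dataclass(frozen=True)
class HeaderTemplate:
    """Hypothetical, simplified stand-in for the headers the stack hands the NIC."""
    seq: int               # sequence number of the first payload byte
    ip_total_len: int = 0  # filled in per segment by the "NIC"


def effective_mss(path_mtu: int) -> int:
    """On-the-wire payload bytes per segment, derived by the stack from PathMTU."""
    return path_mtu - IPV4_HDR - TCP_HDR


def tso_segment(template: HeaderTemplate, payload: bytes, mss: int):
    """Statelessly cut one large send into MSS-sized segments.

    Each segment gets a copy of the template header with the sequence
    number advanced by the bytes already emitted and the IP length fixed up.
    """
    segments = []
    for offset in range(0, len(payload), mss):
        chunk = payload[offset:offset + mss]
        hdr = replace(template,
                      seq=(template.seq + offset) % 2**32,
                      ip_total_len=IPV4_HDR + TCP_HDR + len(chunk))
        segments.append((hdr, chunk))
    return segments


segs = tso_segment(HeaderTemplate(seq=1000), bytes(64000), effective_mss(1500))
```

A 64000-byte send with a 1500-byte PathMTU becomes 44 segments of up to 1460 bytes. If PathMTU information later drops the path to 1400 bytes, the stack simply passes `effective_mss(1400)` (1360) on the next large send; the NIC itself never needs to know about PathMTU.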

> Sure, there are measurements that show several percentage points less
> CPU, but in most cases we're not CPU bound.  I'm not sure what problem
> it's solving, other than a checkbox to differentiate commodity  
> products.

When the functionality was introduced in 1GbE NICs, it was to allow  
them to be driven at link-rate with the then-contemporary CPUs, not  
only for easily dismissed (well, not IMO :) things like netperf  
TCP_STREAM, but also for things customers actually did, like file  
transfers, clustered database traffic, etc.  (I.e., if you can't get  
there with netperf, you ain't going to get there with FTP.)

Now, this may be a place where my world starts to diverge from the  
rest of the end2end community's - indeed many of my employer's  
customers do things across the big-I Internet, but they do far more  
across their corporate LANs and intranets.  I can see where being CPU  
bound talking across the big-I Internet is perhaps rare, but being CPU  
bound when talking across the corporate 1 Gig LAN was not rare.  And  
essentially we have One Protocol to Rule Them All...

Yes, CPUs today are "faster" than at the dawn of 1 Gig Ethernet.  We  
are also at the dawn (perhaps a little past, depends I suppose on  
one's deployment longitude) of 10 Gig Ethernet.  Bless their hearts,  
when a customer upgrades their network from one speed to the next,  
they care little about Amdahl's Law etc. and get quite agitated when  
one cannot achieve link-rate on the next higher speed.  Well, they  
might give you a generation's worth of leeway, but by the time the  
second generation of the NIC arrives, their expectations are pretty  
firm.  If your solution cannot achieve link-rate, your solution is not  
selected.

TSO and GRO, like Jumbo Frames, can be thought of as the inevitable  
"inter-reaction" between customer expectations and a de jure network  
MTU size that has remained unchanged since the dawn of Ethernet.  Or,  
put another way, we have begun treating the Ethernet MTU as damaged  
and routed around it.
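
Some back-of-the-envelope arithmetic shows why a fixed MTU turns into a per-packet CPU problem as link speed rises. This sketch assumes standard Ethernet framing overhead of 38 bytes per frame beyond the IP MTU (14-byte MAC header, 4-byte FCS, 8-byte preamble/SFD, 12-byte inter-frame gap) and counts only full-size frames. The function name is illustrative.

```python
# Per-frame Ethernet overhead beyond the IP MTU, in bytes:
# 14 (MAC header) + 4 (FCS) + 8 (preamble + SFD) + 12 (inter-frame gap)
ETH_OVERHEAD = 14 + 4 + 8 + 12


def frames_per_second(link_bps: float, mtu: int) -> float:
    """Maximum full-size frames per second a link can carry at a given IP MTU."""
    return link_bps / ((mtu + ETH_OVERHEAD) * 8)


for link in (1e9, 10e9):
    for mtu in (1500, 9000):
        rate = frames_per_second(link, mtu)
        print(f"{link / 1e9:>4.0f} Gbit/s, MTU {mtu}: {rate:>9,.0f} frames/s")
```

With a 1500-byte MTU that is roughly 81,000 frames per second at 1 Gbit/s and about 813,000 at 10 Gbit/s, per direction. Jumbo frames cut the 10 Gbit/s figure to about 138,000, while TSO and GRO leave the wire format alone and instead reduce how many of those frames the host stack has to touch individually.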

>> And if that upsets him, we better not tell him about the 10G NICs  
>> also doing receive offload... :)
> I'd heard of it, but thought that was pretty uniformly rejected.
> Heck, the most basic TCP decision points would be impossible to
> implement, revise, or test.

"LRO" (multiple segment coalescing done in the chip and an uber frame  
hitting the host with the intermediate headers stripped) has been  
rejected in Linux-land in favor of GRO, which preserves the arriving  
segment boundaries via some clever linking of buffers (and perhaps  
some header-data split but I'm fuzzy there).
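
A toy model of the difference described above, assuming contiguous, in-order segments: both approaches hand the local stack the same bytes, but only the GRO-style result still knows where the original segments began and ended. The class and function names are hypothetical, and this is not how either the NIC firmware or the Linux kernel actually represents these buffers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Segment:
    seq: int        # TCP sequence number of the first payload byte
    payload: bytes


def _check_contiguous(segs: List[Segment]) -> None:
    """This sketch only merges a non-empty run of back-to-back segments."""
    if not segs:
        raise ValueError("nothing to merge")
    for a, b in zip(segs, segs[1:]):
        if a.seq + len(a.payload) != b.seq:
            raise ValueError("segments are not contiguous and in order")


@dataclass(frozen=True)
class LroFrame:
    """LRO-style 'uber frame': one header, one payload, boundaries discarded."""
    seq: int
    payload: bytes

    def original_segments(self) -> Optional[List[Segment]]:
        return None  # intermediate headers were stripped; boundaries are lost


@dataclass(frozen=True)
class GroFrame:
    """GRO-style merge: one logical packet that still links the arriving segments."""
    segments: tuple

    @property
    def payload(self) -> bytes:
        return b"".join(s.payload for s in self.segments)

    def original_segments(self) -> Optional[List[Segment]]:
        return list(self.segments)  # a forwarding path can re-emit these as-is


def lro_merge(segs: List[Segment]) -> LroFrame:
    _check_contiguous(segs)
    return LroFrame(segs[0].seq, b"".join(s.payload for s in segs))


def gro_merge(segs: List[Segment]) -> GroFrame:
    _check_contiguous(segs)
    return GroFrame(tuple(segs))


segs = [Segment(0, b"a" * 1448), Segment(1448, b"b" * 1448), Segment(2896, b"c" * 100)]
lro, gro = lro_merge(segs), gro_merge(segs)
```

For a host that terminates the connection the two are equivalent; the difference only shows up when the merged packet must be forwarded or bridged, because only the GRO-style result can be split back into exactly what arrived.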

>> BTW, I do not believe that any router actually has TSO happen to  
>> TCP segments contained within the IP datagrams passing through it -  
>> although
>
> Only recently trying to decipher the Linux stack, but it all appears
> to go through the same queue, routed packets included.  If the box
> receives a jumbogram on one interface, it can be re-segmented out
> another, and I've not found any support for PMTUD or ECN or anything.
>
>
>> there have been issues in Linux with LRO (Large Receive Offload,  
>> distinct from Generic Receive Offload) when the system was acting  
>> as either a router or a bridge - because TSO doesn't happen in that  
>> path :)
> Again, I'm not as familiar with Linux-only terminology.  A quick
> Google turns up "Generic Receive Offload", and that appears to be
> explicitly designed to merge segments in routers, and re-segment out
> the other side:
>
>  http://lwn.net/Articles/311357/
>
> I'm pretty sure this is contrary to the end-to-end [argument,
> principle, what-have-you]....

You are supposed to be ignoring the code-path behind the curtain :)

rick jones

http://homepage.mac.com/perfgeek