[e2e] Is a non-TCP solution dead?

Cannara cannara at attglobal.net
Thu Apr 3 09:43:19 PST 2003


Jonathan, no, I'm not referring to what you term RPC/CIFS.  That behavior
would clearly be inefficient whatever the transport.  I'm referring to an
ordinary file transfer, say from an NT server to a workstation, where TCP/IP
is the installed stack.  This has been a common config ever since Microsoft
began shipping TCP/IP.  The same behavior can be seen with other NOS
transactions, such as large SQL exchanges.  On NFS, you must have
misinterpreted what I meant -- NFS uses block exchanges, as does SMB.  Both
can also, of course, have simple, 2-packet exchanges.

On your driver topic, TCP behavior has nothing to do with a LAN interface's
ability to DMA.  The issue at hand, in terms of poor block performance, was
that the TCPs at either end of a 100 ms link were unable to sustain, even
within a factor of 20, the same flow with 0.4% loss that they could with 0%.
This was because of how the loss was handled, and because each block
requested for TCP transfer was put into an odd number of TCP 'segments',
thus engaging the delayed-ACK timer.
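
To make the segment arithmetic concrete, here is a rough sketch of the
effect -- my own toy numbers (a 1460-byte Ethernet MSS and the common
200 ms delayed-ACK timer), not figures taken from the traces:

# Sketch: why an odd segment count per block stalls on the delayed-ACK
# timer.  Assumed values only -- 1460-byte MSS, 200 ms delayed-ACK
# timeout -- not measurements.
MSS = 1460             # bytes of payload per TCP segment (assumed)
DELAYED_ACK_MS = 200   # typical delayed-ACK timer (assumed)

def tail_stall(block_bytes):
    segments = -(-block_bytes // MSS)   # ceiling division
    # An ACK-every-other-segment receiver acks segment pairs; with an
    # odd count the last segment has no partner, so the receiver holds
    # its ACK until the delayed-ACK timer fires.
    odd = segments % 2 == 1
    return segments, DELAYED_ACK_MS if odd else 0

for block in (4096, 8192, 61440, 65535):
    segs, stall = tail_stall(block)
    print(f"{block:6d}-byte block -> {segs:2d} segments, tail stall ~{stall} ms")

In a stop-and-wait block exchange, that tail stall is paid once per block,
which is exactly how the ACK timer gets engaged on every transfer.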

On TCP gripes, they are old, and, if the archives for this list are available,
I'm sure you can find many expressed, not just by me.  I hoped it would be
clear what the defects described earlier in this thread were, but in case not,
here are a few: slow start; timed/alternate ACK (the delayed-ACK heuristic);
and backoff that equates loss with congestion.
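
The third of these is easy to caricature.  Below is a toy model -- assumed
numbers throughout (1460-byte MSS, 100 ms RTT, 64-segment receiver window,
purely random link loss), so it won't reproduce the factor of 20 I saw,
only the direction of the effect:

import random

# Toy model, not any real stack: slow start doubles cwnd each RTT,
# congestion avoidance adds one segment per RTT, and ANY loss --
# congestive or not -- halves the window.
MSS, RTT_S, RWND = 1460, 0.100, 64.0

def mean_rate(loss_rate, rtts=20000, seed=1):
    random.seed(seed)
    cwnd, ssthresh, sent = 1.0, RWND, 0.0
    for _ in range(rtts):
        sent += cwnd
        if random.random() < loss_rate * cwnd:   # ~P(a segment in this window is lost)
            ssthresh = max(cwnd / 2.0, 2.0)
            cwnd = ssthresh                      # halve on loss (fast recovery)
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2.0, RWND)         # slow start
        else:
            cwnd = min(cwnd + 1.0, RWND)         # congestion avoidance
    return sent * MSS / (rtts * RTT_S)           # average bytes/second

for p in (0.0, 0.004):
    print(f"loss {p:.1%}: ~{mean_rate(p) / 1e3:.0f} kB/s")

Every halving in that run is triggered by line noise, not by queueing
anywhere on the path -- which is precisely the equation of loss with
congestion I mean.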

Now any of these might be helped by various additional changes to TCP, but the
logical inconsistency is that something like ECN has to come from the network
-- the network layer's components all along the path(s) -- back to a TCP
sender.  Why assume a network-layer function in the transport?  Why assume
that, even if the above could be made to work, it would work long-term, in a
growing mix of non-TCP traffic?  Why assume it would be fair to all traffic?
Others have come up with more questions.
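
For anyone who hasn't followed the ECN work, the loop I'm questioning runs
roughly as follows -- a schematic of RFC 3168's signaling in toy form, not
code from any stack:

# Schematic of the ECN feedback loop (RFC 3168): a NETWORK-layer mark
# made by any router on the path must be echoed by the receiver and
# acted on by the sending TRANSPORT.  Toy constants and flow only.

ECT0, CE = 0b10, 0b11     # IP-header ECN codepoints (network layer)
ECE, CWR = "ECE", "CWR"   # TCP-header flags (transport layer)

def router_forward(ip_ecn, congested):
    # A congested router marks rather than drops, but only packets
    # that advertised ECN capability (ECT).
    return CE if congested and ip_ecn == ECT0 else ip_ecn

def receiver_ack_flags(ip_ecn):
    # The receiving TCP echoes the network-layer mark back to the
    # sender in its ACKs (until the sender sets CWR).
    return {ECE} if ip_ecn == CE else set()

def sender_react(cwnd, ack_flags):
    # The sending TCP treats the echoed mark like a loss: halve cwnd
    # and tell the receiver it has done so.
    if ECE in ack_flags:
        return max(cwnd / 2.0, 2.0), {CWR}
    return cwnd, set()

marked = router_forward(ECT0, congested=True)          # one congested hop
print(sender_react(32.0, receiver_ack_flags(marked)))  # -> (16.0, {'CWR'})

Every step of that loop couples a network-layer decision to a transport-layer
reaction, which is the layering assumption the questions above are aimed at.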

By the way, since we're about 2 miles from one another, I'll be happy to show
you any data you'd like to see.
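
Also, on Jonathan's fast-retransmit gotcha (quoted below), the arithmetic is
easy to sketch -- my toy model, assuming the standard 3-dup-ACK threshold
with no SACK and no limited transmit; where the boundary falls depends on
whether an implementation fires at three duplicate ACKs or strictly more:

# Toy model of the tail-drop gotcha: segments sent after a hole each
# elicit one duplicate ACK, and the dropped segment itself elicits
# nothing, so a drop near the end of an RPC burst may never reach the
# fast-retransmit threshold and must wait out an RTO instead.
DUPACK_THRESHOLD = 3

def recovery(total_segments, dropped):   # 'dropped' is 1-based
    dup_acks = total_segments - dropped
    return "fast retransmit" if dup_acks >= DUPACK_THRESHOLD else "RTO (timeout)"

N = 45   # roughly one 64-Kbyte RPC at a 1460-byte MSS
for seg in (21, 42, 43, 44, 45):
    print(f"drop segment {seg:2d}/{N}: {recovery(N, seg)}")

With SACK the receiver reports the hole explicitly, which is why the
registry-edit point below makes the SMB case moot.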

Alex

Jonathan Stone wrote:
> 
> Alex,
> 
> When you say "SMB", I assume you mean the Server Message Block
> protocol used in (for example) CIFS traffic,  as deployed in Microsoft servers.
> 
> >1) Even Old SMB ('90s) allows 128kB of block transfer down to the Transport.
> 
> That's not relevant to the stop-and-wait RPC payloads which SMB
> passes down to its own transport.  If the traffic layered on SMB is
> CIFS remote-file RPC, then CIFS does have the stop-and-wait RPC behaviour
> I sketched.  So, *if* your example of a ``NOS'' refers to CIFS traffic,
> then the problem is not TCP at all, and it's disingenuous to claim
> that it is.
> 
> While SMB itself may allow 128Kbyte SMB frames, both the open-source
> (SNIA) and the long-expired Microsoft CIFS I-D specify a 16-bit field
> for the overall octet length of the CIFS payload inside the SMB delimiters.
> 
> Then again: if your example was not CIFS traffic: show us a packet
> trace with more than 64Kbyte stop-and-wait RPCs, where the packet
> trace suggests TCP (rather than the stop-and-wait SMB RPC) is at fault.
> 
> >Things like NFS have had similar ranges.
> 
> No. NFS does not have the limits familiar to SMB and CIFS users. None
> of the NFS implementations I've ever used do stop-and-wait RPCs; they
> all employ nfsds on the server and nfsiods on the clients (or
> more modern equivalents) to sustain multiple RPCs in flight.
> In contrast, SMB (with CIFS) rarely has more than one RPC in flight.
> 
> >These are beyond the windows a vendor's TCP normally begins at, or
> >gets to in a few MB.
> 
> Scuse me, but that's nonsense.  I've personally instrumented Ethernet
> drivers for CIFS traffic.  A maximum-length CIFS read or write is just
> over 64Kbytes, which consumes 44-odd standard-length Ethernet packets.
> I've taken packet traces which show that under typical load, an Intel
> gigabit NIC[1] will DMA that train of 44-odd packets into memory in
> one hit, and deliver one interrupt for the entire packet train,
> shortly after the link goes idle.  In those circumstances the sender's
> TCP window clearly *has* to be more than one RPC's worth, because the
> receiving TCP isn't seeing any of those 44-odd packets until the whole
> burst has already been deposited in the receiver's
> memory[2].  The system under test did indeed get there ``in a few MB''.
> 
> The attribution of poor performance of (unspecified) SMB traffic to
> TCP's ACK-every-second-packet heuristic thus makes no sense whatever.
> 
> That said: there is a gotcha in fast-retransmit/fast-recovery with
> RPC-oriented traffic: a drop of any of the last 4 segments of an RPC
> has too few segments following it to trigger the 3-dup-ACK threshold.
> But for the specific example of SMB, it's moot: even Windows CE devices
> can (with registry editing) do SACK, and have done for what, nearly
> 3 years now?
> 
> Alex, if you know of legitimate technical gripes with TCP, I'm
> genuinely keen to hear them.  But war stories, with more colour than
> fact and insufficient detail to ascertain causes, yet with the blame
> assigned to TCP regardless of the facts, are a waste of our collective time.
> 
> [1] one port of an Intel 82546 with Intel-supplied FreeBSD 4.x
> drivers (I don't have Microsoft source code to instrument).
> 
> [2] I had rfc1323 options enabled.  So has most every other TCP in
> the last 5 to 8 years.  RFC-1323 will be eleven years old next month.
