[e2e] Reacting to corruption based loss

Wed Jun 29 20:03:14 PDT 2005

On Jun 28, 2005, at 11:35 PM, Cannara wrote:
>
> Now, if you were a network manager for a major corporation, would you rush to
> fix a physical problem that generated less than 1% errors, if your boss &
> users were complaining about mysterious slowdowns many times larger?  0.4%
> wasn't even enough to trigger an alert on their management consoles.  You'd
> certainly be looking for bigger fish.  Well, TCP's algorithms create a bigger
> fish -- talk about Henny Penny.  :]
>
> The files were transferred in many 34kB SMB blocks, which required something
> like 23 server pkts per.  The NT servers had a send window of about 6 pkts
> (uSoft later increased that to about 12).  All interfaces were 100Mb/s, except
> the T3 and a couple of T1s, depending on path.  RTT was about 70ms for all
> paths.
>
> Thankfully, the Sniffer traces also showed exactly what the TCPs at both ends
> were doing, despite Fast Retransmit, SACK, etc.: a) the typical, default
> timeouts were knocking the heck out of throughput;

With a send window of only 6 packets, and a synchronous 
request/response protocol like SMB (IIRC), it would seem that fast rtx 
wouldn't have had much of a chance anyway.
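
To put rough numbers on that, here is a minimal sketch of how many duplicate
ACKs a single loss could generate, using the figures quoted above (23 pkts per
block, a send window of about 6 pkts). The function names and the model itself
are my assumptions, not anything taken from the traces: one loss per block, no
lost ACKs, no SACK or Limited Transmit, and the usual threshold of three
duplicate ACKs for fast retransmit.

```python
# Toy model (assumption): one packet is lost in a block; every later packet
# the sender is allowed to transmit arrives and produces one duplicate ACK.
DUPACK_THRESHOLD = 3  # usual fast retransmit trigger


def dupacks_after_loss(lost_pos, block_pkts=23, window_pkts=6):
    """Duplicate ACKs expected if packet `lost_pos` (1-based) is lost."""
    # The lost packet stays unacknowledged, so at most window_pkts - 1 later
    # packets can be sent, and the block may run out before the window does.
    return min(window_pkts - 1, block_pkts - lost_pos)


def positions_needing_timeout(block_pkts=23, window_pkts=6):
    """Loss positions that cannot reach the duplicate-ACK threshold."""
    return [p for p in range(1, block_pkts + 1)
            if dupacks_after_loss(p, block_pkts, window_pkts) < DUPACK_THRESHOLD]


if __name__ == "__main__":
    print(positions_needing_timeout())
```

In this model a loss anywhere in the tail of a block has to wait for the
retransmission timer, and even a mid-block loss gets at most 5 duplicate ACKs,
so a second loss or a few lost ACKs in the same window would push that case to
a timeout as well.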

> b) the fact that transfers
> required many blocks of odd numbers of pkts meant the Ack Timer at the
> receiver was expiring on every block, waiting (~100ms) for the magical
> even-numbered last pkt in the block, which never came.

Why on earth should that have mattered unless perhaps the sending TCP 
had a broken implementation of Nagle that was going segment by segment 
rather than send by send?
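
For scale, a back-of-the-envelope sketch of what the quoted claim would cost if
the sender really did sit idle for the full Ack Timer on every block, which is
exactly the part in question. The model is an assumption for illustration: each
window of 6 pkts costs one 70ms RTT, serialization time is ignored, and
block_time is a made-up helper name.

```python
import math


def block_time(block_pkts=23, window_pkts=6, rtt_s=0.070, ack_delay_s=0.0):
    """Rough seconds per block: one RTT per window, plus any ACK stall."""
    rounds = math.ceil(block_pkts / window_pkts)  # windows needed per block
    return rounds * rtt_s + ack_delay_s


if __name__ == "__main__":
    base = block_time()                      # no stall
    stalled = block_time(ack_delay_s=0.100)  # assumed ~100ms Ack Timer stall
    print(f"no stall: {base:.3f}s  with stall: {stalled:.3f}s  "
          f"ratio: {stalled / base:.2f}")
```

If the ACK for the last segment instead rides on the client's next request, or
the sender is not waiting on it at all, ack_delay_s is effectively zero and the
odd packet count costs nothing.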

rick jones
Wisdom teeth are impacted, people are affected by the effects of events