[e2e] Reacting to corruption based loss

Sam Manthorpe sm at mirapoint.com
Fri Jul 1 01:09:04 PDT 2005


Hi Alex,

Sorry for the delay in replying...

---- Original message ----
>Date: Wed, 29 Jun 2005 11:47:04 -0700
>From: Cannara <cannara at attglobal.net>  
>Subject: Re: [e2e] Reacting to corruption based loss  
>To: end2end-interest at postel.org
>
>Good response Sam.  The kind that leads to more thought, in fact.
>
>How many years ago for the 1st example, you ask.  For that one, 6.  For the
>one this year, 0.25 year.  :]
>
>You say you "spent a long time simulating the BSD stack".  That's great, and
>part of the problem.  Folks do simulations which are based on code written to
>simulate someone's ideas on how something works.  Then, they believe the
>simulations, despite what's actually seen in reality.  We all know that
>simulators and their use can be very limited in relevance, if not accuracy.

Yes, I know, which is why I qualified my observation with my
practical experience as well.  I keep seeing these rathole
threads on e2e and, to my shame, dipped my ignorant toe in :-)
I think that, for the benefit of part-time e2e readers such as
myself, a synopsis of the actual problem with TCP as it stands
today for local and global communication would be a good thing,
because I can't perceive any.  And let's not do the anecdotal
thing; I'm thinking more of a cost-based analysis, including
details of how much the alleged problem is costing.

>One of the biggest issues is lack of release control for things as important
>as Internet protocols (e.g., TCP).  Thus the NT server may have a different
>version of TCP from that on the user's spanking new PC.  No one ever addresses
>even the basics of stack parameter settings in their manuals, and network
>staffers rarely have the time to go in and check versions, timer settings,
>yadda, yadda.  This is indeed why many performance problems occur. You fixed
>IRIX 6 years ago.  Great.

Um, it was a bug.  I didn't understand the argument...

>
>Now, why does the Internet work?  Not simply because of TCP, for sure.  Your
>experiment illustrates the rush to acceptance these points are raised
>against:  
>
>"I transferred a largish file to my sluggish corporate ftp server.  Took 77
>seconds (over the Internet, from San Francisco to Sunnyvale).  I then did the
>same thing, this time I unplugged my Ethernet cable 6 times, each time for 4
>seconds.  The transfer took 131 seconds."
>
>So, what is "largish" in more precise terms?  What are the RTT and limiting
>bit-rate of your "Internet" path from SF to S'vale?

As I said, it was "for fun". :-)

>The file evidently went
>right by our house!  But, despite the imprecision, we can use your result:  77
>+ 6 x 4 = 101.  Your transfer actually took 131 seconds, fully 30% more than
>one would expect on a link that's simply interrupted, not congested.  Good
>experiment!
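
The arithmetic in that paragraph can be checked with a quick
sketch (all numbers taken from the quoted experiment; nothing
else is assumed):

```python
# Numbers from the quoted experiment above.
baseline = 77   # seconds for the uninterrupted transfer
unplugs = 6     # times the Ethernet cable was pulled
outage = 4      # seconds per unplug
actual = 131    # seconds for the interrupted transfer

# Naive expectation: baseline plus the raw outage time.
expected = baseline + unplugs * outage        # 77 + 6 * 4 = 101

# Extra time beyond a "simply interrupted" link, as a percentage.
overhead_pct = round((actual / expected - 1) * 100)

print(expected, overhead_pct)
```

This confirms the quoted figures: 101 seconds expected, and the
actual 131 seconds is about 30% more.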

But the relevant fact is that it worked.  And didn't suck too much.
And I'm confident that it would still work and not suck too much
even if SBC replaced all their hardware with something that had a
hypothetical bug in the OS that kept my bit-error notification
from telling my transport layer that the loss was due to link
flakiness and not congestion.  Sure, you could architect something
that utilized every spare bit on a link, but at what cost?  And
why?  What's the justification for all the added points of failure?

Again, I don't follow this list much, but reading a few of your
postings, you seem to be suggesting that TCP/IP is fundamentally
flawed as a layer3/4 team and think that a replacement of the
protocol is in order.  Do I understand you correctly?

Cheers,
-- Sam
------------------------
Sam Manthorpe, Mirapoint
