[e2e] Reacting to corruption based loss

Fri Jul 1 14:18:15 PDT 2005

Sam, the issue isn't that I'm "suggesting that TCP/IP is fundamentally
flawed as a layer3/4 team and think that a replacement of the protocol is in
order".  It's that the bigger, elephant-sized issue, alluded to by Cerf
himself, has been for years that protocol development for the Internet stopped
short.  Surely, if you're a corporate CIO and your employees have to spend 30%
more time, 8 hours a day, passing their data around, you'd be concerned.  I
mean, if you could get the 30% back that your own experiment showed was lost
unnecessarily, then you could do your CIO job better and lay off some folks.
:]

The fact that our apps must use a transport that's brain-dead about whether
losses stem from errors or from congestion, and hence whether to slow down,
creates an unnecessary inefficiency that could have been resolved years ago,
and now requires more
effort to move the established bureaucracy and installed base.  Of course, we
can write our own transports (and L3s), as some have, particularly when RTTs
are very large, as the space program has done for decades.
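Here is a minimal sketch of the distinction being argued for. It assumes a hypothetical transport that receives a cause label with each loss. `LossCause`, `Sender` and `classic_on_loss` are illustrative names, not any real stack's API. Classic TCP congestion control cuts its window on every detected loss. The sketch cuts it only when the loss is labelled as congestion.

```python
from dataclasses import dataclass
from enum import Enum, auto


class LossCause(Enum):
    """Hypothetical cause label attached to each loss event (an assumption)."""
    CONGESTION = auto()   # e.g. a queue drop at a router
    CORRUPTION = auto()   # e.g. a link-layer checksum failure reported upward


@dataclass
class Sender:
    """Toy AIMD sender; cwnd is measured in segments."""
    cwnd: float = 10.0
    min_cwnd: float = 1.0

    def on_ack(self) -> None:
        # Additive increase: roughly +1 segment per window's worth of ACKs.
        self.cwnd += 1.0 / self.cwnd

    def on_loss(self, cause: LossCause) -> None:
        if cause is LossCause.CONGESTION:
            # Multiplicative decrease, never below the floor.
            self.cwnd = max(self.min_cwnd, self.cwnd / 2)
        # On CORRUPTION the segment would simply be retransmitted and the
        # window kept, since the network is not signalling overload.


def classic_on_loss(sender: Sender) -> None:
    """Classic behaviour: every loss is treated as congestion."""
    sender.on_loss(LossCause.CONGESTION)


if __name__ == "__main__":
    aware, classic = Sender(), Sender()
    for _ in range(3):                       # three lost segments
        aware.on_loss(LossCause.CORRUPTION)  # cause-aware sender
        classic_on_loss(classic)             # classic sender
    print(aware.cwnd, classic.cwnd)          # 10.0 1.25
```

After three corruption losses, the cause-aware sender still has its full window. The classic one has been pushed close to its one-segment floor. Whether a real network can label losses reliably is the open question, and the sketch simply assumes it can.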

Your bottom-line comment on your experiment's 30% slowdown, that it "didn't
suck too much", illustrates the problem.  Any protocol stack (Atalk, Netware, XNS,
SNA, DECnet, Vines...) could have been used to get the same, or better,
result.  That's a pretty low bar to set for something as important as the
Internet.  
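The 30% figure comes from simple arithmetic on the quoted experiment. The sketch below reproduces it. It assumes, as the argument does, that an ideal transport would resume at full rate as soon as each 4-second outage ended. The constant and function names are illustrative only.

```python
BASELINE_S = 77    # uninterrupted transfer time, seconds
OUTAGES = 6        # number of times the cable was unplugged
OUTAGE_S = 4       # length of each outage, seconds
OBSERVED_S = 131   # measured transfer time with outages, seconds


def expected_time(baseline: float, outages: int, outage_len: float) -> float:
    """Transfer time if the sender just paused for each outage and resumed."""
    return baseline + outages * outage_len


def excess_fraction(observed: float, expected: float) -> float:
    """Fractional slowdown beyond the idealised pause-and-resume time."""
    return observed / expected - 1


expected = expected_time(BASELINE_S, OUTAGES, OUTAGE_S)
excess = excess_fraction(OBSERVED_S, expected)
print(f"expected {expected} s, observed {OBSERVED_S} s, excess {excess:.1%}")
# expected 101 s, observed 131 s, excess 29.7%
```

The idealised baseline ignores any time a real sender spends waiting for its retransmission timer to expire after the cable is plugged back in. That wait is one place the extra 30 seconds could have gone.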

I'd not feel good about it if it had been my responsibility to continue
TCP/IP protocol work, even given its non-competitive subsidy.  After all, was
there ever a bakeoff with other development results?  No.  TCP/IP development
stagnated, yet it was subsidized around the world by free distribution with
almost every OS and box being shipped.  Can't beat that marketing.  But that
marketing, as we know from uSoft, inevitably leads to mediocrity.

The other side of the current state of mediocrity is the amazing lack of
installation control.  This is apart from the umpteen security flaws built
into the Internet protocols from the start, costing us billions to
ameliorate.  Release control is the more fundamental, bigger elephant, because
sloppiness in its realm leads to all manner of problems, inevitably expensive
to us all.

My point is that there's opportunity in all these issues to do better.
There's been that opportunity for years.  A bureaucracy formed long ago that
thwarts addressing it.  It'd be great to see folks engage the opportunity and
make progress.

Alex

Sam Manthorpe wrote:
> 
> Hi Alex,
> 
> Sorry for the delay in replying...
> 
> ---- Original message ----
> >Date: Wed, 29 Jun 2005 11:47:04 -0700
> >From: Cannara <cannara at attglobal.net>
> >Subject: Re: [e2e] Reacting to corruption based loss
> >To: end2end-interest at postel.org
> >
> >Good response Sam.  The kind that leads to more thought, in fact.
> >
> >How many years ago for the 1st example, you ask.  For that one, 6.  For the
> >one this year, 0.25 year.  :]
> >
> >You say you "spent a long time simulating the BSD stack".  That's great, and
> >part of the problem.  Folks do simulations which are based on code written to
> >simulate someone's ideas on how something works.  Then, they believe the
> >simulations, despite what's actually seen in reality.  We all know that
> >simulators and their use can be very limited in relevance, if not accuracy.
> 
> Yes I know, which was why I qualified my observation with my
> practical experience as well.  I keep seeing these rathole
> threads on e2e and, to my shame, dipped my ignorant toe in :-)
> I think for the benefit of e2e part-time readers like myself, a synopsis
> of the actual problem with TCP as it stands today for local and
> global communication would be a good thing.  Because I can't
> perceive any.  And let's not do the anecdotal thing; I'm thinking
> more of a cost-based analysis, including details of how much the
> alleged problem is costing.
> 
> >One of the biggest issues is lack of release control for things as important
> >as Internet protocols (e.g., TCP).  Thus the NT server may have a different
> >version of TCP from that on the user's spanking new PC.  No one ever addresses
> >even the basics of stack parameter settings in their manuals, and network
> >staffers rarely have the time to go in and check versions, timer settings,
> >yadda, yadda.  This is indeed why many performance problems occur. You fixed
> >IRIX 6 years ago.  Great.
> 
> Um, it was a bug.  I didn't understand the argument...
> 
> >
> >Now, why does the Internet work?  Not simply because of TCP, for sure.  Your
> >experiment illustrates the rush to acceptance that these points argue
> >against:
> >
> >"I transferred a largish file to my sluggish corporate ftp server.  Took 77
> >seconds (over the Internet, from San Francisco to Sunnyvale).  I then did the
> >same thing, this time I unplugged my Ethernet cable 6 times, each time for 4
> >seconds.  The transfer took 131 seconds."
> >
> >So, what is "largish" in more precise terms?  What are the RTT and limiting
> >bit-rate of your "Internet" path from SF to S'vale?
> 
> As I said, it was "for fun". :-)
> 
> >The file evidently went
> >right by our house!  But, despite the imprecision, we can use your result:  77
> >+ 6 x 4 = 101.  Your transfer actually took 131 seconds, fully 30% more than
> >one would expect on a link that's simply interrupted, not congested.  Good
> >experiment!
> 
> But the relevant fact is that it worked.  And didn't suck too much.
> And I'm confident that it would still work and not suck too much
> even if SBC replaced all their hardware with something that had a
> hypothetical bug in the OS that made my bit-error notification fail to
> inform my transport layer that loss was due to link flakiness and not
> congestion. Sure you could architect something that utilized
> every spare bit on a link, but at what cost?  And why?  What's
> the justification for all the added points-of-failure?
> 
> Again, I don't follow this list much, but reading a few of your
> postings, you seem to be suggesting that TCP/IP is fundamentally
> flawed as a layer3/4 team and think that a replacement of the
> protocol is in order.  Do I understand you correctly?
> 
> Cheers,
> -- Sam
> ------------------------
> Sam Manthorpe, Mirapoint