[e2e] Reacting to corruption based loss

Wed Jun 29 10:51:53 PDT 2005

Detlef, on the questions relating to Internet loss, you should talk to the
folks at SLAC, who've been doing Ping-around-the-world for years.  Try Les
Cottrell, who may be listening.  Not long ago, peering points, especially
back East, were losing 30% of frames during busy periods.  I don't know
what the figure is today.

Alex

Detlef Bosau wrote:
> 
> Sam Manthorpe wrote:
> 
> > > As an example of the latter, a major telecom company, whose services many of
> > > us are using this instant, called a few years back, asking for help
> >
> > How many (years?)
> 
> Alex reminded me of a strange situation I ran into myself a couple of
> years ago.
> However, there is one lesson I've learned in the meantime: it's not the
> stork that brings the babies ;-)
> There is a difference between correlation and causality.
> 
> In other words: it happens quite often that problems occur at the same
> time but have no causal relationship.
> One day, I ran into severe TCP/IP problems on a WAN line exhibiting a BER
> of 10^-9, which was more than specified. However, I thought about the
> situation a few years later and realized: BER 10^-9 => one bit error per
> 10^9 bits, i.e. roughly one corrupted packet per 125 MBytes => there are
> about four or five corrupted TCP segments when I download an ISO image
> of the new RedHat Linux distribution.
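
The arithmetic in the paragraph above can be checked with a short sketch.
The ISO size of 650 MBytes is an assumption (a CD-sized image); the
original message does not state it, and the function names are made up
for illustration.

```python
# Back-of-the-envelope check of the BER arithmetic.
# ASSUMPTION: ISO_BYTES is a hypothetical CD-sized image; the original
# message gives no size.
BER = 1e-9                 # bit error rate quoted in the message
ISO_BYTES = 650 * 10**6    # assumed ISO image size in bytes


def bytes_per_bit_error(ber):
    """Average number of bytes transferred per single bit error."""
    return 1 / ber / 8


def expected_bit_errors(size_bytes, ber):
    """Expected number of bit errors in a transfer of size_bytes."""
    return size_bytes * 8 * ber


if __name__ == "__main__":
    print(bytes_per_bit_error(BER))             # about 125 million bytes
    print(expected_bit_errors(ISO_BYTES, BER))  # about five bit errors
```

If each bit error lands in a different packet, the expected number of bit
errors is also the number of corrupted packets, which matches the "four or
five" estimate for an image of roughly this size.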
> 
> I don't know whether this phrase exists in English as well, but in
> Germany we call this "beyond good and evil".
> 
> Four corrupted packets in an ISO image - and please consider that most
> TCP flows consist of only a few dozen packets.
> 
> Nobody would ever notice those error rates. This _is_ negligible.
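
To put a number on "nobody would ever notice", here is a minimal sketch of
the chance that a short flow sees any corrupted packet at all. It assumes
independent bit errors and 1500-byte packets; both are assumptions for
illustration, not figures from the thread, and real links often show
bursty errors instead.

```python
import math


def packet_corruption_prob(ber, packet_bytes):
    """Probability that at least one bit of a packet is flipped.

    ASSUMPTION: bit errors are independent with probability `ber`.
    Computes 1 - (1 - ber)**bits in a numerically stable way.
    """
    bits = packet_bytes * 8
    return -math.expm1(bits * math.log1p(-ber))


def flow_hit_prob(ber, packet_bytes, n_packets):
    """Probability that at least one packet of a flow is corrupted."""
    p = packet_corruption_prob(ber, packet_bytes)
    return -math.expm1(n_packets * math.log1p(-p))


if __name__ == "__main__":
    # Hypothetical flow: a dozen 1500-byte packets at BER 10^-9.
    print(packet_corruption_prob(1e-9, 1500))
    print(flow_hit_prob(1e-9, 1500, 12))
```

Under these assumptions a single packet is corrupted with probability of
about 1.2 in 100,000, and a twelve-packet flow is hit roughly once in
seven thousand flows.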
> 
> I don't know what really caused the trouble. But it surely was not the
> BER.
> 
> I have sometimes found that those error rates were not the only problem
> at the time and, more importantly, not the real cause of the trouble. A
> few years ago, we had a Cisco box which definitely scrambled IPX
> datagrams in certain situations. This bug was hard to find; in the end
> we put sniffers at three locations along the path in the company
> network. However, it could be identified, Cisco fixed the problem and
> everything was fine.
> 
> Software bugs do happen; however, that's not the end of the world. What
> is more, I can blame no one for software bugs as long as I produce
> them myself.
> 
> We had a problem, we identified it, we fixed it - everything was fine
> and everybody was happy.
> 
> >
> > I can't help but wonder - if TCP/IP were generally so sensitive to a loss
> > of 0.4%, then why does the Internet work?  I spent a long time simulating
> 
> This is my question as well. Just for fun, I simulated TCP flows with
> packet error rates of 1% to 5%.
> 
> And as far as I can remember, a 1% packet corruption rate did not really
> matter.
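
For a rough feel for how steady-state TCP throughput scales with random
loss, the well-known Mathis et al. approximation can be sketched as below.
This is not the simulation described in the thread; the MSS of 1460 bytes
and the RTT of 100 ms are assumed values, and the model only covers
long-lived flows with low, random loss.

```python
import math


def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate, c=math.sqrt(1.5)):
    """Approximate steady-state TCP throughput in bits per second.

    Uses the Mathis et al. approximation:
        rate ~ C * MSS / (RTT * sqrt(p))
    ASSUMPTION: long-lived flow, random loss, no timeouts; C = sqrt(3/2).
    """
    return c * mss_bytes * 8 / (rtt_s * math.sqrt(loss_rate))


if __name__ == "__main__":
    # Hypothetical path: MSS 1460 bytes, RTT 100 ms.
    for p in (0.004, 0.01, 0.05):
        rate = mathis_throughput_bps(1460, 0.1, p)
        print(f"loss {p:.3f}: {rate / 1e6:.2f} Mbit/s")
```

Under this model throughput falls with the inverse square root of the loss
rate, so whether 1% loss "matters" depends on the RTT and on how much
bandwidth the flow would otherwise have had available.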
> 
> > the BSD stack a while back and it held up extremely well under random
> > loss until you hit 10% at which point things go non-linear.  I've also
> > never experienced what you describe, neither as a user nor in my
> > capacity as an engineer debugging customer network problems.
> >
> > And what's with that "major corporation" and "boss" stuff?  I'm guessing
> > they'd like the "replace the hardware" solution to the "replace the
> > whole infrastructure with something that's incompatible with everything else
> > on the planet" one.
> 
> Companies do often replace hardware and software, if only it fixes the
> problem.
> 
> In industrial plants, people are often not interested in the real
> problem. They want a _fast_ and _cheap_ solution. So, if someone says:
> "It's the Cisco feature set!" and the Cisco box is then replaced by
> another model, possibly working around the problem as a side effect,
> everybody is happy about it.
> It's simply much cheaper to replace even an expensive Cisco box than to
> have a dozen network consultants looking for the _real_ problem for a
> few months or so.
> 
> Perhaps Cisco boxes are a bad example. But we ran into problems with
> protocols without flow control, which led to problems with NICs that
> had different buffers.
> => The software was not rewritten; the NIC was replaced.
> 
> Cheap, works (around the problem), everybody is happy.
> 
> However, one cannot always derive fundamental problems in TCP from this.
> 
> And the rationale behind this is an economic one - not a scientific
> one.
> 
> However: Does anybody have recent data about e2e packet corruption rates
> in Internet connections or corporate LANs, even with a large number of
> hops?
> 
> I think this would be useful for the discussion here.
> 
> DB
> 
> --
> Detlef Bosau
> Galileistrasse 30
> 70565 Stuttgart
> Mail: detlef.bosau at web.de
> Web: http://www.detlef-bosau.de
> Mobile: +49 172 681 9937