[e2e] Reacting to corruption based loss
detlef.bosau at web.de
Wed Jun 29 04:17:39 PDT 2005
Sam Manthorpe wrote:
> > As an example of the latter, a major telecom company, whose services many of
> > us are using this instant, called a few years back, asking for help
> How many (years?)
Alex reminded me of a strange situation I encountered myself a couple of years ago.
However, there is one lesson I've learned in the meantime: it's not the
stork that brings the babies ;-)
There is a difference between correlation and causality.
In other words: it quite often happens that problems occur at the
same time without any causal relationship.
One day, I ran into severe TCP/IP problems on a WAN line exhibiting a BER of
10^-9, which was higher than specified. However, I thought about the
situation again a few years later and realized: BER 10^-9 => one bit error
per 125 MBytes, i.e. roughly one corrupted packet per 125 MBytes => there
are about four or five corrupted TCP segments when I download an ISO image
for the new RedHat Linux
I don't know whether this phrase exists in English as well, but in
Germany we call this "beyond good and evil".
Four corrupted packets in an ISO image - and please consider that most TCP
flows consist of only a few dozen packets.
Nobody would ever notice those error rates. This _is_ negligible.
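
The arithmetic above can be checked with a short sketch. The ISO size
(650 MBytes), the packet size (1500 bytes) and the flow length (36 packets)
are assumptions chosen for illustration, not figures from the WAN line in
question, and bit errors are assumed to be independent.

```python
import math

BER = 1e-9                 # bit error rate quoted in the text
ISO_BYTES = 650 * 10**6    # assumed ISO image size (hypothetical)
PACKET_BYTES = 1500        # assumed packet size (hypothetical)
FLOW_PACKETS = 36          # assumed "few dozen" packets per flow (hypothetical)


def bytes_per_bit_error(ber):
    """Average number of bytes transferred per single bit error."""
    return 1.0 / ber / 8.0


def expected_bit_errors(num_bytes, ber):
    """Expected number of bit errors in a transfer of num_bytes."""
    return num_bytes * 8 * ber


def prob_at_least_one(p_single, trials):
    """Probability of at least one hit in `trials` independent trials.

    Computes 1 - (1 - p)^n via log1p/expm1 to stay accurate for tiny p.
    """
    return -math.expm1(trials * math.log1p(-p_single))


# Probability that a single packet contains at least one flipped bit.
p_packet = prob_at_least_one(BER, PACKET_BYTES * 8)
# Probability that a short flow contains at least one corrupted packet.
p_flow = prob_at_least_one(p_packet, FLOW_PACKETS)

print("MBytes per bit error:      %.1f" % (bytes_per_bit_error(BER) / 1e6))
print("expected errors in ISO:    %.1f" % expected_bit_errors(ISO_BYTES, BER))
print("P(packet corrupted):       %.2e" % p_packet)
print("P(short flow is affected): %.2e" % p_flow)
```

Under these assumptions a typical short flow is hit well under once in a
thousand times, which matches the point that nobody would notice.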
I don't know what really caused the trouble. But it surely was not the
I have sometimes found that such error rates were not the only problem at
the time and, more importantly, not the real cause of the problems. A few
years ago, we had a Cisco box which definitely scrambled IPX datagrams in
certain situations. This bug was hard to find; in the end we put sniffers at
three locations along the path in the company network. However, it could
be identified, Cisco fixed the problem and everything was fine.
Software bugs do happen; however, that's not the end of the world. And
what is more, I can blame no one for software bugs as long as I produce
We had a problem, we identified it, we fixed it - everything was fine and
everybody was happy.
> I can't help but wonder - if TCP/IP were generally so sensitive to a loss
> of 0.4%, then why does the Internet work? I spent a long time simulating
This is my question as well. Just for fun, I simulated TCP flows with
packet error rates of 1% to 5%.
And as far as I can remember, a 1% packet corruption rate did not really
> the BSD stack a while back and it held up extremely well under random
> loss until you hit 10% at which point things go non-linear. I've also
> never experienced what you describe, neither as a user nor or in my
> capacity as engineer debugging customer network problems.
> And what's with that "major corporation" and "boss" stuff? I'm guessing
> they'd like the "replace the hardware" solution to the "replace the
> whole infrastructure with something that's incompatible with everything else
> on the planet" one.
Companies often do replace hardware and software, if it only fixes the
In industrial plants, people are often not interested in the real
problem. They want a _fast_ and _cheap_ solution. So, if one says: "It's
the Cisco feature set!"
and then the Cisco box is replaced by another model - possibly working
around the problem as a side effect - everybody is happy about it.
It's simply much cheaper to replace even an expensive Cisco box than to
have a dozen network consultants looking for the _real_ problem for a few
months or so.
Perhaps Cisco boxes are a bad example. But we ran into problems with protocols
without flow control - which led to problems in NICs with different
=> The software was not rewritten; instead, the NIC was replaced.
Cheap, works (around the problem), everybody is happy.
However, one cannot always derive fundamental problems in TCP from this.
And the rationale behind this is an economic one - not a scientific
However: Does anybody have recent data about e2e packet corruption rates
in Internet connections or corporate LANs, even with a large number of
I think this would be useful for the discussion here.
Mail: detlef.bosau at web.de
Mobile: +49 172 681 9937