[e2e] What should e2e protocols know about lower layers?

Fri Oct 12 15:31:41 PDT 2001

In message <200110121933.f9CJX3k00988 at aland.bbn.com>,
Craig Partridge writes:

>
>In message <20011012090021.B28377 at ted.isi.edu>, Ted Faber writes:
>
>>still wouldn't catch it.  Granted, skipping the checksum on the local
>>link is an additional error source, but I suspect it's less of one than
>>a disk nearing its MTBF.  
>
>If my memory is right, while doing the data collection for our
>SIGCOMM paper and Jonathan Stone's dissertation, we saw some systems
>corrupt on the order of 1 packet in 100 (bad network adapters, so the
>data was corrupted by the sending host's adapter after the TCP checksum
>was computed but before the CRC was computed).  I think that kind of
>error rates beats most disks nearing the end of their lifetime...

I suspect Craig is mixing up two different cases.
At one site, we saw long-term error rates on the order of 1 packet in
1000 having invalid TCP checksums. That was due to a bug in
Microsoft's NT stack (which affected connection shutdown, not data
transfer).

On the only "end-host" LAN where we obtained permission to wiretap the
net and look for bad packets, approximately 3 out of 28 hosts[*]
consistently generated bad packets as Craig describes; but the overall
failure rates were, I believe, less than 11 in 100 packets.
(It's hard to say for sure, since permission to gather the data at all
was contingent on not keeping or examining any undamaged packets.)

For those NICs which I did catch damaging packets, the rates were
still worse than disks near, but not at, the edge of failure.
What we don't know is how many such NICs are deployed.

[*] The population of hosts changed during the course of
the experiment.