[e2e] Re: [Tsvwg] Really End-to-end or CRC vs everything else?
dennis at juniper.net
Sun Jun 10 16:52:10 PDT 2001
> >I know of no other easily proven statement for
> >bit "stuckedness" for Adler (or even other statistical
> >or frequency based checking/detection algorithms).
> The TCP checksum catches all error patterns that are 15 bits long or
> less and all but one 16-bit long error. If you did the TCP checksum as
> a 32-bit checksum (32-bit adds instead of 16) you'd have the same
> property. If single bit errors are the error pattern, CRC is overkill.
> Key point here is we need an error model that is realistic to determine
> what checksum to use.
This is exactly right. Not only is coming up with a realistic error model
for an end-to-end checksum difficult, there is also no guarantee that any
system-level error model which is chosen will remain realistic over time.
A classic example of this with respect to the TCP checksum, which I think
has generally fared pretty well in real life, was an early, buggy, pre-AAL5
ATM DSU from one vendor which would occasionally reorder consecutive 48
byte chunks of packets when under load. As long as the front of the
packet was undamaged the TCP checksum would never detect this (it is
fortunate that there are routing protocols which depend on the TCP
checksum for protection or the owner of the network using the boxes might
not have noticed and fixed this in a hurry, leaving users to fend for
themselves). I've always thought that if it had been known at the time
the TCP checksum was chosen that there would be transmission technologies
that could plausibly make packet-rearrangement errors common, a different,
non-associative checksum might have been picked instead.
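The reordering blind spot is easy to demonstrate: the one's-complement sum
is commutative over 16-bit words, and a 48-byte cell is a whole number of
words, so swapping cells cannot change the checksum. A sketch (hypothetical
payload, RFC 1071-style sum):

```python
def internet_checksum(data: bytes) -> int:
    # 16-bit one's-complement sum, as in RFC 1071
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry
    return (~total) & 0xFFFF

payload = bytes(range(96))             # two 48-byte "cells"
swapped = payload[48:] + payload[:48]  # cells delivered out of order
assert payload != swapped
assert internet_checksum(payload) == internet_checksum(swapped)
```

A position-dependent (non-commutative) checksum such as a CRC would have
caught this, which is the point being made above.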
A lesser example is the CRC-16. The standard HDLC CRC-16 has the property
that it will detect all errors where an odd number of bits are in error.
The cost of this is that the ability of the standard CRC-16 to detect
errors with an even number of bits in error is weaker than 16-bit CRC's
which don't have this property (it still detects all but 1 in 2^16 of all
possible errors, but since half of all possible errors have an odd number
of bits in error it detects fewer of the other half). This is still a
desirable property if one makes the traditional transmission system
assumption that lower Hamming weight errors occur with much greater
frequency, since it means that packets under 4 kB in length need at least
4 errored bits for an error to go undetected. The problem is that for HDLC
over SONET, which came into use about 25 years after the CRC-16 did, we
ended up running a self-synchronous x^43 + 1 scrambler on top of the
CRC-protected packet data. Self-synchronous scramblers have the effect
of multiplying bit errors, and the x^43 + 1 scrambler turns every
transmission bit error into two errored bits in the packet. Methodically
doubling the bit errors makes most errors have an even number of bits
in error, which makes one wish that they'd picked a CRC-16 polynomial
for HDLC which wasn't maximally weak at detecting such errors.
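The error-doubling effect can be shown directly. In a self-synchronous
x^43 + 1 scrambler each output bit is the input XORed with the output 43
bits earlier, and the matching descrambler XORs each received bit with the
received bit 43 bits earlier, so one errored line bit corrupts two
descrambled bits, 43 apart. A sketch (bit lists and names are my own):

```python
import random

def scramble(bits, taps=43):
    # self-synchronous x^43 + 1 scrambler: out[i] = in[i] ^ out[i - 43]
    out = []
    for i, b in enumerate(bits):
        out.append(b ^ (out[i - taps] if i >= taps else 0))
    return out

def descramble(bits, taps=43):
    # descrambler: out[i] = in[i] ^ in[i - 43]
    return [b ^ (bits[i - taps] if i >= taps else 0)
            for i, b in enumerate(bits)]

random.seed(1)
data = [random.randint(0, 1) for _ in range(200)]
line = scramble(data)
line[60] ^= 1                  # one transmission bit error
recovered = descramble(line)
diffs = [i for i in range(200) if recovered[i] != data[i]]
assert diffs == [60, 103]      # the error lands twice, 43 bits apart
```

Since most line errors thus arrive in even-sized pairs, a CRC whose
strength is concentrated on odd-weight errors is exactly the wrong choice
on such a link.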
Choices which are really good under some sets of circumstances are really
bad under others. For link-by-link protection from transmission errors we
have the luxury of crafting the error detection to match the characteristics
of the link, and even of changing it if the original choice is proven wrong,
but end-to-end checksums are long-lived and are supposed to be able to treat
the network stuff in between the ends as an ever-changing black box. I
have no idea how you design for this.