[e2e] Re: [Tsvwg] Really End-to-end or CRC vs everything else?

Douglas Otis dotis at sanlight.net
Mon Jun 11 16:41:29 PDT 2001


You can't be suggesting a simple summation is worth using in the face of
router memory errors.  You have detected these in the wild and noted their
positions within a 32-bit word.  You have statistics that indicate some bits
are in error more often than others, suggesting there may be a weak bus driver
being seen.  From the simple tests that I have run, a simple summation is
extremely weak in this area.  A CRC still does well even when the entire
packet is corrupted including the CRC itself.  There is no need for the CRC
to be affected to improve the performance of the algorithm.  I can say that
Fletcher-16 2^n should be avoided altogether due to this extremely weak
memory bus performance.  It is nowhere near 2^32 in performance.  It is
closer to 2^6.  The only reason for placing a mandatory chunk at the end of
the packet would be to ensure against truncation which a good check should
catch.  If there is only one checksum type allowed, then placing this at the
end of the packet has the advantage of minimizing the potential passes this
packet needs in preparation.  Adler-32 suffers the Fletcher problem if the
packet is small or mostly zero.
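
A minimal sketch of the kind of failure being argued about here, not a
reproduction of the tests mentioned above.  It assumes a hypothetical bus
fault that flips bit 7 of the same byte lane in two 32-bit words, setting the
bit in one word and clearing it in the other.  Under that assumption a 16-bit
ones-complement sum and a Fletcher-16 with modulus 2^8 both miss the damage,
while Fletcher-16 with modulus 255 and CRC-32 (zlib.crc32) catch it.  The
function names and the 64-byte test pattern are illustrative only.

```python
import zlib


def ones_complement_sum16(data: bytes) -> int:
    """16-bit ones-complement sum over big-endian words (TCP-style summation)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold the carries back in (end-around carry)
        total = (total & 0xFFFF) + (total >> 16)
    return total


def fletcher16(data: bytes, modulus: int) -> int:
    """Byte-wise Fletcher-16; modulus 256 is the 2^n variant, 255 the usual one."""
    a = b = 0
    for byte in data:
        a = (a + byte) % modulus
        b = (b + a) % modulus
    return (b << 8) | a


def flip_bit(data: bytes, byte_index: int, bit: int) -> bytes:
    out = bytearray(data)
    out[byte_index] ^= 1 << bit
    return bytes(out)


def build_example():
    """Assumed fault: bit 7 flips in byte lane 0 of word 0 and of word 1."""
    original = bytearray(range(64))
    original[4] |= 0x80  # so the fault sets the bit in word 0, clears it in word 1
    original = bytes(original)
    corrupted = flip_bit(flip_bit(original, 0, 7), 4, 7)
    return original, corrupted


def detection_report() -> dict:
    """Map each check name to True if it notices the corruption."""
    original, corrupted = build_example()
    checks = {
        "sum16": ones_complement_sum16,
        "fletcher16_mod256": lambda d: fletcher16(d, 256),
        "fletcher16_mod255": lambda d: fletcher16(d, 255),
        "crc32": zlib.crc32,
    }
    return {name: fn(original) != fn(corrupted) for name, fn in checks.items()}


if __name__ == "__main__":
    for name, detected in detection_report().items():
        print(f"{name:18s} {'detected' if detected else 'MISSED'}")
```

The two flips cancel in any plain sum (+128 and -128 in the same lane), and
the position-weighted Fletcher term changes by 4 * 128 = 512, which is 0
modulo 256 but 2 modulo 255.  This is one hand-picked pattern, so it says
nothing about how often such errors occur.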


> To: Craig Partridge; David P. Reed
> Cc: tsvwg at ietf.org; end2end
> Subject: Re: [e2e] Re: [Tsvwg] Really End-to-end or CRC vs everything else?
> In message <200106112048.f5BKmpF07926 at aland.bbn.com>, Craig Partridge writes:
> >
> >In message < at mail.reed.com>, "David P. Reed" writes:
> >I think you've missed the point.  In a prior note, you suggested a line of
> >thinking that assumes an adversary.  Implicitly, that's an error model.
> >
> >So what if traffic doesn't match that error model -- that is to say, errors
> >are not ones an adversary would pick -- then the checksum chosen is the
> >wrong one.
> Craig,
> To be fair, there are several points kicking around here, and being
> made to (or intended for) different audiences.  The tsvwg folks are
> under time pressure to (re-)decide on a check function.  The e2e
> folks can take a longer, or at least a middle-vision, view.
> The way I'd put it to the tsvwg folks choosing a checksum is this:
> if you start appealing to catching all but 1 in 2^32 errors (or
> more accurately, 1 in 65521^2), then you have fallen into a fallacy.
> You have just conflated a purely combinatoric result, about the ratio
> of sizes of the domain and range of the error-check function, with a
> probabilistic statement about how likely *in practice* you are to
> catch errors.  What should give you a serious wake-up call is to hear
> that even the constant function -- some constant 32-bit integer -- will
> catch the same fraction of all errors.  There are no grounds for
> labelling *any* function (in the mathematical sense) as stronger than
> another, unless we know something about the distribution of the
> errors, or how well some particular function does against some
> particular distribution.  CRCs are not any stronger than
> checksums, *unless* we happen to know that the distribution of
> actual errors tends to favour low Hamming-weight errors.
> (The data Craig and I have says that it doesn't.)
> The point to Dave Reed is that the combinatoric argument is very
> general and applies to any function, whether the constant function, or
> a cryptographic hash, or a shared-secret key, provided we define
> "error check bits" to properly measure the fraction of bit
> combinations which are accepted, versus those which are
> rejected.
> One further thing I've mentioned in email is that I recently
> re-analyzed the captured error datasets which I and Craig and Vern
> gathered, and I did find one pattern which could be exploited here.
> The pattern is that the errors we found can be broadly characterized
> into two classes: either single-bit or short, low-Hamming-weight
> errors; or errors where some prefix of the packet is intact; the
> packet is then subjected to an error; and the error continues all the way
> to the end of the packet. The ratio of errored bits within that
> damaged `tail' of packet is very close to 0.5.
> That suggests an error model where we model packet-level errors as due
> either to single-bit errors; to memory-readout errors which affect a
> single word or cache line; or to `stateful' errors in the
> hardware/software finite-state engines which move packets between
> packet-buffer memory and the hardware which implements some specific
> media layer. (Think of errors due to an under-run in a hardware FIFO,
> or a bad bit in a DMA pointer register.)
> There are two things to take away from that.  The first is that the
> errors we've actually observed, in the only study of in-the-wild
> packet-level errors I know of, are so heavy that, on
> average, they affect more than R bits, for any R that's a plausible
> error-check size. That says we're only going to catch errors stochastically.
> The second is that, since the errors seem to be stateful, putting the
> error-check information at the end of the packet rather than in a
> fixed header field doesn't hurt, and (for the reasons we analyzed to
> death in our 98 ToN paper) will actually help, for the kinds of
> nonuniform data we find in filesystems.
> If there's anything I can recommend to the tsvwg, it's to pick even a
> 32-bit extension of the TCP checksum, rather than Adler32; and to
> think seriously about moving the error-check bits to the end.  Not to
> help hardware, but to make whatever error-check you use more resilient
> against errors in packet-processing engines which (once an error does
> hit) trash the remainder of the packet.
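
The `stateful' tail-damage class described in the quoted text can be
sketched as a small simulation.  This is an illustration under stated
assumptions, not the captured dataset: the damage start point is drawn
uniformly, every bit after it is flipped with probability 0.5, packets are
512 bytes, and R = 32 stands in for "a plausible error-check" size.  All
names and parameters are hypothetical.

```python
import random

CHECK_BITS = 32  # assumed value of R, the number of error-check bits


def corrupt_tail(packet: bytes, start: int, rng: random.Random) -> bytes:
    """Keep packet[:start] intact and XOR every later byte with random noise."""
    noise = bytes(rng.getrandbits(8) for _ in range(len(packet) - start))
    tail = bytes(p ^ n for p, n in zip(packet[start:], noise))
    return packet[:start] + tail


def flipped_bits(a: bytes, b: bytes) -> int:
    """Hamming distance between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def simulate(trials: int = 500, packet_len: int = 512, seed: int = 1):
    """Return (flip ratio inside the damaged tail, mean flipped bits per
    packet, fraction of packets with more than CHECK_BITS flipped bits)."""
    rng = random.Random(seed)
    packet = bytes(packet_len)  # contents do not matter for XOR damage
    flipped_total = 0
    tail_bits_total = 0
    heavy = 0
    for _ in range(trials):
        start = rng.randrange(packet_len)  # first damaged byte
        damaged = corrupt_tail(packet, start, rng)
        flips = flipped_bits(packet, damaged)
        flipped_total += flips
        tail_bits_total += 8 * (packet_len - start)
        heavy += flips > CHECK_BITS
    return (flipped_total / tail_bits_total,
            flipped_total / trials,
            heavy / trials)


if __name__ == "__main__":
    ratio, mean_flips, heavy_fraction = simulate()
    print(f"flip ratio in tail: {ratio:.3f}")
    print(f"mean flipped bits:  {mean_flips:.1f}")
    print(f"share above R bits: {heavy_fraction:.3f}")
```

Under these assumptions almost every damaged packet has far more than R bits
in error, which is the sense in which no R-bit check can guarantee detection
and errors are only caught stochastically.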
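
The combinatoric point in the quoted text (that even a constant function
"catches the same fraction of all errors") can be checked exhaustively at
toy scale.  The sketch below assumes 8 data bits and 4 check bits, and treats
every received word other than the one sent as an equally likely error; the
three check functions and all names are made up for illustration.

```python
from fractions import Fraction
from itertools import product

DATA_BITS = 8   # assumed toy message size
CHECK_BITS = 4  # assumed toy check size


def constant_check(data: int) -> int:
    """A 'check' that ignores the data entirely."""
    return 0b1010


def sum_check(data: int) -> int:
    """Sum of the two nibbles, modulo 16."""
    return ((data >> 4) + (data & 0xF)) & 0xF


def crc4_check(data: int) -> int:
    """Bitwise CRC-4 with generator x^4 + x + 1 (0b10011)."""
    reg = data << CHECK_BITS
    for bit in range(DATA_BITS + CHECK_BITS - 1, CHECK_BITS - 1, -1):
        if reg & (1 << bit):
            reg ^= 0b10011 << (bit - CHECK_BITS)
    return reg & 0xF


def undetected_fraction(check) -> Fraction:
    """Fraction of all (sent, received != sent) pairs that the receiver accepts."""
    n_data, n_check = 1 << DATA_BITS, 1 << CHECK_BITS
    valid = {(d, check(d)) for d in range(n_data)}
    undetected = total = 0
    for sent in valid:
        for received in product(range(n_data), range(n_check)):
            if received == sent:
                continue
            total += 1
            if received in valid:  # check field matches the data: accepted
                undetected += 1
    return Fraction(undetected, total)


if __name__ == "__main__":
    for fn in (constant_check, sum_check, crc4_check):
        print(f"{fn.__name__:15s} {undetected_fraction(fn)}")
```

All three functions give the same fraction, because each has exactly 2^8
valid codewords among 2^12 possible received words.  Any difference between
them only appears once a non-uniform error distribution is assumed, which is
the argument the quoted text makes.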
