[e2e] Re: [Tsvwg] Really End-to-end or CRC vs everything else ?

Fri Jun 8 13:41:13 PDT 2001

In message <5.1.0.14.2.20010608151421.03277d10 at mail.reed.com>,
"David P. Reed" writes:

>Since I'm a bit twiddler at heart, I'm going to "optimize" the CRC32 code 
>and the MD5-MAC code just to see if this is because of poor implementation.

Their CRC code uses a standard 256-entry lookup table; nothing
special.  Their MD5 inner loop is aggressively unrolled, and it's on a
little-endian machine, which helps MD5 significantly; Joe Touch has a
paper from 1995 and RFC 1810, which discuss that in detail.
Dave Feldmeier's polynomial-arithmetic code (ToN, Dec. 1995) is faster
for low-weight generator polynomials.  The Ethernet CRC-32 generator
has enough 1 bits that table lookup is still faster.
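
For readers who haven't seen the table-lookup technique, here is a
minimal sketch of a byte-at-a-time CRC-32 with a 256-entry table.  It
is not the benchmarked code; the names (POLY, make_table, crc32_table)
are made up for illustration, and it assumes the usual reflected form
of the Ethernet polynomial with the standard initial value and final
XOR.

```python
import zlib

# Reflected form of the Ethernet (IEEE 802.3) CRC-32 generator polynomial.
POLY = 0xEDB88320


def make_table():
    """Precompute the CRC of every possible byte value (256 entries)."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            # Shift one bit out; XOR in the polynomial if that bit was 1.
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


CRC_TABLE = make_table()


def crc32_table(data: bytes) -> int:
    """Byte-at-a-time CRC-32: one table lookup, one shift, two XORs per byte."""
    crc = 0xFFFFFFFF  # standard initial value
    for b in data:
        crc = CRC_TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF  # standard final XOR


# Cross-check against the zlib implementation of the same CRC.
assert crc32_table(b"123456789") == zlib.crc32(b"123456789")
```

The inner loop is a dependent table load per input byte, which is
what an optimized implementation has to work around.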

If you want to see bit-pushing, Antoon Bosselaers has hand-tuned
assembly implementations at
http://www.esat.kuleuve.ac.be/~cosicart/ps/AB-9701.ps.gz

>But one might very sensibly choose MD5-MAC for all the reasons I've 
>mentioned earlier about foiling the foul middleboxes.

I think the high-order bit is to look at their numbers for adler32,
and to note that a Fletcher sum using 16-bit inputs (and without the
mod-65521, nasty on architectures with slow or no integer divide) is
even faster than adler32.  It's that computational-cost knee which is
of concern: pick a function any more costly and it'll be put into
outboard hardware, with enough of a benchmark win that people turn
on the outboard checksums, thus destroying the end-to-end property of
the error check.
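
To make the adler32-versus-Fletcher comparison concrete, here is a
rough sketch of both sums.  It reads "without the mod-65521" as
meaning reduction mod 65535 done by folding carries (no divide); that
reading, the little-endian word order, the zero padding of odd-length
input, and the function names are all assumptions for illustration.
Python says nothing about relative speed; the point is only the shape
of the inner loops.

```python
import zlib


def adler32(data: bytes) -> int:
    """Adler-32: byte inputs, both sums reduced modulo the prime 65521."""
    a, b = 1, 0
    for byte in data:
        a = (a + byte) % 65521  # needs a divide (or a deferred-mod trick)
        b = (b + a) % 65521
    return (b << 16) | a


def fold16(x: int) -> int:
    """Add the carry bits back into the low 16 bits (congruent mod 65535)."""
    return (x & 0xFFFF) + (x >> 16)


def fletcher32(data: bytes) -> int:
    """Fletcher sum over 16-bit words, reduced by carry folding only."""
    if len(data) % 2:
        data += b"\x00"  # assumption: pad odd-length input with a zero byte
    s1 = s2 = 0
    for i in range(0, len(data), 2):
        word = data[i] | (data[i + 1] << 8)  # little-endian 16-bit word
        s1 = fold16(s1 + word)
        s2 = fold16(s2 + s1)
    return (s2 << 16) | s1


# Cross-check the Adler-32 sketch against zlib's implementation.
assert adler32(b"Wikipedia") == zlib.adler32(b"Wikipedia")
```

The Fletcher loop consumes two bytes per iteration and needs only
adds, shifts and masks, which is where the speed advantage on
divide-poor architectures would come from.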