[e2e] Re: [Tsvwg] Really End-to-end or CRC vs everything else ?
vjs at calcite.rhyolite.com
Sun Jun 10 07:59:31 PDT 2001
> From: julian_satran at il.ibm.com
> It is also hard for an "outsider" ("out-of-the-Central-Electronic-Complex")
> to say what end-to-end is.
> If end-to-end is "memory-to-memory" then computing the checksum with a CPU
> or with an outboard
> engine differ only in the buses they take the data through and I am not
> sure that you will want to trust more or less the memory-to-CPU bus than
> the memory-to-adapters bus.

I agree with that conclusion or principle but disagree with the reasoning.
Whether the CPU that runs the application computes the checksum affects
the sort of errors that can be detected by the checksum. I'm thinking of
my own experience with bugs in the memory mapping hardware for out-board
network interface hardware, as well as my own bugs in the software for
setting those page tables.** Then there are the bugs I've seen in DMA
machinery. With the outboard checksumming turned on, all of those bugs
appeared as data corruption not detected by network checksums, but with
the outboard checksum machinery off and the checksums done by the
CPUs that run the applications, they appeared as UDP and TCP checksum
errors. Never mind that flipping the offboard checksum switch affects
timing and other things so much that it's not easy to find such bugs or
even positively conclude that they exist based on the change in symptoms.
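
The difference in symptoms can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a model of any real adapter: buggy_dma() is a hypothetical stand-in for a mapping or DMA bug that damages one byte between host memory and the interface, and the checksum is the plain 16-bit one's-complement sum without the TCP/UDP pseudo-header.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style used by TCP and UDP."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def buggy_dma(buf: bytes) -> bytes:
    """Hypothetical bug: one bit is flipped on the way from host memory to the adapter."""
    damaged = bytearray(buf)
    damaged[3] ^= 0x40
    return bytes(damaged)


def send(payload: bytes, offload: bool):
    """Return (data on the wire, checksum on the wire)."""
    if offload:
        # The adapter checksums whatever the DMA delivered, damage included.
        on_adapter = buggy_dma(payload)
        return on_adapter, internet_checksum(on_adapter)
    # The application CPU checksums the data before the DMA touches it.
    checksum = internet_checksum(payload)
    return buggy_dma(payload), checksum


def receiver_accepts(wire_data: bytes, wire_checksum: int) -> bool:
    return internet_checksum(wire_data) == wire_checksum


payload = b"end-to-end argument"
for offload in (True, False):
    data, checksum = send(payload, offload)
    print("offload" if offload else "host CPU",
          "| data damaged:", data != payload,
          "| receiver accepts:", receiver_accepts(data, checksum))
```

With the checksum computed outboard, the damaged data arrives with a checksum that matches it; with the checksum computed by the host CPU, the same damage shows up as a checksum error at the receiver.
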
I'm still in favor of pushing the checksum off board if the price of
the system allows, because of the performance effects and because for
every one of those memory corruption bugs that can be detected by
the application CPU computing the TCP or UDP checksum, there are zillions
of other bugs that cause memory corruption that is invisible to the
checksums. Those bugs range from the ever popular wild pointers in
unrelated kernel code to bugs in cache snoopers, cache machinery, and
bus/switch hardware and software.

**In those particular cases, you could think of each outboard device
as having a set of virtual memory page tables through which it viewed all
of system memory. There are many reasons for such complications, but
perhaps the easiest to see are when you are using 32-bit PCI devices in
systems that can have more than 4 GByte of memory.
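
A rough Python sketch of that idea, assuming 4 KByte pages and a simple per-device table; DeviceMap and its methods are hypothetical names for illustration, not any real driver interface. The device only ever issues 32-bit bus addresses, and the table decides which 64-bit physical page each one reaches, so one wrong entry sends a DMA transfer to the wrong page.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class DeviceMap:
    """Hypothetical per-device table: 32-bit bus page -> 64-bit physical page."""

    def __init__(self):
        self.table = {}  # bus page number -> physical page number

    def map_page(self, bus_page: int, phys_page: int) -> None:
        if (bus_page + 1) * PAGE_SIZE > 2**32:
            raise ValueError("bus page is not reachable with a 32-bit address")
        self.table[bus_page] = phys_page

    def translate(self, bus_addr: int) -> int:
        """Physical address that a DMA transfer to bus_addr actually reaches."""
        page, offset = divmod(bus_addr, PAGE_SIZE)
        return self.table[page] * PAGE_SIZE + offset


# A buffer that lives above 4 GByte, where a 32-bit device cannot point directly.
buffer_phys = 5 * 2**30
good = DeviceMap()
good.map_page(0x10, buffer_phys // PAGE_SIZE)

# The same setup with an off-by-one bug in the software that fills the table.
bad = DeviceMap()
bad.map_page(0x10, buffer_phys // PAGE_SIZE + 1)

bus_addr = 0x10 * PAGE_SIZE + 0x20
print(hex(good.translate(bus_addr)))  # lands inside the intended buffer
print(hex(bad.translate(bus_addr)))   # lands one page away, in unrelated memory
```
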
Vernon Schryver vjs at rhyolite.com