[e2e] UDP checksum field?

Cannara cannara at attglobal.net
Tue Apr 5 10:06:43 PDT 2005


Note that many manufacturers of bridges & routers over the years have had the
intelligence to include error detection & correction in memory.  However, when
the marketing decisions are made about test and default configuration, that
feature is usually turned off so that performance will be better.  Check your
system manuals for those options!

One of my personal experiences with this mistake was at a major Wall St.
investment house, where their Sun jockeys wrote trading programs that the firm
obviously depended on to make $ every second of every day in every market for
every commodity around the world.  They called us at Net Gen because their
programs were changing unpredictably and they thought "it's the network" (the
usual guess).  So, I flew to NYC with a Sniffer(r) and discussed the problem:
"m" was changing to "n", "C" to "D", "6" to "7" every once in a while in their
sources, so compilations would fail despite no changes by the programmers.  I
told them a Sniffer won't be able to see changing source files on the net, so
we sat down to draw exactly where the bodies were buried in their systems.

The short story was, debug the server that holds the sources.  Since they had
huge disc & RAM in the server, and programs were written to disc but often sat
in cache RAM for a while (even days), we decided to test disc, but especially
RAM.  No tests showed anything.  Then one of their network guys (a VP, because
banks only ever let VPs access data :) said he'd heard of a special,
extremely rough pattern test.  He downloaded it, ran it, and sure enough one
small group of bits in one RAM chip was a little flaky.  If EDC RAM had been
used, it would not have been an issue.  Hey, it wasn't the network, but it was
end-end!
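
For anyone wondering why EDC RAM would have made this a non-issue, here is a
minimal Python sketch of the idea, using a textbook Hamming(7,4) code: 4 data
bits are stored with 3 check bits, and if any one stored bit reads back wrong,
the check bits say which one, so it can be flipped back.  The function names
are my own for illustration, and this is only a toy; real EDC/ECC memory does
the same kind of thing in hardware over much wider words (commonly 64 data
bits plus 8 check bits), and I am not claiming any particular vendor does it
exactly this way.

```python
def hamming74_encode(nibble):
    """Encode 4 data bits (0..15) into a 7-bit Hamming codeword (illustrative)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]   # covers codeword positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]   # covers codeword positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]   # covers codeword positions 4, 5, 6, 7
    # 1-based positions 1..7 hold: p1 p2 d0 p4 d1 d2 d3
    bits = [p1, p2, d[0], p4, d[1], d[2], d[3]]
    return sum(b << i for i, b in enumerate(bits))


def hamming74_decode(word):
    """Return (data, syndrome); syndrome is the 1-based bad position, or 0."""
    bits = [(word >> i) & 1 for i in range(7)]
    s1 = bits[0] ^ bits[2] ^ bits[4] ^ bits[6]
    s2 = bits[1] ^ bits[2] ^ bits[5] ^ bits[6]
    s4 = bits[3] ^ bits[4] ^ bits[5] ^ bits[6]
    syndrome = s1 | (s2 << 1) | (s4 << 2)
    if syndrome:
        bits[syndrome - 1] ^= 1   # flip the single bad bit back
    data = bits[2] | (bits[4] << 1) | (bits[5] << 2) | (bits[6] << 3)
    return data, syndrome


if __name__ == "__main__":
    stored = hamming74_encode(0x6)
    for pos in range(7):
        flaky = stored ^ (1 << pos)          # simulate one flaky bit
        data, syndrome = hamming74_decode(flaky)
        print(f"bit {pos} flipped -> data {data:#x}, syndrome {syndrome}")
```

Note the limits: a code like this corrects one wrong bit per word.  Two wrong
bits in the same word would be miscorrected unless an extra overall parity bit
is added to detect them, which is what the usual "SEC-DED" memory codes do.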

Alex


Lloyd Wood wrote:
> 
> On Mon, 4 Apr 2005, Bob Braden wrote:
> 
> >   *> explaining exactly the problem and urging the TNS checksum be implemented.  No
> >   *> response ever came back, and, if you look at a TNS packet today, the checksum
> >   *> is still zero.  I guess no one has used the gateway software who cares about
> >   *> their data.  :]
> >   *>
> >   *> Alex
> >   *>
> >
> > Or, the incidence of (detected) failures is so low that no one cares.
> 
> This is arguably currently the state with RAM. If you write to a
> memory subsystem, you would like some confidence that when you read it
> back the value is correct. This is often assumed.
> 
> You can write a paranoid application to write to memory locations
> multiple times (and those sticking computers in orbit do), read back
> and compare and check all of memory for reliability periodically, but
> having a checksum on each memory location can be a better safeguard,
> though it decreases memory density somewhat.
> 
> There's been much furore of late about 'bad RAM' in Apple Macintoshes;
> many computers have moved to ECC RAM, but Apple (bar its
> commercially-focused XServe) has not. (A decade ago, people were
> grumbling about Apple not using parity RAM.)
> 
> The end-to-end argument remains as valid inside the computer too.
> 
> L.

