[e2e] UDP checksum field?

Mon Apr 4 09:30:47 PDT 2005

Yes, Lloyd is exactly right here. It is often the case that people turn off UDP checksums to "buy" more performance by relying on the CRC of the ethernet packet. It's not a stupid question - it's a very smart question, and a lot of smart people get fooled by this.

For example, the Sun datacenter back in the early 1990's had an NFS cluster project called Sunbox - an array of workstation CPUs that did divide and conquer to build a massive file server. It used an ethernet multiplexer to dynamically split the load. To buy back performance, they turned off the UDP checksum. It worked fine until they had a bad lot of ethernet boards with substandard memories - this wasn't picked up in tests because the test units were resending the occasionally corrupted packets (UDP checksums were usually turned on on the test units, and with TCP a failed checksum would trigger a resend as well). It was also a fairly rare problem, and the test periods were too short to pick up on the nature of this problem easily.

But when UDP checksums were turned off in normal use, the corrupted NFS requests went undetected and corrupted the filesystem (which in this case held database files), forcing rebuilds and manual repairs of database tables.

As they were about to announce and release it, they suddenly discovered this problem - they noticed the corruption, and in order to determine whether it was in the higher levels (stack or above) or the lower levels, they turned on checksums and it worked immediately.

They then examined the packets that failed the checksum to trace back down the lower levels of the stack, through the link layer, and discover where the corruption occurred. With logic analyzers, they were able to observe that the contents going into memory from the NIC on reception were different from the contents going out of the memory and traveling across the bus to the processor.
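
To illustrate that failure mode, here is a minimal Python sketch, assuming a single bit flip in host memory after the NIC has already verified the frame. The payload, the bit position, and the use of zlib.crc32 as a stand-in for the ethernet frame check are all made up for illustration, and the checksum here covers only the payload, whereas a real UDP checksum also covers the UDP header and a pseudo-header.

```python
import zlib


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over 16-bit words (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Hypothetical payload standing in for an NFS write request.
payload = b"UPDATE accounts SET balance = 100 WHERE id = 7"

# The sender computes the end-to-end checksum before transmission.
sent_checksum = internet_checksum(payload)

# Link layer: the frame check is computed for the wire and verified by
# the receiving NIC, which then writes the payload into board memory.
wire_crc = zlib.crc32(payload)
link_check_passed = zlib.crc32(payload) == wire_crc  # frame arrived intact

# Assumed fault: bad memory flips one bit AFTER the NIC's check.
corrupted = bytearray(payload)
corrupted[10] ^= 0x04
in_memory = bytes(corrupted)

# The receiving host verifies the end-to-end checksum on what it actually read.
end_to_end_passed = internet_checksum(in_memory) == sent_checksum

print("link-layer check passed:", link_check_passed)
print("end-to-end check passed:", end_to_end_passed)
```

The link-layer check passes because it only covers the hop across the wire. The end-to-end check fails because it is verified after the data has crossed the faulty memory, which is the only place the damage is visible.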

This is a surprisingly common problem in datacenters - sometimes the problem would be a switch, sometimes a configuration error, sometimes a programming error in the application, and so forth. I most recently experienced this problem with an overheated ethernet switch passing video on an internal network.

I also ran into this at an Internet portal company where I was a manager. We were using NetApps file servers to mirror the daily information - NetApps at the time encouraged staff to turn off checksums to increase performance. The DBAs noticed problems and ended up doing frequent rebuilds, but couldn't figure out why. It took me a lot of time to convince my staff to turn on the checksums because they were told "they don't have to" by NetApps. Most datacenter staff work by cookbook, and this wasn't in the cookbook. When they finally tried it, it worked. This little problem cost us a lot of time and aggravation for very little (if any) performance gain. 

The performance gain from turning off checksums can now be obviated through the use of intelligent NIC technologies like SiliconTCP (http://jolitz.telemuse.net/pubs/pt2001_01/item) and TOE, which calculate the checksum as the packet is being received. But we don't have this in commodity switches yet, so check that switch if you're having problems.

Higher level checksums are worth it every time. Don't leave the server without them. :-)

Lynne Jolitz.

----
We use SpamQuiz.
If your ISP didn't make the grade try http://lynne.telemuse.net

> -----Original Message-----
> From: end2end-interest-bounces at postel.org
> [mailto:end2end-interest-bounces at postel.org]On Behalf Of Lloyd Wood
> Sent: Monday, April 04, 2005 2:48 AM
> To: Faisal Aslam
> Cc: end2end-interest at postel.org
> Subject: Re: [e2e] UDP checksum field?
> 
> 
> On Sun, 3 Apr 2005, Faisal Aslam wrote:
> 
> > Why we have checksum field is in UDP header, as UDP does not provide
> > data retransmission etc? I think it is used only to silently
> > discarding a packet with wrong checksum (thats it?).
> 
> yes - you need an end-to-end check against a corrupted packet. UDP
> could have the checksum turned off, which proved disastrous for a
> number of applications, subtly corrupted filing systems which didn't
> have higher-level end2end checks etc.
> 
> > Is there any  other application of checksum field?
> 
> For other applications
> http://www.faqs.org/rfcs/rfc3828.html
> 
> UDP Lite originally sprang out of the observation that UDP has
> redundant length information, and that this information could be
> combined with the checksum (as in TCP/UDP) to give partial coverage.
> 
> L.
> 
> >
> > Sorry if the question is too naive.
> >
> > Thanks
> > Faisal
> >
> >
> 
> <http://www.ee.surrey.ac.uk/Personal/L.Wood/><L.Wood at eim.surrey.ac.uk>
>