The Check(Sum) Is in the Mail

There are many reasons you might need to verify that data didn’t become corrupted. For example, data that arrives via an external communications channel might require validation. Critical data read from flash memory or some other mass storage device might also need verification. On a big system (that is, a desktop computer), you might use a CRC, but for a small computer this is often overkill.

A popular choice for validating data is a checksum. Sounds simple, right? The sender adds all the data it is sending together and sends the sum along with the data. The receiver computes the same sum and checks to make sure the computed sum matches the received one.

A simple checksum, however, has several important limitations. Usually, the sum is only a limited size, like 8 bits. Even a wider bit width checksum will still overflow when you add enough to it. Using a one’s complement sum puts the carry bit back in so that the checksum is more representative of all the input bits.

So in one’s complement math (using 8-bit hex numbers), 0xFE + 0x05 = 0x04, which is 0xFE+0x05+carry (note that in normal math, 0xFE+0x05=0x103; the 0x100 is the carry bit for 8-bit math).

Imagine you have an 8-bit checksum on a stream of bytes. To keep it simple, consider a 3-byte stream: 0x80, 0x80, 0x02. The simple sum (if you discard the carry bit) is 0x02. Adding the carry back (one’s complement) gives 0x3. A common error might set bad bytes to zero. If the first two bytes are erroneously set to zero, then the simple sum won’t notice. The carry wrapping solves that problem.

However, the one’s complement method still fails in another common case: What if a major failure causes all received bytes to become zero? Using either method will cause the receiver to compute a zero checksum. Since the error caused all the bytes to read zero (including the checksum itself), the checksum passes. One way to solve that problem is to invert the checksum before you send it. Then a legitimate stream of zeroes will have a checksum of 0xFF. A zero checksum, in that case, is an error.

Turns out, this is exactly the checksum algorithm that IP packets use. The C code below is from RFC1071 and computes a 16-bit checksum using both of these techniques. You can see, the extra code required compared to a normal addition is very minimal, although note that the code depends on not overflowing a 32-bit variable. Instead of wrapping the carry around, this code adds the top part of the variable (which is really just the sum of the carries) to the checksum at the end. Checksums are no replacement for CRCs or other error-detecting and correcting codes. But used correctly, they are a great tool for your bag of tricks.

           /* Compute Internet Checksum for "count" bytes
            *         beginning at location "addr".
       register long sum = 0;

        while( count > 1 )  {
           /*  This is the inner loop */
               sum += * (unsigned short) addr++;
               count -= 2;

           /*  Add left-over byte, if any */
       if( count > 0 )
               sum += * (unsigned char *) addr;

           /*  Fold 32-bit sum to 16 bits */
       while (sum>>16)
           sum = (sum & 0xffff) + (sum >> 16);

       checksum = ~sum;