PSA: How to safely check buffer sizes in C

A common task in C is checking whether some data fits in a buffer. Unfortunately, doing it can be quite tricky, as http://lwn.net/Articles/278137/ shows. Going by the standard, the problem is that in C creating a pointer beyond the end of an array is undefined behaviour – which means the compiler can do whatever it wants – including sending your customers’ credit card numbers to Russia (actually that’s quite a probable outcome, because, as the link shows, the compiler may introduce a security bug, which will be exploited and result in the credit card numbers being sent).

 

First, I’ll like to say that C does allow you to create a pointer to exactly-the-end-of-an-array (say, if A is an array of length 5, then you can create a pointer to (the non-existent) A[5], as long as you don’t dereference it).

 

The right thing to do is actually quite simple,

size_t buffer_size = ...;
char * buffer = malloc(buffer_size);
if(!buffer) return NULL; // or something
...
unsigned int overhead = calculate_overhead_from_data();
unsigned int alleged_size = ask_data_what_the_size_is();
if(buffer_size < overhead ||
   buffer_size - overhead < alleged_size)
  fail();

Or, say, when getting a packet:

// assume packet_end is the actual end of the packet
if(packet_end - p < 2*sizeof(unsigned int)) fail();
unsigned int len = read_uint(p);
p += sizeof(unsigned int);

unsigned int another_field = read_uint(p);
p += sizeof(unsigned int);


if(packet_end - p < len) fail();
DO_SOMETHING_WITH_REST_OF_PACKET

The things you must NEVER do are this:

// CODE IS WRONG AND EVIL
// DO NOT USE UNLESS YOU ARE INTENTIONALLY INTRODUCING A BACKDOOR
// assume packet_end is the actual end of the packet
if(packet_end - p < 2*sizeof(unsigned int)) fail();
unsigned int len = read_uint(p);
p += sizeof(unsigned int);

// WRONG!
unsigned int another_field = read_uint(p);
p += sizeof(unsigned int);
if(p + len > packet_end) fail();

// ALSO WRONG!
if(packet_end - p < len + sizeof(unsigned int)) fail()

DO_SOMETHING_WITH_REST_OF_PACKET

The code in red tests for overflow by first creating a pointer overflowing a buffer, which overflows a buffer and begets undefined behaviour, and then tries to check the length, but the check comes too late – the program’s behaviour is already undefined, and the compiler, knowing so, may remove the check.
The code in orange is slightly less wrong – the check is actually well-defined, because unsigned integer addition is defined as addition modulo 256^sizeof(size_t) – however, it checks the wrong thing, because if an attacker puts, say, len=(size_t)-sizeof(unsigned int), then len+sizeof(unsigned int) = -4+4 = 0, and the check will succeed and copy a huge amount of data off the end of the buffer, causing problems. I repeat, the red and orange codes are evil – use the blue or turquoise code unless you are introducing a backdoor.

 

Note that in these examples, read_uint is assumed to be a macro/function that reads sizeof(unsigned int)=4 bytes from a char* as an unsigned integer (this isn’t the same as *(int*)packet, because of alignment and endianness issues) – there does not seem to be any standard here. Assuming data is big-endian (as in most network protocols), it can be defined as:

inline unsigned int read_uint(char * p) {
  unsigned int r = 0;
  for(int i=0;i<4;i++) // casting because char is signed.
    r = ((unsigned int)p[i]) + (r<<8);
  return r;
}