PSA: How to safely check buffer sizes in C

A common task in C is checking whether some data fits in a buffer. Unfortunately, doing it correctly can be quite tricky, as http://lwn.net/Articles/278137/ shows. Going by the standard, the problem is that in C, creating a pointer beyond the end of an array (other than one-past-the-end – see below) is undefined behaviour – which means the compiler can do whatever it wants, including sending your customers’ credit card numbers to Russia (actually that is quite a probable outcome: as the link shows, the compiler may silently introduce a security bug, which will be exploited and result in the credit card numbers being sent).

 

First, I’d like to note that C does allow you to create a pointer to exactly the end of an array (say, if A is an array of length 5, then you can create a pointer to the non-existent A[5], as long as you don’t dereference it).

 

The right thing to do is actually quite simple:

size_t buffer_size = ...;
char * buffer = malloc(buffer_size);
if(!buffer) return NULL; // or other error handling
...
unsigned int overhead = calculate_overhead_from_data();
unsigned int alleged_size = ask_data_what_the_size_is();
// Check in two steps so that the subtraction cannot wrap around:
if(buffer_size < overhead ||
   buffer_size - overhead < alleged_size)
  fail();

Or, say, when getting a packet:

// assume packet_end is the actual end of the packet
if(packet_end - p < 2*sizeof(unsigned int)) fail();
unsigned int len = read_uint(p);
p += sizeof(unsigned int);

unsigned int another_field = read_uint(p);
p += sizeof(unsigned int);


if(packet_end - p < len) fail();
DO_SOMETHING_WITH_REST_OF_PACKET

Here are the things you must NEVER do:

// CODE IS WRONG AND EVIL
// DO NOT USE UNLESS YOU ARE INTENTIONALLY INTRODUCING A BACKDOOR
// assume packet_end is the actual end of the packet
if(packet_end - p < 2*sizeof(unsigned int)) fail();
unsigned int len = read_uint(p);
p += sizeof(unsigned int);

// WRONG!
unsigned int another_field = read_uint(p);
p += sizeof(unsigned int);
if(p + len > packet_end) fail();

// ALSO WRONG!
if(packet_end - p < len + sizeof(unsigned int)) fail();

DO_SOMETHING_WITH_REST_OF_PACKET

The first wrong check tests for overflow by first creating a pointer past the end of the buffer (p + len), which is itself undefined behaviour, and only then comparing it against the end – but the check comes too late: the program’s behaviour is already undefined, and the compiler, knowing so, may remove the check.
The second wrong check is slightly less wrong – the check itself is actually well-defined, because unsigned integer addition is defined as addition modulo 2^(8*sizeof(type)) – however, it checks the wrong thing: if an attacker supplies, say, len = (size_t)-sizeof(unsigned int), then len + sizeof(unsigned int) = -4 + 4 = 0, the check will succeed, and a huge amount of data will be copied off the end of the buffer, causing problems. I repeat: both wrong checks are evil – use the correct code shown earlier unless you are introducing a backdoor.

 

Note that in these examples, read_uint is assumed to be a macro/function that reads sizeof(unsigned int) (here assumed to be 4) bytes from a char* as an unsigned integer (this isn’t the same as *(unsigned int*)p, because of alignment and endianness issues) – the standard does not provide anything ready-made here. Assuming the data is big-endian (as in most network protocols), it can be defined as:

inline unsigned int read_uint(const char * p) {
  unsigned int r = 0;
  for(int i = 0; i < 4; i++)
    // cast through unsigned char - char may be signed, and casting a
    //   negative char directly to unsigned int would sign-extend it
    r = ((unsigned char)p[i]) + (r << 8);
  return r;
}

On Exceptions

Another programming subject that tends to confuse people is exceptions. It appears that every programming language (including *every* machine-code architecture) and every operating system has its own subtly-incompatible version.

On the surface, the idea appears quite simple: sometimes exceptional situations occur, and the program must handle them. However, when you dig deeper, you find several similar but distinct ways of thinking about them. I’ll try to detail the ones I’ve heard about. Note that people occasionally design one implementation but actually implement another!

 


On Pseudorandomness

It has been said many times that the easiest way to break a cryptosystem is to break its random-number generator. This has a firm basis in practice – the PS3[1] and Debian[2] incidents are only two of the most famous serious flaws caused by one. An important cause is that PRNGs (for obvious reasons) aren’t really visible on the wire or in other places. Another important cause is that PRNGs are used to perform several different but related tasks, in a confusing way that isn’t really documented anywhere.

I can’t do anything to help the former problem, but I’ll try to explain the latter.

I will use “PRNG” to refer to both the cryptographically-secure generators and the non-cryptographically-secure ones (rather than calling the cryptographically-secure ones CSPRNGs). This is because “cryptographic security” is only one of the axes that make a PRNG useful (or secure) for a purpose.

Almost always, one prefers programs to be deterministic (if only because that makes debugging much easier). To support this, CPUs are typically designed to behave in a highly deterministic manner (except under shared-memory parallelism, but that’s a problem for another post). Sometimes, however, this property is itself a problem. A few cases where this occurs are:

1) Some algorithm works properly in the average case, but displays bad behaviour in the worst case, which is triggered either by input whose structure resembles the program’s own (as with quicksort on a sorted list) or by input intentionally crafted by an adversary (as in hash-table DoS[3]). Note that in some cases, typically when a program does some kind of Monte-Carlo simulation (i.e. estimates a distribution by sampling, e.g. fuzzing), it can return wrong results rather than just taking a long time to finish.

2) Often, one needs to generate unique identifiers for some purpose and does not mind if they are overly long. Cryptographic systems often require this to avoid replay attacks (another confusing thing that I am going to write a post about) and keymat-reuse issues (sometimes called many-time pads).

3) One needs to generate a secret key that cannot be predicted by an adversary, perhaps even for a long time – decades, in a few cases. (This covers essentially all crypto: short-term authentication cookies for a browser game, long-term keys protecting Top Secret documents, and everything in between.)

4) Stream ciphers – if a completely pseudorandom stream is XORed with a secret, the secret can be recovered from the result only by those who know the stream. This provides a fast, securely-implementable form of encryption.

 

There are probably more uses, but these are the most common.

 

It seems that there are four properties of PRNGs that applications rely on:

 

1) Weak Unpredictability – Adversaries can’t predict the PRNG’s output. However, the set of outputs may still be relatively small so a brute-force attack may be possible. This is quite similar to strong unpredictability detailed below.

2) Strong Unpredictability – Adversaries can’t predict the PRNG’s output or iterate over all possibilities. For practical reasons (e.g. backups), this is often divided into short-term and long-term unpredictability, where long-term secrets shouldn’t be stored in the clear on non-specially-protected devices. An annoying problem with this property is that a backdoor can make a PRNG predictable to those in the know while looking random to everybody else – making checking for it impossible – and that is the classic failure mode of this property.

3) Uniqueness – The results are distinct in different runs. This is distinct from unpredictability – generating a single stream with a PRNG from a secret seed but not saving the current position when the system reboots, for example, can destroy Uniqueness while preserving Unpredictability. The classic failure mode is a guard asking the same challenge and accepting the same response every day – allowing eavesdroppers to simply repeat it.

4) Whiteness – The stream does not have visible internal correlations. This is the property typically checked by “randomness tests” (e.g. Diehard) and often confused with Pseudorandomness. This can be further divided into Classical Whiteness – the absence of obvious patterns, and Cryptographic Whiteness – which requires the absence of all computably-discoverable patterns, which is important in e.g. stream ciphers. The classic failure mode is a subtle internal bias – slowly leaking a secret over multiple ciphertexts.

 

These properties can be combined in various ways – a stream can be Unpredictable and Unique, but not Unpredictably Unique – for example, when a professor generates an exam by randomly selecting a few questions from a relatively small and constant set. Concatenating streams creates a stream that has the union of the properties of its substreams, but not necessarily all of them jointly.

 

Preventing worst cases typically requires a small amount of (Classical) Whiteness, which in adversarial scenarios must also be Weakly Unpredictable (of course, if the randomness leaks via the algorithm’s results, then it must not be reused). Note that CBC nonce problems[4] are basically an instance of this.

Unique identifiers and replay prevention (of course) require Uniqueness and only Uniqueness.

Secret Keys require Unpredictability, and often also require Uniqueness.

Stream Ciphers require all 3 together – Unpredictably Uniquely White data.

 

Clocks and Counters provide Uniqueness, but not Unpredictability.

Hash Functions typically take a stream, combine its properties and add Whiteness – so H(secret||ctr++) has all 3 properties together – the secret gives Unpredictability, the counter gives Uniqueness, and the hash combines them and gives Whiteness. However, hash functions don’t generate Unpredictability or Uniqueness – H(current-time) is quite predictable and H(secret) isn’t unique.

Hardware RNGs often provide Unique Unpredictability, but often not Whiteness.

 

[1] http://events.ccc.de/congress/2010/Fahrplan/attachments/1780_27c3_console_hacking_2010.pdf

[2] http://lwn.net/Articles/282038/

[3] http://events.ccc.de/congress/2011/Fahrplan/attachments/2007_28C3_Effective_DoS_on_web_application_platforms.pdf

[4] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.61.5887&rep=rep1&type=pdf

Using Higher-Order Macros in C to implement Monomorphisation

One of C++’s most-often-cited advantages (over any language) is its ability to monomorphise abstracted calls via templates – to reuse code without the overhead of indirection and virtual method calls.

Among mainstream languages, C++ is the only one that supports this feature as part of the language, although several newer languages, like Haskell and Rust, support it too. This is claimed to account for much of C++’s speed advantage over managed languages, such as Java, in abstracted code.

 

Because of this, I was disappointed when it turned out that I had to write my supposed-to-be-fast alpha-beta implementation for my software project in C in a generic manner, which would seemingly require a virtual call at every node and probably an allocation as well (because different boards have different sizes).

I thought about this problem and I think I found a solution. Before presenting it, I would like to describe a few quite-common C preprocessor tricks.

 

Trick 1 – foreach in C:

This is a quite old trick for implementing foreach in C – the oldest use I found is in BSD’s queue.h.

I think the best way to introduce it is by example – this one iterates over a Lisp-style linked list.

#include <stdio.h>

struct cons {
  void * car;
  struct cons * cdr;
};

// C99 trick - the struct is freed at the end of the
//   containing block.
// NOTE: in C++, it's freed at the end of the containing
//   STATEMENT - which still works here.
#define CONS(car, cdr) (&(struct cons){(car), (cdr)})

#define FOREACH_CONS(cur, name) \
  for(cur = (name); cur; cur = cur->cdr)

// Assumes l is a list of nul-terminated strings and prints
//   them concatenated
void print_all_members(struct cons * l) {
  struct cons * cur;
  FOREACH_CONS(cur, l) {
    printf("%s", (char*) cur->car);
  }
}

int main() {
   print_all_members(CONS("Hello", CONS(", ",
                     CONS("World!", CONS("\n", 0)))));
   return 0;
}

Here, the user provides a variable of the appropriate type as cur, and name should hold the input list and the macro does the rest.

 

Trick 2 – higher order macros:

If you pass the name of a function-like macro to another C preprocessor macro, the latter can invoke it with arguments of its own.

So:

#include <stdio.h>

#define HIGHER_ORDER(u) do { u(1+5,8+1); } while(0)
#define MULT(a,b) printf("%d\n", a*b)

int main() {
  // This expands to do { MULT(1+5,8+1); } while(0)
  // Which expands to do { printf("%d\n", 1+5*8+1); } while(0)
  HIGHER_ORDER(MULT);
  return 0;
}

Prints 42 – not 54 – because the unparenthesized macro arguments expand into 1+5*8+1.

 

Now for the negamax implementation:

Textbook negamax/alpha-beta looks like this (credit: Wikipedia):

function pvs(node, depth, α, β, color)
    if node is a terminal node or depth = 0
        return color × the heuristic value of node
    for each child of node
        score := -pvs(child, depth-1, -β, -α, -color)
        α := max(α, score)
        if α ≥ β
            break (* beta cut-off *)
    return α

 

If your game implementation keeps the board internally, and cloning boards does not require allocations, you can implement FOREACH_CHILD in the following manner:

struct my_game {
  char board[WIDTH][HEIGHT];
};

// assume implemented
int my_eval_game(struct my_game const *);
struct my_game my_make_move(struct my_game const *, struct move);
// Standard iterator methods - assume a sentinel move for
//   which is_my_move_valid returns false
struct move my_first_move(struct my_game const *);
struct move my_next_move(struct my_game const *,
    struct move cur);
bool is_my_move_valid(struct my_game const *, struct move);

#define MY_FOREACH_CHILD(CUR, GAME)                           \
  for(struct move _foreach_child_cur = my_first_move(GAME);   \
      is_my_move_valid(GAME, _foreach_child_cur) &&           \
      (CUR = my_make_move(GAME, _foreach_child_cur), 1);      \
      _foreach_child_cur = my_next_move(GAME,                 \
                                        _foreach_child_cur))

You can implement alpha-beta by basically copying from the textbook (in a separate header), like this:

#define ALPHABETA_IMPL(SELF, NODE, FOREACH_CHILD, EVAL_GAME) \
static int SELF(NODE *node, int depth, int alpha,            \
    int beta, int color) {                                   \
  if(depth == 0) return EVAL_GAME(node) * color;             \
  int score = INT_MIN;                                       \
  NODE cur;                                                  \
                                                             \
  FOREACH_CHILD(cur, node) {                                 \
    score = -SELF(&cur, depth-1, -beta, -alpha, -color);     \
    if(alpha < score) alpha = score;                         \
    if(alpha >= beta) break;                                 \
  }                                                          \
                                                             \
    /* return objective score if no moves were possible */   \
  return score == INT_MIN ? EVAL_GAME(node) * color : alpha; \
}

Note that I use INT_MIN as a sentinel score to avoid needing to check if moves were made. Then it can be used like this:

ALPHABETA_IMPL(my_minmax, struct my_game, MY_FOREACH_CHILD,
  my_eval_game)

Which I think is quite elegant, actually.

 

Next: how I am planning to implement virtual calls.