Fixing concurrency defects in multicore design

There are compelling reasons to move to multicore processor designs, but doing so introduces the risk of concurrency defects in the software. These are easy to introduce – even apparently innocent code can harbor nasty multithreading bugs – and notoriously difficult to diagnose and eliminate when they occur. GrammaTech’s Paul Anderson presents some techniques to mitigate the difficulties of finding and eliminating these bugs.

Having reached the limits of performance gains that can be realized from miniaturization and integration, microprocessor manufacturers are more and more frequently choosing multicore processors as the best approach to increased performance in computationally intensive embedded systems.

However, realizing the potential performance gains with multicore processors has a cost. Software written to be single-threaded for a single-core processor will realize little or no performance benefit when executed on a multicore processor. To take advantage of the additional cores, it must be rewritten or adapted to use multithreading.

The key challenge is to keep the cores as busy as possible while ensuring that they coordinate access to shared resources properly. Unfortunately, writing such code is much harder than writing single-threaded code. Concurrent code is vulnerable to entirely new kinds of programming defects, such as deadlocks and race conditions, and when these occur they can manifest in ways that are difficult to diagnose. Traditional techniques for finding and eliminating such errors may be ineffective.

A principal reason concurrency bugs are so difficult is that there is an enormous number of ways in which the events in the threads can be interleaved when those threads execute. As the number of threads or instructions increases, the number of interleavings increases exponentially. If thread A executes M instructions and thread B executes N instructions, there are (M+N)!/(M!N!) possible interleavings of the two threads. For example, given two trivial threads with ten instructions each, there are 184,756 possible interleavings of those instructions. Even with very small programs it is clear that it is next to impossible to test all combinations. A second reason is that even if a single interleaving that leads to a failure can be identified, it can be very difficult to set up a repeatable test case that uses that particular interleaving, because the scheduling of threads is effectively nondeterministic. Consequently, debugging concurrent programs can be very expensive and time consuming.
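For readers who want to check the arithmetic, a small throwaway C program (everything in it is invented for illustration) computes the interleaving count directly:

#include <stdio.h>

/* Number of ways to interleave two threads that execute m and n
   instructions respectively: (m+n)! / (m! * n!). */
static unsigned long long interleavings(unsigned m, unsigned n)
{
    unsigned long long result = 1;
    for (unsigned i = 1; i <= n; i++) {
        /* After this step result == C(m+i, i); the division is exact. */
        result = result * (m + i) / i;
    }
    return result;
}

int main(void)
{
    printf("%llu\n", interleavings(10, 10));   /* prints 184756 */
    return 0;
}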

Insidious race conditions
Of all the concurrency defects that can occur, race conditions are probably the most pernicious. Race conditions have been responsible for numerous system failures over the years. The famous case of the Therac-25 radiation therapy machine featured a race condition that was responsible for the deaths of several patients [2]. Similarly, the 2003 Northeast blackout was exacerbated by a race condition that resulted in misleading information being communicated to the technicians [3].

There are several different kinds of race conditions. One of the most common and insidious forms – data races – is the class of race conditions involving concurrent access to memory locations. A data race occurs when there are two or more threads of execution that access a shared memory location, at least one thread is changing the data at that location, and there is no explicit mechanism for coordinating access. If a data race occurs it can leave the program in an inconsistent state.

An example of a data race is shown in Figure 1. A manufacturing assembly line has entry and exit sensors, and maintains a running count of the items currently on the line. This count is incremented every time an item enters the line, and decremented every time an item reaches the end of the line and exits. If an item enters the line at the same time that another item exits, the count should be incremented and then decremented (or vice-versa) for a net change of zero. However, normal increment and decrement are not atomic operations: they are composed of a sequence of individual instructions that first load the value from memory, then modify it locally, and finally store it back in memory. If the updating transactions are processed in a multithreaded system without sufficient safeguards, a race condition can arise because the sensors read and write a shared piece of data: the count. The interleaving in Figure 1 results in an incorrect count of 69. There are also interleavings that can result in an incorrect count of 71, as well as a number that correctly result in a count of 70. 

Figure 1: Race condition leads to incorrect count of items on an assembly line.

This example neatly illustrates one of the properties of data races that makes them hard to diagnose: the symptom of corruption may only be observable long after the data race has occurred. In this case the fact that the count is wrong may only be noticed when the variable has the incorrect value of -1 after all items have exited the assembly line.
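A minimal sketch of the same hazard in code, using POSIX threads and invented names, is shown below. The point is that ++ and -- each compile to a separate load, modify and store, so two unsynchronized threads can interleave those steps badly:

#include <pthread.h>
#include <stdio.h>

static int items_on_line = 70;   /* shared count; nothing protects it */

/* Invoked when the entry sensor fires. */
static void *enter_sensor(void *arg)
{
    (void)arg;
    items_on_line++;             /* load, add 1, store: not atomic */
    return NULL;
}

/* Invoked when the exit sensor fires. */
static void *exit_sensor(void *arg)
{
    (void)arg;
    items_on_line--;             /* load, subtract 1, store: not atomic */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, enter_sensor, NULL);
    pthread_create(&t2, NULL, exit_sensor, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    /* The result should be 70, but an unlucky interleaving of the two
       read-modify-write sequences can leave 69 or 71, as in Figure 1. */
    printf("items on line: %d\n", items_on_line);
    return 0;
}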

A widely held belief is that some instances of data races are benign and can be tolerated. However, this is not true except in rare circumstances. The C standard states unambiguously that compilers are allowed to assume that there are no data races, so optimizers can and do make transformations that are valid for single-threaded code but that introduce bugs when apparently benign data races are present. These effects are subtle – even experienced programmers are regularly surprised by them.

For example, consider a data race involving a single variable that is read by one thread and written to by another, and where the thread that reads the value does not care whether it sees the value before or after it has been changed by the other thread. It is in fact possible that the reader gets a value that is neither! One way in which this might happen is if the hardware is not capable of changing the value in one atomic step, such as a 64-bit variable on a processor that only supports 32-bit memory operations. The read and write will each take two instructions, so if the instructions in each thread are badly interleaved then the reader might see 32 bits of the old value and 32 bits of the new value. There are other reasons why this might happen – the compiler is allowed to assume that there are no data races, so it is free to generate code that relies on that assumption, and such code may introduce defects when a data race is present.
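A minimal sketch of code in which such a torn read could occur is shown below; the type, the values, and the 32-bit target are all assumptions made for illustration:

#include <stdint.h>

/* Shared 64-bit value written by one thread and read by another with
   no synchronization.  On a target that only has 32-bit loads and
   stores, each access compiles to two memory operations. */
static uint64_t shared_value = 0x00000000FFFFFFFFULL;

void writer(void)
{
    shared_value = 0xFFFFFFFF00000000ULL;   /* two 32-bit stores */
}

uint64_t reader(void)
{
    /* Two 32-bit loads: if they interleave with the writer's stores,
       the reader can observe half of each value, e.g.
       0xFFFFFFFFFFFFFFFF or 0x0000000000000000 – neither the old
       value nor the new one. */
    return shared_value;
}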

There are many other cases where apparently benign data races can lead to serious defects due to compiler optimizations. Boehm gives an excellent explanation along with several compelling examples [1].

It is tempting for programmers to attempt to avoid data race problems by using the volatile keyword. However, that construct was not designed for that purpose, and it is one of the most commonly misunderstood aspects of the C language. Compiler support for volatile is spotty at best, so it is unwise to rely on it. The only way to be confident that a data race is benign is to inspect the machine code generated by the compiler to confirm that it does what the programmer intended. This is something that should be left to only the most trusted and competent experts.
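To make the limitation concrete: volatile only forces the compiler to perform the memory accesses; it provides neither atomicity nor any ordering guarantee between threads. Marking the assembly-line count volatile (the same invented name as in the sketch above) leaves the race intact:

static volatile int items_on_line = 70;   /* volatile, but still racy */

void on_item_entered(void)
{
    /* volatile forces the compiler to emit the load and the store,
       but the increment is still a separate load, add and store with
       no atomicity, so the lost-update interleaving of Figure 1 can
       still occur. */
    items_on_line++;
}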

The best-known example of how subtle these issues can be is the singleton pattern. This is a commonplace idiom in which the program maintains a reference to a single underlying object, and a Boolean variable records whether it has been initialized. This pattern is also known as “lazy initialization”.

The following code is an example of the pattern.

if (!initialized) {        /* first caller creates the shared object */
    object = create();
    initialized = true;    /* record that creation has happened */
}
… object …

This code is adequate for a single-threaded program, but it is not thread-safe because it has a data race on the variable named initialized. If it is executed by two different threads, there is a risk that both threads will see !initialized at virtually the same time, and both will call create(), violating the singleton property.

To make this thread safe, the natural approach is to protect the entire if statement with a lock. However, acquiring and releasing locks is expensive, so programmers try to avoid the expense by using the “double-checked locking” idiom – a check outside the lock and another inside. The inner check is there to confirm that the first check still holds after the lock has been acquired.

if (!initialized) {            /* first check, made without holding the lock */
    lock();
    if (!initialized) {        /* second check, now under the lock */
        object = create();
        initialized = true;
    }
    unlock();
}
… object …

Superficially this looks as if it will suffice, and indeed it will as long as the statements are guaranteed to execute in that order. However, an optimizing compiler may generate code that swaps the order of object = create() and initialized = true – after all, there is no dependence between the two statements. In that case, if a second thread enters this code after the assignment to initialized but before the assignment to object, that thread will use the value of object before it has been initialized.
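For reference, one way to preserve the required ordering is to make the flag a C11 atomic and use release/acquire semantics. The sketch below assumes a toolchain with <stdatomic.h> and treats the article’s create(), lock() and unlock() as externally provided; the function name get_object is invented:

#include <stdatomic.h>
#include <stdbool.h>

/* Placeholders from the article's example; object's type is assumed. */
extern void *create(void);
extern void lock(void);
extern void unlock(void);

static void *object;
static atomic_bool initialized = false;

void *get_object(void)
{
    /* The acquire load pairs with the release store below: a thread
       that sees initialized == true also sees the completed write to
       object. */
    if (!atomic_load_explicit(&initialized, memory_order_acquire)) {
        lock();
        if (!atomic_load_explicit(&initialized, memory_order_relaxed)) {
            object = create();
            /* release forbids the compiler and the hardware from
               moving this store ahead of the assignment to object. */
            atomic_store_explicit(&initialized, true, memory_order_release);
        }
        unlock();
    }
    return object;
}

Even then, choosing the right memory orderings by hand is subtle and easy to get wrong.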

Consequently, to achieve high levels of assurance, the best practice is to find and remove all data races. Bug finding and remediation is a standard activity, but the unique properties of concurrency bugs mean that the usual techniques can be ineffective. 
