The Law According to Amdahl

I have always tried to be as much of a hardware guy as a software guy. Of course, when it comes to microcontrollers, there is a lot of overlap. But sometimes I am surprised that something common in one discipline is rarely heard in the other, at least by the same name.

One example is Amdahl’s law. It is common to hear designers of CPUs (especially in parallel systems) use the term, but it is relatively rare among my software colleagues. Maybe it is because the eponymous Gene Amdahl was a CPU designer.

There is a lot of debate about what the law really is and what it really means. The Amdahl paper that gave birth to it only describes the idea, not the math. The law itself is simple and reasonably common sense (you can read a technical description of it on Wikipedia, along with a critique of how the phrase is commonly used). However, I’d like to paraphrase and generalize it to this form: the most improvement you can get by optimizing a part can be no greater than that part’s contribution to the whole.

You mostly hear about this in relation to parallel processing. If you have a program that takes two hours to run and one hour of that isn’t subject to parallel execution, then parallelizing the rest can’t get the total runtime below one hour (and, in fact, can’t even reach one hour unless the parallel portion takes no time at all). However, you can apply that same idea more generally.

If you think about it, this simple idea really applies to any sort of optimization. If it costs eight dollars to build a device and the voltage regulator costs 50 cents, the most savings you could expect from optimizing just the voltage regulator is 50 cents, and that would require total removal of the part. Most likely the savings will be less.

It seems to me that the word “law” is a poor choice here, though. It is more like a rule of thumb, because systems sometimes have complex interrelationships, and it isn’t always easy to see clearly how one part affects the whole. For example, suppose you have this C code:

// each scalar is a byte here (field names are illustrative)
struct
{
   unsigned char x;
   unsigned char y;
   unsigned char flag;
} dataset[32];

It might not seem that this code takes very long by itself. Changing it to use parallel arrays instead of a structure may not seem especially productive, and it is arguably less elegant:

unsigned char x[32];
unsigned char y[32];
unsigned char flag[32];

But there might be a big benefit if I can change the associated initialization from this:

int i;
for (i = 0; i < sizeof(dataset)/sizeof(dataset[0]); i++)
{
   dataset[i].x = 0;
   dataset[i].y = 0;
   dataset[i].flag = 0;
}

To this:

memset(x, 0, sizeof(x));
memset(y, 0, sizeof(y));
memset(flag, 0, sizeof(flag));

This could be faster if your runtime library uses native facilities to do fast block fills, and that is quite likely: modern processors have lots of ways to work with blocks of memory better than a byte-at-a-time loop in pure C can manage. However, like anything else, you’d have to measure it to be sure. So even though changing the format of the data seems innocuous in the main loop, it can enable larger gains elsewhere. If you apply Amdahl’s law to the right part (that is, in this case, everything that uses the dataset structure), then it is still correct. But in real cases, it can sometimes be much harder to figure out all the related pieces.

Still, Amdahl’s law is a good rule of thumb, even beyond its original application in parallel computing. My point isn’t that small savings don’t count; they do add up. But Amdahl’s law is a useful heuristic for deciding which parts of your code (or system) deserve the most of your time when optimizing.