(Are you tired of waiting for “make” to find the one C file that you changed? Do you want to take advantage of your multi-core system to compile in parallel but are afraid that the build won’t be reliable? Do you perform a “clean build” before checking in because you don’t trust your build system? You’re not alone. Learn from Mike Shal, who is teaching a class (ESC-200) at the Embedded Systems Conference about why “make” fails its basic promise of a working and scalable build system, why the prevailing wisdom found in “Recursive Make Considered Harmful” is wrong, and what you can do about it.)
This article will focus on one simple topic: how make and related build systems inhibit development in large-scale C projects. In this context, “large-scale” is in the 10,000 to 100,000 file range; enough for an entire embedded system. The primary issue with these build systems is that they do not scale well.
This makes the development cycle slower and slower as the project gets larger and larger, resulting in wasted developer time. Unfortunately this problem cannot be solved merely by throwing more hardware at it. We need to perform a thorough investigation into why these build systems do not scale, and what can be done to fix it.
During this investigation we will look at some of the history of build systems, the performance of current build systems, and even peek under the hood to see why they perform the way they do. The inevitable conclusion of this investigation is that make, and any build system based on its principles, must not be used for development.
Here are two examples to see what problems we’ll be looking at in this discussion. First, open up a project that you’re working on, and do the following:
1. Perform a build, to make sure everything is up-to-date.
2. Without changing anything, time how long it takes to perform another build (this is called a null build).
How long does Step #2 take to complete? This is all wasted time. Nothing has changed, so nothing should be done. If your project is small then the build may be fast now, but how fast will it be when the project doubles or quadruples in size?
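The two steps above can be scripted so that the measurement is easy to repeat as the project grows. This is a minimal sketch, not part of any build tool: the helper names `time_command` and `time_null_build` are invented for illustration, and the build command you pass in (for example `["make"]`) is an assumption about your project.

```python
import subprocess
import time


def time_command(cmd):
    """Run cmd to completion and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def time_null_build(build_cmd):
    """Build once so everything is up to date, then time a second, no-op build."""
    time_command(build_cmd)         # step 1: bring the project up to date
    return time_command(build_cmd)  # step 2: the null build
```

Calling `time_null_build(["make"])` from the top of a project would return the number of seconds the second, do-nothing build took.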
As a second test, try the following if you are able to in your build environment:
1. Intentionally leave out a dependency that you know will be used (for example, you could leave out a dependency on an automatically-generated header file).
2. Perform a build.
Does the build system warn you or produce an error message that a dependency is missing? Does it silently succeed as if nothing is wrong? Will it work if the build is executed serially, but fail randomly if run in parallel?
Both of these problems have one common result: developer time, YOUR time, is wasted. Whether it is time spent waiting for the build system to decide what to do, or time spent debugging a problem in the build description files, you are spending time working around the build system rather than working on your project. You should have that time back.
The term “build system” can mean many things in different contexts, from full configuration and installation to continuous integration. For this discussion, the term “build system” will have a very limited scope.
The build system is the piece of software that a developer executes after editing source files in order to bring the project up to date. Examples of build systems in this context include make and SCons, among many others.
This article will focus only on a small subsection of the build system problem – we are going to look at it from the perspective of a developer going through the Edit-Compile-Test cycle. The build system is responsible for making the Compile step more efficient, so that the developer’s overall productivity can be increased. The total time it takes for the Compile step to execute can be broken down into two parts:
Timetotal = TimeBuildSystem + (TimeCompiler / Nprocessors)
An ideal build system and compiler would take no time to execute, resulting in a natural lower bound of Timetotal = 0. The developer will be more productive as the build system gets closer to this lower bound.
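To get a feel for the formula, it can be evaluated with some made-up numbers. The sketch below is only an illustration: the function name `total_build_time` and the figures (10 seconds of build-system time, 120 seconds of compiler time) are hypothetical, and it assumes the compiler work divides perfectly among the processors.

```python
def total_build_time(build_system_s, compiler_s, n_processors):
    """Timetotal = TimeBuildSystem + TimeCompiler / Nprocessors (idealized)."""
    return build_system_s + compiler_s / n_processors


# Hypothetical numbers: 10 s of build-system time, 120 s of compiler time.
for n in (1, 2, 4, 8):
    print(n, total_build_time(10.0, 120.0, n))
# prints:
# 1 130.0
# 2 70.0
# 4 40.0
# 8 25.0
```

With these numbers the compiler term shrinks as processors are added, but the total can never drop below the 10 seconds spent in the build system itself.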
The TimeCompiler variable is the time it takes for the sub-programs to execute. These sub-programs are spawned by the build system; for a C project, the sub-program is typically the C compiler.
This time also includes the time for the linker, archiver, and any scripts or other programs that the build system may spawn. The individual compiler steps are generally independent of one another, so they can be executed in parallel among N processors, subject to the constraints of the dependency graph (a directed acyclic graph, or DAG).
Minimizing the TimeCompiler variable is a worthwhile goal, but that is the job of compiler writers. It is not the focus here. This paper is only concerned with minimizing the TimeBuildSystem portion of the total build time.
The TimeBuildSystem variable is the time it takes the build system to determine which files need to be re-compiled. For example, in make this would be the time to read in Makefiles and generated dependency files, stat() source and derived files, and walk through the DAG to compare timestamps among dependent files.
The build system will then invoke the compiler multiple times to bring files up to date. Note that the build system itself is I/O bound since it must read dependency files and check file timestamps or signatures, which all involve reads from the disk.
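The kind of work described above can be sketched in a few lines. This is a simplified illustration of a timestamp-based check, not make's actual implementation: the function names and the `graph` dictionary (mapping each derived file to the files it depends on) are assumptions made for the example, and a missing dependency is simply treated as "needs rebuild".

```python
import os


def mtime(path):
    """Return the file's modification time, or None if it does not exist."""
    try:
        return os.stat(path).st_mtime
    except FileNotFoundError:
        return None


def needs_rebuild(target, deps):
    """True if target is missing or older than any of its dependencies."""
    target_time = mtime(target)
    if target_time is None:
        return True
    for dep in deps:
        dep_time = mtime(dep)
        if dep_time is None or dep_time > target_time:
            return True
    return False


def out_of_date(graph):
    """Walk the whole graph (target -> list of deps) and list stale targets."""
    stale = {}

    def visit(node):
        if node not in graph:  # a plain source file: nothing to rebuild
            return False
        if node not in stale:
            deps = graph[node]
            # Visit every dependency first, so each target gets checked.
            dep_stale = [visit(dep) for dep in deps]
            stale[node] = any(dep_stale) or needs_rebuild(node, deps)
        return stale[node]

    for target in graph:
        visit(target)
    return sorted(t for t, is_stale in stale.items() if is_stale)
```

Note that `out_of_date` calls `os.stat` on every file in the graph even when nothing has changed, which is where the time of a null build goes in this model.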
This is why only the TimeCompiler variable is divided by the number of processors. Although one could attempt to solve the Compiler part of the equation by buying more hardware or distributing the build among multiple machines, this will not significantly improve the build system part of the equation. For that we need to look at how the build system processes the DAG and decides what to build.
Perhaps the most common build system in use today is still make. Make worked well enough for small projects that it quickly gained acceptance. For larger projects, where defining the build description in separate modules is desirable, a common practice was to use make recursively.
The “recursive make” pattern is where each Makefile recursively invokes other make processes for each subdirectory in the project. Since each individual make process doesn’t know what the others are doing, this resulted in a number of problems, as documented in Recursive Make Considered Harmful (RMCH).
Unfortunately, RMCH did not go far enough. This led to an unfortunate pattern in recent years, in that many new build systems are developed with the idea that a global DAG view is necessary to function properly.
Further, they assume that a global DAG view is sufficient for ensuring the build is correct. Neither is the case. In fact, in this paper we will see that:
1. Loading the entire DAG is a waste of time, and unnecessary.
2. A build system needs to check the sub-commands’ inputs and outputs to ensure a correct build.
Later we will look at the performance problems associated with a global DAG, and we shall see an alternative to the global view known as the “partial DAG” view. Finally we will see why sub-commands must be checked.
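As a rough picture of what checking a sub-command's inputs and outputs could look like, consider comparing what a command was declared to use against what it actually touched. This is only a sketch under stated assumptions: the function `check_command` is hypothetical, and how the lists of files actually read and written would be collected is not specified here.

```python
def check_command(declared_inputs, declared_outputs, files_read, files_written):
    """Compare what a sub-command was declared to touch with what it touched.

    files_read and files_written are assumed to come from some observation
    mechanism that is outside the scope of this sketch.
    """
    problems = []
    for path in sorted(set(files_read) - set(declared_inputs)):
        problems.append(f"undeclared input: {path}")
    for path in sorted(set(files_written) - set(declared_outputs)):
        problems.append(f"undeclared output: {path}")
    return problems


# Hypothetical compile step that forgot to declare a generated header.
print(check_command(["main.c"], ["main.o"],
                    ["main.c", "version.h"], ["main.o"]))
# prints: ['undeclared input: version.h']
```

A check along these lines would turn the silently missing dependency from the second test earlier in the article into a reported error.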