First, the CPU usage will jump up and down, and so will the disk activity - but you'll rarely see them both high at the same time. That's because the compiler typically operates in three phases on a source file:
- It reads the source file and all the headers. This is disk-intensive but not CPU-intensive.
- Then it does all the usual compilation work: lexical analysis, parsing, code generation, and optimization. This makes heavy use of the CPU and RAM, but doesn't hit the hard disk much.
- Then it writes the object file out to disk. Again, the disk is very busy, and the CPU just waits around.
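You can watch this tradeoff for yourself with a tool like vmstat running in another terminal while the build goes on; a minimal example:

    # Sample system activity once a second while a build runs elsewhere.
    # "us" is user CPU time (the compile phase), "wa" is time spent waiting on I/O,
    # and "bi"/"bo" are blocks read from and written to disk.
    vmstat 1

During the CPU-heavy phase, "us" climbs while "bi" and "bo" stay low; during the read and write phases it's the other way around, with "wa" picking up the slack. Either way, the machine spends a lot of its time with one resource busy and the others idle.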
The answer to this is parallel builds. Common build tools like make and jam offer command line options to compile multiple files in parallel, using separate compiler instances in separate processes. That way, if one compiler process is waiting for the disk, the Linux kernel will give the CPU to another compiler process that's waiting for the CPU. Even on a single-CPU, single-core computer, a parallel build will make better use of the system and speed things up.
Second, if you're running on a multi-CPU or multi-core system and not doing much else, CPU usage won't peg out at the top of the panel even at its peak. That's because builds are typically sequential: they use only one core of one CPU, and any other compute power you have sits idle. If you could put those other CPUs and cores to work, things would go faster. And again, the answer is parallel builds.
Fortunately, the major C/C++ build systems support parallel builds, including GNU make, jam, and SCons. In particular, GNU make and jam both offer the "-j X" option, where X is the number of compile jobs to run at the same time.
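For example (the job count of four here is just a placeholder; picking a good number for your hardware is what the rest of this post is about):

    # GNU make: run up to four compile jobs at once.
    make -j4

    # jam takes the same flag.
    jam -j4

    # So does SCons.
    scons -j4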
The graph above shows what I would generally expect from parallel builds on a particular hardware configuration, reading from left to right:
- With one compile running at a time, sequentially, system resources are poorly utilized, so a build takes a long time.
- As the number of compiles running in parallel increases, the wall time for the build drops, until you hit a minimum. This level of parallelization provides the balanced utilization of CPU, disk, and memory we're looking for. We'll call this number of parallel compiles N.
- As the number of compiles passes N, the compile processes will increasingly contend for system resources and become blocked, so the build time will rise a bit.
- Then as the number of parallel compiles continues to rise, more and more of the compile processes will be blocked at any time, but roughly N of them will still be operating efficiently. So the build time will flatten out, and asymptotically approach some limit.
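Here's a rough sketch of the kind of measurement loop that produces a curve like this, assuming a Linux kernel source tree and GNU time's -f option; the particular job counts are arbitrary:

    # Time a clean kernel build at several levels of parallelism.
    # Adjust the job counts and targets to suit your own tree.
    for j in 1 2 3 4 6 8 12 16
    do
        make clean > /dev/null
        echo "jobs: $j"
        /usr/bin/time -f "%e seconds elapsed" make -j$j > /dev/null
    done

Cleaning between runs keeps the measurements comparable, since each build then starts from the same state.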
A Brief Aside On Significance
In any physical system, there's always some variation in measurements, and the same is true of computer benchmarks. So an important question in this kind of experimentation is: when you see a difference, is it meaningful or just noise?
To answer that, I ran parallelized benchmarks on Valentine (a two-core Sony laptop) and Godzilla (an eight-core Mac Pro). In each case, the Linux kernel was built twenty times with the same settings. Here are the results:
- Valentine, cached build, j=3. Average 335.91 seconds, standard deviation (sigma) 2.15, or 0.64% of the average.
- Valentine, non-cached build, j=3. Average 340.09 seconds, standard deviation 4.22, or 1.24% of the average.
- Godzilla, non-cached build, j=12. Average 67.82 seconds, standard deviation 0.54, or 0.79% of the average.
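For reference, the sigma percentage is just the standard deviation divided by the average. An awk one-liner along these lines computes all three numbers from a file of build times; the filename and the use of the population form of the standard deviation are my assumptions:

    # Average, standard deviation, and sigma as a percentage of the average,
    # from a file of build times in seconds, one per line. This uses the
    # population form of the standard deviation (divide by n).
    awk '{ sum += $1; sumsq += $1 * $1; n++ }
         END {
             avg = sum / n
             sigma = sqrt(sumsq / n - avg * avg)
             printf "avg %.2f  sigma %.2f  (%.2f%% of avg)\n", avg, sigma, 100 * sigma / avg
         }' times.txt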