What’s the Deal with SoC Verification?

The title of this article may sound like the start of a Seinfeld joke from the 90s, but it’s actually a serious question. Many people do not appreciate what makes a system-on-chip (SoC) different from other semiconductor devices. Many companies, especially in electronic design automation (EDA), toss around the term “SoC” without defining it or explaining why it’s such an important concept. What does this name mean, and why is verification for SoCs different from verification for other chips?

The definition of “SoC” is the best place to start. As the name implies, a “system-on-chip” is a complete system in a single package, most likely on a single die, although 3-D integrated circuits built from multiple dice are becoming more common. Essentially, the SoC combines functionality that used to be distributed across multiple chips and perhaps even some discrete devices. It’s hard to think of a system that does not contain some sort of processor, so the practical definition says that the SoC must include at least one processor.

The most common SoC architecture consists of one or more embedded processors, some on-chip memory, additional functional units, and interfaces to standard buses and perhaps off-chip memory as well. Some sort of on-chip bus, bus fabric, or network-on-chip connects all the units together. Since the embedded processors run software, the complete SoC is really the chip plus the code that runs on these processors. Some SoCs have heterogeneous architectures in which multiple distinct processors (CPU, DSP, image processor, etc.) all run code tailored for their individual functions.

The presence of processors is the key to what makes SoC verification different from verification of other chips. Smaller, less complex chips, as well as many of the blocks within the SoC, can be verified effectively using a simulation testbench that provides data to the chip inputs and checks resulting data on the chip outputs. Traditional testbenches may be as simple as a framework that allows the user to provide a series of binary values to the inputs and check output results using a waveform viewer. Of course, such a manual setup can verify little of the intended functionality for a complex design.

A modern testbench-based verification environment automatically generates randomized stimulus for the chip inputs under control of user-specified constraints (rules) and checks the results of each test automatically. This is much more efficient than hand-writing individual tests with a traditional testbench. Several verification methodologies have been created to develop constrained-random testbenches in a uniform fashion and to permit limited reuse of testbench components. The best known of these is the Universal Verification Methodology (UVM) from standards organization Accellera.
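To make the idea concrete, here is a minimal sketch of constrained-random stimulus generation written in C (UVM itself is a SystemVerilog methodology; C is used here because it is the language of the generated tests discussed later in this article). The packet-length constraint is hypothetical, and simple rejection sampling stands in for a real constraint solver:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical constraint: a packet length must be 64..1518 bytes and
 * a multiple of 4 -- the kind of rule a UVM constraint block expresses. */
static int satisfies_constraints(uint32_t len)
{
    return len >= 64 && len <= 1518 && (len % 4) == 0;
}

/* Constrained-random generation by rejection sampling: draw random
 * values until one meets every constraint. Real constraint solvers
 * are far smarter, but the contract is the same: every generated
 * value is random yet legal. */
uint32_t random_packet_len(void)
{
    uint32_t len;
    do {
        len = (uint32_t)(rand() % 2048);
    } while (!satisfies_constraints(len));
    return len;
}
```

Every value the generator emits is guaranteed legal, so checkers downstream never have to filter out nonsense stimulus.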

A constrained-random testbench works well up to a point, but does not scale up to full-SoC verification. It is just too hard to exercise all the functionality to be verified only from the chip’s inputs. Further, although the SoC’s embedded processors are usually capable of running code in simulation, the UVM provides no guidance for coordinating the activity of the processors with that of the testbench. In fact, UVM-based simulations run at the SoC level usually replace the embedded processors with bus functional models (BFMs).

The result of these limitations is that many SoC teams do minimal verification at the full-chip level. They verify that the blocks have been connected correctly and perhaps run a few simple tests to verify that each major block is functioning. They rarely run tests that string blocks together into real scenarios that are run by the SoC in operation. This “stitch and ship” approach carries high risk, since it never generates complex interactions among the blocks that could expose design bugs or demonstrate performance deficiencies. Memory conflicts, bus saturation and other issues that can arise in the SoC when multiple blocks share resources are highly unlikely to be found in block-level verification.

Considering that the SoC’s functionality is largely controlled by its embedded processors, it’s not surprising that a pure testbench does not suffice. Some verification teams recognize this and handwrite tests to run on the embedded processors. These tests are usually not connected to the testbench and not well integrated into the overall verification effort. Further, it’s prohibitively difficult for a human to handwrite multi-threaded tests that exercise significant amounts of SoC functionality concurrently. Of course, that’s what’s needed to wring out the corner-case bugs and performance issues.

To be fully effective, SoC verification must include automation of the tests running on the embedded processors. Software can generate multi-threaded test cases running on multiple embedded processors in simulation. The test cases stimulate and coordinate concurrent activities within the processors and within the testbench, stress-testing the SoC. The test cases must be self-verifying by generating randomized input data, calculating expected results from the inputs, and checking that the chip’s outputs in simulation match the expected results.
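The shape of such a self-verifying test can be sketched in C. The checksum operation and the hardware-access stub below are hypothetical; in a real generated test, the “DUT” call would be a read of a memory-mapped result register in the simulated SoC:

```c
#include <stdint.h>
#include <stdlib.h>

/* Reference model: compute the result the hardware is expected to
 * produce. A trivial byte checksum stands in for a real block. */
static uint32_t expected_checksum(const uint8_t *buf, int n)
{
    uint32_t sum = 0;
    for (int i = 0; i < n; i++)
        sum += buf[i];
    return sum;
}

/* Stand-in for reading the result from the design under test (DUT).
 * A generated test would read a memory-mapped register instead. */
static uint32_t dut_checksum(const uint8_t *buf, int n)
{
    return expected_checksum(buf, n); /* placeholder for hardware access */
}

/* A self-verifying test: randomize inputs, predict the result,
 * run the operation, compare. Returns 0 on pass, 1 on mismatch. */
int run_checksum_test(unsigned seed)
{
    uint8_t data[256];
    srand(seed);
    for (int i = 0; i < 256; i++)
        data[i] = (uint8_t)(rand() & 0xff);

    uint32_t expect = expected_checksum(data, 256);
    uint32_t actual = dut_checksum(data, 256);
    return expect == actual ? 0 : 1;
}
```

Because the expected value is computed from the same randomized inputs the hardware receives, every test checks itself with no waveform inspection required.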

Naturally, the test-case generator must be provided with information on how the SoC is supposed to operate so that the test cases verify proper functionality and check for proper results. The best way to represent intended SoC functionality is with a set of graph-based scenario models. The graph captures the chip’s data flow paths and documents how to configure the blocks to perform all the operations that the SoC is designed to do. Constraints on the graph guide the generator and keep it from producing test cases that don’t reflect intended behavior.
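A graph-based scenario model can be approximated with an ordinary adjacency structure. The four-node graph below is a made-up example of one data-flow family (DMA in, then either a decoder or a DSP, then DMA out); a random walk over the legal edges yields one scenario, and illegal paths simply cannot be generated:

```c
#include <stdlib.h>

#define MAX_NEXT 3

/* Hypothetical scenario graph: each node is an operation on an SoC
 * block; edges encode the legal data-flow paths through the chip. */
typedef struct {
    const char *op;      /* e.g. "dma_in", "decode" */
    int next[MAX_NEXT];  /* indices of legal successors, -1 = unused */
} node_t;

static const node_t graph[] = {
    /* 0 */ { "dma_in",  {  1,  2, -1 } },  /* feeds decoder or DSP */
    /* 1 */ { "decode",  {  3, -1, -1 } },
    /* 2 */ { "dsp",     {  3, -1, -1 } },
    /* 3 */ { "dma_out", { -1, -1, -1 } },  /* terminal node */
};

/* Walk the graph from the entry node, choosing a random legal edge
 * at each step; the visited ops form one generated scenario. */
int generate_scenario(int start, int *path, int max_len)
{
    int n = 0, cur = start;
    while (cur >= 0 && n < max_len) {
        path[n++] = cur;
        int choices[MAX_NEXT], c = 0;
        for (int i = 0; i < MAX_NEXT; i++)
            if (graph[cur].next[i] >= 0)
                choices[c++] = graph[cur].next[i];
        cur = (c > 0) ? choices[rand() % c] : -1;
    }
    return n;  /* number of operations in the scenario */
}
```

Constraints in a production tool are richer than a bare edge list, but the principle holds: the graph, not the test writer, decides which operation sequences are meaningful.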

The figure shows an example of just such a generator, the TrekSoC product from Breker Verification Systems. This software tool automatically generates self-verifying C test cases that run on the SoC’s embedded processors with no operating system or other production software required. These test cases are multi-threaded so that they exercise many parts of the SoC in parallel to stress-test the design before tapeout. A sophisticated scheduler within the generator keeps track of multiple real-world scenarios running concurrently, moving them from thread to thread in order to stress-test the SoC as much as possible.

TrekSoC software generates self-verifying C test cases to run on the SoC’s embedded processors.
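The scheduling idea described above can be illustrated with a host-side sketch: a shared queue of scenario identifiers that worker threads pull from, so scenarios migrate from thread to thread. POSIX threads stand in here for the embedded processors; this is purely illustrative, not a description of TrekSoC’s internals:

```c
#include <pthread.h>

#define NUM_THREADS   2
#define NUM_SCENARIOS 6

/* Shared work queue of scenario ids. Any idle thread grabs the next
 * scenario, so a given scenario may run on any thread -- analogous
 * to the scheduler moving scenarios among processors. */
static int next_scenario = 0;
static int runs[NUM_SCENARIOS];
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        int id = (next_scenario < NUM_SCENARIOS) ? next_scenario++ : -1;
        pthread_mutex_unlock(&lock);
        if (id < 0)
            break;
        runs[id] = 1;  /* stand-in for executing scenario 'id' */
    }
    return NULL;
}

/* Launch the workers, wait for completion, and report how many
 * scenarios actually executed. */
int run_all_scenarios(void)
{
    pthread_t t[NUM_THREADS];
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(t[i], NULL);

    int done = 0;
    for (int i = 0; i < NUM_SCENARIOS; i++)
        done += runs[i];
    return done;
}
```

Running more scenarios than threads keeps every processor busy, which is exactly the concurrency pressure that flushes out shared-resource bugs.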

Since some of the generated C tests will read data from the chip’s inputs or send data to the chip’s outputs, the “TrekBox” component connects to existing BFMs in the testbench and coordinates activity within the processors and the testbench. Each C test signals when it is ready to receive or produce data, and TrekBox handles the actual data transfer. Source data can also be loaded into memory, and results in memory can be checked without disturbing the SoC’s operation. The graph-based scenario models provide all the information needed to generate an unlimited number of multi-threaded test cases that verify the SoC.
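One common way to picture this kind of test-to-testbench handshake is a shared mailbox at an agreed location in SoC memory, polled by both sides. The layout and field names below are purely illustrative, not Breker’s actual interface:

```c
#include <stdint.h>

/* Hypothetical mailbox shared between the generated C test (running
 * on an embedded processor) and the testbench side. Field names and
 * layout are illustrative only. */
typedef struct {
    volatile uint32_t ready; /* test sets 1 when a transfer is wanted */
    volatile uint32_t done;  /* testbench sets 1 when it has finished */
    volatile uint32_t addr;  /* buffer address in SoC memory */
    volatile uint32_t len;   /* transfer length in bytes */
} mailbox_t;

/* Test side: announce a buffer that needs a transfer. */
void post_request(mailbox_t *mb, uint32_t buf, uint32_t len)
{
    mb->addr  = buf;
    mb->len   = len;
    mb->done  = 0;
    mb->ready = 1;   /* handshake: test -> testbench */
}

/* Testbench side: service one pending request, if any.
 * Returns 1 if a request was handled, 0 if the mailbox was idle. */
int service_request(mailbox_t *mb)
{
    if (!mb->ready)
        return 0;
    /* ...a BFM would move mb->len bytes at mb->addr here... */
    mb->done  = 1;   /* handshake: testbench -> test */
    mb->ready = 0;
    return 1;
}
```

The test typically spins on the `done` flag after posting a request, so the processor-side code and the testbench stay synchronized without either side knowing the other’s internals.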

In summary, SoCs are enabling the semiconductor industry to continue to meet its goals for better, smaller and faster chips. They are different from other types of chips, and so verification of SoCs must be different as well. Development teams must recognize that their world has changed in the SoC era or risk producing a chip with serious bugs or non-competitive performance characteristics. Automatic generation of multi-threaded, self-verifying C test cases is a fairly new but well proven approach. Teams embracing “SoC verification” to take advantage of this approach will produce better, smaller chips faster.