Software forensics for embedded systems developers

The authors describe a multithreshold voltage (Multi-Vt) flow that does not require changes to the embedded SoC architecture and lets a designer decide when to use Low-Vt cells, which have better timing but higher leakage power, and when to use High-Vt cells, which have lower leakage but worse timing.

Minimizing leakage power in systems-on-chip (SoCs) has become a major priority for designers because leakage increases drastically in submicron process technologies and becomes a major proportion of total power usage. There are various design techniques to reduce power, such as power gating and dynamic voltage and frequency scaling (DVFS), but these require architectural changes that add to chip complexity, which you want to avoid in SoCs. Multiple voltage threshold (Multi-Vt) flow is the only technique that doesn’t require changes to the SoC architecture; it depends instead on how judiciously the designer uses Low-Vt cells. Low-Vt cells have better timing but higher leakage power; High-Vt cells have lower leakage but worse timing.

To minimize leakage power, Multi-Vt cells are used during the logic synthesis stage of the design (Figure 1 below). Because High-Vt cells are slower, they are used where timing is relaxed, whereas Std-Vt and Low-Vt cells are used on timing-critical paths. The expectation is always to meet timing with optimal area and power. The important point here is that priority is still given to timing, because logic synthesis is done at the worst process, voltage, and temperature (PVT) corner, i.e. WCS-HOT (worst-case timing at maximum temperature), where the delay of the cells is at its maximum. (In Figure 1, RTL refers to register transfer level.)

Figure 1 Traditional Synthesis Flow
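
The selection rule described above (High-Vt where timing is relaxed, faster flavors where it is critical) can be sketched in a few lines of Python. This is a minimal illustration, not the algorithm of any synthesis tool: the `VtFlavor` type, the `pick_flavor` helper, and every delay and leakage number are hypothetical values chosen only to show the trade-off.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VtFlavor:
    name: str
    delay_ps: float     # hypothetical cell delay at the synthesis corner
    leakage_nw: float   # hypothetical leakage power


# Illustrative numbers only; they do not come from any real cell library.
FLAVORS = [
    VtFlavor("HVT", delay_ps=60.0, leakage_nw=1.0),
    VtFlavor("SVT", delay_ps=45.0, leakage_nw=4.0),
    VtFlavor("LVT", delay_ps=35.0, leakage_nw=15.0),
]


def pick_flavor(delay_budget_ps):
    """Return the lowest-leakage flavor whose delay fits the budget.

    If no flavor fits, fall back to the fastest one, because timing
    keeps priority over leakage.
    """
    candidates = [f for f in FLAVORS if f.delay_ps <= delay_budget_ps]
    if candidates:
        return min(candidates, key=lambda f: f.leakage_nw)
    return min(FLAVORS, key=lambda f: f.delay_ps)


if __name__ == "__main__":
    for budget in (100.0, 50.0, 40.0):
        print(budget, pick_flavor(budget).name)
```

A relaxed budget selects the High-Vt flavor, and the choice moves toward Low-Vt only as the budget tightens, which is the behavior expected from a Multi-Vt flow.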

As we move to smaller process nodes, i.e. from 90nm to 65nm to 45nm technology, timing delays have decreased and hence the chip operating voltage is reduced to save power. This results in new effects such as temperature inversion: threshold voltage rises as temperature falls, and at low operating voltages that rise dominates cell delay.

Thus the cells show higher delays at the lower-temperature corner than at the higher-temperature corner. Since the timing corner for setup optimization is the one where the delay of the cells is at its maximum, the worst corner for setup timing optimization in this case should be WCS-COLD (worst-case timing at minimum temperature) instead of WCS-HOT. A design optimized at WCS-HOT would therefore not necessarily be timing clean at WCS-COLD. The PVT conditions for the different corners are listed in Table 1 below.

Table 1 PVT conditions for different corners
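
To see why the worst setup corner can flip from hot to cold, the sketch below uses the textbook alpha-power-law delay approximation, in which delay is proportional to Vdd / (mobility x (Vdd - Vt)^alpha). This model is swapped in here for illustration and is not the authors' method. Every parameter (threshold voltage, its temperature coefficient, the mobility exponent, alpha) is an assumed illustrative value rather than process data, and the corner temperatures of -40 C and 125 C are assumptions as well.

```python
COLD_K = 233.15  # assumed cold corner temperature (-40 C)
HOT_K = 398.15   # assumed hot corner temperature (125 C)


def relative_delay(vdd, temp_k, vt0=0.5, vt_temp_coeff=0.0015,
                   alpha=1.3, t_ref=300.0, mobility_exp=1.5):
    """Relative gate delay from the alpha-power-law approximation.

    All default parameters are illustrative assumptions, not process data.
    """
    # Threshold voltage rises as temperature falls.
    vt = vt0 - vt_temp_coeff * (temp_k - t_ref)
    # Carrier mobility improves as temperature falls.
    mobility = (temp_k / t_ref) ** (-mobility_exp)
    overdrive = vdd - vt
    if overdrive <= 0:
        raise ValueError("supply voltage must exceed the threshold voltage")
    return vdd / (mobility * overdrive ** alpha)


def worst_setup_corner(vdd):
    """Name the temperature corner with the larger modeled delay."""
    cold = relative_delay(vdd, COLD_K)
    hot = relative_delay(vdd, HOT_K)
    return "WCS-COLD" if cold > hot else "WCS-HOT"


if __name__ == "__main__":
    for vdd in (1.2, 0.8):
        print(vdd, worst_setup_corner(vdd))
```

With these assumed parameters the model reports WCS-HOT as the slower corner at a 1.2 V supply and WCS-COLD at 0.8 V. At the lower supply the shrinking gate overdrive outweighs the mobility gain at cold, which is the temperature inversion effect described above.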

We share the results of one block (Cortex A5 Core) in 40nm technology in several case studies. Details of the design are shown in Table 2 below:

Table 2 Cortex A5 Core design

Case 1: The design was synthesized with WCS-HOT libraries (the traditional corner), and the output netlist was timing clean at WCS-HOT. Loading the same netlist with WCS-COLD libraries showed significant timing violations, as shown in Figure 2(a) below.

Figure 2(a) Synthesis done at WCS-HOT corner

Case 2: The design was synthesized with WCS-COLD libraries (because of the temperature inversion effect), and the output netlist was timing clean at WCS-COLD. Loading the same netlist with WCS-HOT libraries showed no timing violations, as shown in Figure 2(b) below.

Figure 2(b) Synthesis done at WCS-COLD corner
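
The cross-corner check used in both cases, loading one netlist against each library corner and looking for setup violations, can be approximated as below. The function names, the corner-to-delay mapping, and the delay values are hypothetical and exist only to illustrate the check; they are not measurements from the Cortex A5 block.

```python
def worst_slack_ps(path_delays_ps, clock_period_ps):
    """Worst setup slack over a set of path delays (negative means violation)."""
    return min(clock_period_ps - d for d in path_delays_ps)


def failing_corners(delays_by_corner, clock_period_ps):
    """Return the sorted names of corners with at least one setup violation."""
    return sorted(
        corner
        for corner, delays in delays_by_corner.items()
        if worst_slack_ps(delays, clock_period_ps) < 0
    )


if __name__ == "__main__":
    # Hypothetical path delays for one netlist timed at two corners.
    delays = {
        "WCS-HOT": [900.0, 950.0],
        "WCS-COLD": [980.0, 1040.0],
    }
    print(failing_corners(delays, clock_period_ps=1000.0))
```

An empty result means the netlist is timing clean at every corner supplied, which corresponds to the Case 2 outcome.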