Designing embedded SoCs using older resistive technologies

When designing an SoC around a generic 32-bit MCU on a 0.18 µm (180 nm) process, with flash and a rich suite of analog and digital IPs, the authors found that the pre-route engines in current EDA tools are tuned for smaller process nodes and estimate poorly at the larger 180 nm geometry. Here are the steps they took to overcome these problems.

With the emergence of newer and faster technologies, we have seen a rapid increase in the number of complex designs that push CMOS transistor geometries to 90 nm and below. But designs based on larger geometries are not disappearing. In fact, process technology nodes with 180 nm and 250 nm geometries are still considered “hot”.

If you are well equipped with the latest 90 nm EDA tools and assume they will be just as efficient at a relatively conservative technology node such as 180 nm, you might be in for a surprise. More so if the so-called “small design” requires seamless backward package-pin compatibility, has high frequency requirements, and targets aggressive gross-margin numbers.

While the latest EDA tools offer features useful at any process node – such as signal integrity, design for manufacturability (DFM), and lithographic enhancements – they also include capabilities that take advantage of technology enhancements available only at the smaller node geometries, such as high-K oxide, copper metals, shrinking metal and poly pitch, and a higher number of metal layers.

At smaller geometries, copper is used for its lower resistivity, so that lower-voltage signals are delivered reliably. EDA tools with features built around copper interconnect may therefore struggle at a more conservative 180 nm node, where aluminum is still used to lay metal routes. They offer little help with the higher resistance of the wires and, in particular, of the vias and contacts in the wire topology.

With EDA tools, a designer has to depend on the accuracy and consistency of timing results from stage to stage (from placement to clock tree synthesis and then to routing). The consistency of the net delays the tools produce depends heavily on the accuracy of the parasitic extraction performed.

Before a design is finally “detail routed”, all tools make approximations to predict net lengths, vias, and contacts, and hence the extracted parasitics (RC). The consistency of your timing results therefore varies with the accuracy of those estimates. At the 180 nm node, however, we have seen tools that have great difficulty predicting timing at the pre-route stage, leading to surprising results in the post-route stages.

Congestion in such designs introduces further inaccuracies in timing. During the trial-routing stage, commercial EDA tools are not very good at estimating the number of metal detours, and hence the metal segments and vias/contacts, that a particular path will take once the design is finally routed. The mismatch in the number of vias/contacts between the pre-route and post-route stages is in fact the cause of the timing miscorrelation.
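To make the via mismatch concrete, here is a minimal sketch of a series-resistance model for a routed path: metal resistance from sheet resistance and wire geometry, plus a fixed resistance per via. Every number in it (sheet resistance, via resistance, wire width, net lengths, via counts) is a hypothetical placeholder chosen for illustration, not data from any real 180 nm process or from the design discussed here, and the names `RouteEstimate` and `path_resistance` are our own.

```python
from dataclasses import dataclass

# All electrical values below are illustrative placeholders, NOT data from
# any real 180 nm process or from the design discussed in this article.
SHEET_RES_OHM_PER_SQ = 0.08   # assumed sheet resistance of an aluminum layer
VIA_RES_OHM = 6.0             # assumed resistance of a single via/contact
WIRE_WIDTH_UM = 0.28          # assumed wire width


@dataclass
class RouteEstimate:
    length_um: float   # total metal length of the net
    via_count: int     # number of vias/contacts along the path


def path_resistance(route: RouteEstimate) -> float:
    """Series resistance: metal squares times sheet resistance, plus vias."""
    squares = route.length_um / WIRE_WIDTH_UM
    return squares * SHEET_RES_OHM_PER_SQ + route.via_count * VIA_RES_OHM


# Hypothetical net: the trial route assumes a near-straight connection,
# while the detail route detours around congestion and changes layers often.
trial = RouteEstimate(length_um=2000.0, via_count=4)
detail = RouteEstimate(length_um=2500.0, via_count=150)

for name, route in (("trial", trial), ("detail", detail)):
    print(f"{name:>6}: {path_resistance(route):8.1f} ohm")
```

With these placeholder values, a modest increase in metal length barely moves the total, while the extra vias dominate it. That is the shape of the miscorrelation described above, although the real magnitudes depend entirely on the process.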

In this article we will outline some of the things we have done to either work around the limitations of existing EDA tools or use alternative design methodologies that do not depend on them.

Starting Point: A Standard 32-bit MCU
The design we worked on was a generic 32-bit MCU on a 0.18 µm (180 nm) process with flash and a rich suite of analog and digital IPs. Flash is often a bottleneck for uniform standard-cell placement because of its large size: nets have to hop around it, which causes congestion at its corners and notches.

The MCU constituted a major portion of the total chip area. When added to the area required for analog components and memories, this left only about 55% of the chip area available for normal placement and routing. The CMOS technology used was based on a five-metal process with all aluminum layers.

Resolving the Routing Resource Crunch
As is the normal practice in the SoC design flow, the top layers in our design were used for power routing. But since the 180 nm technology node is more resistive to current flow and results in greater heat dissipation, the EDA tool’s power planner had to be programmed to make the grid denser, increasing the number of metal layers used in a given area to compensate. With more of the available chip area taken up by the denser power grid, less was left for signal routing.

This left only the M3 layer (third metal layer) as a horizontal routing resource for signals. Because of the gross-margin target mentioned earlier, the design had a very high utilization goal; that is, we had to find ways to use the remaining area more efficiently. There were also limitations on the placement of flash and other hard blocks due to backward pin compatibility requirements; that is, the need to access particular pin-outs limited where we could place and route our resources because of conflicts with those blocks.

As would be the case with any highly resistive technology, many of the custom routes we came up with had to use more than three layers to meet stringent resistance requirements.

In addition to the many routing challenges, the combined effect of all these unfavorable factors led to more and more timing miscorrelation between the pre-route and post-route stages.

Pre- and Post-Route Timing Surprises
We first noticed the congestion issue at the post-CTS trial-route stage, after setup fixing, where we observed the following numbers:

Overflow: 27078 = 16483 (2.11% H) + 10595 (1.58% V)

However, the timing scenario was quite manageable, with a total negative slack (TNS) of ~5 ns and a worst negative slack (WNS) of ~450 ps.
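The overflow line reported above can be pulled apart and sanity-checked with a few lines of script. This is only a sketch: the regular expression is written against the single line quoted in this article, and we assume that H and V denote the horizontal and vertical routing directions. Other tools or versions may format the line differently.

```python
import re

# The line below is copied from the trial-route report quoted above.
REPORT_LINE = "Overflow: 27078 = 16483 (2.11% H) + 10595 (1.58% V)"

PATTERN = re.compile(
    r"Overflow:\s*(\d+)\s*=\s*(\d+)\s*\(([\d.]+)% H\)"
    r"\s*\+\s*(\d+)\s*\(([\d.]+)% V\)"
)


def parse_overflow(line: str) -> dict:
    """Split an overflow line into total, horizontal, and vertical parts."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognized overflow line: {line!r}")
    total, h_count, h_pct, v_count, v_pct = match.groups()
    return {
        "total": int(total),
        "h_count": int(h_count),   # assumed: horizontal overflow count
        "h_pct": float(h_pct),
        "v_count": int(v_count),   # assumed: vertical overflow count
        "v_pct": float(v_pct),
    }


overflow = parse_overflow(REPORT_LINE)
# Sanity check: the two directional counts should add up to the total.
assert overflow["h_count"] + overflow["v_count"] == overflow["total"]
print(overflow)
```

Tracking these fields from run to run makes it easier to see whether a floorplan or power-grid change is relieving congestion in the direction that matters.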

We were able to safely ignore the above congestion number because of our experience with earlier designs done at 90 nm and smaller. In smaller, less resistive copper-based geometries, timing problems can be avoided because the router can be programmed to hop the metal intelligently by going through vias to either upper or lower layers. As a result, routing and timing are only marginally degraded. We knew we would have to avoid such situations at 180 nm, since the slightest of hops would result in a lot of vias. Because the vias are laid down in less conductive aluminum, they are highly resistive and can prove catastrophic to timing.

But when we finally moved to detail-route the design, the timing scenario changed drastically. Although timing had looked manageable at the post-CTS stage, the post-route timing showed a TNS of 4000 ns, while the WNS jumped to ~5 ns. That is not what we usually see with designs at 90 nm and smaller geometries.
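For readers less familiar with the two metrics, the following sketch shows one common way they are derived from per-endpoint slacks: WNS is the most negative slack, and TNS is the sum of all negative slacks. The slack values are hypothetical and not taken from our design, and the helper names are our own. Note that the figures quoted in this article are magnitudes, whereas the sketch keeps the negative sign.

```python
def worst_negative_slack(slacks_ns):
    """Most negative endpoint slack; 0.0 if every endpoint meets timing."""
    return min(min(slacks_ns, default=0.0), 0.0)


def total_negative_slack(slacks_ns):
    """Sum of all negative endpoint slacks (passing endpoints are ignored)."""
    return sum(s for s in slacks_ns if s < 0.0)


# Hypothetical endpoint slacks in nanoseconds, not taken from the design.
example_slacks = [0.12, -0.45, -0.10, 0.30, -0.05]

print("WNS:", worst_negative_slack(example_slacks), "ns")
print("TNS:", round(total_negative_slack(example_slacks), 3), "ns")
```

Because TNS accumulates over every failing endpoint, it can grow by orders of magnitude when a systematic effect such as added via resistance pushes many paths into violation at once, even if WNS grows far less.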

What Went Wrong
Clearly something had gone drastically wrong in the routing. Timing this bad is usually caused by long detours in the routed nets, so our next step was to run routing with the avoid_detour option for all nets in the chip layout. But that helped only marginally. On further analysis of our design, we realized that the trialRoute engine in our EDA tool suite was not very sophisticated in assigning the available routing resources to the nets. Rather than account for potential routing congestion, it had simply estimated a straight connection, leaving the tool’s detail-router engine to deal with the congestion. To complete the routing, the detail router jogged and hopped between layers, introducing many more vias, and with them higher resistance. This led to the miscorrelation in timing. Figure 1 shows a trial-routed net:

Figure 1: A trial-routed net

When detail-routed, the same net looked like this (Figure 2):

Figure 2: The same net detail-routed

As is evident from the picture, the slew values on the net degraded drastically, resulting in higher cell delays. Recording the RC values from the trial-route and detail-route engines gave the following results:

Trial Route Estimation
Number of capacitance : 32
Net capacitance : 0.666722 pF
Number of resistance : 35
Total resistance : 1128.740422 Ohm

Actual Routing Results
Number of capacitance : 929
Net capacitance : 0.817383 pF
Number of resistance : 928
Total resistance : 8167.449317 Ohm

Note in particular the capacitance numbers above, which deteriorated by only about 23%, in line with a net increase in metal length of ~25%. While that was within reasonable expectations, what surprised us was that the resistance increased by a factor of more than seven (about 7.2x). This is a totally different outcome from what we would have expected under the design rules of the smaller 90 nm node, since the copper interconnects there are much less resistive and no amount of wire hopping by the detail-routing engine would lead to results as bad as this.
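The ratios can be recomputed directly from the two RC reports. The short sketch below does that, and also forms the lumped product of total resistance and total capacitance for each case. That product is a simplification we add purely as a crude first-order indicator. It is not how the tool computes delay, and a distributed RC line behaves better than the lumped figure suggests.

```python
# Values copied from the two RC reports quoted above.
trial = {"cap_pf": 0.666722, "res_ohm": 1128.740422}
detail = {"cap_pf": 0.817383, "res_ohm": 8167.449317}

cap_increase_pct = (detail["cap_pf"] / trial["cap_pf"] - 1.0) * 100.0
res_ratio = detail["res_ohm"] / trial["res_ohm"]


def rc_product_ps(net):
    """Lumped R*C as a crude indicator only; ohm * pF gives picoseconds."""
    return net["res_ohm"] * net["cap_pf"]


print(f"capacitance increase: {cap_increase_pct:.1f}%")
print(f"resistance ratio:     {res_ratio:.2f}x")
print(f"lumped RC, trial:     {rc_product_ps(trial):.0f} ps")
print(f"lumped RC, detail:    {rc_product_ps(detail):.0f} ps")
```

Since capacitance barely moved, almost all of the growth in the RC product comes from the resistance term, which points back at the added vias and contacts rather than at the extra wire length.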