You don't hear much about RTL synthesis or design for test (DFT) these days, probably because many people think these are long-solved problems. But according to speakers at the Cadence Front-End Design Summit Dec. 5, 2013, both RTL synthesis and DFT are rapidly evolving to keep up with the demands of advanced IC process nodes and "Giga-gate" chips.
Ankush Sood, R&D director at Cadence, gave a presentation titled "Addressing Physical Challenges Early in RTL Synthesis." He talked about the need for "physically aware" synthesis, the technologies and optimizations it requires, and synthesis capabilities needed for advanced nodes, such as layer-assignment optimization.
Mike Vachon, engineering group director for Encounter Test at Cadence, offered a presentation titled "Scalable Test Methodology for Large SoC Designs." After a brief review of Cadence Encounter Test products, he discussed test approaches for Giga-gate designs including distributed processing for automatic test pattern generation (ATPG), test point insertion, and hierarchical test using IEEE 1500 based wrappers.
The one-day FED Summit also included speakers from Cisco, Qualcomm, Texas Instruments, and Omnivision, in addition to Cadence R&D. Presentations will be available at a later time.
Let's get physical
Sood's talk was aimed at showing why physically aware synthesis is needed, and why it has to occur earlier in the design flow than it has so far. "We have been talking about physical synthesis for 20 years, but a lot of it has been about physical optimization, not really physical synthesis," he said. "We want to make the entire flow physically aware from the first time you try to optimize anything to the point of handoff."
So why the need for early physically aware synthesis? One reason is that wires dominate delays at 45nm and below, and wire-load models become increasingly inaccurate. Another important reason comes down to cost. "Every extra dollar in the front end can save ten dollars later on," Sood said. "You can spend a lot less time trying to optimize later."
The Cadence RTL Compiler 13.1 supports GHz performance with power savings, congestion optimization, and tight correlation to the Encounter Digital Implementation System. The release includes physically aware structuring, physically aware mapping, and physically aware DFT. The release also supports a hierarchical flow using interface logic models (ILMs).
One key technology is cluster placement. Sood showed a comparison of early cluster-based placement to the detailed placement in Encounter, and the correlation appeared to be very close. Cluster-based placement, of course, runs much faster.
Sood discussed case studies in which:
- Physically aware structuring on a congested SoC improved total negative slack, area, wire length, and power
- Physically aware mapping improved worst negative slack, total negative slack, and congestion
- Physically aware synthesis on a CPU core merged registers into multi-bit cells in order to lower power
In his discussion of advanced nodes, Sood noted that different metal stacks have different resistance and capacitance characteristics. Designers need to predict where the interconnect will go. In one comparison he showed, M1 resistance/length (ohms/micron) was 20.56 while M9 resistance/length was 0.05. Conclusion: Buffering and gate optimization are not enough; you need to optimize wire topology and layer assignment as well.
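To put those two figures in perspective, here is a minimal Python sketch that scales them to a full wire. The 100-micron route length is a hypothetical value chosen only for illustration, and the function name is ours; only the two ohms/micron numbers come from the talk.

```python
# Per-unit-length resistance figures quoted in the talk (ohms per micron).
R_PER_UM = {"M1": 20.56, "M9": 0.05}


def wire_resistance(layer: str, length_um: float) -> float:
    """Total resistance in ohms of a wire of the given length on one layer."""
    return R_PER_UM[layer] * length_um


# Hypothetical route length, not a number from the presentation.
length_um = 100.0

r_m1 = wire_resistance("M1", length_um)
r_m9 = wire_resistance("M9", length_um)
ratio = r_m1 / r_m9

print(f"M1: {r_m1:.1f} ohms, M9: {r_m9:.1f} ohms, ratio: {ratio:.1f}x")
```

The same route is roughly 411 times more resistive on M1 than on M9, which is why a synthesis tool that ignores layer assignment can badly misjudge wire delay.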
Other requirements that Sood cited for advanced node synthesis include leakage optimization, improved slew degradation estimation, and advanced on-chip variation (AOCV) support.
DFT for Giga-gate designs
Mike Vachon focused on test generation for very large designs. He noted that Cadence is now seeing designs as large as 300 million instances, and designs of that size drive some new requirements. DFT is not only about coverage and data volume, but also tool capacity and run time: "challenges we haven't faced in the past when designs were much smaller."
Many design teams run ATPG flat, but for large designs, run times are becoming increasingly unmanageable, stretching into days or weeks. One approach is to wrap and isolate cores using the IEEE 1500 Embedded Core Test standard. In this way, engineers can generate patterns for one core at a time, and do some interconnect testing at the top level.
However, ATPG turnaround times will still be a challenge. Cadence offers a distributed (parallel processing) capability for ATPG that can make a big difference. Vachon showed how four CPUs can achieve a 3.7X speedup compared to one CPU, and how 16 CPUs can achieve a 14.3X speedup. A 32-way CPU mode, combined with pattern compaction, was able to achieve a nearly 50% pattern reduction.
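One way to read those speedup numbers is as parallel efficiency (speedup divided by CPU count). The sketch below also estimates the serial fraction that Amdahl's law would imply; applying Amdahl's law here is our assumption for illustration, not an analysis Vachon presented, and the function names are ours.

```python
# Speedups quoted in the talk: CPU count -> speedup relative to one CPU.
measured = {4: 3.7, 16: 14.3}


def efficiency(cpus: int, speedup: float) -> float:
    """Fraction of ideal linear scaling actually achieved."""
    return speedup / cpus


def implied_serial_fraction(cpus: int, speedup: float) -> float:
    """Serial fraction s solving Amdahl's law: speedup = 1 / (s + (1 - s) / cpus)."""
    return (cpus / speedup - 1.0) / (cpus - 1.0)


for cpus, speedup in measured.items():
    eff = efficiency(cpus, speedup)
    serial = implied_serial_fraction(cpus, speedup)
    print(f"{cpus:2d} CPUs: efficiency {eff:.1%}, implied serial fraction {serial:.2%}")
```

Both data points stay at or above roughly 89% efficiency, which suggests that only a small share of the ATPG run is not being parallelized, at least under this simple model.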
Test point insertion can reduce pattern count and improve coverage—but it also adds area. A capability called Random Resistant Fault Analysis (RRFA), targeted at logic built-in self test (LBIST), looks at portions of a design that would be difficult to test with random patterns. It then suggests locations for inserting test points.
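To see why some logic resists random patterns, consider the textbook case of a wide AND gate: its output is 1 only when every input is 1, so a random pattern exercises it with probability 2^-n. The sketch below is a generic illustration of that effect under the assumption of independent, uniformly random inputs. It is not Cadence's RRFA algorithm, and the function names are hypothetical.

```python
import math


def and_output_one_probability(num_inputs: int) -> float:
    """Probability that an n-input AND gate outputs 1 under uniform random inputs."""
    return 0.5 ** num_inputs


def patterns_for_confidence(detect_prob: float, confidence: float) -> int:
    """Random patterns needed so the fault is detected at least once with the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - detect_prob))


# Compare a narrow and a wide AND gate at 99% detection confidence.
for n in (8, 16):
    p = and_output_one_probability(n)
    needed = patterns_for_confidence(p, 0.99)
    print(f"{n}-input AND: p = {p:.2e}, patterns for 99% confidence = {needed}")
```

The required pattern count grows exponentially with gate width, which is the kind of hot spot where a test point can help: it cuts the hard-to-reach cone into pieces that random patterns can control or observe far more easily.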
Finally, Vachon talked about the importance of hierarchical test generation. If you have two identical cores, you can generate the test patterns once and apply them to both instances. To achieve this result, Cadence uses IEEE 1500 core-based testing with hierarchical test compression. Vachon showed how the hierarchical flow works within the RTL Compiler cockpit and the Encounter Test ATPG product.
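The reuse idea can be sketched in a few lines of Python. Everything here is hypothetical: the design dictionary, the module names, and the stand-in `fake_atpg` function are illustrative only and do not represent the Encounter Test flow or any real tool API.

```python
from typing import Callable, Dict, List


def generate_patterns_hierarchically(
    instances: Dict[str, str],
    atpg: Callable[[str], List[str]],
) -> Dict[str, List[str]]:
    """Run ATPG once per unique core module and share the result across its instances."""
    per_module: Dict[str, List[str]] = {}
    per_instance: Dict[str, List[str]] = {}
    for instance, module in instances.items():
        if module not in per_module:
            per_module[module] = atpg(module)  # expensive step, done once per module
        per_instance[instance] = per_module[module]
    return per_instance


atpg_calls: List[str] = []


def fake_atpg(module: str) -> List[str]:
    """Stand-in for a real pattern generator; records each invocation."""
    atpg_calls.append(module)
    return [f"{module}_pattern_{i}" for i in range(3)]


# Hypothetical design: two identical CPU cores and one DSP core.
design = {"cpu0": "cpu_core", "cpu1": "cpu_core", "dsp0": "dsp_core"}
patterns = generate_patterns_hierarchically(design, fake_atpg)

print(f"ATPG runs: {len(atpg_calls)} for {len(design)} instances")
```

In this toy design, three instances need only two pattern-generation runs, because both CPU instances share one pattern set.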
Hierarchy is a choice. "For those who want to run very large designs flat, we're ensuring that our run-time capacity is continuing to scale," Vachon concluded. "But we are also investing heavily in hierarchical test to allow customers to decompose test problems with very large designs, and to scale back the complexity and run times."