The 32nm and 28nm process nodes, the most advanced nodes currently in production, pose formidable challenges in complexity, power management, variability, and manufacturability. A recent ARM TechCon paper authored by Cadence and Samsung described a methodology that can resolve those challenges. And it's not just theoretical - the paper also showed how the methodology was applied to a groundbreaking HD digital camera system-on-chip (SoC) developed by semiconductor startup Ambarella.
The paper is titled "Creating an Effective 32/28nm ARM SoC Design Methodology," and is authored by Moo Young Park, director at Samsung; Wei Lii Tan, senior product marketing manager at Cadence; and Ankur Gupta, director of product engineering at Cadence. Gupta presented the paper at ARM TechCon on Oct. 25. (Note: Proceedings are available to conference attendees at the ARM TechCon web site.)
First, a few words about the Ambarella A7L SoC, a chip that promises to usher in a new generation of HD-video-enabled digital cameras. Ambarella announced availability of the chip in September 2011, and Cadence and Samsung followed up on Oct. 25 with an announcement of their collaboration on the chip. The A7L was designed using Samsung's 32nm low-power, high-k metal gate (HKMG) technology and ARM 32nm libraries along with the Cadence Encounter Digital Implementation System and Encounter RTL Compiler. The chip contains an ARM 1136, several million logic gates, and a number of high-speed mixed-signal blocks.
Working side by side, engineers from Cadence, Samsung and Ambarella were able to achieve 95% power savings during power shutoff mode and 60% average power savings over operation and sleep modes. The end result: an SoC that supports full 1080p HD H.264 video at 60 frames per second for fluid motion even during fast-moving sports scenes, and can capture up to thirty 16-megapixel still images per second. For more details on the chip, see Steve Leibson's recent EDA360 Insider posts.
A 32/28nm Methodology
Gupta began his presentation by noting 32/28nm challenges in several key areas. One is the complexity brought about by larger gate counts. Another is leakage power, although this can be reduced with HKMG technology. Timing variability becomes a big concern because of high interconnect and via resistance and variation, as well as the impact of stress on timing. To model variability, it is necessary not only to model individual transistor delays but also to consider the placement of transistors near one another. Finally, design for manufacturability (DFM) becomes more challenging, and designers must cope with over 200 new routing rules.
The slide below shows a 32/28nm design methodology that can resolve these challenges. Many of the capabilities shown here were used to design the Ambarella chip. Some key aspects of this flow include physically aware synthesis, clock concurrent optimization, design rule checking (DRC)- and DFM-aware routing, fast DFM analysis, and an ability to prune the number of corners that need to be analyzed during the design.
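The corner-pruning idea can be illustrated with a small Python sketch. This is not the method described in the paper; it is one simple pruning rule, assumed here for illustration: keep an analysis view only if it produces the worst setup slack on at least one sampled path, and defer the rest to signoff. The view names and slack numbers are hypothetical.

```python
from typing import Dict, List


def prune_views(slack_by_view: Dict[str, List[float]], margin: float = 0.0) -> List[str]:
    """Return the views that are worst (within `margin`) on at least one path.

    slack_by_view maps a view name to its slack on each sampled path;
    every list must cover the same paths in the same order.
    """
    views = list(slack_by_view)
    n_paths = len(next(iter(slack_by_view.values())))
    keep = set()
    for p in range(n_paths):
        worst = min(slack_by_view[v][p] for v in views)
        for v in views:
            # A view is kept if it is the limiting view for this path.
            if slack_by_view[v][p] <= worst + margin:
                keep.add(v)
    # Preserve the original ordering of the views.
    return [v for v in views if v in keep]


# Hypothetical slacks (ns) for three sampled paths under four views.
example_slacks = {
    "func_slow_cmax": [-0.10, 0.05, 0.20],
    "func_slow_cmin": [-0.05, 0.08, 0.25],
    "func_fast_cmin": [0.30, 0.02, 0.40],
    "scan_slow_cmax": [0.10, 0.20, 0.15],
}

if __name__ == "__main__":
    print(prune_views(example_slacks))
```

In this made-up data set, one view is never the limiting one on any path, so it would be dropped from implementation runs and checked only at signoff. A nonzero `margin` makes the pruning more conservative.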
Cadence, Samsung, and Ambarella actually completed two tapeouts - the T32 test chip came first, followed by the A7L Media Processor. Gupta noted that the A7L represented several "firsts," including the first 32nm external customer design project done by Samsung and the first dual-row I/O architecture applied to a 32nm process. To reduce chip size, Samsung developed a smaller-pitch I/O power cell. To reduce leakage, a dual-Vt layout flow was used.
Synthesis, Power, Clocking, and More
Gupta then took a more detailed look at the methodology that was used to build the Ambarella chip. Key points included the following:
- Physical synthesis is very important at 32/28nm. Wire delay estimates need to account for congestion and routing layers, and wire-load models cannot capture either.
- Power shutoff is a powerful technique for minimizing leakage but is complicated to implement. It requires additional logic such as isolation cells and always-on buffers. It's important to capture power intent. The Ambarella project used the Common Power Format (CPF) to do so.
- Clock tree synthesis should not be rigidly based on "zero skew." The T32 and A7L projects used a more flexible concept called "useful skew." Better yet is clock concurrent optimization, which was not available in time for these projects but is offered by Cadence today. This approach combines clock tree synthesis with physical optimization and provides significant power, performance and area benefits. A new Chip Design Magazine article has detailed information.
- Corners need to be "pruned" during design to avoid excessive analysis times. During implementation, the T32 test chip used four multi-mode/multi-corner "views" while the A7L SoC used six, compared with around 20 views at signoff. Timing violations at final signoff were fixed using an ECO script.
- Using higher routing layers at 32/28nm can help with timing closure. Higher layers have lower resistance and therefore smaller RC delays. In the T32 and A7L, higher routing layers were utilized to close timing for critical paths.
- Lithography analysis is needed at 32/28nm because even DRC-clean locations may not print correctly. Cadence worked with Samsung to identify lithography hot spots.
- Finally, advanced node support must extend to the custom/analog environment. "For advanced nodes, custom and library design is where the challenges start," Gupta said.
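To make the "useful skew" idea from the clocking point above concrete, here is a minimal Python sketch with made-up numbers. It assumes a hypothetical three-register pipeline (FF1 -> stage A -> FF2 -> stage B -> FF3) and the standard setup-slack relation; the delays, the period, and the function names are illustrative assumptions, not figures from the T32 or A7L projects. Hold checks are ignored for brevity.

```python
# All times in picoseconds; every value below is hypothetical.
PERIOD_PS = 1000   # clock period
SETUP_PS = 50      # flip-flop setup time
STAGE_A_PS = 1200  # logic delay FF1 -> FF2 (too slow for zero skew)
STAGE_B_PS = 700   # logic delay FF2 -> FF3 (has time to spare)


def setup_slack(period, launch_clk, capture_clk, data_delay, setup):
    """Setup slack = period + (capture arrival - launch arrival) - delay - setup."""
    return period + capture_clk - launch_clk - data_delay - setup


def stage_slacks(ff2_clock_offset_ps):
    """Slack of stages A and B when FF2's clock arrives later by the given offset.

    FF1 and FF3 are assumed to receive the clock at time 0.
    """
    slack_a = setup_slack(PERIOD_PS, 0, ff2_clock_offset_ps, STAGE_A_PS, SETUP_PS)
    slack_b = setup_slack(PERIOD_PS, ff2_clock_offset_ps, 0, STAGE_B_PS, SETUP_PS)
    return slack_a, slack_b


if __name__ == "__main__":
    print("zero skew:  ", stage_slacks(0))
    print("useful skew:", stage_slacks(250))
```

With zero skew, stage A fails setup while stage B has slack to spare. Delaying FF2's clock lets stage A borrow that spare time from stage B, which is the trade a rigid zero-skew clock tree cannot make.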
The result speaks for itself. The A7L, which is available today, met performance targets and achieved first-time success using Cadence digital implementation tools, the Samsung 32nm LP HKMG process, and ARM libraries and power management kit. It also achieved significant power savings. Where Ambarella has gone before, others will surely follow.