
ARM CTO at DAC 2012: The Truth About Semiconductor Scaling

Comments (2) | Filed under: EDA, Industry Insights, ARM, DAC, low power, verification, Power, stacked die, 3D, power management, 3D-IC, DAC keynote, microcontrollers, Cortex-M0, scaling, DAC 2012, Muller, Mike Muller, semiconductor scaling, validation, ARM CTO, energy harvesting, threshold, ARM1

As process nodes shrink, semiconductor scaling more or less follows the predictions of Moore's Law - but there are some surprising twists and turns. In a keynote speech at the Design Automation Conference (DAC 2012) June 5, Mike Muller, co-founder and CTO of ARM, compared the original ARM1 processor of 1985 to the recent ARM Cortex-M0 microcontroller to show what has and hasn't changed. He also talked about future technologies including 3D stacking.

The keynote was titled "Scaling for 2020 Solutions," and a video recording is available at the DAC web site. (Note: The video starts with the opening session at DAC June 5, which featured some acknowledgements and awards presentations. Keep advancing and you'll get to the keynote).

In preparation for the talk, Muller dug out the original documentation and layout files for the ARM1, from a design effort that started in 1983. This was no small task, as it involved finding a way to read Exabyte files, translate variable length records, and parse an obsolete layout format. However, Muller was able to produce a GDSII file from the documentation.

Have Things Really Improved?

The ARM1 was a 32-bit RISC processor with a simple architecture. Fast forward to 2012, and the Cortex-M0 is ARM's smallest, lowest-power, and most energy-efficient processor, particularly well suited for analog/mixed-signal applications.

So what's changed? The ARM1 was produced in a 3 micron process, had 6K gates, and took 6 man-years to design. The Cortex-M0 is a 20nm processor that has 8K gates and took 11 man-years to design. The team size was about the same. But there's one big difference: it took 6 months to do layout for the ARM1, and 32 minutes to do layout for the M0.

"The transition from full custom to RTL is what's really changed in the design process, and it's the only thing that's really changed," Muller said.

Muller figured that 26 years (1985 to 2011) equals 13 process generations, and he noted that the M0 is about 1/10,000th the size of the ARM1. That's about right on target for Moore's Law. However, performance should double every two generations. That means the M0 should have 64X the performance of the ARM1, but alas, the true figure is closer to 16X (12.5 MHz for the ARM1, 200 MHz for the M0). Moreover, the 5V of the ARM1 should have gone to 8mV for the M0, but the M0 actually uses 950mV.
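Muller's back-of-the-envelope math can be checked with a few lines of Python. This is only a sketch of the arithmetic as reported above; the inputs (13 process generations, 12.5 MHz for the ARM1, 200 MHz for the M0) come from the keynote, and the assumption that area halves each generation is the standard Moore's Law reading, not something Muller stated explicitly.

```python
# Sketch of the scaling arithmetic reported above. Figures from the
# keynote: 13 process generations (1985-2011), ARM1 at 12.5 MHz,
# Cortex-M0 at 200 MHz. The halving-per-generation area assumption
# is the conventional Moore's Law interpretation, added here.

generations = 13                 # 26 years at roughly 2 years per node

# Area roughly halves each generation: 0.5**13 = 1/8192, close to the
# "about 1/10,000th the size" Muller quoted.
area_ratio = 0.5 ** generations

# Performance should double every two generations: 6 full doublings.
expected_speedup = 2 ** (generations // 2)   # 64x
actual_speedup = 200 / 12.5                  # 16x

print(f"Area ratio:       ~1/{round(1 / area_ratio)}")
print(f"Expected speedup: {expected_speedup}x")
print(f"Actual speedup:   {actual_speedup:.0f}x")
```

The gap between the expected 64X and the actual 16X is exactly the shortfall Muller set out to explain.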

Muller went to his R&D people to find out why. As he reported, he was told that "what hasn't scaled is the threshold voltages, and they haven't scaled because of leakage. The threshold hasn't gone down and therefore the margin you have left for switching is very low." This, in turn, slows the expected performance increase.

The Verification Quandary

Another interesting comparison has to do with verification. Muller calculated that designers used about 2K CPU hours on ARM1 verification, and 1.5 million CPU hours on M0 verification. "Do the math, and that's about a factor of 3 million less efficient," he said. "And it's all driven by constrained random [test generation]." He acknowledged that today there's a better understanding that simulation cycles are far cheaper than silicon respins, and he noted that a complex chip like the Cortex-A15 really does need constrained random verification. "But all those CPU hours, I sometimes wonder - are they well spent?"

There was far more in Muller's keynote than a look at scaling since 1985. Looking at the future of 32-bit microcontrollers, he talked about 3D stacked die implementations, and said that "even simple systems are going to turn into stacks because you're going to pick the right technology for the right application and you join them together in a 3D stack." For example, you might have a layer for power management that has very low leakage, and "180nm is the right answer for that."

Muller sees two different kinds of low power designs in the near future. One type uses batteries, and another harvests energy from the environment (from solar energy, for instance). Harvesting is potentially infinite, but only the battery can run in "burst" mode when it's needed. The result - two very different power management challenges.

Muller also had a request for the EDA community. "I think we kind of got it wrong," he said, "because RTL isn't formal. I would like our designers to be able to design something once, push the button, recompile, and know that what comes out is what they intended." But formal technology is currently "on the side" and confined to verification. Muller would like to bring it into design.

With an eye to 2020, Muller concluded that "I think the future is a continuing explosion of devices, of form factors, of things embedded in your eye, into the concrete structures around you, in portable devices, and in large mainframe computers, all interconnected with services. And that's what takes you to a future where quality of life will become the big issue."

Richard Goering




By CZ Chen on June 7, 2012
Hints for EDA are, when Moore's Law is expanded as a scaling ruler in logarithmic scale: RTL synthesis has saved million layout hours; TLM verification (with formal-aware) has to save billion chip respin dollars.

