In the early 2000s we hit a power "wall" and scaled it by putting multiple processor cores on a single chip. But the multi-core era is running into its own limitations, and it's time to start planning for a "new era" in which design innovation, rather than technology scaling alone, will fuel performance growth, according to Design Automation Conference (DAC 2012) keynote speaker Joshua Friedrich, senior manager of POWER technology development in IBM's Server and Technology Group.
Friedrich gave the first part of a two-speaker keynote titled "Designing High Performance Systems-on-Chip" on June 6. Brad Heaney, Intel Architecture group product manager, gave the second part, which described how Intel developed the 3rd Generation Intel Core processor (code-named Ivy Bridge). Both talks are available in a video recording on the DAC web site. This blog post focuses on Friedrich's talk.
While multi-core architectures are providing tremendous benefits, it's important to realize that multi-core performance growth will face limitations in the near future, Friedrich said. "This will require designers to innovate more rather than just include more cores in a design in order to make effective use of the transistors Moore's Law will give us," he said. He cited three areas that need innovation now: the hardware/software boundary, heterogeneous IP for processor subsystems, and emerging system-level technologies.
The Good Old Days
Friedrich started his talk with a look at the "good old days" of Dennard scaling, which provided exponential single-threaded performance growth into the mid-2000s. All transistor dimensions went down by a predictable factor at each node, operating voltage went down, and frequency went up. During this time IBM went from the 1 GHz POWER4 processor to the 5 GHz POWER6 (produced in 65nm SOI technology).
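For readers who want the arithmetic behind this, the classical Dennard scaling relations (not spelled out in the talk) can be summarized as follows, with scaling factor \(\kappa > 1\) per generation:

```latex
% Classical Dennard scaling with factor \kappa > 1 per generation:
L' = \frac{L}{\kappa}, \qquad V' = \frac{V}{\kappa}, \qquad
C' = \frac{C}{\kappa}, \qquad f' = \kappa f
% Dynamic switching power per transistor, P = C V^2 f, then scales as:
P' = \frac{C}{\kappa}\left(\frac{V}{\kappa}\right)^2 (\kappa f)
   = \frac{P}{\kappa^2}
```

Since transistor density rises by \(\kappa^2\) at each node, power density stayed constant: faster, denser chips at the same watts per square millimeter. Once oxide thickness and voltage stopped scaling, as Friedrich describes next, that balance broke.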
But "physics brought this era of single-threaded performance to a close," Friedrich said. "Somewhere around the 90nm node we were no longer able to keep scaling the oxide thickness, and therefore couldn't drop the voltage further. Passive power was on pace to exceed active power. Something had to change and it did - frequency stepped back, and single-threaded performance growth slowed greatly."
Since there was no longer much value in merely shrinking dimensions of transistors, new innovations such as low-k interconnects, high-k metal gates, eDRAM, and strained silicon appeared. But the shift was underway, Friedrich said, from single-threaded performance and frequency to a new focus on multi-core design and throughput. IBM in fact led the charge with the dual-core POWER4, which was introduced in 2001. More recently, the POWER7 achieved a 5X increase in throughput per socket by increasing cores from 2 to 8, with only slightly improved thread performance.
Limitations of Multi-core
In spite of multi-core advancements, Friedrich said that "some fundamental limiters are beginning to surface that indicate that the growth we can expect to achieve from multi-core will begin to slow in the future." These include:
- Rising lithography and processing costs
- Yield challenges
- Power is scaling more slowly than area, making it difficult to power additional cores
- Socket bandwidth limitations mean that processors are often stalled
- Inherent limitations in software parallelism mean that many applications see limited benefit
Friedrich didn't directly mention the difficulty of programming multi-core devices, which many people see as the biggest obstacle.
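The software-parallelism limiter in the list above is usually quantified with Amdahl's law (my illustration, not an example from the talk): if a fraction p of a workload can be parallelized, the speedup on n cores is bounded by 1/((1-p) + p/n).

```python
def amdahl_speedup(p, n):
    """Amdahl's law: speedup of a workload whose parallel fraction is p,
    run on n cores. The serial fraction (1 - p) never gets faster."""
    return 1.0 / ((1.0 - p) + p / n)

# Even a 90%-parallel workload can never exceed 10x, no matter how
# many cores you add -- which is why "just add cores" runs out of steam.
for n in (2, 8, 64, 1024):
    print(n, round(amdahl_speedup(0.9, n), 2))
# 2 1.82
# 8 4.71
# 64 8.77
# 1024 9.91
```

The diminishing returns from 64 to 1024 cores show why Friedrich argues that future gains must come from innovation beyond core count.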
So what's needed to get the best use out of silicon? "We believe the answer lies in a new era of system-focused performance growth driven by designer innovation rather than simply by technology," Friedrich said. He identified three key areas of focus:
- System-level technologies including 3D packaging, silicon photonics, FPGA accelerators with low-latency connections, and heavy use of flash and solid state drives in mobile consumer devices.
- Integration of heterogeneous IP with processors. This could involve the integration of I/O subsystems, including elements such as SAS controllers, PCIe links, and Ethernet adapters. It could also include voltage regulation.
- Innovation along the hardware/software boundary. This includes techniques like dynamic code optimization, which profiles hardware and provides real-time feedback to software to identify areas that can be recompiled or re-optimized to yield additional performance gains.
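To make the third item concrete, here is a minimal sketch of the profile-and-reoptimize loop behind dynamic code optimization. All names and the threshold are illustrative assumptions on my part, not details of IBM's implementation:

```python
# Sketch of dynamic code optimization: hardware-style profiling counters
# feed back to software, which recompiles regions that turn out to be hot.
from collections import Counter

HOT_THRESHOLD = 1000  # assumed execution count that triggers reoptimization

class DynamicOptimizer:
    def __init__(self):
        self.exec_counts = Counter()  # per-region execution counts (the "profile")
        self.optimized = set()        # regions already recompiled at a higher level

    def record(self, region):
        """Profiling hook: count each executed region; reoptimize when hot."""
        self.exec_counts[region] += 1
        if (self.exec_counts[region] >= HOT_THRESHOLD
                and region not in self.optimized):
            self.reoptimize(region)

    def reoptimize(self, region):
        """Feedback step: stand-in for recompiling the hot region."""
        self.optimized.add(region)

opt = DynamicOptimizer()
for _ in range(1500):
    opt.record("inner_loop")   # hot: crosses the threshold, gets recompiled
opt.record("cold_path")        # cold: profiling cost only, no recompile
print(sorted(opt.optimized))   # ['inner_loop']
```

The point of the pattern is that optimization effort follows measured runtime behavior rather than static compile-time guesses.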
Friedrich had much to say about how EDA tools can help. One way is to help designers manage complexity, especially with new technologies like double patterning or 3D packaging. Another is to leverage statistical timing rather than relying on multiple fixed-corner timing runs. Friedrich also called for leveraging sequential synthesis techniques like retiming.
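A toy calculation (my illustration, with made-up numbers) shows why statistical timing is attractive compared with fixed-corner runs: independent per-stage variations partially cancel along a path, so the statistical 3-sigma path delay is tighter than assuming every stage hits its worst corner at once.

```python
# Fixed-corner vs. statistical timing for a path of independent gate
# delays. Delay values are invented for illustration only.
import math

stages = [(100.0, 5.0)] * 20   # (mean_ps, sigma_ps) for 20 identical stages

# Corner-style analysis: every stage simultaneously at its 3-sigma worst case.
corner_delay = sum(mu + 3 * sigma for mu, sigma in stages)

# Statistical analysis: independent Gaussian delays, so means add and
# sigmas add in quadrature (root-sum-square).
mean_path = sum(mu for mu, _ in stages)
sigma_path = math.sqrt(sum(s * s for _, s in stages))
stat_delay = mean_path + 3 * sigma_path

print(corner_delay)            # 2300.0 ps
print(round(stat_delay, 1))    # 2067.1 ps: same 3-sigma confidence, less margin
```

The roughly 10% of recovered margin is pessimism the corner methodology was leaving on the table, which is the productivity argument Friedrich makes for statistical timing.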
Processor design, Friedrich said, needs a methodology that "combines the best elements of an ASIC-style design approach alongside the traditional elements of processor design." Such an approach is already underway at IBM, which is shifting away from hand-placed, hand-routed designs to "more of a synthesis-based approach," he said. IBM has been able to limit the number of custom blocks and to reduce the number of partitions. "In our most recent design we reduced the number of blocks by about 30% and we plan to achieve a 5-10X reduction over the next two generations in the number of hierarchical partitions," he said.
"We need EDA tools to provide more productivity to enable designer innovation, so our focus can shift away from technology and implementation and towards creating features and functions," Friedrich concluded.
Industry Insights blog posts about DAC 2012
ARM CTO at DAC 2012: The Truth About Semiconductor Scaling
DAC 2012 Panel - Can One System Model Serve Everybody?
DAC 2012: EDA Industry Celebrates 10 Years of OpenAccess
TSMC-Cadence Collaboration Helps Clarify 3D-IC Ecosystem
Gary Smith at DAC 2012: Multi-Platform Design and the $40M System on Chip