The Wednesday keynote speech at the Design Automation Conference offered a strong argument for general-purpose graphics processing units (GPGPUs) as the best way to accelerate EDA and other compute-intensive applications. But whether GPGPUs will prove a better solution than more conventional multicore architectures is a difficult question to answer.
The speaker was William Dally, chief scientist at nVidia and professor of engineering at Stanford University. He is a compelling speaker, and this was one keynote that held the audience’s attention. His talk was titled “The End of Denial Architecture and the Rise of Throughput Computing,” and that title is a fair summary of its content.
Few would disagree with Dally’s opening statement. Citing the diminishing performance gains of single-threaded processors, he noted that “we can’t afford to be in denial about the shift to parallelism.” All future performance gains, he stated, will come from parallelism. Efficiency, he argued, will come from “locality”: moving a word across a die is very expensive in terms of energy, so it is best to keep data local.
Dally said that single-threaded processors are in “denial” about parallelism and locality, and that they maintain two illusions. First, they try to exploit instruction-level parallelism (ILP), which has limited scalability. Second, flat memory denies locality and provides the “illusion of caching,” which turns out to be inefficient when the working set of data doesn’t fit into the cache.
Dally then went on to argue that “latency-optimized processors” are improving in performance very slowly, while “throughput-optimized processors” are improving at 70 percent per year. He cited nVidia’s GeForce GPU, with its 240 scalar processors, as an example of a throughput processor. At this point, however, I could have used clearer definitions of these two terms and more examples of what falls into each category.
Dally then turned the discussion to EDA, and this is where the talk became more controversial. He cited the obvious need for parallel computing platforms for EDA applications, but said that multicore chips composed of 4 or 8 “latency-optimized processors” are a “slippery slope” that cannot come close to delivering the performance-per-watt of a “throughput-optimized” architecture with hundreds of processors. “Going parallel on a latency-optimized processor is beside the point. You’re not getting the gains of parallelism if you do that,” he said.
Why, then, are there so few EDA applications on GPGPUs today, and why are most EDA developers targeting what Dally would presumably call “latency-optimized” multicore architectures? To get another perspective, I asked Tom Spyrou, distinguished engineer at Cadence, whom I had previously interviewed about his work porting the Encounter Digital Implementation System to multicore platforms.
Tom noted that GPGPUs work well only on a subset of data-parallel problems; most EDA applications involve significant sequential data manipulation, which caps the maximum speedup at a fixed level no matter how many CPUs are applied. Meanwhile, 16-core CPUs are coming down in price and will soon be as cheap as today’s dual-core CPUs, while GPUs require a rewrite of the application and some specialized coding skills.
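That cap is just Amdahl’s law. If p is the fraction of the runtime that can be parallelized and N is the number of processors (the symbols are mine, not Tom’s), the best possible speedup is:

speedup(N) = 1 / ((1 - p) + p / N)

Even if 90 percent of an application parallelizes perfectly, the remaining serial 10 percent limits the overall speedup to 1/0.1 = 10x, whether you have 16 cores or 240.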
On the other hand, as U.C. Berkeley professor Kurt Keutzer noted in a recent interview, if you want “manycore” parallelism (more than 32 processors), GPGPUs are the commercial platforms available today. Programming environments are available as well: nVidia developed the CUDA environment, and OpenCL aims to provide an open standard for programming GPGPUs.
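To give a feel for what that “rewrite” involves, here is a minimal CUDA sketch of my own (not an EDA kernel, and not something from Dally’s talk): a simple vector update in which each of thousands of threads handles one array element. This one-thread-per-element style is the essence of throughput computing.

#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// Each thread computes one element: speed comes from thousands of
// small parallel tasks, not from a few fast cores.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main(void)
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *hx = (float *)malloc(bytes), *hy = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { hx[i] = 1.0f; hy[i] = 2.0f; }

    // Copy the data to the GPU's memory before launching the kernel.
    float *dx, *dy;
    cudaMalloc(&dx, bytes);
    cudaMalloc(&dy, bytes);
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, dx, dy);

    cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);
    printf("y[0] = %f\n", hy[0]); /* expect 4.0 */

    cudaFree(dx); cudaFree(dy);
    free(hx); free(hy);
    return 0;
}

The catch, as Tom’s comments suggest, is that most EDA algorithms don’t decompose this neatly into independent per-element work.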
My take is that EDA developers will write applications for whatever compute platforms engineers use. Whether future platforms will be based on “latency-optimized” multicore devices or “throughput-optimized” GPGPUs remains to be seen. It’s not just a question of performance-per-watt, or whether you have 32 processors or 500. The ultimate question is going to be the effective speedup-per-dollar.
But no matter which direction future compute platforms take, Dally is absolutely right about one thing: it’s time to give up “denial” and move forward into an era of parallelism.