DAC Report: GPUs Or Multicore For EDA Applications?

Filed under: Industry Insights, DAC, Multicore, parallelism, GPU

The Wednesday keynote speech at the Design Automation Conference offered a strong argument for general-purpose graphics processing units (GPGPUs) as the best way to accelerate EDA and other compute-intensive applications. But whether GPGPUs will prove to be a better solution than more conventional multicore architectures is a difficult question to answer.

The speaker was William Dally, chief scientist at nVidia and professor of engineering at Stanford University. He’s a compelling speaker – this is one keynote that held the audience’s attention. His keynote was titled “The End of Denial Architecture and the Rise of Throughput Computing,” and that’s a pretty good summary of the content of the speech.

Few would disagree with Dally’s opening statement. Citing the diminishing performance gains of single-threaded processors, he noted that “we can’t afford to be in denial about the shift to parallelism.” All future performance gains will come from parallelism, he stated, and efficiency will come from “locality.” Moving a word across a die is very expensive in terms of energy, so it’s best to keep data local, he said.

Dally said that single-threaded processors are in “denial” about parallelism and locality, providing two illusions. First, they try to exploit instruction-level parallelism (ILP), which has limited scalability. Second, flat memory denies locality and provides the “illusion of caching,” which turns out to be inefficient when the working set of data doesn’t fit into the cache.

Dally then went on to argue that “latency-optimized processors” are improving in performance very slowly, while “throughput-optimized processors” are improving at 70 percent per year. He cited the nVidia GeForce, which has 240 scalar processors, as an example of a “throughput” processor. At this point, however, I could have used a clearer definition of these terms and some more examples of what he would include in either category.

Dally turned the discussion to EDA, and here is where it gets more controversial. He cited the obvious need for parallel computing platforms for EDA applications, but said that multicore chips made up of 4 or 8 “latency-optimized processors” are a “slippery slope” that cannot come close to delivering the performance-per-watt of a “throughput-optimized” architecture with hundreds of processors. “Going parallel on a latency-optimized processor is beside the point. You’re not getting the gains of parallelism if you do that,” he said.

Why, then, are there so few EDA applications on GPGPUs today, and why are most EDA developers targeting what Dally would probably call “latency-optimized” multicore architectures? To get another perspective I asked Tom Spyrou, distinguished engineer at Cadence, whom I interviewed previously about his work in porting the Encounter Digital Implementation System to multicore platforms.

Tom noted that GPGPUs work better on a subset of data-parallel problems, but most EDA applications involve significant data manipulation that doesn’t parallelize well, which caps the maximum speedup no matter how many CPUs are involved. 16-core CPUs, meanwhile, are coming down in price and will soon be as cheap as today’s dual-core CPUs, whereas GPUs require rewriting the application and some specialized coding skills.
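Spyrou’s caveat is, in effect, Amdahl’s law: if only a fraction p of an application’s work can be parallelized, the best possible speedup on N processors is 1 / ((1 - p) + p/N). An application that is 80 percent parallelizable, for example, tops out at a 5x speedup no matter how many processors are thrown at it.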

On the other hand, as U.C. Berkeley professor Kurt Keutzer noted in a recent interview, if you want “manycore” (more than 32 CPUs) parallelism, GPGPUs are the commercial platforms available today, and programming environments exist: nVidia developed the CUDA environment, and OpenCL aims to create an open standard for programming GPGPUs.
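For readers who haven’t seen the model, here is a minimal, illustrative sketch of a CUDA kernel (my own example, not something shown in the keynote): thousands of lightweight threads each handle one array element, the kind of regular, data-parallel work a throughput-oriented GPU is built for.

    // Illustrative only: scale every element of an array in parallel.
    // Each GPU thread computes the index of the single element it owns.
    __global__ void scale(float *data, float factor, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)                 // guard the last, partially filled block
            data[i] *= factor;
    }

    // Host side: launch enough 256-thread blocks to cover all n elements,
    // e.g. scale<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);

The catch, as Spyrou points out above, is that this style maps cleanly only to regular computations; irregular EDA data structures and control flow are much harder to express this way.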

My take is that EDA developers will write applications for whatever compute platforms engineers use. Whether future platforms will be based on “latency-optimized” multicore devices or “throughput-optimized” GPGPUs remains to be seen. It’s not just a question of performance-per-watt, or whether you have 32 processors or 500. The ultimate question is going to be the effective speedup-per-dollar.

But no matter which direction future compute platforms take, Dally is absolutely right about one thing – it’s time to give up “denial” and move forward into an era of parallelism.

Richard Goering

Comments (5)

By Gary Dare on August 1, 2009
Hi again, Richard ... from the abstract of William Dally's talk, published in advance of DAC, I had the impression that he would advocate parallelism not in the form of MIMD (i.e., multicore, with general-purpose processors) but with coprocessors (GPUs, in this case) on a single computing element ("chip"). Thanks to you and other reporters/bloggers, the useful details reach me even though I am in ... Winnipeg, Manitoba, Canada! (:
My impression now is that Professor Dally is advocating an approach beyond VLIW, to implement a processor that is, itself, a SIMD machine with a number of GPUs. That would be fine, but as your colleague Tom Spyrou points out, GPUs are custom processors optimized for a certain set of algorithms to efficiently solve a certain class of problems. As is the case with DSPs. A super-duper processor implemented as a SIMD machine with GPUs and/or DSPs would probably not fare much better than a general-purpose chip (e.g., Intel, PowerPC, ARM, etc.) if faced with a problem it isn't targeted for, say ... search!

By Richard Goering on August 3, 2009
Excellent points, Gary. The question now is which EDA applications will fit within that "subset" of problems that GPUs are ideally suited for, and how much speedup they'll provide for how much cost. In many cases, general-purpose 16 and 32-processor multicore ICs may be good enough.

By Gary Dare on August 3, 2009
Thanks, Richard, and hello again! Without the benefit of a recording or presentation from Professor Dally's talk, I can only go by impressions from various reporters like yourself. The term 'general-purpose graphics processing unit' seems almost a contradiction in terms (like George Carlin's joke, 'military ... intelligence?' ... with all due respect to those who have served, of course). From your suggestion, maybe someone needs to come up with a processor containing a VLIW (or SIMD) machine with EDA-friendly custom processors. That would limit their utility outside of those applications but might bring a revival of the term 'engineering workstation'! (:

But seriously, I think that any processor customization restricts it to a limited class of problems with the same underlying algorithms. A processor targeted at fluid dynamics, for example, could be used by those who study traffic (e.g., Northwestern University's Traffic Institute), where fluid dynamics concepts have been applied to model rush hours!

I wonder if such a debate is raging on some computer architecture blogs out there?  Maybe some computer architecture experts or dilettantes out in the EDA audience might want to weigh in ...

Thanks to you and to Cadence for this site! (:


By Wayne on May 3, 2010
When will nVidia and/or AMD design their own ICs on a GP-GPU EDA platform?

By basem on May 11, 2012
Nowadays we have more processor cores per IC from AMD (well, they have a maximum of 8 modules on a single chip, each with two cores sharing specific resources). Does that mean that we can now run faster on AMD than on Intel?
