EDA Workshop Debate Erupts Over Parallel Programming

Filed under: EDA, Industry Insights, Multicore, parallelism, Electronic Design Processes, Madden, Spyrou, programming, parallel, Amdahl's Law, EDP, Groeneveld

The Electronic Design Processes (EDP) workshop may be a small, technical, IEEE-sponsored event, but that didn't stop a lively debate over the feasibility of parallel processing from erupting last week. The debate comes as EDA vendors, including Cadence, are working hard to port their software to multicore processor platforms.

The presentations started with a somewhat skeptical talk by Patrick Groeneveld of Magma Design Automation, and then moved on to a really, really skeptical talk by Patrick Madden of the State University of New York (SUNY) Binghamton. These were followed by what I'd call a balanced, hands-on presentation from Tom Spyrou of Cadence. (He has been working to parallelize the Encounter Digital Implementation system, and I interviewed him last year about the challenges that entails.)

If there was a silent presence in the room, it was Gene Amdahl, whose "Amdahl's Law" reminds us that the gains you get from parallelizing code are sharply limited by any sections that cannot be parallelized. As Patrick Groeneveld pointed out, if you can parallelize 50 percent of the code, you can only get a maximum 2X theoretical speedup, and the reality is probably closer to 1.8X. Since all multi-tool EDA flows include some non-parallelizable code, no one should expect a 7X speedup on 8 processors.
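For readers who want the arithmetic behind that claim, Amdahl's Law in its standard form (my addition, not a formula from the slides) gives the speedup S on N processors when a fraction P of the runtime can be parallelized:

    S(N) = \frac{1}{(1 - P) + \frac{P}{N}}

With P = 0.5 the speedup can never exceed 2X no matter how many processors are used, and even a generously parallel P = 0.95 on N = 8 yields only about 1/(0.05 + 0.95/8) ≈ 5.9X.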

Tom Spyrou (right) answers a question from Patrick Groeneveld (orange shirt) at the EDP Workshop April 8.

Patrick Groeneveld: The "Last Resort"

"Parallelism is the last thing I look at if I have a problem in my flow," said Patrick, who serves as Magma's chief technologist. "I will try everything else first because I know parallelism has some problems." Sometimes, he noted, parallelizing code can actually cause a slowdown because of the "hidden costs and extra overhead" of making code parallel.

Patrick identified six problems he's experienced in trying to parallelize EDA software:

  • Contention arbitration. Two processors are running and both want to use a shared resource. They write to memory at the same time, and - crash! This is typically fixed by putting locks in the code, which leads to the next problem.
  • Easy to lock up performance. Locking every read "trashes your multi-threaded performance." (A short code sketch after this list illustrates the tradeoff.)
  • Finding intelligent partitions. You want to chop up the chip and send each piece to a different processor, which works for some problems, but "partitioning is evil" for synthesis and optimization.
  • Repeatability is the silent killer. If you want repeatability (getting the same result from the same operation), you will lose processing efficiency in parallelism.
  • Load distribution. It is very hard to estimate workloads in advance, but it's necessary to schedule CPU resources.
  • The sheer complication of things. Not many programmers are capable of writing parallel code. Bottom line: "parallelism works but gains are limited. Don't be disappointed."
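To make the first two bullets concrete, here is a minimal C++ sketch of my own (not code from any of the talks): two threads increment a shared counter, the data race is prevented with a mutex, and every increment then serializes on that lock, which is exactly the kind of hidden overhead Patrick described.

    // contention_demo.cpp -- illustrative only, not from any EDA tool.
    // Build: g++ -std=c++17 -pthread contention_demo.cpp
    #include <iostream>
    #include <mutex>
    #include <thread>
    #include <vector>

    int main() {
        constexpr int kIters = 1000000;
        long counter = 0;          // the shared resource
        std::mutex counter_mutex;  // the lock that prevents the "crash"

        auto worker = [&]() {
            for (int i = 0; i < kIters; ++i) {
                // Without this lock the two threads race on 'counter' and the
                // final value is wrong; with it, every increment serializes,
                // so the "parallel" version can run slower than a serial loop.
                std::lock_guard<std::mutex> guard(counter_mutex);
                ++counter;
            }
        };

        std::vector<std::thread> threads;
        for (int t = 0; t < 2; ++t) threads.emplace_back(worker);
        for (auto &th : threads) th.join();

        std::cout << "counter = " << counter
                  << " (expected " << 2L * kIters << ")\n";
        return 0;
    }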

Patrick Madden: Advocates are "Delusional"

Anyone who was a little depressed by Patrick Groeneveld's presentation was not helped by Patrick Madden, whose first words were that Groeneveld was "way too optimistic" about parallel programming. I was not surprised. In a panel (with the real Gene Amdahl) that I wrote about a couple of years ago, Madden said that multicore programming is "the end of the world as far as I'm concerned."

Madden, however, is working on a problem that should be important to parallel programmers - speeding up the "hard serial bottlenecks" that cannot be parallelized, and thus give rise to the limitations of Amdahl's Law. And where do we find such bottlenecks? In papers and books by parallel programming experts, he suggested. He told of one Design Automation Conference (DAC) paper that presented a 100-fold speedup with a parallelized algorithm. It later turned out that a time-worn serial algorithm actually did a faster job of solving the same problem, with no special hardware required.

"I would think that when I pick up a technical paper from a major university, things should be correct," Madden said. "Instead there are monumental flaws presenting a new technology that's orders of magnitude slower than the existing solution." With a slide including a photo of Bernie Madoff, he suggested that some of the published parallel processing research is fraudulent.

Audience members reacted, saying that Madden was "too skeptical," focusing on a small amount of bad research, and not giving new programming methods a chance to catch up. "If I do video, a lot of stuff is embarrassingly parallel and that's how it's done," one said. Madden agreed, but went on to suggest that some parallel programming advocates are "delusional" and are seeing what they wish to see after taking grant money to do parallel computing research.

Tom Spyrou: Yes, We Can Parallelize Legacy Code

It may have been a hard act to follow, but Tom Spyrou spoke next, in a calm presentation that showed how legacy EDA code can be parallelized within the context of Amdahl's Law. Spyrou focused on the "fine-grained" problems that are most difficult to parallelize.

"For multicore, with 4 to 8 cores, there are shortcuts you can take to get legacy code to run in parallel," he said. "When it comes to manycore, 128 cores and all that, I think it will have to be written from the ground up. I think a lot of fine-grained problems aren't going to scale, just like [Madden] said."

Still, some clever things can be done today. One is to create a "parallel reducer server" for parasitic reduction. The idea is to take a net and reduce it to a math transfer function, using parallel calls to the server. There are no threads or thread safety issues, and the approach provides data locality, which will be important for manycore.
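Spyrou did not show implementation details, so the following is only a hypothetical sketch of the general pattern he described: independent reduction jobs handed to separate worker processes, so there are no threads and no shared mutable state. The reduce_net() function and the way work is sliced are placeholders of mine, not the Encounter architecture.

    // reducer_workers.cpp -- hypothetical sketch of process-level parallelism;
    // the real "parallel reducer server" is not public. POSIX only.
    // Build: g++ -std=c++17 reducer_workers.cpp
    #include <sys/wait.h>
    #include <unistd.h>
    #include <cstdio>
    #include <vector>

    // Placeholder for "reduce one net to a transfer function".
    static double reduce_net(int net_id) { return 1.0 / (net_id + 1); }

    int main() {
        const int kNets = 1000;
        const int kWorkers = 4;
        std::vector<pid_t> kids;

        for (int w = 0; w < kWorkers; ++w) {
            pid_t pid = fork();
            if (pid == 0) {                 // child: handle one slice of nets
                double acc = 0.0;
                for (int n = w; n < kNets; n += kWorkers) acc += reduce_net(n);
                std::printf("worker %d reduced its slice, checksum %.3f\n", w, acc);
                _exit(0);                   // no shared mutable state to clean up
            }
            kids.push_back(pid);
        }
        for (pid_t pid : kids) waitpid(pid, nullptr, 0);  // parent collects workers
        return 0;
    }

In a real flow the results would come back to the caller (over pipes or sockets, say) rather than being printed, but the structural point stands: each call is an independent process, so thread safety never enters the picture.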

Another example is parallel noise analysis. It is hard to make legacy code thread-safe, but you can use a copy-on-write fork() construct as a short-term solution. The child process shares "read" memory with its parent, and any memory that is modified is automatically duplicated in the child.
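A minimal POSIX illustration of that copy-on-write behavior (again my own sketch, not the Encounter code): the parent builds a large table, each forked child reads it without triggering any copying, and only pages a child actually wrote to would be duplicated.

    // cow_fork_demo.cpp -- sketch of copy-on-write fork(); POSIX only.
    // Build: g++ -std=c++17 cow_fork_demo.cpp
    #include <sys/wait.h>
    #include <unistd.h>
    #include <cstdio>
    #include <numeric>
    #include <vector>

    int main() {
        // Large "legacy" data structure built once by the parent.
        std::vector<double> table(10000000, 0.5);

        const int kChildren = 4;
        for (int c = 0; c < kChildren; ++c) {
            if (fork() == 0) {
                // The child only reads 'table', so its pages stay shared with
                // the parent; the kernel copies a page only if the child writes.
                double sum = std::accumulate(table.begin(), table.end(), 0.0);
                std::printf("child %d read the shared table, sum = %.1f\n", c, sum);
                _exit(0);
            }
        }
        while (wait(nullptr) > 0) {}   // parent waits for all children
        return 0;
    }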

While some easy-to-parallelize applications are more scalable, Spyrou said that the overall Encounter netlist-to-GDSII flow is now running about 2X faster on 4 cores, thanks to the work that's been done to parallelize legacy code.
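As a sanity check against Amdahl's Law (my arithmetic, not a figure from the talk), a 2X speedup on 4 cores is consistent with roughly two-thirds of the flow's runtime having been parallelized, ignoring overhead:

    2 = \frac{1}{(1 - P) + \frac{P}{4}} \;\Rightarrow\; (1 - P) + \frac{P}{4} = \frac{1}{2} \;\Rightarrow\; P = \frac{2}{3}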

My Perspective

I think there's some truth in each of these presentations. Legacy serial code is really hard to parallelize, and not everything should be parallelized. University research may well focus too much on parallel programming and not enough on serial bottlenecks. But let's not throw up our hands and say it can't be done. Let's get to work and do what we can to speed up EDA software for multicore platforms, just as Spyrou has been doing since 2006 at Cadence.

 

Richard Goering

Comments (4)

By theASICguy on April 13, 2010
Hi Richard,
Great summary!
One interesting thought that was presented was that the real bottleneck is actually moving the data around. That is, if you could locate many processors close to a shared memory, that would help increase performance. Today, a lot of time is spent moving data over a network. Perhaps there are some novel architectures that will arise to help make this more efficient in the future.
Harry

By rgoering on April 14, 2010
Agreed! Perhaps one of those "novel architectures" is 3D ICs, with stacked die including logic and memory. This was also a hot topic at the EDP workshop.

By Chris Kingsley on April 15, 2010
Locating the processors close to a shared memory is what the Intel memory architecture does.  Then you get bus contention.  So you put in caches so you don't have to go to the bus so often.  Then you put in stuff to keep the caches coherent.  In the end you have a huge monster.
Or you can go the AMD way, and give each processor its own memory, and let the processors communicate to get to memory attached to other processors.  I don't know how (or if) the operating system figures out which pages one process works on most and assigns/moves them to the right processor.  In any case, you end up sending vast quantities of messages between processors to handle the shared memory, and you end up with a huge monster.
An advantage of a Message Passing style of programming is that the expensive part of the program, the communication, is made explicit.  The disadvantage is that the communication is made explicit...

By Minesh B. Amin on April 15, 2010
It is astonishing to read about what Patrick Madden had to say, and of all places, at a conference attended by such a big, well-connected group of people.
I find working on commercial parallel (especially fine-grain) capabilities to be truly liberating and satisfying, mainly because there is a heavy premium on technical and intellectual honesty. Unlike in the serial world, any parallel capability is its own competitor.
All the end user needs to do is run experiments with 1, 10, 20, 40, 80, etc. cores while leaving all other parameters exactly the same, compute the speedups, efficiencies, and iso-efficiencies, and judge for themselves what the trend lines are.
And if the trend lines are not favorable, there is nothing the software vendors can do but acknowledge the problems.
