The Electronic Design Processes (EDP) workshop
may be a small, technical, IEEE-sponsored event, but that didn't stop a lively
debate over the feasibility of parallel processing from erupting last week. The
debate comes as EDA vendors, including Cadence, are working hard to port their
software to multicore processor platforms.
The presentations began with a
somewhat skeptical talk by Patrick Groeneveld of Magma Design
Automation, and moved on to a really, really skeptical talk by
Patrick Madden of the State University of New York (SUNY) Binghamton. These
were followed by what I'd call a balanced, hands-on presentation from Tom
Spyrou of Cadence. (He has been working to parallelize the Encounter Digital
Implementation system, and I interviewed
him last year about the challenges that entails).
If there was a silent
presence in the room, it was Gene Amdahl, whose "Amdahl's Law" reminds us that
the gains you get from parallelizing code are sharply limited by any sections
that cannot be parallelized. As Patrick Groeneveld pointed out, if you can
parallelize 50 percent of the code, you can only get a maximum 2X theoretical
speedup. The reality is probably closer to 1.8X. Since all multi-tool EDA flows
include some non-parallelizable code, no one should expect a 7X speedup on 8 processors.
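Amdahl's Law is easy to check for yourself: if a fraction p of the runtime can be parallelized across n processors, the theoretical speedup is 1 / ((1 - p) + p / n). A quick sketch, using Groeneveld's 50-percent example (the function name and numbers here are mine, not from the talks):

```python
def amdahl_speedup(p, n):
    """Theoretical speedup when a fraction p of the work
    runs perfectly in parallel on n processors."""
    return 1.0 / ((1.0 - p) + p / n)

# Groeneveld's example: parallelize 50 percent of the code
print(round(amdahl_speedup(0.5, 8), 2))      # about 1.78 on 8 cores
print(round(amdahl_speedup(0.5, 10**9), 2))  # approaches the 2X ceiling
```

Even with effectively infinite processors, the serial half caps the speedup at 2X, which is exactly the point the panelists kept returning to.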
Tom Spyrou (right) answers a question from Patrick
Groeneveld (orange shirt) at the EDP Workshop April 8.
Patrick Groeneveld: The "Last Resort"
"Parallelism is the last
thing I look at if I have a problem in my flow," said Patrick, who serves as
Magma's chief technologist. "I will try everything else first because I know
parallelism has some problems." Sometimes, he noted, parallelizing code can
actually cause a slowdown because of the "hidden costs and extra overhead" of
making code parallel.
Patrick identified six
problems he's experienced in trying to parallelize EDA software:

1. Arbitration. Two processors are running and both want to use a shared resource. They write to memory at the same time, and - crash! This is typically fixed by putting locks in the code, which leads to the next problem.

2. Locks that lock up performance. Locking every read "trashes your multi-threaded performance."

3. The need for intelligent partitions. You want to chop up the chip and send each piece to a different processor, which works for some problems, but "partitioning is evil" for synthesis.

4. Non-determinism is the silent killer. If you want repeatability (getting the same result from the same operation), you will lose processing efficiency in parallelism.

5. Workload distribution. It is very hard to estimate workloads in advance, but it's necessary to schedule CPU resources.

6. The sheer complication of things. Not many programmers are capable of writing parallel code.

Bottom line: "parallelism works but gains are limited. Don't be disappointed."
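Groeneveld's first two problems - races that get fixed with locks, and locks that then serialize everything - can be illustrated with a small Python sketch (my own toy example, not code from the talks):

```python
import threading

counter = 0
lock = threading.Lock()

def worker(n):
    # Without the lock, two threads doing this read-modify-write at the
    # same time can interleave and lose updates - the arbitration problem.
    # The lock fixes correctness...
    global counter
    for _ in range(n):
        with lock:           # ...but every iteration now serializes here,
            counter += 1     # so the threads mostly wait on each other

threads = [threading.Thread(target=worker, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 40000: correct, but the lock limited the parallel gain
```

The answer is right, but the critical section dominates the runtime - which is what "locking every read trashes your multi-threaded performance" means in practice.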
Patrick Madden: Advocates are "Delusional"
Anyone who was a little
depressed by Patrick Groeneveld's presentation was not helped by Patrick
Madden, whose first words were that Groeneveld was "way too optimistic" about
parallel programming. I was not surprised. In a panel (with the real Gene
Amdahl) that I wrote about
a couple of years ago, Madden said that multicore programming is "the end of
the world as far as I'm concerned."
Madden, however, is working
on a problem that should be important to parallel programmers - speeding up the
"hard serial bottlenecks" that cannot be parallelized, and thus give rise to
the limitations of Amdahl's Law. And where do we find such bottlenecks? In
papers and books by parallel programming experts, he suggested. He told of one
Design Automation Conference (DAC) paper that presented a 100-fold speedup with
a parallelized algorithm. It later turned out that a time-worn serial algorithm
actually did a faster job of solving the same problem, with no special hardware.
"I would think that when I
pick up a technical paper from a major university, things should be correct,"
Madden said. "Instead there are monumental flaws presenting a new technology
that's orders of magnitude slower than the existing solution." With a slide
including a photo of Bernie Madoff, he suggested that some of the published
parallel processing research is fraudulent.
Audience members reacted,
saying that Madden was "too skeptical," focusing on a small amount of bad
research, and not giving new programming methods a chance to catch up. "If I do
video, a lot of stuff is embarrassingly parallel and that's how it's done," one
said. Madden agreed, but went on to suggest that some parallel programming
advocates are "delusional" and are seeing what they wish to see after taking
grant money to do parallel computing research.
Tom Spyrou: Yes, We Can Parallelize Legacy Code
It may have been a hard act
to follow, but Tom Spyrou spoke next, in a calm presentation that showed how
legacy EDA code can be parallelized within the context of Amdahl's Law. Spyrou focused
on the "fine grained" problems that are most difficult to parallelize.
"For multicore, with 4 to 8
cores, there are shortcuts you can take to get legacy code to run in parallel,"
he said. "When it comes to manycore, 128 cores and all that, I think it will
have to be written from the ground up. I think a lot of fine-grained problems
aren't going to scale, just like [Madden] said."
Still, some clever things
can be done today. One is to create a "parallel reducer server" for parasitic
reduction. The idea is to take a net and reduce it to a math transfer function,
using parallel calls to the server. There are no threads or thread safety
issues, and the approach provides data locality, which will be important for manycore.
Another example is parallel
noise analysis. It is hard to make legacy code thread-safe, but you can use a
Copy on Write fork() construct as a short-term solution. It shares only "read"
memory with its parent, but if memory is modified, it is automatically
duplicated in the child.
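The copy-on-write fork() trick can be demonstrated directly. This is a minimal POSIX-only sketch of my own; the data list stands in for a large, read-mostly analysis database:

```python
import os

# Stand-in for a large, read-mostly database built by legacy serial code
data = list(range(5))

pid = os.fork()
if pid == 0:
    # Child: initially shares the parent's memory pages read-only.
    # Writing triggers a private copy of the touched page - in the
    # child only - so the parent never sees this modification.
    data[0] = 999
    os._exit(0 if data[0] == 999 else 1)
else:
    _, status = os.waitpid(pid, 0)
    child_ok = (os.WEXITSTATUS(status) == 0)
    print(child_ok, data[0])  # True 0 - the parent's copy is untouched
```

This is why fork() works as a short-term answer for code that was never made thread-safe: each child gets an isolated view of the legacy data structures for free, at the cost of duplicating any pages it writes.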
While easy-to-parallelize applications are more scalable, Spyrou said that the
overall Encounter netlist-to-GDSII flow is now running about 2X faster on 4
cores, thanks to the work that's been done to parallelize legacy code.
I think there's some truth
in each of these presentations. Legacy serial code is really hard to
parallelize, and not everything should be parallelized. University research may
well focus too much on parallel programming and not enough on serial
bottlenecks. But let's not throw up our hands and say it can't be done. Let's
get to work and do what we can to speed up EDA software for multicore platforms,
just as Spyrou has been doing since 2006 at Cadence.