Porting EDA Applications To Multicore -- Part 1

Filed under: Industry Insights, Virtuoso, EDI 8.1, Multicore, Encounter, poll

The EDA industry is gearing up for what may be its largest retooling ever – retrofitting or rewriting applications to run on next-generation multicore platforms. An inside look at how Cadence ported the Encounter Digital Implementation System (EDI) to parallel processing illustrates some of the challenges, solutions and benefits.

In December 2008 Cadence introduced EDI 8.1, an IC implementation suite that offers parallel processing across the design flow. Tom Spyrou, distinguished engineer at Cadence, led the effort to parallelize the Encounter tools and is still working on that today.

First, the good news. Nearly all the tools within EDI are multi-threaded or “super-threaded.” (A super-threaded tool runs on multiple workstations, each of which may have multiple CPUs.) This includes RTL synthesis, placement, routing, timing analysis, signal-integrity analysis, metal fill, and design rule checking. Fine-grained partitioning saves the user from having to partition anything manually.

But not quite everything is multi-threaded. At this time, the Encounter R&D team is still working on parallelizing floorplanning and physical optimization. Much of this will show up in a release later this year. Optimization takes up about 50 percent of the flow, so it’s an important piece.

The bad news in all this is Amdahl’s Law, which imposes a kind of speed limit on parallel processing systems. It says that the overall speedup of an application is limited by the portions that aren’t parallelized. Thus, if 90 percent of a program is parallelized but 10 percent is not, you get at most a 10X speedup, no matter how many processors you add.
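For readers who want the arithmetic behind that 10X figure, Amdahl’s Law is usually written as shown here. The symbols are notation introduced for this note, not taken from the interview: p is the fraction of runtime that is parallelized, N is the number of processors, and S(N) is the overall speedup. The formula assumes the parallel portion scales perfectly, which real tools rarely achieve.

```latex
S(N) = \frac{1}{(1 - p) + \dfrac{p}{N}},
\qquad
\lim_{N \to \infty} S(N) = \frac{1}{1 - p}
```

With p = 0.9 the limit is 1 / 0.1 = 10, which is the ceiling quoted above. On 4 CPUs the same p gives 1 / (0.1 + 0.225), or roughly 3.1X.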

EDI is a suite composed of many different tools. While individual tools run significantly faster when multi-threaded, the full flow from netlist to routed design was about 23 percent faster on 4 CPUs in the initial 8.1 release. Tom’s goal is a 2X speedup for the full flow on 4 CPUs by the end of the year. That’s about as good as it gets for full EDA flows right now, he said.
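To put those flow numbers in perspective, Amdahl’s Law can be inverted to estimate how much of the runtime would have to be parallel to produce a given speedup. The sketch below is only a back-of-the-envelope model: the function name is hypothetical, it assumes ideal scaling of the parallel portion, and it reads “23 percent faster” as a 1.23X speedup, which is an interpretation rather than something the post states.

```python
def implied_parallel_fraction(speedup, cpus):
    """Invert Amdahl's Law: the fraction of runtime that must be parallel
    to reach `speedup` on `cpus` processors, assuming ideal scaling."""
    if cpus < 2:
        raise ValueError("need at least 2 CPUs to infer a parallel fraction")
    if not 1.0 <= speedup <= cpus:
        raise ValueError("speedup must lie between 1 and the CPU count")
    # From S = 1 / ((1 - p) + p / N), solve for p.
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / cpus)


if __name__ == "__main__":
    for label, speedup in (("initial 8.1 flow (assumed 1.23X)", 1.23),
                           ("2X goal", 2.0)):
        fraction = implied_parallel_fraction(speedup, cpus=4)
        print(f"{label}: about {fraction:.0%} of runtime parallel")
```

Under these assumptions the initial release behaves as if about a quarter of the flow’s runtime were parallel, while the 2X goal on 4 CPUs needs about two-thirds. If “23 percent faster” instead means 23 percent less runtime, the speedup is about 1.30X and the implied fraction rises to roughly 31 percent.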

What does it take to port a large CAD application to multicore? At a panel I moderated in 2007, analyst Gary Smith said it might take three years (that panel also included Gene Amdahl, who remarked that he never intended to formulate a “law”). This may not be far off. Tom has been working on the Encounter suite since late 2006, although with a small team of people.

Fortunately, Tom said, EDI was more of a retrofit than a rewrite. But things will change when we get into the “manycore” realm of more than 16 to 32 CPUs. One problem is Amdahl’s Law – with that many CPUs, you’d better parallelize 95-98 percent of your application. Another is that each processor will have its own cache, forcing programmers to “micro-manage” caches and avoid bottlenecks between main memory and cache. Tom’s conclusion about manycore: “It’s a rewrite. There are no clever processing techniques that get you there.”
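The 95-98 percent figure follows directly from Amdahl’s Law. As a rough illustration, the sketch below evaluates the standard formula, speedup = 1 / ((1 - p) + p / N), for a few parallel fractions p on 16 and 32 CPUs. The function name is hypothetical and the model assumes perfect scaling with no cache or memory bottlenecks, so real results would be lower.

```python
def amdahl_speedup(parallel_fraction, cpus):
    """Ideal Amdahl's Law speedup for a given parallel fraction of runtime."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cpus)


if __name__ == "__main__":
    for fraction in (0.90, 0.95, 0.98):
        row = ", ".join(
            f"{cpus} CPUs: {amdahl_speedup(fraction, cpus):.1f}X"
            for cpus in (16, 32)
        )
        print(f"{fraction:.0%} parallel -> {row}")
```

On 32 CPUs this idealized model gives about 7.8X at 90 percent, 12.5X at 95 percent, and 19.8X at 98 percent, so even a 98 percent parallel application uses less than two-thirds of the machine.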

Some clever processing techniques were used for EDI, however. One of the programming challenges had to do with legacy non-thread-safe code. To cope with this, Tom’s team deployed multiple “lightweight” (meaning low-memory) processes. This took some memory optimization work, but it works just as well as pure multi-threading, he said.

Debugging race conditions was a big challenge for Tom’s team. “There is so much going on at the same time that you have to program for debuggability,” he said. Fault tolerance was another issue – what if a machine goes down or hangs?

Porting to multicore “is an art more than a science,” Tom said. “The first step is a detailed understanding of how your legacy code works. Look at places where a lot of CPU time is taken, and focus on parallelizing that part. You want the most parallelization for the least perturbation of the code. It’s an iterative brainstorming process.”
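Finding where the CPU time goes is a job for a profiler. The sketch below uses Python’s standard cProfile and pstats modules to show the idea on a toy workload. The functions `cheap_step`, `hot_loop`, and `flow` are hypothetical placeholders, and a production C or C++ tool would use a native profiler instead.

```python
import cProfile
import pstats


def cheap_step(n):
    return sum(range(n))


def hot_loop(n):
    total = 0
    for i in range(n):
        total += (i * i) % 7
    return total


def flow():
    # Toy "flow": one cheap stage and one stage that dominates runtime.
    cheap_step(1_000)
    return hot_loop(200_000)


def hottest_function(func):
    """Profile func() and return the name of the function with the most self time."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, name) -> (calls, prim_calls, tottime, cumtime, callers)
    (_, _, name), _ = max(stats.stats.items(), key=lambda item: item[1][2])
    return name


if __name__ == "__main__":
    print("Parallelize first:", hottest_function(flow))
```

Ranking by self time rather than cumulative time points at the routine doing the work, not at the callers that merely wrap it.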

My take: This work may be more important than we think. We have some of the most complex software in the world right here in the EDA industry. If we can make it run well on multicore and manycore platforms, that bodes well for a multicore future. If not – then we’ll have to ask who, if anyone, will actually be able to program these platforms.

In part two, we’ll look at how parallel processing was applied to a different set of challenges in the Cadence Virtuoso Accelerated Parallel Simulator.

 

Richard Goering

Comments(2)

By Mark Johnstone on May 1, 2009
It is important to remember that Amdahl’s law speaks to the percent of time spent in code, not the number of lines of code. So, if 90% of the time is spent in 100 lines of a 1-million-line program, then you only need to parallelize those 100 lines to get up to a 10X speedup. In the case of chip optimization (referenced in the blog), the “magic” spot is the timing engine; it should be both incremental and parallel.
It is also important to remember that the runtime of EDA tools is a function of the size of the design, and the size of the design is doubling every 2 years! So, a tool that spends 90% of its time in one block of code for a given design will spend 95% of its time in that code 2 years from now and 96.5% of its time in that code 4 years from now! When looked at from this point of view, Amdahl’s law isn’t such a limiting factor.

By Tom Spyrou on May 6, 2009
I agree that it is the time spent in code, not the number of lines. If the post read otherwise it was not intentional. The timing engine is definitely a key piece that we are focused on.
I also agree that over time the tough algorithmic pieces may begin to dominate the runtime more than they do for today's designs. Right now we are focused on driving the engineering team to an easily measurable goal: for a given design, how well does it scale with more CPUs?
