Can EDA vendors parallelize millions of lines of legacy code, or do they need to rewrite everything in order to run on multicore and many-core platforms? In a Dec. 8 interview for Intel Software Network TV, Tom Spyrou, distinguished engineer at Cadence, described how legacy code can be parallelized for 4 or 8 processors. But scaling beyond that will require some significant re-coding, he said.
Intel Software Network TV runs a weekly Parallel Programming Talk at 8 a.m. Pacific time on Tuesdays. In addition to watching online, you can listen to it live or download it from BlogTalk Radio. Hosts are Aaron Tersteeg, Intel multicore community manager, and Clay Breshears, master of the parallel universe (they have some interesting titles at Intel). The program looks like a great resource for parallel programmers.
As I noted in a previous blog, Tom has been working to parallelize the Encounter Digital Implementation System for the past three years. He is active in the parallel programming community, and he writes a blog for the Intel Software Network.
Why write the blog, and participate in the interview? “Intel has a large program in place to educate developers and fund others to educate developers worldwide in parallel programming,” Tom said. “Since we have made a lot of progress in parallel computing at Cadence and have been able to retrofit existing applications to some degree, this is interesting for Intel and their software community.”
In the Dec. 8 Intel TV interview, Tom noted that large legacy software applications don’t have to be rewritten to run on today’s multicore processors. The trick, he said, is to find pieces of code that are amenable to parallelization without rewriting the whole application. With luck, you can identify pieces that take up 30% or 40% of the run time and show a big performance increase in a short period of time. After that it gets much harder: as Amdahl’s Law describes, whatever code remains serial puts a hard ceiling on the overall speedup, no matter how many processors you add.
While you can “buy yourself some time” using tricks and techniques to parallelize legacy code, if you want to scale much beyond 8 processors you will have to rethink applications, Tom said. On many-core processors, Amdahl’s Law means that even a small amount of remaining serial code becomes a severe bottleneck.
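To put rough numbers on the Amdahl's Law point, here is a minimal Python sketch of the standard formula, speedup = 1 / ((1 - p) + p / n), where p is the fraction of run time that has been parallelized and n is the processor count. The 40% figure comes from the interview; the 95% figure and the function name `amdahl_speedup` are my own illustrative assumptions, not anything Tom or Cadence reported.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Ideal speedup under Amdahl's Law for a given parallel fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)

# 0.40 is the "30% or 40% of run time" case from the interview;
# 0.95 is a hypothetical, heavily reworked application.
for fraction in (0.40, 0.95):
    for cores in (4, 8, 64):
        speedup = amdahl_speedup(fraction, cores)
        print(f"parallel={fraction:.0%} cores={cores:>2} speedup={speedup:.2f}x")
```

With 40% of the run time parallelized, the speedup can never reach 1.67x however many cores you add, because the remaining 60% still runs serially. Even at 95% parallel, 64 cores give only about a 15x speedup, which is the arithmetic behind the claim that scaling past 8 processors means rethinking the application.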
Tom also made the following points in the 20-minute interview:
- Start by profiling and optimizing serial code. “If you parallelize inefficient code, you’ll get inefficient parallel code,” he noted.
- Message passing is a good technique because it stays localized. A message passing library is helpful but not required – you can also use low-level sockets.
- Most code libraries are not thread safe. But if operations are independent, you can run in multiple processes.
- Debugging parallel programs is very hard. The best approach is to keep the architecture as simple as possible.
- Shared memory programming works well for EDA applications such as place and route because algorithms traverse the entire design multiple times.
- Automatic thread analysis tools have not been useful for Cadence.
- The end user needs to tell the load management tool how many CPUs to reserve.
- A static pre-allocation of tasks to CPUs doesn’t work. Dynamic scheduling is the key to getting scalability out of parallel programming.
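The last point, static pre-allocation versus dynamic scheduling, is easy to illustrate with a toy simulation. The Python sketch below does not run anything in parallel and has nothing to do with how Cadence schedules work. It only computes the finish time (makespan) for a made-up list of task durations under two policies: splitting tasks into equal-sized chunks up front, and handing the next task to whichever worker frees up first. The function names and the task list are hypothetical.

```python
import heapq

def static_makespan(durations, workers):
    """Pre-assign contiguous, equal-sized chunks of tasks to each worker."""
    chunk = -(-len(durations) // workers)  # ceiling division
    loads = [sum(durations[i:i + chunk])
             for i in range(0, len(durations), chunk)]
    return max(loads)

def dynamic_makespan(durations, workers):
    """Give the next task to whichever worker becomes free first."""
    free_at = [0] * workers          # time at which each worker is next idle
    heapq.heapify(free_at)
    for duration in durations:
        start = heapq.heappop(free_at)       # earliest-available worker
        heapq.heappush(free_at, start + duration)
    return max(free_at)

# Hypothetical workload: a few expensive tasks followed by many cheap ones.
tasks = [8, 7, 6, 5] + [1] * 12

print("static :", static_makespan(tasks, 4))
print("dynamic:", dynamic_makespan(tasks, 4))
```

In this made-up workload the static split lands all four expensive tasks on one worker, so the run takes 26 time units while three workers sit mostly idle. The dynamic policy finishes in 10. Real task costs are rarely known in advance, which is why a fixed up-front assignment tends to balance poorly.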
EDA software is some of the most complex in the world, so if it can be parallelized, other applications can be too. Cadence software helps Intel and other companies build multicore chips – so it’s only fitting that Cadence’s experience with parallel programming should help Intel’s developer community make use of those chips.