As noted in part one of this blog series, porting the Encounter Digital Implementation System (EDI) to multicore platforms was a challenging task. That’s no less true for Cadence’s efforts to parallelize the Virtuoso Spectre Circuit Simulator, although the challenges were somewhat different.
In December 2008, Cadence introduced the Virtuoso Accelerated Parallel Simulator (APS), a multicore-ready version of Spectre that was part of the Multi-Mode Simulation Solution (MMSIM) 7.1 release. APS includes DC analysis, transient analysis, and transient noise, and provides the same level of accuracy as Spectre, but does not include all of Spectre’s analysis modes.
With EDI, engineers faced the challenge of parallelizing a collection of many different tools. With APS, the focus was on a single tool, but in some ways the challenge was more difficult, said Dan Feng, senior architect. He noted that SPICE simulators have two primary sources of computation: device evaluation and the matrix solution. Device evaluation is fairly easy to parallelize, and Cadence had already done some of that work with the previous Spectre Turbo release.
But APS also parallelizes the matrix solution, which is considerably more difficult. “The matrix is usually very, very sparse,” Dan said. “It is hard to cluster things to keep all the CPUs busy.” Despite the difficulty, he said, about 90 percent of the APS code has now been parallelized.
APS currently supports multi-threading, and distributed processing is coming in a future release. “Right now the sweet spot is 4 cores,” Dan said. With four cores, he noted, users can expect about a 2.5X speedup over single-threaded performance on circuits with thousands of components for the full tool flow. With eight cores, that goes up to about a 3.3X speedup. The benefit of going from four to eight cores can range from 20 to 60 percent depending on the circuit. As a side benefit of rearchitecting the code for parallelization, APS runs significantly faster in single-threaded mode on large designs.
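To put those speedup numbers in perspective, here is a minimal sketch that works backward from a measured speedup to an effective sequential fraction, using the Karp-Flatt metric. This is an illustration, not something Cadence described; the function name `effective_serial_fraction` is hypothetical, and the only inputs are the 2.5X and 3.3X figures quoted above.

```python
def effective_serial_fraction(speedup: float, cores: int) -> float:
    """Karp-Flatt metric: the serial fraction implied by a measured speedup.

    The result lumps together everything that keeps scaling below ideal:
    truly sequential code, synchronization, load imbalance, and memory
    bandwidth limits. It is not a count of unparallelized source lines.
    """
    if cores < 2:
        raise ValueError("cores must be at least 2")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (1.0 / speedup - 1.0 / cores) / (1.0 - 1.0 / cores)


# Speedups quoted in the text (assumed to be typical, not guaranteed)
for cores, speedup in [(4, 2.5), (8, 3.3)]:
    fraction = effective_serial_fraction(speedup, cores)
    print(f"{cores} cores at {speedup}X -> effective serial fraction {fraction:.2f}")
```

Both data points come out near 0.2, which suggests that overheads beyond purely sequential code account for part of the gap between ideal and observed scaling.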
Just as Amdahl’s Law loomed large over the EDI parallelization effort, it was a major issue for APS, Dan said. Any areas of code that remain sequential will sharply limit the overall speedup that can be gained from multi-threading, and the problem worsens as the number of cores increases. To get a good performance boost from 16 cores, Dan noted, you probably need to parallelize 95 percent of the code. Managing memory bandwidth limitations will also be a challenge at 16 cores and beyond. Even so, Dan said, “I think it is hopeful. Even now we see some benefit at 16 cores.”
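For readers who want to see the arithmetic, Amdahl’s Law can be sketched in a few lines. This is the textbook formula, not a model of APS itself; the function name `amdahl_speedup` is made up for illustration, and the result is an idealized upper bound that ignores memory bandwidth and synchronization costs.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup under Amdahl's Law: 1 / (s + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Compare a few parallel fractions at the core counts discussed in the text
for p in (0.90, 0.95, 0.99):
    row = ", ".join(f"{n} cores: {amdahl_speedup(p, n):.1f}X" for n in (4, 8, 16))
    print(f"{p:.0%} parallel -> {row}")
```

At 16 cores, the bound is about 6.4X with 90 percent of the work parallelized and about 9.1X with 95 percent, which is why the last few percent matter so much as core counts grow.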
Dan noted that a lot of the legacy Spectre code is sequential, and some of it had to be rewritten to parallelize it and make it thread-safe. The multi-threading work has been ongoing for about a year and a half. One major challenge is debugging. “I think the utilities available to assist debugging of multi-threaded applications are still very premature,” Dan said.
Now, it’s the little things that count. Dan said there are still some small, sequential areas of code that need to be parallelized. Some might comprise a tenth of a percent of the overall program, but if you have 50 or 100 such code fragments, it’s still a fair amount of work – and a significant potential improvement, given Amdahl’s Law.
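A quick back-of-the-envelope sketch shows why those small fragments add up. It assumes, purely for illustration, that each fragment is exactly a tenth of a percent of the run time and that the fragments are the only sequential code left; `FRAGMENT_SHARE` and `max_speedup` are hypothetical names, not anything from APS.

```python
FRAGMENT_SHARE = 0.001  # assumed: each fragment is 0.1 percent of run time


def max_speedup(serial_fraction: float) -> float:
    """Amdahl's Law limit as the core count grows without bound: 1 / s."""
    if not 0.0 < serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in (0, 1]")
    return 1.0 / serial_fraction


# Fragment counts mentioned in the text
for count in (50, 100):
    serial = count * FRAGMENT_SHARE
    print(f"{count} fragments -> {serial:.1%} sequential, "
          f"speedup capped near {max_speedup(serial):.0f}X on any number of cores")
```

Under these assumptions, 50 fragments cap the speedup near 20X and 100 fragments cap it near 10X, no matter how many cores are available.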
“I think a lot of things can be done to further parallelize the code so we see better scaling,” Dan said. “It’s probably an endless process. You never quite reach 100 percent.”