Transaction-based acceleration can speed up simulation hundreds of times, but you need to develop a good strategy to take full advantage of it, according to a paper authored by Cadence and Broadcom and presented at the recent DVCon conference. The paper detailed Broadcom's experience using transaction-based acceleration with the Cadence Palladium hardware verification platform.
The paper was titled "Transaction-Based Acceleration - Strong Ammunition in Any Verification Arsenal." It was presented by Chandrasekhar Poorna, principal engineer at Broadcom. This and other archived papers will be available on the DVCon web site on April 18.
Some quick background: With simulation acceleration, the design under test (DUT) is synthesized into hardware while a simulation testbench runs on a workstation. While basic acceleration traditionally involves a signal-level interface between the software running the testbench and the hardware acceleration platform, transaction-based acceleration brings this interface up to the transaction level, greatly reducing communication overhead and improving performance even further.
As a verification methodology, it provides a bridge between simulation and in-circuit emulation. I provided some further background information on transaction-based acceleration in a previous blog post.
Verification Challenges at Several Levels
Poorna began his talk with a look at verification challenges and approaches at the sub-block, block, and chip levels. RTL simulation is reasonably effective at the sub-block level (less than 1 million gates), but at the block level (5-20M gates) long debug cycles begin to hamper productivity. At the chip level (over 30M gates), simulation performance greatly limits the types of tests that can be run.
In-circuit emulation (ICE) comes into play at the chip level, but it requires all blocks to be ready for synthesis, requires real drivers and firmware, and provides a different debug environment from simulation. FPGA prototyping, more suited for application software validation, has long bring-up times and limited debug. Something else is needed, but what? After reviewing some possible solutions, Poorna discussed the value of transaction-based acceleration.
As shown below, signal-based acceleration uses a bit-by-bit data exchange between the testbench and the DUT. Transaction-based acceleration uses the Accellera Standard Co-Emulation Modeling Interface (SCE-MI 2.0) in conjunction with bus-functional models. Because the testbench and the hardware exchange whole transactions rather than individual signal values, communication channel overhead drops sharply.
Source: Cadence/Broadcom DVCon 2011 paper
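To make the overhead argument concrete, here is a toy Python model that counts how many times data crosses the testbench-to-hardware channel under each approach. It is only a sketch: the Packet class, the 4-byte bus width, the 64-byte payload, and the packet count are all hypothetical values chosen for illustration, and nothing here is SCE-MI or Palladium API. The ratio it produces reflects these made-up numbers, not the speedups reported in the paper.

```python
from dataclasses import dataclass

BUS_WIDTH_BYTES = 4  # hypothetical bus width, for illustration only


@dataclass
class Packet:
    payload: bytes


def beats(packet):
    """Split a payload into bus-width beats, one per clock cycle,
    the way a bus-functional model would drive them onto the pins."""
    data = packet.payload
    return [data[i:i + BUS_WIDTH_BYTES]
            for i in range(0, len(data), BUS_WIDTH_BYTES)]


def signal_level_crossings(packets):
    # Signal-level interface: the testbench drives every beat itself,
    # so the channel is crossed once per clock cycle of bus activity.
    return sum(len(beats(p)) for p in packets)


def transaction_level_crossings(packets):
    # Transaction-level interface: the testbench sends one message per
    # packet; the bus-functional model on the hardware side expands it
    # into beats without going back across the channel.
    return len(packets)


# Hypothetical workload: 100 packets of 64 bytes each.
packets = [Packet(bytes(64)) for _ in range(100)]

signal = signal_level_crossings(packets)
transaction = transaction_level_crossings(packets)
print(f"signal-level crossings:      {signal}")
print(f"transaction-level crossings: {transaction}")
print(f"reduction factor:            {signal / transaction:.0f}x")
```

In this model the reduction factor is simply the number of beats per packet, so wider transactions save more channel traffic. Real gains also depend on factors the model ignores, such as channel latency and how much work stays in the testbench.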
Poorna listed several requirements that Broadcom has for acceleration. These include the ability to reuse and leverage the existing testbench environment (with features such as randomization and metrics), to match the debug visibility of RTL simulation, and to achieve portability across simulators and hardware platform vendors by using the SCE-MI and SystemVerilog standards.
"There was a big gap between simulation and ICE, and transaction-based acceleration allows us to bridge that gap," Poorna said. "We can run our existing [testbench] environment, and verify and provide a cleaner database to the guys going into ICE. It cuts down our time in going from simulation to emulation and as a result, there are fewer bugs at that [ICE] stage."
The following chart shows how Broadcom applies various verification technologies including transaction-based acceleration and ICE.
Source: Cadence/Broadcom DVCon 2011 paper
Preparing for Transaction-Based Acceleration
Poorna described Broadcom's "plan of attack" for getting the best use out of transaction-based acceleration. Steps include:
- Isolate the interfaces that enable communication between the testbench and DUT
- Define a data transfer strategy for each interface, choosing SystemVerilog DPI functions or SCE-MI transaction pipes
- Tune period and phase relationships between different clocks
- Make sure clocks reside on hardware platform only
- Move bus-level functionality and data checking to monitors
What really counts are results, and Poorna had them. A 32M-gate chip-level design with 1,600 packets took 16 hours to run on a third-party SystemVerilog simulator. When accelerated with Palladium, the simulation ran in 250 seconds -- a 230X speedup. The compile time of 1.5 hours was fairly short. A block-level example showed a 292X speedup over simulation.
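The chip-level figure is easy to check by hand. The short Python calculation below uses only the numbers quoted above (16 hours, 250 seconds, 1,600 packets); the variable names are just for illustration, and the per-packet times are a derived average rather than something the paper reported.

```python
# Figures quoted for the chip-level example.
SIM_SECONDS = 16 * 3600   # 16 hours on the software simulator
ACCEL_SECONDS = 250       # same test accelerated on Palladium
PACKETS = 1600

# Run-time speedup (compile time is not included).
speedup = SIM_SECONDS / ACCEL_SECONDS

# Average wall-clock time per packet under each approach.
sim_per_packet = SIM_SECONDS / PACKETS
accel_per_packet = ACCEL_SECONDS / PACKETS

print(f"speedup:          {speedup:.1f}x")
print(f"simulation:       {sim_per_packet:.2f} s/packet")
print(f"acceleration:     {accel_per_packet:.3f} s/packet")
```

The ratio works out to 230.4, which matches the 230X quoted, and corresponds to an average of 36 seconds per packet in simulation versus roughly 0.16 seconds per packet with acceleration.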
"Because things run very fast, our debug time is extremely low," Poorna commented. He also pointed out that the Palladium handles both acceleration and ICE. He cautioned, however, that there is a learning curve: "You need to invest some time to learn the methodology and the tools. But once you pass that initial curve, things will be more incremental and can be managed more easily."
A listing of other blog posts about DVCon can be found here.