200X. That's the number with which Moshe Berkovich, senior engineer at fabless semiconductor provider CSR, started a 15-minute talk that is now a recorded presentation on Cadence.com. And 200X is the performance improvement that his team was able to get by adopting a "hybrid" methodology for software development that combines simulation acceleration and virtual prototyping.
Berkovich was a speaker at the Cadence DAC Theater at the Design Automation Conference (DAC 2014) in June, where over 40 speakers -- mostly customers and partners -- offered informal presentations. Audio recordings and slides are now available for most of those talks, including Berkovich's presentation, at the Cadence DAC microsite.
Both virtual prototyping and emulation/acceleration allow pre-silicon software development. In the hybrid use model, fast CPU models run on a workstation as part of the Cadence Virtual System Platform (VSP), while RTL for most of the rest of the SoC hardware runs on the Palladium XP emulator. But in reality, it's not as simple as that. As Berkovich's team discovered, you have to know what to place on the VSP side and what goes in the emulator, and understand how the various SoC blocks will interact.
No One Platform Fits All
In his presentation, Berkovich first reviewed the challenges of pre-silicon software development. He noted that complex systems on chip (SoCs) need massive amounts of software development, that meeting time-to-market goals demands stable, working software, and that hardware/software co-simulation is critical for verification. A software engineer needs a development platform with "very specific requirements," including high frequency (the ability to run long tests), debug capabilities, and fast bring-up times.
Berkovich reviewed the advantages and limitations of existing hardware/software development platforms. Here are a few key points:
- Simulation acceleration with Palladium XP runs up to 1.5MHz, offers fast bring-up times, and has good debug capabilities. However, it's too slow for software development by itself - it would take over 30 minutes to boot Linux.
- Virtual prototyping as provided by VSP runs up to 100MHz, and offers good debug capabilities and early availability. However, timing is not accurate and transaction-level modeling (TLM) models of the semiconductor IP may not be available.
- FPGA prototyping can reach speeds of 25MHz with CSR designs, but bring-up is slow and debug is limited.
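To put these frequencies in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes, purely for illustration, that a Linux boot costs a fixed number of clock cycles and that each platform sustains its quoted peak frequency; neither assumption comes from the talk, and "over 30 minutes" is treated as a lower bound of exactly 30 minutes.

```python
# Rough estimate only: assumes boot cost is a fixed cycle count and that
# each platform sustains its quoted peak frequency (both are assumptions).
PALLADIUM_HZ = 1.5e6
BOOT_SECONDS_ON_PALLADIUM = 30 * 60  # "over 30 minutes", used as a lower bound

# Implied cycle count for the boot on the 1.5MHz platform.
boot_cycles = PALLADIUM_HZ * BOOT_SECONDS_ON_PALLADIUM


def boot_seconds(freq_hz, cycles=boot_cycles):
    """Time to execute `cycles` clock cycles at `freq_hz`."""
    return cycles / freq_hz


platforms = {
    "Palladium XP": 1.5e6,
    "FPGA prototype": 25e6,
    "VSP": 100e6,
}

for name, hz in platforms.items():
    print(f"{name}: about {boot_seconds(hz):.0f} s")
```

Under these assumptions the same boot drops from half an hour to a couple of minutes on the FPGA and to well under a minute on the virtual platform, which is why raw frequency matters so much to software teams.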
In view of these limitations, CSR engineers worked with Cadence to implement a Palladium-VSP hybrid approach. "You take off-the-shelf TLM models and put them on the VSP side," Berkovich said. "You put the rest of the hardware in the Palladium and let it run. You still maintain the debug capabilities you had with Palladium and you do it fast and have a quick turnaround time."
The partitioning takes care, however. What if the CPU needs to access DDR memory frequently, and the CPU is on the VSP side while the DDR is running in Palladium? If the DDR is running at 1.5MHz, every transaction between the CPU and the DDR slows down to synchronize to that rate. The remedy is to move the DDR to the VSP side as well. Now there is very fast (100MHz) communication between the CPU and the DDR.
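The effect of this synchronization can be illustrated with a toy model in Python. It is a simplification, not anything presented in the talk: it assumes every CPU access costs one clock period on whichever side services it, and `remote_fraction` is a hypothetical parameter for the share of accesses that cross into Palladium.

```python
VSP_HZ = 100e6        # fast side, from the figures quoted above
PALLADIUM_HZ = 1.5e6  # slow side, from the figures quoted above


def effective_rate_hz(remote_fraction, fast_hz=VSP_HZ, slow_hz=PALLADIUM_HZ):
    """Average access rate when `remote_fraction` of accesses hit the slow side.

    Toy model (assumption): each access takes one clock period of the side
    that services it, so the average time per access is a weighted sum.
    """
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    seconds_per_access = (
        (1.0 - remote_fraction) / fast_hz + remote_fraction / slow_hz
    )
    return 1.0 / seconds_per_access


for fraction in (0.0, 0.01, 0.1, 0.5, 1.0):
    rate_mhz = effective_rate_hz(fraction) / 1e6
    print(f"{fraction:>5.0%} of accesses in Palladium: {rate_mhz:.1f} MHz")
```

Even in this crude model, a small share of cross-boundary accesses pulls the effective rate far below 100MHz, which is the reason for keeping heavily used blocks such as the DDR next to the CPU.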
There may still be a bottleneck, however, because there are other blocks still in Palladium that must be accessed by the CPU and/or DDR. One solution is to instantiate two identical DDR IP blocks - one on the VSP side, and one on the Palladium side. A "back door interface" connects them and ensures coherency. Further, other blocks frequently accessed by the CPU - such as the UART - can run on the VSP side as well.
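The idea of two DDR instances kept coherent can be sketched conceptually in Python. The talk does not describe how the back door interface works, so everything here is an assumption: the class name `MirroredDdr`, its methods, and the write-through policy are hypothetical and only illustrate the principle that each side reads its own local copy while writes are propagated to the other.

```python
class MirroredDdr:
    """Two copies of one memory, kept identical through a back-door path.

    Hypothetical illustration: a write-through policy is assumed; the real
    interface may synchronize differently.
    """

    def __init__(self):
        self.vsp_copy = {}        # copy next to the fast CPU model
        self.palladium_copy = {}  # copy next to the RTL blocks
        self.backdoor_transfers = 0

    def _write(self, local, remote, addr, value):
        local[addr] = value
        remote[addr] = value      # back-door update keeps the mirror coherent
        self.backdoor_transfers += 1

    def cpu_write(self, addr, value):
        self._write(self.vsp_copy, self.palladium_copy, addr, value)

    def hw_write(self, addr, value):
        self._write(self.palladium_copy, self.vsp_copy, addr, value)

    def cpu_read(self, addr):
        # Served locally on the VSP side; no trip to the emulator.
        return self.vsp_copy.get(addr, 0)

    def hw_read(self, addr):
        # Served locally on the Palladium side.
        return self.palladium_copy.get(addr, 0)


ddr = MirroredDdr()
ddr.cpu_write(0x1000, 42)   # CPU fills a buffer
ddr.hw_write(0x2000, 7)     # an RTL block writes a result
print(ddr.hw_read(0x1000), ddr.cpu_read(0x2000), ddr.backdoor_transfers)
```

The point of the design is that reads, which typically dominate, never cross the boundary; only writes pay for the back-door transfer.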
A Different Approach
Berkovich noted that the hybrid approach results in some important "differences" that hardware and software development teams must understand as they write tests. For example, the modeled CPU is not cycle-accurate, so performance tests will come out differently. Also, CPU models may have some limitations, such as only supporting in-order execution.
The proof is in the numbers, and Berkovich shared some impressive results. First, he noted that a "compressed" Linux boot took about 22 minutes on the Palladium XP alone and 16 seconds using the hybrid method. Next, a "video test" involving a video block on the Palladium side took 18 minutes in Palladium XP and 20 seconds using the hybrid approach. Finally, a full Linux boot with all features enabled took more than an hour on Palladium and less than 20 seconds on the hybrid.
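The speedups implied by these figures can be worked out directly. The short Python sketch below uses only the times quoted above; for the full Linux boot, "more than an hour" and "less than 20 seconds" are taken at their bounds, so the computed ratio is a lower bound on the real one.

```python
# (test name, Palladium XP time in seconds, hybrid time in seconds)
results = [
    ("compressed Linux boot", 22 * 60, 16),
    ("video test", 18 * 60, 20),
    # Bounds only: "more than an hour" vs. "less than 20 seconds".
    ("full Linux boot", 60 * 60, 20),
]


def speedup(palladium_seconds, hybrid_seconds):
    """Ratio of Palladium-only run time to hybrid run time."""
    return palladium_seconds / hybrid_seconds


for name, palladium_s, hybrid_s in results:
    print(f"{name}: {speedup(palladium_s, hybrid_s):.1f}X")
# compressed Linux boot: 82.5X
# video test: 54.0X
# full Linux boot: 180.0X (a lower bound)
```

The full Linux boot works out to at least 180X, in line with the roughly 200X headline figure, while the shorter tests show smaller but still substantial gains.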
"These tests showed us that the hybrid is excellent for CPU-centric designs," Berkovich concluded. "You can think about the hybrid as a bridge between two technologies - the Palladium simulation accelerator and the VSP. This bridge gave us a very fast platform with excellent debug capabilities. For us, the hybrid gave us the best of both worlds."
To listen to this presentation and see the slides, click here and scroll down to 1:30 pm Wednesday, June 4. No registration is needed.
Related Blog Posts
Palladium XP II - Two New Use Models for Hardware/Software Verification
Designer View: New Emulation Use Models Employ Virtual Targets
Q&A: New Directions for Hardware-Assisted Verification