CDNLive Paper Preview: RTL Performance Analysis of ARM Interconnect IP

Filed under: Industry Insights, ARM, RTL, TLM, Simulation, AMBA, CDNlive, CCI, CDN Live, interconnect, performance analysis, Interconnect Workbench, CDNLive 2013, cache coherent interconnect, traffic models, system IP, CoreLink, Orme, AMBA Designer, Heaton, interconnect IP

System-on-chip (SoC) interconnect must meet the performance requirements of increasingly complex and demanding chips, but traditional modeling and verification techniques shed little light on bandwidth and latency. A new approach to analyzing and debugging the performance of ARM system IP (interconnect) will be presented Tuesday, March 12, at CDNLive Silicon Valley in Santa Clara, California.

The new approach will be discussed by William Orme, strategic marketing manager for ARM, and Nick Heaton, senior solutions architect at Cadence, in session DVSY101 on Tuesday at 4:45 pm. It uses ARM® AMBA® Designer to generate the RTL interconnect, then uses the Cadence® Interconnect Workbench (described in a previous blog post) to automatically analyze bandwidth and latency across hundreds of simulations.

An interconnect performance analysis solution is needed because adequate performance is essential to successful SoC delivery. Heterogeneous multi-core architectures are becoming very complex, and the interconnect must intelligently manage traffic from different processors that share the same memory system. A typical ARM CoreLink™ CCI-400 Cache Coherent Interconnect, for example, can consume hundreds of thousands of gates. SoC complexity calls for features such as quality of service (QoS), multiple power and clock domains, dynamic queuing, multi-processor support, and traffic management, and many different traffic generators must be set up and verified for performance.

What Doesn't Work

So why not just ask system architects to run interconnect performance analysis with transaction-level models (TLM)? According to Heaton, models at this level aren't accurate enough, and making them accurate would make them very slow. TLM models "abstract the behavior of the interconnect and the memory system completely away," he noted. "You can't measure the way those systems really behave without being cycle accurate."

What's needed, then, is RTL. And the people asked to do interconnect performance analysis, Heaton said, are most often RTL verification engineers. "They have been pushed into this kind of analysis and they are struggling," he said.

But pure RTL simulation doesn't really work for interconnect performance analysis either. Simulation, Heaton said, lets users see waveforms over time. With a complex SoC interconnect, hundreds of memory transactions could be in flight at any one time, and trying to pick out problems by staring at waveforms is "almost impossible." A better approach is to look at statistical distributions.
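To make the contrast with waveform viewing concrete, here is a minimal Python sketch of reducing many overlapping transactions to summary statistics. It assumes each completed transaction has already been extracted from the simulation as a record with issue and completion cycles and a byte count; the `Txn` record, its field names, and the `summarize` helper are hypothetical and are not part of Interconnect Workbench or any Cadence or ARM tool.

```python
from dataclasses import dataclass
from statistics import mean, quantiles
from typing import Dict, List


@dataclass(frozen=True)
class Txn:
    """One completed memory transaction extracted from a simulation log.

    Hypothetical record: real trace formats will differ.
    """
    txn_id: int
    master: str
    start_cycle: int  # cycle on which the request was issued
    end_cycle: int    # cycle on which the last response beat arrived
    num_bytes: int

    @property
    def latency(self) -> int:
        return self.end_cycle - self.start_cycle


def summarize(txns: List[Txn]) -> Dict[str, float]:
    """Reduce many overlapping transactions to latency and bandwidth statistics.

    Needs at least two transactions (a statistics.quantiles requirement on
    older Python versions).
    """
    lat = [t.latency for t in txns]
    # Observation window: first issue to last completion.
    window = max(t.end_cycle for t in txns) - min(t.start_cycle for t in txns)
    # pct[i] is the (i + 1)-th percentile, interpolated between samples.
    pct = quantiles(lat, n=100, method="inclusive")
    return {
        "count": len(lat),
        "mean_latency": mean(lat),
        "p50_latency": pct[49],
        "p99_latency": pct[98],
        "max_latency": max(lat),
        "bytes_per_cycle": sum(t.num_bytes for t in txns) / window,
    }


# Four transactions, several of them in flight at the same time.
trace = [
    Txn(0, "cpu", 0, 10, 64),
    Txn(1, "gpu", 2, 14, 64),
    Txn(2, "cpu", 5, 12, 64),
    Txn(3, "dma", 6, 40, 64),
]
stats = summarize(trace)
print(stats)
```

Because each transaction contributes one latency sample no matter how many others overlap it, the statistics stay readable even with hundreds of transactions in flight.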

What Does Work

"Let's not look at one simulation," Heaton said. "Let's look at 100 simulations that are variations of the same scenario, and do a latency distribution. You can very quickly spot outliers, which are occasional transactions that are taking a lot longer than other ones. You can pick them out and debug them. If it was pure RTL simulation, you'd never find it."

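The multi-run workflow Heaton describes can be sketched as follows, again in Python and under stated assumptions. `run_variation` is a hypothetical stand-in for one simulation of a scenario variant and produces synthetic latencies (not measured data) with a rare long stall injected. Pooled latencies from 100 seeds form the distribution, and transactions beyond Tukey's far-out fence (Q3 + 3 × IQR) are flagged together with the seed and transaction ID needed to replay and debug them. The post does not say which outlier criterion the Workbench uses; the IQR fence is just one common choice.

```python
import random
from statistics import quantiles
from typing import Dict, List, Tuple


def run_variation(seed: int, n_txns: int = 200) -> List[Tuple[int, int]]:
    """Hypothetical stand-in for one simulation of a scenario variant.

    Returns (txn_id, latency_in_cycles) pairs. The latencies are synthetic:
    mostly 20-60 cycles, with a rare 300-500 cycle stall added to mimic an
    occasional transaction that takes far longer than the rest.
    """
    rng = random.Random(seed)  # the seed selects the variation
    out = []
    for txn_id in range(n_txns):
        latency = rng.randint(20, 60)
        if rng.random() < 0.002:
            latency += rng.randint(300, 500)
        out.append((txn_id, latency))
    return out


def find_outliers(runs: Dict[int, List[Tuple[int, int]]], k: float = 3.0):
    """Pool latencies from all runs and flag those above Q3 + k * IQR."""
    pooled = [lat for run in runs.values() for _, lat in run]
    q1, _, q3 = quantiles(pooled, n=4)
    fence = q3 + k * (q3 - q1)
    flagged = [
        (seed, txn_id, lat)
        for seed, run in runs.items()
        for txn_id, lat in run
        if lat > fence
    ]
    return fence, flagged


# 100 variations of the same scenario, one per seed.
runs = {seed: run_variation(seed) for seed in range(100)}
fence, flagged = find_outliers(runs)
for seed, txn_id, lat in flagged:
    print(f"seed {seed}, txn {txn_id}: {lat} cycles (fence {fence:.1f})")
```

Each flagged entry points to a specific run and transaction, which is the starting point for debugging that one slow transaction in detail.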
While the AMBA Designer generates RTL for the interconnect, the rest of the SoC doesn't need to be in RTL. For main memory, you could use an approximately timed (AT) model. You would typically replace processors with approximate traffic models. "The traffic analogy is useful because there are a lot of different masters and traffic generators," Orme said. "You have to manage all this competing traffic and make sure no one loses out."
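A toy illustration of approximate traffic models competing for a shared resource, under a hypothetical setup: each master is a `TrafficGen` that issues a request on any cycle with a fixed probability, and a single memory port grants one request per cycle using round-robin arbitration. This is not how a CCI-400 or any real interconnect arbitrates; it only shows how per-master grant counts and wait times reveal whether a master is losing out.

```python
import random
from collections import deque
from typing import Dict, List, Tuple


class TrafficGen:
    """Approximate traffic model standing in for a processor or other master.

    Hypothetical parameterization: on each cycle a request is issued with
    probability `rate`. Real traffic models are usually richer.
    """

    def __init__(self, name: str, rate: float, seed: int) -> None:
        self.name = name
        self.rate = rate
        self.rng = random.Random(seed)
        self.pending = deque()      # issue cycles of requests not yet granted
        self.waits: List[int] = []  # cycles each granted request waited

    def tick(self, cycle: int) -> None:
        if self.rng.random() < self.rate:
            self.pending.append(cycle)


def simulate(masters: List[TrafficGen], cycles: int) -> Dict[str, Tuple[int, float]]:
    """Round-robin arbiter granting one request per cycle to a shared memory port.

    Returns {master name: (grants, mean wait in cycles)}.
    """
    start = 0  # index of the master with highest priority this cycle
    for cycle in range(cycles):
        for m in masters:
            m.tick(cycle)
        for offset in range(len(masters)):
            idx = (start + offset) % len(masters)
            m = masters[idx]
            if m.pending:
                m.waits.append(cycle - m.pending.popleft())
                start = (idx + 1) % len(masters)  # winner drops to lowest priority
                break
    return {
        m.name: (len(m.waits), sum(m.waits) / len(m.waits) if m.waits else 0.0)
        for m in masters
    }


# Three masters requesting on every cycle: the port is oversubscribed.
busy = [TrafficGen(name, 1.0, seed) for seed, name in enumerate(["cpu", "gpu", "dma"])]
print(simulate(busy, 300))
```

With every master requesting on every cycle, round-robin gives each an equal share of grants while the queues, and therefore the waits, keep growing. Lowering one master's `rate` or raising another's is a quick way to see how contention shifts waiting time between masters.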

Today's interconnect is highly configurable, and the AMBA Designer can generate RTL for a new configuration at the push of a button. The Interconnect Workbench takes metadata from the AMBA Designer tool and automatically generates testbenches for ARM system IP. In a traditional flow, engineers would have to hand-code the testbench. "You can be running simulations in under an hour without writing a line of code," Heaton said.
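As an illustration of why configuration metadata removes the need to hand-code a testbench, here is a hypothetical Python sketch in which each interconnect port described in the metadata yields a monitor plus either a traffic generator (for ports that connect to masters) or a memory model (for ports that connect to memory). The `Port` fields, the component tuples, and `generate_testbench` are invented for this example; they do not reflect the actual metadata format exchanged between AMBA Designer and Interconnect Workbench.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Port:
    """One interconnect port as it might appear in configuration metadata (hypothetical)."""
    name: str
    connects_to: str  # "master" (CPU, GPU, DMA...) or "memory"
    protocol: str     # e.g. "AXI4", "ACE"
    data_bits: int


Component = Tuple[str, str, str, int]  # (kind, port name, protocol, bytes per beat)


def generate_testbench(ports: List[Port]) -> List[Component]:
    """Derive testbench components from port metadata instead of hand-coding them."""
    components: List[Component] = []
    for p in ports:
        beat = p.data_bits // 8
        # Every port gets a passive monitor so latency and bandwidth can be measured.
        components.append(("monitor", p.name, p.protocol, beat))
        if p.connects_to == "master":
            # A traffic model replaces the real master on this port.
            components.append(("traffic_gen", p.name, p.protocol, beat))
        elif p.connects_to == "memory":
            components.append(("memory_model", p.name, p.protocol, beat))
        else:
            raise ValueError(f"unknown port type for {p.name}: {p.connects_to}")
    return components


metadata = [
    Port("cpu0", "master", "ACE", 128),
    Port("dma", "master", "AXI4", 64),
    Port("ddr", "memory", "AXI4", 128),
]
for component in generate_testbench(metadata):
    print(component)
```

Regenerating the interconnect with a different port list then regenerates the matching testbench components with no further hand-editing.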

The end result is that users can run simulations for different implementations, configurations, and use cases, and get a clearer picture of the impact of design decisions on the way the system will perform. They can make tradeoffs to find the optimal implementation options for the various use cases.
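The tradeoff step can be sketched as a sweep over configurations and use cases. In this hypothetical Python example, `run_sim` stands for running a simulation regression for one configuration and use case, and `toy_run_sim` is an invented formula (deeper queues buy bandwidth at the cost of tail latency) used only so the sketch runs. The selection rule, lowest p99 latency that still meets a per-use-case bandwidth floor, is an assumption rather than a rule from the presentation.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Config = Dict[str, int]
Metrics = Dict[str, float]
Result = Tuple[int, str, Metrics]  # (config index, use case, metrics)


def sweep(configs: List[Config], use_cases: List[str],
          run_sim: Callable[[Config, str], Metrics]) -> List[Result]:
    """Run every (configuration, use case) pair through run_sim."""
    return [(i, uc, run_sim(cfg, uc))
            for (i, cfg), uc in product(enumerate(configs), use_cases)]


def best_per_use_case(results: List[Result], min_bw: Dict[str, float]) -> Dict[str, int]:
    """Per use case, pick the config with the lowest p99 latency whose bandwidth
    meets that use case's floor. Use cases with no feasible config are left out."""
    best: Dict[str, Tuple[int, float]] = {}
    for i, uc, m in results:
        if m["bandwidth"] < min_bw[uc]:
            continue
        if uc not in best or m["p99_latency"] < best[uc][1]:
            best[uc] = (i, m["p99_latency"])
    return {uc: i for uc, (i, _) in best.items()}


def toy_run_sim(cfg: Config, use_case: str) -> Metrics:
    """Invented formula, not a model of any real interconnect: deeper queues
    raise bandwidth but also tail latency; 'video' traffic streams twice as much."""
    depth = cfg["queue_depth"]
    return {"bandwidth": depth * (2.0 if use_case == "video" else 1.0),
            "p99_latency": 100.0 + 5.0 * depth}


configs = [{"queue_depth": d} for d in (4, 8, 16)]
results = sweep(configs, ["cpu", "video"], toy_run_sim)
print(best_per_use_case(results, {"cpu": 2.0, "video": 16.0}))
```

Keeping the sweep separate from the selection rule means a different objective, such as a weighted cost across several metrics, only requires changing `best_per_use_case`.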

For further information, see the CDNLive website. If you're reading this after March 12 and would like to know more, an ARM guest blog post with video and a Cadence Industry Insights blog post cover this topic.

Richard Goering
