Clock concurrent optimization (CCOpt) is a new technology that runs clock tree synthesis (CTS) concurrently with physical optimization. It claims significant improvements in performance, power, and area, but the only way to really quantify such claims is through customer experience with real designs. At CDNLive! Silicon Valley 2012, Koen Lampaert, associate technical director at Broadcom, shared the results of two experiments his company ran with CCOpt.
Acquired from Azuro last year, and now available with the Encounter Digital Implementation System 11.1, the Cadence CCOpt technology uses a timing window-driven engine to optimize timing paths and clocks simultaneously. It's thus a departure from the traditional CTS approach, which focuses on minimizing skew. Potential benefits include a 10% improvement in performance and total power, 30% reduction in clock power and area, and 30% reduction in IR drop. For more information about CCOpt and how it works, see the references at the end of this post.
Broadcom has been using CCOpt for its processor designs for about two years. In the experiments described at CDNLive!, the first design was an ARM Cortex-A9 block-level core, while the second was a hierarchical dual-core Cortex-A9 design including cache. Both are taped-out cores, according to Lampaert, and they run in the GHz range. They were originally 40nm designs, but since then Broadcom has moved to more advanced nodes.
Automating Custom Design "Tricks"
Lampaert is part of the team that designs ARM microprocessors at Broadcom, and it's not an easy task. "The main challenge is that we have to design a lot of these processors," he said. "Because every business unit has its own requirements, we have to do a specific core for each business unit. Our design schedules are very tight, on the order of 6 months or so for an entire processor. So, people expect custom performance on an ASIC design schedule."
Consequently, Lampaert said, his group has been "looking into ways of automating the tricks that designers usually apply manually." And that's a capability that CCOpt provides.
In traditional CTS, Lampaert said, designers defined the target skew and the buffers and inverters the tool was allowed to use. Sometimes they manually defined "useful" skews (a technique for slack redistribution) and asked the tool to implement them. However, he noted, "it is very time consuming and difficult to determine what the exact skew should be. The minute you change something in your design, you have to redo the exercise. It's really something that has to be automated."
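To see what "useful skew" redistributes, here is a minimal sketch of the bookkeeping involved. It is purely illustrative, not Cadence's or Broadcom's implementation; the function name, path slacks, and shift values are all made-up numbers for the example.

```python
# Illustrative sketch of "useful skew": deliberately shifting clock arrival
# times at registers to move slack from paths with surplus onto a failing one.
# Hypothetical example; not a real CTS tool's algorithm or data model.

def apply_useful_skew(path_slacks, reg_shifts):
    """path_slacks[i] is the setup slack (ns) of the path from reg i to reg i+1.
    reg_shifts[i] is how much later reg i's clock arrives (ns).
    Delaying a register's clock adds slack to the path it captures and
    removes the same amount from the path it launches."""
    new_slacks = list(path_slacks)
    for i, shift in enumerate(reg_shifts):
        if i > 0:
            new_slacks[i - 1] += shift   # reg i captures path i-1 later
        if i < len(path_slacks):
            new_slacks[i] -= shift       # reg i launches path i later
    return new_slacks

# A failing middle path (-0.05 ns) flanked by paths with surplus slack.
# Delaying reg 2's clock (the failing path's capture register) by 0.06 ns
# fixes the violation at the cost of some downstream margin:
slacks = [0.10, -0.05, 0.12]
shifts = [0.0, 0.0, 0.06, 0.0]
print(apply_useful_skew(slacks, shifts))
```

The point of Lampaert's comment is visible even in this toy: the right shift values depend on every path delay in the chain, so any design change invalidates a hand-computed solution.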
A traditional flow will run CTS and then follow it up with a separate optimization step to fix any problems. With CCOpt, in contrast, the CTS and the optimization occur simultaneously, Lampaert noted. "It shifts the problem from just looking at a critical path to looking at an entire portion of the design. It looks at a chain of critical paths and optimizes across the entire chain. And that's the way custom designers have always looked at it. It's just that, in the ASIC design flow, that was kind of lost."
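A rough intuition for why chain-level optimization helps, sketched under the simplifying assumption of an ideal pipeline where skew can be borrowed freely between adjacent stages (the stage delays below are invented for illustration, and this is not how CCOpt itself computes anything):

```python
# Why optimizing across a chain of critical paths beats fixing one path at a
# time: with a zero-skew clock, the period is set by the single worst stage;
# if slack can be borrowed along the chain, the period can approach the
# average stage delay. Hypothetical numbers, idealized model.

stage_delays = [0.80, 1.20, 0.95, 1.05]  # ns of logic delay per pipeline stage

zero_skew_period = max(stage_delays)                 # worst stage dominates
chain_lower_bound = sum(stage_delays) / len(stage_delays)  # ideal borrowing

print(f"zero-skew period:        {zero_skew_period:.2f} ns")
print(f"chain-aware lower bound: {chain_lower_bound:.2f} ns")
```

Real designs add constraints (hold times, clock network limits, loops in the chain), so the average is only a lower bound, but it shows why looking at the whole chain, as custom designers do, exposes headroom that per-path fixing cannot reach.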
Results of CCOpt Runs
The first Broadcom experiment compared a "base run" without CCOpt to a run with CCOpt. With CCOpt, there was a 6% performance improvement versus the incumbent flow. The drop in failing endpoints from 42 to 1, Lampaert said, shows that it was a lot easier to close timing with CCOpt. Power was the same, so the main message here is that CCOpt provided higher performance without increasing power consumption.
Experiment #1 - Block-level Cortex-A9 processor core
Not shown in the above table is the IR drop reduction. Average IR drop went from 25mV without CCOpt to 21mV with CCOpt, and sigma IR drop went from 4mV without CCOpt to 3.5mV with CCOpt.
The second Broadcom experiment ran a similar comparison on a hierarchical dual-core design. Here the performance increase was 8% over the incumbent flow. The number of failing endpoints dropped dramatically, total negative slack improved, and power was essentially the same compared to the conventional flow.
Experiment #2 - Hierarchical dual-core design
Lampaert's presentation noted several other aspects of CCOpt, including a critical chain analysis report, clock tree visualization, and an ability to selectively add margin to endpoints. CCOpt is "fairly easy to use," he said. "Once you set up the libraries and the input it doesn't require a lot of intervention." In the future, he would like to see CCOpt technology extend into physical synthesis and routing.
CDNLive! Silicon Valley 2012 proceedings are available here for conference attendees.
Further Information on Clock Concurrent Optimization
Industry Insights blog: Why Cadence Bought Azuro - A Closer Look
Industry Insights blog: Q&A: Former Azuro CEO Explains Clock Concurrent Optimization
Chip Design Magazine article: Clock Concurrent Optimization Reshapes IC Physical Design Flow