I recently saw a blog post, written by a competitor on a purportedly neutral EDA blog, that called for a re-tooling of the RTL-to-GDSII flow. The argument was that designs of 20M gates or larger need to be synthesized at the chip level, in conjunction with placement. The post also went on to describe the verification problem for chips this size.
It is clear that synthesis needs to work in conjunction with placement. This is why we have over 50 customers now using our RTL Compiler Physical solution, which combines high-capacity global synthesis with production placement and fast routing estimation technology. This enables RTL Compiler's unique global synthesis to work in conjunction with real physical interconnect timing to speed closure to your overall performance, power, and area goals. And it can even prevent or fix congestion issues.
But what about those chips that are 20M gates or larger? Our experience with RTL Compiler (RC) shows that its runtime scales linearly with design size, and we have had designers run artificial testcases of 20M instances (1 instance ≈ 4 gates). But in production? We find that most teams like to match their synthesis runs to their physical partitions. Otherwise, if you synthesize a block larger than a physical partition, you'll need to partition your constraints afterwards. And we're constantly told that nobody waits until they have 20M gates-equivalent of RTL before they start synthesis. What is your approach?
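To put the instance and gate numbers on the same footing, here is a quick back-of-the-envelope conversion. It is only a sketch: the 4-gates-per-instance ratio is the rough figure quoted above, and the function names are made up for illustration.

```python
# Rough ratio quoted in the post (1 instance is about 4 gates); an approximation.
GATES_PER_INSTANCE = 4


def instances_to_gates(instances: int) -> int:
    """Convert an instance count to approximate gate-equivalents."""
    return instances * GATES_PER_INSTANCE


def gates_to_instances(gates: int) -> int:
    """Convert gate-equivalents to an approximate instance count."""
    return gates // GATES_PER_INSTANCE


if __name__ == "__main__":
    # The artificial testcase size mentioned above: 20M instances.
    testcase_gates = instances_to_gates(20_000_000)
    # The 20M-gate threshold from the competitor's post, expressed in instances.
    threshold_instances = gates_to_instances(20_000_000)
    print(f"20M instances is roughly {testcase_gates:,} gate-equivalents")
    print(f"20M gates is roughly {threshold_instances:,} instances")
```

Under that ratio, the 20M-instance testcases are about four times the size of the 20M-gate threshold under discussion.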
The biggest challenge here is verification: functional verification typically consumes 60% of a chip project's hardware engineering costs. How do you verify 20M gates-equivalent of RTL? I think we are going to find that the answer is "you don't". The way to verify a chip this large is to move up a level of abstraction to Transaction Level Modeling (TLM) using a language like SystemC. I can't give a better argument for this than Casio's experience: they improved their verification/debug cycle by 50% using Cadence's C-to-Silicon, which includes RTL Compiler under the hood to bridge the flow from SystemC to implementation.
This is very similar to the crossroads that the industry faced two decades ago, when it was becoming too cumbersome to design and simulate hundreds of thousands of gates at the gate level. The solution was not to re-tool schematic capture and simulation; we moved up a level of abstraction. We need to change our mindset from incrementally tweaking the RTL-to-GDSII flow to redefining the problem as TLM-to-GDSII.
Jack Erickson