Last week, Professor Jan Rabaey of the University of California, Berkeley gave a great keynote at Cadence's Low Power Technology Summit calling for changes to the conventional solutions for power reduction.
One of the points he made was that today's designs are over-designed and over-constrained with additional margin. One reason is variability -- which drives the creation of timing margins to ensure that process variability or environmental conditions do not cause the chip to malfunction.
In the design process, margins are constantly added to ensure that timing goals will be met in physical implementation without schedule-killing iterations. After all, schedules are very aggressive and also highly visible to management. So in RTL synthesis, you add margin to protect yourself from the additional delays due to wires, since you don't yet know what those delays will be. As wire delay becomes a more and more significant component of the delay in critical paths, this margin grows. Of course, the cost of this is generally larger area and power consumption. This is why we have physical synthesis -- to bring these back under control.
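The arithmetic behind this trade-off can be sketched in a few lines. This is a purely illustrative model -- the function name, clock period, and margin percentages are invented for the example and do not come from any real flow -- but it shows why shrinking uncertainty hands more of the clock period back to area and power optimization:

```python
# Hypothetical sketch of how pre-layout timing margin tightens the budget
# that logic optimization actually sees. All numbers are illustrative.

def synthesis_budget(clock_period_ns: float, margin_fraction: float) -> float:
    """Effective timing budget left for logic optimization after reserving
    a fraction of the clock period for unknown downstream (wire) delay."""
    return clock_period_ns * (1.0 - margin_fraction)

# Before physical estimates: guard against unknown wire delay with a large margin.
pre_layout = synthesis_budget(2.0, 0.30)       # 2 ns clock, 30% margin -> 1.4 ns budget

# With embedded placement/routing estimates, uncertainty shrinks, so less of
# the period is reserved; optimization can use smaller, lower-power gates
# while still closing at 2 ns.
with_estimates = synthesis_budget(2.0, 0.10)   # 10% margin -> 1.8 ns budget

print(pre_layout, with_estimates)
```

The tighter the artificial budget, the harder the optimizer works (bigger gates, more buffering) to meet a constraint that was never real -- which is exactly the area and power cost described above.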
The same dynamic exists in high-level synthesis. HLS moves everything up another level of abstraction, where it creates the micro-architectures that will have a much larger effect on area and power than the gate sizing and local structuring that RTL synthesis performs. However, being further removed from the physical details also increases uncertainty in quality of results (QoR) measurements. Uncertainty leads to adding more margin. Since timing is usually a "hard" constraint, it is timing margin that is added, at the expense of area and power.
The solution for reducing this uncertainty is similar to how RTL synthesis reduced timing uncertainty -- embed the downstream engines. In other words, RTL synthesis embedded placement and varying levels of routing detail in order to measure timing more accurately, enabling designers to reduce their margins, which in turn lets synthesis optimize for area and power.
In the case of HLS, this means embedding RTL Compiler logic synthesis into C-to-Silicon Compiler HLS. As with physical synthesis, the characterization that the embedded downstream engine performs has to be design-dependent: in the case of RTL synthesis, you need wire delay that is design-dependent; in the case of HLS, you need resource-level timing information that is sensitive to the context of the surrounding design. The only way to do this is to embed production synthesis, since resource characterization will need to be aware of how the logic will be constructed downstream.
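To see why a fixed, pre-characterized resource library is not enough, consider a toy model of the difference. Everything here is invented for illustration -- the `Resource` type, the penalty model, and the delay numbers are assumptions, not any real tool's API or characterization data -- but it captures the point that a resource's achievable delay depends on its surrounding context:

```python
# Hypothetical illustration of context-sensitive resource characterization.
# A fixed library delay ignores the design context; an embedded synthesis
# engine would see effects like fanout loading and late-arriving inputs.
from dataclasses import dataclass

@dataclass
class Resource:
    kind: str               # e.g. "add32"
    fanout: int             # how many consumers the result drives
    input_slack_ns: float   # negative if inputs arrive late in this context

# A fixed, pre-characterized table: one number per resource, no context.
FIXED_DELAY_NS = {"add32": 1.2, "mul16": 2.5}

def fixed_estimate(r: Resource) -> float:
    """Context-free estimate: whatever the library says."""
    return FIXED_DELAY_NS[r.kind]

def context_estimate(r: Resource) -> float:
    """Stand-in for querying an embedded synthesis engine: delay depends
    on the surroundings, crudely modeled here as a fanout penalty plus
    any lateness of the inputs."""
    base = FIXED_DELAY_NS[r.kind]
    fanout_penalty = 0.05 * max(0, r.fanout - 2)
    return base + fanout_penalty + max(0.0, -r.input_slack_ns)

heavily_loaded = Resource("add32", fanout=12, input_slack_ns=0.0)
print(fixed_estimate(heavily_loaded), context_estimate(heavily_loaded))
```

An HLS tool scheduling against the fixed number would either miss timing on the heavily loaded adder or, to be safe, pad every resource with worst-case margin -- which is precisely the over-margining the embedded production engine avoids.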
This shows why it is vital to embed downstream production engines. Yes, such embedding delivers more predictable closure, which is important from a schedule-management point of view. But using more accurate timing information also lets optimization run right at the edge of the timing constraints and hence be more aggressive about area and power. It is widely stated in the industry that approximately 80% of digital power consumption is determined at the RTL stage or earlier -- this is where HLS lives, so it is critical that your HLS tool does not over-margin timing simply because it does not know what will happen downstream. Utilizing a tool like C-to-Silicon Compiler with embedded production synthesis will not only deliver more predictable timing closure, but will also trim those timing margins and save you area and power.