The notion that you can analyze power dissipation more
accurately as your design proceeds down the levels of abstraction -- from
system level, to RTL, to gate-level and transistor-level netlists -- has gone
unchallenged for too long. Well, would I be tilting at windmills to challenge
it? I could bore you all with the math, but fundamentally,
dynamic power boils down to a function of two things -- characterization and
switching activity. Characterization means accurately measuring and modeling
what happens when a transistor switches -- it's a function of Vdd²,
R, C (and increasingly L). Switching activity depends on the frequency and duty
cycle at which the switching happens, for each of the transistors in the
circuit of interest. We can reduce that to some extent by using the lowest
clock frequency that gets the job done, and turning the clock off when not
needed -- also known as clock gating.
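The relation above can be sketched in a few lines. This is the textbook dynamic power expression, with an illustrative clock-gating factor bolted on; all the numbers are hypothetical, not from any real library or process.

```python
# Sketch of the classic dynamic power relation P = alpha * C * Vdd^2 * f,
# plus an illustrative clock-gating factor. All values are hypothetical.

def dynamic_power(c_load_f, vdd_v, freq_hz, activity, gated_fraction=0.0):
    """Average dynamic power in watts.

    activity       -- fraction of clock cycles on which the node toggles
    gated_fraction -- fraction of time the clock is gated off (no switching)
    """
    return c_load_f * vdd_v**2 * freq_hz * activity * (1.0 - gated_fraction)

# Example: 10 pF effective capacitance, 1.0 V supply, 500 MHz clock,
# 20% switching activity, clock gated 60% of the time.
p_ungated = dynamic_power(10e-12, 1.0, 500e6, 0.20)
p_gated = dynamic_power(10e-12, 1.0, 500e6, 0.20, gated_fraction=0.60)
print(p_ungated)  # 0.001 W
print(p_gated)    # 0.0004 W
```

Note how the two knobs the text mentions show up directly: lowering `freq_hz` and raising `gated_fraction` both scale the answer linearly, while `vdd_v` enters squared.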
It stands to reason, then, that only when I have a placed
and routed netlist, know all of my transistors and wires, and have extracted all
the RC values, do I have any kind of accuracy. Right? Wrong. Characterization
no longer seems to be the problem designers are struggling with -- it's the
activity. What vectors can I run on the transistor netlist? What are all the system
modes to generate realistic activity in today's multi-function devices? Am I
replicating those with my vectors or just running test patterns, or using
statistical methods, which bear scant relation to real-life operation of the
device? Is the real silicon even the "gold standard"? What vectors did I use to
measure the "real" power when it came back from the fab?
Another facet of the problem is introduced when you consider
leakage power. As we moved below 65nm, leakage became a real issue in many
design types, and can even be the dominant dissipation mode for mobile devices.
Leakage power is different from dynamic power, depending only on the transistor
and the related voltages. It's not just Vdd, since leakage goes highly
non-linear when signal or supply voltages get close to Vth. It's
also non-linear with process and temperature variations.
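That non-linearity near Vth is easy to see with the standard subthreshold current model, in which leakage depends exponentially on the gate overdrive. The I0, n, and thermal-voltage values below are illustrative, not from any real process model.

```python
import math

# Sketch of why leakage goes non-linear near Vth: subthreshold current
# depends exponentially on (Vgs - Vth). I0, n, and the thermal voltage
# here are illustrative placeholders, not real process data.

def subthreshold_leakage(vgs, vth, i0=1e-7, n=1.5, vt_thermal=0.026):
    """Subthreshold drain current (amps) from the standard exponential model."""
    return i0 * math.exp((vgs - vth) / (n * vt_thermal))

# A 100 mV change in Vgs near Vth shifts leakage by orders of magnitude,
# which is why leakage is so sensitive to voltage, process, and temperature.
for vgs in (0.0, 0.1, 0.2):
    print(vgs, subthreshold_leakage(vgs, vth=0.3))
```

The takeaway matches the text: small shifts in supply or threshold voltage do not move leakage linearly, so characterization has to cover the voltage and temperature corners explicitly.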
So all of this complicates
characterization of my circuitry, right? Yes, but it complicates the activity
piece even more. Let me explain. The best way to manage leakage is to turn the
circuit off -- known as power gating or Power Shut Off (PSO). So now, as well as
regular designs that use tried and trusted methods to reduce dynamic power like
clock gating and multi-voltage thresholds (MVT), we also have what we'd term "advanced low power designs"
which split the design into separately-supplied power domains so we can apply
PSO, or MSV (multiple supply voltages), or even DVFS (Dynamic Voltage and
Frequency Scaling).
Now we have a multitude of different power modes which
need very complex (and much longer) vectors to place the chip into that power
mode and provide traffic representative of the system mode or combination of
modes to which each power mode corresponds. Yes, the number of PVT corners is
increasing, complicating the characterization, but the number of system modes
is increasing even more! I have worked with customers who painstakingly worked
out the data bandwidths for various parts of the chip to come up with vectors
for maybe 30 different modes for power analysis. Even this might be missing a
lot. Consider that these power saving techniques do not come for free. There is
overhead associated with switching power domains off and bringing them back
on-line. Unless a power mode endures for a certain time, you may be wasting
power, not saving it, by turning idle circuitry off. You might even want to
speed up computation, so you can turn off for longer. Hence you need to
start considering all the transitions
between those 30 modes to get an accurate picture.
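The break-even reasoning above can be put in numbers. Power gating only saves energy if the idle period outlasts the energy overhead of the off/on sequence divided by the leakage power saved; the 50 µJ and 5 mW figures below are hypothetical.

```python
# Sketch of the power-gating break-even calculation described above.
# E_overhead (energy to switch a domain off and back on) and P_leak
# (leakage power saved while off) are hypothetical figures.

def break_even_idle_s(e_overhead_j, p_leak_w):
    """Minimum idle time for which power gating saves net energy."""
    return e_overhead_j / p_leak_w

def net_energy_saved_j(idle_s, e_overhead_j, p_leak_w):
    """Positive means gating saved energy; negative means it cost energy."""
    return p_leak_w * idle_s - e_overhead_j

# Example: 50 uJ overhead per off/on cycle, 5 mW of leakage saved while off.
print(break_even_idle_s(50e-6, 5e-3))          # 0.01 s break-even
print(net_energy_saved_j(0.002, 50e-6, 5e-3))  # 2 ms idle: negative, net loss
print(net_energy_saved_j(0.050, 50e-6, 5e-3))  # 50 ms idle: positive, net gain
```

This is exactly why mode *transitions* matter as much as the modes themselves: the same power domain can be worth gating in one traffic pattern and not in another.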
So you can see that accuracy is in the eye of the beholder.
Transistor-level measurements are not any more accurate or representative than,
say, much-maligned microprocessor benchmarks are of real microprocessor
performance. Both give an accurate measure in inaccurate circumstances. The
answer of course is to run the real system software, with device-level
accuracy.
How can that be done before we get the chip back? Cadence has
an interesting solution. Check out the Dynamic Power Analysis capability
of the Incisive Palladium platform if you haven't already. You can run as much real system-level
activity as you wish, and the design is characterized using RTL Compiler technology
under the hood to map to your real cell library. It's still an estimate of
course -- the implementation is not exactly the same as the real chip would be --
but it may be the closest you'll get until you have the actual silicon running
the actual application software. Which of course is probably too late.