 Properly optimizing enable to clock gating enable 

Last post Wed, Apr 12 2006 8:04 AM by archive. 10 replies.
Started by archive 04 Apr 2006 11:13 AM. Topic has 10 replies and 3865 views
  • Tue, Apr 4 2006 11:13 AM

    • archive
    • Top 75 Contributor
    • Joined on Fri, Jul 4 2008
    • Posts 88
    • Points 4,930
    Properly optimizing enable to clock gating enable

    Hi,

    I am curious to know how you handle the path to the clock gating enable pin.

    I have a design with multiple levels of clock gating on the clock network. When synthesizing with ideal clocks, all my paths to the enable pins get a full cycle, but once CTS is run, the latency of the clock network will reduce the time available to reach those pins.
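    To put numbers on the effect described above, here is a minimal sketch of the setup budget for an enable path. It assumes the launching flop sits at a leaf of the clock tree and the clock gate sits earlier in the tree; the function name and all values are made up for illustration.

```python
def enable_time_budget(period, leaf_latency, gate_latency, setup=0.0):
    """Time available for the enable path from a leaf flop to a clock gate.

    The launching flop sees its clock edge at leaf_latency. The gate checks
    the enable against the next edge, which reaches it at period + gate_latency.
    """
    return (period + gate_latency) - leaf_latency - setup


# Ideal clocks: every latency is zero, so the path gets the full cycle.
ideal = enable_time_budget(period=10.0, leaf_latency=0.0, gate_latency=0.0)

# After CTS (made-up numbers): leaves at 2.5 ns, gate only 0.5 ns into the tree.
post_cts = enable_time_budget(period=10.0, leaf_latency=2.5, gate_latency=0.5)

print(f"ideal budget:    {ideal:.2f} ns")
print(f"post-CTS budget: {post_cts:.2f} ns")
```

    The lost time is the latency difference between the leaf flops and the gate, which is why gates near the root of the tree are hit hardest.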

    I understand I could perform post-CTS optimization, but I would prefer a more robust method to constrain those paths in ideal clock mode.

    I am thinking of the two following approaches and would like to hear whether you have used them, or any others.

    - max_delay to enable pins equal to (clock period - expected network latency post clock gating element)
    - defining generated clock after each clock gating element with different latency

    Both methods have the drawback of requiring a lot of data management :(
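    As a rough illustration of the first approach and of the bookkeeping it implies, the sketch below turns a table of expected post-gate latencies into one set_max_delay line per enable pin. The pin names and latency values are hypothetical, and the latencies are guesses until CTS has actually run.

```python
def max_delay_constraints(period, post_gate_latency):
    """Return one set_max_delay line per clock-gate enable pin.

    post_gate_latency maps an enable pin name to the clock latency expected
    downstream of that gate (an estimate, not a measured value).
    """
    lines = []
    for pin, latency in sorted(post_gate_latency.items()):
        budget = period - latency  # clock period minus expected latency
        if budget <= 0:
            raise ValueError(f"no time left for {pin}: {budget:.3f}")
        lines.append(f"set_max_delay {budget:.3f} -to [get_pins {{{pin}}}]")
    return lines


# Hypothetical pin names and latencies (ns), for illustration only.
estimates = {
    "u_core/RC_CG_HIER_INST1/enable": 1.2,
    "u_core/u_regs/RC_CG_HIER_INST7/enable": 0.4,
}
sdc = max_delay_constraints(period=5.0, post_gate_latency=estimates)
print("\n".join(sdc))
```

    Keeping the estimates in one table at least confines the data management to a single place that can be regenerated when the latency numbers change.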

    Thanks for your help,
    Eric.


    Originally posted in cdnusers.org by evenditti
    • Post Points: 5
  • Fri, Apr 7 2006 7:22 AM

    • archive
    RE: Properly optimizing enable to clock gating enable
    Firstly, I’m assuming you’ve got a latch on each gate to prevent glitches?

    Do all of your clock-gates have simple names that are unique (eg RC_CG_* like Rtl-Compiler uses)? If so, then you could “max-delay” constrain them in Rtl-Compiler by finding every gating cell’s latch “d” pin using “find” and wildcards (such that you embed the find inside the command setting the max-delay). Now use “write_sdc”. If the constraint is applied right, you’ll have one line in the synth scripts and one per gate in the sdc.

    Then you just need to pick the value for the constraint that is a compromise between over-constraining gates that easily meet timing, and only leaving a manageable number of fails to fix by hand after CTS.
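    One way to explore that compromise is to score a few candidate values against whatever per-gate budgets you can estimate. This is only a sketch: the gate names and budgets below are invented, and real budgets would have to come from a previous layout or a trial CTS run.

```python
def evaluate_constraint(value, budgets):
    """Score one shared max-delay value against per-gate real budgets.

    A gate whose real budget is below the value is under-constrained and may
    fail after CTS; a gate whose budget is above it is over-constrained.
    """
    likely_fails = [g for g, b in budgets.items() if b < value]
    overconstraint = sum(b - value for b in budgets.values() if b > value)
    return len(likely_fails), overconstraint


# Hypothetical post-CTS budgets per gate, in ns.
budgets = {"cg_a": 3.1, "cg_b": 3.8, "cg_c": 4.4, "cg_d": 4.6}

for value in (3.0, 3.5, 4.0, 4.5):
    fails, over = evaluate_constraint(value, budgets)
    print(f"max_delay={value:.1f}: {fails} likely fails, "
          f"{over:.1f} ns total over-constraint")
```

    A tighter value leaves fewer gates to fix by hand but spends more synthesis effort on gates that would have met timing anyway.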

    Of course the real solution is to architect the system/design/software for low-power better in the first place, but I’d run on too long if I started that one!

    Originally posted in cdnusers.org by crispy_duck
    • Post Points: 5
  • Sat, Apr 8 2006 8:51 AM

    • archive
    RE: Properly optimizing enable to clock gating enable

    Hi,

    I think the solution can be in the design and not in the implementation. Examples of solutions are:

    1. The enable generation logic must be simple, so timing can be met.
    2. The enable logic to the clock gater at the base of the clock tree should be multi-cycle.

    During synthesis there is no way (at least with current technology) to predict the clock latency of the clock gater. So if you happen to have timing closure issues on some clock gaters, you could re-synthesize with additional constraints for those clock gaters, or specify that the FFs driven by those clock gaters are not to be clock gated.

    RC has a better command than set_max_delay. Use the "path_adjust" command.


    Regards,
    Eng Han


    Originally posted in cdnusers.org by EngHan
    • Post Points: 0
  • Mon, Apr 10 2006 3:24 AM

    • archive
    RE: Properly optimizing enable to clock gating enable

    RC has a better command than set_max_delay. Use the "path_adjust" command.

    I was thinking that using a max_delay would give a set of constraints for layout tools to use for opt pre-cts/post-place (rather than trying to do it post-cts) as well as for synthesis. For that, you need a command in the SDC that the layout tool understands and supports (max_delay seemed most obvious to me).

    CD


    Originally posted in cdnusers.org by crispy_duck
    • Post Points: 0
  • Tue, Apr 11 2006 7:59 AM

    • archive
    RE: Properly optimizing enable to clock gating enable

    Hi,

    set_clock_latency at the clock gating cell might be better than set_max_delay. Note that set_max_delay (as well as set_clock_latency) is affected by the clock skew. With set_max_delay, the constraint has to be removed after CTS. With set_clock_latency, it is ignored after CTS. I remember reading somewhere that different tools understand "set_max_delay" differently, as the definition of this command is not clear. Not sure if this is still the case now.

    I like path_adjust because it does not depend on clock skew.

    Regards,
    Eng Han


    Originally posted in cdnusers.org by EngHan
    • Post Points: 0
  • Tue, Apr 11 2006 2:05 PM

    • archive
    RE: Properly optimizing enable to clock gating enable

    Eng Han,

    You're right about removing the timing constraints after CTS if you follow my suggestion. It's a good point, and I should have said that in the first place (our flow uses tweaked SDCs between pre and post CTS, I just forgot about that "minor" detail!!).

    "path_adjust" sounds good, but how many tools other than RC support it? I'd not come across it before, might have to have a play in the morning (well, morning for us Europeans!!).

    I also note your comment about the timing of the gating (either have simple logic or false-path it). That sort of thing is something (in my opinion) you need to design in from the architecture level. For example, with a false-path or multicycle constraint, you make the gating non-immediate (the switch may or may not pass the clock this cycle). This gating is then only suitable for on/off switching (e.g. turning a block on for use, then off again after some period of time). Even the s/w guys may need to account for this behaviour. By contrast, the gating done for localised power control is immediate (the clock must pass on the next cycle). The only way this enable can meet timing is to have simple logic gating a small register bank (so the tree size after the gate is minimised).

    Or perhaps we could just persuade the world to stick to mains-powered devices and stop using these damn batteries ;-)

    CD


    Originally posted in cdnusers.org by crispy_duck
    • Post Points: 0
  • Wed, Apr 12 2006 6:47 AM

    • archive
    RE: Properly optimizing enable to clock gating enable

    Hi Crispy Duck,

    Before the technical discussion, I am surprised that you are in Europe, as "Crispy Duck" sounds Asian. A surprise for you too: I am in Paris, but will be in Asia next month.

    You bring up a good point that explains why, depending on who you talk to, some designers want the clock tree to be after the clock gater (obviously to save power), and some designers want the clock tree to be before the clock gater (obviously to meet timing).

    In the design that I have here, the clock gaters at the base of the clock tree are hand instantiated, and there is always a pair of FFs (like a synchroniser) driving the enable of the clock gaters. In this way, the logic from the FF to the clock gater is just a wire, and meeting timing becomes easy.

    It is tricky to decide what the latency should be for the pair of FFs in front of the clock gater. Ideally, they should have shorter latency. However, if you want to include them in the scan chain, then it is better to balance them. It also depends on where you place the clock gater. If it is placed next to the PLL, miles away from the core (and somehow you decide to place the pair of FFs next to the clock gaters), then it is better not to balance the latency, and also to exclude them from the scan chain (too many ifs here...).

    Now, back to the original question. If the designer does not know the impact of clock gaters on timing closure, the backend engineer will suffer, and the quality of the layout will be bad. The new RC 6.1 has some features that can merge/split the enable conditions of clock gaters. This might help, or make things worse. Also, if timing closure on clock gaters inserted by the tool is a problem, then use a smaller fan-out for the clock gater (and don't declone after that; "declone" actually merges clock gaters together...). This moves the clock gater "near" to the FFs, so that they have similar clock latency.


    Regards,
    Eng Han

    PS: CD, could you send me a mail at enghan@eda-utilities.com? I would like to introduce some of the work I am doing to an experienced backend engineer like you.


    Originally posted in cdnusers.org by EngHan
    • Post Points: 0
  • Wed, Apr 12 2006 7:32 AM

    • archive
    RE: Properly optimizing enable to clock gating enable

    Thanks everybody for your participation.

    A lot of good things have been said here, and I would agree that good planning, i.e. a clock-gating-aware architecture, is the best path to success. However, we don't always have that, and even when you do have a FF driving the enable, there is no guarantee that the path will be as fast as it could be: it will meet timing without problems in synthesis, and things like area recovery might actually slow it down because there is a large positive slack. Granted, post-route optimization should be able to fix this fairly easily, but it would be better to have it as fast as possible to start with.

    In addition, in many physical implementations the clock gating cells are duplicated based on where they are on the clock tree. Having a single clock gating instance near the root of the clock is good for something controlled by a signal like IP_ON_OFF, but not too practical if the enable is generated deep in the block. So depending on the physical distribution of the flops you want to gate, the cloning strategy for your clock gating cells might differ. Tools can now handle that in the backend; the part they don't handle very well is the enable logic, mostly because there are no "universal constraints" to define and constrain those signals.

    We all agree that if you have a stable backend estimate of your clock latency and strategy, it is not too hard to solve this problem. The problem is: who ever gets that and has a chance to go all the way back to synthesis? In general, at that point my manager is pushing to tape out, and screw it if it is not the best/most robust solution.

    So from what I have read so far I take:

    • max_delay is a good synthesis-only solution (i.e. it can be interpreted differently by different tools)
    • path_adjust is a good option to explore in RC. Does anybody know if FE will understand it correctly, even when switching from pre- to post-CTS timing?
    • The most accurate solution is to model the latency at each clock gating element and let the tool do the calculation. However, I am not sure this doesn't result in too many clocks and a negative impact on run time. The advantage is that this should be properly handled in pre- vs post-CTS timing.
    Now my wish is for a set_afap constraint that could be used on those signals.

    On using false paths on enables, I would say be very careful. This is fine for global on/off signals for which you don't care when they occur. For things that need to be cycle accurate (typically controlling writes to config registers, internal memories, ...), this is the best way to end up with gate-level simulation not matching your functional simulation despite the fact that your formal equivalence check passes!! Ever been bitten by that one, due to the clock latency making your enable be seen one cycle too late?
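    The one-cycle-late failure can be shown with a toy cycle-level model of a register behind a clock gate. This is a behavioural sketch only, not a gate-level simulation; the function name, data values and the enable_delay parameter are all made up for illustration.

```python
def run(data, enable, enable_delay):
    """Cycle-level model of a register behind a clock gate.

    enable_delay=0: the gate sees the enable in the cycle it was issued.
    enable_delay=1: the gate sees it one cycle late, as can happen when the
    enable path is false-pathed or multicycled and the real delay is long.
    """
    reg = None
    history = []
    for cycle in range(len(data)):
        src = cycle - enable_delay
        seen = enable[src] if src >= 0 else 0
        if seen:  # gated clock pulses, register captures the current data
            reg = data[cycle]
        history.append(reg)
    return history


data = ["A", "B", "C", "D"]
enable = [0, 1, 0, 0]  # intent: capture "B"

print(run(data, enable, 0))
print(run(data, enable, 1))
```

    With the late enable the register captures "C" instead of "B". Formal equivalence cannot see this because the logic is unchanged; only the timing differs.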

    Eric.
     


    Originally posted in cdnusers.org by evenditti
    • Post Points: 0
  • Wed, Apr 12 2006 7:42 AM

    • archive
    RE: Properly optimizing enable to clock gating enable

    Ever been bitten by that one, due to the clock latency making your enable be seen one cycle too late?
     

    Hence my comment about needing to design/structure this from scratch. If the system is designed to cope with the uncertainty, then it shouldn't be a problem (BOCTAOE).

    By the way, this structuring becomes even more important when you start to consider multiple supply voltages, because turning the power on to a block is far slower than turning a clock on, and if your s/w can't handle this delay/prediction for clocks, you'll never manage it for voltage!!

    CD


    Originally posted in cdnusers.org by crispy_duck
    • Post Points: 0
  • Wed, Apr 12 2006 7:51 AM

    • archive
    RE: Properly optimizing enable to clock gating enable

    CD,

    Well, I have not yet seen a chip trying to do cycle-accurate power up/down, but I have seen that done (and done it myself) for clocks for more than 10 years. As you say, the latency through power switches is far too great.

    However, I do agree that good planning is the key to all this. Unfortunately, it is often quite difficult to achieve down to the lower levels when re-using soft IPs coming from internal teams or from outside. In general we end up with a well-defined chip-level clock architecture going to the various IPs, with a limited number of clock gating cells, isolation cells and level shifters. Where we start having problems is when we reach the soft IP blocks, which have all been written following different guidelines (today's guidelines are not the ones used 1 or 2 years ago). Unfortunately, as management sees those IPs as done, there are no resources to re-open and fix them.

    But yes, good architecture is the key to a happy implementation engineer (especially if it comes with a detailed clock network diagram and balancing requirements).

    How many nights did I dream about this? ....


    Originally posted in cdnusers.org by evenditti
    • Post Points: 0
  • Wed, Apr 12 2006 8:04 AM

    • archive
    RE: Properly optimizing enable to clock gating enable

    But yes, good architecture is the key to a happy implementation engineer (especially if it comes with a detailed clock network diagram and balancing requirements).


    I've been working more with the architects on our latest project to ensure the clocking, power and other items (usually left until the implementers get started) get addressed earlier. Too early to say if it has helped, but at least I could "guide" the guys in a direction I thought feasible rather than their initial guess as to what we could do!


    How many nights did I dream about this? ....


    I tend to find better things to dream about ;-)

    CD


    Originally posted in cdnusers.org by crispy_duck
    • Post Points: 0