This is a classic design challenge and a common source of difficulty. I'm glad you raised the question, even if it is a tough subject to tackle in a forum like this. It's tough because it cuts across several areas of functionality in the tool: placement, CTS, and optimization.
Two things I'd mention for your consideration:
"ckCloneGate" will attempt to push the integrated clock gating cell (ICG) down as close as possible to the flops. If successful, this will reduce the amount of skew you're seeing on paths that end at the enable pins of ICGs.
If you're using 8.1, you could try the "setOptMode -clkGateAware true" option (Note: this option serves a different purpose than setPlaceMode -clkGateAware!). It automatically models the earlier arrival time of the clock at the ICG clock pin, which gives preCTS optimization the ability to see these violations and fix them where possible. If you're using a release newer than 8.1, you could achieve a similar effect with a script in the gifts (<install>/share/fe/gift/scripts/tcl/) directory: "userSetClkLatToIcgCkPins.tcl".
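To make the timing argument behind both suggestions concrete, here is a rough back-of-the-envelope sketch in Python. It is not tool output or tool syntax: the `EnablePath` class, the `setup_slack` function, and every number are hypothetical, and the model ignores derating, uncertainty, and the ICG's latch behavior. It only shows why a path into an ICG enable pin can look fine with ideal clocks, fail once the ICG clock pin is seen to arrive earlier than the launching flop's clock, and recover when the ICG sits closer to the flops.

```python
from dataclasses import dataclass


@dataclass
class EnablePath:
    """Hypothetical flop -> ICG enable path; all values in ns."""
    period: float          # clock period
    launch_latency: float  # clock insertion delay to the launching flop
    icg_latency: float     # clock insertion delay to the ICG clock pin
    data_delay: float      # clk-to-q plus logic delay to the ICG enable pin
    setup: float           # setup requirement at the ICG enable pin


def setup_slack(p: EnablePath) -> float:
    """Simplified single-cycle setup slack at the ICG enable pin."""
    required = p.period + p.icg_latency - p.setup
    arrival = p.launch_latency + p.data_delay
    return required - arrival


# preCTS with ideal clocks: both latencies are zero, so the skew is invisible.
ideal = EnablePath(period=2.0, launch_latency=0.0, icg_latency=0.0,
                   data_delay=1.5, setup=0.1)

# After CTS: the ICG sits high in the tree, so its clock arrives 0.6 ns
# earlier than the clock at the launching flop (assumed numbers).
post_cts = EnablePath(period=2.0, launch_latency=1.0, icg_latency=0.4,
                      data_delay=1.5, setup=0.1)

# After cloning: the ICG is pushed close to the flops, shrinking the gap
# between the two clock arrivals to 0.1 ns (assumed numbers).
cloned = EnablePath(period=2.0, launch_latency=1.0, icg_latency=0.9,
                    data_delay=1.5, setup=0.1)

for name, path in [("ideal", ideal), ("post_cts", post_cts), ("cloned", cloned)]:
    print(f"{name:9s} slack = {round(setup_slack(path), 3)} ns")
```

In this toy model the ideal-clock case reports positive slack, the post-CTS case goes negative, and the cloned case is positive again. Modeling the earlier ICG clock arrival before CTS amounts to making the first case look like the second, so that optimization can work on the violation early.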
Maybe you could consider these options and share some additional information on the nature of the problem you're seeing. Specifically, do you think the problem is a matter of preCTS optimization not seeing these violations, or is clock gate cloning needed to push the ICGs farther down the tree?
Hope this helps,