Thank you for your hints, they were very helpful. We had assumed the ThroughPin spec was mandatory for every generated clock, but that does not seem to be correct.
We checked the clock trace, and it shows that the registers in question are sinks of the clock tree, so you are right: the ThroughPin does not seem to be necessary here. However, we cannot reach a setup WNS better than -3 ns for the generated clock during postCTS and postRoute optimization, even though there is enough space and the divided clock is really slow.
Do you have any other suggestions as to where we might look for the problem? According to the clock trace, the clock tree itself does not seem to be the cause.