Andrew, Frank,
Let me get back to my original claim before I answer your two latest comments. Let's leave the filtration aside for now.
My claim was that it is enough to look at the first harmonic's jitter when analyzing a square wave's jitter.
Using what I just learnt on ponise -> jitter -> pm, I thought of the following experiment: Let us take the "Jee" result of PM jitter simulation as a reference point, with integration at the range [1K, Fc/2] and maxsideband=fullspectrum. Next, let us see if my claim holds, by running a "sources" simulation with maxsideband=1 at the range [1K,300*Fc], then integrate phase noise and extract the phase jitter. If results are close, that would be an indication for the correctness of my claim (I will speak about theory later).
So, I ran the simulation on the crystal oscillator that I currently design (fullspectrum resulted in 3).
For the PM option, jitter was 1.9ps for one of the edges and 1.96ps for the other.
For the "sources" option, jitter was 1.94ps.
I also tried a third option: "sources", fullspectrum, integration [1K, Fc/2]. Result was 1.2ps.
So I think that this is a strong indication to my claim's correctness.
Back to thoery, qouting:
If maxsideband is 1, you are only including the noise contributions from
sidebands +/-1 and 0 (so you will get very high up-converted flicker
noise if you sweep near to the carrier frequency). You won't get noise
contributions from higher frequencies "folded" by the harmonics of the
carrier because you simply are not including them. Sweeping over a wide
frequency range is NOT the same as including lots of sidebands - you are
sweeping the output frequency yes, but you've removed the simulator's
ability to compute all the transfer functions from the noise sources to
the output - you've only included three noise transfer functions.
I definitely agree that folding the high frequency content of the base harmonic noise is not the same as integrating the high frequency content of the high order harmonics, that were folded by the higher harmonics into the frequency range of interest (say [0,Fc/2]). BUT this is done on purpose, assuming that the claim is correct. A nice illustration of this is below, taken from spectreRF user guide.

So the point is that I believe that (1) only one skirt should be taken and (2) that skirt is folded by the sampling nature of the signal's edges hence its high frequency content should be integrated.