We investigate the dependence of the site frequency spectrum (SFS) on the
topological structure of genealogical trees. We show that basic population
genetic statistics, for instance estimators of θ or neutrality tests
such as Tajima's D, can be decomposed into components that depend on waiting times
between coalescent events and components that depend on tree topology. Our results clarify the
relative impact of the two components on these statistics. We provide a
rigorous interpretation of positive or negative values of an important class of
neutrality tests in terms of the underlying tree shape. In particular, we show
that values of Tajima's D and Fay and Wu's H depend in a direct way on a
peculiar measure of tree balance which is mostly determined by the root balance
of the tree. We present a new test for selection in the same class as Fay and
Wu's H and discuss its interpretation and power. Finally, we determine the
trees corresponding to extreme expected values of these neutrality tests and
present formulae for these extreme values as a function of sample size and
number of segregating sites.

Comment: 23 pages, 8 figures