On the variance of internode distance under the multispecies coalescent
We consider the problem of estimating species trees from unrooted gene tree
topologies in the presence of incomplete lineage sorting, a common phenomenon
that creates gene tree heterogeneity in multilocus datasets. One popular class
of reconstruction methods in this setting is based on internode distances, i.e.,
the average graph distance between pairs of species across gene trees. While
statistical consistency in the limit of large numbers of loci has been
established in some cases, little is known about the sample complexity of such
methods. Here we make progress on this question by deriving a lower bound on
the worst-case variance of internode distance which depends linearly on the
corresponding graph distance in the species tree. We also discuss some
algorithmic implications.
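To make the definition of internode distance concrete, here is a minimal Python sketch. It is an illustration, not the authors' code. For each pair of species it computes the average graph distance across a set of unrooted gene trees, along with the empirical variance of that distance across loci. Several assumptions are made here that do not come from the abstract. Each gene tree is a list of edges. Leaves are exactly the degree-1 nodes and carry species names. Every species appears exactly once in every gene tree. Distance counts the edges on the path; some methods count internal nodes instead, which gives one less. The names `leaf_distances` and `internode_summary` are hypothetical. The empirical variance below is a sample statistic over the given gene trees. It is not the worst-case variance under the multispecies coalescent that the abstract bounds.

```python
from collections import deque


def leaf_distances(edges):
    """Graph distance (number of edges) between every ordered pair of leaves
    of an unrooted tree given as an edge list. Hypothetical helper."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    # Assumption: leaves are exactly the degree-1 nodes.
    leaves = [node for node, nbrs in adj.items() if len(nbrs) == 1]
    dist = {}
    for src in leaves:
        # Breadth-first search gives edge counts on an unweighted tree.
        seen = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in seen:
                    seen[nbr] = seen[node] + 1
                    queue.append(nbr)
        for dst in leaves:
            if dst != src:
                dist[(src, dst)] = seen[dst]
    return dist


def internode_summary(gene_trees):
    """Per species pair: (mean internode distance, empirical variance across
    gene trees). Assumes every species is present in every gene tree."""
    samples = {}
    for edges in gene_trees:
        for (a, b), d in leaf_distances(edges).items():
            if a < b:  # keep each unordered pair once
                samples.setdefault((a, b), []).append(d)
    summary = {}
    for pair, values in samples.items():
        mean = sum(values) / len(values)
        var = sum((x - mean) ** 2 for x in values) / len(values)
        summary[pair] = (mean, var)
    return summary


# Two toy gene trees on species A-D; x, y, u, w are internal nodes.
tree1 = [("A", "x"), ("B", "x"), ("x", "y"), ("C", "y"), ("D", "y")]  # AB|CD
tree2 = [("A", "u"), ("C", "u"), ("u", "w"), ("B", "w"), ("D", "w")]  # AC|BD
summary = internode_summary([tree1, tree2])
print(summary[("A", "B")])  # (2.5, 0.25)
print(summary[("A", "D")])  # (3.0, 0.0)
```

In this example, A and B are separated by 2 edges in one gene tree and by 3 in the other, so their distance fluctuates across loci. A and D are separated by 3 edges in both trees. Breadth-first search from each leaf keeps the sketch short. It is quadratic in the number of leaves per tree, which is acceptable for illustration.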