We apply the theory of Markov random fields on trees to derive a phase
transition in the number of samples needed to reconstruct phylogenies.
We consider the Cavender-Farris-Neyman model of evolution on trees, where all
the inner nodes have degree at least 3, and the net transition on each edge is
bounded by e. Motivated by a conjecture of M. Steel, we show that if
2(1 - 2e)^2 > 1, then for balanced trees, the topology of the underlying tree,
having n leaves, can be reconstructed from O(log n) samples (characters) at the
leaves. On the other hand, we show that if 2(1 - 2e)^2 < 1, then
there exist topologies which require at least poly(n) samples for
reconstruction.
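
As a small numerical illustration of the threshold condition above, the following sketch solves 2(1 - 2e)^2 = 1 for e and classifies a given bound e by the sign of 2(1 - 2e)^2 - 1. The function names and regime labels are hypothetical and chosen only for this example. The labels summarize the two statements above (O(log n) samples for balanced trees, poly(n) samples for some topologies) and say nothing about the boundary case.

```python
import math


def critical_mutation_bound() -> float:
    """Return the value of e at which 2 * (1 - 2e)^2 equals 1."""
    # Solve (1 - 2e) = 1 / sqrt(2) for e, taking the root with e < 1/2.
    return (1.0 - 1.0 / math.sqrt(2.0)) / 2.0


def sample_regime(e: float) -> str:
    """Classify a per-edge bound e by which side of the threshold it falls on.

    The labels are illustrative names, not terminology from the source.
    """
    if not 0.0 <= e < 0.5:
        raise ValueError("e is assumed to lie in [0, 1/2)")
    value = 2.0 * (1.0 - 2.0 * e) ** 2
    if value > 1.0:
        return "logarithmic"  # O(log n) samples suffice for balanced trees
    if value < 1.0:
        return "polynomial"   # some topologies need at least poly(n) samples
    return "critical"         # boundary case, not covered by the statements


if __name__ == "__main__":
    print(f"critical e = {critical_mutation_bound():.6f}")
    for e in (0.10, 0.20):
        print(e, sample_regime(e))
```

The critical value is (1 - 1/sqrt(2))/2, roughly 0.146, so e = 0.10 falls on the O(log n) side and e = 0.20 on the poly(n) side.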
Our results are the first rigorous results to establish the role of phase
transitions for Markov random fields on trees, as studied in probability,
statistical physics and information theory, in the study of phylogenies in
mathematical biology.
Comment: To appear in Transactions of the AM