Maximum likelihood estimation of species trees and anomaly zone detection using ranked gene trees

Abstract

A phylogenetic tree represents the evolutionary relationships among a set of organisms. Gene trees can be used to reconstruct phylogenetic trees. The methods in this dissertation focus on gene tree topologies, with emphasis on ranked gene tree topologies. A ranked tree records the order in which nodes appear in the tree, together with the topological relationships among gene lineages. One challenge that arises during phylogenetic inference is the existence of anomaly zones: regions of branch-length space in the species tree that can produce gene trees whose topologies differ from the species tree topology yet are more probable than the gene tree topology matching the species tree. In this work, we show how the parameters of a constant-rate birth-death process used to simulate species trees affect the probability that the species tree lies in the anomaly zone. We prove that the probability that a species tree is in an anomaly zone approaches 1 as the number of species and the birth rate go to infinity in a pure birth process. We propose a heuristic approach to infer whether species trees lie in the different types of anomaly zones when it is intractable to compute the entire distribution of gene tree topologies. We also develop the first maximum likelihood (ML) method that infers a species tree from ranked gene trees. We introduce the software PRANC, which can compute the probabilities of ranked gene trees under the coalescent process and infer an ML species tree. We propose methods for estimating a starting tree so that the ML species tree can be located quickly. To illustrate the proposed methods, we analyze two experimental studies of skinks and gibbons.
