STOCHASTIC OPTIMIZATION FOR TROPICAL PRINCIPAL COMPONENT ANALYSIS OVER TREE SPACES

Abstract

A known challenge in the rapidly growing area of phylogenomics is the lack of tools for analyzing the large volume of genome data. Genomic data includes information on the evolution, structure, and mapping of genomes. Phylogenetic trees are branching diagrams that show the evolutionary history of species and their genes; a gene tree shows the evolutionary history of one particular gene. To analyze evolutionary history from genomic data, we reduce the dimensionality of gene trees, which avoids the analytical difficulties of working in high dimensions. We vectorize each phylogenetic tree by recording the pairwise distance between every pair of its leaves, and then apply a tropical principal component analysis: a principal component analysis (PCA) defined in terms of a tropical metric. A tropical PCA is a tropical convex hull over the tree space (the set of all possible gene trees) that minimizes the sum of residuals between each gene tree in the dataset and its projection onto the hull. We use it to project gene trees onto a two-dimensional space. Since computing a tropical PCA exactly for a given dataset is computationally expensive, we implement a Markov chain Monte Carlo (MCMC) Metropolis-Hastings algorithm to estimate it efficiently. We apply our tropical PCA algorithm to simulated and real-world data and visualize the results in two-dimensional plots. The results look promising and illustrate the algorithm's strengths.

http://archive.org/details/stochasticoptimi1094562731
Major, United States Army
Approved for public release; distribution is unlimited
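The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation. It assumes the max-plus convention for the tropical convex hull, restricts candidate hull vertices to points in the dataset, and uses a symmetric "swap one vertex" proposal with a hypothetical temperature parameter. All function and parameter names (tropical_distance, project_onto_hull, sum_of_residuals, mh_search, temperature) are illustrative. Each row of data is assumed to be a tree already vectorized into its pairwise leaf distances.

```python
import numpy as np

def tropical_distance(u, v):
    """Tropical metric: max_i(u_i - v_i) - min_i(u_i - v_i)."""
    d = np.asarray(u, float) - np.asarray(v, float)
    return d.max() - d.min()

def project_onto_hull(x, vertices):
    """Project x onto the max-plus tropical convex hull of the given vertices."""
    x = np.asarray(x, float)
    V = np.asarray(vertices, float)
    lam = (x - V).min(axis=1)              # one coefficient per vertex
    return (lam[:, None] + V).max(axis=0)  # tropical (max-plus) combination

def sum_of_residuals(data, vertices):
    """Objective: total tropical distance from each point to its projection."""
    return sum(tropical_distance(x, project_onto_hull(x, vertices)) for x in data)

def mh_search(data, n_vertices=3, n_iter=2000, temperature=1.0, seed=0):
    """Metropolis-Hastings search over vertex sets drawn from the data (assumption)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, float)
    n = len(data)
    if n <= n_vertices:
        raise ValueError("need more data points than hull vertices")
    idx = rng.choice(n, size=n_vertices, replace=False)
    cost = sum_of_residuals(data, data[idx])
    best_idx, best_cost = idx.copy(), cost
    for _ in range(n_iter):
        # Symmetric proposal: replace one vertex with a point not currently used.
        prop = idx.copy()
        k = rng.integers(n_vertices)
        prop[k] = rng.choice(np.setdiff1d(np.arange(n), idx))
        prop_cost = sum_of_residuals(data, data[prop])
        # Accept with probability min(1, exp(-(increase in cost) / temperature)).
        if rng.random() < np.exp(min(0.0, -(prop_cost - cost) / temperature)):
            idx, cost = prop, prop_cost
            if cost < best_cost:
                best_idx, best_cost = idx.copy(), cost
    return best_idx, best_cost
```

With three vertices the hull is a tropical triangle, which matches the two-dimensional projection the abstract describes. Calling mh_search(data) returns the indices of the best vertex set visited and its sum of residuals; project_onto_hull then gives each tree's projection for plotting.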
