7 research outputs found

Tropical Principal Component Analysis and its Application to Phylogenetics

Authors: Yoshida Ruriko, Zhang Leon, Zhang Xu
Publication date: 14/10/2017

Principal component analysis is a widely used method for the dimensionality reduction of a given data set in a high-dimensional Euclidean space. Here we define and analyze two analogues of principal component analysis in the setting of tropical geometry. In one approach, we study the Stiefel tropical linear space of fixed dimension closest to the data points in the tropical projective torus; in the other approach, we consider the tropical polytope with a fixed number of vertices closest to the data points. We then give approximative algorithms for both approaches and apply them to phylogenetics, testing the methods on simulated phylogenetic data and on an empirical dataset of Apicomplexa genomes. Comment: 28 pages.
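As a rough illustration of what "closest" can mean in the tropical projective torus, the sketch below computes the standard tropical metric, d(u, v) = max_i(u_i - v_i) - min_i(u_i - v_i). The abstract does not say which distance the authors use, so treating this metric as the measure of closeness is an assumption, and the function name `tropical_distance` is hypothetical.

```python
def tropical_distance(u, v):
    """Tropical metric between two points given as equal-length sequences.

    Points of the tropical projective torus are vectors taken modulo
    adding the same constant to every coordinate; the value returned
    here does not change under such a shift.
    Assumption: this is the notion of distance meant by "closest".
    """
    if len(u) != len(v):
        raise ValueError("points must have the same number of coordinates")
    if not u:
        raise ValueError("points must have at least one coordinate")
    diffs = [a - b for a, b in zip(u, v)]
    return max(diffs) - min(diffs)


if __name__ == "__main__":
    u = [0.0, 2.0, 5.0]
    v = [1.0, 1.0, 1.0]
    print(tropical_distance(u, v))
    # Shifting every coordinate of u by the same constant gives the same point
    # of the torus, so the distance is unchanged.
    print(tropical_distance([a + 3.0 for a in u], v))
```

Both calls print the same value, which reflects that the metric is well defined on the quotient by the all-ones direction.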

arXiv.org e-Print Archive

Calhoun, Institutional Archive of the Naval Postgraduate School

Maximum Likelihood Estimation of Log-Concave Densities on Tree Space

Authors: Sei Tomonari, Takazawa Yuki
Publication date: 22/11/2022

Phylogenetic trees are key data objects in biology, and the method of phylogenetic reconstruction has been highly developed. The space of phylogenetic trees is a nonpositively curved metric space. Recently, statistical methods that use this property to analyze sets of trees on this space have been developed. Meanwhile, in Euclidean space, the log-concave maximum likelihood method has emerged as a new nonparametric method for probability density estimation. In this paper, we derive a sufficient condition for the existence and uniqueness of the log-concave maximum likelihood estimator on tree space. We also propose an estimation algorithm for one and two dimensions. Since various factors affect the inferred trees, it is difficult to specify the distribution of sample trees. The class of log-concave densities is nonparametric, and yet the estimation can be conducted by the maximum likelihood method without selecting hyperparameters. We numerically compare the estimation performance with that of a previously developed kernel density estimator. In our examples where the true density is log-concave, we demonstrate that our estimator has a smaller integrated squared error when the sample size is large. We also conduct numerical experiments on clustering using the Expectation-Maximization (EM) algorithm and compare the results with k-means++ clustering using the Fréchet mean. Comment: 41 pages, 10 figures.

arXiv.org e-Print Archive

Shape-Constrained Density Estimation on the Space of Phylogenetic Trees (Bayesian Methods and Statistical Inference)

Authors: 清智也 (Sei Tomonari), 髙澤祐槻 (Takazawa Yuki)
Publication venue: Research Institute for Mathematical Sciences, Kyoto University
Publication date: 01/06/2022

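In recent years, research has been carried out on embedding sets of inferred phylogenetic trees in a metric space and analyzing them statistically. In this article, we introduce several statistical methods on this phylogenetic tree space and then consider applying log-concave maximum likelihood estimation, a nonparametric density estimation method, to the space of phylogenetic trees. We present results on the existence condition and uniqueness of this estimator, together with computational algorithms for the low-dimensional case, and numerically compare its accuracy with an existing kernel density estimation method.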

Kyoto University Research Information Repository

STATISTICS IN THE BILLERA-HOLMES-VOGTMANN TREESPACE

Author: Weyenberg Grady S.
Publication venue: UKnowledge
Publication date: 01/01/2015

This dissertation is an effort to adapt two classical non-parametric statistical techniques, kernel density estimation (KDE) and principal components analysis (PCA), to the Billera-Holmes-Vogtmann (BHV) metric space for phylogenetic trees. This adaptation gives a more general framework than currently exists for developing and testing various hypotheses about apparent differences or similarities between sets of phylogenetic trees. For example, while the majority of gene histories found in a clade of organisms are expected to be generated by a common evolutionary process, numerous other coexisting processes (e.g. horizontal gene transfers, gene duplication and subsequent neofunctionalization) will cause some genes to exhibit a history quite distinct from the histories of the majority of genes. Such “outlying” gene trees are considered to be biologically interesting, and identifying these genes has become an important problem in phylogenetics. The R software package kdetrees, developed in Chapter 2, contains an implementation of the kernel density estimation method. The primary theoretical difficulty involved in this adaptation concerns the normalization of the kernel functions in the BHV metric space. This problem is addressed in Chapter 3. In both chapters, the software package is applied to both simulated and empirical datasets to demonstrate the properties of the method. A few first theoretical steps in the adaptation of principal components analysis to the BHV space are presented in Chapter 4. It becomes necessary to generalize the notion of a set of perpendicular vectors in Euclidean space to the BHV metric space, but there is some ambiguity about how best to proceed. We show that convex hulls are one reasonable approach to the problem. The Nye-PCA algorithm provides a method of projecting onto arbitrary convex hulls in BHV space, providing the core of a modified PCA-type method.
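The idea of flagging "outlying" gene trees by kernel density can be sketched in a few lines. The code below is not the kdetrees API and is written in Python rather than R; it assumes a precomputed, hypothetical matrix of pairwise tree distances, uses an unnormalized Gaussian kernel with a single fixed bandwidth, and so sidesteps the kernel normalization problem that the abstract identifies as the main theoretical difficulty.

```python
import math


def kde_scores(dist, bandwidth):
    """Leave-one-out kernel density score for each item.

    dist      -- symmetric matrix (list of lists) of pairwise distances;
                 assumed to hold distances between trees.
    bandwidth -- positive kernel width (one fixed value, an assumption).
    The Gaussian kernel is left unnormalized, so the scores are only
    meaningful relative to one another.
    """
    n = len(dist)
    if n < 2:
        raise ValueError("need at least two items")
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    scores = []
    for i in range(n):
        # Sum kernel contributions from every other item (leave item i out).
        total = sum(
            math.exp(-0.5 * (dist[i][j] / bandwidth) ** 2)
            for j in range(n)
            if j != i
        )
        scores.append(total / (n - 1))
    return scores


def lowest_density_index(scores):
    """Index of the item with the smallest score, the strongest outlier candidate."""
    return min(range(len(scores)), key=lambda i: scores[i])


if __name__ == "__main__":
    # Hypothetical distances: items 0-2 are close together, item 3 is far away.
    dist = [
        [0.0, 1.0, 1.0, 6.0],
        [1.0, 0.0, 1.0, 6.0],
        [1.0, 1.0, 0.0, 6.0],
        [6.0, 6.0, 6.0, 0.0],
    ]
    scores = kde_scores(dist, bandwidth=2.0)
    print(lowest_density_index(scores))
```

Leaving each item out of its own score keeps an isolated tree from looking dense merely because it is at distance zero from itself.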

University of Kentucky

Facility Location in the Phylogenetic Tree Space

Author: Botte Marco
Publication date: 28/02/2019

Georg-August-University Göttingen