Tropical Principal Component Analysis and its Application to Phylogenetics
Principal component analysis is a widely-used method for the dimensionality
reduction of a given data set in a high-dimensional Euclidean space. Here we
define and analyze two analogues of principal component analysis in the setting
of tropical geometry. In one approach, we study the Stiefel tropical linear
space of fixed dimension closest to the data points in the tropical projective
torus; in the other approach, we consider the tropical polytope with a fixed
number of vertices closest to the data points. We then give approximative
algorithms for both approaches and apply them to phylogenetics, testing the
methods on simulated phylogenetic data and on an empirical dataset of
Apicomplexa genomes. Comment: 28 pages
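The polytope approach in the abstract above depends on measuring distances in the tropical projective torus and projecting data points onto a tropical polytope. The following is a minimal sketch of those two building blocks, assuming the max-plus convention and the standard tropical metric; the function names are hypothetical and are not taken from the paper's software.

```python
def tropical_distance(u, v):
    """Tropical metric: max_i(u_i - v_i) - min_i(u_i - v_i).

    The value is unchanged when a constant is added to every coordinate
    of u or of v, so it is well defined on the tropical projective torus.
    """
    diffs = [a - b for a, b in zip(u, v)]
    return max(diffs) - min(diffs)


def project_onto_tropical_polytope(x, vertices):
    """Project x onto the max-plus tropical convex hull of `vertices`.

    For each vertex v, lam = min_j(x_j - v_j) is the largest shift with
    lam + v <= x coordinate-wise; the projection is the coordinate-wise
    maximum of the shifted vertices.
    """
    shifted = []
    for v in vertices:
        lam = min(a - b for a, b in zip(x, v))
        shifted.append([lam + b for b in v])
    return [max(column) for column in zip(*shifted)]


# Illustrative data only (assumed, not from the paper).
vertices = [[0, 0, 0], [0, 3, 1]]
point = [0, 1, 5]
projection = project_onto_tropical_polytope(point, vertices)
residual = tropical_distance(point, projection)
```

A tropical PCA procedure of the polytope type would search over vertex sets to reduce the sum of such residuals across all data points; that search is not shown here.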
Maximum Likelihood Estimation of Log-Concave Densities on Tree Space
Phylogenetic trees are key data objects in biology, and the method of
phylogenetic reconstruction has been highly developed. The space of
phylogenetic trees is a nonpositively curved metric space. Recently,
statistical methods that exploit this property to analyze sets of trees in this
space have been developed. Meanwhile, in Euclidean space, the
log-concave maximum likelihood method has emerged as a new nonparametric method
for probability density estimation. In this paper, we derive a sufficient
condition for the existence and uniqueness of the log-concave maximum
likelihood estimator on tree space. We also propose an estimation algorithm for
one and two dimensions. Since various factors affect the inferred trees, it is
difficult to specify the distribution of sample trees. The class of log-concave
densities is nonparametric, and yet the estimation can be conducted by the
maximum likelihood method without selecting hyperparameters. We compare the
estimation performance with a previously developed kernel density estimator
numerically. In our examples where the true density is log-concave, we
demonstrate that our estimator has a smaller integrated squared error when the
sample size is large. We also conduct numerical experiments on clustering using
the Expectation-Maximization (EM) algorithm and compare the results with
k-means++ clustering using the Fréchet mean. Comment: 41 pages, 10 figures
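Shape-Constrained Density Estimation on the Space of Phylogenetic Trees (Bayesian Methods and Statistical Inference)
In recent years, researchers have studied embedding a set of inferred phylogenetic trees into a metric space and analyzing it statistically. In this paper, after introducing several statistical methods on this phylogenetic tree space, we consider applying log-concave maximum likelihood estimation, a nonparametric density estimation method, to the space of phylogenetic trees. We present results on the existence conditions and uniqueness of this estimator and a computational algorithm for the low-dimensional case, and we numerically compare its accuracy with that of the existing kernel density estimation method.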
Statistics in the Billera-Holmes-Vogtmann Treespace
This dissertation is an effort to adapt two classical non-parametric statistical techniques, kernel density estimation (KDE) and principal components analysis (PCA), to the Billera-Holmes-Vogtmann (BHV) metric space for phylogenetic trees. This adaptation gives a more general framework than currently exists for developing and testing various hypotheses about apparent differences or similarities between sets of phylogenetic trees.
For example, while the majority of gene histories found in a clade of organisms are expected to be generated by a common evolutionary process, numerous other coexisting processes (e.g. horizontal gene transfers, gene duplication and subsequent neofunctionalization) will cause some genes to exhibit a history quite distinct from the histories of the majority of genes. Such “outlying” gene trees are considered to be biologically interesting and identifying these genes has become an important problem in phylogenetics.
The R software package kdetrees, developed in Chapter 2, contains an implementation of the kernel density estimation method. The primary theoretical difficulty involved in this adaptation concerns the normalization of the kernel functions in the BHV metric space. This problem is addressed in Chapter 3. In both chapters, the software package is applied to both simulated and empirical datasets to demonstrate the properties of the method.
A few first theoretical steps in adapting principal components analysis to the BHV space are presented in Chapter 4. It becomes necessary to generalize the notion of a set of perpendicular vectors in Euclidean space to the BHV metric space, but there is some ambiguity about how best to proceed. We show that convex hulls are one reasonable approach to the problem. The Nye-PCA algorithm provides a method of projecting onto arbitrary convex hulls in BHV space, providing the core of a modified PCA-type method.
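The kernel density approach to finding outlying gene trees described in this abstract can be illustrated with a small sketch. It assumes that a matrix of pairwise tree distances has already been computed, and it uses an unnormalized Gaussian kernel with a fixed bandwidth, so it sidesteps the normalization problem the dissertation addresses. It is not the kdetrees API; the function name and parameters are hypothetical.

```python
import math


def leave_one_out_kde_scores(dist, bandwidth=1.0):
    """Score each item by the average kernel weight of all other items.

    `dist` is a symmetric matrix (list of lists) of pairwise distances.
    Low scores mark items that lie far from the bulk of the sample.
    The kernel is left unnormalized (an assumption of this sketch).
    """
    n = len(dist)
    scores = []
    for i in range(n):
        total = sum(
            math.exp(-0.5 * (dist[i][j] / bandwidth) ** 2)
            for j in range(n)
            if j != i
        )
        scores.append(total / (n - 1))
    return scores


# Toy stand-in for tree distances: four points on a line, one far away.
positions = [0.0, 1.0, 2.0, 10.0]
dist = [[abs(a - b) for b in positions] for a in positions]
scores = leave_one_out_kde_scores(dist, bandwidth=1.0)
outlier_index = scores.index(min(scores))
```

In practice the distances would come from a tree metric such as the BHV geodesic distance, and a cutoff on the scores would decide which trees to flag.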