112 research outputs found
Tropical Principal Component Analysis and its Application to Phylogenetics
Principal component analysis is a widely-used method for the dimensionality
reduction of a given data set in a high-dimensional Euclidean space. Here we
define and analyze two analogues of principal component analysis in the setting
of tropical geometry. In one approach, we study the Stiefel tropical linear
space of fixed dimension closest to the data points in the tropical projective
torus; in the other approach, we consider the tropical polytope with a fixed
number of vertices closest to the data points. We then give approximative
algorithms for both approaches and apply them to phylogenetics, testing the
methods on simulated phylogenetic data and on an empirical dataset of
Apicomplexa genomes. Comment: 28 pages
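As a rough illustration of the second approach, the sketch below computes the tropical metric on the tropical projective torus and projects a point onto a tropical polytope spanned by given vertices. It assumes the max-plus convention. The function names are hypothetical, and this is not the authors' implementation.

```python
def tropical_distance(u, v):
    """Tropical metric: max_i(u_i - v_i) - min_i(u_i - v_i).

    The value is unchanged when a constant is added to every coordinate
    of u or v, so it is well defined on the tropical projective torus.
    """
    diffs = [a - b for a, b in zip(u, v)]
    return max(diffs) - min(diffs)


def project_onto_tropical_polytope(x, vertices):
    """Project x onto the max-plus tropical convex hull of `vertices`.

    For each vertex v, lam = min_i(x_i - v_i) is the largest scalar with
    lam + v <= x coordinatewise; the projection is the coordinatewise
    maximum of the shifted vertices lam + v.
    """
    shifted = []
    for v in vertices:
        lam = min(a - b for a, b in zip(x, v))
        shifted.append([lam + b for b in v])
    return [max(column) for column in zip(*shifted)]


# Hypothetical toy data: one point and a tropical segment with two vertices.
point = [0, 3, 1]
segment = [[0, 0, 0], [0, 2, 5]]
projection = project_onto_tropical_polytope(point, segment)
residual = tropical_distance(point, projection)
```

Summing such residuals over all data points gives one plausible form of the objective that a closest tropical polytope with a fixed number of vertices would minimize.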
Tropical Geometry of Phylogenetic Tree Space: A Statistical Perspective
Phylogenetic trees are the fundamental mathematical representation of
evolutionary processes in biology. As data objects, they are characterized by
the challenges associated with "big data," as well as the complication that
their discrete geometric structure results in a non-Euclidean phylogenetic tree
space, which poses computational and statistical limitations. We propose and
study a novel framework to study sets of phylogenetic trees based on tropical
geometry. In particular, we focus on characterizing our framework for
statistical analyses of evolutionary biological processes represented by
phylogenetic trees. Our setting exhibits analytic, geometric, and topological
properties that are desirable for theoretical studies in probability and
statistics, as well as increased computational efficiency over the current
state-of-the-art. We demonstrate our approach on seasonal influenza data. Comment: 28 pages, 5 figures, 1 table
Maximum gradient embeddings and monotone clustering
Let (X,d_X) be an n-point metric space. We show that there exists a
distribution D over non-contractive embeddings into trees f:X-->T such that for
every x in X, the expectation with respect to D of the maximum over y in X of
the ratio d_T(f(x),f(y)) / d_X(x,y) is at most C (log n)^2, where C is a
universal constant. Conversely we show that the above quadratic dependence on
log n cannot be improved in general. Such embeddings, which we call maximum
gradient embeddings, yield a framework for the design of approximation
algorithms for a wide range of clustering problems with monotone costs,
including fault-tolerant versions of k-median and facility location. Comment: 25 pages, 2 figures. Final version, minor revision of the previous
one. To appear in "Combinatorica"
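The guarantee described in this abstract can be written out as follows. This is only a restatement in LaTeX of the two conditions given in the text, with n = |X|. Excluding y = x from the maximum is an assumption made here to avoid a 0/0 ratio.

```latex
% Every embedding f in the support of D is non-contractive:
\[
  d_T\bigl(f(x), f(y)\bigr) \;\ge\; d_X(x, y)
  \quad \text{for all } x, y \in X,
\]
% and the expected maximum gradient at each point is bounded:
\[
  \mathbb{E}_{f \sim D}\!\left[\, \max_{y \in X \setminus \{x\}}
    \frac{d_T\bigl(f(x), f(y)\bigr)}{d_X(x, y)} \right]
  \;\le\; C\,(\log n)^2
  \quad \text{for every } x \in X.
\]
```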
On metric Ramsey-type phenomena
The main question studied in this article may be viewed as a nonlinear
analogue of Dvoretzky's theorem in Banach space theory or as part of Ramsey
theory in combinatorics. Given a finite metric space on n points, we seek its
subspace of largest cardinality which can be embedded with a given distortion
in Hilbert space. We provide nearly tight upper and lower bounds on the
cardinality of this subspace in terms of n and the desired distortion. Our main
theorem states that for any \epsilon>0, every n-point metric space contains a
subset of size at least n^{1-\epsilon} which is embeddable in Hilbert space
with O(\frac{\log(1/\epsilon)}{\epsilon}) distortion. The bound on the
distortion is tight up to the log(1/\epsilon) factor. We further include a
comprehensive study of various other aspects of this problem. Comment: 67 pages, published version
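For readers unfamiliar with the term, the distortion of an embedding is the largest factor by which it stretches any distance times the largest factor by which it shrinks any distance. The sketch below computes it for a finite metric space. The function name and the toy example (a 4-cycle mapped to the corners of a unit square) are illustrative assumptions, not taken from the paper.

```python
import itertools
import math


def distortion(points, d_source, d_target):
    """Distortion of an injective map between finite metric spaces.

    d_source and d_target take two point labels and return the distance
    before and after embedding; all pairwise distances must be positive.
    """
    expansion = 0.0
    contraction = 0.0
    for p, q in itertools.combinations(points, 2):
        ratio = d_target(p, q) / d_source(p, q)
        expansion = max(expansion, ratio)
        contraction = max(contraction, 1.0 / ratio)
    return expansion * contraction


# Toy example: the 4-cycle with shortest-path distances, placed on the
# corners of a unit square in the Euclidean plane.
corners = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.0, 1.0), 3: (0.0, 1.0)}


def cycle_distance(p, q):
    k = abs(p - q) % 4
    return min(k, 4 - k)


def euclidean_distance(p, q):
    return math.dist(corners[p], corners[q])


square_distortion = distortion(list(corners), cycle_distance, euclidean_distance)
```

In this example the adjacent pairs are preserved exactly and the two diagonals shrink from 2 to the square root of 2, so the computed distortion is the square root of 2.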
Fat Polygonal Partitions with Applications to Visualization and Embeddings
Let T be a rooted and weighted tree, where the weight of any node
is equal to the sum of the weights of its children. The popular Treemap
algorithm visualizes such a tree as a hierarchical partition of a square into
rectangles, where the area of the rectangle corresponding to any node in
T is equal to the weight of that node. The aspect ratio of the
rectangles in such a rectangular partition necessarily depends on the weights
and can become arbitrarily high.
We introduce a new hierarchical partition scheme, called a polygonal
partition, which uses convex polygons rather than just rectangles. We present
two methods for constructing polygonal partitions, both having guarantees on
the worst-case aspect ratio of the constructed polygons; in particular, both
methods guarantee a bound on the aspect ratio that is independent of the
weights of the nodes.
We also consider rectangular partitions with slack, where the areas of the
rectangles may differ slightly from the weights of the corresponding nodes. We
show that this makes it possible to obtain partitions with constant aspect
ratio. This result generalizes to hyper-rectangular partitions in
R^d. We use these partitions with slack for embedding ultrametrics
into d-dimensional Euclidean space: we give a polylog(\Delta)-approximation algorithm for embedding n-point ultrametrics
into R^d with minimum distortion, where \Delta denotes the spread
of the metric, i.e., the ratio between the largest and the smallest distance
between two points. The previously best-known approximation ratio for this
problem was polynomial in n. This is the first algorithm for embedding a
non-trivial family of weighted-graph metrics into a space of constant dimension
that achieves polylogarithmic approximation ratio. Comment: 26 pages
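The claim that rectangle aspect ratios necessarily depend on the weights can be seen in the smallest case: a square split into two rectangles must be cut by a single straight line, so a child of relative weight w gets a rectangle of aspect ratio 1/w when w is at most 1/2. The sketch below lays out one level of children side by side and reports the aspect ratios. The function names are hypothetical, and this single-level slicing is a simplification rather than the full Treemap algorithm.

```python
def slice_layout(weights, width=1.0, height=1.0):
    """Split a width x height rectangle into vertical strips.

    Each strip's area is proportional to the corresponding weight.
    Returns a list of (x, y, strip_width, strip_height) tuples.
    """
    total = sum(weights)
    rects = []
    x = 0.0
    for w in weights:
        strip_width = width * w / total
        rects.append((x, 0.0, strip_width, height))
        x += strip_width
    return rects


def aspect_ratio(rect):
    """Ratio of the longer side to the shorter side (always >= 1)."""
    _, _, w, h = rect
    return max(w, h) / min(w, h)


# Two children with very unequal weights: the light child becomes a sliver.
skewed = slice_layout([1, 99])
worst = max(aspect_ratio(r) for r in skewed)
```

Making the weights more unequal makes `worst` grow without bound, which is the behaviour that the polygonal partitions in this abstract are designed to avoid.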
Online Embeddings
13th International Workshop, APPROX 2010, and 14th International Workshop, RANDOM 2010, Barcelona, Spain, September 1-3, 2010. Proceedings.
We initiate the study of on-line metric embeddings. In such an embedding we are given a sequence of n points X = x_1,...,x_n one by one, from a metric space M = (X,D). Our goal is to compute a low-distortion embedding of M into some host space, which has to be constructed in an on-line fashion, so that the image of each x_i depends only on x_1,...,x_i. We prove several results translating existing embeddings to the on-line setting, for the case of embedding into ℓ_p spaces, and into distributions over ultrametrics.
Stochastic Optimization for Tropical Principal Component Analysis over Tree Spaces
A known challenge in the rapidly growing area of phylogenomics is the lack of tools to analyze the large volume of genome data. Genomic data includes information on the evolution, structure and mapping of genomes. Phylogenetic trees are branching diagrams that show the evolutionary history of species and their genes. Gene trees show the evolutionary history of a particular gene. To analyze evolutionary history from genomic data, we reduce the dimensionality of gene trees, overcoming high-dimensional analytical challenges. By vectorizing the pairwise distances between every pair of leaves in a phylogenetic tree, we apply a tropical principal component analysis: a principal component analysis (PCA) in terms of a tropical metric. We project gene trees onto a two-dimensional space using a tropical PCA, a tropical convex hull that minimizes the sum of residuals between each gene tree in the dataset and its projection onto the tropical convex hull over the tree space, which is the set of all possible gene trees. Since computing a tropical PCA for a given dataset is computationally intensive, we implement a Markov chain Monte Carlo Metropolis-Hastings algorithm to estimate the tropical PCA efficiently. Using simulated and real-world data, we apply our tropical PCA algorithm and visualize the results in two-dimensional plots, which look promising and demonstrate the algorithm's strengths.
http://archive.org/details/stochasticoptimi1094562731
Major, United States Army
Approved for public release; distribution is unlimited
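The search described in this abstract could be sketched roughly as follows. This is a minimal Metropolis-Hastings illustration under several assumptions that are not taken from the thesis: vertices of the tropical convex hull are restricted to data points, a proposal swaps one vertex for an unused data point, the target density is proportional to exp(-cost / temperature), and the max-plus convention is used. All function and parameter names are hypothetical.

```python
import math
import random


def tropical_distance(u, v):
    """Tropical metric: max_i(u_i - v_i) - min_i(u_i - v_i)."""
    diffs = [a - b for a, b in zip(u, v)]
    return max(diffs) - min(diffs)


def project(x, vertices):
    """Max-plus projection of x onto the tropical convex hull of vertices."""
    shifted = []
    for v in vertices:
        lam = min(a - b for a, b in zip(x, v))
        shifted.append([lam + b for b in v])
    return [max(column) for column in zip(*shifted)]


def total_residual(data, vertices):
    """Sum of tropical distances from each data point to its projection."""
    return sum(tropical_distance(x, project(x, vertices)) for x in data)


def mh_tropical_pca(data, n_vertices=3, n_steps=500, temperature=1.0, seed=0):
    """Search for vertices with small total residual (assumed scheme)."""
    rng = random.Random(seed)
    current = rng.sample(range(len(data)), n_vertices)
    current_cost = total_residual(data, [data[i] for i in current])
    best, best_cost = list(current), current_cost
    for _ in range(n_steps):
        unused = [i for i in range(len(data)) if i not in current]
        if not unused:
            break  # every data point is already a vertex
        # Symmetric proposal: replace one random vertex by an unused point.
        proposal = list(current)
        proposal[rng.randrange(n_vertices)] = rng.choice(unused)
        cost = total_residual(data, [data[i] for i in proposal])
        # Metropolis acceptance: always accept improvements, sometimes
        # accept a worse hull so the chain can leave local minima.
        accept_prob = math.exp(min(0.0, (current_cost - cost) / temperature))
        if rng.random() < accept_prob:
            current, current_cost = proposal, cost
            if cost < best_cost:
                best, best_cost = list(proposal), cost
    return [data[i] for i in best], best_cost


# Hypothetical toy data standing in for vectorized gene trees.
toy_data = [[0, 1, 2, 3], [0, 2, 1, 5], [0, 0, 4, 1],
            [0, 3, 3, 0], [0, 1, 1, 1], [0, 5, 0, 2]]
toy_vertices, toy_cost = mh_tropical_pca(toy_data, n_steps=200, seed=1)
```

With three vertices the fitted hull is a tropical triangle, which matches the two-dimensional projection mentioned in the abstract. Plotting each tree's projection within that triangle would require an additional coordinate step not shown here.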