Phylogenetic Trees and Their Analysis
Determining the best possible evolutionary history, the lowest-cost phylogenetic tree, to fit a given set of taxa and character sequences using maximum parsimony is an active area of research because of its underlying importance in understanding biological processes. As several steps in this process are NP-hard under popular, biologically motivated optimality criteria, significant resources are dedicated both to heuristics and to making exact methods more computationally tractable. We examine both phylogenetic data and the structure of the search space in order to suggest methods that reduce the number of possible trees that must be examined to find an exact solution for any given set of taxa and associated character data. Our work on four related problems combines theoretical insight with empirical study to improve searching of the tree space. First, we show that there is a Hamiltonian path through tree space for the most common tree metrics, answering Bryant's Challenge for the minimal such path. We next examine the topology of the search space under various metrics, showing that some metrics have local maxima and minima even with perfect data, while others do not. We further characterize conditions under which sequences simulated under the Jukes-Cantor model of evolution yield well-behaved search spaces. Next, we reduce the search space needed for an exact solution by splitting the set of characters into mutually incompatible subsets of compatible characters, building trees based on the perfect phylogenies implied by these sets, and then searching in the neighborhoods of these trees. We validate this work empirically. Finally, we compare two approaches to the generalized tree alignment problem (GTAP): sequence alignment followed by tree search versus Direct Optimization, on both biological and simulated data.
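Every candidate tree in a maximum-parsimony search has to be scored. The sketch below uses the standard Fitch algorithm to compute the parsimony cost of one fixed rooted binary tree, assuming unit-cost state changes. It is an illustration, not the authors' code. The nested-tuple tree encoding, the function names `fitch_site` and `parsimony_score`, and the toy sequences are all hypothetical.

```python
from typing import Dict, Set, Tuple, Union

# Hypothetical encoding: a leaf is a taxon name, an internal node is a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch_site(tree: Tree, states: Dict[str, str]) -> Tuple[Set[str], int]:
    """Fitch's bottom-up pass for one character (site).

    Returns the candidate state set at the root of `tree` and the number
    of state changes required within that subtree.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    common = left_set & right_set
    if common:
        # The children can agree on a state, so no extra change is needed here.
        return common, left_cost + right_cost
    # No shared state: keep the union and charge one change.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree: Tree, sequences: Dict[str, str]) -> int:
    """Total parsimony cost of `tree`, summed independently over all sites."""
    length = len(next(iter(sequences.values())))
    total = 0
    for i in range(length):
        column = {taxon: seq[i] for taxon, seq in sequences.items()}
        _, cost = fitch_site(tree, column)
        total += cost
    return total


seqs = {"A": "AAG", "B": "AAT", "C": "GCT", "D": "GCT"}
print(parsimony_score((("A", "B"), ("C", "D")), seqs))  # 3
print(parsimony_score((("A", "C"), ("B", "D")), seqs))  # 5
```

An exact search must evaluate a score like this for every candidate topology. The number of unrooted binary topologies grows super-exponentially with the number of taxa, so pruning the set of trees that must be examined pays off quickly.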
The invariances of power law size distributions
Size varies. Small things are typically more frequent than large things. The
logarithm of frequency often declines linearly with the logarithm of size. That
power law relation forms one of the common patterns of nature. Why does the
complexity of nature reduce to such a simple pattern? Why do things as
different as tree size and enzyme rate follow similarly simple patterns? Here I
analyze such patterns by their invariant properties. For example, a common
pattern should not change when adding a constant value to all observations.
That shift is essentially the renumbering of the points on a ruler without
changing the metric information provided by the ruler. A ruler is shift
invariant only when its scale is properly calibrated to the pattern being
measured. Stretch invariance corresponds to the conservation of the total
amount of something, such as the total biomass and consequently the average
size. Rotational invariance corresponds to a pattern that does not depend on the
order in which underlying processes occur, for example, a scale that additively
combines the component processes leading to observed values. I use tree size as
an example to illustrate how the key invariances shape pattern. A simple
interpretation of common pattern follows. That simple interpretation connects
the normal distribution to a wide variety of other common patterns through the
transformations of scale set by the fundamental invariances.
Comment: Added an appendix discussing the lognormal distribution; updated to
match version 2 of the published version at F1000Research.
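The opening observation, that log frequency declines linearly with log size, can be checked numerically. This sketch assumes an exact power law f(x) = C·x^(−α) with illustrative values C = 1000 and α = 2. It fits a least-squares line in log-log coordinates and recovers the slope −α. It also shows that multiplying every size by a constant leaves the slope unchanged and only shifts the intercept. That is the ordinary scale invariance of a power law, given as a concrete anchor rather than as the paper's own formulation of shift, stretch, or rotational invariance. The function names are hypothetical.

```python
import math
from typing import List, Tuple


def loglog_fit(sizes: List[float], freqs: List[float]) -> Tuple[float, float]:
    """Least-squares fit of log(freq) = a + b * log(size); returns (a, b)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def power_law(sizes: List[float], alpha: float, scale: float) -> List[float]:
    """Frequencies scale * size**(-alpha) for each size (illustrative, noise-free)."""
    return [scale * s ** (-alpha) for s in sizes]


sizes = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
freqs = power_law(sizes, alpha=2.0, scale=1000.0)

intercept, slope = loglog_fit(sizes, freqs)
# Rescale every size by 3 (e.g. a change of units); frequencies stay the same.
rescaled_intercept, rescaled_slope = loglog_fit([3.0 * s for s in sizes], freqs)

print(round(slope, 6), round(rescaled_slope, 6))  # -2.0 -2.0
```

With real data the points scatter around the line, and a log-log regression on binned frequencies is only a rough estimator of the exponent. The noise-free case is used here just to make the linear relation visible.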
Enhanced negative type for finite metric trees
Finite metric trees are known to have strict 1-negative type. In this paper
we introduce a new family of inequalities that quantify the extent of the
"strictness" of the 1-negative type inequalities for finite metric trees. These
inequalities of "enhanced 1-negative type" are sufficiently strong to imply
that any given finite metric tree must have strict p-negative type for all
values of p in an open interval that contains the number 1. Moreover, these
open intervals can be characterized purely in terms of the unordered
distribution of edge weights that determine the path metric on the particular
tree, and are therefore largely independent of the tree's internal geometry.
From these calculations we are able to extract a new nonlinear technique for
improving lower bounds on the maximal p-negative type of certain finite metric
spaces. Some pathological examples are also considered in order to stress
certain technical points.
Comment: 35 pages, no figures. This is the final version of this paper sans
diagrams. Please note the corrected statement of Theorem 4.16 (and hence
inequality (1)). A scaling factor was omitted in Version #
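A small finite metric space can be tested for p-negative type numerically. Recall the definition. (X, d) has p-negative type if sum_{i,j} d(x_i, x_j)^p η_i η_j ≤ 0 for every finite set of points x_1, …, x_n in X and all real weights η_1, …, η_n with η_1 + … + η_n = 0. It has strict p-negative type if equality forces every η_i = 0. For a single finite space this reduces to checking that the matrix [d_ij^p], restricted to the zero-sum subspace, has only negative eigenvalues. The sketch below is a floating-point check under that reformulation, not the paper's technique. The function name and the unit-weight star tree are hypothetical examples.

```python
import numpy as np


def max_negative_type_eigenvalue(dist: np.ndarray, p: float) -> float:
    """Largest eigenvalue of eta -> sum_ij dist[i, j]**p * eta_i * eta_j,
    restricted to unit vectors eta with sum(eta) == 0.

    Negative result: strict p-negative type. Positive result: p-negative type fails.
    """
    n = dist.shape[0]
    # Columns e_i - e_{n-1} (i < n-1) span the zero-sum subspace.
    basis = np.eye(n)[:, : n - 1] - np.eye(n)[:, [n - 1]]
    q, _ = np.linalg.qr(basis)  # orthonormal basis of that subspace
    form = q.T @ (dist ** p) @ q  # quadratic form restricted to the subspace
    return float(np.linalg.eigvalsh(form)[-1])  # eigvalsh sorts ascending


# Star tree: centre c joined to leaves a, b, d by unit-weight edges.
# Point order: c, a, b, d; leaf-to-leaf distances are 2.
star = np.array(
    [
        [0.0, 1.0, 1.0, 1.0],
        [1.0, 0.0, 2.0, 2.0],
        [1.0, 2.0, 0.0, 2.0],
        [1.0, 2.0, 2.0, 0.0],
    ]
)

print(max_negative_type_eigenvalue(star, 1.0) < 0)  # True
print(max_negative_type_eigenvalue(star, 2.0) < 0)  # False
```

For this star, p = 1 gives a negative value, consistent with finite metric trees having strict 1-negative type. At p = 2 the value is positive, so the maximal p-negative type of this tree lies between 1 and 2. Near the threshold, floating-point error can blur the sign, so this is a sanity check rather than a proof.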
- …