
    Phylogenetic Trees and Their Analysis

    Determining the best possible evolutionary history, the lowest-cost phylogenetic tree, to fit a given set of taxa and character sequences using maximum parsimony is an active area of research due to its underlying importance in understanding biological processes. As several steps in this process are NP-hard when using popular, biologically motivated optimality criteria, significant resources are dedicated both to heuristics and to making exact methods more computationally tractable. We examine both phylogenetic data and the structure of the search space in order to suggest methods to reduce the number of possible trees that must be examined to find an exact solution for any given set of taxa and associated character data. Our work on four related problems combines theoretical insight with empirical study to improve searching of the tree space. First, we show that there is a Hamiltonian path through tree space for the most common tree metrics, answering Bryant's Challenge for the minimal such path. We next examine the topology of the search space under various metrics, showing that some metrics have local maxima and minima even with perfect data, while others do not. We further characterize conditions under which sequences simulated under the Jukes-Cantor model of evolution yield well-behaved search spaces. Next, we reduce the search space needed for an exact solution by splitting the set of characters into mutually incompatible subsets of compatible characters, building trees based on the perfect phylogenies implied by these sets, and then searching in the neighborhoods of these trees. We validate this work empirically. Finally, we compare two approaches to the generalized tree alignment problem, or GTAP: sequence alignment followed by tree search vs. Direct Optimization, on both biological and simulated data.
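The idea of splitting characters into subsets of compatible characters can be illustrated with a minimal sketch. It assumes binary (0/1) characters, for which two characters are compatible exactly when at most three of the four state pairs occur across the taxa (the four-gamete test). The greedy grouping, the function names, and the toy matrix are hypothetical illustrations, not the procedure used in the work summarized above.

```python
def compatible(a, b):
    """Four-gamete test: two binary characters are compatible
    unless all four state pairs (0,0), (0,1), (1,0), (1,1) occur."""
    return len(set(zip(a, b))) < 4


def greedy_partition(characters):
    """Greedily assign each character (by index) to the first group whose
    members are all pairwise compatible with it; otherwise open a new group.
    This is an illustrative heuristic, not an optimal partition."""
    groups = []
    for idx, ch in enumerate(characters):
        for group in groups:
            if all(compatible(ch, characters[j]) for j in group):
                group.append(idx)
                break
        else:
            groups.append([idx])
    return groups


# Hypothetical toy data: each row is one character scored over five taxa.
chars = [
    (0, 0, 1, 1, 1),
    (0, 0, 0, 1, 1),
    (0, 1, 0, 1, 0),
    (1, 1, 0, 0, 0),
]
groups = greedy_partition(chars)
print(groups)
```

Each resulting group contains pairwise compatible binary characters and therefore admits a perfect phylogeny, which could serve as a starting tree for a neighborhood search.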

    The invariances of power law size distributions

    Size varies. Small things are typically more frequent than large things. The logarithm of frequency often declines linearly with the logarithm of size. That power law relation forms one of the common patterns of nature. Why does the complexity of nature reduce to such a simple pattern? Why do things as different as tree size and enzyme rate follow similarly simple patterns? Here I analyze such patterns by their invariant properties. For example, a common pattern should not change when adding a constant value to all observations. That shift is essentially the renumbering of the points on a ruler without changing the metric information provided by the ruler. A ruler is shift invariant only when its scale is properly calibrated to the pattern being measured. Stretch invariance corresponds to the conservation of the total amount of something, such as the total biomass and consequently the average size. Rotational invariance corresponds to pattern that does not depend on the order in which underlying processes occur, for example, a scale that additively combines the component processes leading to observed values. I use tree size as an example to illustrate how the key invariances shape pattern. A simple interpretation of common pattern follows. That simple interpretation connects the normal distribution to a wide variety of other common patterns through the transformations of scale set by the fundamental invariances. Comment: Added appendix discussing the lognormal distribution, updated to match version 2 of published version at F1000Researc
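The log-log linearity described above can be written out explicitly. The symbols here are assumptions introduced for illustration: $x$ is size, $f(x)$ is frequency, $a$ is an intercept, and $k > 0$ is the slope magnitude.

```latex
\log f(x) = a - k \log x
\quad\Longleftrightarrow\quad
f(x) = e^{a}\, x^{-k}, \qquad k > 0 .
```

In this form, a straight line with slope $-k$ on log-log axes is equivalent to frequency falling off as a power of size.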

    Enhanced negative type for finite metric trees

    Finite metric trees are known to have strict 1-negative type. In this paper we introduce a new family of inequalities that quantify the extent of the "strictness" of the 1-negative type inequalities for finite metric trees. These inequalities of "enhanced 1-negative type" are sufficiently strong to imply that any given finite metric tree must have strict p-negative type for all values of p in an open interval that contains the number 1. Moreover, these open intervals can be characterized purely in terms of the unordered distribution of edge weights that determine the path metric on the particular tree, and are therefore largely independent of the tree's internal geometry. From these calculations we are able to extract a new nonlinear technique for improving lower bounds on the maximal p-negative type of certain finite metric spaces. Some pathological examples are also considered in order to stress certain technical points. Comment: 35 pages, no figures. This is the final version of this paper sans diagrams. Please note the corrected statement of Theorem 4.16 (and hence inequality (1)). A scaling factor was omitted in Version #
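For reference, the standard definition of p-negative type that underlies the statements above can be stated as follows. The notation ($X$, $d$, $x_i$, $\eta_i$) is assumed here and may differ from the paper's own.

```latex
% (X, d) has p-negative type if, for every n >= 2, all points
% x_1, ..., x_n in X, and all real eta_1, ..., eta_n with zero sum:
\sum_{i=1}^{n} \eta_i = 0
\quad\Longrightarrow\quad
\sum_{i=1}^{n} \sum_{j=1}^{n} d(x_i, x_j)^{p}\, \eta_i \eta_j \;\le\; 0 .
```

The p-negative type is strict when, for distinct points $x_1, \dots, x_n$, equality holds only if $\eta_1 = \dots = \eta_n = 0$.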