
    MRL and SuperFine+MRL: new supertree methods

    Background: Supertree methods combine trees on subsets of the full taxon set together to produce a tree on the entire set of taxa. Of the many supertree methods, the most popular is MRP (Matrix Representation with Parsimony), a method that operates by first encoding the input set of source trees by a large matrix (the "MRP matrix") over {0, 1, ?}, and then running maximum parsimony heuristics on the MRP matrix. Experimental studies evaluating MRP in comparison to other supertree methods have established that for large datasets, MRP generally produces trees of equal or greater accuracy than other methods, and can run on larger datasets. A recent development in supertree methods is SuperFine+MRP, a method that combines MRP with a divide-and-conquer approach, and produces more accurate trees in less time than MRP. In this paper we consider a new approach for supertree estimation, called MRL (Matrix Representation with Likelihood). MRL begins with the same MRP matrix, but then analyzes the MRP matrix using heuristics (such as RAxML) for 2-state Maximum Likelihood.
    Results: We compared MRP and SuperFine+MRP with MRL and SuperFine+MRL on simulated and biological datasets. We examined the MRP and MRL scores of each method on a wide range of datasets, as well as the resulting topological accuracy of the trees. Our experimental results show that MRL, coupled with a very good ML heuristic such as RAxML, produced more accurate trees than MRP, and MRL scores were more strongly correlated with topological accuracy than MRP scores.
    Conclusions: SuperFine+MRP, when based upon a good MP heuristic, such as TNT, produces among the best scores for both MRP and MRL, and is generally faster and more topologically accurate than other supertree methods we tested.
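
A minimal sketch of the MRP encoding described above, assuming rooted source trees given as nested Python tuples (a representation chosen only for this illustration; the helper names are hypothetical). Each non-root internal node of a source tree becomes one column: taxa inside that clade get 1, the other taxa of the same source tree get 0, and taxa absent from that source tree get ?. The resulting matrix would then be analyzed with a parsimony heuristic (MRP) or a 2-state ML heuristic such as RAxML (MRL); that search step is not shown.

```python
from typing import Dict, FrozenSet, List, Union

# Hypothetical representation: a leaf is a taxon name (str),
# an internal node is a tuple of child subtrees.
Tree = Union[str, tuple]


def leaves(tree: Tree) -> FrozenSet[str]:
    """Return the set of taxa below this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def clades(tree: Tree) -> List[FrozenSet[str]]:
    """Leaf sets of every internal node except the root, in preorder."""
    found: List[FrozenSet[str]] = []

    def walk(node: Tree, is_root: bool) -> None:
        if isinstance(node, str):
            return
        if not is_root:
            found.append(leaves(node))
        for child in node:
            walk(child, False)

    walk(tree, True)
    return found


def mrp_matrix(source_trees: List[Tree]) -> Dict[str, str]:
    """Encode the source trees as an MRP matrix over {0, 1, ?}."""
    all_taxa = sorted(frozenset().union(*(leaves(t) for t in source_trees)))
    rows: Dict[str, List[str]] = {taxon: [] for taxon in all_taxa}
    for tree in source_trees:
        tree_taxa = leaves(tree)
        for clade in clades(tree):  # one column per clade
            for taxon in all_taxa:
                if taxon in clade:
                    rows[taxon].append("1")
                elif taxon in tree_taxa:
                    rows[taxon].append("0")
                else:
                    rows[taxon].append("?")  # taxon missing from this source tree
    return {taxon: "".join(states) for taxon, states in rows.items()}
```

For example, `mrp_matrix([(("a", "b"), "c", "d"), (("b", "c"), "e")])` produces one column per source tree, giving the rows a = `1?`, b = `11`, c = `01`, d = `0?`, e = `?0`.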

    Learning Latent Tree Graphical Models

    We study the problem of learning a latent tree graphical model where samples are available only from a subset of variables. We propose two consistent and computationally efficient algorithms for learning minimal latent trees, that is, trees without any redundant hidden nodes. Unlike many existing methods, the observed nodes (or variables) are not constrained to be leaf nodes. Our first algorithm, recursive grouping, builds the latent tree recursively by identifying sibling groups using so-called information distances. One of the main contributions of this work is our second algorithm, which we refer to as CLGrouping. CLGrouping starts with a pre-processing procedure in which a tree over the observed variables is constructed. This global step groups the observed nodes that are likely to be close to each other in the true latent tree, thereby guiding subsequent recursive grouping (or equivalent procedures) on much smaller subsets of variables. This results in more accurate and efficient learning of latent trees. We also present regularized versions of our algorithms that learn latent tree approximations of arbitrary distributions. We compare the proposed algorithms to other methods by performing extensive numerical experiments on various latent tree graphical models such as hidden Markov models and star graphs. In addition, we demonstrate the applicability of our methods on real-world datasets by modeling the dependency structure of monthly stock returns in the S&P index and of the words in the 20 Newsgroups dataset.
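
The abstract does not say how the initial tree over the observed variables is built. Assuming, as the name CLGrouping suggests, a Chow-Liu-style tree, and assuming jointly Gaussian variables whose information distance is d_ij = -log|rho_ij|, the pre-processing step can be sketched as a minimum spanning tree (Prim's algorithm) over those distances. The helper names are hypothetical, and all correlations are assumed nonzero.

```python
import math
from typing import List, Tuple


def information_distances(corr: List[List[float]]) -> List[List[float]]:
    """Gaussian information distance d_ij = -log|rho_ij| (assumed model)."""
    n = len(corr)
    return [[0.0 if i == j else -math.log(abs(corr[i][j])) for j in range(n)]
            for i in range(n)]


def minimum_spanning_tree(dist: List[List[float]]) -> List[Tuple[int, int]]:
    """Prim's algorithm on a dense distance matrix; returns (parent, child) edges."""
    n = len(dist)
    in_tree = [False] * n
    best = [math.inf] * n   # cheapest known connection of each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges: List[Tuple[int, int]] = []
    for _ in range(n):
        # add the closest vertex that is not yet in the tree
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges
```

With correlations 0.8 (X0-X1), 0.5 (X1-X2) and 0.4 = 0.8 × 0.5 (X0-X2), the distances are additive along the chain X0-X1-X2, and the sketch returns the edges (0, 1) and (1, 2).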

    Phylogenetic Trees and Their Analysis

    Determining the best possible evolutionary history, the lowest-cost phylogenetic tree, to fit a given set of taxa and character sequences using maximum parsimony is an active area of research because of its importance to understanding biological processes. As several steps in this process are NP-hard under popular, biologically motivated optimality criteria, significant resources are dedicated both to heuristics and to making exact methods more computationally tractable. We examine both phylogenetic data and the structure of the search space in order to suggest methods that reduce the number of possible trees that must be examined to find an exact solution for any given set of taxa and associated character data. Our work on four related problems combines theoretical insight with empirical study to improve searching of the tree space. First, we show that there is a Hamiltonian path through tree space for the most common tree metrics, answering Bryant's Challenge for the minimal such path. We next examine the topology of the search space under various metrics, showing that some metrics have local maxima and minima even with perfect data, while others do not. We further characterize conditions under which sequences simulated under the Jukes-Cantor model of evolution yield well-behaved search spaces. Next, we reduce the search space needed for an exact solution by splitting the set of characters into mutually incompatible subsets of compatible characters, building trees based on the perfect phylogenies implied by these sets, and then searching in the neighborhoods of these trees. We validate this work empirically. Finally, we compare two approaches to the generalized tree alignment problem (GTAP): sequence alignment followed by tree search vs. Direct Optimization, on both biological and simulated data.
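
The character-splitting step relies on pairwise compatibility. For binary characters with no missing data, a standard way to test this is the four-gamete test, and a set of pairwise-compatible binary characters admits a perfect phylogeny. The sketch below assumes 0/1 character vectors scored over the same taxa; the first-fit grouping is a hypothetical simplification for illustration, not the procedure used in this work.

```python
from typing import List, Sequence


def compatible(c1: Sequence[int], c2: Sequence[int]) -> bool:
    """Four-gamete test for two binary characters scored on the same taxa:
    they are compatible unless all four state pairs 00, 01, 10, 11 occur."""
    return len(set(zip(c1, c2))) < 4


def group_compatible(characters: List[Sequence[int]]) -> List[List[int]]:
    """Hypothetical first-fit grouping: each character joins the first group
    whose members are all compatible with it, otherwise it starts a new group."""
    groups: List[List[int]] = []
    for idx, char in enumerate(characters):
        for group in groups:
            if all(compatible(char, characters[j]) for j in group):
                group.append(idx)
                break
        else:
            groups.append([idx])
    return groups
```

Each returned group contains pairwise-compatible characters, so each could seed a perfect-phylogeny tree whose neighborhood is then searched.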

    Unexpected high abyssal ophiuroid diversity in polymetallic nodule fields of the Northeast Pacific Ocean, and implications for conservation

    The largest and most commercially appealing mineral deposits can be found in the abyssal seafloor of the Clarion-Clipperton Zone (CCZ), a polymetallic nodule province in the NE Pacific Ocean, where experimental mining is due to take place. In anticipation of deep-sea mining impacts, it has become essential to assess biodiversity rapidly and accurately. For this reason, we examined ophiuroid material collected during seven scientific cruises from five exploration license areas within the CCZ, one area protected from mining in the periphery of the CCZ (APEI3, Area of Particular Environmental Interest), and the DISturbance and re-COLonisation (DISCOL) Experimental Area (DEA) in the SE Pacific Ocean. Specimens were genetically analysed using a fragment of the mitochondrial cytochrome c oxidase subunit I (COI) gene. Maximum Likelihood and Neighbour Joining trees were constructed, and four tree-based and distance-based methods of species delineation (ABGD, BINs, GMYC, mPTP) were employed to propose Secondary Species Hypotheses (SSHs) within the collected ophiuroids. The concordant results of the species delimitation analyses revealed 43 deep-sea brittle star SSHs, an unexpectedly high diversity showing that the most conspicuous invertebrates of abyssal plains have so far been considerably under-estimated. The number of SSHs found in each area varied from 5 (IFREMER area) to 24 (BGR area), and 13 SSHs were represented by singletons. None of the SSHs was present in all 7 areas, and the largest share of species (19 SSHs, 44.2%) occurred in a single area. The most common species were Ophioleucidae sp. (Species 29), Amphioplus daleus (Species 2) and Ophiosphalma glabrum (Species 3), present in all areas except APEI3. The biodiversity patterns could be mainly attributed to particulate organic carbon (POC) fluxes, which could explain the highest species numbers, found in the BGR (German contractor area) and UKSRL (UK contractor area) areas.
The five exploration contract areas belong to a mesotrophic province, whereas APEI3 is located in an oligotrophic province, which could explain its lowest diversity as well as its very low similarity with the other six study areas. Based on these results, the representativeness and appropriateness of APEI3 for its purpose of preserving the biodiversity of the CCZ fauna are questioned. Finally, this study provides the foundation for biogeographic and functional analyses that will provide insight into the drivers of species diversity and its role in ecosystem function.
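
The four delimitation methods named above (ABGD, BINs, GMYC, mPTP) are not reproduced here. As a much simpler, hypothetical illustration of the distance-based idea only, the sketch below computes uncorrected p-distances between aligned COI sequences and groups specimens by single linkage below a distance threshold. The threshold and the example sequences are hypothetical, not values from the study.

```python
from typing import Dict, List


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned sequences, skipping
    positions where either sequence has a gap ('-')."""
    compared = differences = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue
        compared += 1
        differences += x != y
    return differences / compared if compared else 0.0


def threshold_clusters(seqs: Dict[str, str], threshold: float) -> List[List[str]]:
    """Single-linkage clusters: specimens closer than `threshold` share a cluster."""
    names = sorted(seqs)
    parent = {n: n for n in names}  # union-find forest

    def find(n: str) -> str:
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if p_distance(seqs[a], seqs[b]) < threshold:
                parent[find(a)] = find(b)
    clusters: Dict[str, List[str]] = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return sorted(clusters.values())
```

Real delimitation methods differ in important ways. For example, ABGD infers the barcode gap from the data rather than taking a fixed threshold, and GMYC and mPTP work on trees rather than on raw distances.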

    Phylogeography of Pogonomyrmex barbatus and P. rugosus Harvester Ants: A Complex Regional History of Ancient Vicariance and Recent Expansion in Arid-Adapted Insects, and Implications for the Success of Cryptic Hybrid Lineages with GCD

    Here I present a phylogeographic study of at least six reproductively isolated lineages of harvester ants within the Pogonomyrmex barbatus and P. rugosus species group. The genetic and geographic relationships within this clade are complex: four of the identified lineages are divided into two pairs, and each pair has evolved under a mutualistic system that necessitates sympatry. These paired lineages are dependent upon one another because interlineage matings within each pair are the sole source of hybrid F1 workers; these workers build and sustain the colonies, facilitating the production of the reproductive caste, which results solely from intralineage fertilizations. This system of genetic caste determination (GCD) maintains genetic isolation among these closely related lineages, while simultaneously requiring co-expansion and emigration as their distributions have changed over time. Previous studies have also demonstrated that three of the four lineages displaying this unique genetic caste determination phenotype are of hybrid origin. Thus, reconstructing the phylogenetic and geographic history of this group allows us to evaluate past insights and plan future inquiries in a more complete historical biogeographic context. Using mitochondrial DNA sequences sampled across most of the morphospecies' ranges in the U.S. and Mexico, I employed several methods of phylogenetic and DNA sequence analysis, along with comparisons to geological, biogeographic, and phylogeographic studies throughout the sampled regions. These analyses on Pogonomyrmex harvester ants reveal a complex pattern of vicariance and dispersal that is largely concordant with models of late Miocene, Pliocene, and Pleistocene range shifts among various arid-adapted taxa in North America. Dissertation/Thesis, M.S. Biology 201

    Aboveground biomass density models for NASA's Global Ecosystem Dynamics Investigation (GEDI) lidar mission

    NASA's Global Ecosystem Dynamics Investigation (GEDI) is collecting spaceborne full-waveform lidar data with a primary science goal of producing accurate estimates of forest aboveground biomass density (AGBD). This paper presents the development of the models used to create GEDI's footprint-level (~25 m) AGBD (GEDI04_A) product, including a description of the datasets used and the procedure for final model selection. The data used to fit our models come from a compilation of globally distributed, spatially and temporally coincident field and airborne lidar datasets, from which we simulated GEDI-like waveforms to build a calibration database. We used this database to expand the geographic extent of past waveform lidar studies, and divided the globe into four broad strata by Plant Functional Type (PFT) and six geographic regions. GEDI's waveform-to-biomass models take the form of parametric Ordinary Least Squares (OLS) models with simulated Relative Height (RH) metrics as predictor variables. From an exhaustive set of candidate models, we selected the best input predictor variables and data transformations for each geographic stratum in the GEDI domain to produce a set of comprehensive predictive footprint-level models. First, we found that model selection frequently favored combinations of RH metrics at the 98th, 90th, 50th, and 10th percentiles of height above ground level (RH98, RH90, RH50, and RH10, respectively), but that inclusion of lower RH metrics (e.g., RH10) did not markedly improve model performance. Second, forced inclusion of RH98 in all models was important and did not degrade model performance, and the best-performing models were parsimonious, typically having only 1-3 predictors. Third, stratification by geographic domain (PFT, geographic region) improved model performance in comparison to global models without stratification.
Fourth, for the vast majority of strata, the best-performing models were fit using a square root transformation of field AGBD and/or height metrics. There was considerable variability in model performance across geographic strata, and areas with sparse training data and/or high AGBD values had the poorest performance. These models are used to produce global predictions of AGBD but will be improved in the future as more and better training data become available.
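
The sketch below is a minimal illustration, not the GEDI04_A models. It assumes a single predictor (RH98, whose inclusion the study forced in all models) and a square-root-transformed response, one of the forms the abstract reports as frequently selected, and it fits ordinary least squares in closed form. The function names and example data are hypothetical; no GEDI coefficients are reproduced.

```python
import math
from typing import List, Tuple


def fit_sqrt_ols(rh98: List[float], agbd: List[float]) -> Tuple[float, float]:
    """OLS fit of sqrt(AGBD) = b0 + b1 * RH98 (single predictor, closed form)."""
    y = [math.sqrt(v) for v in agbd]  # square-root transform of the response
    n = len(rh98)
    mean_x = sum(rh98) / n
    mean_y = sum(y) / n
    sxx = sum((x - mean_x) ** 2 for x in rh98)
    sxy = sum((x - mean_x) * (yi - mean_y) for x, yi in zip(rh98, y))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


def predict_agbd(b0: float, b1: float, rh98: float) -> float:
    """Back-transform to AGBD units by squaring. Negative linear predictions
    are clamped to zero; no back-transformation bias correction is applied."""
    return max(b0 + b1 * rh98, 0.0) ** 2
```

With synthetic data generated as AGBD = (1 + 0.5 · RH98)², the fit recovers b0 = 1 and b1 = 0.5. A multi-predictor version (for example RH98 with RH50) would replace the closed form with a general least-squares solve.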

    Paedomorphosis, Secondary Woodiness, and Insular Woodiness in Plants.

    The related concepts of paedomorphosis in the secondary xylem, insular woodiness, and secondary woodiness are reviewed and evaluated in order to clearly distinguish the phenomena involved and to provide a firm foundation for future research in this area. The theory of paedomorphosis refers to the occurrence of certain juvenile xylem characteristics, such as scalariform perforation plates and lateral wall pitting, in the secondary xylem of shrubby, suffrutescent, pachycaulous, and lianoid growth forms. Paedomorphic characteristics are often found in insular woody species, a fact that has caused paedomorphosis to be associated with secondary woodiness. The anatomy of the secondary xylem in Xanthorhiza simplicissima (Ranunculaceae), Coreopsis gigantea (Asteraceae), and Mahonia bealei (Berberidaceae) is described in order to provide specific data for discussion. These species serve as test cases for the presence of paedomorphosis and the evolution of secondary woodiness. The secondary xylem of all three species was found to have a degree of paedomorphosis, with Coreopsis having the greatest number of paedomorphic characteristics, Xanthorhiza an intermediate number, and Mahonia only a single characteristic. Plotting the occurrence of the character states woody and nonwoody on phylogenetic trees containing these taxa shows that Coreopsis is secondarily woody, while the ancestry of the other two species cannot be unambiguously established. These results must, however, be considered preliminary, as the occurrence of secondary growth in many “herbaceous” plants often goes unreported. Although paedomorphosis is often associated with secondary woodiness, there are examples of paedomorphic wood in primitively woody taxa. One conclusion is that the degree of paedomorphosis may be a better indicator of the mechanical requirements of the shoot than of its evolutionary history.
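
The abstract does not state which method was used to map woody and nonwoody states onto the trees. As an assumed illustration of how such a mapping can be unambiguous in one case and ambiguous in another, the sketch below runs the bottom-up pass of Fitch parsimony on a rooted binary tree. The taxa and trees are hypothetical and are not the phylogenies used in the study.

```python
from typing import Dict, FrozenSet, Tuple, Union

# Hypothetical representation: a leaf is a taxon name, an internal node a pair.
Tree = Union[str, tuple]


def fitch(tree: Tree, states: Dict[str, str]) -> Tuple[FrozenSet[str], int]:
    """Fitch bottom-up pass on a rooted binary tree.

    Returns the Fitch state set for this node and the minimum number of state
    changes in the subtree. At the root, the set holds every state that some
    most-parsimonious reconstruction can assign."""
    if isinstance(tree, str):
        return frozenset([states[tree]]), 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # no shared state: take the union and count one change
    return left_set | right_set, left_cost + right_cost + 1
```

With A woody and B, C, D nonwoody on the tree ((A, B), (C, D)), the root set is {nonwoody} with one change, so woodiness in A would be inferred as derived (secondary). On a two-taxon tree (A, B) the root set contains both states, which is an ambiguous case.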