73 research outputs found

    Prediction of missing sequences and branch lengths in phylogenomic data

    This is a pre-copyedited, author-produced version of an article accepted for publication in Bioinformatics following peer review. The version of record, Diego Darriba, Michael Weiß, Alexandros Stamatakis; Prediction of missing sequences and branch lengths in phylogenomic data, Bioinformatics, Volume 32, Issue 9, 1 May 2016, Pages 1331–1337, is available online at: https://doi.org/10.1093/bioinformatics/btv768
    [Abstract] Motivation: Missing data in large-scale phylogenomic datasets negatively affect the phylogenetic inference process. In particular, alignments with missing per-gene or per-partition sequences can yield inferred phylogenies with extremely long branch lengths. We investigate whether statistically predicting the missing sequences of an organism, using information from the genes/partitions that do contain data for that organism, alleviates this problem and improves phylogenetic accuracy.
    Results: We present several algorithms for correcting excessively long branch lengths induced by missing data, as well as methods for predicting (imputing) missing sequence data. We evaluate our algorithms by systematically removing sequence data from three empirical and 100 simulated alignments. We then compare the Maximum Likelihood trees inferred from the gappy alignments and from the alignments with predicted sequence data to the trees inferred from the original, complete datasets. Branch lengths inferred from the datasets with predicted sequences were one to two orders of magnitude more accurate than those inferred from the alignments with missing data. However, prediction did not affect the Robinson-Foulds (RF) distances between the trees.
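    The abstract compares trees using RF distances. As a minimal sketch, assuming a hypothetical nested-tuple tree representation (a leaf is a taxon name, an internal node is a tuple of children) instead of Newick parsing, the RF distance counts the non-trivial bipartitions (splits) that occur in only one of the two trees. Because it looks only at splits, it ignores branch lengths entirely, so branch-length accuracy can improve while RF distances stay unchanged. This is an illustration, not the authors' code.

```python
from typing import FrozenSet, List, Set, Tuple, Union

# Hypothetical representation: a leaf is a taxon name, an internal node is a
# tuple of children. The rooting is ignored, so trees are treated as unrooted.
Tree = Union[str, Tuple["Tree", ...]]


def clusters(tree: Tree) -> Tuple[FrozenSet[str], List[FrozenSet[str]]]:
    """Return the full taxon set and the taxon set below every internal node."""
    found: List[FrozenSet[str]] = []

    def visit(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        cluster = frozenset().union(*(visit(child) for child in node))
        found.append(cluster)
        return cluster

    return visit(tree), found


def splits(tree: Tree) -> Set[FrozenSet[str]]:
    """Non-trivial splits, each stored as the side without a fixed anchor taxon."""
    taxa, internal = clusters(tree)
    anchor = min(taxa)
    result: Set[FrozenSet[str]] = set()
    for cluster in internal:
        side = taxa - cluster if anchor in cluster else cluster
        # Skip trivial splits (a single taxon, or nothing, on one side).
        if 2 <= len(side) <= len(taxa) - 2:
            result.add(side)
    return result


def rf_distance(t1: Tree, t2: Tree) -> int:
    """Robinson-Foulds distance: number of splits present in exactly one tree."""
    return len(splits(t1) ^ splits(t2))


t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(rf_distance(t1, t2))  # 2
```

    The two example trees share the split ABC|DE but differ in AB|CDE versus AC|BDE, giving a distance of 2; rerooting a tree does not change its splits.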

    LP-VIcode: a program to compute a suite of variational chaos indicators

    An important step in analyzing the dynamics of a stellar or planetary system is reliably identifying whether its orbits are chaotic or regular. We introduce LP-VIcode, a fully operational program that efficiently computes a suite of ten variational chaos indicators for dynamical systems in any number of dimensions. The user may compute any combination of the following indicators simultaneously: the Lyapunov Exponents, the Mean Exponential Growth factor of Nearby Orbits, the Slope Estimation of the largest Lyapunov Characteristic Exponent, the Smaller ALignment Index, the Generalized ALignment Index, the Fast Lyapunov Indicator, the Orthogonal Fast Lyapunov Indicator, the dynamical Spectra of Stretching Numbers, the Spectral Distance, and the Relative Lyapunov Indicator. The indicators are combined efficiently: differential equations are shared whenever possible, and each indicator's computation stops individually as soon as it saturates.
    Instituto de Astrofísica de La Plata
    Facultad de Ciencias Astronómicas y Geofísica
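    LP-VIcode integrates variational equations alongside the equations of motion. The sketch below illustrates the same principle for one indicator, the largest Lyapunov exponent, on a discrete system: the Chirikov standard map, chosen here for brevity and not taken from the source. A deviation vector is propagated with the linearized (tangent) map, renormalized at every step, and the mean logarithmic stretching is reported. The function name and parameters are hypothetical and do not reflect LP-VIcode's interface, which works with differential equations.

```python
import math

TWO_PI = 2.0 * math.pi


def largest_lyapunov_standard_map(k: float, theta: float, p: float,
                                  n_steps: int = 20_000) -> float:
    """Estimate the largest Lyapunov exponent of the standard map
    p' = p + k*sin(theta), theta' = theta + p' (both taken mod 2*pi)."""
    d_theta, d_p = 1.0, 0.0  # initial unit deviation vector in tangent space
    log_stretch = 0.0
    for _ in range(n_steps):
        # Variational (tangent) map, evaluated at the current, pre-update theta.
        d_p = d_p + k * math.cos(theta) * d_theta
        d_theta = d_theta + d_p
        # Advance the orbit itself on the torus.
        p = (p + k * math.sin(theta)) % TWO_PI
        theta = (theta + p) % TWO_PI
        # Renormalize to avoid overflow and accumulate the logarithmic growth.
        norm = math.hypot(d_theta, d_p)
        log_stretch += math.log(norm)
        d_theta, d_p = d_theta / norm, d_p / norm
    return log_stretch / n_steps


print(largest_lyapunov_standard_map(0.0, 0.5, 0.3))   # integrable case
print(largest_lyapunov_standard_map(10.0, 0.5, 0.3))  # strongly chaotic case
```

    For k = 0 the map is integrable and the estimate is exactly 0 with this initial deviation vector; for a strongly chaotic value such as k = 10 it is clearly positive.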

    High-performance computing selection of models of DNA substitution for multicore clusters

    [Abstract] This paper presents the high-performance computing (HPC) support of jModelTest2, the most popular bioinformatics tool for the statistical selection of models of DNA substitution. Because model selection can demand vast computational resources, especially processing power, jModelTest2 implements three parallel algorithms: (1) a multithreaded implementation for shared memory architectures; (2) a message-passing implementation for distributed memory architectures, such as clusters; and (3) a hybrid shared/distributed memory implementation for clusters of multicore nodes, which combines workload distribution across cluster nodes with multithreaded model optimization within each node. The main limitation of the shared and distributed memory versions is the workload imbalance that generally appears when using more than 32 cores, a direct consequence of the heterogeneous computational cost of the evaluated models. The hybrid shared/distributed memory version overcomes this issue by reducing the workload imbalance through a thread-based decomposition of the most costly model optimization tasks. The performance evaluation of this HPC application on a 40-core shared memory system and on a 528-core cluster showed high scalability, with speedups of up to 32 for the multithreaded version and up to 257 for the hybrid shared/distributed memory implementation. This can reduce the execution time of some analyses from 4 days to barely 20 minutes. The implementations of the three parallel execution strategies of jModelTest2 presented in this paper are available under a GPL license at http://code.google.com/jmodeltest2.
    European Research Council; ERC-2007-Stg 203161-PHYGENOM to D.P.
    Ministerio de Ciencia y Educación; BFU2009-08611 to D.P.
    Ministerio de Ciencia y Educación; TIN2010-16735 to R.D
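    The workload imbalance described in this abstract can be illustrated with a toy scheduler. This sketch assumes a greedy longest-processing-time-first (LPT) assignment, which is a stand-in and not necessarily what jModelTest2 does; the model names and relative costs in EXAMPLE_COSTS are hypothetical. Whatever the number of workers, the makespan cannot drop below the cost of the single most expensive task, which is why splitting the costliest optimizations across threads, as the hybrid version does, helps.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical relative optimization costs of a few candidate models.
EXAMPLE_COSTS = {"JC": 1, "K80": 2, "F81": 2, "HKY+G": 4, "GTR+G": 8, "GTR+I+G": 10}


def lpt_schedule(costs: Dict[str, float], n_workers: int) -> List[Tuple[float, List[str]]]:
    """Assign tasks, most expensive first, to the currently least-loaded worker.
    Returns one (total load, task names) pair per worker."""
    heap = [(0.0, worker) for worker in range(n_workers)]  # (load, worker id)
    heapq.heapify(heap)
    tasks: List[List[str]] = [[] for _ in range(n_workers)]
    loads = [0.0] * n_workers
    for name, cost in sorted(costs.items(), key=lambda item: item[1], reverse=True):
        load, worker = heapq.heappop(heap)  # least-loaded worker so far
        tasks[worker].append(name)
        loads[worker] = load + cost
        heapq.heappush(heap, (loads[worker], worker))
    return list(zip(loads, tasks))


for workers in (4, 8):
    schedule = lpt_schedule(EXAMPLE_COSTS, workers)
    makespan = max(load for load, _ in schedule)
    mean_load = sum(EXAMPLE_COSTS.values()) / workers
    print(f"{workers} workers: makespan {makespan}, mean load {mean_load}")
```

    With these hypothetical costs, doubling the workers from 4 to 8 halves the mean load, but the makespan stays at 10, the cost of the single GTR+I+G task.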

    jmodeltest.org: selection of nucleotide substitution models on the cloud

    This is a pre-copyedited, author-produced version of an article accepted for publication in Bioinformatics following peer review. The version of record, Jose Manuel Santorum, Diego Darriba, Guillermo L. Taboada, David Posada; jmodeltest.org: selection of nucleotide substitution models on the cloud, Bioinformatics, Volume 30, Issue 9, 1 May 2014, Pages 1310–1311, is available online at: https://doi.org/10.1093/bioinformatics/btu032
    [Abstract] The selection of models of nucleotide substitution is one of the major steps of modern phylogenetic analysis. Different tools exist to accomplish this task, among which jModelTest 2 (jMT2) is one of the most popular. Still, to analyze large DNA alignments with hundreds or thousands of loci, jMT2 users need access to High Performance Computing clusters, along with the ability to install and configure the software on them, conditions that are not always met. Here we present jmodeltest.org, a novel web server for the transparent execution of jMT2 across different platforms and for a wide range of users. Its main benefit is straightforward execution, which avoids any configuration or execution issues and, in most cases, significantly reduces the time required to complete the analysis.
    Availability and implementation: jmodeltest.org is accessible at http://jmodeltest.org from modern browsers such as Firefox, Chrome, Opera, Safari and IE. User registration is not mandatory, but users who want additional functionality, such as access to previous analyses, can open a user account.
    European Research Council; 2007-Stg 203161-PHYGENOM
    Ministerio de Ciencia e Innovación; TIN2010-1673

    Does the choice of nucleotide substitution models matter topologically?

    Background: As part of a master's-level programming practical at the computer science department of the Karlsruhe Institute of Technology, we developed, and make available, open-source code for testing all 203 possible nucleotide substitution models in the Maximum Likelihood (ML) setting under the common Akaike, corrected Akaike, and Bayesian information criteria. We address whether model selection matters topologically, that is, whether conducting ML inferences under the optimal model, instead of a standard General Time Reversible (GTR) model, yields different tree topologies. We also assess to what degree the models selected and the trees inferred under the three standard criteria (AIC, AICc, BIC) differ. Finally, we assess whether the definition of the sample size (#sites versus #sites × #taxa) yields different models and, as a consequence, different tree topologies.
    Results: We find that all three factors (in order of impact: nucleotide model selection, information criterion used, sample size definition) can yield substantially different final tree topologies (topological difference exceeding 10 %) for approximately 5 % of the tree inferences conducted on the 39 empirical datasets used in our study.
    Conclusions: Using the best-fit nucleotide substitution model may change the final ML tree topology compared to an inference under a default GTR model. The effect is less pronounced when comparing distinct information criteria. Nonetheless, in some cases we did obtain substantial topological differences.
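    The figure of 203 models corresponds to the number of ways the six GTR exchangeability rates can be grouped into classes of equal value (the Bell number B6). A minimal sketch, assuming one fixed ordering of the rates (RATES below), enumerates these groupings as restricted growth strings: (0, 0, 0, 0, 0, 0) is the single-rate scheme of JC/F81, (0, 1, 0, 0, 1, 0) the transition/transversion scheme of K80/HKY, and (0, 1, 2, 3, 4, 5) is GTR. It also computes the three criteria with their standard formulas, where k is the number of free parameters and n the sample size; the log-likelihood and parameter count in the usage lines are hypothetical. Because n enters the AICc and BIC penalties (but not AIC), switching between #sites and #sites × #taxa can change which model is selected.

```python
import math
from typing import Dict, List, Tuple

# Assumed ordering of the six GTR exchangeability parameters.
RATES = ("AC", "AG", "AT", "CG", "CT", "GT")


def rate_partitions(n: int = len(RATES)) -> List[Tuple[int, ...]]:
    """All groupings of n exchangeabilities into classes of equal value,
    encoded as restricted growth strings (the first rate is always class 0)."""
    schemes: List[Tuple[int, ...]] = []

    def extend(prefix: List[int], n_classes: int) -> None:
        if len(prefix) == n:
            schemes.append(tuple(prefix))
            return
        for label in range(n_classes + 1):  # reuse a class or open a new one
            prefix.append(label)
            extend(prefix, max(n_classes, label + 1))
            prefix.pop()

    extend([0], 1)
    return schemes


def information_criteria(log_l: float, k: int, n: float) -> Dict[str, float]:
    """AIC, AICc and BIC for k free parameters and sample size n (needs n > k + 1)."""
    aic = 2 * k - 2 * log_l
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)
    bic = k * math.log(n) - 2 * log_l
    return {"AIC": aic, "AICc": aicc, "BIC": bic}


schemes = rate_partitions()
print(len(schemes))  # 203
# Hypothetical alignment with 1000 sites and 50 taxa: the BIC penalty differs
# between the two sample-size definitions.
print(information_criteria(-5000.0, 10, 1000)["BIC"])
print(information_criteria(-5000.0, 10, 1000 * 50)["BIC"])
```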

    ModelTest-NG: A New and Scalable Tool for the Selection of DNA and Protein Evolutionary Models

    ModelTest-NG is a from-scratch reimplementation of jModelTest and ProtTest, two popular tools for selecting the best-fit nucleotide and amino acid substitution models, respectively. ModelTest-NG is one to two orders of magnitude faster than jModelTest and ProtTest while being equally accurate, and it introduces several new features, such as ascertainment bias correction, mixture and free-rate models, and the automatic processing of single partitions.
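    Ascertainment bias arises when only variable sites are sampled (for example, SNP data), so constant patterns are absent by design. One common correction, due to Lewis (2001), conditions each site likelihood on the site being variable by dividing it by one minus the total probability of the constant patterns. The abstract does not state which correction variant ModelTest-NG applies; this is a sketch of Lewis's form only, with a hypothetical function name and with the per-site and constant-pattern likelihoods assumed to be computed elsewhere.

```python
import math
from typing import Sequence


def lewis_corrected_log_likelihood(site_likelihoods: Sequence[float],
                                   constant_pattern_likelihoods: Sequence[float]) -> float:
    """Log-likelihood of variable-only data, with each site conditioned on
    being variable (Lewis-style ascertainment bias correction)."""
    # Probability that a site is constant (e.g. all-A, all-C, all-G, all-T).
    p_constant = sum(constant_pattern_likelihoods)
    if not 0.0 <= p_constant < 1.0:
        raise ValueError("total constant-pattern probability must be in [0, 1)")
    n_sites = len(site_likelihoods)
    uncorrected = sum(math.log(lik) for lik in site_likelihoods)
    # Dividing every site likelihood by (1 - p_constant) in log space.
    return uncorrected - n_sites * math.log(1.0 - p_constant)


print(lewis_corrected_log_likelihood([0.1, 0.2], [0.25, 0.25]))
```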

    The Wall Lizards of the Balkan Peninsula: Tackling Questions at the Interphase of Phylogenomics and Population Genomics

    [Abstract] Wall lizards of the genus Podarcis (Sauria, Lacertidae) are the predominant reptile group in southern Europe, comprising 24 recognized species. Mitochondrial DNA data have shown that, with the exception of P. muralis, the Podarcis species distributed in the Balkan Peninsula form a species group that is further subdivided into two subgroups: the “P. tauricus” subgroup, consisting of P. tauricus, P. milensis, P. gaigeae, and P. melisellensis, and the “P. erhardii” subgroup, comprising P. erhardii, P. levendis, P. cretensis, and P. peloponnesiacus. To explore the phylogenomic relationships of Balkan Podarcis, assess their levels of genetic structure, and re-evaluate the number of extant species, we applied phylogenomic and admixture approaches to ddRADseq (double digested Restriction site Associated DNA sequencing) genomic data. With this efficient Next Generation Sequencing approach, we obtained a large number of genomic loci randomly distributed throughout the genome and used them to resolve the previously obscure phylogenetic relationships among the Podarcis species of the Balkans. The resulting phylogenomic relationships support the monophyly of both subgroups and reveal several divergent lineages within each subgroup, stressing the need for a taxonomic re-evaluation of the Podarcis species in the Balkans. The phylogenomic trees and the species delimitation analyses confirmed all recently recognized species (P. levendis, P. cretensis, and P. ionicus) and showed the presence of at least two more species, one within P. erhardii and the other within P. peloponnesiacus.
    This study was funded by the NSFR 2007-2013 programme for development, European Social Fund, Operational Programme, Education and Lifelong Learning investing in knowledge society, Ministry of Education and Religious Affairs, Managing Authority, co-financed by Greece and the European Union.
    Part of this work was funded by the Klaus Tschira Foundation, by the Ministry of Science and Innovation of Spain (PID2019-104184RB-I00 / AEI / 10.13039/501100011033), and by the Xunta de Galicia and FEDER funds of the EU under the Centro de Investigación de Galicia accreditation 2019-2022 (ED431G 2019/01).
    Xunta de Galicia; ED431G 2019/0
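    The double digest at the heart of ddRADseq can be mimicked in silico. This simplified sketch assumes a hypothetical EcoRI (G^AATTC) / MspI (C^CGG) enzyme pair, since the abstract does not name the enzymes used: it finds forward-strand cut positions for a rare and a frequent cutter and keeps fragments that have a different enzyme at each end and fall inside a size-selection window. Both strands, adapters and read lengths, which matter in real library preparation, are ignored here.

```python
import re
from typing import List, Tuple

# Hypothetical enzyme pair: (recognition site, cut offset within the site).
RARE_CUTTER = ("GAATTC", 1)      # EcoRI, G^AATTC
FREQUENT_CUTTER = ("CCGG", 1)    # MspI, C^CGG


def cut_positions(seq: str, site: str, offset: int) -> List[int]:
    """Forward-strand cut positions; the lookahead also finds overlapping sites."""
    return [m.start() + offset for m in re.finditer(f"(?={site})", seq)]


def ddrad_fragments(seq: str, min_len: int, max_len: int) -> List[Tuple[int, int]]:
    """(start, end) of fragments flanked by one cut from each enzyme and
    whose length lies inside the size-selection window."""
    cuts = sorted(
        [(pos, "rare") for pos in cut_positions(seq, *RARE_CUTTER)]
        + [(pos, "frequent") for pos in cut_positions(seq, *FREQUENT_CUTTER)]
    )
    fragments = []
    for (start, left), (end, right) in zip(cuts, cuts[1:]):
        # Simplification: require different enzymes at the two fragment ends.
        if left != right and min_len <= end - start <= max_len:
            fragments.append((start, end))
    return fragments


example = "AA" + "GAATTC" + "A" * 10 + "CCGG" + "TTTT"
print(ddrad_fragments(example, 10, 20))  # [(3, 19)]
```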

    The Power of Words: Concepts That Win Elections (La fuerza de las palabras. Conceptos que ganan elecciones)

    The aim of this study was to demonstrate the power that words can have in the context of institutional or political advertising. The study was framed within political communication in Spain and was based on the speeches of the leaders of the different political forces in 2015. The main difficulty in carrying out this work was the speed at which changes followed one another and the sheer amount of information. This hindered the analysis of the speeches, since new statements by the leaders of each party appeared daily. In addition, the main political forces underwent some changes over the course of the research. Furthermore, when it came to analyzing the speeches, only the PP and the PSOE had a section on their websites containing all of their speeches, whereas Podemos and Ciudadanos only stored summaries of their public statements. Analyzing those speeches was therefore more complex, because it required working from the video of each speech without any written material.

    The Phylogenetic Likelihood Library

    [Abstract] We introduce the Phylogenetic Likelihood Library (PLL), a highly optimized application programming interface for developing likelihood-based phylogenetic inference and post-analysis software. The PLL implements appropriate data structures and functions that allow users to quickly implement common, error-prone, and labor-intensive tasks, such as likelihood calculations, model parameter and branch length optimization, and tree space exploration. The highly optimized and parallelized implementation of the phylogenetic likelihood function, together with thorough documentation, provides a framework for the rapid development of scalable parallel phylogenetic software. Using two likelihood-based phylogenetic codes as examples, we show that the PLL improves the sequential performance of current software by a factor of 2–10 while requiring only 1 month of programming time for integration. We also show that, when numerical scaling to prevent floating-point underflow is enabled, the double-precision likelihood calculations in the PLL are up to 1.9 times faster than those in BEAGLE. On an empirical DNA dataset with 2000 taxa, the AVX version of the PLL is 4 times faster than BEAGLE (scaling enabled and required).
    DFG, German Research Foundation; STA/860-4. F.I.-C.
    DFG, German Research Foundation; STA/860-3
    DFG, German Research Foundation; STA/860-2. L.-T.N.
    University of Vienna; I059-N
    Austrian Science Fund; I760-B1
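    Numerical scaling prevents floating-point underflow when conditional likelihoods are multiplied over many taxa. As a minimal sketch in plain Python (not the PLL's C API), the code runs Felsenstein's pruning algorithm under the JC69 model and, whenever every entry of a conditional likelihood vector drops below 2^-256, multiplies the vector by 2^256 and counts the event; the counts are subtracted again in log space. The tree representation, the function names and the 2^-256 threshold are illustrative assumptions, and only unambiguous A/C/G/T characters are handled. Without the rescaling step, a site on a tree with about a thousand taxa can underflow to zero, leaving its logarithm undefined.

```python
import math
from typing import Dict, List, Tuple, Union

# Hypothetical representation: a leaf is a taxon name; an internal node is a
# tuple of (child, branch_length) pairs.
Node = Union[str, Tuple[Tuple["Node", float], ...]]
BASES = "ACGT"
SCALE_EXP = 256                  # illustrative scaling exponent
THRESHOLD = 2.0 ** -SCALE_EXP
FACTOR = 2.0 ** SCALE_EXP


def jc69(t: float) -> List[List[float]]:
    """Jukes-Cantor transition probability matrix for branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    same, diff = 0.25 + 0.75 * e, 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]


def clv(node: Node, column: Dict[str, str]) -> Tuple[List[float], int]:
    """Conditional likelihood vector at one site plus the number of rescalings."""
    if isinstance(node, str):
        return [1.0 if b == column[node] else 0.0 for b in BASES], 0
    vec, scalings = [1.0] * 4, 0
    for child, length in node:
        child_vec, child_scalings = clv(child, column)
        scalings += child_scalings
        p = jc69(length)
        for i in range(4):
            vec[i] *= sum(p[i][j] * child_vec[j] for j in range(4))
    if max(vec) < THRESHOLD:  # rescale before the entries underflow
        vec = [v * FACTOR for v in vec]
        scalings += 1
    return vec, scalings


def log_likelihood(tree: Node, alignment: Dict[str, str]) -> float:
    """Sum of per-site log-likelihoods, undoing the scaling in log space."""
    n_sites = len(next(iter(alignment.values())))
    total = 0.0
    for s in range(n_sites):
        column = {name: seq[s] for name, seq in alignment.items()}
        vec, scalings = clv(tree, column)
        site_likelihood = 0.25 * sum(vec)  # equal base frequencies under JC69
        total += math.log(site_likelihood) - scalings * SCALE_EXP * math.log(2.0)
    return total


print(log_likelihood((("A", 0.1), ("B", 0.1)), {"A": "ACGT", "B": "ACGA"}))
```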