241 research outputs found

    Exploring parallel MPI fault tolerance mechanisms for phylogenetic inference with RAxML-NG

    Get PDF
    Motivation Phylgenetic trees are now routinely inferred on large scale high performance computing systems with thousands of cores as the parallel scalability of phylogenetic inference tools has improved over the past years to cope with the molecular data avalanche. Thus, the parallel fault tolerance of phylogenetic inference tools has become a relevant challenge. To this end, we explore parallel fault tolerance mechanisms and algorithms, the software modifications required and the performance penalties induced via enabling parallel fault tolerance by example of RAxML-NG, the successor of the widely used RAxML tool for maximum likelihood-based phylogenetic tree inference. Results We find that the slowdown induced by the necessary additional recovery mechanisms in RAxML-NG is on average 1.00 ± 0.04. The overall slowdown by using these recovery mechanisms in conjunction with a fault-tolerant Message Passing Interface implementation amounts to on average 1.7 ± 0.6 for large empirical datasets. Via failure simulations, we show that RAxML-NG can successfully recover from multiple simultaneous failures, subsequent failures, failures during recovery and failures during checkpointing. Recoveries are automatic and transparent to the user

    RAxML-NG: a fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference

    Get PDF
    Motivation: Phylogenies are important for fundamental biological research, but also have numerous applications in biotechnology, agriculture and medicine. Finding the optimal tree under the popular maximum likelihood (ML) criterion is known to be NP-hard. Thus, highly optimized and scalable codes are needed to analyze constantly growing empirical datasets. // Results: We present RAxML-NG, a from-scratch re-implementation of the established greedy tree search algorithm of RAxML/ExaML. RAxML-NG offers improved accuracy, flexibility, speed, scalability, and usability compared with RAxML/ExaML. On taxon-rich datasets, RAxML-NG typically finds higher-scoring trees than IQTree, an increasingly popular recent tool for ML-based phylogenetic inference (although IQ-Tree shows better stability). Finally, RAxML-NG introduces several new features, such as the detection of terraces in tree space and the recently introduced transfer bootstrap support metric. // Availability and implementation: The code is available under GNU GPL at https://github.com/amkozlov/raxml-ng. RAxML-NG web service (maintained by Vital-IT) is available at https://raxml-ng.vital-it.ch/

    Mathematical Problems in Molecular Evolution and Next Generation Sequencing

    Get PDF
    The focus of this work is the development of new mathematical methods for problems in phylogenetic tree inferences. In the first part we solve several problems related to so-called partitioned alignments. In the second part we demonstrate how to calculate all identical subtrees of a given labeled tree. We make use of this to implement an efficient method for avoiding redundant likelihood operations during phylogenetic tree inferences

    Load-Balance and Fault-Tolerance for Massively Parallel Phylogenetic Inference

    Get PDF

    Dynamics of Genome Rearrangement in Bacterial Populations

    Get PDF
    Genome structure variation has profound impacts on phenotype in organisms ranging from microbes to humans, yet little is known about how natural selection acts on genome arrangement. Pathogenic bacteria such as Yersinia pestis, which causes bubonic and pneumonic plague, often exhibit a high degree of genomic rearrangement. The recent availability of several Yersinia genomes offers an unprecedented opportunity to study the evolution of genome structure and arrangement. We introduce a set of statistical methods to study patterns of rearrangement in circular chromosomes and apply them to the Yersinia. We constructed a multiple alignment of eight Yersinia genomes using Mauve software to identify 78 conserved segments that are internally free from genome rearrangement. Based on the alignment, we applied Bayesian statistical methods to infer the phylogenetic inversion history of Yersinia. The sampling of genome arrangement reconstructions contains seven parsimonious tree topologies, each having different histories of 79 inversions. Topologies with a greater number of inversions also exist, but were sampled less frequently. The inversion phylogenies agree with results suggested by SNP patterns. We then analyzed reconstructed inversion histories to identify patterns of rearrangement. We confirm an over-representation of “symmetric inversions”—inversions with endpoints that are equally distant from the origin of chromosomal replication. Ancestral genome arrangements demonstrate moderate preference for replichore balance in Yersinia. We found that all inversions are shorter than expected under a neutral model, whereas inversions acting within a single replichore are much shorter than expected. We also found evidence for a canonical configuration of the origin and terminus of replication. Finally, breakpoint reuse analysis reveals that inversions with endpoints proximal to the origin of DNA replication are nearly three times more frequent. Our findings represent the first characterization of genome arrangement evolution in a bacterial population evolving outside laboratory conditions. Insight into the process of genomic rearrangement may further the understanding of pathogen population dynamics and selection on the architecture of circular bacterial chromosomes

    Algorithmic Advancements and Massive Parallelism for Large-Scale Datasets in Phylogenetic Bayesian Markov Chain Monte Carlo

    Get PDF
    Datasets used for the inference of the "tree of life" grow at unprecedented rates, thus inducing a high computational burden for analytic methods. First, we introduce a scalable software package that allows us to conduct state of the art Bayesian analyses on datasets of almost arbitrary size. Second, we derive a proposal mechanism for MCMC that is substantially more efficient than traditional branch length proposals. Third, we present an efficient algorithm for solving the rogue taxon problem

    Evolutionary genomics : statistical and computational methods

    Get PDF
    This open access book addresses the challenge of analyzing and understanding the evolutionary dynamics of complex biological systems at the genomic level, and elaborates on some promising strategies that would bring us closer to uncovering of the vital relationships between genotype and phenotype. After a few educational primers, the book continues with sections on sequence homology and alignment, phylogenetic methods to study genome evolution, methodologies for evaluating selective pressures on genomic sequences as well as genomic evolution in light of protein domain architecture and transposable elements, population genomics and other omics, and discussions of current bottlenecks in handling and analyzing genomic data. Written for the highly successful Methods in Molecular Biology series, chapters include the kind of detail and expert implementation advice that lead to the best results. Authoritative and comprehensive, Evolutionary Genomics: Statistical and Computational Methods, Second Edition aims to serve both novices in biology with strong statistics and computational skills, and molecular biologists with a good grasp of standard mathematical concepts, in moving this important field of study forward

    On the intelligent management of sepsis in the intensive care unit

    Get PDF
    The management of the Intensive Care Unit (ICU) in a hospital has its own, very specific requirements that involve, amongst others, issues of risk-adjusted mortality and average length of stay; nurse turnover and communication with physicians; technical quality of care; the ability to meet patient's family needs; and avoid medical error due rapidly changing circumstances and work overload. In the end, good ICU management should lead to an improvement in patient outcomes. Decision making at the ICU environment is a real-time challenge that works according to very tight guidelines, which relate to often complex and sensitive research ethics issues. Clinicians in this context must act upon as much available information as possible, and could therefore, in general, benefit from at least partially automated computer-based decision support based on qualitative and quantitative information. Those taking executive decisions at ICUs will require methods that are not only reliable, but also, and this is a key issue, readily interpretable. Otherwise, any decision tool, regardless its sophistication and accuracy, risks being rendered useless. This thesis addresses this through the design and development of computer based decision making tools to assist clinicians at the ICU. It focuses on one of the main problems that they must face: the management of the Sepsis pathology. Sepsis is one of the main causes of death for non-coronary ICU patients. Its mortality rate can reach almost up to one out of two patients for septic shock, its most acute manifestation. It is a transversal condition affecting people of all ages. Surprisingly, its definition has only been standardized two decades ago as a systemic inflammatory response syndrome with confirmed infection. The research reported in this document deals with the problem of Sepsis data analysis in general and, more specifically, with the problem of survival prediction for patients affected with Severe Sepsis. The tools at the core of the investigated data analysis procedures stem from the fields of multivariate and algebraic statistics, algebraic geometry, machine learning and computational intelligence. Beyond data analysis itself, the current thesis makes contributions from a clinical point of view, as it provides substantial evidence to the debate about the impact of the preadmission use of statin drugs in the ICU outcome. It also sheds light into the dependence between Septic Shock and Multi Organic Dysfunction Syndrome. Moreover, it defines a latent set of Sepsis descriptors to be used as prognostic factors for the prediction of mortality and achieves an improvement on predictive capability over indicators currently in use.La gestió d'una Unitat de Cures Intensives (UCI) hospitalària presenta uns requisits força específics incloent, entre altres, la disminució de la taxa de mortalitat, la durada de l'ingrès, la rotació d'infermeres i la comunicació entre metges amb al finalitad de donar una atenció de qualitat atenent als requisits tant dels malalts com dels familiars. També és força important controlar i minimitzar els error mèdics deguts a canvis sobtats i a la presa ràpida de deicisions assistencials. Al cap i a la fi, la bona gestió de la UCI hauria de resultar en una reducció de la mortalitat i durada d'estada. La presa de decisions en un entorn de crítics suposa un repte de presa de decisions en temps real d'acord a unes guies clíniques molt restrictives i que, pel que fa a la recerca, poden resultar en problemes ètics força sensibles i complexos. Per tant, el personal sanitari que ha de prendre decisions sobre la gestió de malalts crítics no només requereix eines de suport a la decisió que siguin fiables sinó que, a més a més, han de ser interpretables. Altrament qualsevol eina de decisió que no presenti aquests trets no és considerarà d'utilitat clínica. Aquesta tesi doctoral adreça aquests requisits mitjançant el desenvolupament d'eines de suport a la decisió per als intensivistes i es focalitza en un dels principals problemes als que s'han denfrontar: el maneig del malalt sèptic. La Sèpsia és una de les principals causes de mortalitats a les UCIS no-coronàries i la seva taxa de mortalitat pot arribar fins a la meitat dels malalts amb xoc sèptic, la seva manifestació més severa. La Sèpsia és un síndrome transversal, que afecta a persones de totes les edats. Sorprenentment, la seva definició ha estat estandaritzada, fa només vint anys, com a la resposta inflamatòria sistèmica a una infecció corfimada. La recerca presentada en aquest document fa referència a l'anàlisi de dades de la Sèpsia en general i, de forma més específica, al problema de la predicció de la supervivència de malalts afectats amb Sèpsia Greu. Les eines i mètodes que formen la clau de bòveda d'aquest treball provenen de diversos camps com l'estadística multivariant i algebràica, geometria algebraica, aprenentatge automàtic i inteligència computacional. Més enllà de l'anàlisi per-se, aquesta tesi també presenta una contribució des de el punt de vista clínic atès que presenta evidència substancial en el debat sobre l'impacte de l'administració d'estatines previ a l'ingrès a la UCI en els malalts sèptics. També s'aclareix la forta dependència entre el xoc sèptic i el Síndrome de Disfunció Multiorgànica. Finalment, també es defineix un conjunt de descriptors latents de la Sèpsia com a factors de pronòstic per a la predicció de la mortalitat, que millora sobre els mètodes actualment més utilitzats en la UCI

    16th SC@RUG 2019 proceedings 2018-2019

    Get PDF
    • …
    corecore