38 research outputs found

    Distinguishing regional from within-codon rate heterogeneity in DNA sequence alignments

    We present an improved phylogenetic factorial hidden Markov model (FHMM) for detecting two types of mosaic structures in DNA sequence alignments, related to (1) recombination and (2) rate heterogeneity. The focus of the present work is on improving the modelling of the latter aspect. Earlier papers have modelled different degrees of rate heterogeneity with separate hidden states of the FHMM. This approach fails to appreciate the intrinsic difference between two types of rate heterogeneity: long-range regional effects, which are potentially related to differences in the selective pressure, and the short-term periodic patterns within the codons, which merely capture the signature of the genetic code. We propose an improved model that explicitly distinguishes between these two effects, and we assess its performance on a set of simulated DNA sequence alignments

    Design and implementation of a standardized framework to generate and evaluate patient-level prediction models using observational healthcare data

    Objective: To develop a conceptual prediction model framework containing standardized steps and describe the corresponding open-source software developed to consistently implement the framework across computational environments and observational healthcare databases to enable model sharing and reproducibility. Methods: Based on existing best practices we propose a 5-step standardized framework for: (1) transparently defining the problem; (2) selecting suitable datasets; (3) constructing variables from the observational data; (4) learning the predictive model; and (5) validating the model performance. We implemented this framework as open-source software utilizing the Observational Medical Outcomes Partnership Common Data Model to enable convenient sharing of models and reproduction of model evaluation across multiple observational datasets. The software implementation contains default covariates and classifiers but the framework enables customization and extension. Results: As a proof-of-concept, demonstrating the transparency and ease of model dissemination using the software, we developed prediction models for 21 different outcomes within a target population of people suffering from depression across 4 observational databases. All 84 models are available in an accessible online repository to be implemented by anyone with access to an observational database in the Common Data Model format. Conclusions: The proof-of-concept study illustrates the framework's ability to develop reproducible models that can be readily shared and offers the potential to perform extensive external validation of models, and improve their likelihood of clinical uptake. In future work the framework will be applied to perform an "all-by-all" prediction analysis to assess the observational data prediction domain across numerous target populations, outcomes and time, and risk settings.

    PASTA: Ultra-Large Multiple Sequence Alignment

    In this paper, we introduce a new and highly scalable algorithm, PASTA, for large-scale multiple sequence alignment estimation. PASTA uses a new technique to produce an alignment given a guide tree that enables it to be both highly scalable and very accurate. We present a study on biological and simulated data with up to 200,000 sequences, showing that PASTA produces highly accurate alignments, improving on the accuracy of the leading alignment methods on large datasets, and is able to analyze much larger datasets than the current methods. We also show that trees estimated on PASTA alignments are highly accurate – slightly better than SATé trees, but with substantial improvements relative to other methods. Finally, PASTA is very fast, highly parallelizable, and requires relatively little memory.

    Alignment-Free Phylogenetic Reconstruction

    14th Annual International Conference, RECOMB 2010, Lisbon, Portugal, April 25-28, 2010. Proceedings.
    We introduce the first polynomial-time phylogenetic reconstruction algorithm under a model of sequence evolution allowing insertions and deletions (or indels). Given appropriate assumptions, our algorithm requires sequence lengths growing polynomially in the number of leaf taxa. Our techniques are distance-based and largely bypass the problem of multiple alignment.

    The global response to the COVID-19 pandemic: how have immunology societies contributed?

    The COVID-19 pandemic is shining a spotlight on the field of immunology like never before. To appreciate the diverse ways in which immunologists have contributed, Nature Reviews Immunology invited the president of the International Union of Immunological Societies and the presidents of 15 other national immunology societies to discuss how they and their members responded following the emergence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)

    Implementation of the COVID-19 vulnerability index across an international network of health care data sets: collaborative external validation study

    Background: SARS-CoV-2 is straining health care systems globally. The burden on hospitals during the pandemic could be reduced by implementing prediction models that can discriminate patients who require hospitalization from those who do not. The COVID-19 vulnerability (C-19) index, a model that predicts which patients will be admitted to hospital for treatment of pneumonia or pneumonia proxies, has been developed and proposed as a valuable tool for decision-making during the pandemic. However, the model is at high risk of bias according to the "prediction model risk of bias assessment" criteria, and it has not been externally validated. Objective: The aim of this study was to externally validate the C-19 index across a range of health care settings to determine how well it broadly predicts hospitalization due to pneumonia in COVID-19 cases. Methods: We followed the Observational Health Data Sciences and Informatics (OHDSI) framework for external validation to assess the reliability of the C-19 index. We evaluated the model on two different target populations, 41,381 patients who presented with SARS-CoV-2 at an outpatient or emergency department visit and 9,429,285 patients who presented with influenza or related symptoms during an outpatient or emergency department visit, to predict their risk of hospitalization with pneumonia during the following 0-30 days. In total, we validated the model across a network of 14 databases spanning the United States, Europe, Australia, and Asia. Results: The internal validation performance of the C-19 index had a C statistic of 0.73, and the calibration was not reported by the authors. When we externally validated it by transporting it to SARS-CoV-2 data, the model obtained C statistics of 0.36, 0.53 (0.473-0.584) and 0.56 (0.488-0.636) on Spanish, US, and South Korean data sets, respectively. The calibration was poor, with the model underestimating risk. When validated on 12 data sets containing influenza patients across the OHDSI network, the C statistics ranged between 0.40 and 0.68. Conclusions: Our results show that the discriminative performance of the C-19 index model is low for influenza cohorts and even worse among patients with COVID-19 in the United States, Spain, and South Korea. These results suggest that C-19 should not be used to aid decision-making during the COVID-19 pandemic. Our findings highlight the importance of performing external validation across a range of settings, especially when a prediction model is being extrapolated to a different population. In the field of prediction, extensive validation is required to create appropriate trust in a model.
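
    The C statistic quoted in the abstract above measures discrimination: the fraction of (case, non-case) pairs in which the case received the higher predicted risk, so 0.5 corresponds to chance. The following is a minimal sketch of that pairwise definition, not the study's own code; the function name `c_statistic` and the toy inputs are hypothetical.

```python
def c_statistic(risks, outcomes):
    """Pairwise concordance between predicted risks and binary outcomes.

    risks    -- predicted probabilities, one per patient (hypothetical input)
    outcomes -- 1 if the patient had the outcome, 0 otherwise
    """
    case_risks = [r for r, y in zip(risks, outcomes) if y == 1]
    control_risks = [r for r, y in zip(risks, outcomes) if y == 0]
    if not case_risks or not control_risks:
        raise ValueError("need at least one case and one non-case")

    concordant = 0.0
    for rc in case_risks:
        for rn in control_risks:
            if rc > rn:
                concordant += 1.0   # case ranked above non-case
            elif rc == rn:
                concordant += 0.5   # ties count as half
    return concordant / (len(case_risks) * len(control_risks))


# Toy data: three of the four (case, non-case) pairs are ranked correctly.
toy_c = c_statistic([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

    This quadratic loop is only meant to make the definition concrete; for large cohorts a rank-based computation would be the usual choice.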

    Bayesian inference reveals host-specific contributions to the epidemic expansion of influenza A H5N1

    Since its first isolation in 1996 in Guangdong, China, the highly pathogenic avian influenza virus (HPAIV) H5N1 has circulated in avian hosts for almost two decades and spread to more than 60 countries worldwide. The role of different avian hosts and the domestic-wild bird interface has been critical in shaping the complex HPAIV H5N1 disease ecology, but remains difficult to ascertain. To shed light on the large-scale H5N1 transmission patterns and disentangle the contributions of different avian hosts on the tempo and mode of HPAIV H5N1 dispersal, we apply Bayesian evolutionary inference techniques to comprehensive sets of hemagglutinin and neuraminidase gene sequences sampled between 1996 and 2011 throughout Asia and Russia. Our analyses demonstrate that the large-scale H5N1 transmission dynamics are structured according to different avian flyways, and that the incursion of the Central Asian flyway specifically was driven by Anatidae hosts coinciding with a rapid rate of spread and an epidemic wavefront acceleration. This also resulted in long-distance dispersal that is likely to be explained by wild bird migration. We identify a significant degree of asymmetry in the large-scale transmission dynamics between Anatidae and Phasianidae, with the latter largely representing poultry as an evolutionary sink. A joint analysis of host dynamics and continuous spatial diffusion demonstrates that the rate of viral dispersal and host diffusivity is significantly higher for Anatidae compared with Phasianidae. These findings complement risk modeling studies and satellite tracking of wild birds in demonstrating a continental-scale structuring into areas of H5N1 persistence that are connected through migratory waterfowl.

    Interpreting observational studies: Why empirical calibration is needed to correct p-values

    Often the literature makes assertions of medical product effects on the basis of 'p<0.05'. The underlying premise is that at this threshold, there is only a 5% probability that the observed effect would be seen by chance when in reality there is no effect. In observational studies, much more than in randomized trials, bias and confounding may undermine this premise. To test this premise, we selected three exemplar drug safety studies from the literature, representing a case-control, a cohort, and a self-controlled case series design. We attempted to replicate these studies as best we could for the drugs studied in the original articles. Next, we applied the same three designs to sets of negative controls: drugs that are not believed to cause the outcome of interest. We observed how often p<0.05 when the null hypothesis is true, and we fitted distributions to the effect estimates. Using these distributions, we compute calibrated p-values that reflect the probability of observing the effect estimate under the null hypothesis, taking both random and systematic error into account. An automated analysis of scientific literature was performed to evaluate the potential impact of such a calibration. Our experiment provides evidence that the majority of observational studies would declare statistical significance when no effect is present. Empirical calibration was found to reduce spurious results to the desired 5% level. Applying these adjustments to literature suggests that at least 54% of findings with p<0.05 are not actually statistically significant and should be reevaluated.
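
    The calibration idea in the abstract above can be sketched as follows: fit a distribution to the negative-control effect estimates, then compute the p-value of a new estimate against that empirical null instead of against zero. This is a simplified illustration under stated assumptions, not the authors' procedure: it assumes a Gaussian empirical null on the log scale, estimates its mean and spread by simple moments (ignoring each negative control's own sampling error), and all function names and numbers are hypothetical.

```python
import math
from statistics import mean, pstdev


def fit_empirical_null(negative_control_log_estimates):
    """Moment estimates of a Gaussian empirical null (a simplification)."""
    mu = mean(negative_control_log_estimates)
    sigma = pstdev(negative_control_log_estimates)
    return mu, sigma


def two_sided_p(z):
    """Two-sided p-value of a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def traditional_p(log_estimate, se):
    """Usual p-value: null is 'no effect', only random error counts."""
    return two_sided_p(log_estimate / se)


def calibrated_p(log_estimate, se, mu, sigma):
    """P-value against the empirical null: systematic error (mu, sigma)
    is combined with the random error (se) of the estimate."""
    return two_sided_p((log_estimate - mu) / math.sqrt(sigma**2 + se**2))


# Made-up negative-control log effect estimates showing a positive bias.
null_mu, null_sigma = fit_empirical_null([0.2, 0.4, 0.3, 0.5, 0.1])

# A made-up estimate that looks significant until the bias is considered.
p_traditional = traditional_p(0.4, 0.15)
p_calibrated = calibrated_p(0.4, 0.15, null_mu, null_sigma)
```

    With these illustrative inputs the traditional p-value falls below 0.05 while the calibrated one does not, which is the kind of discrepancy the abstract describes.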

    SPREAD 4: online visualisation of pathogen phylogeographic reconstructions

    Phylogeographic analyses aim to extract information about pathogen spread from genomic data, and visualising spatio-temporal reconstructions is a key aspect of this process. Here we present SPREAD 4, a feature-rich web-based application that visualises estimates of pathogen dispersal resulting from Bayesian phylogeographic inference using BEAST on a geographic map, offering zoom-and-filter functionality and smooth animation over time. SPREAD 4 takes as input phylogenies with both discrete and continuous location annotation and offers customised visualisation as well as generation of publication-ready figures. SPREAD 4 now features account-based storage and easy sharing of visualisations by means of unique web addresses. SPREAD 4 is intuitive to use and is available online at https://spreadviz.org, with an accompanying web page containing answers to frequently asked questions at https://beast.community/spread4.