10 research outputs found

    Non-parametric machine learning for biological sequence data

    In the past decade there has been a massive increase in the volume of biological sequence data, driven by massively parallel sequencing technologies. This has enabled data-driven statistical analyses using non-parametric predictive models (including those from machine learning) to complement more traditional, hypothesis-driven approaches. This thesis addresses several challenges that arise when applying non-parametric predictive models to biological sequence data. Some of these challenges arise due to the nature of the biological system of interest. For example, in the study of the human microbiome the phylogenetic relationships between microorganisms are often ignored in statistical analyses. This thesis outlines a novel approach to modelling phylogenetic similarity using string kernels and demonstrates its utility in the two-sample test and host-trait prediction. Other challenges arise from limitations in our understanding of the models themselves. For example, calculating variable importance (a key task in biomedical applications) is not possible for many models. This thesis describes a novel extension of an existing approach to compute importance scores for grouped variables in a Bayesian neural network. It also explores the behaviour of random forest classifiers when applied to microbial datasets, with a focus on the robustness of the biological findings under different modelling assumptions.

    Fully discrete finite element data assimilation method for the heat equation

    We consider a finite element discretization for the reconstruction of the final state of the heat equation when the initial data is unknown but additional data is given in a subdomain of space-time. For the discretization in space we consider a standard continuous affine finite element approximation, and the time derivative is discretized using a backward differentiation. We regularize the discrete system by adding a penalty on the $H^1$-semi-norm of the initial data, scaled with the mesh parameter. The analysis of the method uses techniques developed in E. Burman and L. Oksanen, Data assimilation for the heat equation using stabilized finite element methods, arXiv, 2016, combining discrete stability of the numerical method with sharp Carleman estimates for the physical problem, to derive optimal error estimates for the approximate solution. For the natural space-time energy norm, away from $t=0$, the convergence is the same as for the classical problem with known initial data, but contrary to the classical case, we do not obtain faster convergence for the $L^2$-norm at the final time.

    Modelling phylogeny in 16S rRNA gene sequencing datasets using string kernels

    Bacterial community composition is measured using 16S rRNA (ribosomal ribonucleic acid) gene sequencing, for which one of the defining characteristics is the phylogenetic relationships that exist between variables. Here, we demonstrate the utility of modelling these relationships in two statistical tasks (the two-sample test and host-trait prediction) by employing string kernels originally proposed in natural language processing. We show via simulation studies that a kernel two-sample test using the proposed kernels, which explicitly model phylogenetic relationships, is powerful while also being sensitive to the phylogenetic scale of the difference between the two populations. We also demonstrate how the proposed kernels can be used with Gaussian processes to improve predictive performance in host-trait prediction. Our method is implemented in the Python package StringPhylo (available at github.com/jonathanishhorowicz/stringphylo).
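As a rough illustration of the idea in the abstract above, the sketch below pairs a k-spectrum string kernel with a biased estimate of the squared maximum mean discrepancy (MMD), the statistic behind a kernel two-sample test. This is a simplified assumption-laden example, not the StringPhylo API: the spectrum kernel is only one simple member of the string-kernel family and may differ from the kernels used in the paper, the function names are hypothetical, and the comparison here is between two groups of raw sequences rather than between samples of taxon abundances.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every contiguous substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a, b, k=3):
    """k-spectrum kernel: inner product of the two k-mer count vectors."""
    counts_a, counts_b = kmer_counts(a, k), kmer_counts(b, k)
    return sum(n * counts_b[kmer] for kmer, n in counts_a.items())


def mmd2_biased(xs, ys, kernel):
    """Biased estimate of squared MMD between two groups of sequences."""
    kxx = sum(kernel(a, b) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(kernel(a, b) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(kernel(a, b) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2 * kxy


# Toy sequences, made up for illustration only.
group_a = ["ACGTACGT", "ACGTACGA"]
group_b = ["TTTTGGGG", "TTTTGGGA"]
same = mmd2_biased(group_a, group_a, spectrum_kernel)
different = mmd2_biased(group_a, group_b, spectrum_kernel)
```

In a real test the observed statistic would be compared against a null distribution, for example one obtained by permuting group labels; that step is omitted here.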

    GpABC: a Julia package for approximate Bayesian computation with Gaussian process emulation

    Motivation: Approximate Bayesian computation (ABC) is an important framework within which to infer the structure and parameters of a systems biology model. It is especially suitable for biological systems with stochastic and nonlinear dynamics, for which the likelihood functions are intractable. However, the associated computational cost often limits ABC to models that are relatively quick to simulate in practice. Results: We here present a Julia package, GpABC, that implements parameter inference and model selection for deterministic or stochastic models using i) standard rejection ABC or ABC-SMC, or ii) ABC with Gaussian process emulation. The latter significantly reduces the computational cost. Availability and Implementation: https://github.com/tanhevg/GpABC.jl Supplementary information: Supplementary data are available at Bioinformatics online.
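For readers unfamiliar with rejection ABC, the simplest of the schemes named above, a minimal sketch follows. It is written in Python with the standard library only and does not reflect the GpABC interface; the function names, the toy Gaussian model, the uniform prior and the acceptance threshold are all assumptions chosen for illustration.

```python
import random


def rejection_abc(observed, simulate, prior_sample, distance,
                  threshold, n_samples, rng):
    """Keep prior draws whose simulated summary lies within threshold of the data."""
    accepted = []
    while len(accepted) < n_samples:
        theta = prior_sample(rng)
        if distance(simulate(theta, rng), observed) <= threshold:
            accepted.append(theta)
    return accepted


# Toy problem (illustrative only): infer the mean of a unit-variance Gaussian.
def prior_sample(rng):
    return rng.uniform(-5.0, 5.0)


def simulate(theta, rng):
    # Summary statistic: the sample mean of 50 simulated observations.
    return sum(rng.gauss(theta, 1.0) for _ in range(50)) / 50


def distance(a, b):
    return abs(a - b)


rng = random.Random(0)
posterior = rejection_abc(2.0, simulate, prior_sample, distance,
                          threshold=0.1, n_samples=200, rng=rng)
posterior_mean = sum(posterior) / len(posterior)
```

Every accepted draw costs many simulator calls, which is the expense that emulating the simulator with a Gaussian process is meant to reduce.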

    Epidemia: An R Package for Semi-Mechanistic Bayesian Modelling of Infectious Diseases using Point Processes

    This article introduces epidemia, an R package for Bayesian, regression-oriented modeling of infectious diseases. The implemented models define a likelihood for all observed data while also explicitly modeling transmission dynamics, an approach often termed semi-mechanistic. Infections are propagated over time using renewal equations. This approach is inspired by self-exciting, continuous-time point processes such as the Hawkes process. A variety of inferential tasks can be performed using the package. Key epidemiological quantities, including reproduction numbers and latent infections, may be estimated within the framework. The models may be used to evaluate the determinants of changes in transmission rates, including the effects of control measures. Epidemic dynamics may be simulated either from a fitted model or a prior model, allowing for prior/posterior predictive checks, experimentation, and forecasting.
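The phrase "infections are propagated over time using renewal equations" can be made concrete with a small deterministic sketch of a discrete renewal equation, in which new infections at time t equal the reproduction number at t multiplied by past infections weighted by a generation-interval distribution. This is an assumption-based illustration in Python, not the epidemia interface: the function name and arguments are hypothetical, and the package's actual models are Bayesian and considerably richer.

```python
def renewal_infections(seed, reproduction_numbers, generation_pmf):
    """Propagate infections with I_t = R_t * sum_s I_{t-s} * g_s.

    seed: infections at time 0.
    reproduction_numbers: R_t for t = 1, 2, ...
    generation_pmf: g_s for s = 1, 2, ... (probability of a lag of s steps).
    """
    infections = [float(seed)]
    for t in range(1, len(reproduction_numbers) + 1):
        max_lag = min(t, len(generation_pmf))
        # Infectious pressure from all earlier infections still within range.
        pressure = sum(infections[t - s] * generation_pmf[s - 1]
                       for s in range(1, max_lag + 1))
        infections.append(reproduction_numbers[t - 1] * pressure)
    return infections


# With a one-step generation interval and R = 2, infections double each step.
doubling = renewal_infections(1, [2.0, 2.0, 2.0], [1.0])
# A two-step generation interval with R = 1 spreads transmission over two lags.
spread = renewal_infections(1, [1.0, 1.0], [0.5, 0.5])
```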

    Genomic attributes of airway commensal bacteria and mucosa

    Microbial communities at the airway mucosal barrier are conserved and highly ordered, in all likelihood reflecting co-evolution with human host factors. Freed of selection to digest nutrients, the airway microbiome underpins cognate management of mucosal immunity and pathogen resistance. We show here the initial results of systematic culture and whole-genome sequencing of the thoracic airway bacteria, identifying 52 novel species amongst 126 organisms that constitute 75% of commensals typically present in healthy individuals. Clinically relevant genes encode antimicrobial synthesis, adhesion and biofilm formation, immune modulation, iron utilisation, nitrous oxide (NO) metabolism and sphingolipid signalling. Using whole-genome content we identify dysbiotic features that may influence asthma and chronic obstructive pulmonary disease. We match isolate gene content to transcripts and metabolites expressed late in airway epithelial differentiation, identifying pathways to sustain host interactions with microbiota. Our results provide a systematic basis for decrypting interactions between commensals, pathogens, and mucosa in lung diseases of global significance.

    Age groups that sustain resurging COVID-19 epidemics in the United States.

    After initial declines, in mid-2020 a resurgence in transmission of novel coronavirus disease (COVID-19) occurred in the United States and Europe. As efforts to control COVID-19 disease are reintensified, understanding the age demographics driving transmission and how these affect the loosening of interventions is crucial. We analyze aggregated, age-specific mobility trends from more than 10 million individuals in the United States and link these mechanistically to age-specific COVID-19 mortality data. We estimate that as of October 2020, individuals aged 20 to 49 are the only age groups sustaining resurgent SARS-CoV-2 transmission with reproduction numbers well above one and that at least 65 of 100 COVID-19 infections originate from individuals aged 20 to 49 in the United States. Targeting interventions, including transmission-blocking vaccines, to adults aged 20 to 49 is an important consideration in halting resurgent epidemics and preventing COVID-19-attributable deaths.

    Genomic and ecologic characteristics of the airway microbial-mucosal complex

    Summary paragraph: Lung diseases due to infection and dysbiosis affect hundreds of millions of people world-wide [1-4]. Microbial communities at the airway mucosal barrier are conserved and highly ordered [5], reflecting symbiosis and co-evolution with human host factors [6]. Freed of selection to digest nutrients for the host, the airway microbiome underpins cognate management of mucosal immunity and pathogen resistance. We show here the results of the first systematic culture and whole-genome sequencing of the principal airway bacterial species, identifying abundant novel organisms within the genera Streptococcus, Pauljensenia, Neisseria and Gemella. Bacterial genomes were enriched for genes encoding antimicrobial synthesis, adhesion and biofilm formation, immune modulation, iron utilisation, nitrous oxide (NO) metabolism and sphingolipid signalling. RNA-targeting CRISPR elements in some taxa suggest the potential to prevent or treat specific viral infections. Homologues of human RO60 present in Neisseria spp. provide a possible respiratory primer for autoimmunity in systemic lupus erythematosus (SLE) and Sjögren syndrome. We interpret the structure and biogeography of airway microbial communities from clinical surveys in the context of whole-genome content, identifying features of airway dysbiosis that may presage breakdown of homeostasis during acute attacks of asthma and chronic obstructive pulmonary disease (COPD). We match the gene content of isolates to human transcripts and metabolites expressed late in airway epithelial differentiation, identifying pathways that can sustain host interactions with the microbiota. Our results provide a systematic basis for decrypting interactions between commensals, pathogens, and mucosal immunity in lung diseases of global significance.