Non-parametric machine learning for biological sequence data
In the past decade there has been a massive increase in the volume of biological sequence data, driven by massively parallel sequencing technologies. This has enabled data-driven statistical analyses using non-parametric predictive models (including those from machine learning) to complement more traditional, hypothesis-driven approaches. This thesis addresses several challenges that arise when applying non-parametric predictive models to biological sequence data.
Some of these challenges arise due to the nature of the biological system of interest. For example, in the study of the human microbiome the phylogenetic relationships between microorganisms are often ignored in statistical analyses. This thesis outlines a novel approach to modelling phylogenetic similarity using string kernels and demonstrates its utility in the two-sample test and host-trait prediction.
Other challenges arise from limitations in our understanding of the models themselves. For example, calculating variable importance (a key task in biomedical applications) is not possible for many models. This thesis describes a novel extension of an existing approach to compute importance scores for grouped variables in a Bayesian neural network. It also explores the behaviour of random forest classifiers when applied to microbial datasets, with a focus on the robustness of the biological findings under different modelling assumptions.
Fully discrete finite element data assimilation method for the heat equation
We consider a finite element discretization for the reconstruction of the
final state of the heat equation, when the initial data is unknown, but
additional data is given in a subdomain of space-time. For the
discretization in space we consider standard continuous affine finite element
approximation, and the time derivative is discretized using a backward
differentiation. We regularize the discrete system by adding a penalty of the
$H^1$-semi-norm of the initial data, scaled with the mesh-parameter. The
analysis of the method uses techniques developed in E. Burman and L. Oksanen,
Data assimilation for the heat equation using stabilized finite element
methods, arXiv, 2016, combining discrete stability of the numerical method with
sharp Carleman estimates for the physical problem, to derive optimal error
estimates for the approximate solution. For the natural space time energy norm,
away from $t = 0$, the convergence is the same as for the classical problem with
known initial data, but contrary to the classical case, we do not obtain faster
convergence for the $L^2$-norm at the final time
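
As a rough illustration of the space and time discretization named in the abstract above, the following sketch marches the forward heat equation u_t = u_xx on (0, 1) with continuous affine (P1) finite elements and a first-order backward difference in time (backward Euler). It is only the classical forward problem with known initial data: the data assimilation step, the penalty term and the Carleman-based analysis are not reproduced. The one-dimensional domain, the zero boundary values, the uniform mesh and the function names are all assumptions made for this example.

```python
import math


def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal system with constant bands."""
    n = len(rhs)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper / diag
    d[0] = rhs[0] / diag
    for i in range(1, n):
        denom = diag - lower * c[i - 1]
        c[i] = upper / denom
        d[i] = (rhs[i] - lower * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def heat_backward_euler(u0, h, dt, n_steps):
    """March interior nodal values of u_t = u_xx on (0, 1), u = 0 on the boundary.

    Each step solves (M + dt * K) u_new = M u_old, with the P1 mass matrix
    M = h/6 * tridiag(1, 4, 1) and stiffness matrix K = 1/h * tridiag(-1, 2, -1)
    on a uniform mesh of width h (assumed setup, for illustration only).
    """
    u = list(u0)
    n = len(u)
    off = h / 6.0 - dt / h               # off-diagonal entry of M + dt * K
    diag = 4.0 * h / 6.0 + 2.0 * dt / h  # diagonal entry of M + dt * K
    for _ in range(n_steps):
        # Right-hand side M u_old, with zero values at the two boundary nodes.
        rhs = [
            h / 6.0 * ((u[i - 1] if i > 0 else 0.0)
                       + 4.0 * u[i]
                       + (u[i + 1] if i < n - 1 else 0.0))
            for i in range(n)
        ]
        u = solve_tridiagonal(off, diag, off, rhs)
    return u


# Example: 19 interior nodes, initial state sin(pi x), final time 0.1.
h, dt = 0.05, 0.01
u0 = [math.sin(math.pi * (i + 1) * h) for i in range(19)]
u_final = heat_backward_euler(u0, h, dt, n_steps=10)
```

For this initial state the exact solution at x = 0.5 is exp(-pi^2 t), so the midpoint value of `u_final` can be compared against it as a sanity check.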
Modelling phylogeny in 16S rRNA gene sequencing datasets using string kernels
Bacterial community composition is measured using 16S rRNA (ribosomal
ribonucleic acid) gene sequencing, for which one of the defining
characteristics is the phylogenetic relationships that exist between variables.
Here, we demonstrate the utility of modelling these relationships in two
statistical tasks (the two-sample test and host trait prediction) by employing
string kernels originally proposed in natural language processing. We show via
simulation studies that a kernel two-sample test using the proposed kernels,
which explicitly model phylogenetic relationships, is powerful while also being
sensitive to the phylogenetic scale of the difference between the two
populations. We also demonstrate how the proposed kernels can be used with
Gaussian processes to improve predictive performance in host trait prediction.
Our method is implemented in the Python package StringPhylo (available at
github.com/jonathanishhorowicz/stringphylo)
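
To make the idea of a kernel two-sample test with a string kernel concrete, here is a minimal sketch using a k-mer spectrum kernel and a biased estimate of the squared maximum mean discrepancy (MMD). It is not the StringPhylo API, and the function names are hypothetical. It also compares two sets of sequences directly, which is a simplification: how the paper combines string kernels with 16S abundance data is not reproduced here.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every length-k substring (k-mer) of a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(s, t, k=3):
    """Spectrum kernel: inner product of the two k-mer count vectors."""
    cs, ct = kmer_counts(s, k), kmer_counts(t, k)
    return sum(n * ct[kmer] for kmer, n in cs.items())


def mmd2_biased(xs, ys, kernel):
    """Biased estimate of the squared MMD between two samples."""
    def mean_k(a, b):
        return sum(kernel(u, v) for u in a for v in b) / (len(a) * len(b))
    return mean_k(xs, xs) + mean_k(ys, ys) - 2.0 * mean_k(xs, ys)
```

In a full test the observed statistic would be compared with its distribution under random relabelling of the pooled samples (a permutation test); that step is omitted to keep the sketch short.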
GpABC: a Julia package for approximate Bayesian computation with Gaussian process emulation
Motivation: Approximate Bayesian computation (ABC) is an important framework within which to infer the structure and parameters of a systems biology model. It is especially suitable for biological systems with stochastic and nonlinear dynamics, for which the likelihood functions are intractable. However, the associated computational cost often limits ABC to models that are relatively quick to simulate in practice. Results: We here present a Julia package, GpABC, that implements parameter inference and model selection for deterministic or stochastic models using i) standard rejection ABC or ABC-SMC, or ii) ABC with Gaussian process emulation. The latter significantly reduces the computational cost. Availability and Implementation: https://github.com/tanhevg/GpABC.jl Supplementary information: Supplementary data are available at Bioinformatics online
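
The standard rejection ABC scheme mentioned in the abstract above can be sketched in a few lines. This is a generic illustration written in Python, not the GpABC.jl interface (which is a Julia package); every function and parameter name below is hypothetical, the toy Gaussian model is an assumption, and Gaussian process emulation is not shown.

```python
import random


def rejection_abc(simulate, prior_sample, distance, observed, eps, n_draws, rng):
    """Keep prior draws whose simulated data fall within eps of the observed data."""
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        if distance(simulate(theta, rng), observed) <= eps:
            accepted.append(theta)
    return accepted


def simulate(theta, rng):
    """Toy model (assumed for illustration): mean of 20 Gaussian draws around theta."""
    return sum(rng.gauss(theta, 1.0) for _ in range(20)) / 20


rng = random.Random(0)
posterior = rejection_abc(
    simulate=simulate,
    prior_sample=lambda r: r.uniform(-5.0, 5.0),  # flat prior on [-5, 5]
    distance=lambda a, b: abs(a - b),
    observed=2.0,
    eps=0.2,
    n_draws=5000,
    rng=rng,
)
```

The accepted values in `posterior` approximate the posterior of `theta`. Every draw requires a fresh simulation, which is the cost that emulation is meant to reduce.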
Epidemia: An R Package for Semi-Mechanistic Bayesian Modelling of Infectious Diseases using Point Processes
This article introduces epidemia, an R package for Bayesian,
regression-oriented modeling of infectious diseases. The implemented models
define a likelihood for all observed data while also explicitly modeling
transmission dynamics: an approach often termed semi-mechanistic. Infections
are propagated over time using renewal equations. This approach is inspired by
self-exciting, continuous-time point processes such as the Hawkes process. A
variety of inferential tasks can be performed using the package. Key
epidemiological quantities, including reproduction numbers and latent
infections, may be estimated within the framework. The models may be used to
evaluate the determinants of changes in transmission rates, including the
effects of control measures. Epidemic dynamics may be simulated either from a
fitted model or a prior model, allowing for prior/posterior predictive checks,
experimentation, and forecasting
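
The renewal-equation propagation of infections described in the abstract above can be illustrated with a small deterministic sketch: new infections on a given day equal that day's reproduction number times past infections weighted by a generation-interval distribution. This is written in Python and is not the epidemia R interface. The function name, the single seeding day and the discrete daily time step are assumptions made for the example, and the observation model and Bayesian inference are left out.

```python
def renewal_infections(seed, r_t, gen_interval):
    """Propagate expected infections with a discrete renewal equation.

    seed         -- infections on day 0
    r_t          -- reproduction number for each day t = 1, ..., T
    gen_interval -- gen_interval[s - 1] is the probability that an infection
                    and the infection it causes are s days apart
    """
    infections = [float(seed)]
    for t, r in enumerate(r_t, start=1):
        # Past infections weighted by how likely they are to transmit today.
        pressure = sum(
            infections[t - s] * g
            for s, g in enumerate(gen_interval, start=1)
            if t - s >= 0
        )
        infections.append(r * pressure)
    return infections
```

With a one-day generation interval and a constant reproduction number of 2, the sketch reduces to simple doubling, which makes it easy to check by hand.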
Genomic attributes of airway commensal bacteria and mucosa
Microbial communities at the airway mucosal barrier are conserved and highly ordered, in all likelihood reflecting co-evolution with human host factors. Freed of selection to digest nutrients, the airway microbiome underpins cognate management of mucosal immunity and pathogen resistance. We show here the initial results of systematic culture and whole-genome sequencing of the thoracic airway bacteria, identifying 52 novel species amongst 126 organisms that constitute 75% of commensals typically present in healthy individuals. Clinically relevant genes encode antimicrobial synthesis, adhesion and biofilm formation, immune modulation, iron utilisation, nitrous oxide (NO) metabolism and sphingolipid signalling. Using whole-genome content we identify dysbiotic features that may influence asthma and chronic obstructive pulmonary disease. We match isolate gene content to transcripts and metabolites expressed late in airway epithelial differentiation, identifying pathways to sustain host interactions with microbiota. Our results provide a systematic basis for decrypting interactions between commensals, pathogens, and mucosa in lung diseases of global significance
Age groups that sustain resurging COVID-19 epidemics in the United States.
After initial declines, in mid-2020 a resurgence in transmission of novel coronavirus disease (COVID-19) occurred in the United States and Europe. As efforts to control COVID-19 disease are reintensified, understanding the age demographics driving transmission and how these affect the loosening of interventions is crucial. We analyze aggregated, age-specific mobility trends from more than 10 million individuals in the United States and link these mechanistically to age-specific COVID-19 mortality data. We estimate that as of October 2020, individuals aged 20 to 49 are the only age groups sustaining resurgent SARS-CoV-2 transmission with reproduction numbers well above one and that at least 65 of 100 COVID-19 infections originate from individuals aged 20 to 49 in the United States. Targeting interventions, including transmission-blocking vaccines, to adults aged 20 to 49 is an important consideration in halting resurgent epidemics and preventing COVID-19-attributable deaths
Genomic and ecologic characteristics of the airway microbial-mucosal complex
Summary paragraph: Lung diseases due to infection and dysbiosis affect hundreds of millions of people world-wide [1-4]. Microbial communities at the airway mucosal barrier are conserved and highly ordered [5], reflecting symbiosis and co-evolution with human host factors [6]. Freed of selection to digest nutrients for the host, the airway microbiome underpins cognate management of mucosal immunity and pathogen resistance. We show here the results of the first systematic culture and whole-genome sequencing of the principal airway bacterial species, identifying abundant novel organisms within the genera Streptococcus, Pauljensenia, Neisseria and Gemella. Bacterial genomes were enriched for genes encoding antimicrobial synthesis, adhesion and biofilm formation, immune modulation, iron utilisation, nitrous oxide (NO) metabolism and sphingolipid signalling. RNA-targeting CRISPR elements in some taxa suggest the potential to prevent or treat specific viral infections. Homologues of human RO60 present in Neisseria spp. provide a possible respiratory primer for autoimmunity in systemic lupus erythematosus (SLE) and Sjögren syndrome. We interpret the structure and biogeography of airway microbial communities from clinical surveys in the context of whole-genome content, identifying features of airway dysbiosis that may presage breakdown of homeostasis during acute attacks of asthma and chronic obstructive pulmonary disease (COPD). We match the gene content of isolates to human transcripts and metabolites expressed late in airway epithelial differentiation, identifying pathways that can sustain host interactions with the microbiota. Our results provide a systematic basis for decrypting interactions between commensals, pathogens, and mucosal immunity in lung diseases of global significance