623 research outputs found

    Model assessment for Bayesian spatio-temporal epidemic models for complex data sets using hybrid computational methods

    This project investigates the use of model assessment techniques for stochastic spatiotemporal models, with a focus on embedding classical-style tests within the Bayesian framework and applying them to study real-world systems. Techniques will be investigated within the context of epidemic models, which describe the spread of a disease (for example, citrus canker) over a spatial region. We will focus on methods of choosing between different transmission kernels. The transmission kernel is the component of the model that determines how the disease spreads over space and time, and it is important in choosing the right control strategy for the disease, for example, culling of infected individuals. Methods for model selection in this context are challenging to develop and implement. Building on recent work within the group, which has focused on tests applied to residual processes, we will investigate how likelihood-based tests might be applied to latent processes in order to formulate methods that avoid the sensitivity to parameter priors suffered by purely Bayesian approaches to model comparison. In addition, we extend existing latent residual tests to detect the presence of anisotropic spatial kernels. The power of these tests will be calculated and their advantages and disadvantages investigated from computational, practical and theoretical perspectives. These investigations will be carried out using computational statistical methods applied to simulated and real-world data sets, including the DEFRA data set for the foot-and-mouth outbreak of 2001. Our investigations show that the likelihood-based methods are able to detect misspecification of the spatial kernel, sometimes exceeding the power of existing latent residual tests. Our directional infection link residual test is shown to be able to detect anisotropy in simulated data.
Using hybrid computational programming techniques, our tests have been shown to scale to big data sets of 188,361 individuals, and to detect misspecification of the kernel in an existing analysis of the data. EPSRC funding
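
To illustrate what a choice between transmission kernels involves, here is a minimal Python sketch. The two kernel forms (exponential and Cauchy-type), the parameter names `alpha` and `beta`, and the function `infection_pressure` are assumptions made for illustration; the abstract does not say which kernels the project compares.

```python
import math


def exponential_kernel(d, alpha):
    """Light-tailed kernel: transmission decays exponentially with distance d."""
    return math.exp(-d / alpha)


def cauchy_kernel(d, alpha):
    """Heavy-tailed kernel: long-range transmission stays comparatively likely."""
    return 1.0 / (1.0 + (d / alpha) ** 2)


def infection_pressure(target, infected, kernel, alpha, beta):
    """Total infectious pressure on a susceptible at `target`.

    `infected` is a list of (x, y) locations of infectious individuals and
    `beta` is a hypothetical baseline transmission rate.
    """
    total = 0.0
    for x, y in infected:
        distance = math.hypot(target[0] - x, target[1] - y)
        total += kernel(distance, alpha)
    return beta * total


if __name__ == "__main__":
    infected = [(1.0, 0.0), (10.0, 0.0)]
    for kernel in (exponential_kernel, cauchy_kernel):
        pressure = infection_pressure((0.0, 0.0), infected, kernel, alpha=1.0, beta=0.5)
        print(kernel.__name__, pressure)
```

With the same scale parameter, the two kernels agree closely at short range but differ by orders of magnitude at long range, which is why a misspecified kernel can change the apparent benefit of a control measure such as culling within a fixed radius.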

    Revealing the evolutionary history and epidemiological dynamics of emerging RNA viral pathogens

    Fast-evolving RNA viruses are a leading cause of morbidity and mortality among human and animal populations, contributing significantly to both global health and economic burden. The advent and revolution of high-throughput sequencing has empowered phylogenetic analyses with increasing amounts of temporally and spatially sampled viral data. Moreover, the parallel advancement in molecular evolution and phylogenetic methods has provided investigators with a unique opportunity to gain detailed insight into the evolutionary and epidemiological dynamics of emerging viral pathogens. Using state-of-the-art statistical approaches, this thesis addresses some of the important but controversial questions in viral emergence. Chapter 2 introduces a new framework to quantify and investigate reassortment events in influenza A viruses. By developing a computationally efficient algorithm to calculate the largest common subtree for a pair of tree sets, which are estimated from different parts of the genome for the same taxa set, the level of phylogenetic incongruency due to reassortment can be appropriately ascertained. Chapters 3, 4 and 5 investigate the evolutionary origins of three different viruses: the novel emergence and cross-species transmission of SARS-CoV, the genesis and dissemination of the unique HCV circulating recombinant form, and the ancient divergence of all influenza viruses, respectively. Moreover, Chapter 4 presents an improved statistical framework, which provides more precise evolutionary estimates by using a hierarchical Bayes approach to investigate recombination events in emerging RNA viruses. The last empirical study, presented in Chapter 6, applies recently developed Bayesian phylogeography models to a large viral sequence dataset sampled from southern Viet Nam to examine the fine-scale spatiotemporal dynamics of endemic dengue in Southeast Asia.
The work presented here reflects both the advancements made in sequencing technology and statistical phylogenetics, along with some of the challenges that remain in studying the emergence of fast-evolving RNA viruses. This thesis proposes new and improved solutions to these evolutionary problems, such as incorporating non-vertical evolution (i.e. homologous recombination and reassortment) into the phylodynamic framework, with the aim of facilitating future investigations of emerging viral diseases.

    From Epidemic to Pandemic Modelling

    We present a methodology for systematically extending epidemic models to multilevel and multiscale spatio-temporal pandemic ones. Our approach builds on the use of coloured stochastic and continuous Petri nets, which facilitate the sound component-based extension of basic SIR models to include population stratification as well as spatio-geographic information and travel connections, represented as graphs, resulting in robust stratified pandemic metapopulation models. This method is inherently easy to use, producing scalable and reusable models with a high degree of clarity and accessibility, which can be read in either a deterministic or a stochastic paradigm. Our method is supported by a publicly available platform, PetriNuts, which enables the visual construction and editing of models; deterministic, stochastic and hybrid simulation; and structural and behavioural analysis. All the models are available as supplementary material, ensuring reproducibility. Comment: 79 pages (with Appendix), 23 figures, 7 tables
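
The idea that one net can be read in either paradigm can be sketched in a few lines of Python. This is an illustrative toy, not the PetriNuts platform or the paper's coloured nets: the places `S`, `I`, `R`, the transition table `TRANSITIONS`, the rate parameters `beta` and `gamma`, and both simulator functions are assumptions. Each transition stores only its net token change (infection consumes one `S` and one `I` token and produces two `I` tokens).

```python
import random

# A basic SIR net: places S, I, R and two transitions. Each transition has a
# rate function of the current marking and a net change in tokens per place.
TRANSITIONS = {
    "infect": {
        "rate": lambda m, p: p["beta"] * m["S"] * m["I"] / (m["S"] + m["I"] + m["R"]),
        "change": {"S": -1, "I": +1},
    },
    "recover": {
        "rate": lambda m, p: p["gamma"] * m["I"],
        "change": {"I": -1, "R": +1},
    },
}


def simulate_stochastic(marking, params, t_end, seed=0):
    """Stochastic reading: fire one transition at a time (Gillespie algorithm)."""
    rng = random.Random(seed)
    m = dict(marking)
    t = 0.0
    while True:
        rates = {name: tr["rate"](m, params) for name, tr in TRANSITIONS.items()}
        total = sum(rates.values())
        if total <= 0:
            break  # no transition is enabled
        t += rng.expovariate(total)
        if t > t_end:
            break
        # Pick a transition with probability proportional to its rate.
        threshold = rng.random() * total
        cumulative = 0.0
        for name, rate in rates.items():
            cumulative += rate
            if threshold < cumulative:
                break
        for place, delta in TRANSITIONS[name]["change"].items():
            m[place] += delta
    return m


def simulate_deterministic(marking, params, t_end, dt=0.01):
    """Deterministic reading: treat rates as continuous flows (Euler steps)."""
    m = {place: float(tokens) for place, tokens in marking.items()}
    for _ in range(int(round(t_end / dt))):
        rates = {name: tr["rate"](m, params) for name, tr in TRANSITIONS.items()}
        for name, tr in TRANSITIONS.items():
            for place, delta in tr["change"].items():
                m[place] += delta * rates[name] * dt
    return m


if __name__ == "__main__":
    start = {"S": 99, "I": 1, "R": 0}
    params = {"beta": 0.5, "gamma": 0.1}
    print(simulate_stochastic(start, params, t_end=50.0, seed=1))
    print(simulate_deterministic(start, params, t_end=50.0))
```

Both simulators consume the same transition table, so extending the model (for example, adding a stratum or a travel link) means adding places and transitions once rather than rewriting two models.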

    MCMC methods: graph samplers, invariance tests and epidemic models

    Markov Chain Monte Carlo (MCMC) techniques are used ubiquitously for simulation-based inference. This thesis provides novel contributions to MCMC methods and their application to graph sampling and epidemic modeling. The first topic considered is that of sampling graphs conditional on a set of prescribed statistics, which is a difficult problem arising naturally in many fields: sociology (Holland and Leinhardt, 1981), psychology (Connor and Simberloff, 1979), categorical data analysis (Agresti, 1992) and finance (Squartini et al., 2018, Gandy and Veraart, 2019) being examples. Bespoke MCMC samplers are proposed for this setting. The second major topic addressed is that of modeling the dynamics of infectious diseases, where MCMC is leveraged as the general inference engine. The first part of this thesis addresses important problems such as the uniform sampling of graphs with given degree sequences, and weighted graphs with given strength sequences. These distributions are frequently used for exact tests on social networks and two-way contingency tables. Another application is quantifying the statistical significance of patterns observed in real networks. This is crucial for understanding whether such patterns indicate the presence of interesting network phenomena, or whether they simply result from less interesting processes, such as nodal-heterogeneity. The MCMC samplers developed in the course of this research are complex, and there is great scope for conceptual, analytic, and implementation errors. This motivates a chapter that develops novel tests for detecting errors in MCMC implementations. The tests introduced are unique in being exact, which allows us to keep the false rejection probability arbitrarily low. Rather than develop bespoke samplers, as in the first part of the thesis, the second part leverages a standard MCMC framework Stan (Stan Development Team, 2018) as the workhorse for fitting state-of-the-art epidemic models. 
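
For orientation, the degree-sequence sampling problem described above is often approached with a double-edge-swap Markov chain. The Python sketch below shows that textbook chain for simple undirected graphs; it is not one of the thesis's bespoke samplers, and the function name `double_edge_swap` and its arguments are chosen here for illustration.

```python
import random


def double_edge_swap(edges, n_steps, seed=0):
    """Random walk over simple graphs that share one degree sequence.

    Each step picks two edges (a, b) and (c, d) and proposes rewiring them to
    (a, d) and (c, b). Proposals that would create a self-loop or a duplicate
    edge are rejected, and the chain stays where it is for that step.
    """
    rng = random.Random(seed)
    edges = [tuple(sorted(e)) for e in edges]
    edge_set = set(edges)
    for _ in range(n_steps):
        i, j = rng.sample(range(len(edges)), 2)
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # choose one of the two possible rewirings at random
        if a == d or c == b:
            continue  # would create a self-loop
        new_first = tuple(sorted((a, d)))
        new_second = tuple(sorted((c, b)))
        if new_first in edge_set or new_second in edge_set:
            continue  # would create a multi-edge
        edge_set.difference_update([edges[i], edges[j]])
        edge_set.update([new_first, new_second])
        edges[i], edges[j] = new_first, new_second
    return edges


if __name__ == "__main__":
    cycle = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
    print(double_edge_swap(cycle, n_steps=500, seed=3))
```

Counting rejected proposals as steps keeps the proposal symmetric, which is what lets the chain target the uniform distribution; how quickly it mixes is a separate question that this sketch does not address.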
We present a general framework for semi-mechanistic Bayesian modeling of infectious diseases using renewal processes. The term semi-mechanistic relates to statistical estimation within some constrained mechanism. This research was motivated by the ongoing SARS-CoV-2 pandemic, and variants of the model have been used in specific analyses of Covid-19. We present epidemia, an R package allowing researchers to leverage the epidemic models. A key goal of this work is to demonstrate that MCMC, and in particular Stan’s No-U-Turn sampler (Hoffman and Gelman, 2014), can be routinely employed to fit a large class of epidemic models. A second goal is to make the models accessible to the general research community through epidemia. Open Access
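
The renewal-process mechanism can be illustrated with the standard discrete renewal equation, in which new infections at time t equal the reproduction number R_t times past infections weighted by the generation-interval distribution. The Python sketch below computes only this deterministic recursion; it is not the epidemia package or its API, and the function and argument names are assumptions.

```python
def renewal_infections(seed_infections, reproduction_numbers, generation_pmf):
    """Iterate i_t = R_t * sum_{s>=1} i_{t-s} * g_s.

    seed_infections      -- infections on the initial days (list of floats)
    reproduction_numbers -- one R_t per day to be simulated
    generation_pmf       -- g_1, g_2, ...: probability that an infector
                            infects someone s days after being infected
    """
    infections = list(seed_infections)
    for r_t in reproduction_numbers:
        t = len(infections)
        max_lag = min(t, len(generation_pmf))
        pressure = sum(
            infections[t - s] * generation_pmf[s - 1] for s in range(1, max_lag + 1)
        )
        infections.append(r_t * pressure)
    return infections


if __name__ == "__main__":
    # With all generation-interval mass at lag 1 and R_t = 2, infections double daily.
    print(renewal_infections([1.0], [2.0, 2.0, 2.0], [1.0]))
```

In a semi-mechanistic model this recursion is the constrained mechanism, while quantities such as R_t are estimated statistically, for example by MCMC.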

    Statistical Inference for Propagation Processes on Complex Networks

    Network-theoretic methods are enjoying growing popularity because they allow complex systems to be represented as networks. These are described simply by a set of nodes connected by edges. Currently available methods are largely limited to descriptive analysis of network structure. This thesis presents several approaches to inference about processes on complex networks. These processes influence measurable quantities at the network nodes and are described by a set of random numbers. All the methods presented are motivated by practical applications, such as the transmission of food-borne infections, the propagation of train delays, or the regulation of genetic effects. First, a general dynamic metapopulation model for the spread of food-borne infections is presented, which combines local infection dynamics with the network-based transport routes of contaminated food. This model enables efficient simulation of a variety of realistic food-borne epidemics. Second, an exploratory approach to identifying the origin of spreading processes is developed. Based on a network-based redefinition of geodesic distance, complex spreading patterns can be projected onto a systematic, circular propagation scheme. This holds exactly when the origin node is chosen as the reference point. The method is applied successfully to the 2011 EHEC/HUS epidemic in Germany. The results suggest that the method can usefully complement the laborious standard investigations of food-borne epidemics. In addition, this exploratory approach can be applied to identify the source of delays in transport networks.
The results of extensive simulation studies with a wide variety of transmission mechanisms give hope that the approach is generally applicable to identifying the origin of spreading processes in diverse fields. Finally, it is shown that kernel-based methods can offer an alternative for the statistical analysis of processes on networks. A network-based kernel for the logistic kernel machine test was developed, which allows the seamless integration of biological knowledge into the analysis of data from genome-wide association studies. The method is tested successfully in the analysis of genetic causes of rheumatoid arthritis and lung cancer. In summary, the results of the methods presented make clear that the network-theoretic analysis of spreading processes can make a substantial contribution to answering a wide range of questions in different applications.

    Inference and experimental design for percolation and random graph models.

    The problem of optimal arrangement of nodes of a random weighted graph is studied in this thesis. The nodes of the graphs under study are fixed, but their edges are random and established according to the so-called edge-probability function. This function is assumed to depend on the weights attributed to the pairs of graph nodes (or distances between them) and a statistical parameter. It is the purpose of experimentation to make inference on the statistical parameter and thus to extract as much information about it as possible. We also distinguish between two different experimentation scenarios: progressive and instructive designs. We adopt a utility-based Bayesian framework to tackle the optimal design problem for random graphs of this kind. Simulation-based optimisation methods, mainly Monte Carlo and Markov Chain Monte Carlo, are used to obtain the solution. We study the optimal design problem for inference based on partial observations of random graphs by employing a data augmentation technique. We prove that the infinitely growing or diminishing node configurations asymptotically represent the worst node arrangements. We also obtain the exact solution to the optimal design problem for proximity graphs (geometric graphs) and a numerical solution for graphs with threshold edge-probability functions. We consider inference and optimal design problems for finite clusters from bond percolation on the integer lattice Z^d and derive a range of both numerical and analytical results for these graphs. We introduce inner-outer plots by deleting some of the lattice nodes and show that the ‘mostly populated’ designs are not necessarily optimal in the case of incomplete observations under both progressive and instructive design scenarios. Finally, we formulate a problem of approximating finite point sets with lattice nodes and describe a solution to this problem.
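
To make the role of the edge-probability function concrete, the Python sketch below evaluates the log-likelihood of a fully observed graph when each pair of nodes is joined independently with a probability that depends on their distance. The exponential-decay form `exp(-theta * d)` and all names are assumptions for illustration; the thesis considers a general edge-probability function and also partial observations.

```python
import math
from itertools import combinations


def edge_probability(distance, theta):
    """Hypothetical edge-probability function: decays with distance."""
    return math.exp(-theta * distance)


def log_likelihood(theta, positions, edges):
    """Log-likelihood of the observed edge set given fixed node positions.

    Every unordered pair of nodes contributes log p if the edge is present
    and log(1 - p) if it is absent. Node positions must be distinct so that
    p < 1 for every pair.
    """
    edge_set = {frozenset(e) for e in edges}
    total = 0.0
    for i, j in combinations(range(len(positions)), 2):
        p = edge_probability(math.dist(positions[i], positions[j]), theta)
        if frozenset((i, j)) in edge_set:
            total += math.log(p)
        else:
            total += math.log1p(-p)
    return total


if __name__ == "__main__":
    positions = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
    edges = [(0, 1)]
    for theta in (0.01, 0.5, 1.0, 2.0):
        print(theta, log_likelihood(theta, positions, edges))
```

Because the node positions enter the likelihood through the distances, moving the nodes changes how sharply the likelihood discriminates between values of the parameter, which is the quantity an optimal design tries to improve.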

    Outbreak Detection From Virus Genetic Sequence Variation By Community Detection

    Detecting risk groups in transmission networks can be difficult due to a virus's high transmission rate. We hypothesize that this problem can be resolved by community detection methods. Community detection is a clustering method based on edge density, which can break a connected component into multiple smaller clusters. My project develops a framework to find more informative clusters of virus sequences by applying community detection methods to transmission networks of HIV-1 sequences from Beijing and Tennessee, and a global dataset of SARS-CoV-2 sequences. We set the sequences with the most recent sample collection date as “new cases” and the remaining as “known cases”. Then, the difference in Akaike information criterion (AIC) between two Poisson regression models is measured. Using this framework, we determine that the HIV-1 database from Beijing favors a higher distance threshold than Tennessee, and that in the SARS-CoV-2 transmission network some pairs of countries (e.g., England and Portugal) are associated more strongly than expected by chance.
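
The AIC comparison can be illustrated with the simplest possible pair of Poisson models, for which the maximum-likelihood fits have closed forms: an intercept-only model and a model with one categorical predictor. The abstract does not state which covariates the two regression models use, so the grouping variable and all names in this Python sketch are assumptions.

```python
import math


def poisson_loglik(counts, means):
    """Poisson log-likelihood of the counts under the given fitted means."""
    return sum(
        y * math.log(mu) - mu - math.lgamma(y + 1) for y, mu in zip(counts, means)
    )


def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 log L."""
    return 2 * n_params - 2 * loglik


def aic_difference(counts, groups):
    """AIC of the grouped model minus AIC of the intercept-only model.

    With a single categorical predictor, the fitted Poisson mean for each
    group is the group's sample mean. Every group needs at least one nonzero
    count so that the logarithm is defined. A negative result favours the
    grouped model.
    """
    overall_mean = sum(counts) / len(counts)
    null_loglik = poisson_loglik(counts, [overall_mean] * len(counts))

    group_means = {}
    for g in set(groups):
        members = [y for y, label in zip(counts, groups) if label == g]
        group_means[g] = sum(members) / len(members)
    grouped_loglik = poisson_loglik(counts, [group_means[g] for g in groups])

    return aic(grouped_loglik, len(group_means)) - aic(null_loglik, 1)


if __name__ == "__main__":
    new_cases = [1, 2, 1, 2, 10, 12, 9, 11]   # hypothetical counts per cluster
    cluster_type = [0, 0, 0, 0, 1, 1, 1, 1]   # hypothetical predictor
    print(aic_difference(new_cases, cluster_type))
```

A strongly negative difference indicates that the predictor explains the new-case counts well enough to justify the extra parameter, which is the sense in which one clustering threshold can be preferred over another.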

    Genomic and bioinformatic approach to avian influenza virus evolution

    Viral zoonotic agents have a significant impact on both human and veterinary public health. Ecosystem changes, increasing urbanization and greater connectivity have altered the balance between pathogens and their host species. In recent years the most threatening viruses have originated from animal hosts and caused emerging diseases; most of them are RNA viruses, whose large population sizes, high mutation rates and short generation times allow rapid evolution, genetic variability and the selection of new variants. A constant and adequate surveillance program and the sharing of different professional expertise are necessary to follow viral evolution and to formulate efficient public health policy (Howard and Fletcher, 2012). Influenza A virus is considered one of the most challenging RNA viruses because of its zoonotic potential at the animal-human interface and its global health and economic impact; almost every year influenza epidemics cause morbidity and mortality in humans, and the virus is also associated with pandemics. Both wild and domestic birds are considered the primary natural reservoir of influenza A virus, and wild birds in particular are thought to be the source of influenza A viruses in all other animals (http://www.cdc.gov/flu/about/viruses/transmission.htm). Different techniques are available to genetically characterize and study viruses in order to understand their behavior, evolutionary dynamics, host-virus interactions and origin; the aim is to provide valid support, with appropriate treatments, during the surveillance and diagnosis of possible epidemics. During my PhD an integrated approach, both genomic and structural, was used to study the evolution of avian influenza A virus, focusing in particular on the hemagglutinin, the major surface glycoprotein, of the H5, H7 and H9 subtypes (the major "avian" subtypes responsible for human infection).
Next-generation sequencing (NGS) was used to investigate and characterize the complexity of the viral population, to detect low-frequency mutations and to follow the evolution of the genetically related variants present in a viral population. For comparing and inspecting genetic data, the phylogenetic approach has proved to be a useful tool in the analysis of viral evolution. It has been used to explain molecular epidemiology, transmission and viral evolution. To obtain a more complete view of ‘functional evolution’, phylogenetic analyses, which are based on sequence comparison and result in trees, were integrated with information from structural comparison. The three-dimensional structural approach has proved to be a useful tool for displaying similarities and inspecting motifs that cannot be discovered by analyzing primary sequences alone. Indeed, in primary sequences the introduction of a mutation does not take into account the effect on protein folding or on surface properties, whereas in three-dimensional structures the effect of each mutation on structural characteristics and interactions is directly detectable. This approach has also made a further contribution to the phylogenetic analysis. In particular, the study focused on the evolutionary dynamics and adaptive strategies of the avian influenza H7N1 and H7N3 subtypes, which circulated in Northern Italy for similar periods of time under similar epidemiological conditions. Within- and between-host population dynamics of avian HPAI H7N7 viruses, which affected Italy during 2013, were investigated using next-generation technology. NGS analysis was used to characterize viral population complexity in two groups of animals challenged with the same H5N1 HPAI virus but vaccinated with vaccines conferring different levels of protection.
An extensive comparison of structural domains and sub-regions was performed on the hemagglutinin of different subtypes of influenza A virus, with particular attention to different clades of HPAI H5N1 circulating in Egypt (where bird flu is endemic in poultry), to investigate any domain-specific changes. Influenza A viruses belonging to the H9 subtype were examined from a phylogenetic and a structural point of view to infer type-specific characteristics and to confirm whether surface properties could be associated with the 'functional evolution' of viral surface determinants, as seen in the H5N1 subtype. This work suggests that integrating genomic, phylogenetic, and structural comparison can help in understanding the 'functional evolution' of avian influenza A virus.

    2013 GREAT Day Program

    SUNY Geneseo’s Seventh Annual GREAT Day.