
    Machine Learning for Disease Outbreak Detection Using Probabilistic Models

    RÉSUMÉ The expansion of known diseases and the emergence of new ones have affected the lives of many people and have had significant economic consequences; Ebola is only the most recent example. Early detection of epidemiological infections is therefore a major challenge. In the field of syndromic surveillance, we have recently witnessed a proliferation of outbreak detection algorithms. Their performance varies between algorithms and across configuration parameters, and the effectiveness of an epidemiological surveillance system is affected accordingly. Yet there are few reliable evaluations of the performance of these algorithms under different conditions and for different types of outbreak. Existing evaluations are based on single cases, and the data are not publicly available, so it is difficult to compare these algorithms with one another or to judge how well their results generalize. Consequently, we cannot determine which algorithm should be applied under which circumstances. This thesis pursues three general objectives: (1) establish the relationship between the performance of outbreak detection algorithms and the type and severity of the outbreaks; (2) improve outbreak predictions by combining algorithms; and (3) provide an outbreak analysis method that incorporates a cost perspective in order to minimize the economic impact of false-positive and false-negative errors. The general approach of our study relies on simulated outbreak data in which the transmission vector is a water distribution network. The data are obtained from the SnAP simulation platform of the Department of Epidemiology and Biostatistics Surveillance Lab at McGill University.
This approach allows us to create the different outbreak types and intensities needed to analyze the performance of the detection algorithms. The first objective concerns the influence of different outbreak types and intensities on algorithm performance, which is modelled with a Bayesian network. This model successfully predicts the performance variation observed in the data. Moreover, the Bayesian network quantifies the influence of each variable and also reveals the role of parameters that had been ignored in earlier work, namely the detection threshold and the importance of accounting for weekly recurrences. The second objective builds on the results of the first and combines the algorithms to optimize performance as a function of the influencing factors. The algorithms' outputs are combined using a Hierarchical Mixture of Experts (HME). The HME model is trained to weight the contribution of each algorithm according to the input data. The results of this combination are comparable to the best results of the individual algorithms, and prove more robust across different variations. The contamination level does not influence the relative performance of the HME model. Finally, we sought to optimize outbreak detection methods according to the expected costs and benefits of correct and incorrect predictions. The outputs of the detection algorithms are evaluated in terms of the possible decisions that follow from them, taking into account real data on the total cost of healthcare system resource utilization.
First, a polynomial regression estimates the cost of an outbreak as a function of detection delay. We then developed a cost-sensitive decision tree learner that predicts detections from the known algorithms. Experimental results show that this model reduces the total cost of outbreaks while maintaining a level of outbreak detection comparable to that of other methods.
ABSTRACT The past decade has seen the emergence of new diseases and the expansion of old ones (such as Ebola), causing high human and financial costs. Hence, early detection of disease outbreaks is crucial. In the field of syndromic surveillance, there has recently been a proliferation of outbreak detection algorithms. The choice of outbreak detection algorithm and its configuration can result in important variations in the performance of public health surveillance systems. But performance evaluations have not kept pace with algorithm development. These evaluations are usually based on a single data set which is not publicly available, so the evaluations are difficult to generalize or replicate. Furthermore, the performance of different algorithms is influenced by the nature of the disease outbreak. As a result of the lack of thorough performance evaluations, one cannot determine which algorithm should be applied under what circumstances. Briefly, this research has three general objectives: (1) characterize the dependence of the performance of detection algorithms on the type and severity of outbreak, (2) aggregate the predictions of several outbreak detection algorithms, and (3) analyze outbreak detection methods from a cost-benefit point of view and develop a detection method which minimizes the total cost of missing outbreaks and false alarms.
To achieve the first objective, we propose a Bayesian network model learned from simulated outbreak data overlaid on real healthcare utilization data, which predicts detection performance as a function of outbreak characteristics and surveillance system parameters. This model predicts the performance of outbreak detection methods with high accuracy. The model can also quantify the influence of different outbreak characteristics and detection methods on detection performance in a variety of practically relevant surveillance scenarios. In addition to identifying outbreak characteristics expected to have a strong influence on detection performance, the learned model suggests a role for other algorithm features, such as the alerting threshold and taking weekly patterns into account, which were previously not the focus of attention in the literature. To achieve the second objective, we use a Hierarchical Mixture of Experts (HME) to combine the responses of multiple experts (i.e., predictors), which here are outbreak detection methods. The contribution of each predictor in forming the final output is learned and depends on the input data. The developed HME algorithm is competitive with the best detection algorithm in the experimental evaluation, and is more robust under different circumstances. The level of contamination of the surveillance time series does not influence the relative performance of the HME. The optimization of outbreak detection methods also relies on the estimation of the future benefits of true alarms and the cost of false alarms. In the third part of the thesis, we analyze some commonly used outbreak detection methods in terms of the cost of missing outbreaks and false alarms, using simulated outbreak data overlaid on real healthcare utilization data.
We estimate the total cost of missing outbreaks and false alarms in addition to the accuracy of outbreak detection, and fit a polynomial regression function to estimate the cost of an outbreak based on the delay until it is detected. Then, we develop a cost-sensitive decision tree learner, which predicts outbreaks by looking at the predictions of commonly used detection methods. Experimental results show that using the developed cost-sensitive decision tree decreases the total cost of outbreaks, while the accuracy of outbreak detection remains competitive with commonly used methods.
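The cost regression step in the third objective can be sketched in a few lines. This is an illustrative reconstruction only: the delay/cost pairs below are invented placeholders (deliberately generated from an exact quadratic), not the thesis's real healthcare utilization data; only the idea of fitting a polynomial of total outbreak cost against detection delay comes from the abstract.

```python
# Sketch: polynomial regression of total outbreak cost vs detection delay.
# Synthetic data constructed from cost(d) = 0.5*d^2 + 2*d + 3, so the
# degree-2 fit recovers the curve exactly.
import numpy as np

delays = np.array([1, 2, 4, 7, 10, 14], dtype=float)          # days to detection
costs = np.array([5.5, 9.0, 19.0, 41.5, 73.0, 129.0])         # arbitrary cost units

# Fit cost(delay) = a*delay^2 + b*delay + c.
coeffs = np.polyfit(delays, costs, deg=2)
cost_of_delay = np.poly1d(coeffs)

# Estimated cost of detecting an outbreak after 5 days.
print(round(float(cost_of_delay(5.0)), 1))  # → 25.5
```

A decision rule can then trade the expected cost of a delayed detection against the cost of raising a false alarm, which is what the cost-sensitive learner in the abstract optimizes.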

    Analysis of Heterogeneous Data Sources for Veterinary Syndromic Surveillance to Improve Public Health Response and Aid Decision Making

    The standard technique for implementing veterinary syndromic surveillance (VSyS) is the detection of temporal or spatial anomalies in the occurrence of health incidents above a set threshold in an observed population, using a frequentist modelling approach. Most implementations of this technique also require the removal of historical outbreaks from the datasets to construct baselines. Unfortunately, challenges such as data scarcity, delayed reporting of health incidents, and variable data availability from sources make VSyS implementation and alarm interpretation difficult, particularly when quantifying surveillance risk with associated uncertainties. This indicates that alternative or improved techniques are required to interpret alarms while incorporating uncertainties and previous knowledge of health incidents into the model to inform decision-making. Such methods must be capable of retaining historical outbreaks to assess surveillance risk. In this research work, the Stochastic Quantitative Risk Assessment (SQRA) model was proposed and developed for detecting and quantifying the risk of disease outbreaks with associated uncertainties, using a Bayesian probabilistic approach in PyMC3. A systematic and comparative evaluation of the available techniques was used to select the most appropriate method and software packages based on flexibility, efficiency, usability, the ability to retain historical outbreaks, and the ease of developing a model in Python. Social media (Twitter) datasets were first used to infer a possible disease outbreak incident with associated uncertainties. The inferences were subsequently updated using datasets from clinical and other healthcare sources to reduce uncertainties in the model and validate the outbreak.
The proposed SQRA model therefore demonstrates an approach that uses the successive refinement of analyses of different data streams to define a changepoint signalling a disease outbreak. The SQRA model was tested and validated to show its effectiveness and reliability in differentiating and identifying risk regions, with corresponding changepoints, to interpret an ongoing disease outbreak incident. This demonstrates that a technique such as the SQRA method may help overcome some of the difficulties identified in VSyS, such as data scarcity, delayed reporting, and variable availability of data from sources, ultimately contributing to science and practice.
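The changepoint idea at the heart of the SQRA approach can be illustrated without the full Bayesian machinery. The sketch below is not the authors' PyMC3 model: it simply locates the most likely changepoint in a synthetic daily count series by maximizing a two-rate Poisson log-likelihood over candidate split points, which conveys what "a changepoint signalling a disease outbreak" means operationally.

```python
# Toy changepoint detection: find the index where a Poisson rate shifts.
# Counts are synthetic; a Bayesian treatment (as in SQRA) would instead
# place priors on the changepoint and rates and return a posterior.
import math

counts = [2, 3, 1, 2, 2, 3, 2, 9, 11, 10, 12, 9]  # synthetic daily reports

def poisson_loglik(xs, lam):
    # Log-likelihood of counts xs under Poisson(lam), dropping constants.
    if lam <= 0:
        return float("-inf")
    return sum(x * math.log(lam) - lam for x in xs)

best_tau, best_ll = None, float("-inf")
for tau in range(1, len(counts)):              # candidate changepoints
    before, after = counts[:tau], counts[tau:]
    ll = (poisson_loglik(before, sum(before) / len(before))
          + poisson_loglik(after, sum(after) / len(after)))
    if ll > best_ll:
        best_tau, best_ll = tau, ll

print(best_tau)  # → 7, where the counts jump from ~2/day to ~10/day
```

In the Bayesian version, uncertainty about the changepoint and the two rates is retained as a posterior distribution rather than collapsed to a single maximum-likelihood estimate.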

    C-Watcher: A Framework for Early Detection of High-Risk Neighborhoods Ahead of COVID-19 Outbreak

    The novel coronavirus disease (COVID-19) has disrupted daily routines and is still rampaging through the world. Existing solutions for nonpharmaceutical intervention usually need to select, in a timely and precise manner, a subset of residential urban areas for containment or even quarantine, where the spatial distribution of confirmed cases has been considered a key criterion for the selection. While such containment measures have successfully stopped or slowed down the spread of COVID-19 in some countries, they are criticized as inefficient or ineffective, because the statistics of confirmed cases are usually time-delayed and coarse-grained. To tackle these issues, we propose C-Watcher, a novel data-driven framework that aims to screen every neighborhood in a target city and predict infection risks before COVID-19 spreads from the epicenters to that city. In terms of design, C-Watcher collects large-scale, long-term human mobility data from Baidu Maps, then characterizes every residential neighborhood in the city using a set of features based on urban mobility patterns. Furthermore, to transfer firsthand knowledge (witnessed in epicenters) to the target city before local outbreaks, we adopt a novel adversarial encoder framework to learn "city-invariant" representations from the mobility-related features for precise early detection of high-risk neighborhoods in the target city, even before any cases are confirmed there. We carried out extensive experiments on C-Watcher using real data records from the early stage of COVID-19 outbreaks; the results demonstrate the efficiency and effectiveness of C-Watcher for the early detection of high-risk neighborhoods across a large number of cities. Comment: 11 pages, accepted by AAAI 2021, appendix included.
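The core intuition of transferring epicenter knowledge to a target city can be shown with a deliberately simplified toy. This sketch is not C-Watcher's adversarial encoder: it just describes each neighborhood by a hand-made mobility feature vector and ranks target-city neighborhoods by similarity to known high-risk epicenter neighborhoods. All names and feature values are invented placeholders.

```python
# Toy: rank target-city neighborhoods by mobility-feature similarity to
# high-risk epicenter neighborhoods (invented data, cosine similarity).

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb)

# Features, e.g. (trip volume, inflow share, transit-hub proximity),
# for neighborhoods that became high-risk in the epicenter city.
epicenter_high_risk = [(0.9, 0.8, 0.9), (0.8, 0.9, 0.7)]

# Hypothetical neighborhoods in a target city with no confirmed cases yet.
target_city = {
    "riverside": (0.85, 0.82, 0.8),
    "old_town":  (0.2, 0.1, 0.3),
    "uptown":    (0.5, 0.4, 0.6),
}

def risk_score(features):
    # Highest similarity to any known high-risk epicenter neighborhood.
    return max(cosine(features, e) for e in epicenter_high_risk)

ranked = sorted(target_city, key=lambda n: risk_score(target_city[n]),
                reverse=True)
print(ranked[0])  # → riverside
```

The adversarial encoder in the paper goes further by learning representations in which city-specific quirks are removed, so that similarity is computed in a "city-invariant" feature space rather than on raw features as here.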

    Spatial epidemiological approaches to inform leptospirosis surveillance and control: a systematic review and critical appraisal of methods

    Leptospirosis is a global zoonotic disease whose transmission is driven by complex geographical and temporal variation in demographics, animal hosts and socioecological factors, resulting in complex challenges for the identification of high-risk areas. Spatial and temporal epidemiological tools could be used to support leptospirosis control programs, but the adequacy of their application has not been evaluated. We searched the literature in six databases, including PubMed, Web of Science, EMBASE, Scopus, SciELO and Zoological Record, to systematically review and critically assess the use of spatial and temporal analytical tools for leptospirosis and to provide a general framework for their application in future studies. We reviewed 115 articles published between 1930 and October 2018 from 41 different countries. Of these, 65 (56.52%) articles were on human leptospirosis, 39 (33.91%) on animal leptospirosis and 11 (9.57%) used data from both. Spatial analytical tools (n = 106) were used to describe the distribution of incidence/prevalence at various geographical scales (96.5%) and to explore spatial patterns to detect clustering and hot spots (33%). A total of 51 studies modelled the effects of various variables on the risk of human (n = 31), animal (n = 17) or both human and animal infection (n = 3). Among those modelling studies, few had generated spatially structured models and predictive maps of human (n = 2/31) or animal leptospirosis (n = 1/17). In addition, nine studies applied time-series analytical tools to predict leptospirosis incidence. Spatial and temporal analytical tools have been widely used to improve our understanding of leptospirosis epidemiology, yet the quality of the epidemiological data, the selection of covariates and the spatial analytical techniques should be carefully considered in future studies to improve the usefulness of the evidence for supporting leptospirosis control. A general framework for the application of spatial analytical tools for leptospirosis is proposed.

    A systematic review of the data, methods and environmental covariates used to map Aedes-borne arbovirus transmission risk

    BACKGROUND: Aedes (Stegomyia)-borne diseases are an expanding global threat, but gaps in surveillance make comprehensive and comparable risk assessments challenging. Geostatistical models combine data from multiple locations and use links with environmental and socioeconomic factors to make predictive risk maps. Here we systematically review past approaches to mapping risk for different Aedes-borne arboviruses from local to global scales, identifying differences and similarities in the data types, covariates, and modelling approaches used. METHODS: We searched online databases for predictive risk mapping studies for dengue, Zika, chikungunya, and yellow fever with no geographical or date restrictions. We included studies that parameterised or fitted their model to real-world epidemiological data and made predictions, for new spatial locations, of some measure of population-level risk of viral transmission (e.g. incidence, occurrence, suitability). RESULTS: We found a growing number of arbovirus risk mapping studies across all endemic regions and arboviral diseases, with a total of 176 papers published between 2002 and 2022 and the largest increases shortly following major epidemics. Three dominant use cases emerged: (i) global maps to identify the limits of transmission, estimate burden and assess the impacts of future global change; (ii) regional models used to predict the spread of major epidemics between countries; and (iii) national and sub-national models that use local datasets to better understand transmission dynamics to improve outbreak detection and response. Temperature and rainfall were the most popular covariates (included in 50% and 40% of studies, respectively), but variables such as human mobility are increasingly being included. Surprisingly, few studies (22%, 31/144) robustly tested combinations of covariates from different domains (e.g. climatic, sociodemographic, ecological), and only 49% of studies assessed predictive performance via out-of-sample validation procedures. CONCLUSIONS: We show that approaches to mapping risk for different arboviruses have diversified in response to changing use cases, epidemiology and data availability. We identify key differences in mapping approaches between different arboviral diseases, discuss future research needs and outline specific recommendations for future arbovirus mapping.
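The out-of-sample validation that only 49% of the reviewed studies performed can be sketched minimally: fit a risk model on one set of locations and score it only on held-out locations. Everything below is an invented placeholder (a single temperature covariate and a one-threshold "model"), chosen purely to make the train/held-out distinction concrete; real studies would use geostatistical models and spatially blocked folds.

```python
# Sketch of out-of-sample validation for a risk map: fit on some
# locations, evaluate only on locations the model never saw.

# (mean temperature, observed transmission) per hypothetical location;
# transmission occurs above 24 degrees by construction.
locations = [(t, 1 if t > 24 else 0) for t in range(15, 35)]
train = [p for p in locations if p[0] % 2 == 0]   # "fitting" locations
test = [p for p in locations if p[0] % 2 == 1]    # held-out locations

# "Model": pick the temperature threshold that best separates train data.
best_thr, best_acc = None, -1.0
for thr in range(15, 35):
    acc = sum((t > thr) == bool(y) for t, y in train) / len(train)
    if acc > best_acc:
        best_thr, best_acc = thr, acc

# Predictive performance reported on held-out locations only.
oos_acc = sum((t > best_thr) == bool(y) for t, y in test) / len(test)
print(best_thr, oos_acc)  # → 24 1.0
```

Reporting `oos_acc` rather than `best_acc` is the point: in-sample accuracy is optimistically biased, which is why the review flags the 51% of studies that skipped this step.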

    Center for Research on Sustainable Forests 2019 Annual Report

    The Center for Research on Sustainable Forests (CRSF) continued its evolution as a University of Maine research center in FY21 with several new and ongoing initiatives. Despite the continual challenges created by the global pandemic, dedicated CRSF faculty, staff and students have furthered our collaborations and generated numerous outcomes for our stakeholders. Of particular note this past FY: the Northeastern States Research Cooperative (NSRC) awarded 13 new projects across the region, including three involving the University of Maine; the Forest Climate Change Initiative's Science and Practice monthly webinar series, organized with the Forest Stewardship Guild, attracted strong participation both internal and external to Maine; and the Natural Climate Solutions for Forestry & Agriculture Final Report was released, outlining the potential of alternative management strategies for increasing carbon sequestration. In addition, several external grants were received in FY21 from the NASA Carbon Monitoring System, NASA GEDI, the USDA (several), and the Maine Department of Inland Fisheries & Wildlife (one), which help to continue growing the CRSF research program and building capacity within the center.

    Challenges in developing methods for quantifying the effects of weather and climate on water-associated diseases: A systematic review

    Infectious diseases attributable to unsafe water supply, sanitation and hygiene (e.g. cholera, leptospirosis, giardiasis) remain an important cause of morbidity and mortality, especially in low-income countries. Climate and weather factors are known to affect the transmission and distribution of infectious diseases, and statistical and mathematical modelling approaches are continually being developed to investigate the impact of weather and climate on water-associated diseases. However, there has been little critical analysis of the methodological approaches. Our objective is to review and summarize the statistical and modelling methods used to investigate the effects of weather and climate on infectious diseases associated with water, in order to identify limitations and knowledge gaps relevant to the development of new methods. We conducted a systematic review of English-language papers published from 2000 to 2015. Search terms included concepts related to water-associated diseases, weather and climate, and statistical, epidemiological and modelling methods. We found 102 full-text papers that met our criteria and were included in the analysis. The most commonly used methods fell into two clusters: process-based models (PBM) and time-series and spatial epidemiology (TS-SE). In general, PBM methods were employed when the bio-physical mechanism of the pathogen under study was relatively well known (e.g. Vibrio cholerae); TS-SE tended to be used when the specific environmental mechanisms were unclear (e.g. Campylobacter). Important data and methodological challenges emerged, with implications for surveillance and control of water-associated infections. The most common limitations comprised: non-inclusion of key factors (e.g. biological mechanism, demographic heterogeneity, human behavior), reporting bias, poor data quality, and collinearity in exposures. Furthermore, the methods often did not distinguish among the multiple sources of time lags (e.g. patient physiology, reporting bias, healthcare access) between environmental drivers/exposures and disease detection. Key areas for future research include disentangling the complex effects of weather/climate on each exposure-health outcome pathway (e.g. person-to-person vs environment-to-person), and linking weather data to individual cases longitudinally.
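A minimal TS-SE-style lag screen shows the kind of analysis, and the pitfall, the review discusses. The series below are synthetic, with a 2-step exposure-to-case lag built in by construction; a real analysis would also have to apportion any estimated lag among incubation, reporting delay, and healthcare access rather than treating it as one number.

```python
# Toy lag screen: find the lag between a weather exposure (rainfall) and
# case counts that maximizes Pearson correlation. Synthetic data with a
# known 2-step lag built in.

rain = [0, 5, 1, 8, 2, 9, 1, 7, 0, 6, 2, 8]
# Cases follow rainfall with a 2-step delay (by construction here).
cases = [3, 3] + [2 * r + 3 for r in rain[:-2]]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Correlate rain[t] with cases[t + k] for candidate lags k.
best_lag = max(range(4),
               key=lambda k: pearson(rain[:len(rain) - k], cases[k:]))
print(best_lag)  # → 2, the lag built into the synthetic series
```

Because the total lag here is a single aggregate, this simple screen cannot tell whether the 2 steps come from pathogen biology or from reporting delay; that ambiguity is exactly the methodological gap the review highlights.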

    Pattern-based prediction of population outbreaks.

    Resumo: The complexity and practical importance of insect outbreaks have made outbreak prediction a focus of recent research. We propose the Pattern-Based Prediction (PBP) method for predicting population outbreaks. This method uses information on previous time-series values preceding an outbreak event as predictors of future outbreaks, which can be useful when monitoring pest species. We illustrate the method using simulated datasets and an aphid time series obtained in wheat crops in southern Brazil. Abstract: The complexity and practical importance of insect outbreaks have made the problem of predicting outbreaks a focus of recent research. We propose the Pattern-Based Prediction (PBP) method for predicting population outbreaks. It uses information on previous time series values that precede an outbreak event as predictors of future outbreaks, which can be helpful when monitoring pest species. We illustrate the methodology using simulated datasets and an aphid time series obtained in wheat crops in Southern Brazil. We obtained an average test accuracy of 84.6% in the simulation studies implemented with stochastic models and 95.0% for predicting outbreaks using a time series of aphids in wheat crops in Southern Brazil. Our results show the PBP method's feasibility in predicting population outbreaks. We benchmarked our results against established state-of-the-art machine learning methods: Support Vector Machines, Deep Neural Networks, Long Short-Term Memory and Random Forests. The PBP method yielded competitive performance, with higher true-positive rates in most comparisons, while providing interpretability rather than being a black-box method. It is an improvement over current state-of-the-art machine learning tools, especially for non-specialists such as ecologists aiming to use a quantitative approach for pest monitoring. We provide the implemented PBP method in Python through the pypbp package.
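The pattern-based idea can be sketched in a few lines. This is a simplified illustration, not the pypbp package's actual implementation: it stores the window of values immediately preceding each historical outbreak and flags a new window as risky when it lies close to a stored pattern. The series, window length, and distance radius are all invented placeholders.

```python
# Simplified pattern-based outbreak prediction: match the current window
# of counts against windows that preceded known outbreaks.

def preceding_windows(series, outbreak_idx, w):
    # Windows of length w that immediately precede known outbreak events.
    return [series[i - w:i] for i in outbreak_idx if i >= w]

def euclid(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def predict_outbreak(history_windows, window, radius):
    # Flag an outbreak if the current window matches any stored pattern.
    return any(euclid(window, h) <= radius for h in history_windows)

# Synthetic aphid-like counts: outbreaks at indices 5 and 11 follow a
# characteristic ramp-up pattern.
series = [1, 2, 1, 4, 8, 30, 2, 1, 2, 5, 9, 28]
patterns = preceding_windows(series, [5, 11], w=3)

print(predict_outbreak(patterns, [3, 7, 9], radius=3.0))   # ramp-up → True
print(predict_outbreak(patterns, [1, 1, 1], radius=3.0))   # flat → False
```

The stored patterns are directly inspectable, which is the interpretability advantage the abstract claims over black-box learners: a flagged window can be traced to the specific historical pre-outbreak window it resembles.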
