
    What's unusual in online disease outbreak news?

    Background: Accurate and timely detection of public health events of international concern is necessary to support risk assessment and response and to save lives. Novel event-based methods that use the World Wide Web as a signal source offer the potential to extend health surveillance into areas where traditional indicator networks are lacking. In this paper we address the issue of systematically evaluating online health news to support automatic alerting, using daily disease-country counts text-mined from real-world data by BioCaster. For 18 data sets produced by BioCaster, we compare 5 aberration detection algorithms (EARS C2, C3, W2, F-statistic and EWMA) for performance against expert-moderated ProMED-mail postings. Results: We report sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), mean alerts/100 days and F1, with 95% confidence intervals (CIs), for 287 ProMED-mail postings on 18 outbreaks across 14 countries over a 366-day period. Results indicate that W2 had the best F1, with a slight benefit over C2 attributable to the day-of-week effect. In drill-down analysis we identify issues arising from the choice of country-level granularity for modeling, sudden drops in reporting due to day-of-week effects, and reporting bias. Automatic alerting has been implemented in BioCaster, available from http://born.nii.ac.jp. Conclusions: Online health news alerts have the potential to enhance manual analytical methods by increasing throughput, timeliness and detection rates. Systematic evaluation of health news aberrations is necessary to advance our understanding of the complex relationship between news report volumes and case numbers, and to select the best-performing features and algorithms.
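To make the evaluation in the abstract above concrete, here is a minimal Python sketch of a C2-style detector and the reported metrics. It is an illustration under stated assumptions, not the authors' implementation. It assumes the commonly described form of EARS C2 (a 7-day baseline separated from the current day by a 2-day guard band, with an alert when the count exceeds the baseline mean by more than 3 standard deviations). The function names, the `min_sd` floor and the sample data are hypothetical.

```python
from statistics import mean, stdev


def ears_c2(counts, baseline=7, guard=2, threshold=3.0, min_sd=0.2):
    """Flag days whose count is unusually high relative to a lagged baseline.

    Assumed C2-style rule: compare day t against the mean and sample standard
    deviation of the `baseline` days that end `guard` days before t.
    `min_sd` is a hypothetical floor that avoids division by zero when the
    baseline is flat.
    """
    alerts = []
    for t, x in enumerate(counts):
        start = t - guard - baseline
        if start < 0:
            alerts.append(False)  # not enough history to form a baseline
            continue
        window = counts[start:t - guard]
        sd = max(stdev(window), min_sd)
        alerts.append((x - mean(window)) / sd > threshold)
    return alerts


def _ratio(numerator, denominator):
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def evaluate(alerts, gold):
    """Score daily alerts against a gold standard of daily event labels."""
    tp = sum(a and g for a, g in zip(alerts, gold))
    fp = sum(a and not g for a, g in zip(alerts, gold))
    fn = sum(g and not a for a, g in zip(alerts, gold))
    tn = sum(not a and not g for a, g in zip(alerts, gold))
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
        "alerts_per_100_days": _ratio(100 * sum(alerts), len(alerts)),
    }


# Made-up daily counts for one disease-country pair, with a spike on day 10.
counts = [1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 15, 1]
# Made-up gold standard: an expert-moderated posting on days 10 and 11.
gold = [False] * 10 + [True, True]

alerts = ears_c2(counts)
scores = evaluate(alerts, gold)
```

In this toy series the detector alerts only on the spike day, so it catches one of the two gold-standard days and raises no false alarms. Confidence intervals, as reported in the paper, are not computed here.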

    Disease surveillance systems

    Recent advances in information and communication technologies have made the development and operation of complex disease surveillance systems technically feasible, and many systems have been proposed to interpret diverse data sources for health-related signals. Implementing these systems for daily use and efficiently interpreting their output, however, remains a technical challenge. This thesis presents a method for understanding disease surveillance systems structurally, examines four existing systems, and discusses the implications of developing such systems. The discussion is followed by two papers. The first paper describes the design of a national outbreak detection system for daily disease surveillance, which is currently in use at the Swedish Institute for Communicable Disease Control. The source code has been licensed under GNU v3 and is freely available. The second paper discusses methodological issues in computational epidemiology and presents the lessons learned from a software development project in which a spatially explicit micro-meso-macro model for the entire Swedish population was built from registry data.

    Use of data mining and artificial intelligence to derive public health evidence from large datasets

    This thesis explores the use of data mining and AI-tailored frameworks for extracting public health evidence from large health datasets. The research presented in this thesis demonstrates the potential of these tools for automating and simplifying the data mining process, and for providing valuable insights into various public health issues. In Paper I, we used data mining and natural language processing to analyze the characteristics of genomic research on non-communicable diseases (NCDs) from the GWAS Catalog (2005 to 2022). We found that the research institutions leading the work were mostly US-based and that the majority of first, senior and all authors were male. The vast majority of complex trait GWAS have been performed in European ancestry populations, with cohorts and scientists predominantly located in medium-to-high socioeconomically ranked countries. This lack of diversity in both the data and the authorship of GWAS research has potential implications for the generalizability of genetic discoveries and the development of future interventions. In Paper II, we analyzed data collected through the app-based COVID Symptom Study in Sweden. We then created a symptom-based model to estimate the individual probability of symptomatic COVID-19 and employed this to estimate daily regional COVID-19 prevalence. We also used these data to predict next-week COVID-19 hospital admissions and compared the predictions with those of a model based on case notifications. We found that the symptom-based model had a lower median absolute percentage error during the first wave of the pandemic and that the model was transferable to an English dataset. The findings of this study demonstrate the feasibility of large-scale syndromic surveillance and the potential for population-based participatory surveillance initiatives in future pandemics and epidemics. In Paper III, we used data from over 500,000 participants in the COVID Symptom Study to investigate the impact of obesity and diabetes on the symptoms and duration of long-COVID. Using advanced data mining techniques, we found that individuals with higher BMI and diabetes had a higher burden of symptoms during the initial COVID-19 infection and a prolonged duration of long-COVID symptoms. We also found that vaccination had a protective effect against both COVID-19 symptoms and long-COVID symptoms in these at-risk groups. Our results demonstrate the disproportionate impact of COVID-19 on certain populations and the utility of app-based syndromic surveillance in providing timely and accurate information on the spread and impact of the virus.
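The Paper II comparison above rests on median absolute percentage error. As a hedged illustration of that metric only, the Python sketch below computes it for a pair of forecast series. The function name and all numbers are made up for the example and are not taken from the thesis.

```python
from statistics import median


def median_ape(actual, predicted):
    """Median of |actual - predicted| / |actual|, expressed in percent.

    Observations with an actual value of zero are skipped because the
    percentage error is undefined for them (an assumption of this sketch).
    """
    errors = [
        100 * abs(a - p) / abs(a)
        for a, p in zip(actual, predicted)
        if a != 0
    ]
    if not errors:
        raise ValueError("no non-zero actual values to score against")
    return median(errors)


# Made-up weekly hospital admissions and two hypothetical forecasts.
admissions = [100, 200, 50]
symptom_model = [90, 220, 50]
notification_model = [70, 260, 45]

symptom_error = median_ape(admissions, symptom_model)
notification_error = median_ape(admissions, notification_model)
```

A lower value means the typical weekly forecast sits closer to the observed admissions. Because the median is used, a single badly missed week affects the score less than it would with a mean.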