Global disease monitoring and forecasting with Wikipedia
Infectious disease is a leading threat to public health, economic stability,
and other key social structures. Efforts to mitigate these impacts depend on
accurate and timely monitoring to measure the risk and progress of disease.
Traditional, biologically focused monitoring techniques are accurate but costly
and slow; in response, new techniques based on social internet data such as
social media and search queries are emerging. These efforts are promising, but
important challenges in the areas of scientific peer review, breadth of
diseases and countries, and forecasting hamper their operational usefulness.
We examine a freely available, open data source for this use: access logs
from the online encyclopedia Wikipedia. Using linear models, language as a
proxy for location, and a systematic yet simple article selection procedure, we
test 14 location-disease combinations and demonstrate that these data
feasibly support an approach that overcomes these challenges. Specifically, our
proof-of-concept yields models with r² up to 0.92, forecasting value up to
the 28 days tested, and several pairs of models similar enough to suggest that
transferring models from one location to another without re-training is
feasible.
Based on these preliminary results, we close with a research agenda designed
to overcome these challenges and produce a disease monitoring and forecasting
system that is significantly more effective, robust, and globally comprehensive
than the current state of the art.
Comment: 27 pages; 4 figures; 4 tables. Version 2: Cite McIver & Brownstein
and adjust novelty claims accordingly; revise title; various revisions for
clarity
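The abstract describes linear models that relate Wikipedia access logs to disease incidence, with forecasting value up to 28 days. The Python sketch below shows the basic idea for a single article under stated assumptions: one article's daily access counts serve as the only predictor, case counts are shifted forward by a forecast lag, and fit quality is reported as the coefficient of determination. The function name `fit_lagged_ols` is hypothetical, and the study's own models use several selected articles, so this is a simplification rather than the authors' method.

```python
# Minimal sketch (assumption): a single Wikipedia article's daily access
# counts used as the only predictor of official case counts, optionally
# shifted by a forecast lag. The study itself fits linear models over
# several selected articles; this one-predictor version is a simplification.
from typing import Sequence, Tuple


def fit_lagged_ols(views: Sequence[float], cases: Sequence[float],
                   lag: int = 0) -> Tuple[float, float, float]:
    """Fit cases[t + lag] = a + b * views[t] by ordinary least squares.

    Returns (intercept a, slope b, r_squared). lag > 0 uses today's access
    counts to predict case counts `lag` days later.
    """
    if lag < 0:
        raise ValueError("lag must be non-negative")
    # Pair views[t] with cases[t + lag]; zip stops at the shorter sequence.
    pairs = list(zip(views, cases[lag:]))
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two aligned observations")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    ss_tot = sum((y - mean_y) ** 2 for _, y in pairs)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("predictor or outcome has zero variance")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares of the fitted line.
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in pairs)
    return intercept, slope, 1.0 - ss_res / ss_tot
```

A 28-day forecast model would be fit as `fit_lagged_ols(daily_views, daily_cases, lag=28)`, where both series are hypothetical inputs aligned by calendar day. Fitting one model per lag makes it easy to see how predictive value decays as the horizon grows.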
Deep-Learning for Classification of Colorectal Polyps on Whole-Slide Images
Histopathological characterization of colorectal polyps is an important
principle for determining the risk of colorectal cancer and future rates of
surveillance for patients. This characterization is time-intensive, requires
years of specialized training, and suffers from significant inter-observer and
intra-observer variability. In this work, we built an automatic
image-understanding method that can accurately classify different types of
colorectal polyps in whole-slide histology images to help pathologists with
histopathological characterization and diagnosis of colorectal polyps. The
proposed image-understanding method is based on deep-learning techniques, which
rely on numerous levels of abstraction for data representation and have shown
state-of-the-art results for various image analysis tasks. Our
image-understanding method covers all five polyp types (hyperplastic polyp,
sessile serrated polyp, traditional serrated adenoma, tubular adenoma, and
tubulovillous/villous adenoma) that are included in the US multi-society task
force guidelines for colorectal cancer risk assessment and surveillance, and
encompasses the most common occurrences of colorectal polyps. Our evaluation on
239 independent test samples shows our proposed method can identify the types
of colorectal polyps in whole-slide images with a high efficacy (accuracy:
93.0%, precision: 89.7%, recall: 88.3%, F1 score: 88.8%). The presented method
in this paper can reduce the cognitive burden on pathologists and improve their
accuracy and efficiency in histopathological characterization of colorectal
polyps, and in subsequent risk assessment and follow-up recommendations.
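The abstract reports accuracy, precision, recall, and F1 over five polyp types but does not say how the per-class scores were combined. The Python sketch below assumes macro-averaging, one common choice, and computes all four figures from a multi-class confusion matrix; the function name `macro_metrics` is hypothetical.

```python
# Minimal sketch: accuracy plus macro-averaged precision, recall, and F1
# from a multi-class confusion matrix. Assumption: macro-averaging over
# classes; the abstract does not state which averaging scheme was used.
from typing import List, Sequence, Tuple


def macro_metrics(confusion: Sequence[Sequence[int]]) -> Tuple[float, float, float, float]:
    """confusion[i][j] = number of samples with true class i predicted as j.

    Returns (accuracy, macro_precision, macro_recall, macro_f1).
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(k))
    precisions: List[float] = []
    recalls: List[float] = []
    f1s: List[float] = []
    for c in range(k):
        tp = confusion[c][c]
        predicted_c = sum(confusion[row][c] for row in range(k))  # column sum
        actual_c = sum(confusion[c])  # row sum
        p = tp / predicted_c if predicted_c else 0.0
        r = tp / actual_c if actual_c else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)
    return correct / total, sum(precisions) / k, sum(recalls) / k, sum(f1s) / k
```

Under macro-averaging, F1 is the mean of per-class F1 scores, which generally differs from the harmonic mean of macro precision and macro recall. For instance, 2(0.897)(0.883)/(0.897 + 0.883) ≈ 0.890 rather than the reported 0.888, which is consistent with, though does not prove, per-class averaging.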
Doctor of Philosophy
dissertation
Public health surveillance systems are crucial for the timely detection of, and response to, public health threats. Since the terrorist attacks of September 11, 2001, and the release of anthrax in the following month, there has been heightened interest in public health surveillance. The years immediately following these attacks brought increased awareness and federal funding, which significantly strengthened United States surveillance capabilities; despite these improvements, however, today's public health surveillance systems face substantial challenges. Problems with current surveillance systems include: a) failure to leverage unstructured public health data for surveillance purposes; and b) lack of information integration and of the ability to leverage resources, applications, or other surveillance efforts, because systems are built on a centralized model. This research addresses these problems by focusing on the development and evaluation of new informatics methods to improve public health surveillance. To address the problems above, we first identified a current public health surveillance workflow that is affected by the problems described and could be enhanced with current informatics techniques. The 122 Mortality Surveillance for Pneumonia and Influenza was chosen as the primary use case for this dissertation work. The second step demonstrated the feasibility of using unstructured public health data, in this case death certificates. For this, we created and evaluated a pipeline composed of a detection rule and a natural language processor for coding death certificates and identifying pneumonia and influenza cases. The second problem was addressed by presenting the rationale for a federated model that leverages grid technology concepts and tools for the sharing and epidemiological analysis of public health data.
As a case study of this approach, a secured virtual organization was created in which users can access two grid data services, using death certificates from the Utah Department of Health, and two analytical grid services, MetaMap and R. A scientific workflow was created using the published services to replicate the mortality surveillance workflow. To validate these approaches and provide proofs of concept, a series of real-world scenarios was conducted.
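The dissertation pairs a detection rule with a natural language processor to identify pneumonia and influenza cases in death certificates. The Python sketch below illustrates only the general shape of a rule-based detector on cause-of-death text; the trigger terms, the negation cues, and the function name `is_pi_case` are hypothetical, and the dissertation's actual rule and NLP component are not reproduced here.

```python
# Minimal sketch of a rule-based detector that flags death-certificate
# cause-of-death text as a possible pneumonia-and-influenza (P&I) case.
# The term list and the simple negation check are hypothetical
# illustrations, not the dissertation's detection rule or NLP processor.
import re

# Hypothetical P&I trigger terms (whole-word, case-insensitive matches).
PI_PATTERN = re.compile(
    r"\b(pneumonia|pneumonitis|influenza|flu|bronchopneumonia)\b", re.IGNORECASE
)
# Hypothetical negation cue within 30 characters before a trigger term,
# without crossing a period or semicolon.
NEGATION_PATTERN = re.compile(
    r"\b(no|not|without|ruled out)\b[^.;]{0,30}$", re.IGNORECASE
)


def is_pi_case(cause_of_death: str) -> bool:
    """Return True if any trigger term appears without a preceding negation cue.

    Only cues *before* the term are checked; a phrase such as
    "pneumonia ruled out" would still be flagged by this simple rule.
    """
    for match in PI_PATTERN.finditer(cause_of_death):
        preceding = cause_of_death[: match.start()]
        if not NEGATION_PATTERN.search(preceding):
            return True
    return False
```

A keyword rule like this is cheap and transparent, which suits a first-pass filter; a concept-mapping tool is better placed to handle synonyms, abbreviations, and negation that follows the term.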
Investigating Sociodemographic Disparities in Cancer Risk Using Web-Based Informatics
Cancer health disparities due to demographic and socioeconomic factors are an area of great interest in the epidemiological community. Adjusting for such factors is important when developing cancer risk models. However, for digital epidemiology studies relying on online sources, such information is not readily available. This paper presents a novel method for extracting demographic and socioeconomic information from openly available online obituaries. The method relies on tailored language processing rules and a probabilistic scheme to map subjects’ occupation history to the occupation classification codes and related earnings provided by the U.S. Census Bureau. Using this information, a case-control study is executed fully in silico to investigate how age, gender, parity, and income level impact breast and lung cancer risk. Based on 48,368 online obituaries (4,643 for breast cancer, 6,274 for lung cancer, and 37,451 cancer-free) collected automatically and a generalized cancer risk model, our study shows a strong association between cancer risk and age, parity, and socioeconomic status. Although the trends observed for breast cancer are very consistent with traditional epidemiological studies, some inconsistency is observed for lung cancer with respect to socioeconomic status.
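The abstract does not specify its generalized cancer risk model. As a hedged illustration of the basic quantity a case-control analysis estimates, the Python sketch below computes a crude odds ratio with a Woolf (log-based) 95% confidence interval for one binary exposure, such as a hypothetical high/low income split. The function name `odds_ratio` is hypothetical; the sketch ignores confounding and is not the authors' model.

```python
# Minimal sketch: crude odds ratio and Woolf (log-based) confidence interval
# for a binary exposure in a case-control design. Generic illustration only;
# it does not reproduce the paper's generalized cancer risk model.
import math
from typing import Tuple


def odds_ratio(exposed_cases: int, unexposed_cases: int,
               exposed_controls: int, unexposed_controls: int,
               z: float = 1.96) -> Tuple[float, float, float]:
    """Return (OR, lower, upper) for the 2x2 table [[a, b], [c, d]].

    a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls.
    """
    a, b, c, d = exposed_cases, unexposed_cases, exposed_controls, unexposed_controls
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell; apply a continuity correction first")
    or_ = (a * d) / (b * c)
    # Standard error of log(OR) under the Woolf approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper
```

With hypothetical counts of 20 exposed and 80 unexposed cases against 10 exposed and 90 unexposed controls, the odds ratio is (20 × 90)/(80 × 10) = 2.25. Adjusting for age, parity, and other covariates simultaneously would call for a regression model rather than a single 2x2 table.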
Public Health and Epidemiology Informatics: Recent Research and Trends in the United States
Objectives
To survey advances in public health and epidemiology informatics over the past three years.
Methods
We conducted a review of English-language research works in the domain of public health informatics (PHI), published in MEDLINE between January 2012 and December 2014, in which information and communication technology (ICT) was a primary subject or a main component of the study methodology. Selected articles were synthesized through a thematic analysis that used the Essential Services of Public Health as a typology.
Results
Based on themes that emerged, we organized the advances into a model where applications that support the Essential Services are, in turn, supported by a socio-technical infrastructure that relies on government policies and ethical principles. That infrastructure, in turn, depends upon education and training of the public health workforce, development that creates novel or adapts existing infrastructure, and research that evaluates the success of the infrastructure. Finally, the persistence and growth of infrastructure depends on financial sustainability.
Conclusions
Public health informatics is a field that is growing in breadth, depth, and complexity. Several Essential Services have benefited from informatics, notably “Monitor Health,” “Diagnose & Investigate,” and “Evaluate.” Yet many Essential Services have not yet benefited from advances such as maturing electronic health record systems, interoperability among health information systems, analytics for population health management, use of social media among consumers, and educational certification in clinical informatics. Much work remains to advance the science of PHI and its impact on public health practice.
Enhancing Drug Overdose Mortality Surveillance through Natural Language Processing and Machine Learning
Epidemiological surveillance is key to monitoring and assessing the health of populations. Drug overdose surveillance has become an increasingly important part of public health practice as overdose morbidity and mortality have increased, due in large part to the opioid crisis. Monitoring drug overdose mortality relies on death certificate data, which have several limitations, including timeliness and the coding structure used to identify the specific substances that caused death. These limitations stem from the need to analyze the free-text cause-of-death sections of the death certificate, which are completed by the medical certifier during the death investigation. Other fields, including the clinical sciences, have used natural language processing (NLP) methods to gain insight from free-text data, but adoption of NLP methods in epidemiological surveillance has so far been limited. Through a narrative review of NLP methods currently used in public health surveillance and the integration of two NLP tasks, classification and named entity recognition, this dissertation enhances the ability of public health practitioners and researchers to perform drug overdose mortality surveillance. It advances both surveillance science and public health practice by integrating methods from bioinformatics into the surveillance pipeline, providing more timely, higher-quality overdose mortality surveillance, which is essential to guiding an effective public health response to the continuing drug overdose epidemic.
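One of the two NLP tasks named in the abstract is named entity recognition of the substances that caused death. The Python sketch below shows the simplest form of that task, a dictionary (lexicon) lookup over cause-of-death text, rather than the trained models the dissertation may use; the lexicon entries and the function name `extract_substances` are hypothetical.

```python
# Minimal sketch of dictionary-based named entity recognition for drug
# mentions in free-text cause-of-death statements. The lexicon is a small
# hypothetical example, not the dissertation's NER model.
import re
from typing import Dict, List, Tuple

# Hypothetical lexicon: surface form -> normalized substance name.
LEXICON: Dict[str, str] = {
    "fentanyl": "fentanyl",
    "heroin": "heroin",
    "methamphetamine": "methamphetamine",
    "meth": "methamphetamine",
    "oxycodone": "oxycodone",
    "cocaine": "cocaine",
}

# Longest surface forms first, so overlapping alternatives prefer the longer
# term; \b keeps matches to whole words.
_PATTERN = re.compile(
    r"\b(" + "|".join(sorted(map(re.escape, LEXICON), key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def extract_substances(text: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, normalized_name) spans for each lexicon match."""
    return [
        (m.start(), m.end(), LEXICON[m.group(1).lower()])
        for m in _PATTERN.finditer(text)
    ]
```

Normalizing surface forms (for example, "meth" to "methamphetamine") is what makes the extracted spans countable as substance-specific deaths; a lexicon is easy to audit but misses misspellings and novel analogues that a learned model might catch.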
Addendum to Informatics for Health 2017: Advancing both science and practice
This article presents the presentation and poster abstracts that were mistakenly omitted from the original publication.
Extracting information from the text of electronic medical records to improve case detection: a systematic review
Background: Electronic medical records (EMRs) are revolutionizing health-related research. One key issue for study quality is the accurate identification of patients with the condition of interest. Information in EMRs can be entered as structured codes or unstructured free text. The majority of research studies have used only coded parts of EMRs for case-detection, which may bias findings, miss cases, and reduce study quality. This review examines whether incorporating information from text into case-detection algorithms can improve research quality.
Methods: A systematic search returned 9659 papers, 67 of which reported on the extraction of information from free text of EMRs with the stated purpose of detecting cases of a named clinical condition. Methods for extracting information from text and the technical accuracy of case-detection algorithms were reviewed.
Results: Studies mainly used US hospital-based EMRs, and extracted information from text for 41 conditions using keyword searches, rule-based algorithms, and machine learning methods. There was no clear difference in case-detection algorithm accuracy between rule-based and machine learning methods of extraction. Inclusion of information from text resulted in a significant improvement in algorithm sensitivity and area under the receiver operating characteristic in comparison to codes alone (median sensitivity 78% (codes + text) vs 62% (codes), P = .03; median area under the receiver operating characteristic 95% (codes + text) vs 88% (codes), P = .025).
Conclusions: Text in EMRs is accessible, especially with open-source information extraction algorithms, and significantly improves case detection when combined with codes. More harmonization of reporting within EMR studies is needed, particularly standardized reporting of algorithm accuracy metrics such as positive predictive value (precision) and sensitivity (recall).
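The review compares case-detection algorithms that use codes alone with algorithms that add information from free text, and reports sensitivity for each. The Python sketch below shows that comparison in its simplest form: a record counts as a case if it carries a given diagnosis code, or, in the combined rule, if its note also contains a keyword. The `Record` type, the function names, and all records are hypothetical; "E11" is the ICD-10 category for type 2 diabetes mellitus, used purely as an example.

```python
# Minimal sketch: case detection from codes alone versus codes plus a
# free-text keyword search, scored by sensitivity (recall) against a
# reference standard. All records, names, and terms are hypothetical.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Record:
    codes: List[str]  # structured diagnosis codes
    note: str         # unstructured clinical free text
    is_case: bool     # reference-standard label


def codes_only(rec: Record, code: str) -> bool:
    return code in rec.codes


def codes_plus_text(rec: Record, code: str, keyword: str) -> bool:
    # Naive substring match: also fires on negated mentions ("no diabetes").
    return code in rec.codes or keyword.lower() in rec.note.lower()


def sensitivity(records: List[Record], rule: Callable[[Record], bool]) -> float:
    """Fraction of reference-standard cases that the rule detects."""
    cases = [r for r in records if r.is_case]
    if not cases:
        raise ValueError("no reference-standard cases")
    return sum(rule(r) for r in cases) / len(cases)
```

Usage: `sensitivity(records, lambda r: codes_only(r, "E11"))` versus `sensitivity(records, lambda r: codes_plus_text(r, "E11", "diabetes"))`. Because the combined rule is a logical OR, it can only raise sensitivity, at a possible cost to positive predictive value, which is why reporting both metrics matters.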
International Society for Disease Surveillance Conference 2011: Building the Future of Public Health Surveillance
Daniel Reidpath - ORCID: 0000-0002-8796-0420 https://orcid.org/0000-0002-8796-0420
Public health surveillance : preparing for the future
Suggested citation: Office of Public Health Scientific Services. Centers for Disease Control and Prevention. Public Health Surveillance: Preparing for the Future. Atlanta, GA: Centers for Disease Control and Prevention; September 2018.
- …