43 research outputs found

Advancing The Scope of Nontraditional Data Streams: Using Internet Data To Understand Human Behavior, Account For Bias, and Improve Forecasting

Author: Daughton Ashlynn Rae
Publication venue: University of Colorado Boulder
Publication date: 14/11/2019

Internet data are pervasive. An almost incomprehensible amount of data is generated daily (an estimated 2.5 exabytes per day in May 2018 [1]) as a result of search engines, email, social media, and other web platforms [1]. Many of these data are publicly available and can provide direct insight into individuals' thoughts and behaviors. In particular, many fields, including disease surveillance and political science, have found these data to be useful for predictive modeling. Traditionally, predictive models have relied on data from official or academic sources, including surveys, outbreak case studies, and official disease reports. While these data are often considered accurate, they are time-consuming and inefficient to collect. In situations that require real-time decision making, stale official data can produce stale models. Internet data streams can complement official reports to produce better, more comprehensive models. This thesis contributes to multiple domains' understanding of when and how to use nontraditional data sources, the kinds of data available in online spaces, and appropriate methods to incorporate these data into downstream applications. Influenza, Zika, and the political peace process in Colombia from 2011 to 2017 are used as case studies. Using case studies from multiple domain spaces provides a new breadth of understanding of Internet data. In particular, this thesis will: (1) use Internet data streams to identify human behaviors; (2) develop methods to understand bias in Internet data streams and classification algorithms; and (3) evaluate the usefulness of Internet data in various forecasting models. To answer these questions, I use large corpora of Internet data, including social media data. Methods are drawn from the natural language processing, social computing, data mining, computational epidemiology, and machine learning literature. Accurate models help decision makers make timely and effective decisions about interventions. Currently, models are a common first-line consideration for decision makers at all levels of government, globally. A better understanding of available traces in nontraditional data streams, and a more nuanced understanding of modeling decisions, can immediately impact decision makers throughout our country and the globe. The Los Alamos Unlimited Release number for this document is LA-UR-19-31417.

CU Scholar Institutional Repository

Estimating influenza incidence using search query deceptiveness and generalized ridge regression

Author: Barnard Martha
Daughton Ashlynn R.
O'Connell Fiona
Osthus Dave
Priedhorsky Reid
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 11/01/2019

Seasonal influenza is a sometimes surprisingly impactful disease, causing thousands of deaths per year along with much additional morbidity. Timely knowledge of the outbreak state is valuable for managing an effective response. The current state of the art is to gather this knowledge using in-person patient contact. While accurate, this is time-consuming and expensive. This has motivated inquiry into new approaches using internet activity traces, based on the theory that lay observations of health status lead to informative features in internet data. These approaches risk being deceived by activity traces having a coincidental, rather than informative, relationship to disease incidence; to our knowledge, this risk has not yet been quantitatively explored. We evaluated both simulated and real activity traces of varying deceptiveness for influenza incidence estimation using linear regression. We found that deceptiveness knowledge does reduce error in such estimates, that it may help automatically selected features perform as well as or better than features that require human curation, and that a semantic distance measure derived from the Wikipedia article category tree serves as a useful proxy for deceptiveness. This suggests that disease incidence estimation models should incorporate not only data about how internet features map to incidence but also additional data to estimate feature deceptiveness. By doing so, we may gain one more step along the path to accurate, reliable disease incidence estimation using internet data. This capability would improve public health by decreasing the cost and increasing the timeliness of such estimates. Comment: 27 pages, 8 figures

arXiv.org e-Print Archive

Directory of Open Access Journals

Epidemiological data challenges: planning for a more robust future through data standards

Author: Daughton Ashlynn R.
Deshpande Alina
Fairchild Geoffrey
Generous Nicholas
Khalsa Hari
Priedhorsky Reid
Tasseff Byron
Velappan Nileena
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2018

Accessible epidemiological data are of great value for emergency preparedness and response, understanding disease progression through a population, and building statistical and mechanistic disease models that enable forecasting. The status quo, however, renders acquiring and using such data difficult in practice. In many cases, a primary way of obtaining epidemiological data is through the internet, but the methods by which the data are presented to the public often differ drastically among institutions. As a result, there is a strong need for better data sharing practices. This paper identifies, in detail and with examples, the three key challenges one encounters when attempting to acquire and use epidemiological data: 1) interfaces, 2) data formatting, and 3) reporting. These challenges are used to provide suggestions and guidance for improvement as these systems evolve in the future. If these suggested data and interface recommendations were adhered to, epidemiological and public health analysis, modeling, and informatics work would be significantly streamlined, which can in turn yield better public health decision-making capabilities. Comment: v2 includes several typo fixes; v3 adds a paragraph on backfill; v4 adds 2 new paragraphs to the conclusion that address Frontiers reviewer comments; v5 adds some minor modifications that address additional reviewer comments

arXiv.org e-Print Archive

Directory of Open Access Journals

Frontiers - Publisher Connector

Salivary microbiomes of indigenous Tsimane mothers and infants are distinct despite frequent premastication

Author: Armand E.K. Dichosa
Ashlynn R. Daughton
Cliff S. Han
Hillard Kaplan
Joe Alcock
Melanie Ann Martin
Michael D. Gurven
Seth Frietze
Publication venue: 'PeerJ'
Publication date: 01/01/2016

Background: Premastication, the transfer of pre-chewed food, is a common infant and young child feeding practice among the Tsimane, forager-horticulturalists living in the Bolivian Amazon. Research conducted primarily with Western populations has shown that infants harbor distinct oral microbiota from their mothers. Premastication, which is less common in these populations, may influence the colonization and maturation of infant oral microbiota, including via transmission of oral pathogens. We collected premasticated food and saliva samples from Tsimane mothers and infants (9–24 months of age) to test for evidence of bacterial transmission in premasticated foods and overlap in maternal and infant salivary microbiota. We extracted bacterial DNA from two premasticated food samples and 12 matched salivary samples from maternal-infant pairs. DNA sequencing was performed with MiSeq (Illumina). We evaluated maternal and infant microbial composition in terms of relative abundance of specific taxa, alpha and beta diversity, and dissimilarity distances.
Results: The bacteria in saliva and premasticated food were mapped to 19 phyla and 400 genera and were dominated by Firmicutes, Proteobacteria, Actinobacteria, and Bacteroidetes. The oral microbial communities of Tsimane mothers and infants who frequently share premasticated food were well-separated in a non-metric multi-dimensional scaling ordination (NMDS) plot. Infant microbiotas clustered together, with weighted Unifrac distances significantly differing between mothers and infants. Infant saliva contained more Firmicutes (p < 0.01) and fewer Proteobacteria (p < 0.05) than did maternal saliva. Many genera previously associated with dental and periodontal infections, e.g., Neisseria, Gemella, Rothia, Actinomyces, Fusobacterium, and Leptotrichia, were more abundant in mothers than in infants.
Conclusions: Salivary microbiota of Tsimane infants and young children up to two years of age do not appear closely related to those of their mothers, despite frequent premastication and preliminary evidence that maternal bacteria are transmitted to premasticated foods. Infant physiology and diet may constrain colonization by maternal bacteria, including several oral pathogens.

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Chapman University Digital Commons

The Biosurveillance Analytics Resource Directory (BARD): Facilitating the Use of Epidemiological Models for Infectious Disease Surveillance

Author: Abeyta Esteban
Althouse Ben
Burkom Howard
Castro Lauren
Daughton Ashlynn
Del Valle Sara Y
Deshpande Alina
Fairchild Geoffrey
Generous Nicholas
Hyman James M
Kiang Richard
Margevicius Kristen J
Morse Andrew P
Pancerella Carmen M
Pullum Laura
Ramanathan Arvind
Schlegelmilch Jeffrey
Scott Aaron
Taylor-McCabe Kirsten J
Vespignani Alessandro
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2016

Epidemiological modeling for infectious disease is important for disease management, and its routine implementation needs to be facilitated through better description of models in an operational context. A standardized model characterization process that allows selection or manual comparison of available models and their results is currently lacking. A key need is a universal framework to facilitate model description and understanding of its features. Los Alamos National Laboratory (LANL) has developed a comprehensive framework that can be used to characterize an infectious disease model in an operational context. The framework was developed through a consensus among a panel of subject matter experts. In this paper, we describe the framework, its application to model characterization, and the development of the Biosurveillance Analytics Resource Directory (BARD; http://brd.bsvgateway.org/brd/), to facilitate the rapid selection of operational models for specific infectious/communicable diseases. We offer this framework and associated database to stakeholders of the infectious disease modeling field as a tool for standardizing model description and facilitating the use of epidemiological models.

University of Liverpool Repository

Crossref

Columbia University Academic Commons

Directory of Open Access Journals

PubMed Central

Using phage display selected antibodies to dissect microbiomes for complete de novo genome sequencing of low abundance microbes

Author: Andrew RM Bradbury
Armand EK Dichosa
Ashlynn R Daughton
Cliff S Han
Csaba Kiss
Devin W Close
Fortunato Ferrara
Hajnalka E Daligault
Krista G Reitenga
Nileena Velappan
Sandeep Kumar
Srinivas Iyer
Timothy C Sanchez
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Crossref

Springer - Publisher Connector

Evaluation of Point of Need Diagnostic Tests for Use in California Influenza Outbreaks

Author: Daughton Ashlynn
Deshpande Alina
Publication venue: University of Illinois at Chicago Library
Publication date: 24/03/2016

Because of the potential threats flu viruses pose, the United States, like many developed countries, has a very well-established flu surveillance system consisting of 10 components collecting laboratory data, mortality data, hospitalization data, and sentinel outpatient care data. Currently, this surveillance system is estimated to lag behind the actual seasonal outbreak by one to two weeks. As new data streams come online, it is important to understand what added benefit they bring to the flu surveillance system complex. For data streams to be effective, they should provide data in a more timely fashion or provide additional data that current surveillance systems cannot provide. Two multiplexed diagnostic tools designed to test syndromically relevant pathogens and wirelessly upload data for rapid integration and interpretation were evaluated to see how they fit into the influenza surveillance scheme in California.

University of Illinois at Chicago: Journals@UIC

Evaluation of Point of Need Diagnostic Tests for Use in California Influenza Outbreaks

Author: Daughton Ashlynn
Deshpande Alina
Publication venue: 'University of Illinois Libraries'
Publication date: 24/03/2016

University of Illinois at Chicago: Journals@UIC

Crossref

Even a good influenza forecasting model can benefit from internet-based nowcasts, but those benefits are limited.

Author: Ashlynn R Daughton
Dave Osthus
Reid Priedhorsky
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/02/2019

The ability to produce timely and accurate flu forecasts in the United States can significantly impact public health. Augmenting forecasts with internet data has shown promise for improving forecast accuracy and timeliness in controlled settings, but results in practice are less convincing, as models augmented with internet data have not consistently outperformed models without internet data. In this paper, we perform a controlled experiment, taking into account data backfill, to improve clarity on the benefits and limitations of augmenting an already good flu forecasting model with internet-based nowcasts. Our results show that a good flu forecasting model can benefit from the augmentation of internet-based nowcasts in practice for all considered public health-relevant forecasting targets. The degree of forecast improvement due to nowcasting, however, is uneven across forecasting targets, with short-term forecasting targets seeing the largest improvements and seasonal targets such as the peak timing and intensity seeing relatively marginal improvements. The uneven forecasting improvements across targets hold even when "perfect" nowcasts are used. These findings suggest that further improvements to flu forecasting, particularly seasonal targets, will need to derive from other, non-nowcasting approaches.

Directory of Open Access Journals

The Francis Crick Institute