25 research outputs found

    Estimating influenza incidence using search query deceptiveness and generalized ridge regression

    Seasonal influenza is a sometimes surprisingly impactful disease, causing thousands of deaths per year along with much additional morbidity. Timely knowledge of the outbreak state is valuable for managing an effective response. The current state of the art is to gather this knowledge using in-person patient contact. While accurate, this is time-consuming and expensive. This has motivated inquiry into new approaches using internet activity traces, based on the theory that lay observations of health status lead to informative features in internet data. These approaches risk being deceived by activity traces having a coincidental, rather than informative, relationship to disease incidence; to our knowledge, this risk has not yet been quantitatively explored. We evaluated both simulated and real activity traces of varying deceptiveness for influenza incidence estimation using linear regression. We found that deceptiveness knowledge does reduce error in such estimates, that it may help automatically selected features perform as well as or better than features that require human curation, and that a semantic distance measure derived from the Wikipedia article category tree serves as a useful proxy for deceptiveness. This suggests that disease incidence estimation models should incorporate not only data about how internet features map to incidence but also additional data to estimate feature deceptiveness. By doing so, we may gain one more step along the path to accurate, reliable disease incidence estimation using internet data. This capability would improve public health by decreasing the cost and increasing the timeliness of such estimates.
    Comment: 27 pages, 8 figures

    Epidemiological data challenges: planning for a more robust future through data standards

    Accessible epidemiological data are of great value for emergency preparedness and response, understanding disease progression through a population, and building statistical and mechanistic disease models that enable forecasting. The status quo, however, renders acquiring and using such data difficult in practice. In many cases, a primary way of obtaining epidemiological data is through the internet, but the methods by which the data are presented to the public often differ drastically among institutions. As a result, there is a strong need for better data sharing practices. This paper identifies, in detail and with examples, the three key challenges one encounters when attempting to acquire and use epidemiological data: 1) interfaces, 2) data formatting, and 3) reporting. These challenges are used to provide suggestions and guidance for improvement as these systems evolve in the future. If these suggested data and interface recommendations were adhered to, epidemiological and public health analysis, modeling, and informatics work would be significantly streamlined, which could in turn yield better public health decision-making capabilities.
    Comment: v2 includes several typo fixes; v3 adds a paragraph on backfill; v4 adds 2 new paragraphs to the conclusion that address Frontiers reviewer comments; v5 adds some minor modifications that address additional reviewer comments

    Salivary microbiomes of indigenous Tsimane mothers and infants are distinct despite frequent premastication

    Background: Premastication, the transfer of pre-chewed food, is a common infant and young child feeding practice among the Tsimane, forager-horticulturalists living in the Bolivian Amazon. Research conducted primarily with Western populations has shown that infants harbor distinct oral microbiota from their mothers. Premastication, which is less common in these populations, may influence the colonization and maturation of infant oral microbiota, including via transmission of oral pathogens. We collected premasticated food and saliva samples from Tsimane mothers and infants (9–24 months of age) to test for evidence of bacterial transmission in premasticated foods and overlap in maternal and infant salivary microbiota. We extracted bacterial DNA from two premasticated food samples and 12 matched salivary samples from maternal-infant pairs. DNA sequencing was performed with MiSeq (Illumina). We evaluated maternal and infant microbial composition in terms of relative abundance of specific taxa, alpha and beta diversity, and dissimilarity distances.
    Results: The bacteria in saliva and premasticated food were mapped to 19 phyla and 400 genera and were dominated by Firmicutes, Proteobacteria, Actinobacteria, and Bacteroidetes. The oral microbial communities of Tsimane mothers and infants who frequently share premasticated food were well separated in a non-metric multidimensional scaling (NMDS) ordination plot. Infant microbiotas clustered together, with weighted UniFrac distances significantly differing between mothers and infants. Infant saliva contained more Firmicutes (p < 0.01) and fewer Proteobacteria (p < 0.05) than did maternal saliva. Many genera previously associated with dental and periodontal infections, e.g. Neisseria, Gemella, Rothia, Actinomyces, Fusobacterium, and Leptotrichia, were more abundant in mothers than in infants.
    Conclusions: Salivary microbiota of Tsimane infants and young children up to two years of age do not appear closely related to those of their mothers, despite frequent premastication and preliminary evidence that maternal bacteria are transmitted to premasticated foods. Infant physiology and diet may constrain colonization by maternal bacteria, including several oral pathogens.

    Even a good influenza forecasting model can benefit from internet-based nowcasts, but those benefits are limited.

    The ability to produce timely and accurate flu forecasts in the United States can significantly impact public health. Augmenting forecasts with internet data has shown promise for improving forecast accuracy and timeliness in controlled settings, but results in practice are less convincing, as models augmented with internet data have not consistently outperformed models without internet data. In this paper, we perform a controlled experiment, taking into account data backfill, to clarify the benefits and limitations of augmenting an already good flu forecasting model with internet-based nowcasts. Our results show that a good flu forecasting model can benefit from augmentation with internet-based nowcasts in practice for all considered public health-relevant forecasting targets. The degree of forecast improvement due to nowcasting, however, is uneven across forecasting targets, with short-term forecasting targets seeing the largest improvements and seasonal targets such as the peak timing and intensity seeing relatively marginal improvements. The uneven forecasting improvements across targets hold even when "perfect" nowcasts are used. These findings suggest that further improvements to flu forecasting, particularly for seasonal targets, will need to derive from other, non-nowcasting approaches.

    Novel Use of Flu Surveillance Data: Evaluating Potential of Sentinel Populations for Early Detection of Influenza Outbreaks

    Influenza causes significant morbidity and mortality each year, with 2–8% of weekly outpatient visits around the United States for influenza-like illness (ILI) during the peak of the season. Effective use of existing flu surveillance data allows officials to understand and predict current flu outbreaks and can contribute to reductions in influenza morbidity and mortality. Previous work used the 2009–2010 influenza season to investigate the possibility of using existing military and civilian surveillance systems to improve early detection of flu outbreaks. Results suggested that civilian surveillance could help predict outbreak trajectory in local military installations. To further test that hypothesis, we compare pairs of civilian and military outbreaks in seven locations between 2000 and 2013. We find no predictive relationship between outbreak peaks or time series of paired outbreaks. This larger study does not find evidence to support the hypothesis that civilian data can be used as sentinel surveillance for military installations. We additionally investigate the effect of modifying the ILI case definition between the standard Department of Defense definition, a more specific definition proposed in the literature, and confirmed Influenza A. We find that case definition heavily impacts results. This study thus highlights the importance of careful selection of case definition, and appropriate consideration of case definition in the interpretation of results.

    Outbreak years and number of outbreaks included per location. Outbreak years (number of outbreaks).


    Data processing and related tools/ software.

    This figure illustrates how each dataset was transformed from the raw data to the processed data used in analyses. Starting points for raw data are shown in blue. Each box indicates a dataset at some stage of processing (e.g. a ‘noun’). Each arrow corresponds to the tools and actions used to transform one dataset into another (e.g. the ‘verb’ acting on the dataset ‘noun’) and is numbered to correspond with the text narrative. Dataset names correspond to those used above. Ultimately, all data was converted into time series of %ILI or confirmed cases. Then outbreaks from the same season were paired based on geographic proximity and analyzed.