117 research outputs found
Scraping social media photos posted in Kenya and elsewhere to detect and analyze food types
Monitoring population-level changes in diet could be useful for education and for implementing interventions to improve health. Research has shown that data from social media sources can be used for monitoring dietary behavior. We propose a scrape-by-location methodology to create food image datasets from Instagram posts. We used it to collect 3.56 million images over a period of 20 days in March 2019. We also propose a scrape-by-keywords methodology and used it to scrape ∼30,000 images and their captions of 38 Kenyan food types. We publish two datasets of 104,000 and 8,174 image/caption pairs, respectively. With the first dataset, Kenya104K, we train a Kenyan Food Classifier, called KenyanFC, to distinguish Kenyan food from non-food images posted in Kenya. We used the second dataset, KenyanFood13, to train a classifier KenyanFTR, short for Kenyan Food Type Recognizer, to recognize 13 popular food types in Kenya. The KenyanFTR is a multimodal deep neural network that can identify 13 types of Kenyan foods using both images and their corresponding captions. Experiments show that the average top-1 accuracy of KenyanFC is 99% over 10,400 tested Instagram images and of KenyanFTR is 81% over 8,174 tested data points. Ablation studies show that three of the 13 food types are particularly difficult to categorize based on image content only and that adding analysis of captions to the image analysis yields a classifier that is 9 percentage points more accurate than a classifier that relies only on images. Our food trend analysis revealed that cakes and roasted meats were the most popular foods in photographs on Instagram in Kenya in March 2019.
Accepted manuscript
Combining Search, Social Media, and Traditional Data Sources to Improve Influenza Surveillance
We present a machine learning-based methodology capable of providing
real-time ("nowcast") and forecast estimates of influenza activity in the US by
leveraging data from multiple data sources including: Google searches, Twitter
microblogs, nearly real-time hospital visit records, and data from a
participatory surveillance system. Our main contribution consists of combining
multiple influenza-like illnesses (ILI) activity estimates, generated
independently with each data source, into a single prediction of ILI utilizing
machine learning ensemble approaches. Our methodology exploits the information
in each data source and produces accurate weekly ILI predictions for up to four
weeks ahead of the release of CDC's ILI reports. We evaluate the predictive
ability of our ensemble approach during the 2013-2014 (retrospective) and
2014-2015 (live) flu seasons for each of the four weekly time horizons. Our
ensemble approach demonstrates several advantages: (1) our ensemble method's
predictions outperform every prediction using each data source independently,
(2) our methodology can produce predictions one week ahead of Google Flu Trends' (GFT) real-time
estimates with comparable accuracy, and (3) our two and three week forecast
estimates have comparable accuracy to real-time predictions using an
autoregressive model. Moreover, our results show that considerable insight is
gained from incorporating disparate data streams, in the form of social media
and crowd-sourced data, into influenza predictions in all time horizons.
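The idea of merging independently generated ILI estimates into one prediction can be illustrated with a minimal sketch. The abstract does not specify which machine learning ensemble approaches were used, so the code below swaps in a much simpler stand-in: a weighted average whose weights are inversely proportional to each source's past mean squared error. The function names, source names, and all numbers are hypothetical and chosen only for illustration.

```python
def inverse_mse_weights(history, truth):
    """Weight each source by the inverse of its past mean squared error.

    history: dict mapping a source name to its list of past ILI estimates.
    truth:   list of the reported ILI values for the same weeks.
    """
    inverse = {}
    for name, estimates in history.items():
        mse = sum((e - t) ** 2 for e, t in zip(estimates, truth)) / len(truth)
        inverse[name] = 1.0 / max(mse, 1e-12)  # guard against a zero error
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def combine(current, weights):
    """Weighted average of this week's per-source estimates."""
    return sum(weights[name] * current[name] for name in weights)


# Toy values (hypothetical): two sources and two past weeks of reported ILI.
past_truth = [2.0, 3.0]
past_estimates = {
    "search": [2.1, 3.1],   # smaller past errors, so a larger weight
    "twitter": [2.2, 3.2],  # larger past errors, so a smaller weight
}
weights = inverse_mse_weights(past_estimates, past_truth)
ensemble = combine({"search": 4.0, "twitter": 5.0}, weights)
print(weights, ensemble)
```

A weighting of this kind rewards sources that tracked the reported values well in the past; a learned ensemble, as described in the abstract, could additionally capture interactions between sources.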
A systematic review of studies on forecasting the dynamics of influenza outbreaks
Forecasting the dynamics of influenza outbreaks could be useful for decision-making regarding the allocation of public health resources. Reliable forecasts could also aid in the selection and implementation of interventions to reduce morbidity and mortality due to influenza illness. This paper reviews methods for influenza forecasting proposed during previous influenza outbreaks and those evaluated in hindsight. We discuss the various approaches, in addition to the variability in measures of accuracy and precision of predicted measures. PubMed and Google Scholar searches for articles on influenza forecasting retrieved sixteen studies that matched the study criteria. We focused on studies that aimed at forecasting influenza outbreaks at the local, regional, national, or global level. The selected studies spanned a wide range of regions including the USA, Sweden, Hong Kong, Japan, Singapore, the United Kingdom, Canada, France, and Cuba. The methods were also applied to forecast a single measure or multiple measures. Typical measures predicted included peak timing, peak height, daily/weekly case counts, and outbreak magnitude. Due to differences in measures used to assess accuracy, a single estimate of predictive error for each of the measures was difficult to obtain. However, collectively, the results suggest that these diverse approaches to influenza forecasting are capable of capturing specific outbreak measures with some degree of accuracy given reliable data and correct disease assumptions. Nonetheless, several of these approaches need to be evaluated and their performance quantified in real-time predictions.
Monitoring Influenza Epidemics in China with Search Query from Baidu
Several approaches have been proposed for near real-time detection and prediction of the spread of influenza. These include search query data for influenza-related terms, which has been explored as a tool for augmenting traditional surveillance methods. In this paper, we present a method that uses Internet search query data from Baidu to model and monitor influenza activity in China. The objectives of the study are to present a comprehensive technique for: (i) keyword selection, (ii) keyword filtering, (iii) index composition and (iv) modeling and detection of influenza activity in China. Sequential time-series for the selected composite keyword index is significantly correlated with Chinese influenza case data. In addition, one-month ahead prediction of influenza cases for the first eight months of 2012 has a mean absolute percent error of less than 11%. To our knowledge, this is the first study on the use of search query data from Baidu in conjunction with this approach for estimation of influenza activity in China.
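The accuracy figure quoted above is a mean absolute percent error (MAPE). As a minimal sketch of how that metric is usually computed, the function below averages the absolute relative errors and expresses the result as a percentage. The function name and the case counts are hypothetical; they are not taken from the study.

```python
def mean_absolute_percent_error(observed, predicted):
    """Return the MAPE, in percent, between observed and predicted counts."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(o == 0 for o in observed):
        raise ValueError("MAPE is undefined when an observed value is zero")
    relative_errors = [abs((o - p) / o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(relative_errors) / len(relative_errors)


# Toy monthly case counts (hypothetical): relative errors are 10%, 10%, and 0%.
observed_cases = [100, 200, 400]
predicted_cases = [110, 180, 400]
print(round(mean_absolute_percent_error(observed_cases, predicted_cases), 2))
```

Because each error is divided by the observed value, MAPE weights a miss in a low-incidence month as heavily as a proportionally equal miss in a high-incidence month.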
Modeling to Predict Cases of Hantavirus Pulmonary Syndrome in Chile
Background: Hantavirus pulmonary syndrome (HPS) is a life-threatening disease transmitted by the rodent Oligoryzomys longicaudatus in Chile. Hantavirus outbreaks are typically small and geographically confined. Several studies have estimated risk based on spatial and temporal distribution of cases in relation to climate and environmental variables, but few have considered climatological modeling of HPS incidence for monitoring and forecasting purposes. Methodology: Monthly counts of confirmed HPS cases were obtained from the Chilean Ministry of Health for 2001–2012. There were an estimated 667 confirmed HPS cases. The data suggested a seasonal trend, which appeared to correlate with changes in climatological variables such as temperature, precipitation, and humidity. We considered several Autoregressive Integrated Moving Average (ARIMA) time-series models and regression models with ARIMA errors with one or a combination of these climate variables as covariates. We adopted an information-theoretic approach to model ranking and selection. Data from 2001–2009 were used in fitting and data from January 2010 to December 2012 were used for one-step-ahead predictions. Results: We focused on six models. In a baseline model, future HPS cases were forecasted from previous incidence; the other models included climate variables as covariates. The baseline model had a Corrected Akaike Information Criterion (AICc) of 444.98, and the top ranked model, which included precipitation, had an AICc of 437.62. Although the AICc of the top ranked model only provided a 1.65% improvement over the baseline AICc, the empirical support was 39 times stronger relative to the baseline model. Conclusions: Instead of choosing a single model, we present a set of candidate models that can be used in modeling and forecasting confirmed HPS cases in Chile. The models can be improved by using data at the regional level and easily extended to other countries with seasonal incidence of HPS.
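The two AICc comparisons in the abstract can be reproduced from the reported values alone. The sketch below assumes the standard information-theoretic evidence ratio, exp(Δ/2), where Δ is the AICc difference between the two models; the abstract does not state the exact formula the authors used, so this is an assumption, and the function names are hypothetical.

```python
import math


def percent_improvement(aicc_best, aicc_other):
    """Relative reduction in AICc of the best model, in percent."""
    return 100.0 * (aicc_other - aicc_best) / aicc_other


def evidence_ratio(aicc_best, aicc_other):
    """Evidence ratio exp(delta / 2), with delta the AICc difference."""
    delta = aicc_other - aicc_best
    return math.exp(delta / 2.0)


baseline_aicc = 444.98  # baseline model, as reported in the abstract
top_aicc = 437.62       # top ranked model with precipitation, as reported

print(round(percent_improvement(top_aicc, baseline_aicc), 2))  # 1.65
print(round(evidence_ratio(top_aicc, baseline_aicc), 1))       # 39.6
```

Under this assumption, an AICc difference of 7.36 gives an evidence ratio of about 39.6, in line with the roughly 39-fold support reported. It also shows why a small percentage change in AICc can still correspond to strong relative support: the ratio depends exponentially on the absolute difference, not on the relative one.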
- …