4 research outputs found

Advances in nowcasting influenza-like illness rates using search query logs

Authors: Crossan S, Lampos V, Miller AC, Stefansen C
Publication venue: Nature Publishing Group
Publication date: 03/08/2015

User-generated content can assist epidemiological surveillance in the early detection and prevalence estimation of infectious diseases, such as influenza. Google Flu Trends embodies the first public platform for transforming search queries to indications about the current state of flu in various places all over the world. However, the original model significantly mispredicted influenza-like illness rates in the US during the 2012–13 flu season. In this work, we build on the previous modeling attempt, proposing substantial improvements. Firstly, we investigate the performance of a widely used linear regularized regression solver, known as the Elastic Net. Then, we expand on this model by incorporating the queries selected by the Elastic Net into a nonlinear regression framework, based on a composite Gaussian Process. Finally, we augment the query-only predictions with an autoregressive model, injecting prior knowledge about the disease. We assess predictive performance using five consecutive flu seasons spanning from 2008 to 2013 and qualitatively explain certain shortcomings of the previous approach. Our results indicate that a nonlinear query modeling approach delivers the lowest cumulative nowcasting error, and also suggest that query information significantly improves autoregressive inferences, obtaining state-of-the-art performance.
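The first step named in this abstract, using the Elastic Net to pick out informative queries, can be illustrated with a minimal sketch. This is not the authors' implementation: the coordinate-descent solver, the function names (`soft_threshold`, `elastic_net`), the penalty settings and the two-column toy data are all assumptions made for illustration. Features are assumed to be centered, so no intercept is fitted.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 part of the penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=100):
    """Coordinate descent for
    (1/2n)||y - Xw||^2 + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1-l1_ratio)*||w||^2.
    Assumes centered features and targets (no intercept)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho, z = 0.0, 0.0
            for i in range(n):
                # Prediction from every feature except j
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            rho /= n
            z /= n
            w[j] = soft_threshold(rho, alpha * l1_ratio) / (z + alpha * (1 - l1_ratio))
    return w


# Hypothetical toy data: column 0 tracks the target, column 1 is unrelated to it.
X = [[-2.0, 1.0], [-1.0, -2.0], [0.0, 2.0], [1.0, -2.0], [2.0, 1.0]]
y = [-4.0, -2.0, 0.0, 2.0, 4.0]

weights = elastic_net(X, y)
# Features with a non-zero weight are the ones "selected" by the penalty.
selected = [j for j, wj in enumerate(weights) if wj != 0.0]
print(weights, selected)
```

In this toy setting the unrelated column receives a weight of exactly zero and is dropped, while the informative column keeps a slightly shrunken weight. In the pipeline the abstract describes, the surviving queries would then feed a nonlinear (Gaussian Process) regressor, which is not sketched here.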

UCL Discovery

PubMed Central

Enhancing Feature Selection Using Word Embeddings: The Case of Flu Surveillance

Authors: Cox IJ, Lampos V, Zou B
Publication venue: American College of Medical Physics (ACMP)
Publication date: 01/04/2017

Health surveillance systems based on online user-generated content often rely on the identification of textual markers that are related to a target disease. Given the high volume of available data, these systems benefit from an automatic feature selection process. This is accomplished either by applying statistical learning techniques, which do not consider the semantic relationship between the selected features and the inference task, or by developing labour-intensive text classifiers. In this paper, we use neural word embeddings, trained on social media content from Twitter, to determine, in an unsupervised manner, how strongly textual features are semantically linked to an underlying health concept. We then refine conventional feature selection methods by a priori operating on textual variables that are sufficiently close to a target concept. Our experiments focus on the supervised learning problem of estimating influenza-like illness rates from Google search queries. A "flu infection" concept is formulated and used to reduce spurious and potentially confounding features that were selected by previously applied approaches. In this way, we also address forms of scepticism regarding the appropriateness of the feature space, alleviating potential cases of overfitting. Ultimately, the proposed hybrid feature selection method creates a more reliable model that, according to our empirical analysis, improves the inference performance (Mean Absolute Error) of linear and nonlinear regressors by 12% and 28.7%, respectively.
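The idea of keeping only textual features that lie close to a target concept in embedding space can be sketched as follows. This is an illustration under stated assumptions, not the paper's method: plain cosine similarity and a fixed cut-off stand in for whatever similarity score the authors define, and the two-dimensional vectors, the query strings, the threshold and the function names (`cosine`, `filter_by_concept`) are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def filter_by_concept(embeddings, concept_vector, threshold):
    """Keep the terms whose embedding is at least `threshold`-similar to the concept."""
    return [
        term
        for term, vector in embeddings.items()
        if cosine(vector, concept_vector) >= threshold
    ]


# Hypothetical 2-d embeddings; real word embeddings have hundreds of dimensions.
query_embeddings = {
    "flu symptoms": [0.9, 0.1],
    "fever cough": [0.8, 0.3],
    "basketball scores": [0.1, 0.9],
}
flu_concept = [1.0, 0.2]  # assumed vector standing in for a "flu infection" concept

kept = filter_by_concept(query_embeddings, flu_concept, threshold=0.8)
print(kept)
```

A conventional selector, such as a regularized regression, would then run only on the retained queries, which is the "a priori" filtering order the abstract describes.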

UCL Discovery

Assessing the impact of a health intervention via user-generated Internet content

Authors: Elad Yom-Tov, Ingemar J. Cox, Richard Pebody, Vasileios Lampos
Publication venue: Springer Nature
Publication date: 01/01/2015

Assessing the effect of a health-oriented intervention by traditional epidemiological methods is commonly based only on population segments that use healthcare services. Here we introduce a complementary framework for evaluating the impact of a targeted intervention, such as a vaccination campaign against an infectious disease, through a statistical analysis of user-generated content submitted on web platforms. Using supervised learning, we derive a nonlinear regression model for estimating the prevalence of a health event in a population from Internet data. This model is applied to identify control location groups that correlate historically with the areas where a specific intervention campaign has taken place. We then determine the impact of the intervention by inferring a projection of the disease rates that could have emerged in the absence of a campaign. Our case study focuses on the influenza vaccination program that was launched in England during the 2013/14 season, and our observations consist of millions of geo-located search queries to the Bing search engine and posts on Twitter. The impact estimates derived from the application of the proposed statistical framework support conventional assessments of the campaign.
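The projection step in this abstract, estimating what disease rates would have looked like without the campaign, can be sketched with a toy calculation. The sketch assumes a simple least-squares line from control-area rates to target-area rates fitted on the pre-intervention period; the abstract does not specify this mapping, and the function names (`fit_line`, `estimate_impact`) and all numbers are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance = sum((a - mean_x) ** 2 for a in x)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


def estimate_impact(control_pre, target_pre, control_post, target_post):
    """Project target rates from control rates, then compare with what was observed."""
    slope, intercept = fit_line(control_pre, target_pre)
    projected = [slope * c + intercept for c in control_post]
    mean_projected = sum(projected) / len(projected)
    mean_observed = sum(target_post) / len(target_post)
    absolute = mean_observed - mean_projected  # negative means lower than projected
    relative = absolute / mean_projected
    return projected, absolute, relative


# Hypothetical estimated rates: before the campaign, target areas run at twice
# the control rate; during the campaign they come in below that projection.
control_pre = [1.0, 2.0, 3.0, 4.0]
target_pre = [2.0, 4.0, 6.0, 8.0]
control_post = [5.0, 6.0]
target_post = [8.0, 9.0]

projected, absolute, relative = estimate_impact(
    control_pre, target_pre, control_post, target_post
)
print(projected, absolute, relative)
```

Here the gap between the observed and projected means plays the role of the impact estimate. A real analysis would also need uncertainty estimates for that gap, which this sketch omits.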

Crossref

Springer - Publisher Connector

UCL Discovery

Copenhagen University Research Information System

Assessing the impact of a health intervention via user-generated Internet content

Authors: Elad Yom-Tov, Ingemar J. Cox, Richard Pebody, Vasileios Lampos
Publication venue: Springer Science and Business Media LLC

Crossref