Combining Search, Social Media, and Traditional Data Sources to Improve Influenza Surveillance
We present a machine learning-based methodology capable of providing
real-time ("nowcast") and forecast estimates of influenza activity in the US by
leveraging multiple data sources, including Google searches, Twitter
microblogs, near-real-time hospital visit records, and data from a
participatory surveillance system. Our main contribution is combining
multiple influenza-like illness (ILI) activity estimates, generated
independently from each data source, into a single ILI prediction using
machine learning ensemble approaches. Our methodology exploits the information
in each data source and produces accurate weekly ILI predictions for up to four
weeks ahead of the release of CDC's ILI reports. We evaluate the predictive
ability of our ensemble approach during the 2013-2014 (retrospective) and
2014-2015 (live) flu seasons for each of the four weekly time horizons. Our
ensemble approach demonstrates several advantages: (1) our ensemble method's
predictions outperform the predictions made from each data source independently,
(2) our methodology can produce predictions one week ahead of Google Flu Trends'
(GFT) real-time estimates with comparable accuracy, and (3) our two- and
three-week forecast estimates have comparable accuracy to real-time predictions using an
autoregressive model. Moreover, our results show that considerable insight is
gained from incorporating disparate data streams, in the form of social media
and crowd-sourced data, into influenza predictions at all time horizons.
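The abstract does not say which ensemble learner combines the per-source ILI estimates. As a hedged illustration of the general idea only, the sketch below uses inverse-mean-squared-error weighting, a simple and well-known ensemble rule swapped in here as an assumption: each source's weight is proportional to 1/MSE of its past estimates against the ILI values later published by the CDC. The source names and the functions `inverse_mse_weights` and `ensemble_estimate` are hypothetical.

```python
from typing import Dict, List


def inverse_mse_weights(history: Dict[str, List[float]], truth: List[float]) -> Dict[str, float]:
    """Weight each source by 1/MSE of its past estimates against reported ILI, normalized to sum to 1.

    Assumption: inverse-error weighting stands in for the paper's (unspecified) ensemble learner.
    """
    inverse_errors = {}
    for source, preds in history.items():
        mse = sum((p - t) ** 2 for p, t in zip(preds, truth)) / len(truth)
        inverse_errors[source] = 1.0 / max(mse, 1e-12)  # guard against a perfect past fit
    total = sum(inverse_errors.values())
    return {source: w / total for source, w in inverse_errors.items()}


def ensemble_estimate(current: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine this week's per-source ILI estimates into a single weighted prediction."""
    return sum(weights[source] * value for source, value in current.items())


# Hypothetical example with two sources and two past weeks of reported ILI.
weights = inverse_mse_weights({"google": [1.1, 2.1], "twitter": [1.2, 2.2]}, [1.0, 2.0])
print(weights)  # google has 1/4 the MSE of twitter, so it receives 4x the weight
print(ensemble_estimate({"google": 3.0, "twitter": 3.5}, weights))
```

A weighted average like this cannot beat its best member on every week; learned combiners (for example, a regression fitted on the per-source estimates) are the usual next step when the sources carry complementary information.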
Structured Sparse Modelling with Hierarchical GP
In this paper, a new Bayesian model for sparse linear regression with a
spatio-temporal structure is proposed. It encodes the structural
assumptions through a hierarchical Gaussian process prior on spike-and-slab
coefficients. We design an inference algorithm based on Expectation Propagation
and evaluate the model on real data. Comment: SPARS 201
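To make "a hierarchical Gaussian process prior for spike-and-slab coefficients" concrete, the sketch below draws one sample from such a prior. It assumes one common construction, not necessarily the paper's exact model: a latent GP over the coefficient locations is passed through a probit link to give each coefficient an inclusion probability, a Bernoulli gate then chooses spike (exactly zero) or slab (Gaussian). Expectation Propagation inference is not shown. The kernel choice and the names `se_kernel`, `cholesky`, `probit`, and `sample_prior` are illustrative assumptions.

```python
import math
import random
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def se_kernel(xs: List[float], lengthscale: float, variance: float) -> Matrix:
    """Squared-exponential covariance between 1-D coefficient locations (assumed kernel)."""
    return [[variance * math.exp(-0.5 * ((a - b) / lengthscale) ** 2) for b in xs] for a in xs]


def cholesky(K: Matrix) -> Matrix:
    """Lower-triangular L with L L^T = K, for symmetric positive-definite K."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(K[i][i] - s)
            else:
                L[i][j] = (K[i][j] - s) / L[j][j]
    return L


def probit(v: float) -> float:
    """Standard normal CDF: maps a latent GP value to an inclusion probability."""
    return 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))


def sample_prior(
    xs: List[float],
    lengthscale: float = 2.0,
    gp_var: float = 1.0,
    slab_var: float = 1.0,
    jitter: float = 1e-6,
    rng: Optional[random.Random] = None,
) -> Tuple[List[float], List[int], List[float]]:
    """Draw (inclusion probabilities, gates, coefficients) from the assumed prior."""
    rng = rng or random.Random()
    K = se_kernel(xs, lengthscale, gp_var)
    for i in range(len(xs)):
        K[i][i] += jitter  # small diagonal term keeps the Cholesky factorization stable
    L = cholesky(K)
    eps = [rng.gauss(0.0, 1.0) for _ in xs]
    u = [sum(L[i][k] * eps[k] for k in range(i + 1)) for i in range(len(xs))]  # u ~ N(0, K)
    pis = [probit(v) for v in u]  # nearby locations get similar inclusion probabilities
    z = [1 if rng.random() < p else 0 for p in pis]  # 0 = spike, 1 = slab
    w = [zi * rng.gauss(0.0, math.sqrt(slab_var)) for zi in z]  # slab draws only where z = 1
    return pis, z, w


pis, z, w = sample_prior([float(i) for i in range(10)], rng=random.Random(0))
print(z)  # active coefficients tend to cluster because the GP correlates neighbours
```

The GP is what carries the structure: with a long lengthscale, neighbouring coefficients share similar inclusion probabilities, so the sampled support tends to come in contiguous blocks rather than as isolated spikes.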
Multivariate Bayesian Machine Learning Regression for Operation and Management of Multiple Reservoir, Irrigation Canal, and River Systems
The principal objective of this dissertation is to develop Bayesian machine learning models for the operation and management of multiple reservoir, irrigation canal, and river systems. These models come from the emerging field of machine learning theory; they are characterized by their ability to capture the underlying physics of a system simply by examining its measured inputs and outputs. They can provide probabilistic predictions of system behavior using only historical data. The models were developed as a multivariate relevance vector machine (MVRVM), which is based on a sparse Bayesian learning approach to regression. Using this Bayesian approach, the model yields a predictive confidence interval that captures the uncertainty of both the model and the data. The models were applied to the multiple reservoir, canal, and river system in the regulated Lower Sevier River Basin in Utah, where they were used to predict multi-step-ahead releases from multiple reservoirs, diversions into multiple canals, and streamflow and water loss/gain in a river system. This research represents the first attempt to use a multivariate Bayesian learning regression approach to produce simultaneous multi-step-ahead predictions with predictive confidence intervals for multiple outputs in a regulated river basin system. These predictions are potentially valuable to reservoir and canal operators in identifying the best decisions for operating and managing irrigation water supply systems.
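The sparse Bayesian learning idea behind the MVRVM can be sketched in its simplest, single-output form. The sketch below is an assumption-laden illustration, not the dissertation's multivariate model: it gives each weight its own prior precision `alpha`, re-estimates `alpha` and the noise precision `beta` with the standard relevance vector machine fixed-point updates, and returns a predictive variance `1/beta + phi^T Sigma phi` that combines data noise with weight uncertainty. Irrelevant basis functions are pruned as their `alpha` grows. The function names `fit_rvm` and `predict` and the parameters `alpha_cap` and `tol` are hypothetical.

```python
import numpy as np


def fit_rvm(Phi: np.ndarray, t: np.ndarray, n_iter: int = 200, alpha_cap: float = 1e9, tol: float = 1e-6):
    """Single-output sparse Bayesian regression (RVM-style evidence re-estimation).

    Phi: (N, M) design matrix of basis functions; t: (N,) targets.
    Returns the posterior mean/covariance of the weights, per-weight precisions, and noise precision.
    """
    N, M = Phi.shape
    alpha = np.ones(M)  # one prior precision per weight; large alpha drives that weight to zero
    beta = 1.0 / max(np.var(t), 1e-6)  # initial noise precision
    for _ in range(n_iter):
        Sigma = np.linalg.inv(np.diag(alpha) + beta * Phi.T @ Phi)
        mu = beta * Sigma @ Phi.T @ t
        # gamma_i in [0, 1] measures how well-determined weight i is by the data.
        gamma = np.clip(1.0 - alpha * np.diag(Sigma), 1e-12, 1.0)
        new_alpha = np.minimum(gamma / (mu ** 2 + 1e-12), alpha_cap)
        resid = t - Phi @ mu
        beta = max(N - gamma.sum(), 1e-6) / max(resid @ resid, 1e-12)
        converged = np.max(np.abs(np.log(new_alpha) - np.log(alpha))) < tol
        alpha = new_alpha
        if converged:
            break
    Sigma = np.linalg.inv(np.diag(alpha) + beta * Phi.T @ Phi)
    mu = beta * Sigma @ Phi.T @ t
    return mu, Sigma, alpha, beta


def predict(phi_star: np.ndarray, mu: np.ndarray, Sigma: np.ndarray, beta: float):
    """Predictive mean and variance; variance = noise (1/beta) + weight uncertainty."""
    mean = phi_star @ mu
    var = 1.0 / beta + np.einsum("ij,jk,ik->i", phi_star, Sigma, phi_star)
    return mean, var


# Hypothetical data: a linear signal plus noise, fitted with a quadratic basis.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 50)
t = 1.0 + 2.0 * x + rng.normal(0.0, 0.05, size=50)
mu, Sigma, alpha, beta = fit_rvm(np.column_stack([np.ones_like(x), x, x ** 2]), t)
mean, var = predict(np.array([[1.0, 0.5, 0.25]]), mu, Sigma, beta)
print(mean[0] - 1.96 * np.sqrt(var[0]), mean[0] + 1.96 * np.sqrt(var[0]))  # approx. 95% interval
```

The 1.96 multiplier assumes a Gaussian predictive distribution. A multivariate extension predicts several outputs (for example, several reservoir releases) at once; how the MVRVM shares precisions or noise terms across outputs is not described in the abstract, so it is not reproduced here.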
Boosting Classifiers for Drifting Concepts
This paper proposes a boosting-like method to train a classifier ensemble from data streams. It naturally adapts to concept drift and allows the drift to be quantified in terms of its base learners. The algorithm is empirically shown to outperform learning algorithms that ignore concept drift. It performs no worse than advanced adaptive time window and example selection strategies that store all the data and are thus not suited for mining massive streams.
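The abstract does not give the boosting-like update rule, so the sketch below swaps in a simpler, well-known relative: an accuracy-weighted ensemble over stream batches. It illustrates the two properties the abstract highlights. The ensemble keeps only a bounded number of base learners instead of storing all the data, and re-weighting the existing learners on each new batch yields a crude drift signal (their mean accuracy on that batch). The one-feature threshold stumps, the `max(accuracy - 0.5, 0)` weighting, and all names (`train_stump`, `DriftEnsemble`, `max_members`) are hypothetical choices, not the paper's algorithm.

```python
from typing import Callable, List, Optional, Tuple

Example = Tuple[float, int]  # (feature value, label in {-1, +1})
Predictor = Callable[[float], int]


def train_stump(batch: List[Example]) -> Predictor:
    """Fit a one-feature threshold classifier by exhaustive search over the batch."""
    best_errors, best_thr, best_sign = len(batch) + 1, 0.0, 1
    for thr in sorted({x for x, _ in batch}):
        for sign in (1, -1):
            errors = sum(1 for x, y in batch if (sign if x >= thr else -sign) != y)
            if errors < best_errors:
                best_errors, best_thr, best_sign = errors, thr, sign
    return lambda x: best_sign if x >= best_thr else -best_sign


def accuracy(model: Predictor, batch: List[Example]) -> float:
    return sum(1 for x, y in batch if model(x) == y) / len(batch)


class DriftEnsemble:
    """Bounded ensemble re-weighted on each batch (accuracy weighting, not the paper's rule)."""

    def __init__(self, max_members: int = 5) -> None:
        self.max_members = max_members
        self.members: List[Predictor] = []
        self.weights: List[float] = []

    def update(self, batch: List[Example]) -> Optional[float]:
        """Absorb one batch; return the old members' mean accuracy on it (None if there are none)."""
        old_acc = [accuracy(m, batch) for m in self.members]
        drift_signal = sum(old_acc) / len(old_acc) if old_acc else None
        new_member = train_stump(batch)
        self.members.append(new_member)
        # Members no better than chance on the newest batch get a zero vote.
        self.weights = [max(a - 0.5, 0.0) for a in old_acc + [accuracy(new_member, batch)]]
        if len(self.members) > self.max_members:
            worst = min(range(len(self.weights)), key=self.weights.__getitem__)
            del self.members[worst]
            del self.weights[worst]
        return drift_signal

    def predict(self, x: float) -> int:
        score = sum(w * m(x) for m, w in zip(self.members, self.weights))
        return 1 if score >= 0 else -1


ens = DriftEnsemble(max_members=3)
ens.update([(-2.0, -1), (-1.0, -1), (1.0, 1), (2.0, 1)])
print(ens.update([(-2.0, 1), (-1.0, 1), (1.0, -1), (2.0, -1)]))  # labels flipped: old members score 0.0
```

A low return value from `update` signals that the concept has moved away from what the older learners encode. Note that the newest member is scored on its own training batch, which is optimistic; a held-out slice of the batch would give a fairer weight.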