10,150 research outputs found

    Combining Search, Social Media, and Traditional Data Sources to Improve Influenza Surveillance

    Full text link
    We present a machine learning-based methodology capable of providing real-time ("nowcast") and forecast estimates of influenza activity in the US by leveraging data from multiple data sources including: Google searches, Twitter microblogs, nearly real-time hospital visit records, and data from a participatory surveillance system. Our main contribution consists of combining multiple influenza-like illnesses (ILI) activity estimates, generated independently with each data source, into a single prediction of ILI utilizing machine learning ensemble approaches. Our methodology exploits the information in each data source and produces accurate weekly ILI predictions for up to four weeks ahead of the release of CDC's ILI reports. We evaluate the predictive ability of our ensemble approach during the 2013-2014 (retrospective) and 2014-2015 (live) flu seasons for each of the four weekly time horizons. Our ensemble approach demonstrates several advantages: (1) our ensemble method's predictions outperform every prediction using each data source independently, (2) our methodology can produce predictions one week ahead of GFT's real-time estimates with comparable accuracy, and (3) our two and three week forecast estimates have comparable accuracy to real-time predictions using an autoregressive model. Moreover, our results show that considerable insight is gained from incorporating disparate data streams, in the form of social media and crowd sourced data, into influenza predictions in all time horizon

    Structured Sparse Modelling with Hierarchical GP

    Get PDF
    In this paper a new Bayesian model for sparse linear regression with a spatio-temporal structure is proposed. It incorporates the structural assumptions based on a hierarchical Gaussian process prior for spike and slab coefficients. We design an inference algorithm based on Expectation Propagation and evaluate the model over the real data.Comment: SPARS 201

    Multivariate Bayesian Machine Learning Regression for Operation and Management of Multiple Reservoir, Irrigation Canal, and River Systems

    Get PDF
    The principal objective of this dissertation is to develop Bayesian machine learning models for multiple reservoir, irrigation canal, and river system operation and management. These types of models are derived from the emerging area of machine learning theory; they are characterized by their ability to capture the underlying physics of the system simply by examination of the measured system inputs and outputs. They can be used to provide probabilistic predictions of system behavior using only historical data. The models were developed in the form of a multivariate relevance vector machine (MVRVM) that is based on a sparse Bayesian learning machine approach for regression. Using this Bayesian approach, a predictive confidence interval is obtained from the model that captures the uncertainty of both the model and the data. The models were applied to the multiple reservoir, canal and river system located in the regulated Lower Sevier River Basin in Utah. The models were developed to perform predictions of multi-time-ahead releases of multiple reservoirs, diversions of multiple canals, and streamflow and water loss/gain in a river system. This research represents the first attempt to use a multivariate Bayesian learning regression approach to develop simultaneous multi-step-ahead predictions with predictive confidence intervals for multiple outputs in a regulated river basin system. These predictions will be of potential value to reservoir and canal operators in identifying the best decisions for operation and management of irrigation water supply systems

    Boosting Classifiers for Drifting Concepts

    Get PDF
    This paper proposes a boosting-like method to train a classifier ensemble from data streams. It naturally adapts to concept drift and allows to quantify the drift in terms of its base learners. The algorithm is empirically shown to outperform learning algorithms that ignore concept drift. It performs no worse than advanced adaptive time window and example selection strategies that store all the data and are thus not suited for mining massive streams. --
    • …
    corecore