A Latent Source Model for Nonparametric Time Series Classification
For classifying time series, a nearest-neighbor approach is widely used in
practice with performance often competitive with or better than more elaborate
methods such as neural networks, decision trees, and support vector machines.
We develop theoretical justification for the effectiveness of
nearest-neighbor-like classification of time series. Our guiding hypothesis is
that in many applications, such as forecasting which topics will become trends
on Twitter, there aren't actually that many prototypical time series to begin
with, relative to the number of time series we have access to, e.g., topics
become trends on Twitter only in a few distinct manners whereas we can collect
massive amounts of Twitter data. To operationalize this hypothesis, we propose
a latent source model for time series, which naturally leads to a "weighted
majority voting" classification rule that can be approximated by a
nearest-neighbor classifier. We establish nonasymptotic performance guarantees
of both weighted majority voting and nearest-neighbor classification under our
model accounting for how much of the time series we observe and the model
complexity. Experimental results on synthetic data show weighted majority
voting achieving the same misclassification rate as nearest-neighbor
classification while observing less of the time series. We then use weighted
majority to forecast which news topics on Twitter become trends, where we are
able to detect such "trending topics" in advance of Twitter 79% of the time,
with a mean early advantage of 1 hour and 26 minutes, a true positive rate of
95%, and a false positive rate of 4%. Comment: Advances in Neural Information Processing Systems (NIPS 2013).
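The abstract's "weighted majority voting" rule can be sketched as follows: each training time series votes for its class with a weight that decays in its distance to the query, and as the decay sharpens the rule approaches 1-nearest-neighbor classification. This is an illustrative sketch, not the paper's implementation; the function name, the squared-Euclidean distance, and the exponential weighting parameter `gamma` are assumptions chosen for clarity.

```python
import numpy as np

def weighted_majority_vote(train_series, train_labels, query, gamma=1.0):
    """Classify `query` by letting every training series vote for its
    label with weight exp(-gamma * squared_distance). As gamma grows,
    the nearest series dominates and the rule approaches 1-NN.
    Names and distance choice are illustrative, not from the paper."""
    dists = np.array([np.sum((s - query) ** 2) for s in train_series])
    weights = np.exp(-gamma * dists)
    scores = {}
    for label, w in zip(train_labels, weights):
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)
```

A nearest-neighbor classifier is recovered in the large-`gamma` limit, which is the sense in which the paper says weighted majority voting "can be approximated by a nearest-neighbor classifier."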
Prediction of infectious disease epidemics via weighted density ensembles
Accurate and reliable predictions of infectious disease dynamics can be
valuable to public health organizations that plan interventions to decrease or
prevent disease transmission. A great variety of models have been developed for
this task, using different model structures, covariates, and targets for
prediction. Experience has shown that the performance of these models varies;
some tend to do better or worse in different seasons or at different points
within a season. Ensemble methods combine multiple models to obtain a single
prediction that leverages the strengths of each model. We considered a range of
ensemble methods that each form a predictive density for a target of interest
as a weighted sum of the predictive densities from component models. In the
simplest case, equal weight is assigned to each component model; in the most
complex case, the weights vary with the region, prediction target, week of the
season when the predictions are made, a measure of component model uncertainty,
and recent observations of disease incidence. We applied these methods to
predict measures of influenza season timing and severity in the United States,
both at the national and regional levels, using three component models. We
trained the models on retrospective predictions from 14 seasons (1997/1998 -
2010/2011) and evaluated each model's prospective, out-of-sample performance in
the five subsequent influenza seasons. In this test phase, the ensemble methods
showed overall performance that was similar to the best of the component
models, but offered more consistent performance across seasons than the
component models. Ensemble methods offer the potential to deliver more reliable
predictions to public health decision makers. Comment: 20 pages, 6 figures.
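The core ensemble construction described above, a predictive density formed as a weighted sum of component predictive densities, can be sketched minimally. This is a schematic example under assumed names: the Gaussian components stand in for arbitrary model densities, and the weight check reflects the requirement that nonnegative weights summing to one yield a valid density.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    """Standard Gaussian density, used here only as a stand-in
    for an arbitrary component model's predictive density."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def ensemble_density(x, component_pdfs, weights):
    """Evaluate the ensemble predictive density at x as a weighted
    sum of component densities. Weights must be nonnegative and sum
    to 1 so the mixture is itself a valid probability density."""
    weights = np.asarray(weights, dtype=float)
    assert np.all(weights >= 0) and np.isclose(weights.sum(), 1.0)
    return sum(w * pdf(x) for w, pdf in zip(weights, component_pdfs))
```

In the simplest case from the abstract the weights are equal; the more complex variants would make `weights` a function of region, target, season week, and recent incidence rather than constants.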