Concentration and Confidence for Discrete Bayesian Sequence Predictors
Bayesian sequence prediction is a simple technique for predicting future
symbols sampled from an unknown measure on infinite sequences over a countable
alphabet. While strong bounds on the expected cumulative error are known, there
are only limited results on the distribution of this error. We prove tight
high-probability bounds on the cumulative error, which is measured in terms of
the Kullback-Leibler (KL) divergence. We also consider the problem of
constructing upper confidence bounds on the KL and Hellinger errors similar to
those constructed from Hoeffding-like bounds in the i.i.d. case. The new
results are applied to show that Bayesian sequence prediction can be used in
the Knows What It Knows (KWIK) framework with bounds that match the
state-of-the-art. Comment: 17 pages
Inferring hidden Markov models from noisy time sequences: a method to alleviate degeneracy in molecular dynamics
We present a new method for inferring hidden Markov models from noisy time
sequences without the necessity of assuming a model architecture, thus allowing
for the detection of degenerate states. This is based on the statistical
prediction techniques developed by Crutchfield et al., and generates so called
causal state models, equivalent to hidden Markov models. This method is
applicable to any continuous data which clusters around discrete values and
exhibits multiple transitions between these values such as tethered particle
motion data or Fluorescence Resonance Energy Transfer (FRET) spectra. The
algorithms developed have been shown to perform well on simulated data,
demonstrating the ability to recover the model used to generate the data under
high noise, sparse data conditions and the ability to infer the existence of
degenerate states. They have also been applied to new experimental FRET data of
Holliday Junction dynamics, extracting the expected two state model and
providing values for the transition rates in good agreement with previous
results and with results obtained using existing maximum likelihood based
methods. Comment: 19 pages, 9 figures
Prediction of time series by statistical learning: general losses and fast rates
We establish rates of convergence in time series forecasting using the
statistical learning approach based on oracle inequalities. A series of papers
extends the oracle inequalities obtained for iid observations to time series
under weak dependence conditions. Given a family of predictors and $n$
observations, oracle inequalities state that a predictor forecasts the series
as well as the best predictor in the family up to a remainder term $\Delta_n$.
Using the PAC-Bayesian approach, we establish under weak dependence conditions
oracle inequalities with optimal rates of convergence. We extend previous
results for the absolute loss function to any Lipschitz loss function with
rates $\Delta_n \sim \sqrt{c(\Theta)/n}$, where $c(\Theta)$ measures the
complexity of the model. We apply the method for quantile loss functions to
forecast the French GDP. Under additional conditions on the loss functions
(satisfied by the quadratic loss function) and on the time series, we refine
the rates of convergence to $\Delta_n \sim c(\Theta)/n$. We achieve for the
first time these fast rates for uniformly mixing processes. These rates are
known to be optimal in the iid case and for individual sequences. In
particular, we generalize the results of Dalalyan and Tsybakov on sparse
regression estimation to the case of autoregression.
Recent advances in directional statistics
Mainstream statistical methodology is generally applicable to data observed
in Euclidean space. There are, however, numerous contexts of considerable
scientific interest in which the natural supports for the data under
consideration are Riemannian manifolds like the unit circle, torus, sphere and
their extensions. Typically, such data can be represented using one or more
directions, and directional statistics is the branch of statistics that deals
with their analysis. In this paper we provide a review of the many recent
developments in the field since the publication of Mardia and Jupp (1999),
still the most comprehensive text on directional statistics. Many of those
developments have been stimulated by interesting applications in fields as
diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics,
image analysis, text mining, environmetrics, and machine learning. We begin by
considering developments for the exploratory analysis of directional data
before progressing to distributional models, general approaches to inference,
hypothesis testing, regression, nonparametric curve estimation, methods for
dimension reduction, classification and clustering, and the modelling of time
series, spatial and spatio-temporal data. An overview of currently available
software for analysing directional data is also provided, and potential future
developments discussed. Comment: 61 pages
Bayesian Synthesis: Combining subjective analyses, with an application to ozone data
Bayesian model averaging enables one to combine the disparate predictions of
a number of models in a coherent fashion, leading to superior predictive
performance. The improvement in performance arises from averaging models that
make different predictions. In this work, we tap into perhaps the biggest
driver of different predictions---different analysts---in order to gain the
full benefits of model averaging. In a standard implementation of our method,
several data analysts work independently on portions of a data set, eliciting
separate models which are eventually updated and combined through a specific
weighting method. We call this modeling procedure Bayesian Synthesis. The
methodology helps to alleviate concerns about the sizable gap between the
foundational underpinnings of the Bayesian paradigm and the practice of
Bayesian statistics. In experimental work we show that human modeling has
predictive performance superior to that of many automatic modeling techniques,
including AIC, BIC, Smoothing Splines, CART, Bagged CART, Bayes CART, BMA and
LARS, and only slightly inferior to that of BART. We also show that Bayesian
Synthesis further improves predictive performance. Additionally, we examine the
predictive performance of a simple average across analysts, which we dub Convex
Synthesis, and find that it also produces an improvement. Comment: Published at
http://dx.doi.org/10.1214/10-AOAS444 in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics
(http://www.imstat.org)