11,026 research outputs found

    Characterizing predictable classes of processes

    The problem is sequence prediction in the following setting. A sequence $x_1,\dots,x_n,\dots$ of discrete-valued observations is generated according to some unknown probabilistic law (measure) $\mu$. After observing each outcome, it is required to give the conditional probabilities of the next observation. The measure $\mu$ belongs to an arbitrary class $\mathcal{C}$ of stochastic processes. We are interested in predictors $\rho$ whose conditional probabilities converge to the "true" $\mu$-conditional probabilities if any $\mu\in\mathcal{C}$ is chosen to generate the data. We show that if such a predictor exists, then a predictor can also be obtained as a convex combination of countably many elements of $\mathcal{C}$. In other words, it can be obtained as a Bayesian predictor whose prior is concentrated on a countable set. This result is established for two very different measures of prediction performance: one very strong, namely total variation, and one very weak, namely prediction in expected average Kullback-Leibler divergence.
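    The discrete-prior Bayesian predictor in the abstract can be illustrated with a toy sketch. Here the "class" is a finite grid of i.i.d. Bernoulli measures (a hypothetical stand-in for the countable set the theorem produces; the function name and grid are illustrative, not from the paper): the mixture's conditional probability of the next symbol is the posterior-weighted average of the candidates' conditionals.

    ```python
    import numpy as np

    def bayes_mixture_predictor(xs, params, prior):
        """Next-symbol probability of 1 under a discrete Bayesian mixture.

        Each candidate measure is i.i.d. Bernoulli(p) -- a toy stand-in for
        a countable class of process measures.
        xs: observed binary sequence; params: Bernoulli parameters of the
        candidates; prior: prior weights over the candidates (sums to 1).
        """
        params = np.asarray(params, dtype=float)
        ones = sum(xs)
        zeros = len(xs) - ones
        # log-likelihood of the observed prefix under each candidate
        loglik = ones * np.log(params) + zeros * np.log(1 - params)
        post = prior * np.exp(loglik - loglik.max())  # unnormalised posterior
        post /= post.sum()
        # mixture conditional probability that the next symbol is 1
        return float(post @ params)

    # With data generated by Bernoulli(0.8), the mixture prediction
    # concentrates near 0.8 as the sequence grows.
    rng = np.random.default_rng(0)
    xs = (rng.random(500) < 0.8).astype(int).tolist()
    grid = [0.1, 0.3, 0.5, 0.7, 0.8, 0.9]
    prior = np.full(len(grid), 1.0 / len(grid))
    p_next = bayes_mixture_predictor(xs, grid, prior)
    ```

    For an i.i.d. toy class this is just Bayesian model averaging; the result quoted above is much stronger, covering arbitrary (non-i.i.d., non-parametric) classes.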

    Universality of Bayesian mixture predictors

    The problem is that of sequential probability forecasting for finite-valued time series. The data is generated by an unknown probability distribution over the space of all one-way infinite sequences. It is known that this measure belongs to a given set $C$, but the latter is completely arbitrary (uncountably infinite, without any structure given). Performance is measured by asymptotic average log loss. In this work it is shown that the minimax asymptotic performance is always attainable, and that it is attained by a convex combination of countably many measures from the set $C$ (a Bayesian mixture). This was previously known only for the case when the best achievable asymptotic error is 0. It also contrasts with previous results showing that in the non-realizable case all Bayesian mixtures may be suboptimal, while there is a predictor that achieves the optimal performance.

    Fast rates in statistical and online learning

    The speed with which a learning algorithm converges as it is presented with more data is a central problem in machine learning: a fast rate of convergence means less data is needed for the same level of performance. The pursuit of fast rates in online and statistical learning has led to the discovery of many conditions in learning theory under which fast learning is possible. We show that most of these conditions are special cases of a single, unifying condition that comes in two forms: the central condition for 'proper' learning algorithms that always output a hypothesis in the given model, and stochastic mixability for online algorithms that may make predictions outside of the model. We show that under surprisingly weak assumptions both conditions are, in a certain sense, equivalent. The central condition has a re-interpretation in terms of convexity of a set of pseudoprobabilities, linking it to density estimation under misspecification. For bounded losses, we show how the central condition enables a direct proof of fast rates and we prove its equivalence to the Bernstein condition, itself a generalization of the Tsybakov margin condition, both of which have played a central role in obtaining fast rates in statistical learning. Yet, while the Bernstein condition is two-sided, the central condition is one-sided, making it more suitable to deal with unbounded losses. In its stochastic mixability form, our condition generalizes both a stochastic exp-concavity condition identified by Juditsky, Rigollet and Tsybakov and Vovk's notion of mixability. Our unifying conditions thus provide a substantial step towards a characterization of fast rates in statistical learning, similar to how classical mixability characterizes constant regret in the sequential prediction with expert advice setting. Comment: 69 pages, 3 figures.
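    As a rough illustration of the kind of condition meant (a standard formulation from the fast-rates literature; the symbols here are generic and not taken from the abstract), the $\eta$-central condition for a loss $\ell$, model $\mathcal{F}$, and data distribution $P$ can be stated as:

    ```latex
    % eta-central condition: some f* in the model dominates every f
    % in expectation of exponentiated loss differences (eta > 0 fixed)
    \exists f^\ast \in \mathcal{F}\ \forall f \in \mathcal{F}:\qquad
    \mathbb{E}_{Z \sim P}\!\left[ e^{\eta\,\left(\ell_{f^\ast}(Z) - \ell_f(Z)\right)} \right] \le 1 .
    ```

    Note the one-sidedness mentioned in the abstract: only the upper bound on this exponential moment is required, with no matching lower bound.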

    On Finding Predictors for Arbitrary Families of Processes

    The problem is sequence prediction in the following setting. A sequence $x_1,\dots,x_n,\dots$ of discrete-valued observations is generated according to some unknown probabilistic law (measure) $\mu$. After observing each outcome, it is required to give the conditional probabilities of the next observation. The measure $\mu$ belongs to an arbitrary but known class $C$ of stochastic process measures. We are interested in predictors $\rho$ whose conditional probabilities converge (in some sense) to the "true" $\mu$-conditional probabilities if any $\mu\in C$ is chosen to generate the sequence. The contribution of this work is in characterizing the families $C$ for which such predictors exist, and in providing a specific and simple form in which to look for a solution. We show that if any predictor works, then there exists a Bayesian predictor, whose prior is discrete, which works too. We also find several sufficient and necessary conditions for the existence of a predictor, in terms of topological characterizations of the family $C$, as well as in terms of the local behaviour of the measures in $C$, which in some cases lead to procedures for constructing such predictors. It should be emphasized that the framework is completely general: the stochastic processes considered are not required to be i.i.d., stationary, or to belong to any parametric or countable family.

    Extrapolation of Stationary Random Fields

    We introduce basic statistical methods for the extrapolation of stationary random fields. For square-integrable fields, we set out the basics of kriging extrapolation techniques. For (non-Gaussian) stable fields, which are known to be heavy-tailed, we describe further extrapolation methods and discuss their properties. Two of them can be seen as direct generalizations of kriging. Comment: 52 pages, 25 figures. This is a review article, though Section 4 contains new results on the weak consistency of the extrapolation methods as well as new extrapolation methods for $\alpha$-stable fields with $0<\alpha\leq 1$.
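    For the square-integrable case, the kriging predictor mentioned above is a linear combination of the observations whose weights solve a system built from the covariance function. A minimal sketch of simple kriging (known mean) in one dimension, with an assumed exponential covariance; the function name and example values are illustrative, not from the paper:

    ```python
    import numpy as np

    def simple_kriging(obs_pts, obs_vals, pred_pt, cov, mean=0.0):
        """Simple-kriging predictor for a square-integrable stationary field.

        cov(h) is an assumed known covariance function of the lag; the
        weights w solve C w = c0, where C is the covariance matrix of the
        observations and c0 the cross-covariances with the target point.
        """
        obs_pts = np.asarray(obs_pts, dtype=float)
        C = cov(np.abs(obs_pts[:, None] - obs_pts[None, :]))  # Gram matrix
        c0 = cov(np.abs(obs_pts - pred_pt))                   # cross-covariances
        w = np.linalg.solve(C, c0)                            # kriging weights
        return mean + w @ (np.asarray(obs_vals, dtype=float) - mean)

    # Toy example: exponential covariance, 1-D observation locations.
    cov = lambda h: np.exp(-h)
    pts = [0.0, 1.0, 2.0, 4.0]
    vals = [1.2, 0.7, -0.1, 0.4]
    xhat = simple_kriging(pts, vals, pred_pt=1.5, cov=cov)
    ```

    A useful sanity check is that the predictor interpolates: at an observation point the weight vector is a unit vector, so the prediction equals the observed value there.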

    Aggregation of predictors for nonstationary sub-linear processes and online adaptive forecasting of time varying autoregressive processes

    In this work, we study the problem of aggregating a finite number of predictors for nonstationary sub-linear processes. We provide oracle inequalities relying essentially on three ingredients: (1) a uniform bound on the $\ell^1$ norm of the time-varying sub-linear coefficients, (2) a Lipschitz assumption on the predictors, and (3) moment conditions on the noise appearing in the linear representation. Two kinds of aggregation are considered, giving rise to different moment conditions on the noise and more or less sharp oracle inequalities. We apply this approach to derive an adaptive predictor for locally stationary time-varying autoregressive (TVAR) processes. It is obtained by aggregating a finite number of well-chosen predictors, each of them enjoying an optimal minimax convergence rate under specific smoothness conditions on the TVAR coefficients. We show that the obtained aggregated predictor achieves a minimax rate while adapting to the unknown smoothness. To prove this result, a lower bound is established for the minimax rate of the prediction risk for the TVAR process. Numerical experiments complete this study. An important feature of this approach is that the aggregated predictor can be computed recursively and is thus applicable in an online prediction context. Comment: Published at http://dx.doi.org/10.1214/15-AOS1345 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
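    The recursive, online character of such aggregation can be sketched with a generic exponentially weighted forecaster over a finite set of experts. This is the general flavour of Gibbs-type aggregation used in this literature, not the paper's exact scheme; the function name, squared loss, and learning rate $\eta$ are illustrative assumptions.

    ```python
    import numpy as np

    def aggregate_online(preds, ys, eta=1.0):
        """Exponentially weighted aggregation of a finite set of predictors.

        preds: array of shape (T, K), forecasts of K experts at each time t.
        ys: array of length T, observed values.
        eta: learning rate (a tuning parameter).
        Returns the sequence of aggregated (convex-combination) forecasts.
        """
        preds = np.asarray(preds, dtype=float)
        ys = np.asarray(ys, dtype=float)
        T, K = preds.shape
        w = np.full(K, 1.0 / K)          # uniform initial weights
        out = np.empty(T)
        for t in range(T):
            out[t] = w @ preds[t]        # aggregate before seeing y_t
            loss = (preds[t] - ys[t]) ** 2
            w = w * np.exp(-eta * loss)  # exponential weight update
            w /= w.sum()                 # renormalise (recursive, online)
        return out

    # Toy run: expert 0 is always right, expert 1 always wrong; the
    # aggregated forecast quickly concentrates on the good expert.
    ys = np.ones(50)
    preds = np.stack([np.ones(50), np.zeros(50)], axis=1)
    out = aggregate_online(preds, ys, eta=1.0)
    ```

    Each step touches only the current forecasts and observation, which is what makes this kind of predictor computable recursively in an online setting, as the abstract emphasizes.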