
    Fast rates in statistical and online learning

    The speed with which a learning algorithm converges as it is presented with more data is a central problem in machine learning --- a fast rate of convergence means less data is needed for the same level of performance. The pursuit of fast rates in online and statistical learning has led to the discovery of many conditions in learning theory under which fast learning is possible. We show that most of these conditions are special cases of a single, unifying condition, which comes in two forms: the central condition for 'proper' learning algorithms that always output a hypothesis in the given model, and stochastic mixability for online algorithms that may make predictions outside of the model. We show that under surprisingly weak assumptions both conditions are, in a certain sense, equivalent. The central condition has a re-interpretation in terms of convexity of a set of pseudoprobabilities, linking it to density estimation under misspecification. For bounded losses, we show how the central condition enables a direct proof of fast rates and we prove its equivalence to the Bernstein condition, itself a generalization of the Tsybakov margin condition, both of which have played a central role in obtaining fast rates in statistical learning. Yet, while the Bernstein condition is two-sided, the central condition is one-sided, making it more suitable for dealing with unbounded losses. In its stochastic mixability form, our condition generalizes both a stochastic exp-concavity condition identified by Juditsky, Rigollet and Tsybakov and Vovk's notion of mixability. Our unifying conditions thus provide a substantial step towards a characterization of fast rates in statistical learning, similar to how classical mixability characterizes constant regret in the sequential prediction with expert advice setting. Comment: 69 pages, 3 figures
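    For orientation, the two conditions named above are commonly stated as follows (notation ours, not a quotation from the paper): writing $\ell_f(Z)$ for the loss of hypothesis $f$ on an outcome $Z \sim P$ and $f^\ast$ for the risk minimizer in the model $\mathcal{F}$,

        % Central condition with strength $\eta > 0$: a one-sided exponential-moment bound.
        \exists f^\ast \in \mathcal{F}\ \forall f \in \mathcal{F}:\qquad
        \mathbb{E}_{Z \sim P}\!\left[ e^{\eta\,(\ell_{f^\ast}(Z) - \ell_f(Z))} \right] \;\le\; 1.

        % Bernstein condition with exponent $\beta \in (0,1]$: a two-sided variance-to-mean bound.
        \mathbb{E}\!\left[ \big(\ell_f(Z) - \ell_{f^\ast}(Z)\big)^{2} \right]
        \;\le\; B\,\Big( \mathbb{E}\!\left[ \ell_f(Z) - \ell_{f^\ast}(Z) \right] \Big)^{\beta}
        \qquad \text{for all } f \in \mathcal{F}.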

    Asymptotic equivalence and adaptive estimation for robust nonparametric regression

    Asymptotic equivalence theory has so far been developed only for bounded loss functions. This limits the potential applications of the theory because many commonly used loss functions in statistical inference are unbounded. In this paper we develop asymptotic equivalence results for robust nonparametric regression with unbounded loss functions. The results imply that all the Gaussian nonparametric regression procedures can be robustified in a unified way. A key step in our equivalence argument is to bin the data and then take the median of each bin. The asymptotic equivalence results have significant practical implications. To illustrate the general principles of the equivalence argument we consider two important nonparametric inference problems: robust estimation of the regression function and the estimation of a quadratic functional. In both cases easily implementable procedures are constructed and are shown to enjoy simultaneously a high degree of robustness and adaptivity. Other problems, such as the construction of confidence sets and nonparametric hypothesis testing, can be handled in a similar fashion. Comment: Published at http://dx.doi.org/10.1214/08-AOS681 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
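    The binning-and-median step is simple enough to show concretely; the following Python sketch is an illustration of the idea (function and variable names are ours, and the bin count is arbitrary), not code from the paper.

        import numpy as np

        def bin_medians(y, num_bins):
            """Split the responses y (ordered by design points) into num_bins
            consecutive bins and return the median of each bin."""
            y = np.asarray(y, dtype=float)
            bins = np.array_split(y, num_bins)              # consecutive, nearly equal-sized bins
            return np.array([np.median(b) for b in bins])

        # Example: 1000 heavy-tailed observations reduced to 50 robust pseudo-observations,
        # which could then be fed to any Gaussian nonparametric regression procedure.
        rng = np.random.default_rng(0)
        x = np.linspace(0.0, 1.0, 1000)
        y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_cauchy(1000)
        pseudo_obs = bin_medians(y, 50)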

    A Tight Excess Risk Bound via a Unified PAC-Bayesian-Rademacher-Shtarkov-MDL Complexity

    We present a novel notion of complexity that interpolates between and generalizes some classic existing complexity notions in learning theory: for estimators like empirical risk minimization (ERM) with arbitrary bounded losses, it is upper bounded in terms of data-independent Rademacher complexity; for generalized Bayesian estimators, it is upper bounded by the data-dependent information complexity (also known as the stochastic or PAC-Bayesian $\mathrm{KL}(\text{posterior} \,\|\, \text{prior})$ complexity). For (penalized) ERM, the new complexity reduces to (generalized) normalized maximum likelihood (NML) complexity, i.e. a minimax log-loss individual-sequence regret. Our first main result bounds excess risk in terms of the new complexity. Our second main result links the new complexity via Rademacher complexity to $L_2(P)$ entropy, thereby generalizing earlier results of Opper, Haussler, Lugosi, and Cesa-Bianchi, who did the log-loss case with $L_\infty$. Together, these results recover optimal bounds for VC- and large (polynomial entropy) classes, replacing localized Rademacher complexity by a simpler analysis which almost completely separates the two aspects that determine the achievable rates: 'easiness' (Bernstein) conditions and model complexity. Comment: 38 pages
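    As a point of reference, the classical log-loss Shtarkov/NML complexity that the new notion generalizes can be written as follows (textbook form; the paper's definition extends beyond log loss):

        % Shtarkov sum / NML complexity of a model class $\mathcal{F}$ over sequences of length $n$
        \mathrm{COMP}_n(\mathcal{F}) \;=\; \log \sum_{x^n \in \mathcal{X}^n}\, \max_{f \in \mathcal{F}} p_f(x^n)
        \qquad \text{(countable $\mathcal{X}$; replace the sum by an integral otherwise),}

    which coincides with the minimax individual-sequence regret for log loss over sequences of length $n$.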

    Wireless model-based predictive networked control system over cooperative wireless network

    Owing to their distributed architecture, networked control systems (NCSs) are proven to be feasible in scenarios where a spatially distributed feedback control system is required. Traditionally, such NCSs operate over real-time wired networks. Recently, in order to achieve the utmost flexibility, scalability, ease of deployment, and maintainability, wireless networks such as IEEE 802.11 wireless local area networks (LANs) are being preferred over dedicated wired networks. However, conventional NCSs with event-triggered controllers and actuators cannot operate over such general purpose wireless networks since the stability of the system is compromised due to unbounded delays and unpredictable packet losses that are typical in the wireless medium. Approaching the wireless networked control problem from two perspectives, this work introduces a practical wireless NCS and an implementation of a cooperative medium access control protocol that work jointly to achieve decent control under severe impairments, such as unbounded delay, bursts of packet loss and ambient wireless traffic. The proposed system is evaluated on a dedicated test platform under numerous scenarios and significant performance gains are observed, making cooperative communications a strong candidate for improving the reliability of industrial wireless networks.
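    The abstract gives no implementation details; the following Python fragment is only a generic sketch of the model-based compensation idea that such systems rely on (the linear plant model, gain, and all names are illustrative assumptions, not the authors' design).

        import numpy as np

        class ModelBasedPredictiveController:
            """Sketch: when a sensor packet is delayed or lost, fall back on the
            plant model to propagate the state estimate instead of stalling."""

            def __init__(self, A, B, K):
                self.A, self.B, self.K = A, B, K        # assumed linear plant model and feedback gain
                self.x_est = np.zeros(A.shape[0])       # last received or model-predicted state

            def step(self, measurement, packet_received):
                if packet_received:
                    self.x_est = measurement                        # fresh sensor data arrived in time
                u = -self.K @ self.x_est                            # control input from current estimate
                self.x_est = self.A @ self.x_est + self.B @ u       # predict ahead for the next period
                return u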

    Fast learning rates in statistical inference through aggregation

    We develop minimax optimal risk bounds for the general learning task consisting in predicting as well as the best function in a reference set $\mathcal{G}$ up to the smallest possible additive term, called the convergence rate. When the reference set is finite and when $n$ denotes the size of the training data, we provide minimax convergence rates of the form $C(\frac{\log|\mathcal{G}|}{n})^v$ with tight evaluation of the positive constant $C$ and with exact $0<v\le1$, the latter value depending on the convexity of the loss function and on the level of noise in the output distribution. The risk upper bounds are based on a sequential randomized algorithm, which at each step concentrates on functions having both low risk and low variance with respect to the previous step's prediction function. Our analysis puts forward the links between the probabilistic and worst-case viewpoints, and allows us to obtain risk bounds unachievable with the standard statistical learning approach. One of the key ideas of this work is to use probabilistic inequalities with respect to appropriate (Gibbs) distributions on the prediction function space instead of using them with respect to the distribution generating the data. The risk lower bounds are based on refinements of the Assouad lemma that take particular account of the properties of the loss function. Our key example to illustrate the upper and lower bounds is the $L_q$-regression setting, for which an exhaustive analysis of the convergence rates is given as $q$ ranges in $[1;+\infty[$. Comment: Published at http://dx.doi.org/10.1214/08-AOS623 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org)
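    The sequential randomized algorithm is only described at a high level above; a minimal exponential-weights (Gibbs) aggregation over a finite reference set, written in Python with illustrative names and a user-chosen temperature eta, conveys the flavour without reproducing the paper's exact variance-sensitive scheme.

        import numpy as np

        def gibbs_weights(losses, eta):
            """losses: (n, M) array with losses[t, j] the loss of reference function j at step t.
            Returns the sequence of Gibbs distributions over the M functions, one row per step."""
            n, M = losses.shape
            cum = np.zeros(M)
            weights = np.empty((n, M))
            for t in range(n):
                w = np.exp(-eta * (cum - cum.min()))    # subtract the minimum for numerical stability
                weights[t] = w / w.sum()                # Gibbs distribution on the reference set
                cum += losses[t]                        # accumulate losses observed so far
            return weights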

    Dynamic Metric Learning from Pairwise Comparisons

    Recent work in distance metric learning has focused on learning transformations of data that best align with specified pairwise similarity and dissimilarity constraints, often supplied by a human observer. The learned transformations lead to improved retrieval, classification, and clustering algorithms due to the better adapted distance or similarity measures. Here, we address the problem of learning these transformations when the underlying constraint generation process is nonstationary. This nonstationarity can be due to changes either in the ground-truth clustering used to generate constraints or in the feature subspaces in which the class structure is apparent. We propose Online Convex Ensemble StrongLy Adaptive Dynamic Learning (OCELAD), a general adaptive, online approach for learning and tracking optimal metrics as they change over time, which is highly robust to a variety of nonstationary behaviors in the changing metric. We apply the OCELAD framework to an ensemble of online learners. Specifically, we create a retro-initialized composite objective mirror descent (COMID) ensemble (RICE) consisting of a set of parallel COMID learners with different learning rates, demonstrate RICE-OCELAD on both real and synthetic data sets, and show significant performance improvements relative to previously proposed batch and online distance metric learning algorithms. Comment: to appear Allerton 2016. arXiv admin note: substantial text overlap with arXiv:1603.0367
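    RICE-OCELAD itself is not specified in the abstract; the Python sketch below only illustrates the general pattern it builds on (parallel online learners with different learning rates, reweighted according to their observed losses). All names and constants are assumptions made for illustration.

        import numpy as np

        class MultiRateEnsemble:
            """Illustrative only (not the authors' RICE-OCELAD): parallel online-gradient
            learners with different learning rates, combined by multiplicative weights."""

            def __init__(self, dim, rates=(0.01, 0.03, 0.1, 0.3), meta_eta=1.0):
                self.rates = rates
                self.meta_eta = meta_eta
                self.params = [np.zeros(dim) for _ in rates]        # one parameter vector per learner
                self.weights = np.ones(len(rates)) / len(rates)     # meta-weights over the learners

            def predict(self):
                return sum(w * p for w, p in zip(self.weights, self.params))

            def update(self, grad_fn, loss_fn):
                losses = np.array([loss_fn(p) for p in self.params])
                self.weights *= np.exp(-self.meta_eta * losses)     # downweight learners that did poorly
                self.weights /= self.weights.sum()
                for i, eta in enumerate(self.rates):                # each learner takes its own gradient step
                    self.params[i] = self.params[i] - eta * grad_fn(self.params[i])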

    Damping mechanisms for oscillations in solar prominences

    Small amplitude oscillations are a commonly observed feature in prominences/filaments. These oscillations appear to be of a local nature, are associated with the fine structure of prominence plasmas, and occur while simultaneous flows and counterflows are also present. The existing observational evidence reveals that small amplitude oscillations, once excited, are damped over short spatial and temporal scales by one or more physical mechanisms that are not yet well determined. Commonly, these oscillations have been interpreted in terms of linear magnetohydrodynamic (MHD) waves, and this paper reviews the theoretical damping mechanisms that have recently been put forward to explain the observed attenuation scales. These mechanisms include thermal effects, through non-adiabatic processes, mass flows, resonant damping in non-uniform media, and partial ionization effects. The relevance of each mechanism is assessed by comparing the spatial and time scales it produces with those obtained from observations. Also, the application of the latest theoretical results to perform prominence seismology is discussed, aiming to determine physical parameters in prominence plasmas that are difficult to measure by direct means. Comment: 36 pages, 16 figures, Space Science Reviews (accepted)
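    The damping scales mentioned above are usually quantified by fitting the observed velocity signal with an exponentially damped sinusoid; this is the standard parameterisation used in the field, not a result specific to this review:

        % Damped oscillation fit: $V_0$ amplitude, $\tau_\mathrm{D}$ damping time, $P = 2\pi/\omega$ period
        v(t) \;=\; V_0\, e^{-t/\tau_\mathrm{D}} \cos(\omega t + \phi),
        \qquad P = \frac{2\pi}{\omega},

    and the ratio $\tau_\mathrm{D}/P$ predicted by each theoretical mechanism is compared with the observed one.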