
    On the Sample Complexity of Predictive Sparse Coding

    The goal of predictive sparse coding is to learn a representation of examples as sparse linear combinations of elements from a dictionary, such that a learned hypothesis linear in the new representation performs well on a predictive task. Predictive sparse coding algorithms have recently demonstrated impressive performance on a variety of supervised tasks, but their generalization properties have not been studied. We establish the first generalization error bounds for predictive sparse coding, covering two settings: 1) the overcomplete setting, where the number of features k exceeds the original dimensionality d; and 2) the high- or infinite-dimensional setting, where only dimension-free bounds are useful. Both learning bounds depend intimately on stability properties of the learned sparse encoder, as measured on the training sample. Consequently, we first present a fundamental stability result for the LASSO, characterizing the stability of the sparse codes with respect to perturbations of the dictionary. In the overcomplete setting, we present an estimation error bound that decays as \tilde{O}(\sqrt{dk/m}) with respect to d and k. In the high- or infinite-dimensional setting, we show a dimension-free bound that is \tilde{O}(\sqrt{k^2 s/m}) with respect to k and s, where s is an upper bound on the number of non-zeros in the sparse code for any training data point.
    Comment: The Sparse Coding Stability Theorem from version 1 has been relaxed considerably using a new notion of coding margin. The old Sparse Coding Stability Theorem remains in the new version as Theorem 2. The presentation of all proofs has been simplified and improved considerably, and the paper has been reorganized. An empirical analysis shows the new coding margin is non-trivial on a real dataset.
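    The encoder this abstract refers to — a LASSO fit of each example against a fixed dictionary — can be sketched with a minimal ISTA (iterative soft-thresholding) solver. The dictionary, regularization weight, and iteration count below are illustrative assumptions, not values from the paper:

    ```python
    import numpy as np

    def soft_threshold(z, t):
        """Elementwise soft-thresholding: the proximal operator of the l1 norm."""
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    def lasso_code(D, x, lam=0.1, n_iter=200):
        """Sparse code of x w.r.t. dictionary D via ISTA:
        minimize 0.5 * ||x - D a||^2 + lam * ||a||_1 over a."""
        L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the smooth part's gradient
        a = np.zeros(D.shape[1])
        for _ in range(n_iter):
            grad = D.T @ (D @ a - x)           # gradient of 0.5 * ||x - D a||^2
            a = soft_threshold(a - grad / L, lam / L)
        return a

    rng = np.random.default_rng(0)
    d, k = 10, 25                              # overcomplete: k features > d dimensions
    D = rng.standard_normal((d, k))
    D /= np.linalg.norm(D, axis=0)             # unit-norm dictionary atoms
    x = rng.standard_normal(d)
    a = lasso_code(D, x)
    print(np.count_nonzero(a), "non-zeros out of", k)
    ```

    In the predictive setting, a linear hypothesis would then be trained on codes like `a` rather than on `x` directly; the paper's bounds concern how such a pipeline generalizes.
    
    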

    On uniform definability of types over finite sets

    In this paper, using definability of types over indiscernible sequences as a template, we study a property of formulas and theories called "uniform definability of types over finite sets" (UDTFS). We explore UDTFS and show how it relates to well-known properties in model theory. We recall that stable theories and weakly o-minimal theories have UDTFS and that UDTFS implies dependence. We then show that all dp-minimal theories have UDTFS.
    Comment: 17 pages, 0 figures

    Local Rademacher complexities

    We propose new bounds on the error of learning algorithms in terms of a data-dependent notion of complexity. The estimates we establish give optimal rates and are based on a local and empirical version of Rademacher averages, in the sense that the Rademacher averages are computed from the data, on a subset of functions with small empirical error. We present some applications to classification and prediction with convex function classes, and with kernel classes in particular.
    Comment: Published at http://dx.doi.org/10.1214/009053605000000282 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org).
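    The localized quantity this abstract describes — a Rademacher average computed from the data over only the functions with small empirical error — can be illustrated with a Monte-Carlo estimate over a finite class. The random ±1 predictors, the sample size, and the error quantile used as the localization radius are all illustrative assumptions:

    ```python
    import numpy as np

    def empirical_rademacher(F_vals, n_draws=2000, seed=0):
        """Monte-Carlo estimate of the empirical Rademacher average
        E_sigma[ max_f (1/m) * sum_i sigma_i * f(x_i) ],
        where F_vals is an (n_functions, m) array of function values on the sample."""
        rng = np.random.default_rng(seed)
        n_f, m = F_vals.shape
        total = 0.0
        for _ in range(n_draws):
            sigma = rng.choice([-1.0, 1.0], size=m)   # Rademacher signs
            total += np.max(F_vals @ sigma) / m        # sup over the class
        return total / n_draws

    rng = np.random.default_rng(1)
    m = 50
    preds = rng.choice([-1.0, 1.0], size=(100, m))     # 100 candidate classifiers on m points
    y = rng.choice([-1.0, 1.0], size=m)                # labels
    emp_err = np.mean(preds != y, axis=1)              # empirical 0-1 error of each classifier
    local = preds[emp_err <= np.quantile(emp_err, 0.2)]  # small-empirical-error subset
    print(empirical_rademacher(local), empirical_rademacher(preds))
    ```

    Because the localized average takes a supremum over a subset of the class, it can only be smaller than the global one for the same sign draws; the paper's point is that bounds built from this smaller, data-dependent quantity yield faster rates.
    
    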

    A library management information system in a multi-campus environment

    The Office of Library Services in the Central Administration of the State University of New York (SUNY) has, since 1975, been developing a library management information system based on the analysis of library and other bibliographic and academic data which are available in machine-readable form. Although primarily designed for the SUNY libraries, the processes are applicable in other academic libraries because of the general availability of the data used in the system. The task has changed over the years as new ideas and opportunities were realized, as new appreciations of the obtained results were attained, and as the technical environment has evolved. Nonetheless, the fundamental structure of the system design has not changed since the first ideas in 1974. This is an interim report. Progress has been agonizingly slow for two reasons. First, the difficulty of obtaining support and resources has been a real hindrance; the work has been squeezed into overcrowded schedules and ever-straitening budgets. Second, many of the machine-readable data which one confidently felt would be available in the late 1970s or very early 1980s are still not available. Some years, at least, will pass before the work can be completed as we see it now. Who knows what new ideas and opportunities will emerge as new results become available? Nonetheless, enough has been achieved to justify this report.
    Published or submitted for publication.