On the Sample Complexity of Predictive Sparse Coding
The goal of predictive sparse coding is to learn a representation of examples
as sparse linear combinations of elements from a dictionary, such that a
learned hypothesis linear in the new representation performs well on a
predictive task. Predictive sparse coding algorithms have recently demonstrated
impressive performance on a variety of supervised tasks, but their
generalization properties have not been studied. We establish the first
generalization error bounds for predictive sparse coding, covering two
settings: 1) the overcomplete setting, where the number of features k exceeds
the original dimensionality d; and 2) the high or infinite-dimensional setting,
where only dimension-free bounds are useful. Both learning bounds intimately
depend on stability properties of the learned sparse encoder, as measured on
the training sample. Consequently, we first present a fundamental stability
result for the LASSO, a result characterizing the stability of the sparse codes
with respect to perturbations to the dictionary. In the overcomplete setting,
we present an estimation error bound that decays as \tilde{O}(\sqrt{dk/m}) with
respect to d and k. In the high or infinite-dimensional setting, we show a
dimension-free bound that is \tilde{O}(\sqrt{k^2 s/m}) with respect to k and
s, where s is an upper bound on the number of non-zeros in the sparse code for
any training data point.
Comment: The Sparse Coding Stability Theorem from version 1 has been relaxed
considerably using a new notion of coding margin. The old Sparse Coding
Stability Theorem remains in the new version, now as Theorem 2. The
presentation of all proofs has been simplified and improved considerably, and
the paper has been reorganized. An empirical analysis shows that the new
coding margin is non-trivial on a real dataset.
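The sparse encoding step described above can be sketched concretely. The following is a minimal illustration, not the paper's algorithm or analysis: it solves the LASSO problem min_z 0.5*||x - Dz||^2 + lam*||z||_1 by iterative soft-thresholding (ISTA) for a random unit-norm dictionary in the overcomplete regime k > d. All names, the dictionary, and the parameter values are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    # Elementwise soft-thresholding, the proximal operator of the l1 norm.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_encode(D, x, lam=0.5, n_iter=500):
    """Sparse code for x under dictionary D via ISTA (proximal gradient).

    Minimizes 0.5 * ||x - D z||^2 + lam * ||z||_1.
    """
    k = D.shape[1]
    z = np.zeros(k)
    # Step size 1/L, where L = sigma_max(D)^2 bounds the gradient's Lipschitz constant.
    L = np.linalg.norm(D, 2) ** 2
    for _ in range(n_iter):
        grad = D.T @ (D @ z - x)          # gradient of the smooth quadratic part
        z = soft_threshold(z - grad / L, lam / L)
    return z

rng = np.random.default_rng(0)
d, k = 8, 16                              # overcomplete: k > d, as in setting 1)
D = rng.standard_normal((d, k))
D /= np.linalg.norm(D, axis=0)            # unit-norm dictionary atoms
x = rng.standard_normal(d)
z = lasso_encode(D, x)
print("non-zeros in sparse code:", np.count_nonzero(np.round(z, 8)))
```

The count of non-zeros in z plays the role of the quantity s above; larger lam drives the code sparser.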
On uniform definability of types over finite sets
In this paper, using definability of types over indiscernible sequences as a
template, we study a property of formulas and theories called "uniform
definability of types over finite sets" (UDTFS). We explore UDTFS and show how
it relates to well-known properties in model theory. We recall that stable
theories and weakly o-minimal theories have UDTFS and UDTFS implies dependence.
We then show that all dp-minimal theories have UDTFS.
Comment: 17 pages, 0 figures
Local Rademacher complexities
We propose new bounds on the error of learning algorithms in terms of a
data-dependent notion of complexity. The estimates we establish give optimal
rates and are based on a local and empirical version of Rademacher averages, in
the sense that the Rademacher averages are computed from the data, on a subset
of functions with small empirical error. We present some applications to
classification and prediction with convex function classes, and with kernel
classes in particular.
Comment: Published at http://dx.doi.org/10.1214/009053605000000282 in the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
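The data-dependent quantity underlying these bounds can be illustrated directly. The sketch below, a toy assumption rather than anything from the paper, estimates the empirical Rademacher average R_n = E_sigma sup_f (1/n) sum_i sigma_i f(x_i) by Monte Carlo for a small finite function class given as its evaluations on a sample; the local version would restrict the sup to functions with small empirical error.

```python
import numpy as np

def empirical_rademacher(F, n_draws=2000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher average.

    F: (num_functions, n) array; row j holds f_j evaluated on the n sample points.
    """
    rng = np.random.default_rng(seed)
    num_f, n = F.shape
    total = 0.0
    for _ in range(n_draws):
        sigma = rng.choice([-1.0, 1.0], size=n)   # i.i.d. Rademacher signs
        total += np.max(F @ sigma) / n            # sup over the (finite) class
    return total / n_draws

# Toy class: 5 random functions evaluated on a sample of size 50.
rng = np.random.default_rng(1)
F = rng.standard_normal((5, 50))
est = empirical_rademacher(F)
print("estimated empirical Rademacher average:", round(est, 4))
```

Because the averages are computed from the data themselves, the resulting complexity estimate, and hence the error bound, adapts to the sample at hand.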
Earth Observations Division version of the Laboratory for Applications of Remote Sensing system (EOD-LARSYS) user guide for the IBM 370/148. Volume 2: User's reference manual
There are no author-identified significant results in this report.
A library management information system in a multi-campus environment
The Office of Library Services in the Central Administration of the State
University of New York (SUNY) has, since 1975, been developing a library
management information system based on the analysis of library and other
bibliographic and academic data which are available in machine-readable
form. Although primarily designed for the SUNY libraries, the processes
are applicable in other academic libraries because of the general availability
of the data used in the system. The task has changed over the years as
new ideas and opportunities were realized, as new appreciations of the
obtained results were attained, and as the technical environment has
evolved. Nonetheless, the fundamental structure of the system design has
not changed since the first ideas in 1974.
This is an interim report. Progress has been agonizingly slow for two
reasons. First, the difficulty of obtaining support and resources has been a
real hindrance; the work has been squeezed into overcrowded schedules
and ever-straitening budgets. Second, many of the machine-readable data
which one confidently felt would be available in the late 1970s or very early
1980s are still not available. Some years, at least, will pass before the work
can be completed as we see it now. Who knows what new ideas and
opportunities will emerge as new results become available? Nonetheless,
enough has been achieved to justify this report.