
    Sample Complexity of Dictionary Learning and other Matrix Factorizations

    Many modern tools in machine learning and signal processing, such as sparse dictionary learning, principal component analysis (PCA), non-negative matrix factorization (NMF), K-means clustering, etc., rely on the factorization of a matrix obtained by concatenating high-dimensional vectors from a training collection. While the idealized task would be to optimize the expected quality of the factors over the underlying distribution of training vectors, it is achieved in practice by minimizing an empirical average over the considered collection. The focus of this paper is to provide sample complexity estimates to uniformly control how much the empirical average deviates from the expected cost function. Standard arguments imply that the performance of the empirical predictor also exhibits such guarantees. The level of genericity of the approach encompasses several possible constraints on the factors (tensor product structure, shift-invariance, sparsity, ...), thus providing a unified perspective on the sample complexity of several widely used matrix factorization schemes. The derived generalization bounds scale as $\sqrt{\log(n)/n}$ with respect to the number of samples n for the considered matrix factorization techniques. Comment: to appear
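    To make the quantity being bounded concrete, the sketch below (not from the paper) evaluates the empirical reconstruction cost of one fixed PCA-style factorization, one of the schemes the abstract covers, on n training samples and compares it with a large-sample estimate of the expected cost; the Gaussian toy distribution, the candidate factor matrix D, and all other names are assumptions made purely for illustration.

```python
# Sketch: empirical vs. expected cost for a fixed PCA-style factorization.
# The toy distribution and the candidate factors D are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d, k = 20, 5                                  # ambient dimension, number of factors

def cost(X, D):
    """Average squared residual of projecting the rows of X onto span(D)."""
    Q, _ = np.linalg.qr(D)                    # orthonormal basis of the factor span
    residual = X - (X @ Q) @ Q.T
    return np.mean(np.sum(residual**2, axis=1))

D = rng.standard_normal((d, k))               # a fixed candidate set of factors
X_pop = rng.standard_normal((200_000, d))     # large sample standing in for the distribution
expected = cost(X_pop, D)

for n in [100, 1_000, 10_000]:
    empirical = cost(rng.standard_normal((n, d)), D)
    print(f"n={n:6d}  |empirical - expected| = {abs(empirical - expected):.4f}  "
          f"sqrt(log(n)/n) = {np.sqrt(np.log(n) / n):.4f}")
```

    For a single fixed factorization the fluctuation is of order 1/sqrt(n); the extra log(n) factor in the paper's $\sqrt{\log(n)/n}$ rate comes from requiring uniform control over all admissible factorizations.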

    Efficient sphere-covering and converse measure concentration via generalized coding theorems

    Suppose A is a finite set equipped with a probability measure P and let M be a "mass" function on A. We give a probabilistic characterization of the most efficient way in which A^n can be almost-covered using spheres of a fixed radius. An almost-covering is a subset C_n of A^n, such that the union of the spheres centered at the points of C_n has probability close to one with respect to the product measure P^n. An efficient covering is one with small mass M^n(C_n); n is typically large. With different choices for M and the geometry on A our results give various corollaries as special cases, including Shannon's data compression theorem, a version of Stein's lemma (in hypothesis testing), and a new converse to some measure concentration inequalities on discrete spaces. Under mild conditions, we generalize our results to abstract spaces and non-product measures. Comment: 29 pages. See also http://www.stat.purdue.edu/~yiannis
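    As a toy illustration of the almost-covering notion (not the paper's construction), the sketch below draws a center set C_n at random from the product measure on a binary alphabet and estimates by Monte Carlo the P^n-mass of the union of Hamming spheres of radius r around the centers; the alphabet, radius, and sample sizes are all assumptions.

```python
# Sketch: Monte Carlo estimate of how much P^n-mass a random center set C_n
# covers with Hamming spheres of radius r (binary alphabet, Bernoulli measure).
import numpy as np

rng = np.random.default_rng(1)
n, p, r = 50, 0.3, 12                    # block length, Bernoulli bias, sphere radius
num_centers, num_test = 200, 5_000

centers = (rng.random((num_centers, n)) < p).astype(np.int8)   # C_n, drawn i.i.d. from P^n
test = (rng.random((num_test, n)) < p).astype(np.int8)         # fresh samples from P^n

# A test point is covered if some center lies within Hamming distance r of it.
dist = np.count_nonzero(test[:, None, :] != centers[None, :, :], axis=2)
covered = (dist.min(axis=1) <= r).mean()
print(f"estimated P^n-mass covered by the spheres: {covered:.3f}")
```

    The paper's characterization describes how small the mass M^n(C_n) of such a set can be made while keeping this covered mass close to one.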

    Measuring cell adhesion forces with the atomic force microscope at the molecular level

    In the past 25 years many techniques have been developed to characterize cell adhesion and to quantify adhesion forces. Atomic force microscopy (AFM) has been used to measure forces in the pico-newton range, an experimental technique known as force spectroscopy. We modified such an AFM to measure adhesion forces between live cells or between cells and surfaces. This strategy required functionalizing the surface of the sensors for immobilizing the cell. We used Dictyostelium discoideum cells, which respond to starvation by surface expression of the adhesion molecule csA and consequent aggregation, to measure the adhesion force of a single csA-csA bond. Relevant experimental parameters include the duration of contact between the interacting surfaces, the force against which this contact is maintained, the number and specificity of interacting adhesion molecules and the constituents of the medium in which the interaction occurs. This technology also permits the measurement of the viscoelastic properties of single cells or cell layers. Copyright (C) 2002 S. Karger AG, Basel

    The Sample Complexity of Dictionary Learning

    A large set of signals can sometimes be described sparsely using a dictionary, that is, every element can be represented as a linear combination of few elements from the dictionary. Algorithms for various signal processing applications, including classification, denoising and signal separation, learn a dictionary from a set of signals to be represented. Can we expect that the representation found by such a dictionary for a previously unseen example from the same source will have L_2 error of the same magnitude as those for the given examples? We assume signals are generated from a fixed distribution, and study this question from a statistical learning theory perspective. We develop generalization bounds on the quality of the learned dictionary for two types of constraints on the coefficient selection, as measured by the expected L_2 error in representation when the dictionary is used. For the case of l_1 regularized coefficient selection we provide a generalization bound of the order of O(sqrt(np log(m lambda)/m)), where n is the dimension, p is the number of elements in the dictionary, lambda is a bound on the l_1 norm of the coefficient vector and m is the number of samples, which complements existing results. For the case of representing a new signal as a combination of at most k dictionary elements, we provide a bound of the order O(sqrt(np log(m k)/m)) under an assumption on the level of orthogonality of the dictionary (low Babel function). We further show that this assumption holds for most dictionaries in high dimensions in a strong probabilistic sense. Our results further yield fast rates of order 1/m as opposed to 1/sqrt(m) using localized Rademacher complexity. We provide similar results in a general setting using kernels with weak smoothness requirements.
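    A quick way to see the quantity being bounded (an illustration only, not the paper's experiments): learn a dictionary with l_1-regularized coefficient selection on m training signals and compare the average L_2 representation error on the training set with the error on fresh signals from the same source. The toy generative model, the use of scikit-learn's DictionaryLearning, and all parameter values are assumptions.

```python
# Sketch: generalization gap of an l_1-learned dictionary, training vs. held-out
# representation error. Toy data and parameters are illustrative assumptions.
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(2)
n_dim, p_atoms, m_train = 16, 32, 400

# Toy source: sparse combinations of a hidden dictionary plus a little noise.
D_true = rng.standard_normal((p_atoms, n_dim))

def sample(m):
    codes = rng.standard_normal((m, p_atoms)) * (rng.random((m, p_atoms)) < 0.1)
    return codes @ D_true + 0.01 * rng.standard_normal((m, n_dim))

X_train, X_test = sample(m_train), sample(m_train)

dl = DictionaryLearning(n_components=p_atoms, alpha=0.5, max_iter=200,
                        transform_algorithm="lasso_lars", random_state=0)
codes_train = dl.fit_transform(X_train)
codes_test = dl.transform(X_test)

def rep_error(X, codes):
    """Average L_2 error of representing the rows of X with the learned dictionary."""
    return np.mean(np.linalg.norm(X - codes @ dl.components_, axis=1))

print(f"train error: {rep_error(X_train, codes_train):.3f}   "
      f"held-out error: {rep_error(X_test, codes_test):.3f}")
```

    The gap between the two printed errors is what the O(sqrt(np log(m lambda)/m)) bound controls uniformly over dictionaries.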

    Almost Lossless Analog Compression without Phase Information

    We propose an information-theoretic framework for phase retrieval. Specifically, we consider the problem of recovering an unknown n-dimensional vector x up to an overall sign factor from m=Rn phaseless measurements with compression rate R and derive a general achievability bound for R. Surprisingly, it turns out that this bound on the compression rate is the same as the one for almost lossless analog compression obtained by Wu and Verdú (2010): Phaseless linear measurements are as good as linear measurements with full phase information in the sense that ignoring the sign of m measurements only leaves us with an ambiguity with respect to an overall sign factor of x.
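    The measurement model is easy to write down; the sketch below (an illustration, not the paper's achievability argument) forms m = Rn phaseless measurements y_i = |<a_i, x>| of a compressible signal and checks the residual sign ambiguity. The Gaussian measurement matrix and all parameter values are assumptions.

```python
# Sketch: phaseless linear measurements y = |Ax| at compression rate R = m/n,
# and the overall sign ambiguity that remains. A and x are illustrative choices.
import numpy as np

rng = np.random.default_rng(3)
n, R = 50, 0.8                        # signal dimension and compression rate
m = int(R * n)

x = rng.standard_normal(n)
x[rng.random(n) < 0.7] = 0.0          # a sparse (compressible) signal
A = rng.standard_normal((m, n))       # m linear functionals
y = np.abs(A @ x)                     # measurements with the signs discarded

# x and -x produce identical phaseless measurements, so any recovery scheme can
# determine x at best up to an overall sign factor.
print(np.allclose(y, np.abs(A @ (-x))))   # True
```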

    A Remark on Unified Error Exponents: Hypothesis Testing, Data Compression and Measure Concentration

    Let A be a finite set equipped with a probability distribution P, and let M be a “mass” function on A. A characterization is given for the most efficient way in which A^n can be covered using spheres of a fixed radius. A covering is a subset C_n of A^n with the property that most of the elements of A^n are within some fixed distance from at least one element of C_n, and “most of the elements” means a set whose probability is exponentially close to one (with respect to the product distribution P^n). An efficient covering is one with small mass M^n(C_n). With different choices for the geometry on A, this characterization gives various corollaries as special cases, including Marton’s error-exponents theorem in lossy data compression, Hoeffding’s optimal hypothesis testing exponents, and a new sharp converse to some measure concentration inequalities on discrete spaces.
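    For the hypothesis-testing corollary, the exponent can be checked numerically in a simple binary case (an illustration, not the paper's argument): when testing P = Bernoulli(p) against Q = Bernoulli(q) with i.i.d. samples, the optimal type-II error at a fixed type-I error level decays roughly like exp(-n D(P||Q)). The distributions, error level, and block lengths below are assumptions.

```python
# Sketch: Stein/Hoeffding-type error exponent for testing P = Bern(p) vs Q = Bern(q).
# The best type-II error at type-I error <= eps decays roughly as exp(-n * D(P||Q)).
import numpy as np
from scipy.stats import binom

p, q, eps = 0.2, 0.5, 0.05
kl = p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))   # D(P||Q)

for n in [100, 400, 1600]:
    # With p < q the likelihood ratio P/Q decreases in the number of ones, so the
    # optimal "accept P" region is {at most k_star ones}; choose the smallest
    # k_star that keeps the type-I error below eps.
    k_star = int(binom.ppf(1 - eps, n, p))
    beta = binom.cdf(k_star, n, q)        # type-II error: accepting P under Q
    print(f"n={n:5d}  -log(beta)/n = {-np.log(beta) / n:.4f}   D(P||Q) = {kl:.4f}")
```

    The printed exponent approaches D(P||Q) as n grows, which is the behavior the covering characterization recovers as a special case.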