Sample Complexity of Dictionary Learning and other Matrix Factorizations
Many modern tools in machine learning and signal processing, such as sparse
dictionary learning, principal component analysis (PCA), non-negative matrix
factorization (NMF), K-means clustering, etc., rely on the factorization of a
matrix obtained by concatenating high-dimensional vectors from a training
collection. While the idealized task would be to optimize the expected quality
of the factors over the underlying distribution of training vectors, it is
achieved in practice by minimizing an empirical average over the considered
collection. The focus of this paper is to provide sample complexity estimates
to uniformly control how much the empirical average deviates from the expected
cost function. Standard arguments imply that the performance of the empirical
predictor also exhibits such guarantees. The level of genericity of the approach
encompasses several possible constraints on the factors (tensor product
structure, shift-invariance, sparsity \ldots), thus providing a unified
perspective on the sample complexity of several widely used matrix
factorization schemes. The derived generalization bounds behave proportionally
w.r.t.\ the number of samples for the considered matrix factorization
techniques.
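The empirical-versus-expected distinction the abstract draws can be made concrete with a toy sketch; the choice of rank-k PCA as the factorization, the data model, and all dimensions below are our illustrative assumptions, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (our choices): data drawn from a low-rank-plus-noise
# distribution; the "factorization" is projection onto a rank-k subspace.
dim, k, n_train, n_test = 20, 3, 200, 100_000

U = np.linalg.qr(rng.normal(size=(dim, k)))[0]   # ground-truth subspace

def sample(n):
    return rng.normal(size=(n, k)) @ U.T + 0.1 * rng.normal(size=(n, dim))

X_train = sample(n_train)

# Learn the factor by minimizing the EMPIRICAL average reconstruction
# error over the training collection (here: truncated SVD, i.e. PCA).
_, _, Vt = np.linalg.svd(X_train, full_matrices=False)
D = Vt[:k].T                       # learned basis, shape (dim, k)

def avg_err(X):
    R = X - X @ D @ D.T            # residual after projecting onto span(D)
    return np.mean(np.sum(R**2, axis=1))

emp = avg_err(X_train)             # empirical cost on the training set
exp_est = avg_err(sample(n_test))  # Monte-Carlo estimate of the expected cost
print(f"empirical {emp:.4f} vs. expected (estimated) {exp_est:.4f}")
```

The gap between `emp` and `exp_est` is precisely the kind of deviation the paper's sample complexity estimates control, uniformly over the admissible factors.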
Efficient sphere-covering and converse measure concentration via generalized coding theorems
Suppose A is a finite set equipped with a probability measure P and let M be
a ``mass'' function on A. We give a probabilistic characterization of the most
efficient way in which A^n can be almost-covered using spheres of a fixed
radius. An almost-covering is a subset C_n of A^n, such that the union of the
spheres centered at the points of C_n has probability close to one with respect
to the product measure P^n. An efficient covering is one with small mass
M^n(C_n); n is typically large. With different choices for M and the geometry
on A our results give various corollaries as special cases, including Shannon's
data compression theorem, a version of Stein's lemma (in hypothesis testing),
and a new converse to some measure concentration inequalities on discrete
spaces. Under mild conditions, we generalize our results to abstract spaces and
non-product measures.
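A minimal sketch of an almost-covering in the abstract's sense, on the Hamming cube with a Bernoulli product measure; the greedy construction and all parameters are our illustrative choices, not the paper's:

```python
import itertools

# Toy almost-covering (our parameters): A = {0,1}, P = Bernoulli(0.2) i.i.d.,
# spheres = Hamming balls of radius r in A^n, mass M = counting measure.
n, r, p = 8, 1, 0.2

points = list(itertools.product((0, 1), repeat=n))
prob = {x: p**sum(x) * (1 - p)**(n - sum(x)) for x in points}

def hamming(a, b):
    return sum(u != v for u, v in zip(a, b))

balls = {c: {x for x in points if hamming(c, x) <= r} for c in points}

# Greedy almost-covering: pick centers until the union of their balls
# has product-measure probability at least 0.95.
uncovered = set(points)
centers = []
while sum(prob[x] for x in uncovered) > 0.05:
    best = max(points, key=lambda c: sum(prob[x] for x in balls[c] & uncovered))
    centers.append(best)
    uncovered -= balls[best]

print(len(centers), "centers suffice for a 95% almost-covering")
```

Greedy is only a construction; the paper's result characterizes how small the mass M^n(C_n) of the most efficient such C_n can be as n grows.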
Measuring cell adhesion forces with the atomic force microscope at the molecular level
In the past 25 years many techniques have been developed to characterize cell adhesion and to quantify adhesion forces. Atomic force microscopy (AFM) has been used to measure forces in the pico-newton range, an experimental technique known as force spectroscopy. We modified such an AFM to measure adhesion forces between live cells or between cells and surfaces. This strategy required functionalizing the surface of the sensors for immobilizing the cell. We used Dictyostelium discoideum cells, which respond to starvation by surface expression of the adhesion molecule csA and consequent aggregation, to measure the adhesion force of a single csA-csA bond. Relevant experimental parameters include the duration of contact between the interacting surfaces, the force against which this contact is maintained, the number and specificity of interacting adhesion molecules and the constituents of the medium in which the interaction occurs. This technology also permits the measurement of the viscoelastic properties of single cells or cell layers. Copyright (C) 2002 S. Karger AG, Basel.
The Sample Complexity of Dictionary Learning
A large set of signals can sometimes be described sparsely using a
dictionary, that is, every element can be represented as a linear combination
of few elements from the dictionary. Algorithms for various signal processing
applications, including classification, denoising and signal separation, learn
a dictionary from a set of signals to be represented. Can we expect that the
representation found by such a dictionary for a previously unseen example from
the same source will have L_2 error of the same magnitude as that for the
given examples? We assume signals are generated from a fixed distribution, and
study this question from a statistical learning theory perspective.
We develop generalization bounds on the quality of the learned dictionary for
two types of constraints on the coefficient selection, as measured by the
expected L_2 error in representation when the dictionary is used. For the case
of l_1 regularized coefficient selection we provide a generalization bound of
the order of O(sqrt(np log(m lambda)/m)), where n is the dimension, p is the
number of elements in the dictionary, lambda is a bound on the l_1 norm of the
coefficient vector and m is the number of samples, which complements existing
results. For the case of representing a new signal as a combination of at most
k dictionary elements, we provide a bound of the order O(sqrt(np log(m k)/m))
under an assumption on the level of orthogonality of the dictionary (low Babel
function). We further show that this assumption holds for most dictionaries in
high dimensions in a strong probabilistic sense. Our results further yield fast
rates of order 1/m as opposed to 1/sqrt(m) using localized Rademacher
complexity. We provide similar results in a general setting using kernels with
weak smoothness requirements.
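The quantity these bounds control, the expected L_2 representation error under l_1-regularized coefficient selection, can be sketched as follows; the random unit-norm dictionary and the ISTA solver are our illustrative choices, not the paper's method:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative choices (ours): random unit-norm dictionary, Gaussian signal.
n, p, lam = 16, 32, 0.1            # dimension, dictionary size, l_1 weight
D = rng.normal(size=(n, p))
D /= np.linalg.norm(D, axis=0)     # unit-norm dictionary elements
x = rng.normal(size=n)

def soft(v, t):
    """Soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# ISTA for the l_1-regularized coefficient selection:
#   min_a 0.5 * ||x - D a||_2^2 + lam * ||a||_1
step = 1.0 / np.linalg.norm(D, 2) ** 2   # 1/L, L = squared spectral norm of D
a = np.zeros(p)
for _ in range(500):
    a = soft(a + step * D.T @ (x - D @ a), step * lam)

err = np.linalg.norm(x - D @ a)    # the L_2 representation error
print(f"||x - Da||_2 = {err:.3f}, ||a||_1 = {np.abs(a).sum():.3f}")
```

The generalization question in the abstract is whether `err` for a fresh signal x concentrates around its training-set counterpart, with the stated O(sqrt(np log(m lambda)/m)) deviation.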
Almost Lossless Analog Compression without Phase Information
We propose an information-theoretic framework for phase retrieval.
Specifically, we consider the problem of recovering an unknown n-dimensional
vector x up to an overall sign factor from m=Rn phaseless measurements with
compression rate R and derive a general achievability bound for R.
Surprisingly, it turns out that this bound on the compression rate is the same
as the one for almost lossless analog compression obtained by Wu and Verd\'u
(2010): Phaseless linear measurements are as good as linear measurements with
full phase information in the sense that ignoring the sign of m measurements
only leaves us with an ambiguity with respect to an overall sign factor of x.
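The sign ambiguity in the real-valued case can be checked in a few lines; the dimensions and the Gaussian measurement matrix below are our illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative dimensions (ours): n-dimensional real signal, m = R*n
# phaseless measurements taken with a Gaussian measurement matrix A.
n, m = 8, 24
A = rng.normal(size=(m, n))
x = rng.normal(size=n)

y = np.abs(A @ x)                  # real case: "phaseless" means sign-less

# x and -x produce identical measurements: only the overall sign is lost.
print(np.allclose(np.abs(A @ (-x)), y))   # prints True
```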
A Remark on Unified Error Exponents: Hypothesis Testing, Data Compression and Measure Concentration
Let A be a finite set equipped with a probability distribution P, and let M be a "mass" function on A. A characterization is given for the most efficient way in which A^n can be covered using spheres of a fixed radius. A covering is a subset C_n of A^n with the property that most of the elements of A^n are within some fixed distance from at least one element of C_n, where "most of the elements" means a set whose probability is exponentially close to one (with respect to the product distribution P^n). An efficient covering is one with small mass M^n(C_n). With different choices for the geometry on A, this characterization gives various corollaries as special cases, including Marton's error-exponents theorem in lossy data compression, Hoeffding's optimal hypothesis testing exponents, and a new sharp converse to some measure concentration inequalities on discrete spaces.