Limits on Support Recovery with Probabilistic Models: An Information-Theoretic Framework
The support recovery problem consists of determining a sparse subset of a set
of variables that is relevant in generating a set of observations, and arises
in a diverse range of settings such as compressive sensing, subset selection
in regression, and group testing. In this paper, we take a unified
approach to support recovery problems, considering general probabilistic models
relating a sparse data vector to an observation vector. We study the
information-theoretic limits of both exact and partial support recovery, taking
a novel approach motivated by thresholding techniques in channel coding. We
provide general achievability and converse bounds characterizing the trade-off
between the error probability and number of measurements, and we specialize
these to the linear, 1-bit, and group testing models. In several cases, our
bounds not only provide matching scaling laws in the necessary and sufficient
number of measurements, but also sharp thresholds with matching constant
factors. Our approach has several advantages over previous approaches: For the
achievability part, we obtain sharp thresholds under broader scalings of the
sparsity level and other parameters (e.g., signal-to-noise ratio) compared to
several previous works, and for the converse part, we not only provide
conditions under which the error probability fails to vanish, but also
conditions under which it tends to one.
Comment: Accepted to IEEE Transactions on Information Theory; presented in part at ISIT 2015 and SODA 201
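As a concrete illustration of the kind of probabilistic model the abstract refers to, the following sketch generates observations under textbook forms of the linear and 1-bit models (y = Xβ + noise and y = sign(Xβ + noise)); all sizes and variable names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k = 200, 1000, 10                    # measurements, variables, sparsity (illustrative)

support = rng.choice(p, size=k, replace=False)
beta = np.zeros(p)
beta[support] = rng.normal(size=k)         # sparse data vector

X = rng.normal(size=(n, p)) / np.sqrt(n)   # i.i.d. Gaussian measurement matrix
noise = 0.1 * rng.normal(size=n)

y_linear = X @ beta + noise                # linear model
y_1bit = np.sign(X @ beta + noise)         # 1-bit (sign) model

# Exact support recovery asks whether a decoder can return exactly `support`
# from (X, y); partial recovery tolerates a small number of errors.
```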
Kernel Conjugate Gradient Methods with Random Projections
We propose and study kernel conjugate gradient methods (KCGM) with random
projections for least-squares regression over a separable Hilbert space.
Considering two types of random projections generated by randomized sketches
and Nystr\"{o}m subsampling, we prove optimal statistical results with respect
to variants of norms for the algorithms under a suitable stopping rule.
Particularly, our results show that if the projection dimension is proportional
to the effective dimension of the problem, KCGM with randomized sketches can
generalize optimally, while achieving a computational advantage. As a
corollary, we derive optimal rates for classic KCGM in the case that the target
function may not be in the hypothesis space, filling a theoretical gap.
Comment: 43 pages, 2 figures
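The following is a minimal sketch of the general idea: Nyström subsampling builds a low-dimensional feature map, and early-stopped conjugate gradient is run on the resulting least-squares problem. The kernel choice, projection dimension, and stopping rule below are illustrative assumptions, not the paper's exact algorithm or analysis.

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def gaussian_kernel(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom_cg(X, y, m=50, iters=20, gamma=1.0, jitter=1e-8):
    """Early-stopped CG for kernel least squares in a Nystrom-projected space
    (a hedged illustration of the idea, not the paper's exact method)."""
    rng = np.random.default_rng(0)
    idx = rng.choice(len(X), size=m, replace=False)
    K_mm = gaussian_kernel(X[idx], X[idx], gamma) + jitter * np.eye(m)
    K_nm = gaussian_kernel(X, X[idx], gamma)
    L = cholesky(K_mm, lower=True)
    Phi = solve_triangular(L, K_nm.T, lower=True).T     # Phi @ Phi.T approximates K
    A, b = Phi.T @ Phi, Phi.T @ y                       # normal equations in the projected space
    w, r = np.zeros(m), b.copy()
    p = r.copy()
    for _ in range(iters):                              # the iteration count plays the role of the regularizer
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        w = w + alpha * p
        r_new = r - alpha * Ap
        p = r_new + ((r_new @ r_new) / (r @ r)) * p
        r = r_new
    c = solve_triangular(L, w, lower=True, trans='T')   # map back to kernel coefficients
    return lambda Z: gaussian_kernel(Z, X[idx], gamma) @ c

# Example usage on synthetic 1-D data.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(300, 1))
y = np.sin(3 * X[:, 0]) + 0.1 * rng.normal(size=300)
predict = nystrom_cg(X, y, m=40, iters=15, gamma=5.0)
y_hat = predict(X)
```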
Phase Transitions in the Pooled Data Problem
In this paper, we study the pooled data problem of identifying the labels
associated with a large collection of items, based on a sequence of pooled
tests revealing the counts of each label within the pool. In the noiseless
setting, we identify an exact asymptotic threshold on the required number of
tests with optimal decoding, and prove a phase transition between complete
success and complete failure. In addition, we present a novel noisy variation
of the problem, and provide an information-theoretic framework for
characterizing the required number of tests for general random noise models.
Our results reveal that noise can make the problem considerably more difficult,
with strict increases in the scaling laws even at low noise levels. Finally, we
demonstrate similar behavior in an approximate recovery setting, where a given
number of errors is allowed in the decoded labels.
Comment: Accepted to NIPS 201
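For concreteness, here is a small sketch of the noiseless measurement model described in the abstract: items carry hidden labels, and each test reports the histogram of labels within a randomly chosen pool. The pool construction, number of tests, and alphabet size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, T = 100, 3, 30                       # items, label alphabet size, tests (illustrative)
labels = rng.integers(0, d, size=n)        # hidden label of each item

def pooled_test(pool):
    """Noiseless pooled test: the count of each label within the pool."""
    return np.bincount(labels[pool], minlength=d)

tests = []
for _ in range(T):
    pool = np.flatnonzero(rng.random(n) < 0.5)   # each item joins a pool w.p. 1/2
    tests.append((pool, pooled_test(pool)))

# The decoding task is to recover `labels` from the (pool, count-vector) pairs;
# a noisy variant would perturb the returned counts.
```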
Near-Optimal Noisy Group Testing via Separate Decoding of Items
The group testing problem consists of determining a small set of defective
items from a larger set of items based on a number of tests, and is relevant in
applications such as medical testing, communication protocols, pattern
matching, and more. In this paper, we revisit an efficient algorithm for noisy
group testing in which each item is decoded separately (Malyutov and Mateev,
1980), and develop novel performance guarantees via an information-theoretic
framework for general noise models. For the special cases of no noise and
symmetric noise, we find that the asymptotic number of tests required for
vanishing error probability is within a factor of the
information-theoretic optimum at low sparsity levels, and that with a small
fraction of allowed incorrectly decoded items, this guarantee extends to all
sublinear sparsity levels. In addition, we provide a converse bound showing
that if one tries to move slightly beyond our low-sparsity achievability
threshold using separate decoding of items and i.i.d. randomized testing, the
average number of items decoded incorrectly approaches that of a trivial
decoder.
Comment: Submitted to IEEE Journal of Selected Topics in Signal Processing
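The following sketch illustrates the flavor of separate decoding of items under an i.i.d. Bernoulli test design with symmetric noise: each item is classified using only the outcomes of the tests that contain it. The simple frequency threshold used here stands in for the paper's likelihood-based per-item statistic and is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, T, rho = 500, 10, 300, 0.05          # items, defectives, tests, flip probability (illustrative)
defective = np.zeros(n, dtype=bool)
defective[rng.choice(n, size=k, replace=False)] = True

# i.i.d. Bernoulli test design: each item joins each test independently.
X = rng.random((T, n)) < np.log(2) / k
y = (X & defective).any(axis=1)            # noiseless OR of the defectives in each test
y ^= rng.random(T) < rho                   # symmetric noise: flip each outcome w.p. rho

# Separate decoding: classify each item from its own tests only, here via the
# fraction of positive outcomes among the tests that contain it.
scores = np.array([y[X[:, j]].mean() if X[:, j].any() else 0.0 for j in range(n)])
decoded = scores > 0.75                    # illustrative threshold; the paper uses a likelihood-based rule

print("items decoded incorrectly:", int(np.sum(decoded != defective)))
```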
A Primal-Dual Algorithmic Framework for Constrained Convex Minimization
We present a primal-dual algorithmic framework to obtain approximate
solutions to a prototypical constrained convex optimization problem, and
rigorously characterize how common structural assumptions affect the numerical
efficiency. Our main analysis technique provides a fresh perspective on
Nesterov's excessive gap technique in a structured fashion and unifies it with
smoothing and primal-dual methods. For instance, through the choices of a dual
smoothing strategy and a center point, our framework subsumes decomposition
algorithms, the augmented Lagrangian method, and the alternating direction
method of multipliers as special cases, and provides optimal convergence
rates on the primal objective residual as well as the primal feasibility gap
of the iterates for all of them.
Comment: This paper consists of 54 pages with 7 tables and 12 figures
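One of the special cases named in the abstract is the alternating direction method of multipliers (ADMM). The sketch below shows textbook scaled-form ADMM on a small sparse-regression instance; it illustrates that special case only, not the paper's general primal-dual framework, and the problem and parameters are illustrative assumptions.

```python
import numpy as np

def admm_lasso(A, b, lam, rho=1.0, iters=200):
    """Textbook scaled-form ADMM for min 0.5*||Ax - b||^2 + lam*||z||_1  s.t.  x = z
    (one special case named in the abstract, not the paper's framework)."""
    n = A.shape[1]
    AtA, Atb = A.T @ A, A.T @ b
    L = np.linalg.cholesky(AtA + rho * np.eye(n))        # cached factor for the x-update
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    for _ in range(iters):
        rhs = Atb + rho * (z - u)                        # x-update: (A^T A + rho I) x = rhs
        x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))
        v = x + u                                        # z-update: soft-thresholding (prox of l1)
        z = np.sign(v) * np.maximum(np.abs(v) - lam / rho, 0.0)
        u = u + x - z                                    # scaled dual update
    return z

# A small random sparse-regression instance.
rng = np.random.default_rng(3)
A = rng.normal(size=(50, 100))
x_true = np.zeros(100)
x_true[:5] = 1.0
b = A @ x_true + 0.01 * rng.normal(size=50)
x_hat = admm_lasso(A, b, lam=0.1)
```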
Convergence of the Exponentiated Gradient Method with Armijo Line Search
Consider the problem of minimizing a convex differentiable function on the
probability simplex, spectrahedron, or set of quantum density matrices. We
prove that the exponentiated gradient method with Armijo line search always
converges to the optimum, if the sequence of the iterates possesses a strictly
positive limit point (element-wise for the vector case, and with respect to the
L\"{o}wner partial ordering for the matrix case). To the best of our knowledge, this
is the first convergence result for a mirror descent-type method that only
requires differentiability. The proof exploits self-concordant likeness of the
log-partition function, which is of independent interest.
Comment: 18 pages
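A minimal sketch of the vector case (minimization over the probability simplex) may help fix ideas: the exponentiated gradient update is a multiplicative step followed by renormalization, with the step-size chosen by Armijo backtracking. The particular sufficient-decrease test and constants below are standard choices assumed for illustration, not necessarily those analyzed in the paper.

```python
import numpy as np

def eg_armijo(f, grad, x0, eta0=1.0, beta=0.5, sigma=1e-4, iters=200):
    """Exponentiated gradient on the probability simplex with Armijo backtracking
    (vector case; constants and acceptance test are standard choices, assumed here)."""
    x = x0.copy()
    for _ in range(iters):
        g = grad(x)
        eta = eta0
        for _ in range(50):                              # backtracking loop, capped for safety
            y = x * np.exp(-eta * g)
            y /= y.sum()                                 # multiplicative step, renormalized to the simplex
            if f(y) <= f(x) + sigma * (g @ (y - x)):     # Armijo sufficient-decrease test
                break
            eta *= beta
        x = y
    return x

# Example: minimize a convex quadratic over the simplex.
rng = np.random.default_rng(4)
Q = rng.normal(size=(5, 5))
Q = Q @ Q.T + np.eye(5)
x_opt = eg_armijo(lambda x: 0.5 * x @ Q @ x, lambda x: Q @ x, np.full(5, 0.2))
```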
On the linear convergence of the stochastic gradient method with constant step-size
The strong growth condition (SGC) is known to be a sufficient condition for
linear convergence of the stochastic gradient method using a constant step-size
(SGM-CS). In this paper, we provide a necessary condition for the linear
convergence of SGM-CS that is weaker than SGC. Moreover, when this necessary
condition is violated up to an additive perturbation, we show that both
the projected stochastic gradient method using a constant step-size (PSGM-CS)
and the proximal stochastic gradient method exhibit linear convergence to a
noise-dominated region, whose distance to the optimal solution is proportional to …
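A small numerical sketch of the phenomenon described above: on an interpolating least-squares problem (where an SGC-type condition holds), SGM with a constant step-size converges linearly, while adding a perturbation to the stochastic gradients makes the iterates plateau in a noise-dominated region around the solution. The problem instance and parameters are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(5)
n, d = 200, 20
A = rng.normal(size=(n, d))
x_star = rng.normal(size=d)
b = A @ x_star                              # interpolation: every component shares the minimizer x_star

def sgm_cs(noise_std, step=0.01, iters=3000):
    """Stochastic gradient with a constant step-size on 0.5*(a_i^T x - b_i)^2."""
    x = np.zeros(d)
    dists = []
    for _ in range(iters):
        i = rng.integers(n)
        g = (A[i] @ x - b[i]) * A[i] + noise_std * rng.normal(size=d)  # perturbed stochastic gradient
        x = x - step * g
        dists.append(np.linalg.norm(x - x_star))
    return dists

clean = sgm_cs(noise_std=0.0)               # distance to x_star shrinks geometrically
noisy = sgm_cs(noise_std=0.1)               # distance plateaus in a noise-dominated region
```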