On Finding a Subset of Healthy Individuals from a Large Population
In this paper, we derive mutual information based upper and lower bounds on
the number of nonadaptive group tests required to identify a given number of
"non defective" items from a large population containing a small number of
"defective" items. We show that a reduction in the number of tests is
achievable compared to the approach of first identifying all the defective
items and then picking the required number of non-defective items from the
complement set. In the asymptotic regime with the population size , to identify non-defective items out of a population
containing defective items, when the tests are reliable, our results show
that measurements are
sufficient, where is a constant independent of and , and
is a bounded function of and . Further, in the nonadaptive group
testing setup, we obtain rigorous upper and lower bounds on the number of tests
under both dilution and additive noise models. Our results are derived using a
general sparse signal model, by virtue of which, they are also applicable to
other important sparse signal based applications such as compressive sensing.
Comment: 32 pages, 2 figures, 3 tables, revised version of a paper submitted
to IEEE Trans. Inf. Theory
Nearly Optimal Sparse Group Testing
Group testing is the process of pooling arbitrary subsets from a set of
items so as to identify, with a minimal number of tests, a "small" subset of
defective items. In "classical" non-adaptive group testing, it is known
that when is substantially smaller than , tests are
both information-theoretically necessary and sufficient to guarantee recovery
with high probability. Group testing schemes in the literature meeting this
bound require most items to be tested times, and most tests
to incorporate items.
Motivated by physical considerations, we study group testing models in which
the testing procedure is constrained to be "sparse". Specifically, we consider
(separately) scenarios in which (a) items are finitely divisible and hence may
participate in at most tests; or (b) tests are
size-constrained to pool no more than items per test. For both
scenarios we provide information-theoretic lower bounds on the number of tests
required to guarantee high probability recovery. In both scenarios we provide
both randomized constructions (under both -error and zero-error
reconstruction guarantees) and explicit constructions of designs with
computationally efficient reconstruction algorithms that require a number of
tests that is optimal up to constant or small polynomial factors in some
regimes of and . The randomized design/reconstruction
algorithm in the -sized test scenario is universal -- independent of the
value of , as long as . We also investigate the effect of
unreliability/noise in test outcomes. For the full abstract, please see the
full text PDF
The Capacity of Adaptive Group Testing
We define capacity for group testing problems and deduce bounds for the
capacity of a variety of noisy models, based on the capacity of equivalent
noisy communication channels. For noiseless adaptive group testing we prove an
information-theoretic lower bound which tightens a bound of Chan et al. This
can be combined with a performance analysis of a version of Hwang's adaptive
group testing algorithm, in order to deduce the capacity of noiseless and
erasure group testing models.
Comment: 5 pages
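To make the adaptive setting in the abstract above concrete, here is a minimal sketch of plain binary splitting in the noiseless case. It is not Hwang's algorithm as analysed in the paper; it is a simplified relative that locates a single defective item by halving a pool known to be positive. The names `find_one_defective` and `is_positive` are hypothetical, chosen only for this illustration.

```python
def find_one_defective(items, is_positive):
    """Locate one defective item by adaptive binary splitting.

    Assumptions (not taken from the paper): `items` is a list known to
    contain at least one defective, and `is_positive(subset)` is a noiseless
    test oracle that returns True iff the subset contains a defective.
    Returns the item found and the number of tests used.
    """
    candidates = list(items)
    tests = 0
    while len(candidates) > 1:
        half = candidates[: len(candidates) // 2]
        tests += 1
        if is_positive(half):
            # A defective lies in the tested half; discard the rest.
            candidates = half
        else:
            # The tested half is clean, so a defective must be in the remainder.
            candidates = candidates[len(half):]
    return candidates[0], tests


if __name__ == "__main__":
    defectives = {5}
    oracle = lambda subset: any(i in defectives for i in subset)
    print(find_one_defective(list(range(8)), oracle))
```

Each test halves the candidate set, so a pool of n items needs about log2(n) tests to isolate one defective. This is the intuition behind counting tests in bits of information.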
Discovery of low-dimensional structure in high-dimensional inference problems
Many learning and inference problems involve high-dimensional data such as images, video or genomic data, which cannot be processed efficiently using conventional methods due to their dimensionality. However, high-dimensional data often exhibit an inherent low-dimensional structure, for instance they can often be represented sparsely in some basis or domain. The discovery of an underlying low-dimensional structure is important to develop more robust and efficient analysis and processing algorithms.
The first part of the dissertation investigates the statistical complexity of sparse recovery problems, including sparse linear and nonlinear regression models, feature selection and graph estimation. We present a framework that unifies sparse recovery problems and construct an analogy to channel coding in classical information theory. We perform an information-theoretic analysis to derive bounds on the number of samples required to reliably recover sparsity patterns independent of any specific recovery algorithm. In particular, we show that sample complexity can be tightly characterized using a mutual information formula similar to channel coding results. Next, we derive major extensions to this framework, including dependent input variables and a lower bound for sequential adaptive recovery schemes, which helps determine whether adaptivity provides performance gains. We compute statistical complexity bounds for various sparse recovery problems, showing our analysis improves upon the existing bounds and leads to intuitive results for new applications.
In the second part, we investigate methods for improving the computational complexity of subgraph detection in graph-structured data, where we aim to discover anomalous patterns present in a connected subgraph of a given graph. This problem arises in many applications such as detection of network intrusions, community detection, detection of anomalous events in surveillance videos or disease outbreaks. Since optimization over connected subgraphs is a combinatorial and computationally difficult problem, we propose a convex relaxation that offers a principled approach to incorporating connectivity and conductance constraints on candidate subgraphs. We develop a novel nearly-linear time algorithm to solve the relaxed problem, establish convergence and consistency guarantees and demonstrate its feasibility and performance with experiments on real networks.
Info-Greedy sequential adaptive compressed sensing
We present an information-theoretic framework for sequential adaptive
compressed sensing, Info-Greedy Sensing, where measurements are chosen to
maximize the extracted information conditioned on the previous measurements. We
show that the widely used bisection approach is Info-Greedy for a family of
-sparse signals by connecting compressed sensing and blackbox complexity of
sequential query algorithms, and present Info-Greedy algorithms for Gaussian
and Gaussian Mixture Model (GMM) signals, as well as ways to design sparse
Info-Greedy measurements. Numerical examples demonstrate the good performance
of the proposed algorithms using simulated and real data: Info-Greedy Sensing
shows significant improvement over random projection for signals with sparse
and low-rank covariance matrices, and adaptivity brings robustness when there
is a mismatch between the assumed and the true distributions.
Comment: Preliminary results presented at Allerton Conference 2014. To appear
in IEEE Journal Selected Topics on Signal Processing
Computationally Tractable Algorithms for Finding a Subset of Non-defective Items from a Large Population
In the classical non-adaptive group testing setup, pools of items are tested
together, and the main goal of a recovery algorithm is to identify the
"complete defective set" given the outcomes of different group tests. In
contrast, the main goal of a "non-defective subset recovery" algorithm is to
identify a "subset" of non-defective items given the test outcomes. In this
paper, we present a suite of computationally efficient and analytically
tractable non-defective subset recovery algorithms. By analyzing the
probability of error of the algorithms, we obtain bounds on the number of tests
required for non-defective subset recovery with arbitrarily small probability
of error. Our analysis accounts for the impact of both the additive noise
(false positives) and dilution noise (false negatives). By comparing with the
information theoretic lower bounds, we show that the upper bounds on the number
of tests are order-wise tight up to a factor, where is the number
of defective items. We also provide simulation results that compare the
relative performance of the different algorithms and provide further insights
into their practical utility. The proposed algorithms significantly outperform
the straightforward approaches of testing items one-by-one, and of first
identifying the defective set and then choosing the non-defective items from
the complement set, in terms of the number of measurements required to ensure a
given success rate.
Comment: In this revision: Unified some proofs and reorganized the paper,
corrected a small mistake in one of the proofs, added more references
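As a rough illustration of why non-defective subset recovery can be cheaper than full defective-set recovery, the sketch below handles only the noiseless case: every item that appears in a negative pool is certainly non-defective, so a decoder can stop as soon as it has collected enough such items. This is not one of the paper's algorithms, and it ignores the additive and dilution noise that the paper analyses. All function names and the random pooling design are assumptions made for this example.

```python
import random


def run_pooled_tests(pools, defectives):
    """Noiseless OR model: a pool is positive iff it contains a defective."""
    return [any(i in defectives for i in pool) for pool in pools]


def non_defectives_from_negative_pools(pools, outcomes, wanted):
    """Collect up to `wanted` distinct items seen in at least one negative pool."""
    cleared, seen = [], set()
    for pool, positive in zip(pools, outcomes):
        if positive:
            continue  # a positive pool certifies nothing about its members
        for item in pool:
            if item not in seen:
                seen.add(item)
                cleared.append(item)
                if len(cleared) == wanted:
                    return cleared
    return cleared


def random_pools(n_items, n_tests, p, rng):
    """Hypothetical design: each item joins each pool independently w.p. p."""
    return [[i for i in range(n_items) if rng.random() < p]
            for _ in range(n_tests)]


if __name__ == "__main__":
    rng = random.Random(0)
    defectives = {3, 17}
    pools = random_pools(n_items=50, n_tests=20, p=0.2, rng=rng)
    outcomes = run_pooled_tests(pools, defectives)
    print(non_defectives_from_negative_pools(pools, outcomes, wanted=5))
```

The decoder never needs to decide which items are defective, which is the source of the savings in the number of tests that the abstract describes.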
Limits on Support Recovery with Probabilistic Models: An Information-Theoretic Framework
The support recovery problem consists of determining a sparse subset of a set
of variables that is relevant in generating a set of observations, and arises
in a diverse range of settings such as compressive sensing, subset
selection in regression, and group testing. In this paper, we take a unified
approach to support recovery problems, considering general probabilistic models
relating a sparse data vector to an observation vector. We study the
information-theoretic limits of both exact and partial support recovery, taking
a novel approach motivated by thresholding techniques in channel coding. We
provide general achievability and converse bounds characterizing the trade-off
between the error probability and number of measurements, and we specialize
these to the linear, 1-bit, and group testing models. In several cases, our
bounds not only provide matching scaling laws in the necessary and sufficient
number of measurements, but also sharp thresholds with matching constant
factors. Our approach has several advantages over previous approaches: For the
achievability part, we obtain sharp thresholds under broader scalings of the
sparsity level and other parameters (e.g., signal-to-noise ratio) compared to
several previous works, and for the converse part, we not only provide
conditions under which the error probability fails to vanish, but also
conditions under which it tends to one.
Comment: Accepted to IEEE Transactions on Information Theory; presented in
part at ISIT 2015 and SODA 201