4,424 research outputs found

    On Finding a Subset of Healthy Individuals from a Large Population

    In this paper, we derive mutual information based upper and lower bounds on the number of nonadaptive group tests required to identify a given number of "non defective" items from a large population containing a small number of "defective" items. We show that a reduction in the number of tests is achievable compared to the approach of first identifying all the defective items and then picking the required number of non-defective items from the complement set. In the asymptotic regime with the population size $N \rightarrow \infty$, to identify $L$ non-defective items out of a population containing $K$ defective items, when the tests are reliable, our results show that $\frac{C_s K}{1-o(1)} (\Phi(\alpha_0, \beta_0) + o(1))$ measurements are sufficient, where $C_s$ is a constant independent of $N$, $K$ and $L$, and $\Phi(\alpha_0, \beta_0)$ is a bounded function of $\alpha_0 \triangleq \lim_{N\rightarrow \infty} \frac{L}{N-K}$ and $\beta_0 \triangleq \lim_{N\rightarrow \infty} \frac{K}{N-K}$. Further, in the nonadaptive group testing setup, we obtain rigorous upper and lower bounds on the number of tests under both dilution and additive noise models. Our results are derived using a general sparse signal model, by virtue of which, they are also applicable to other important sparse signal based applications such as compressive sensing. Comment: 32 pages, 2 figures, 3 tables, revised version of a paper submitted to IEEE Trans. Inf. Theor

    Nearly Optimal Sparse Group Testing

    Group testing is the process of pooling arbitrary subsets from a set of $n$ items so as to identify, with a minimal number of tests, a "small" subset of $d$ defective items. In "classical" non-adaptive group testing, it is known that when $d$ is substantially smaller than $n$, $\Theta(d\log(n))$ tests are both information-theoretically necessary and sufficient to guarantee recovery with high probability. Group testing schemes in the literature meeting this bound require most items to be tested $\Omega(\log(n))$ times, and most tests to incorporate $\Omega(n/d)$ items. Motivated by physical considerations, we study group testing models in which the testing procedure is constrained to be "sparse". Specifically, we consider (separately) scenarios in which (a) items are finitely divisible and hence may participate in at most $\gamma \in o(\log(n))$ tests; or (b) tests are size-constrained to pool no more than $\rho \in o(n/d)$ items per test. For both scenarios we provide information-theoretic lower bounds on the number of tests required to guarantee high probability recovery. In both scenarios we provide both randomized constructions (under both $\epsilon$-error and zero-error reconstruction guarantees) and explicit constructions of designs with computationally efficient reconstruction algorithms that require a number of tests that are optimal up to constant or small polynomial factors in some regimes of $n$, $d$, $\gamma$, and $\rho$. The randomized design/reconstruction algorithm in the $\rho$-sized test scenario is universal -- independent of the value of $d$, as long as $\rho \in o(n/d)$. We also investigate the effect of unreliability/noise in test outcomes. For the full abstract, please see the full text PDF
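To make the "classical" dense setting in the abstract above concrete, here is a minimal Python sketch of noiseless non-adaptive group testing. It is not taken from the paper: the Bernoulli design with inclusion probability 1/d, the COMP decoder (clear every item seen in a negative test), and the constant 4 in the test budget are all illustrative assumptions. With this design each pool holds about n/d items on average, which is the dense regime that the sparse constraints in the abstract rule out.

```python
import math
import random

def bernoulli_design(n, d, num_tests, rng):
    """Each item joins each test independently with probability 1/d (assumed design)."""
    p = 1.0 / d
    return [{i for i in range(n) if rng.random() < p} for _ in range(num_tests)]

def run_tests(pools, defectives):
    """Noiseless outcome: a test is positive iff its pool contains a defective item."""
    return [bool(pool & defectives) for pool in pools]

def comp_decode(n, pools, outcomes):
    """COMP decoder: any item that appears in a negative test is declared non-defective."""
    cleared = set()
    for pool, positive in zip(pools, outcomes):
        if not positive:
            cleared |= pool
    return set(range(n)) - cleared

rng = random.Random(0)
n, d = 500, 5
num_tests = math.ceil(4 * d * math.log(n))  # order d*log(n); the constant 4 is arbitrary
defectives = set(rng.sample(range(n), d))
pools = bernoulli_design(n, d, num_tests, rng)
estimate = comp_decode(n, pools, run_tests(pools, defectives))
print("tests:", num_tests, "estimated defectives:", len(estimate))
print("no defective missed:", defectives <= estimate)
```

In the noiseless model COMP can never miss a defective item, so the estimate is always a superset of the true defective set; how many extra items it contains depends on the random design.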

    The Capacity of Adaptive Group Testing

    We define capacity for group testing problems and deduce bounds for the capacity of a variety of noisy models, based on the capacity of equivalent noisy communication channels. For noiseless adaptive group testing we prove an information-theoretic lower bound which tightens a bound of Chan et al. This can be combined with a performance analysis of a version of Hwang's adaptive group testing algorithm, in order to deduce the capacity of noiseless and erasure group testing models. Comment: 5 page
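As a rough illustration of adaptive testing, the following Python sketch uses plain repeated binary splitting. This is a simplified stand-in, not Hwang's generalized binary splitting algorithm as analyzed in the paper, and the function name and the `is_positive` oracle are hypothetical. The final lines compare the number of tests used with the standard counting quantity $\log_2 \binom{n}{d}$, which lower-bounds the worst-case number of tests for zero-error recovery.

```python
import math

def binary_splitting(items, is_positive):
    """Find all defectives adaptively; returns (defectives, number of tests used)."""
    remaining = list(items)
    found, tests = set(), 0
    while remaining:
        tests += 1
        if not is_positive(remaining):
            break  # everything left is non-defective
        lo, hi = 0, len(remaining)  # invariant: remaining[lo:hi] contains a defective
        while hi - lo > 1:
            mid = (lo + hi) // 2
            tests += 1
            if is_positive(remaining[lo:mid]):
                hi = mid  # a defective lies in the left half
            else:
                lo = mid  # left half is clean, so the defective is on the right
        found.add(remaining[lo])
        # Items before index lo were cleared by negative tests; drop them too.
        remaining = remaining[lo + 1:]
    return found, tests

n = 1000
defectives = {3, 500, 999}
oracle = lambda pool: any(item in defectives for item in pool)  # noiseless test
found, tests = binary_splitting(range(n), oracle)
counting_bound = math.ceil(math.log2(math.comb(n, len(defectives))))
print("recovered:", found == defectives)
print("tests used:", tests, "counting bound:", counting_bound)
```

Each defective costs one test on the remaining set plus roughly $\log_2 n$ splitting tests, so the total grows like $d \log_2 n$, compared with $n$ tests for one-by-one testing.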

    Discovery of low-dimensional structure in high-dimensional inference problems

    Many learning and inference problems involve high-dimensional data such as images, video or genomic data, which cannot be processed efficiently using conventional methods due to their dimensionality. However, high-dimensional data often exhibit an inherent low-dimensional structure; for instance, they can often be represented sparsely in some basis or domain. The discovery of an underlying low-dimensional structure is important to develop more robust and efficient analysis and processing algorithms. The first part of the dissertation investigates the statistical complexity of sparse recovery problems, including sparse linear and nonlinear regression models, feature selection and graph estimation. We present a framework that unifies sparse recovery problems and construct an analogy to channel coding in classical information theory. We perform an information-theoretic analysis to derive bounds on the number of samples required to reliably recover sparsity patterns independent of any specific recovery algorithm. In particular, we show that sample complexity can be tightly characterized using a mutual information formula similar to channel coding results. Next, we derive major extensions to this framework, including dependent input variables and a lower bound for sequential adaptive recovery schemes, which helps determine whether adaptivity provides performance gains. We compute statistical complexity bounds for various sparse recovery problems, showing our analysis improves upon the existing bounds and leads to intuitive results for new applications. In the second part, we investigate methods for improving the computational complexity of subgraph detection in graph-structured data, where we aim to discover anomalous patterns present in a connected subgraph of a given graph. This problem arises in many applications such as detection of network intrusions, community detection, detection of anomalous events in surveillance videos or disease outbreaks. Since optimization over connected subgraphs is a combinatorial and computationally difficult problem, we propose a convex relaxation that offers a principled approach to incorporating connectivity and conductance constraints on candidate subgraphs. We develop a novel nearly-linear time algorithm to solve the relaxed problem, establish convergence and consistency guarantees and demonstrate its feasibility and performance with experiments on real networks.

    Info-Greedy sequential adaptive compressed sensing

    We present an information-theoretic framework for sequential adaptive compressed sensing, Info-Greedy Sensing, where measurements are chosen to maximize the extracted information conditioned on the previous measurements. We show that the widely used bisection approach is Info-Greedy for a family of $k$-sparse signals by connecting compressed sensing and blackbox complexity of sequential query algorithms, and present Info-Greedy algorithms for Gaussian and Gaussian Mixture Model (GMM) signals, as well as ways to design sparse Info-Greedy measurements. Numerical examples demonstrate the good performance of the proposed algorithms using simulated and real data: Info-Greedy Sensing shows significant improvement over random projection for signals with sparse and low-rank covariance matrices, and adaptivity brings robustness when there is a mismatch between the assumed and the true distributions. Comment: Preliminary results presented at Allerton Conference 2014. To appear in IEEE Journal Selected Topics on Signal Processin
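The bisection approach mentioned in the abstract above can be sketched for the simplest case. The Python below assumes a noiseless signal with exactly one nonzero entry ($k = 1$) and uses 0/1 measurement vectors supported on half of the current candidate set; the function name is hypothetical and this is not the paper's algorithm for Gaussian or GMM signals. Under a uniform prior on the support location, each such measurement roughly halves the candidate set, which is the intuition behind calling bisection greedy in information.

```python
def bisection_sense(x):
    """Locate the single nonzero entry of x from sums over index ranges.

    Returns (index, value, number of linear measurements).
    Assumes x has exactly one nonzero entry and measurements are noiseless.
    """
    lo, hi = 0, len(x)  # invariant: the nonzero entry lies in x[lo:hi]
    measurements = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        y = sum(x[lo:mid])  # y = a^T x with a the 0/1 indicator of [lo, mid)
        measurements += 1
        if y != 0:
            hi = mid
        else:
            lo = mid
    value = sum(x[lo:hi])  # one last measurement reads off the amplitude
    measurements += 1
    return lo, value, measurements

signal = [0.0] * 16
signal[11] = 2.5
print(bisection_sense(signal))
```

For a length-$n$ signal this uses about $\log_2 n$ measurements to find the support plus one to read the value, instead of $n$ direct samples.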

    Computationally Tractable Algorithms for Finding a Subset of Non-defective Items from a Large Population

    In the classical non-adaptive group testing setup, pools of items are tested together, and the main goal of a recovery algorithm is to identify the "complete defective set" given the outcomes of different group tests. In contrast, the main goal of a "non-defective subset recovery" algorithm is to identify a "subset" of non-defective items given the test outcomes. In this paper, we present a suite of computationally efficient and analytically tractable non-defective subset recovery algorithms. By analyzing the probability of error of the algorithms, we obtain bounds on the number of tests required for non-defective subset recovery with arbitrarily small probability of error. Our analysis accounts for the impact of both the additive noise (false positives) and dilution noise (false negatives). By comparing with the information theoretic lower bounds, we show that the upper bounds on the number of tests are order-wise tight up to a $\log^2 K$ factor, where $K$ is the number of defective items. We also provide simulation results that compare the relative performance of the different algorithms and provide further insights into their practical utility. The proposed algorithms significantly outperform the straightforward approaches of testing items one-by-one, and of first identifying the defective set and then choosing the non-defective items from the complement set, in terms of the number of measurements required to ensure a given success rate. Comment: In this revision: Unified some proofs and reorganized the paper, corrected a small mistake in one of the proofs, added more reference
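The difference between recovering the full defective set and recovering only a subset of non-defective items can be seen in a small noiseless Python sketch. It is an assumption-laden illustration, not one of the paper's algorithms: it ignores additive and dilution noise, uses a random design with inclusion probability 1/K, and the function name and the test budget of 10 are hypothetical. In the noiseless model, every item in a negative test is certainly non-defective, so a handful of negative tests can already certify many items.

```python
import random

def non_defective_subset(pools, outcomes, L):
    """Return L items certified non-defective by negative tests, or None if too few."""
    cleared = set()
    for pool, positive in zip(pools, outcomes):
        if not positive:
            cleared |= pool  # noiseless model: a negative pool has no defectives
    if len(cleared) < L:
        return None  # not enough certified items; more tests would be needed
    return sorted(cleared)[:L]

rng = random.Random(1)
N, K, L = 200, 4, 20
num_tests = 10  # deliberately small, illustrative budget
defectives = set(rng.sample(range(N), K))
pools = [{i for i in range(N) if rng.random() < 1.0 / K} for _ in range(num_tests)]
outcomes = [bool(pool & defectives) for pool in pools]
subset = non_defective_subset(pools, outcomes, L)
print("certified subset:", subset)
```

Whether a given random design certifies enough items depends on how many tests come back negative, which is why the function returns `None` rather than guessing when fewer than `L` items are cleared.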

    Limits on Support Recovery with Probabilistic Models: An Information-Theoretic Framework

    The support recovery problem consists of determining a sparse subset of a set of variables that is relevant in generating a set of observations, and arises in a diverse range of settings such as compressive sensing, subset selection in regression, and group testing. In this paper, we take a unified approach to support recovery problems, considering general probabilistic models relating a sparse data vector to an observation vector. We study the information-theoretic limits of both exact and partial support recovery, taking a novel approach motivated by thresholding techniques in channel coding. We provide general achievability and converse bounds characterizing the trade-off between the error probability and number of measurements, and we specialize these to the linear, 1-bit, and group testing models. In several cases, our bounds not only provide matching scaling laws in the necessary and sufficient number of measurements, but also sharp thresholds with matching constant factors. Our approach has several advantages over previous approaches: for the achievability part, we obtain sharp thresholds under broader scalings of the sparsity level and other parameters (e.g., signal-to-noise ratio) compared to several previous works, and for the converse part, we not only provide conditions under which the error probability fails to vanish, but also conditions under which it tends to one. Comment: Accepted to IEEE Transactions on Information Theory; presented in part at ISIT 2015 and SODA 201