35 research outputs found

    Sketching via hashing: from heavy hitters to compressed sensing to sparse fourier transform

    Get PDF
    Sketching via hashing is a popular and useful method for processing large data sets. Its basic idea is as follows. Suppose that we have a large multi-set of elements S=[formula], and we would like to identify the elements that occur “frequently" in S. The algorithm starts by selecting a hash function h that maps the elements into an array c[1…m]. The array entries are initialized to 0. Then, for each element a ∈ S, the algorithm increments c[h(a)]. At the end of the process, each array entry c[j] contains the count of all data elements a ∈ S mapped to j

    Group testing:an information theory perspective

    Get PDF
    The group testing problem concerns discovering a small number of defective items within a large population by performing tests on pools of items. A test is positive if the pool contains at least one defective, and negative if it contains no defectives. This is a sparse inference problem with a combinatorial flavour, with applications in medical testing, biology, telecommunications, information technology, data science, and more. In this monograph, we survey recent developments in the group testing problem from an information-theoretic perspective. We cover several related developments: efficient algorithms with practical storage and computation requirements, achievability bounds for optimal decoding methods, and algorithm-independent converse bounds. We assess the theoretical guarantees not only in terms of scaling laws, but also in terms of the constant factors, leading to the notion of the {\em rate} of group testing, indicating the amount of information learned per test. Considering both noiseless and noisy settings, we identify several regimes where existing algorithms are provably optimal or near-optimal, as well as regimes where there remains greater potential for improvement. In addition, we survey results concerning a number of variations on the standard group testing problem, including partial recovery criteria, adaptive algorithms with a limited number of stages, constrained test designs, and sublinear-time algorithms.Comment: Survey paper, 140 pages, 19 figures. To be published in Foundations and Trends in Communications and Information Theor

    Inference And Learning: Computational Difficulty And Efficiency

    Get PDF
    In this thesis, we mainly investigate two collections of problems: statistical network inference and model selection in regression. The common feature shared by these two types of problems is that they typically exhibit an interesting phenomenon in terms of computational difficulty and efficiency. For statistical network inference, our goal is to infer the network structure based on a noisy observation of the network. Statistically, we model the network as generated from the structural information with the presence of noise, for example, planted submatrix model (for bipartite weighted graph), stochastic block model, and Watts-Strogatz model. As the relative amount of ``signal-to-noise\u27\u27 varies, the problems exhibit different stages of computational difficulty. On the theoretical side, we investigate these stages through characterizing the transition thresholds on the ``signal-to-noise\u27\u27 ratio, for the aforementioned models. On the methodological side, we provide new computationally efficient procedures to reconstruct the network structure for each model. For model selection in regression, our goal is to learn a ``good\u27\u27 model based on a certain model class from the observed data sequences (feature and response pairs), when the model can be misspecified. More concretely, we study two model selection problems: to learn from general classes of functions based on i.i.d. data with minimal assumptions, and to select from the sparse linear model class based on possibly adversarially chosen data in a sequential fashion. We develop new theoretical and algorithmic tools beyond empirical risk minimization to study these problems from a learning theory point of view

    Model-Based and Graph-Based Priors for Group Testing

    Full text link
    The goal of the group testing problem is to identify a set of defective items within a larger set of items, using suitably-designed tests whose outcomes indicate whether any defective item is present. In this paper, we study how the number of tests can be significantly decreased by leveraging the structural dependencies between the items, i.e., by incorporating prior information. To do so, we pursue two different perspectives: (i) As a generalization of the uniform combinatorial prior, we consider the case that the defective set is uniform over a \emph{subset} of all possible sets of a given size, and study how this impacts the information-theoretic limits on the number of tests for approximate recovery; (ii) As a generalization of the i.i.d.~prior, we introduce a new class of priors based on the Ising model, where the associated graph represents interactions between items. We show that this naturally leads to an Integer Quadratic Program decoder, which can be converted to an Integer Linear Program and/or relaxed to a non-integer variant for improved computational complexity, while maintaining strong empirical recovery performance.Comment: IEEE Transactions on Signal Processin
    corecore