
    Phase Transitions in the Pooled Data Problem

    In this paper, we study the pooled data problem of identifying the labels associated with a large collection of items, based on a sequence of pooled tests revealing the counts of each label within the pool. In the noiseless setting, we identify an exact asymptotic threshold on the required number of tests with optimal decoding, and prove a phase transition between complete success and complete failure. In addition, we present a novel noisy variation of the problem, and provide an information-theoretic framework for characterizing the required number of tests for general random noise models. Our results reveal that noise can make the problem considerably more difficult, with strict increases in the scaling laws even at low noise levels. Finally, we demonstrate similar behavior in an approximate recovery setting, where a given number of errors is allowed in the decoded labels. Comment: Accepted to NIPS 201
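
    The measurement model described above can be illustrated with a small simulation: each item carries one of d labels, and a noiseless pooled test reports how many items of each label fall in the pool. The instance sizes and the Bernoulli pool design below are illustrative assumptions, not the paper's specific construction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance: n items, each carrying one of d possible labels.
n, d, num_tests = 12, 3, 6
labels = rng.integers(0, d, size=n)                # ground-truth labels to be recovered

# Each pooled test selects a random subset of items (Bernoulli(1/2) design here)
# and, in the noiseless setting, reveals the count of every label within the pool.
pools = rng.integers(0, 2, size=(num_tests, n))
counts = np.stack([np.bincount(labels[pools[t] == 1], minlength=d)
                   for t in range(num_tests)])

print(counts)   # counts[t, l] = number of label-l items in pool t
```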

    Group Testing with Runlength Constraints for Topological Molecular Storage

    Motivated by applications in topological DNA-based data storage, we introduce and study a novel setting of Non-Adaptive Group Testing (NAGT) with runlength constraints on the columns of the test matrix, in the sense that any two 1's must be separated by a run of at least d 0's. We describe and analyze a probabilistic construction of a runlength-constrained scheme in the zero-error and vanishing error settings, and show that the number of tests required by this construction is optimal up to logarithmic factors in the runlength constraint d and the number of defectives k in both cases. Surprisingly, our results show that runlength-constrained NAGT is not more demanding than unconstrained NAGT when d=O(k), and that for almost all choices of d and k it is not more demanding than NAGT with a column Hamming weight constraint only. Towards obtaining runlength-constrained Quantitative NAGT (QNAGT) schemes with good parameters, we also provide lower bounds for this setting and a nearly optimal probabilistic construction of a QNAGT scheme with a column Hamming weight constraint.
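
    As a quick illustration of the column constraint above, the following sketch checks whether a binary test-matrix column keeps any two 1's separated by at least d 0's, and draws constrained columns by simple rejection sampling; this is purely illustrative and not the probabilistic construction analyzed in the paper.

```python
import numpy as np

def satisfies_runlength(col, d):
    """Return True if any two 1's in the binary column are separated by at least d 0's."""
    ones = np.flatnonzero(col)
    return bool(np.all(np.diff(ones) > d)) if ones.size > 1 else True

# Draw random columns and keep only those meeting the constraint
# (rejection sampling, for illustration only).
rng = np.random.default_rng(1)
t, d = 20, 2                         # number of tests (rows) and runlength constraint
col = rng.integers(0, 2, size=t)
while not satisfies_runlength(col, d):
    col = rng.integers(0, 2, size=t)
print(col)
```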

    Learning from Label Proportions: Bootstrapping Supervised Learners via Belief Propagation

    Learning from Label Proportions (LLP) is a learning problem where only aggregate-level labels are available for groups of instances, called bags, during training, and the aim is to achieve the best instance-level performance on the test data. This setting arises in domains like advertising and medicine due to privacy considerations. We propose a novel algorithmic framework for this problem that iteratively performs two main steps. In the first step (Pseudo Labeling) of every iteration, we define a Gibbs distribution over binary instance labels that incorporates a) covariate information, through the constraint that instances with similar covariates should have similar labels, and b) the bag-level aggregated label. We then use Belief Propagation (BP) to marginalize the Gibbs distribution and obtain pseudo labels. In the second step (Embedding Refinement), we use the pseudo labels to provide supervision for a learner that yields a better embedding. We then iterate on the two steps, using the second step's embeddings as new covariates for the next iteration. In the final iteration, a classifier is trained using the pseudo labels. Our algorithm displays strong gains against several SOTA baselines (up to 15%) for the LLP binary classification problem on various dataset types, both tabular and image. We achieve these improvements with minimal computational overhead beyond standard supervised learning, since the Belief Propagation step remains cheap even for large bag sizes and datasets with a million samples. Comment: Accepted at Regulatable ML @ NeurIPS 202
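
    The Pseudo Labeling step can be sketched on a tiny bag by writing down one plausible form of the Gibbs distribution (a covariate-similarity term plus a penalty for deviating from the bag's aggregate label) and marginalizing it by brute force; the paper uses Belief Propagation to approximate these marginals efficiently for realistic bag sizes, and the energy below is an assumed instantiation rather than the authors' exact formulation.

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)

# Tiny bag: m instances with 2-d covariates and a known aggregate label count B.
m, B = 8, 3
X = rng.normal(size=(m, 2))

def energy(y, X, B, lam=1.0, mu=2.0):
    """Assumed Gibbs energy: penalize label disagreement between similar instances
    and deviation of the label sum from the bag's aggregate label."""
    W = np.exp(-np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))  # covariate similarity
    disagree = np.sum(W * (y[:, None] != y[None, :]))
    return lam * disagree + mu * (y.sum() - B) ** 2

# Exact marginalization by enumeration; the paper instead runs Belief Propagation
# to approximate these marginals for realistic bag sizes.
configs = np.array(list(itertools.product([0, 1], repeat=m)))
weights = np.exp(-np.array([energy(y, X, B) for y in configs]))
probs = weights / weights.sum()
pseudo_labels = (configs * probs[:, None]).sum(axis=0)   # P(y_i = 1), used as soft pseudo labels
print(np.round(pseudo_labels, 3))
```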

    Group testing: an information theory perspective

    The group testing problem concerns discovering a small number of defective items within a large population by performing tests on pools of items. A test is positive if the pool contains at least one defective, and negative if it contains no defectives. This is a sparse inference problem with a combinatorial flavour, with applications in medical testing, biology, telecommunications, information technology, data science, and more. In this monograph, we survey recent developments in the group testing problem from an information-theoretic perspective. We cover several related developments: efficient algorithms with practical storage and computation requirements, achievability bounds for optimal decoding methods, and algorithm-independent converse bounds. We assess the theoretical guarantees not only in terms of scaling laws, but also in terms of the constant factors, leading to the notion of the "rate" of group testing, indicating the amount of information learned per test. Considering both noiseless and noisy settings, we identify several regimes where existing algorithms are provably optimal or near-optimal, as well as regimes where there remains greater potential for improvement. In addition, we survey results concerning a number of variations on the standard group testing problem, including partial recovery criteria, adaptive algorithms with a limited number of stages, constrained test designs, and sublinear-time algorithms. Comment: Survey paper, 140 pages, 19 figures. To be published in Foundations and Trends in Communications and Information Theory
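
    The notion of rate mentioned above has a direct computational reading: with n items, k defectives, and T tests, the rate is log2 of the number of possible defective sets divided by T, i.e. bits of information learned per test. Below is a minimal noiseless simulation; the Bernoulli design with density ln(2)/k is a standard illustrative choice, not a claim about the monograph's specific schemes.

```python
from math import comb, log, log2

import numpy as np

rng = np.random.default_rng(0)

# Toy noiseless instance: n items, k defectives, T pooled tests.
n, k, T = 500, 10, 200
defectives = rng.choice(n, size=k, replace=False)
X = rng.random((T, n)) < log(2) / k          # Bernoulli test design with density ln(2)/k
outcomes = X[:, defectives].any(axis=1)      # a test is positive iff its pool hits a defective

rate = log2(comb(n, k)) / T                  # bits of information learned per test
print(f"{int(outcomes.sum())} of {T} tests positive; rate = {rate:.3f} bits/test")
```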

    Algorithmic and Coding-theoretic Methods for Group Testing and Private Information Retrieval

    In the first part of this dissertation, we consider the Group Testing (GT) problem and its two variants, the Quantitative GT (QGT) problem and the Coin Weighing (CW) problem. An instance of the GT problem consists of a ground set of items containing a small subset of defective items. The GT procedure performs a number of tests, each of which indicates whether or not a given subset of items includes one or more defective items; the goal is to identify the subset of defective items with the minimum number of tests. Motivated by practical scenarios in which test outcomes can be affected by noise, we focus on the noisy GT setting, where the outcome of a test can be flipped with some probability, and the goal is to identify the set of defective items with high probability. We investigate the performance of two variants of the Belief Propagation (BP) algorithm for decoding noisy non-adaptive GT under the combinatorial model for defective items. Through extensive simulations, we show that the proposed algorithms achieve higher success probability and lower false-negative and false-positive rates than the traditional BP algorithm.

    We also consider a variation of the probabilistic GT model in which the prior probability of each item being defective is not uniform, and in which a certain amount of side information on the distribution of the defective items is available to the GT algorithm. This dissertation focuses on leveraging that side information to improve the performance of decoding algorithms for noisy GT. First, we propose a probabilistic model, referred to as an interaction model, that captures the side information about the probability distribution of the defective items. Next, we present a decoding scheme, based on BP, that leverages the interaction model to improve decoding accuracy. Our results indicate that the proposed algorithm achieves higher success probability and lower false-negative and false-positive rates than traditional BP, especially in the high-noise regime.

    In the QGT problem, the result of a test reveals the number of defective items in the tested group, in contrast to standard GT, where the result of each test is either 1 or 0 depending on whether or not the tested group contains any defective items. In this dissertation, we study the QGT problem for the combinatorial and probabilistic models of defective items. We propose non-adaptive QGT algorithms using sparse graph codes over bi-regular and irregular bipartite graphs, and binary t-error-correcting BCH codes. The proposed schemes provide exact recovery with a probabilistic guarantee, i.e. they recover all the defective items with high probability, and they outperform existing non-adaptive QGT schemes in the sub-linear regime in terms of the number of tests required to identify all defective items with high probability.
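
    The three test models discussed so far differ only in what a test reveals, which the following toy simulation makes concrete (sizes and the noise level are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy instance: n items, k defectives, T non-adaptive tests.
n, k, T, flip_prob = 100, 5, 40, 0.05
defective = np.zeros(n, dtype=int)
defective[rng.choice(n, size=k, replace=False)] = 1
X = rng.integers(0, 2, size=(T, n))               # rows of X are the tested pools

gt  = (X @ defective > 0).astype(int)             # standard GT: 1 iff the pool contains a defective
qgt = X @ defective                               # QGT: the test reveals the defective count
noisy_gt = gt ^ (rng.random(T) < flip_prob).astype(int)   # noisy GT: each outcome flips w.p. flip_prob

print(gt[:10], qgt[:10], noisy_gt[:10], sep="\n")
```
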
    The CW problem lies at the intersection of the GT and compressed sensing problems. Given a collection of coins and the total weight of the coins, where the weight of each coin is an unknown integer, the problem is to determine the weight of each coin by weighing subsets of coins on a spring scale. The goal is to minimize the average number of weighings over all possible weight configurations. Toward this goal, we propose and analyze a simple and effective adaptive weighing strategy, which yields the first non-trivial achievable upper bound on the minimum expected number of weighings required.

    In the second part of this dissertation, we focus on the private information retrieval (PIR) problem. In many practical settings, the user needs to retrieve information messages from a server in a periodic manner, over multiple rounds of communication. The messages are retrieved one at a time, and the identities of future requests are not known to the server. We study PIR protocols that ensure that the identities of all the messages retrieved from the server are protected. This scenario arises in practical settings such as periodic content download from text and multimedia repositories. We refer to this problem of minimizing the rate of data download as the online private information retrieval problem. Following the previous line of work by Kadhe et al., we assume that the user knows a subset of the messages in the database as side information, and that the identities of these messages are initially unknown to the server. Focusing on scalar-linear settings, we characterize the per-round capacity, i.e. the maximum achievable download rate in each round. The key idea of our achievability scheme is to combine the data downloaded during the current and previous rounds with the original side-information messages, and to use the resulting data as side information for subsequent rounds.
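
    A single-round toy sketch of how side information reduces the download cost is given below; it follows the partition-and-code spirit of PIR with side information, whereas the per-round, scalar-linear scheme characterized in the dissertation is more involved, and the privacy of this toy version rests on the grouping being drawn suitably at random.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy database: K messages of msg_len bits each, held by a single server.
K, msg_len = 4, 8
W = rng.integers(0, 2, size=(K, msg_len))
want, side = 0, 2                                 # user wants W[0] and already knows W[2]

# The user requests XORs of message pairs, placing the desired message in the
# same pair as its side-information message; drawing the pairing at random
# (not shown) is what hides which message in each pair is actually wanted.
groups = [(want, side), (1, 3)]
download = [W[i] ^ W[j] for i, j in groups]       # 2 coded symbols instead of 4 messages

recovered = download[0] ^ W[side]                 # strip off the known side information
assert np.array_equal(recovered, W[want])
print(f"retrieved 1 message by downloading {len(download)} coded symbols (rate 1/{len(download)})")
```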