38 research outputs found

    Group testing: an information theory perspective

    The group testing problem concerns discovering a small number of defective items within a large population by performing tests on pools of items. A test is positive if the pool contains at least one defective, and negative if it contains no defectives. This is a sparse inference problem with a combinatorial flavour, with applications in medical testing, biology, telecommunications, information technology, data science, and more. In this monograph, we survey recent developments in the group testing problem from an information-theoretic perspective. We cover several related developments: efficient algorithms with practical storage and computation requirements, achievability bounds for optimal decoding methods, and algorithm-independent converse bounds. We assess the theoretical guarantees not only in terms of scaling laws, but also in terms of the constant factors, leading to the notion of the rate of group testing, indicating the amount of information learned per test. Considering both noiseless and noisy settings, we identify several regimes where existing algorithms are provably optimal or near-optimal, as well as regimes where there remains greater potential for improvement. In addition, we survey results concerning a number of variations on the standard group testing problem, including partial recovery criteria, adaptive algorithms with a limited number of stages, constrained test designs, and sublinear-time algorithms. Comment: Survey paper, 140 pages, 19 figures. To be published in Foundations and Trends in Communications and Information Theor
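As a rough illustration of the noiseless test model and the rate described in this abstract, the sketch below computes pool outcomes and a bits-per-test figure. The function names and the toy pools are hypothetical, and the rate formula log2 C(n, k) / T is assumed here as the usual convention for n items, k defectives and T tests; the abstract itself does not spell it out.

```python
import math

def pool_outcomes(pools, defectives):
    """Noiseless model: a pool is positive iff it contains at least one defective."""
    defective_set = set(defectives)
    return [any(item in defective_set for item in pool) for pool in pools]

def rate(n_items, n_defectives, n_tests):
    """Assumed rate convention: log2 of the number of possible defective sets, per test."""
    return math.log2(math.comb(n_items, n_defectives)) / n_tests

# Hypothetical toy instance: 8 items, item 3 is the only defective.
pools = [[0, 1, 2], [2, 3, 4], [4, 5, 6], [0, 6, 7]]
outcomes = pool_outcomes(pools, defectives=[3])
bits_per_test = rate(n_items=8, n_defectives=1, n_tests=4)
```

A rate close to 1 would mean each binary test outcome carries nearly one full bit about the defective set.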

    Reducing the Number of Tests for COVID-19 Infection via Group Testing Methodologies

    Total economic shutdown being detrimental to a nation’s prosperity, most governments are reopening businesses and schools with the requirement of frequent and mass-scale testing to determine each person’s status of COVID-19 infection. Obviously, the costs add up quickly and impose a heavy economic toll. As a way out of this dilemma, employers and administrators should seriously consider the application of group testing methodologies. Group testing methods check samples in batches, rather than individually, for the presence of a disease. If the group tests positive, then the group is investigated further to identify which members are positive. On the other hand, if the group tests negative, not just once but also a second or a third time, then everyone within the group is cleared for activity. With a carefully chosen protocol, group testing costs can be 30-80% lower than those of individual testing, with the savings being higher when prevalence of the disease is lower.
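To see why savings grow as prevalence falls, here is a minimal sketch of the classic two-stage (Dorfman) pooling calculation. It is an assumption swapped in for illustration, not the protocol of this paper: it assumes perfect tests and independent infections, and it does not model the repeated negative retests mentioned in the abstract. All function names are hypothetical.

```python
def dorfman_tests_per_person(prevalence, pool_size):
    """Expected tests per person: one shared pooled test, plus an
    individual retest for everyone whenever the pool is positive."""
    p_pool_positive = 1.0 - (1.0 - prevalence) ** pool_size
    return 1.0 / pool_size + p_pool_positive

def best_pool_size(prevalence, max_size=50):
    """Pool size (2..max_size) that minimises expected tests per person."""
    return min(range(2, max_size + 1),
               key=lambda s: dorfman_tests_per_person(prevalence, s))

for p in (0.01, 0.05, 0.10):
    s = best_pool_size(p)
    saving = 1.0 - dorfman_tests_per_person(p, s)
    print(f"prevalence={p:.2f} pool_size={s} saving={saving:.0%}")
```

Individual testing costs exactly one test per person, so `saving` is the fraction of tests avoided under these simplifying assumptions.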

    Interference Mitigation in Large Random Wireless Networks

    A central problem in the operation of large wireless networks is how to deal with interference -- the unwanted signals being sent by transmitters that a receiver is not interested in. This thesis looks at ways of combating such interference. In Chapters 1 and 2, we outline the necessary information and communication theory background, including the concept of capacity. We also include an overview of a new set of schemes for dealing with interference known as interference alignment, paying special attention to a channel-state-based strategy called ergodic interference alignment. In Chapter 3, we consider the operation of large regular and random networks by treating interference as background noise. We consider the local performance of a single node, and the global performance of a very large network. In Chapter 4, we use ergodic interference alignment to derive the asymptotic sum-capacity of large random dense networks. These networks are derived from a physical model of node placement where signal strength decays over the distance between transmitters and receivers. (See also arXiv:1002.0235 and arXiv:0907.5165.) In Chapter 5, we look at methods of reducing the long time delays incurred by ergodic interference alignment. We analyse the tradeoff between reducing delay and lowering the communication rate. (See also arXiv:1004.0208.) In Chapter 6, we outline a problem that is equivalent to the problem of pooled group testing for defective items. We then present some new work that uses information theoretic techniques to attack group testing. We introduce for the first time the concept of the group testing channel, which allows for modelling of a wide range of statistical error models for testing. We derive new results on the number of tests required to accurately detect defective items, including when using sequential 'adaptive' tests. Comment: PhD thesis, University of Bristol, 201

    Information-Theoretic and Algorithmic Thresholds for Group Testing

    In the group testing problem we aim to identify a small number of infected individuals within a large population. We avail ourselves of a procedure that can test a group of multiple individuals, with the test result coming out positive if and only if at least one individual in the group is infected. With all tests conducted in parallel, what is the least number of tests required to identify the status of all individuals? In a recent test design [Aldridge et al. 2016] the individuals are assigned to test groups randomly, with every individual joining an equal number of groups. We pinpoint the sharp threshold for the number of tests required in this randomised design so that it is information-theoretically possible to infer the infection status of every individual. Moreover, we analyse two efficient inference algorithms. These results settle conjectures from [Aldridge et al. 2014, Johnson et al. 2019].
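The randomised design described above, in which every individual joins the same number of test groups, can be sketched as follows. This is an illustrative assumption rather than the paper's construction: groups are sampled without replacement per individual, and the decoder shown is the simple COMP rule (clear anyone who appears in a negative test), which is not necessarily one of the two algorithms the paper analyses. All names and parameter values are hypothetical.

```python
import random

def constant_weight_design(n_items, n_tests, tests_per_item, rng):
    """Each individual joins exactly `tests_per_item` distinct tests, chosen uniformly."""
    return [rng.sample(range(n_tests), tests_per_item) for _ in range(n_items)]

def pool_outcomes(design, infected, n_tests):
    """A test is positive iff at least one infected individual takes part in it."""
    positive = [False] * n_tests
    for item in infected:
        for t in design[item]:
            positive[t] = True
    return positive

def comp_decode(design, positive):
    """COMP: anyone in a negative test is cleared; everyone else is flagged."""
    return {i for i, tests in enumerate(design) if all(positive[t] for t in tests)}

rng = random.Random(0)
design = constant_weight_design(n_items=200, n_tests=60, tests_per_item=5, rng=rng)
infected = set(rng.sample(range(200), 3))
estimate = comp_decode(design, pool_outcomes(design, infected, n_tests=60))
```

In the noiseless setting COMP never misses an infected individual, so `estimate` always contains `infected`; it may also contain false positives when too few tests are used.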

    A Fast Binary Splitting Approach to Non-Adaptive Group Testing

    In this paper, we consider the problem of noiseless non-adaptive group testing under the for-each recovery guarantee, also known as probabilistic group testing. In the case of $n$ items and $k$ defectives, we provide an algorithm attaining high-probability recovery with $O(k \log n)$ scaling in both the number of tests and runtime, improving on the best known $O(k^2 \log k \cdot \log n)$ runtime previously available for any algorithm that only uses $O(k \log n)$ tests. Our algorithm bears resemblance to Hwang's adaptive generalized binary splitting algorithm (Hwang, 1972); we recursively work with groups of items of geometrically vanishing sizes, while maintaining a list of "possibly defective" groups and circumventing the need for adaptivity. While the most basic form of our algorithm requires $\Omega(n)$ storage, we also provide a low-storage variant based on hashing, with similar recovery guarantees. Comment: Accepted to RANDOM 202
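For intuition about the splitting idea this abstract refers to, the sketch below shows plain adaptive binary splitting: repeatedly halve a positive group until a single defective is isolated. It is a simplified stand-in, not Hwang's generalized algorithm and not the non-adaptive method of the paper; the oracle `is_positive` and the toy instance are hypothetical.

```python
def find_all_defectives(items, is_positive):
    """Adaptive binary splitting using a test oracle `is_positive(group)`."""
    found = []
    remaining = list(items)
    # Keep going while the remaining items still test positive as one pool.
    while remaining and is_positive(remaining):
        group = remaining
        while len(group) > 1:
            half = group[: len(group) // 2]
            # If the first half is negative, a defective must be in the second half.
            group = half if is_positive(half) else group[len(half):]
        found.append(group[0])
        remaining.remove(group[0])
    return found

defectives = {5, 41}
test_sizes = []  # records the size of every pool that was tested

def oracle(group):
    test_sizes.append(len(group))
    return any(i in defectives for i in group)

found = find_all_defectives(range(64), oracle)
```

Each defective costs roughly log2(n) tests to isolate, which is where the k log n scaling comes from in the adaptive setting.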

    Coding for storage and testing

    The problem of reconstructing strings from substring information has found many applications due to its importance in genomic data sequencing and DNA- and polymer-based data storage. Motivated by platforms that use chains of binary synthetic polymers as the recording media and read the content via tandem mass spectrometers, we propose a new family of codes that allows for both unique string reconstruction and correction of multiple mass errors. We first consider the paradigm where the masses of substrings of the input string form the evidence set. We consider two approaches: The first approach pertains to asymmetric errors and the error-correction is achieved by introducing redundancy that scales linearly with the number of errors and logarithmically with the length of the string. The proposed construction allows for the string to be uniquely reconstructed based only on its erroneous substring composition multiset. The asymptotic code rate of the scheme is one, and decoding is accomplished via a simplified version of the Backtracking algorithm used for the Turnpike problem. For symmetric errors, we use a polynomial characterization of the mass information and adapt polynomial evaluation code constructions for this setting. In the process, we develop new efficient decoding algorithms for a constant number of composition errors. The second part of this dissertation addresses a practical paradigm that requires reconstructing mixtures of strings based on the union of compositions of their prefixes and suffixes, generated by mass spectrometry devices. We describe new coding methods that allow for unique joint reconstruction of subsets of strings selected from a code and provide upper and lower bounds on the asymptotic rate of the underlying codebooks. Our code constructions combine properties of binary $B_h$ and Dyck strings and can be extended to accommodate missing substrings in the pool. In the final chapter of this dissertation, we focus on group testing. 
We begin with a review of the gold-standard testing protocol for Covid-19, real-time, reverse transcription PCR, and its properties and associated measurement data such as amplification curves that can guide the development of appropriate and accurate adaptive group testing protocols. We then proceed to examine various off-the-shelf group testing methods for Covid-19, and identify their strengths and weaknesses for the application at hand. Finally, we present a collection of new analytical results for adaptive semiquantitative group testing with combinatorial priors, including performance bounds, algorithmic solutions, and noisy testing protocols. The worst-case paradigm extends and improves upon prior work on semiquantitative group testing with and without specialized PCR noise models

    Differential MR/GR Activation in Mice Results in Emotional States Beneficial or Impairing for Cognition

    Corticosteroids regulate stress response and influence emotion, learning, and memory via two receptors in the brain, the high-affinity mineralocorticoid (MR) and low-affinity glucocorticoid receptor (GR). We test the hypothesis that MR- and GR-mediated effects interact in emotion and cognition when a novel situation is encountered that is relevant for a learning process. By adrenalectomy and additional constant corticosterone supplement we obtained four groups of male C57BL/6J mice with differential chronic MR and GR activations. Using a hole board task, we found that mice with continuous predominant MR and moderate GR activations were fast learners that displayed low anxiety and arousal together with high directed explorative behavior. Progressive corticosterone concentrations with predominant action via GR induced strong emotional arousal at the expense of cognitive performance. These findings underline the importance of a balanced MR/GR system for emotional and cognitive functioning that is critical for mental health

    Panic, irrationality, herding: Three ambiguous terms in crowd dynamics research

    Background: The three terms “panic”, “irrationality” and “herding” are ubiquitous in the crowd dynamics literature and have a strong influence on both modelling and management practices. The terms are also commonly shared between the scientific and non-scientific domains. These terms are so pervasive that their underlying assumptions have often been treated as common knowledge by both experts and lay persons. Yet, at the same time, the literature on crowd dynamics presents ample debate, contradiction and inconsistency on these topics. Method: This review is the first to systematically revisit these three terms in a unified study to highlight the scope of this debate. We extracted from peer-reviewed journal articles direct quotes that offer a definition, conceptualisation or supporting/contradicting evidence on these terms and/or their underlying theories. To further examine the suitability of the term herding, a secondary and more detailed analysis is also conducted on studies that have specifically investigated this phenomenon in empirical settings. Results: The review shows that (i) there is no consensus on the definition for the terms panic and irrationality; and that (ii) the literature is highly divided along discipline lines on how accurate these theories/terminologies are for describing human escape behaviour. The review reveals a complete division and disconnection between studies published by social scientists and those from the physical science domain; also, between studies whose main focus is on numerical simulation versus those with empirical focus. (iii) Despite the ambiguity of the definitions and the missing consensus in the literature, these terms are still increasingly and persistently mentioned in crowd evacuation studies. (iv) Unlike panic and irrationality, there is relative consistency in definitions of the term herding, with the term usually being associated with ‘(blind) imitation’. 
However, based on the findings of empirical studies, we argue why, despite the relative consistency in meaning, (v) the term herding itself lacks adequate nuance and accuracy for describing the role of ‘social influence’ in escape behaviour. Our conclusions also emphasise the importance of distinguishing between the social influence on various aspects of evacuation behaviour and avoiding generalisation across various behavioural layers. Conclusions: We argue that the use of these three terms in the scientific literature does not contribute constructively to extending the knowledge or to improving the modelling capabilities in the field of crowd dynamics. This is largely due to the ambiguity of these terms, the overly simplistic nature of their assumptions, or the fact that the theories they represent are not readily verifiable. Recommendations: We suggest that it would be beneficial for advancing this research field that the phenomena related to these three terms are clearly defined by more tangible and quantifiable terms and be formulated as verifiable hypotheses, so they can be operationalised for empirical testing.

    Preemptive mobile code protection using spy agents

    This thesis introduces 'spy agents' as a new security paradigm for evaluating trust in remote hosts in mobile code scenarios. In this security paradigm, a spy agent, i.e. a mobile agent which circulates amongst a number of remote hosts, can employ a variety of techniques in order to both appear 'normal' and suggest to a malicious host that it can 'misuse' the agent's data or code without being held accountable. A framework for the operation and deployment of such spy agents is described. Subsequently, a number of aspects of the operation of such agents within this framework are analysed in greater detail. The set of spy agent routes needs to be constructed in a manner that enables hosts to be identified from a set of detectable agent-specific outcomes. The construction of route sets that both reduce the probability of spy agent detection and support identification of the origin of a malicious act is analysed in the context of combinatorial group testing theory. Solutions to the route set design problem are proposed. A number of spy agent application scenarios are introduced and analysed, including: a) the implementation of a mobile code email honeypot system for identifying email privacy infringers, b) the design of sets of agent routes that enable malicious host detection even when hosts collude, and c) the evaluation of the credibility of host classification results in the presence of inconsistent host behaviour. Spy agents can be used in a wide range of applications, and it appears that each application creates challenging new research problems, notably in the design of appropriate agent route sets

    Doctor of Philosophy

    Linked data are the de-facto standard in publishing and sharing data on the web. To date, we have been inundated with large amounts of ever-increasing linked data in constantly evolving structures. The proliferation of the data and the need to access and harvest knowledge from distributed data sources motivate us to revisit several classic problems in query processing and query optimization. The problem of answering queries over views is commonly encountered in a number of settings, including while enforcing security policies to access linked data, or when integrating data from disparate sources. We approach this problem by efficiently rewriting queries over the views to equivalent queries over the underlying linked data, thus avoiding the costs entailed by view materialization and maintenance. An outstanding problem of query rewriting is that the number of rewritten queries is exponential in the size of the query and the views, which motivates us to study the problem of multiquery optimization in the context of linked data. Our solutions are declarative and make no assumptions about the underlying storage, i.e., they are store-independent. Unlike relational and XML data, linked data are schema-less. While tracking the evolution of schema for linked data is hard, keyword search is an ideal tool to perform data integration. Existing works make crippling assumptions about the data and hence fall short in handling massive linked data with tens to hundreds of millions of facts. Our study of keyword search on linked data brings together classical techniques from the literature and our novel ideas, which leads to much better query efficiency and quality of the results. Linked data also contain rich temporal semantics. 
To cope with the ever-increasing data, we have investigated how to partition and store large temporal or multiversion linked data for distributed and parallel computation, in an effort to achieve load-balancing to support scalable data analytics for massive linked data