The capacity of non-identical adaptive group testing
We consider the group testing problem, in the case where the items are
defective independently but with non-constant probability. We introduce and
analyse an algorithm to solve this problem by grouping items together
appropriately. We give conditions under which the algorithm performs
essentially optimally in the sense of information-theoretic capacity. We use
concentration of measure results to bound the probability that this algorithm
requires many more tests than the expected number. This has applications to the
allocation of spectrum to cognitive radios, in the case where a database gives
prior information that a particular band will be occupied.
Comment: To be presented at Allerton 201
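As a rough illustration of the information-theoretic benchmark this abstract refers to: when items are defective independently with probabilities p_i, the entropy of the defectivity vector, the sum of the binary entropies h(p_i), is a lower bound on the expected number of noiseless binary-outcome tests needed for exact recovery. The sketch below only computes that bound; it is not the paper's algorithm, and the function names and example probabilities are hypothetical.

```python
import math
from typing import Iterable


def binary_entropy(p: float) -> float:
    """Binary entropy h(p) in bits, with h(0) = h(1) = 0 by convention."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def entropy_lower_bound(probs: Iterable[float]) -> float:
    """Sum of h(p_i): entropy of independent, non-identical defect indicators."""
    return sum(binary_entropy(p) for p in probs)


# Hypothetical prior probabilities, e.g. from a spectrum-occupancy database.
example_probs = [0.5, 0.2, 0.05, 0.01, 0.01]
print(f"entropy bound: {entropy_lower_bound(example_probs):.3f} bits")
```

Items with probabilities near 0 or 1 contribute almost nothing to the bound, which is why grouping low-probability items together can save tests.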
Poisson Group Testing: A Probabilistic Model for Boolean Compressed Sensing
We introduce a novel probabilistic group testing framework, termed Poisson
group testing, in which the number of defectives follows a right-truncated
Poisson distribution. The Poisson model has a number of new applications,
including dynamic testing with diminishing relative rates of defectives. We
consider both nonadaptive and semi-adaptive identification methods. For
nonadaptive methods, we derive a lower bound on the number of tests required to
identify the defectives with a probability of error that asymptotically
converges to zero; in addition, we propose test matrix constructions for which
the number of tests closely matches the lower bound. For semi-adaptive methods,
we describe a lower bound on the expected number of tests required to identify
the defectives with zero error probability. In addition, we propose a
stage-wise reconstruction algorithm for which the expected number of tests is
only a constant factor away from the lower bound. The methods rely only on an
estimate of the average number of defectives, rather than on the individual
probabilities of subjects being defective.
Optimal Dorfman Group Testing For Symmetric Distributions
We study Dorfman's classical group testing protocol in a novel setting where
individual specimen statuses are modeled as exchangeable random variables. We
are motivated by infectious disease screening. In that case, specimens which
arrive together for testing often originate from the same community and so
their statuses may exhibit positive correlation. Dorfman's protocol screens a
population of n specimens for a binary trait by partitioning it into
nonoverlapping groups, testing these, and only individually retesting the
specimens of each positive group. The partition is chosen to minimize the
expected number of tests under a probabilistic model of specimen statuses. We
relax the typical assumption that these are independent and identically
distributed and instead model them as exchangeable random variables. In this
case, their joint distribution is symmetric in the sense that it is invariant
under permutations. We give a characterization of such distributions in terms
of a function q where q(h) is the marginal probability that any group of size h
tests negative. We use this interpretable representation to show that the set
partitioning problem arising in Dorfman's protocol can be reduced to an integer
partitioning problem and efficiently solved. We apply these tools to an
empirical dataset from the COVID-19 pandemic. The methodology helps explain the
unexpectedly high empirical efficiency reported by the original investigators.
Comment: 20 pages w/o references, 2 figures
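The reduction to integer partitioning described in this abstract can be sketched as follows. Under Dorfman's protocol a group of size h costs one pooled test plus h retests when the pool is positive, so its expected cost is 1 + h(1 - q(h)). Because q depends only on the group size, the total expected cost depends only on the multiset of group sizes. The dynamic program below is one simple way to search over those sizes and is an assumption of this sketch, not necessarily the authors' method. The names `group_cost` and `best_partition`, the convention that a group of size 1 costs exactly one test, and the i.i.d. example `q_iid` are also hypothetical.

```python
from typing import Callable, List, Tuple


def group_cost(h: int, q: Callable[[int], float]) -> float:
    """Expected tests for one Dorfman group of size h.

    q(h) is the probability that a group of size h tests negative.
    Assumed convention: a singleton is tested once and never retested.
    """
    if h == 1:
        return 1.0
    return 1.0 + h * (1.0 - q(h))


def best_partition(n: int, q: Callable[[int], float]) -> Tuple[float, List[int]]:
    """Minimise total expected tests over group sizes summing to n."""
    best = [0.0] + [float("inf")] * n   # best[m]: optimal cost for m specimens
    choice = [0] * (n + 1)              # choice[m]: size of one group in an optimum
    for m in range(1, n + 1):
        for h in range(1, m + 1):
            cost = group_cost(h, q) + best[m - h]
            if cost < best[m]:
                best[m], choice[m] = cost, h
    sizes = []
    m = n
    while m > 0:                        # walk back through the recorded choices
        sizes.append(choice[m])
        m -= choice[m]
    return best[n], sizes


# Hypothetical i.i.d. special case with prevalence p: q(h) = (1 - p) ** h.
p = 0.01
q_iid = lambda h: (1.0 - p) ** h
expected_tests, sizes = best_partition(100, q_iid)
print(expected_tests, sizes)
```

An exchangeable model with positive correlation would be represented by a different q, for instance one for which q(h) decays more slowly in h than in the i.i.d. case; the same search applies unchanged.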
Graph-based techniques for compression and reconstruction of sparse sources
The main goal of this thesis is to develop lossless compression schemes for analog and binary sources. All the schemes considered share the feature that the encoder can be represented by a graph, so they can be studied with tools from modern coding theory.
In particular, this thesis focuses on two compression problems: group testing (for example of blood samples, for the presence of contaminants, or in DNA sequencing) and noiseless compressed sensing. Although the two problems may seem unrelated, the thesis shows that they are closely related. Furthermore, group testing has the same mathematical formulation as non-linear binary source compression schemes that use the OR operator. The thesis exploits the similarities between these problems.
The group testing problem aims to identify the defective subjects of a population with as few tests as possible. Group testing schemes fall into two classes: adaptive and non-adaptive. Adaptive schemes generate tests sequentially and exploit the partial decoding results to reduce the overall number of tests required to label all members of the population, whereas non-adaptive schemes perform all the tests in parallel and attempt to label as many subjects as possible.
Our contributions to the group testing problem are both theoretical and practical. We propose a novel adaptive scheme that performs the testing process efficiently. Furthermore, we develop tools to predict the performance of both adaptive and non-adaptive schemes when the number of subjects to be tested is large. These tools make it possible to characterize the performance of adaptive and non-adaptive group testing schemes without simulating them.
The goal of the noiseless compressed sensing problem is to recover a signal from its linear projection onto a lower-dimensional space. This is possible only when the number of zero components of the original signal is large enough. Compressed sensing deals with the design of sampling schemes and reconstruction algorithms that reconstruct the original signal vector from as few samples as possible.
In this thesis we pose the compressed sensing problem within a probabilistic framework, as opposed to the classical compressed sensing formulation. Recent results in the literature show that this approach is more efficient than the classical one.
Our contributions to noiseless compressed sensing are both theoretical and practical. We deduce a necessary and sufficient matrix design condition to guarantee that the reconstruction is lossless. Regarding the design of practical schemes, we propose two novel reconstruction algorithms based on message passing over the sparse representation of the matrix, one of them with very low computational complexity.
Postprint (published version)
Group testing: an information theory perspective
The group testing problem concerns discovering a small number of defective
items within a large population by performing tests on pools of items. A test
is positive if the pool contains at least one defective, and negative if it
contains no defectives. This is a sparse inference problem with a combinatorial
flavour, with applications in medical testing, biology, telecommunications,
information technology, data science, and more. In this monograph, we survey
recent developments in the group testing problem from an information-theoretic
perspective. We cover several related developments: efficient algorithms with
practical storage and computation requirements, achievability bounds for
optimal decoding methods, and algorithm-independent converse bounds. We assess
the theoretical guarantees not only in terms of scaling laws, but also in terms
of the constant factors, leading to the notion of the rate of group
testing, indicating the amount of information learned per test. Considering
both noiseless and noisy settings, we identify several regimes where existing
algorithms are provably optimal or near-optimal, as well as regimes where there
remains greater potential for improvement. In addition, we survey results
concerning a number of variations on the standard group testing problem,
including partial recovery criteria, adaptive algorithms with a limited number
of stages, constrained test designs, and sublinear-time algorithms.
Comment: Survey paper, 140 pages, 19 figures. To be published in Foundations
and Trends in Communications and Information Theory