7 research outputs found
Derandomization and Group Testing
The rapid development of derandomization theory, which is a fundamental area
in theoretical computer science, has recently led to many surprising
applications outside its initial intention. We will review some recent such
developments related to combinatorial group testing. In its most basic setting,
the aim of group testing is to identify a set of "positive" individuals in a
population of items by taking groups of items and asking whether there is a
positive in each group.
In particular, we will discuss explicit constructions of optimal or
nearly-optimal group testing schemes using "randomness-conducting" functions.
Among such developments are constructions of error-correcting group testing
schemes using randomness extractors and condensers, as well as threshold group
testing schemes from lossless condensers.
Comment: Invited Paper in Proceedings of 48th Annual Allerton Conference on
Communication, Control, and Computing, 201
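The basic test model described in this abstract (a group tests positive exactly when it contains at least one positive item) can be sketched as follows. This is only an illustration: the function names are hypothetical, the pools are drawn at random purely for convenience (the paper itself concerns explicit, derandomized constructions), and the decoder is the standard "clear every item seen in a negative pool" rule rather than any scheme from the paper.

```python
import random


def run_tests(pools, positives):
    """Return one boolean per pool: True if the pool contains a positive item."""
    return [any(item in positives for item in pool) for pool in pools]


def eliminate_negatives(n_items, pools, outcomes):
    """Clear every item that appears in a negative pool; return the remaining candidates."""
    candidates = set(range(n_items))
    for pool, outcome in zip(pools, outcomes):
        if not outcome:
            candidates -= set(pool)
    return candidates


def random_pools(n_items, n_tests, pool_size, seed=0):
    """Assumption for illustration only: pools are sampled uniformly at random."""
    rng = random.Random(seed)
    return [rng.sample(range(n_items), pool_size) for _ in range(n_tests)]


if __name__ == "__main__":
    positives = {3, 17}
    pools = random_pools(n_items=40, n_tests=20, pool_size=8)
    outcomes = run_tests(pools, positives)
    # The candidate set always contains the true positives, possibly with extras.
    print(sorted(eliminate_negatives(40, pools, outcomes)))
```

In the noiseless model this decoder never discards a true positive, since a positive item cannot appear in a negative pool; it may, however, leave some non-positive items among the candidates when too few tests are used.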
Optimal Nested Test Plan for Combinatorial Quantitative Group Testing
We consider the quantitative group testing problem where the objective is to
identify defective items in a given population based on results of tests
performed on subsets of the population. Under the quantitative group testing
model, the result of each test reveals the number of defective items in the
tested group. The minimum number of tests achievable by nested test plans was
established by Aigner and Schughart in 1985 within a minimax framework. The
optimal nested test plan offering this performance, however, was not obtained.
In this work, we establish the optimal nested test plan in closed form. This
optimal nested test plan is also order optimal among all test plans as the
population size approaches infinity. Using heavy-hitter detection as a case
study, we show via simulation examples orders of magnitude improvement of the
group testing approach over two prevailing sampling-based approaches in
detection accuracy and counter consumption. Other applications include anomaly
detection and wideband spectrum sensing in cognitive radio systems.
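To make the quantitative test model concrete, here is a minimal sketch in which each test returns the number of defectives in the tested group. It uses a naive recursive-halving search, which is not the closed-form optimal nested test plan established in the paper; `find_defectives` and the `count_defectives` oracle are hypothetical names introduced for illustration.

```python
def find_defectives(items, count_defectives):
    """Identify defectives using counting tests and recursive halving.

    count_defectives(group) is a hypothetical oracle returning how many
    defectives the group contains. Returns (defectives, number_of_tests).
    """
    tests = 0

    def query(group):
        nonlocal tests
        tests += 1
        return count_defectives(group)

    def search(group, count):
        # The count for this group is already known, so no test is spent here.
        if count == 0:
            return []
        if count == len(group):
            return list(group)
        mid = len(group) // 2
        left, right = group[:mid], group[mid:]
        left_count = query(left)          # one test on a subset of a tested group
        right_count = count - left_count  # inferred by subtraction, costs no test
        return search(left, left_count) + search(right, right_count)

    items = list(items)
    total = query(items)
    return search(items, total), tests
```

Because the count of one half follows by subtraction from the parent group's count, only one of the two halves needs to be tested at each split, which is the main saving a counting test offers over a binary positive/negative test.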
Group testing: an information theory perspective
The group testing problem concerns discovering a small number of defective
items within a large population by performing tests on pools of items. A test
is positive if the pool contains at least one defective, and negative if it
contains no defectives. This is a sparse inference problem with a combinatorial
flavour, with applications in medical testing, biology, telecommunications,
information technology, data science, and more. In this monograph, we survey
recent developments in the group testing problem from an information-theoretic
perspective. We cover several related developments: efficient algorithms with
practical storage and computation requirements, achievability bounds for
optimal decoding methods, and algorithm-independent converse bounds. We assess
the theoretical guarantees not only in terms of scaling laws, but also in terms
of the constant factors, leading to the notion of the "rate" of group
testing, indicating the amount of information learned per test. Considering
both noiseless and noisy settings, we identify several regimes where existing
algorithms are provably optimal or near-optimal, as well as regimes where there
remains greater potential for improvement. In addition, we survey results
concerning a number of variations on the standard group testing problem,
including partial recovery criteria, adaptive algorithms with a limited number
of stages, constrained test designs, and sublinear-time algorithms.
Comment: Survey paper, 140 pages, 19 figures. To be published in Foundations
and Trends in Communications and Information Theor
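The notion of rate mentioned in this abstract can be illustrated numerically. The sketch below assumes the commonly used definition, rate = log2(C(n, k)) / T, for n items, k defectives, and T tests, i.e. the number of bits needed to specify the defective set divided by the number of tests; the function name is hypothetical.

```python
from math import comb, log2


def group_testing_rate(n_items, n_defectives, n_tests):
    """Bits of information learned per test, assuming rate = log2(C(n, k)) / T."""
    return log2(comb(n_items, n_defectives)) / n_tests


if __name__ == "__main__":
    # Example parameters chosen arbitrarily for illustration.
    print(group_testing_rate(n_items=1000, n_defectives=10, n_tests=200))
```

Under this definition, a noiseless binary test can convey at most one bit, so a scheme that reliably recovers the defective set cannot have a rate above 1.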
Algorithmic advances in learning from large dimensional matrices and scientific data
University of Minnesota Ph.D. dissertation. May 2018. Major: Computer Science. Advisor: Yousef Saad. 1 computer file (PDF); xi, 196 pages.
This thesis is devoted to answering a range of questions in machine learning and data analysis related to large dimensional matrices and scientific data. Two key research objectives connect the different parts of the thesis: (a) development of fast, efficient, and scalable algorithms for machine learning which handle large matrices and high dimensional data; and (b) design of learning algorithms for scientific data applications. The work combines ideas from multiple, often non-traditional, fields, leading to new algorithms, new theory, and new insights in different applications.
The first of the three parts of this thesis explores numerical linear algebra tools to develop efficient algorithms for machine learning with reduced computation cost and improved scalability. Here, we first develop inexpensive algorithms combining various ideas from linear algebra and approximation theory for matrix spectrum related problems such as numerical rank estimation and matrix function trace estimation, including log-determinants, Schatten norms, and other spectral sums. We also propose a new method which simultaneously estimates the dimension of the dominant subspace of covariance matrices and obtains an approximation to the subspace. Next, we consider matrix approximation problems such as low rank approximation, column subset selection, and graph sparsification. We present a new approach based on multilevel coarsening to compute these approximations for large sparse matrices and graphs. Lastly, on the linear algebra front, we devise a novel algorithm based on rank shrinkage for the dictionary learning problem, learning a small set of dictionary columns which best represent the given data.
The second part of this thesis focuses on exploring novel non-traditional applications of information theory and codes, particularly in solving problems related to machine learning and high dimensional data analysis. Here, we first propose new matrix sketching methods using codes for obtaining low rank approximations of matrices and solving least squares regression problems. Next, we demonstrate that codewords from certain coding schemes perform exceptionally well for the group testing problem. Lastly, we present a novel machine learning application for coding theory, that of solving large scale multilabel classification problems. We propose a new algorithm for multilabel classification which is based on group testing and codes. The algorithm has a simple inexpensive prediction method, and the error correction capabilities of codes are exploited for the first time to correct prediction errors.
The third part of the thesis focuses on devising robust and stable learning algorithms which yield results that are interpretable from the viewpoint of a specific scientific application. We present Union of Intersections (UoI), a flexible, modular, and scalable framework for statistical-machine learning problems. We then adapt this framework to develop new algorithms for matrix decomposition problems such as nonnegative matrix factorization (NMF) and CUR decomposition. We apply these new methods to data from neuroscience applications in order to obtain insights into the functionality of the brain. Finally, we consider the application of materials informatics, learning from materials data. Here, we deploy regression techniques on materials data to predict physical properties of materials.