GROTESQUE: Noisy Group Testing (Quick and Efficient)
Group testing refers to the problem of identifying (with high probability) a
(small) subset of D defectives from a (large) set of N items via a "small"
number of "pooled" tests. For ease of presentation in this work we focus on the
regime when D = \cO{N^{1-\gap}} for some \gap > 0. The tests may be
noiseless or noisy, and the testing procedure may be adaptive (the pool
defining a test may depend on the outcome of a previous test), or non-adaptive
(each test is performed independent of the outcome of other tests). A rich body
of literature demonstrates that \Theta(D\log(N)) tests are
information-theoretically necessary and sufficient for the group-testing
problem, and provides algorithms that achieve this performance. However, it is
only recently that reconstruction algorithms with computational complexity that
is sub-linear in N have started being investigated (recent work by
\cite{GurI:04,IndN:10, NgoP:11} gave some of the first such algorithms). In the
scenario with adaptive tests with noisy outcomes, we present the first scheme
that is simultaneously order-optimal (up to small constant factors) in both the
number of tests and the decoding complexity (\cO{D\log(N)} in both the
performance metrics). The total number of stages of our adaptive algorithm is
"small" (\cO{\log(D)}). Similarly, in the scenario with non-adaptive tests
with noisy outcomes, we present the first scheme that is simultaneously
near-optimal in both the number of tests and the decoding complexity (via an
algorithm that requires \cO{D\log(D)\log(N)} tests and has a decoding
complexity of {}. Finally, we present an
adaptive algorithm that only requires 2 stages, and for which both the number
of tests and the decoding complexity scale as {}. For all three settings the probability of error of our
algorithms scales as \cO{1/poly(D)}.
Comment: 26 pages, 5 figures
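To make the adaptive setting concrete, here is a minimal sketch of classical binary splitting with noiseless tests. It is not the GROTESQUE scheme from the abstract: it ignores noise entirely, and the names `make_pool_test` and `adaptive_binary_splitting` are hypothetical, introduced only for illustration. It shows how each pool can depend on earlier outcomes, and why the number of tests grows roughly like D log N rather than N.

```python
from typing import Callable, List, Sequence, Set, Tuple


def make_pool_test(defectives: Set[int]) -> Callable[[Sequence[int]], bool]:
    """Return a noiseless pooled test: positive iff the pool holds a defective."""
    def pool_test(pool: Sequence[int]) -> bool:
        return any(item in defectives for item in pool)
    return pool_test


def adaptive_binary_splitting(
    n_items: int, pool_test: Callable[[Sequence[int]], bool]
) -> Tuple[List[int], int]:
    """Find all defectives among items 0..n_items-1 using adaptive pooled tests.

    Assumption: tests are noiseless. Returns (sorted defectives, tests used).
    """
    found: List[int] = []
    tests_used = 0
    stack = [list(range(n_items))]
    while stack:
        pool = stack.pop()
        if not pool:
            continue
        tests_used += 1
        if not pool_test(pool):
            continue  # the whole pool is clean, so discard it in one test
        if len(pool) == 1:
            found.append(pool[0])  # a positive singleton is a defective
            continue
        mid = len(pool) // 2
        stack.append(pool[:mid])  # next pools depend on this outcome
        stack.append(pool[mid:])
    return sorted(found), tests_used


if __name__ == "__main__":
    test = make_pool_test({3, 500, 777})
    defectives, used = adaptive_binary_splitting(1024, test)
    print(defectives, used)
```

Each defective contributes at most about 2 log2(N) tests along its root-to-leaf path, which is where the D log N scaling comes from. Handling noisy outcomes, as the abstract describes, would need repeated or coded tests that this sketch does not attempt.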
Detecting Activations over Graphs using Spanning Tree Wavelet Bases
We consider the detection of activations over graphs under Gaussian noise,
where signals are piece-wise constant over the graph. Despite the wide
applicability of such a detection algorithm, there has been little success in
the development of computationally feasible methods with provable theoretical
guarantees for general graph topologies. We cast this as a hypothesis testing
problem, and first provide a universal necessary condition for asymptotic
distinguishability of the null and alternative hypotheses. We then introduce
the spanning tree wavelet basis over graphs, a localized basis that reflects
the topology of the graph, and prove that for any spanning tree, this approach
can distinguish null from alternative in a low signal-to-noise regime. Lastly,
we improve on this result and show that using the uniform spanning tree in the
basis construction yields a randomized test with stronger theoretical
guarantees that in many cases matches our necessary conditions. Specifically,
we obtain near-optimal performance in edge transitive graphs, k-nearest
neighbor graphs, and \epsilon-graphs.
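As a rough illustration of wavelet-based detection, the sketch below assumes the graph vertices have already been ordered along a chain whose length is a power of two, and applies an ordinary orthonormal Haar transform to that ordering. The paper's construction of the basis from a spanning tree is not reproduced here. The function names and the union-bound threshold sigma * sqrt(2 log(2n / delta)) are assumptions chosen for this example, not taken from the abstract.

```python
import math
from typing import List, Sequence, Tuple


def haar_detail_coefficients(signal: Sequence[float]) -> List[float]:
    """Orthonormal Haar detail coefficients of a signal of power-of-two length."""
    n = len(signal)
    if n < 2 or n & (n - 1):
        raise ValueError("signal length must be a power of two and at least 2")
    approx = list(signal)
    details: List[float] = []
    root2 = math.sqrt(2.0)
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.extend((a - b) / root2 for a, b in pairs)  # local differences
        approx = [(a + b) / root2 for a, b in pairs]       # local averages
    return details


def detect_activation(
    signal: Sequence[float], sigma: float, delta: float = 0.05
) -> Tuple[bool, float, float]:
    """Reject the null (pure noise) if the largest detail coefficient is big.

    Under i.i.d. N(0, sigma^2) noise each coefficient is N(0, sigma^2), so a
    union bound keeps the false-alarm probability below delta.
    """
    n = len(signal)
    stat = max(abs(w) for w in haar_detail_coefficients(signal))
    threshold = sigma * math.sqrt(2.0 * math.log(2.0 * n / delta))
    return stat > threshold, stat, threshold


if __name__ == "__main__":
    # Piece-wise constant activation on the second half of the chain, no noise.
    print(detect_activation([0, 0, 0, 0, 5, 5, 5, 5], sigma=1.0))
```

A piece-wise constant signal concentrates its energy in the few coefficients whose support straddles the activation boundary, which is why a max-coefficient statistic can detect it at low signal-to-noise ratios.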
Fast Hierarchical Clustering and Other Applications of Dynamic Closest Pairs
We develop data structures for dynamic closest pair problems with arbitrary
distance functions, that do not necessarily come from any geometric structure
on the objects. Based on a technique previously used by the author for
Euclidean closest pairs, we show how to insert and delete objects from an
n-object set, maintaining the closest pair, in O(n log^2 n) time per update and
O(n) space. With quadratic space, we can instead use a quadtree-like structure
to achieve an optimal time bound, O(n) per update. We apply these data
structures to hierarchical clustering, greedy matching, and TSP heuristics, and
discuss other potential applications in machine learning, Groebner bases, and
local improvement algorithms for partition and placement problems. Experiments
show our new methods to be faster in practice than previously used heuristics.
Comment: 20 pages, 9 figures. A preliminary version of this paper appeared at
the 9th ACM-SIAM Symp. on Discrete Algorithms, San Francisco, 1998, pp.
619-628. For source code and experimental results, see
http://www.ics.uci.edu/~eppstein/projects/pairs
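The interface such a data structure exposes can be sketched with a brute-force baseline. This is not one of the paper's data structures: `NaiveClosestPair` is a hypothetical class that rescans all pairs on every query, and it exists only to show how insert, delete, and closest-pair operations drive greedy hierarchical clustering with an arbitrary distance function. Single-linkage distance on numbers is an assumption made for the example.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List, Optional, Tuple

Cluster = FrozenSet[float]


class NaiveClosestPair:
    """Brute-force dynamic closest pair: O(1) updates, O(n^2) per query."""

    def __init__(self, dist: Callable[[Cluster, Cluster], float]) -> None:
        self.dist = dist
        self.items = {}  # dict used as an insertion-ordered set

    def __len__(self) -> int:
        return len(self.items)

    def insert(self, x: Cluster) -> None:
        self.items[x] = None

    def delete(self, x: Cluster) -> None:
        del self.items[x]

    def closest_pair(self) -> Optional[Tuple[float, Cluster, Cluster]]:
        best = None
        for a, b in combinations(self.items, 2):
            d = self.dist(a, b)
            if best is None or d < best[0]:
                best = (d, a, b)
        return best


def single_linkage(a: Cluster, b: Cluster) -> float:
    """Distance between clusters of numbers: the closest pair of members."""
    return min(abs(x - y) for x in a for y in b)


def greedy_cluster(points: Iterable[float]) -> List[Tuple[float, Cluster]]:
    """Repeatedly merge the closest two clusters, recording each merge."""
    cp = NaiveClosestPair(single_linkage)
    for p in points:
        cp.insert(frozenset([p]))
    merges: List[Tuple[float, Cluster]] = []
    while len(cp) > 1:
        d, a, b = cp.closest_pair()
        cp.delete(a)
        cp.delete(b)
        cp.insert(a | b)  # the merged cluster becomes a new object
        merges.append((d, a | b))
    return merges


if __name__ == "__main__":
    for d, cluster in greedy_cluster([0, 1, 3, 7]):
        print(d, sorted(cluster))
```

Swapping `NaiveClosestPair` for a structure with faster closest-pair queries would leave `greedy_cluster` unchanged, which is the point of isolating the three operations.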
A Haystack Heuristic for Autoimmune Disease Biomarker Discovery Using Next-Gen Immune Repertoire Sequencing Data.
Large-scale DNA sequencing of immunological repertoires offers an opportunity for the discovery of novel biomarkers for autoimmune disease. Available bioinformatics techniques, however, are not adequately suited for elucidating possible biomarker candidates from within large immunosequencing datasets due to unsatisfactory scalability and sensitivity. Here, we present the Haystack Heuristic, an algorithm customized to computationally extract disease-associated motifs from next-generation-sequenced repertoires by contrasting disease and healthy subjects. This technique employs a local-search graph-theory approach to discover novel motifs in patient data. We apply the Haystack Heuristic to nine million B-cell receptor sequences obtained from nearly 100 individuals in order to elucidate a new motif that is significantly associated with multiple sclerosis. Our results demonstrate the effectiveness of the Haystack Heuristic in computing possible biomarker candidates from high-throughput sequencing data and could be generalized to other datasets.