Search CORE

7,105 research outputs found

GROTESQUE: Noisy Group Testing (Quick and Efficient)

Author: Bakshi Mayank
Cai Sheng
Jaggi Sidharth
Jahangoshahi Mohammad
Publication venue
Publication date: 01/01/2013
Field of study

Group-testing refers to the problem of identifying (with high probability) a (small) subset of

D

defectives from a (large) set of

N

items via a "small" number of "pooled" tests. For ease of presentation in this work we focus on the regime when D = \cO{N^{1-\gap}} for some \gap > 0. The tests may be noiseless or noisy, and the testing procedure may be adaptive (the pool defining a test may depend on the outcome of a previous test), or non-adaptive (each test is performed independent of the outcome of other tests). A rich body of literature demonstrates that

\Theta(D\log(N))

tests are information-theoretically necessary and sufficient for the group-testing problem, and provides algorithms that achieve this performance. However, it is only recently that reconstruction algorithms with computational complexity that is sub-linear in

N

have started being investigated (recent work by \cite{GurI:04,IndN:10, NgoP:11} gave some of the first such algorithms). In the scenario with adaptive tests with noisy outcomes, we present the first scheme that is simultaneously order-optimal (up to small constant factors) in both the number of tests and the decoding complexity (\cO{D\log(N)} in both the performance metrics). The total number of stages of our adaptive algorithm is "small" (\cO{\log(D)}). Similarly, in the scenario with non-adaptive tests with noisy outcomes, we present the first scheme that is simultaneously near-optimal in both the number of tests and the decoding complexity (via an algorithm that requires \cO{D\log(D)\log(N)} tests and has a decoding complexity of {

{\cal O}(D(\log N+\log^{2}D))

}. Finally, we present an adaptive algorithm that only requires 2 stages, and for which both the number of tests and the decoding complexity scale as {

{\cal O}(D(\log N+\log^{2}D))

}. For all three settings the probability of error of our algorithms scales as \cO{1/(poly(D)}.Comment: 26 pages, 5 figure

arXiv.org e-Print Archive

Crossref

Explore Bristol Research

Detecting Activations over Graphs using Spanning Tree Wavelet Bases

Author: Krishnamurthy Akshay
Sharpnack James
Singh Aarti
Publication venue
Publication date: 12/07/2012
Field of study

We consider the detection of activations over graphs under Gaussian noise, where signals are piece-wise constant over the graph. Despite the wide applicability of such a detection algorithm, there has been little success in the development of computationally feasible methods with proveable theoretical guarantees for general graph topologies. We cast this as a hypothesis testing problem, and first provide a universal necessary condition for asymptotic distinguishability of the null and alternative hypotheses. We then introduce the spanning tree wavelet basis over graphs, a localized basis that reflects the topology of the graph, and prove that for any spanning tree, this approach can distinguish null from alternative in a low signal-to-noise regime. Lastly, we improve on this result and show that using the uniform spanning tree in the basis construction yields a randomized test with stronger theoretical guarantees that in many cases matches our necessary conditions. Specifically, we obtain near-optimal performance in edge transitive graphs,

k

-nearest neighbor graphs, and

\epsilon

-graphs

arXiv.org e-Print Archive

CiteSeerX

Fast Hierarchical Clustering and Other Applications of Dynamic Closest Pairs

Author: BENTLEY J.L.
BUCHBERGER B.
David Eppstein
DURAN B. S.
GOTOH O.
MATIAS Y.
SUPOWIT K.J.
YIANILOS P.N.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/1998
Field of study

We develop data structures for dynamic closest pair problems with arbitrary distance functions, that do not necessarily come from any geometric structure on the objects. Based on a technique previously used by the author for Euclidean closest pairs, we show how to insert and delete objects from an n-object set, maintaining the closest pair, in O(n log^2 n) time per update and O(n) space. With quadratic space, we can instead use a quadtree-like structure to achieve an optimal time bound, O(n) per update. We apply these data structures to hierarchical clustering, greedy matching, and TSP heuristics, and discuss other potential applications in machine learning, Groebner bases, and local improvement algorithms for partition and placement problems. Experiments show our new methods to be faster in practice than previously used heuristics.Comment: 20 pages, 9 figures. A preliminary version of this paper appeared at the 9th ACM-SIAM Symp. on Discrete Algorithms, San Francisco, 1998, pp. 619-628. For source code and experimental results, see http://www.ics.uci.edu/~eppstein/projects/pairs

arXiv.org e-Print Archive

CiteSeerX

Crossref

Recommended from our members

A Haystack Heuristic for Autoimmune Disease Biomarker Discovery Using Next-Gen Immune Repertoire Sequencing Data.

Author: Apeltsin Leonard
Sirota Marina
von Büdingen H-Christian
Wang Shengzhi
Publication venue: eScholarship, University of California
Publication date: 01/07/2017
Field of study

Large-scale DNA sequencing of immunological repertoires offers an opportunity for the discovery of novel biomarkers for autoimmune disease. Available bioinformatics techniques however, are not adequately suited for elucidating possible biomarker candidates from within large immunosequencing datasets due to unsatisfactory scalability and sensitivity. Here, we present the Haystack Heuristic, an algorithm customized to computationally extract disease-associated motifs from next-generation-sequenced repertoires by contrasting disease and healthy subjects. This technique employs a local-search graph-theory approach to discover novel motifs in patient data. We apply the Haystack Heuristic to nine million B-cell receptor sequences obtained from nearly 100 individuals in order to elucidate a new motif that is significantly associated with multiple sclerosis. Our results demonstrate the effectiveness of the Haystack Heuristic in computing possible biomarker candidates from high throughput sequencing data and could be generalized to other datasets

eScholarship - University of California