
    Nearly Optimal Sparse Group Testing

    Group testing is the process of pooling arbitrary subsets from a set of $n$ items so as to identify, with a minimal number of tests, a "small" subset of $d$ defective items. In "classical" non-adaptive group testing, it is known that when $d$ is substantially smaller than $n$, $\Theta(d\log(n))$ tests are both information-theoretically necessary and sufficient to guarantee recovery with high probability. Group testing schemes in the literature meeting this bound require most items to be tested $\Omega(\log(n))$ times, and most tests to incorporate $\Omega(n/d)$ items. Motivated by physical considerations, we study group testing models in which the testing procedure is constrained to be "sparse". Specifically, we consider (separately) scenarios in which (a) items are finitely divisible and hence may participate in at most $\gamma \in o(\log(n))$ tests; or (b) tests are size-constrained to pool no more than $\rho \in o(n/d)$ items per test. For both scenarios we provide information-theoretic lower bounds on the number of tests required to guarantee high-probability recovery. In both scenarios we provide both randomized constructions (under both $\epsilon$-error and zero-error reconstruction guarantees) and explicit constructions of designs with computationally efficient reconstruction algorithms that require a number of tests that is optimal up to constant or small polynomial factors in some regimes of $n$, $d$, $\gamma$, and $\rho$. The randomized design/reconstruction algorithm in the $\rho$-sized test scenario is universal -- independent of the value of $d$, as long as $\rho \in o(n/d)$. We also investigate the effect of unreliability/noise in test outcomes. For the full abstract, please see the full text PDF.
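
    As a concrete illustration of the classical (unconstrained) setting the abstract starts from, the sketch below simulates a Bernoulli random pooling design with COMP decoding, where any item appearing in a negative test is ruled out. This is a textbook baseline, not the paper's sparse construction; the parameters and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 1000, 10                       # items and defectives (illustrative)
T = int(np.ceil(2 * d * np.log(n)))   # on the order of d log(n) tests
p = 1.0 / d                           # per-test inclusion probability

x = np.zeros(n, dtype=bool)
x[rng.choice(n, size=d, replace=False)] = True   # hidden defective set

# Pooling design: A[t, i] = True if item i is placed in test t.
# (In the paper's gamma-divisible setting each column would additionally be
#  constrained to at most gamma ones; here columns are unconstrained.)
A = rng.random((T, n)) < p
y = (A & x).any(axis=1)               # a test is positive iff it hits a defective

# COMP decoding: any item appearing in a negative test cannot be defective.
cleared = A[~y].any(axis=0)
estimate = ~cleared

print("missed defectives:", int(np.sum(x & ~estimate)))
print("false positives:  ", int(np.sum(~x & estimate)))
```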

    Blind Multiclass Ensemble Classification

    The rising interest in pattern recognition and data analytics has spurred the development of innovative machine learning algorithms and tools. However, as each algorithm has its strengths and limitations, one is motivated to judiciously fuse multiple algorithms in order to find the "best" performing one for a given dataset. Ensemble learning aims at such a high-performance meta-algorithm by combining the outputs from multiple algorithms. The present work introduces a blind scheme for learning from ensembles of classifiers, using a moment matching method that leverages joint tensor and matrix factorization. Blind refers to the combiner, who has no knowledge of the ground-truth labels that each classifier has been trained on. A rigorous performance analysis is derived and the proposed scheme is evaluated on synthetic and real datasets. Comment: To appear in IEEE Transactions on Signal Processing.
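
    To make the blind-ensemble setting concrete, the sketch below simulates an ensemble of conditionally independent classifiers and forms the pairwise cross-moments that a moment-matching combiner would jointly factor; the actual fusion step shown is a plain majority vote, not the paper's tensor-based combiner. All sizes, accuracies, and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, K = 5, 2000, 3                      # classifiers, samples, classes

true_labels = rng.integers(0, K, size=N)  # never revealed to the combiner
accuracies = rng.uniform(0.55, 0.9, size=M)

# Simulate conditionally independent classifier outputs.
outputs = np.empty((M, N), dtype=int)
for m in range(M):
    correct = rng.random(N) < accuracies[m]
    outputs[m] = np.where(correct, true_labels, rng.integers(0, K, size=N))

# One-hot responses: Z[m] has shape (N, K).
Z = np.stack([np.eye(K)[outputs[m]] for m in range(M)])

# Empirical second-order cross-moments E[z_i z_j^T]: the statistics a
# moment-matching combiner would jointly factor to recover each classifier's
# confusion matrix without ever seeing true_labels.
pair_moments = {(i, j): Z[i].T @ Z[j] / N
                for i in range(M) for j in range(i + 1, M)}

# Simple stand-in combiner (not the paper's method): unweighted majority vote.
fused = Z.sum(axis=0).argmax(axis=1)
print("majority-vote accuracy:", float(np.mean(fused == true_labels)))
```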

    Constraining the Number of Positive Responses in Adaptive, Non-Adaptive, and Two-Stage Group Testing

    Group testing is a well known search problem that consists in detecting the defective members of a set of objects O by performing tests on properly chosen subsets (pools) of the given set O. In classical group testing the goal is to find all defectives by using as few tests as possible. We consider a variant of classical group testing in which one is concerned not only with minimizing the total number of tests but also with reducing the number of tests involving defective elements. The rationale behind this search model is that in many practical applications the devices used for the tests are subject to deterioration due to exposure to or interaction with the defective elements. In this paper we consider adaptive, non-adaptive and two-stage group testing. For all three considered scenarios, we derive upper and lower bounds on the number of "yes" responses that must be admitted by any strategy performing at most a certain number t of tests. In particular, for the adaptive case we provide an algorithm that uses a number of "yes" responses that exceeds the given lower bound by a small constant. Interestingly, this bound can also be attained asymptotically by our two-stage algorithm, a phenomenon analogous to the one occurring in classical group testing. For the non-adaptive scenario we give almost matching upper and lower bounds on the number of "yes" responses. In particular, we give two constructions, both achieving the same asymptotic bound; an interesting feature of one of them is that it is explicit. The bounds for the non-adaptive and the two-stage cases follow from the bounds on the optimal sizes of new variants of d-cover free families and (p,d)-cover free families introduced in this paper, which we believe may be of interest also in other contexts.
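
    For context, the sketch below implements a classical adaptive binary-splitting strategy, instrumented to count both the total number of tests and the number of positive ("yes") responses, i.e. the quantity the paper seeks to bound. It is a baseline, not the paper's algorithm; the helper names and parameters are illustrative assumptions.

```python
import random

def test(pool, defectives):
    """One group test: positive ('yes') iff the pool contains a defective."""
    return any(item in defectives for item in pool)

def find_one_defective(pool, defectives, stats):
    """Binary search within a pool known to contain at least one defective."""
    while len(pool) > 1:
        half = pool[:len(pool) // 2]
        stats["tests"] += 1
        if test(half, defectives):
            stats["yes"] += 1
            pool = half
        else:
            pool = pool[len(pool) // 2:]   # some defective must be in the rest
    return pool[0]

def adaptive_group_testing(items, defectives):
    stats = {"tests": 0, "yes": 0}
    remaining, found = list(items), set()
    while remaining:
        stats["tests"] += 1
        if not test(remaining, defectives):   # no defectives left
            break
        stats["yes"] += 1
        d = find_one_defective(remaining, defectives, stats)
        found.add(d)
        remaining.remove(d)
    return found, stats

random.seed(0)
n, d = 1000, 5
defective_set = set(random.sample(range(n), d))
found, stats = adaptive_group_testing(range(n), defective_set)
print(found == defective_set, stats)
```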

    How to Host a Data Competition: Statistical Advice for Design and Analysis of a Data Competition

    Data competitions rely on real-time leaderboards to rank competitor entries and stimulate algorithm improvement. While such competitions have become quite popular and prevalent, particularly in supervised learning formats, their implementations by the host are highly variable. Without careful planning, a supervised learning competition is vulnerable to overfitting, where the winning solutions are so closely tuned to the particular set of provided data that they cannot generalize to the underlying problem of interest to the host. Based on our experience, this paper outlines some important considerations for strategically designing relevant and informative data sets to maximize the learning outcome from hosting a competition. It also describes a post-competition analysis that enables robust and efficient assessment of the strengths and weaknesses of solutions from different competitors, as well as greater understanding of the regions of the input space that are well-solved. The post-competition analysis, which complements the leaderboard, uses exploratory data analysis and generalized linear models (GLMs). The GLMs not only expand the range of results we can explore, but also provide more detailed analysis of individual sub-questions, including similarities and differences between algorithms across different types of scenarios, universally easy or hard regions of the input space, and different learning objectives. When coupled with a strategically planned data generation approach, these methods provide richer and more informative summaries to enhance the interpretation of results beyond just the rankings on the leaderboard. The methods are illustrated with a recently completed competition to evaluate algorithms capable of detecting, identifying, and locating radioactive materials in an urban environment. Comment: 36 pages.
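
    As a rough illustration of the post-competition GLM analysis described above, the sketch below fits a logistic-link GLM to synthetic per-instance success indicators, with competitor and scenario as categorical predictors. The column names, factor levels, and data are illustrative assumptions, not the competition's actual variables.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)

# Synthetic per-instance outcomes: 1 = the entry handled the instance correctly.
base_rate = {"team_A": 0.80, "team_B": 0.70, "team_C": 0.60}
penalty   = {"low_noise": 0.00, "high_noise": 0.20, "shielded_source": 0.10}

rows = []
for alg, p0 in base_rate.items():
    for scen, pen in penalty.items():
        for _ in range(200):
            rows.append({"algorithm": alg,
                         "scenario": scen,
                         "success": int(rng.random() < p0 - pen)})
df = pd.DataFrame(rows)

# Logistic-link GLM: which competitors and which regions of the input space
# drive success rates, beyond what a single leaderboard score shows?
model = smf.glm("success ~ C(algorithm) + C(scenario)",
                data=df, family=sm.families.Binomial()).fit()
print(model.summary())
```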