Nearly Optimal Sparse Group Testing
Group testing is the process of pooling arbitrary subsets from a set of $n$
items so as to identify, with a minimal number of tests, a "small" subset of $d$
defective items. In "classical" non-adaptive group testing, it is known
that when $d$ is substantially smaller than $n$, $\Theta(d\log n)$ tests are
both information-theoretically necessary and sufficient to guarantee recovery
with high probability. Group testing schemes in the literature meeting this
bound require most items to be tested $\Omega(\log n)$ times, and most tests
to incorporate $\Omega(n/d)$ items.
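To make the per-item and per-test loads concrete, here is a back-of-the-envelope sketch. It assumes the common Bernoulli design in which each of $n$ items joins each test independently with probability $1/d$ ($d$ = number of defectives), and $T = \lceil c\,d\ln n \rceil$ tests, where the constant $c$ is a hypothetical value set to 1 purely for illustration (it is not taken from the paper).

```python
import math


def bernoulli_design_loads(n_items: int, d_defectives: int, c: float = 1.0):
    """Expected loads of a Bernoulli(1/d) design with T = ceil(c * d * ln n) tests.

    Returns (T, expected tests per item, expected items per test).
    `c` is an illustrative constant, not a value taken from the paper.
    """
    p = 1.0 / d_defectives                           # inclusion probability per (item, test) pair
    n_tests = math.ceil(c * d_defectives * math.log(n_items))
    tests_per_item = n_tests * p                     # about c * ln n, i.e. grows like log n
    items_per_test = n_items * p                     # n / d
    return n_tests, tests_per_item, items_per_test


if __name__ == "__main__":
    T, per_item, per_test = bernoulli_design_loads(10**6, 100)
    print(T, round(per_item, 2), per_test)           # 1382 13.82 10000.0
```

With a million items and 100 defectives, each item sits in roughly 14 tests and each test pools about 10,000 items; the sparse variants below cap exactly these two quantities.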
Motivated by physical considerations, we study group testing models in which
the testing procedure is constrained to be "sparse". Specifically, we consider
(separately) scenarios in which (a) items are finitely divisible and hence may
participate in at most $\gamma \in o(\log n)$ tests; or (b) tests are
size-constrained to pool no more than $\rho \in o(n/d)$ items per test. For both
scenarios we provide information-theoretic lower bounds on the number of tests
required to guarantee high probability recovery. In both scenarios we provide
both randomized constructions (under both $\epsilon$-error and zero-error
reconstruction guarantees) and explicit constructions of designs with
computationally efficient reconstruction algorithms that require a number of
tests that is optimal up to constant or small polynomial factors in some
regimes of $n$, $d$, $\gamma$, and $\rho$. The randomized design/reconstruction
algorithm in the $\rho$-sized test scenario is universal, i.e., independent of the
value of $d$, as long as $\rho \in o(n/d)$. We also investigate the effect of
unreliability/noise in test outcomes. For the full abstract, please see the
full text PDF.
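A minimal sketch of scenario (a), assuming a random design in which every item joins exactly $\gamma$ distinct tests chosen uniformly at random, noiseless OR outcomes, and the standard COMP decoder (any item appearing in a negative test is cleared). This is not the paper's construction or decoder; the function names are hypothetical, and the size-constrained scenario (b) is not shown.

```python
import random


def gamma_divisible_design(n_items, n_tests, gamma, rng):
    """Each item joins exactly `gamma` distinct tests chosen uniformly at random."""
    tests = [set() for _ in range(n_tests)]
    for item in range(n_items):
        for t in rng.sample(range(n_tests), gamma):
            tests[t].add(item)
    return tests


def run_tests(tests, defectives):
    """Noiseless OR model: a test is positive iff its pool contains a defective."""
    return [bool(pool & defectives) for pool in tests]


def comp_decode(tests, outcomes, n_items):
    """COMP: every item in a negative test is non-defective; declare the rest defective."""
    cleared = set()
    for pool, positive in zip(tests, outcomes):
        if not positive:
            cleared |= pool
    return set(range(n_items)) - cleared


if __name__ == "__main__":
    rng = random.Random(0)
    n, d, gamma, T = 500, 5, 3, 120
    defectives = set(rng.sample(range(n), d))
    design = gamma_divisible_design(n, T, gamma, rng)
    estimate = comp_decode(design, run_tests(design, defectives), n)
    # COMP never misses a defective; it can only over-report.
    print(defectives <= estimate, len(estimate - defectives))
```

Because each defective only ever appears in positive tests, COMP has no false negatives; the number of false positives is what grows when $\gamma$ is too small for the chosen number of tests.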
On optimal designs for clinical trials: An updated review
Optimization of clinical trial designs can help investigators achieve higher-quality results for the given resource constraints. The present paper gives an overview of optimal designs for various important problems that arise in different stages of clinical drug development, including phase I dose–toxicity studies; phase I/II studies that consider early efficacy and toxicity outcomes simultaneously; phase II dose–response studies driven by multiple comparisons (MCP), modeling techniques (Mod), or their combination (MCP-Mod); phase III randomized controlled multiarm, multi-objective clinical trials to test differences among several treatment groups; and population pharmacokinetics–pharmacodynamics experiments. We find that the modern literature is very rich with optimal design methodologies that can be utilized by clinical researchers to improve the efficiency of drug development.
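As a toy illustration of what "optimal design" means here, the sketch below uses the D-optimality criterion (maximize the determinant of the information matrix) for a simple linear dose–response model $y = b_0 + b_1 x$ with dose scaled to $[-1, 1]$. The model, the dose scaling, and the function name are assumptions for illustration only; the designs covered by the review are generally for richer, often nonlinear, models.

```python
from fractions import Fraction


def info_det(design):
    """Determinant of the 2x2 information matrix (up to the noise variance) for y = b0 + b1*x.

    `design` is a list of (dose, weight) pairs; weights are the fractions of
    patients allocated to each dose and should sum to 1.
    """
    s0 = sum(w for _, w in design)            # total weight
    s1 = sum(w * x for x, w in design)        # weighted mean dose
    s2 = sum(w * x * x for x, w in design)    # weighted second moment of dose
    return s0 * s2 - s1 * s1                  # det [[s0, s1], [s1, s2]]


if __name__ == "__main__":
    two_point = [(-1, Fraction(1, 2)), (1, Fraction(1, 2))]
    three_point = [(-1, Fraction(1, 3)), (0, Fraction(1, 3)), (1, Fraction(1, 3))]
    print(info_det(two_point), info_det(three_point))   # 1 2/3
```

Spreading patients equally over the two extreme doses gives a larger determinant than adding a middle dose, which matches the textbook result that this two-point design is D-optimal for the straight-line model. The trade-off is that it cannot detect curvature, one reason practical dose–response designs use more doses.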
Blind Multiclass Ensemble Classification
The rising interest in pattern recognition and data analytics has spurred the
development of innovative machine learning algorithms and tools. However, as
each algorithm has its strengths and limitations, one is motivated to
judiciously fuse multiple algorithms in order to find the "best" performing
one, for a given dataset. Ensemble learning aims at such a high-performance
meta-algorithm, by combining the outputs from multiple algorithms. The present
work introduces a blind scheme for learning from ensembles of classifiers,
using a moment-matching method that leverages joint tensor and matrix
factorization. "Blind" means that the combiner has no knowledge of the
ground-truth labels that each classifier has been trained on. A rigorous
performance analysis is derived, and the proposed scheme is evaluated on
synthetic and real datasets.
Comment: To appear in IEEE Transactions on Signal Processing
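A minimal sketch of the moment ingredient only, under the assumption (labeled here, not taken from the abstract) that classifiers err conditionally independently given the hidden true label. The second-order cross-moment $E[f_i f_j^\top]$ of two classifiers' one-hot outputs can then be estimated without labels and equals $\sum_c \pi_c\, \Gamma_i[c]\, \Gamma_j[c]^\top$, where $\pi$ is the class prior and $\Gamma_m[c]$ is row $c$ of classifier $m$'s confusion matrix. The tensor/matrix factorization that recovers $\pi$ and $\Gamma_m$ is not shown, and all names are hypothetical.

```python
import random


def sample_predictions(priors, confusions, n, rng):
    """Draw n unlabeled samples; classifiers predict independently given the hidden label.

    confusions[m][c][k] = P(classifier m outputs k | true class c).
    """
    K = len(priors)
    preds = []
    for _ in range(n):
        y = rng.choices(range(K), weights=priors)[0]   # hidden label, never returned
        preds.append([rng.choices(range(K), weights=conf[y])[0] for conf in confusions])
    return preds


def empirical_cross_moment(preds, i, j, K):
    """Estimate E[f_i f_j^T] (f_m = one-hot output of classifier m) from predictions alone."""
    M = [[0.0] * K for _ in range(K)]
    for row in preds:
        M[row[i]][row[j]] += 1.0 / len(preds)
    return M


def model_cross_moment(priors, conf_i, conf_j):
    """Population value sum_c pi_c * conf_i[c] conf_j[c]^T implied by conditional independence."""
    K = len(priors)
    return [[sum(priors[c] * conf_i[c][a] * conf_j[c][b] for c in range(K))
             for b in range(K)] for a in range(K)]


if __name__ == "__main__":
    rng = random.Random(1)
    priors = [0.5, 0.3, 0.2]
    conf_a = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]]
    conf_b = [[0.6, 0.3, 0.1], [0.1, 0.8, 0.1], [0.2, 0.1, 0.7]]
    preds = sample_predictions(priors, [conf_a, conf_b], 20000, rng)
    emp = empirical_cross_moment(preds, 0, 1, 3)
    mod = model_cross_moment(priors, conf_a, conf_b)
    print(max(abs(emp[a][b] - mod[a][b]) for a in range(3) for b in range(3)))
```

The point of the check is that the estimated moments depend only on classifier outputs, yet they are structured by the unknown priors and confusion matrices; matching such moments (together with third-order ones) is what lets a blind combiner work without labels.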
Constraining the Number of Positive Responses in Adaptive, Non-Adaptive, and Two-Stage Group Testing
Group testing is a well-known search problem that consists of detecting the
defective members of a set of objects O by performing tests on properly chosen
subsets (pools) of the given set O. In classical group testing the goal is to
find all defectives by using as few tests as possible. We consider a variant of
classical group testing in which one is concerned not only with minimizing the
total number of tests but aims also at reducing the number of tests involving
defective elements. The rationale behind this search model is that in many
practical applications the devices used for the tests are subject to
deterioration due to exposure to or interaction with the defective elements. In
this paper we consider adaptive, non-adaptive and two-stage group testing. For
all three considered scenarios, we derive upper and lower bounds on the number
of "yes" responses that must be admitted by any strategy performing at most a
certain number t of tests. In particular, for the adaptive case we provide an
algorithm that uses a number of "yes" responses that exceeds the given lower
bound by a small constant. Interestingly, this bound can also be asymptotically
attained by our two-stage algorithm, a phenomenon analogous to
the one occurring in classical group testing. For the non-adaptive scenario we
give almost matching upper and lower bounds on the number of "yes" responses.
In particular, we give two constructions both achieving the same asymptotic
bound. An interesting feature of one of these constructions is that it is an
explicit construction. The bounds for the non-adaptive and the two-stage cases
follow from the bounds on the optimal sizes of new variants of d-cover free
families and (p,d)-cover free families introduced in this paper, which we
believe may also be of interest in other contexts.
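To make the cost measure concrete, here is a minimal sketch of an adaptive strategy that counts both the total number of tests and the number of "yes" (positive) responses. It uses naive recursive halving, which is a hypothetical stand-in chosen for clarity: it is not the paper's algorithm and is not claimed to be optimal for either measure.

```python
def find_defectives(items, is_defective):
    """Adaptive recursive halving; returns (defectives found, tests used, 'yes' responses)."""
    stats = {"tests": 0, "yes": 0}

    def test(pool):
        stats["tests"] += 1
        positive = any(is_defective(x) for x in pool)
        stats["yes"] += positive          # only positive tests "expose" the device to defectives
        return positive

    def search(pool):
        if not pool or not test(pool):
            return []                     # negative pool: everything in it is clean
        if len(pool) == 1:
            return list(pool)             # positive singleton: a defective
        mid = len(pool) // 2
        return search(pool[:mid]) + search(pool[mid:])

    found = search(list(items))
    return found, stats["tests"], stats["yes"]


if __name__ == "__main__":
    print(find_defectives(range(16), lambda x: x == 3))   # ([3], 9, 5)
```

With one defective among 16 items, the path to it produces $\log_2 16 + 1 = 5$ positive responses out of 9 tests; the paper's question is how far such "yes" counts can be pushed down while keeping the total number of tests bounded.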
How to Host a Data Competition: Statistical Advice for Design and Analysis of a Data Competition
Data competitions rely on real-time leaderboards to rank competitor entries
and stimulate algorithm improvement. While such competitions have become quite
popular and prevalent, particularly in supervised learning formats, their
implementations by the host are highly variable. Without careful planning, a
supervised learning competition is vulnerable to overfitting, where the winning
solutions are so closely tuned to the particular set of provided data that they
cannot generalize to the underlying problem of interest to the host. Based on
our experience, this paper outlines some important considerations for
strategically designing relevant and informative data sets to maximize the
learning outcome from hosting a competition. It also describes a post-competition
analysis that enables robust and efficient assessment of the strengths and
weaknesses of solutions from different competitors, as well as greater
understanding of the regions of the input space that are well-solved. The
post-competition analysis, which complements the leaderboard, uses exploratory
data analysis and generalized linear models (GLMs). The GLMs not only expand
the range of results we can explore, they also provide more detailed analysis
of individual sub-questions including similarities and differences between
algorithms across different types of scenarios, universally easy or hard
regions of the input space, and different learning objectives. When coupled
with a strategically planned data generation approach, the methods provide
richer and more informative summaries to enhance the interpretation of results
beyond just the rankings on the leaderboard. The methods are illustrated with a
recently completed competition to evaluate algorithms capable of detecting,
identifying, and locating radioactive materials in an urban environment.
Comment: 36 pages
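A minimal sketch of the GLM idea, assuming a binomial GLM with a logit link (logistic regression) that relates whether a submission solved a test case to hypothetical competitor and scenario indicators. The counts below are invented toy data, not results from the competition, and the model is fitted by plain gradient ascent only to stay dependency-free; a real analysis would use an established statistics package.

```python
import math


def fit_logistic_glm(X, y, lr=0.5, iters=5000):
    """Maximum-likelihood logistic regression (binomial GLM, logit link) by gradient ascent."""
    beta = [0.0] * len(X[0])
    n = len(X)
    for _ in range(iters):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, xi))))
            for j, xij in enumerate(xi):
                grad[j] += (yi - p) * xij / n   # gradient of the mean log-likelihood
        beta = [b + lr * g for b, g in zip(beta, grad)]
    return beta


def toy_records():
    """Invented counts: (is_algorithm_B, is_hard_scenario, solved, total)."""
    cells = [(0, 0, 9, 10), (0, 1, 5, 10), (1, 0, 8, 10), (1, 1, 2, 10)]
    X, y = [], []
    for is_b, is_hard, solved, total in cells:
        for k in range(total):
            X.append([1.0, is_b, is_hard])      # intercept, competitor, scenario
            y.append(1 if k < solved else 0)
    return X, y


if __name__ == "__main__":
    intercept, b_competitor, b_hard = fit_logistic_glm(*toy_records())
    print(round(intercept, 2), round(b_competitor, 2), round(b_hard, 2))
```

Each coefficient is a log-odds effect, so a negative scenario coefficient flags a region of the input space that is hard for everyone, while competitor coefficients compare algorithms after adjusting for scenario mix; this is the kind of summary that goes beyond a single leaderboard rank.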