Committee-Based Sample Selection for Probabilistic Classifiers
In many real-world learning tasks, it is expensive to acquire a sufficient
number of labeled examples for training. This paper investigates methods for
reducing annotation cost by 'sample selection'. In this approach, during
training the learning program examines many unlabeled examples and selects for
labeling only those that are most informative at each stage. This avoids
redundantly labeling examples that contribute little new information. Our work
follows on previous research on Query By Committee, extending the
committee-based paradigm to the context of probabilistic classification. We
describe a family of empirical methods for committee-based sample selection in
probabilistic classification models, which evaluate the informativeness of an
example by measuring the degree of disagreement between several model variants.
These variants (the committee) are drawn randomly from a probability
distribution conditioned by the training set labeled so far. The method was
applied to the real-world natural language processing task of stochastic
part-of-speech tagging. We find that all variants of the method achieve a
significant reduction in annotation cost, although their computational
efficiency differs. In particular, the simplest variant, a two-member committee
with no parameters to tune, gives excellent results. We also show that sample
selection yields a significant reduction in the size of the model used by the
tagger.
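The committee-based selection loop can be sketched in a hypothetical minimal form. Here, bootstrap resampling of the labeled set stands in for drawing model variants from a posterior distribution, and a toy logistic model replaces the paper's part-of-speech tagger; all data, names, and parameters are illustrative assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def train_logreg(X, y, steps=200, lr=0.1):
    # Tiny logistic regression fit by gradient ascent on the log-likelihood.
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w += lr * X.T @ (y - p) / len(y)
    return w

# Synthetic binary classification pool (stand-in for tagging data).
X = rng.normal(size=(500, 3))
y = (X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.5, size=500) > 0).astype(float)

labeled = list(range(20))      # examples labeled so far
pool = list(range(20, 500))    # unlabeled pool

for _ in range(10):            # ten selection rounds
    # Each of the two committee members trains on a bootstrap resample of the
    # labeled set -- a cheap proxy for sampling variants from the posterior.
    members = []
    for _m in range(2):
        idx = rng.choice(labeled, size=len(labeled), replace=True)
        members.append(train_logreg(X[idx], y[idx]))
    p1, p2 = (1.0 / (1.0 + np.exp(-X[pool] @ w)) for w in members)
    disagreement = np.abs(p1 - p2)           # members' class-probability gap
    pick = pool.pop(int(np.argmax(disagreement)))
    labeled.append(pick)                     # query the most-contested example
```

The two-member committee mirrors the abstract's simplest variant: disagreement between just two sampled models, with no tuning parameters, decides which example to label next.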
Session 4: Evolutionary Indeterminism
Proceedings of the Pittsburgh Workshop in History and Philosophy of Biology, Center for Philosophy of Science, University of Pittsburgh, March 23-24, 2001.
Group Testing with Probabilistic Tests: Theory, Design and Application
Identification of defective members of large populations has been widely
studied in the statistics community under the name of group testing. It
involves grouping subsets of items into different pools and detecting defective
members based on the set of test results obtained for each pool.
In a classical noiseless group testing setup, it is assumed that the sampling
procedure is fully known to the reconstruction algorithm, in the sense that the
presence of a defective member in a pool results in a positive test outcome for
that pool. However, this may not always be a valid assumption in some cases of
interest. In particular, we consider the case where the defective items in a
pool can become independently inactive with a certain probability. Hence, a
pool may yield a negative test result despite containing defective items. As a
result, any sampling and reconstruction method should be able to cope with two
different types of uncertainty, i.e., the unknown set of defective items and
the partially unknown, probabilistic testing procedure.
In this work, motivated by the application of detecting infected people in
viral epidemics, we design non-adaptive sampling procedures that allow
successful identification of the defective items through a set of probabilistic
tests. Our design requires only a small number of tests to single out the
defective items. In particular, for a population of size and at most
defective items with activation probability , our results show that tests is sufficient if the sampling procedure should
work for all possible sets of defective items, while
tests is enough to be successful for any single set of defective items.
Moreover, we show that the defective members can be recovered using a simple
reconstruction algorithm with complexity of .Comment: Full version of the conference paper "Compressed Sensing with
Probabilistic Measurements: A Group Testing Solution" appearing in
proceedings of the 47th Annual Allerton Conference on Communication, Control,
and Computing, 2009 (arXiv:0909.3508). To appear in IEEE Transactions on
Information Theor
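A toy simulation can illustrate the probabilistic-test model: defectives activate independently in each pool, so pools containing defectives can still test negative. The pooling design, parameter values, and the simple positive-rate decoder below are all illustrative assumptions, not the paper's exact construction:

```python
import numpy as np

rng = np.random.default_rng(1)

n, d, p = 200, 5, 0.7      # population size, defectives, activation prob (assumed)
m = 400                    # number of non-adaptive tests (assumed, generous)

A = rng.random((m, n)) < 0.05                  # random pooling design
defective = rng.choice(n, size=d, replace=False)

# Each defective item activates independently in each test with probability p,
# so a pool containing defectives can still come back negative.
active = np.zeros((m, n), dtype=bool)
active[:, defective] = rng.random((m, d)) < p
results = (A & active).any(axis=1)             # positive iff an active defective is pooled

# Simple decoder: rank items by the fraction of their pools testing positive;
# defectives drive their pools positive far more often than bystanders do.
appearances = A.sum(axis=0)
pos_rate = (A & results[:, None]).sum(axis=0) / np.maximum(appearances, 1)
decoded = set(np.argsort(pos_rate)[-d:].tolist())
```

The decoder copes with both kinds of uncertainty at once: it never assumes a negative pool is defect-free, only that defectives raise the positive rate of the pools they join.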
Probabilistic learning for selective dissemination of information
New methods and new systems are needed to filter, or selectively distribute, the increasing volume of electronic information being produced nowadays. An effective information filtering system is one that provides exactly the information that fulfills a user's interests, with minimum effort by the user to describe them. Such a system must also adapt to the user's changing interests. In this paper we describe and evaluate a learning model for information filtering that is an adaptation of the generalized probabilistic model of information retrieval. The model is based on the concept of 'uncertainty sampling', a technique that allows relevance feedback on both relevant and nonrelevant documents. The proposed learning model is the core of a prototype information filtering system called ProFile.
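The uncertainty-sampling idea — requesting feedback on the documents whose relevance the model is least sure about — can be sketched as follows. The probabilities are a toy stand-in, not ProFile's actual retrieval model:

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed toy setup: P(relevant) for 100 incoming documents, as produced
# by some probabilistic retrieval model.
probs = rng.random(100)

# Uncertainty sampling: ask for relevance feedback on the k documents whose
# predicted probability is closest to 0.5, i.e., where the model is least
# certain. The feedback may be relevant or nonrelevant -- both kinds are
# informative for updating the model.
k = 5
uncertain = np.argsort(np.abs(probs - 0.5))[:k]
```

Selecting near the decision boundary is what lets the filter learn from both relevant and nonrelevant judgments, rather than only from confirmed hits.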
Parallel Implementation of Efficient Search Schemes for the Inference of Cancer Progression Models
The emergence and development of cancer is a consequence of the accumulation
over time of genomic mutations involving a specific set of genes, which
provides the cancer clones with a functional selective advantage. In this work,
we model the order of accumulation of such mutations during the progression,
which eventually leads to the disease, by means of probabilistic graphical
models, i.e., Bayesian Networks (BNs). We investigate how to perform the task
of learning the structure of such BNs, according to experimental evidence, by
adopting a global optimization meta-heuristic. In particular, in this work we
rely on Genetic Algorithms and, to strongly reduce the execution time of the
inference -- which can also involve multiple repetitions to collect
statistically significant assessments of the data -- we distribute the
calculations using both multi-threading and a multi-node architecture. The
results show that our approach is characterized by good accuracy and
specificity; we also demonstrate its feasibility, thanks to an 84x reduction of
the overall execution time with respect to a traditional sequential
implementation.
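The distribution pattern described above — independent fitness evaluations farmed out to parallel workers inside a genetic algorithm — can be sketched hypothetically. The fitness below is a toy stand-in for a BN structure score, and the thread pool merely illustrates the worker-pool pattern (the paper combines multi-threading with a multi-node architecture):

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

rng = np.random.default_rng(3)
TARGET = rng.integers(0, 2, size=40)   # hidden "true" adjacency bits (toy)

def fitness(genome):
    # Toy structure score: agreement with the hidden target network. A real
    # run would score the candidate BN structure against experimental data.
    return int((genome == TARGET).sum())

def evolve(pop, generations=30, workers=4):
    # Fitness evaluations are mutually independent, so each generation maps
    # them onto a pool of workers.
    with ThreadPoolExecutor(max_workers=workers) as ex:
        for _ in range(generations):
            scores = list(ex.map(fitness, pop))
            order = np.argsort(scores)[::-1]
            parents = [pop[i] for i in order[: len(pop) // 2]]  # elitist selection
            children = []
            while len(parents) + len(children) < len(pop):
                a, b = rng.choice(len(parents), size=2, replace=False)
                cut = int(rng.integers(1, len(TARGET)))         # one-point crossover
                child = np.concatenate([parents[a][:cut], parents[b][cut:]])
                flip = rng.random(len(child)) < 0.02            # bit-flip mutation
                child[flip] ^= 1
                children.append(child)
            pop = parents + children
    return max(pop, key=fitness)

initial = [rng.integers(0, 2, size=40) for _ in range(20)]
best = evolve(initial)
```

Because the selection step is elitist (the best half always survives), repeated runs for statistical assessment can be parallelized at a second level across nodes, which is where the bulk of the reported speed-up would come from.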