Committee-Based Sample Selection for Probabilistic Classifiers
In many real-world learning tasks, it is expensive to acquire a sufficient
number of labeled examples for training. This paper investigates methods for
reducing annotation cost by 'sample selection'. In this approach, during
training the learning program examines many unlabeled examples and selects for
labeling only those that are most informative at each stage. This avoids
redundantly labeling examples that contribute little new information. Our work
follows on previous research on Query By Committee, extending the
committee-based paradigm to the context of probabilistic classification. We
describe a family of empirical methods for committee-based sample selection in
probabilistic classification models, which evaluate the informativeness of an
example by measuring the degree of disagreement between several model variants.
These variants (the committee) are drawn randomly from a probability
distribution conditioned by the training set labeled so far. The method was
applied to the real-world natural language processing task of stochastic
part-of-speech tagging. We find that all variants of the method achieve a
significant reduction in annotation cost, although their computational
efficiency differs. In particular, the simplest variant, a two-member committee
with no parameters to tune, gives excellent results. We also show that sample
selection yields a significant reduction in the size of the model used by the
tagger.
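
To make the two-member committee concrete, here is a minimal sketch in the
spirit of the method described above: two model variants are drawn from a
posterior conditioned on the labels collected so far, and pool examples are
sent for labeling only where the variants' predictions disagree. The Bernoulli
naive Bayes model, Beta priors, and toy data are illustrative assumptions, not
the paper's part-of-speech tagging setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def posterior_sample(on_counts, off_counts, alpha=1.0):
    # One committee member: per-class Bernoulli feature probabilities
    # drawn from their Beta posterior, conditioned on the labeled
    # counts accumulated so far.
    return rng.beta(on_counts + alpha, off_counts + alpha)

def predict(theta, log_prior, X):
    # MAP class for each row of X under a Bernoulli naive Bayes model.
    log_lik = X @ np.log(theta.T) + (1 - X) @ np.log(1 - theta.T)
    return np.argmax(log_lik + log_prior, axis=1)

def select_by_disagreement(X_pool, on_counts, off_counts, log_prior, batch=10):
    # Two-member committee: sample two model variants and request
    # labels only where their predictions differ.
    a = posterior_sample(on_counts, off_counts)
    b = posterior_sample(on_counts, off_counts)
    disagree = predict(a, log_prior, X_pool) != predict(b, log_prior, X_pool)
    return np.flatnonzero(disagree)[:batch]

# Toy usage: 2 classes, 5 binary features, counts from labeled data so far.
on_counts = rng.integers(1, 10, size=(2, 5)).astype(float)
off_counts = rng.integers(1, 10, size=(2, 5)).astype(float)
log_prior = np.log(np.array([0.5, 0.5]))
X_pool = rng.integers(0, 2, size=(100, 5))
print(select_by_disagreement(X_pool, on_counts, off_counts, log_prior))
```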
Pranab Kumar Sen: Life and works
In this article, we briefly describe the highlights and various
accomplishments in the personal as well as the academic life of Professor
Pranab Kumar Sen.
Comment: Published at http://dx.doi.org/10.1214/193940307000000013 in the IMS
Collections (http://www.imstat.org/publications/imscollections.htm) by the
Institute of Mathematical Statistics (http://www.imstat.org).
Universal lossless source coding with the Burrows Wheeler transform
The Burrows Wheeler transform (1994) is a reversible sequence transformation
used in a variety of practical lossless source-coding algorithms. In each, the
BWT is followed by a lossless source code that attempts to exploit the natural
ordering of the BWT coefficients. BWT-based compression schemes are widely
touted as low-complexity algorithms giving lossless coding rates better than
those of the Ziv-Lempel codes (commonly known as LZ'77 and LZ'78) and almost
as good as those achieved by prediction by partial matching (PPM) algorithms.
To date, the coding performance claims have been made primarily on the basis
of experimental results. This work gives a theoretical evaluation of BWT-based
coding. The main results of this theoretical evaluation include: (1)
statistical characterizations of the BWT output on both finite strings and
sequences of length n → ∞, (2) a variety of very simple new techniques for
BWT-based lossless source coding, and (3) proofs of the universality and
bounds on the rates of convergence of both new and existing BWT-based codes
for finite-memory and stationary ergodic sources. The end result is a
theoretical justification and validation of the experimentally derived
conclusions: BWT-based lossless source codes achieve universal lossless coding
performance that converges to the optimal coding performance more quickly than
the rate of convergence observed in Ziv-Lempel style codes and, for some
BWT-based codes, within a constant factor of the optimal rate of convergence
for finite-memory sources.
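
As a concrete illustration of the pipeline the abstract evaluates, the sketch
below implements the reversible transform plus a move-to-front pass, the stage
whose output a downstream entropy coder exploits. The sentinel character and
the naive rotation sort are simplifying assumptions for illustration;
practical BWT codecs use suffix-array construction rather than sorting all
rotations.

```python
def bwt(s, sentinel="\x00"):
    # Sort all rotations of s + sentinel; the transform is the last
    # column of the sorted rotation matrix.
    s += sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(row[-1] for row in rotations)

def inverse_bwt(t, sentinel="\x00"):
    # Reconstruct by repeatedly prepending t and re-sorting; the row
    # ending in the sentinel is the original string plus sentinel.
    table = [""] * len(t)
    for _ in range(len(t)):
        table = sorted(c + row for c, row in zip(t, table))
    return next(row for row in table if row.endswith(sentinel))[:-1]

def move_to_front(t):
    # MTF turns the BWT's local symbol clustering into runs of small
    # integers, which a simple entropy coder then compresses well.
    alphabet = sorted(set(t))
    out = []
    for c in t:
        i = alphabet.index(c)
        out.append(i)
        alphabet.insert(0, alphabet.pop(i))
    return out

text = "banana"
transformed = bwt(text)
assert inverse_bwt(transformed) == text  # the transform is reversible
print(transformed.replace("\x00", "$"), move_to_front(transformed))
```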
Shrinkage Estimators in Online Experiments
We develop and analyze empirical Bayes Stein-type estimators for use in the
estimation of causal effects in large-scale online experiments. While online
experiments are generally thought to be distinguished by their large sample
size, we focus on the multiplicity of treatment groups. The typical analysis
practice is to use simple differences-in-means (perhaps with covariate
adjustment) as if all treatment arms were independent. In this work we develop
consistent, small-bias shrinkage estimators for this setting. In addition to
achieving lower mean squared error, these estimators retain important
frequentist properties such as coverage under most reasonable scenarios. Modern
sequential methods of experimentation and optimization such as multi-armed
bandit optimization (where treatment allocations adapt over time to prior
responses) benefit from the use of our shrinkage estimators. Exploration under
empirical Bayes focuses more efficiently on near-optimal arms, improving the
resulting decisions made under uncertainty. We demonstrate these properties by
examining seventeen large-scale experiments conducted on Facebook from April to
June 2017.
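
For intuition, here is a minimal sketch of empirical Bayes shrinkage across
treatment arms under a normal-normal model with known per-arm standard errors.
The moment-based estimate of the between-arm variance is one common choice and
an assumption here, not necessarily the estimator the paper analyzes.

```python
import numpy as np

def shrink_arm_effects(effects, std_errors):
    # Shrink each arm's raw difference-in-means estimate toward the
    # grand mean; noisier arms are shrunk harder.
    effects = np.asarray(effects, dtype=float)
    se2 = np.asarray(std_errors, dtype=float) ** 2
    grand_mean = effects.mean()
    # Method-of-moments estimate of the between-arm variance tau^2,
    # truncated at zero when sampling noise explains all the spread.
    tau2 = max((effects - grand_mean).var(ddof=1) - se2.mean(), 0.0)
    # Posterior mean under the normal-normal model: a precision-weighted
    # blend of each raw estimate and the grand mean.
    weight = tau2 / (tau2 + se2)
    return grand_mean + weight * (effects - grand_mean)

# Toy usage: ten arms with noisy raw lifts around small true effects.
rng = np.random.default_rng(1)
true_effects = rng.normal(0.02, 0.01, size=10)
se = np.full(10, 0.015)
raw = true_effects + rng.normal(0, se)
print(np.round(shrink_arm_effects(raw, se), 4))
```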