Two knowledge-based methods for High-Performance Sense Distribution Learning
Knowing the correct distribution of senses within a corpus can potentially boost the performance of Word Sense Disambiguation (WSD) systems by many points. We present two fully automatic and language-independent methods for computing the distribution of senses given a raw corpus of sentences. Intrinsic and extrinsic evaluations show that our methods outperform the current state of the art in sense distribution learning and the strongest baselines for the most frequent sense in multiple languages and on domain-specific test sets. Our sense distributions are available at http://trainomatic.org
Measuring complexity with zippers
Physics concepts have often been borrowed and independently developed by
other fields of science. In this perspective a significant example is that of
entropy in Information Theory. The aim of this paper is to provide a short and
pedagogical introduction to the use of data compression techniques for the
estimate of entropy and other relevant quantities in Information Theory and
Algorithmic Information Theory. We consider in particular the LZ77 algorithm as
a case study and discuss how a zipper can be used for information extraction.
Comment: 10 pages, 3 figures
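The idea of using a zipper as an entropy estimator can be illustrated with a minimal sketch. It is not the procedure from the paper: it assumes Python's standard `zlib` module (DEFLATE, which combines LZ77-style matching with Huffman coding) as the compressor, and the helper name `compressed_bits_per_symbol` and the three test sequences are hypothetical choices made for illustration. The compressed size in bits per input byte serves as a rough upper estimate of the entropy rate of the source.

```python
import random
import zlib


def compressed_bits_per_symbol(data: bytes, level: int = 9) -> float:
    """Compressed size in bits per input byte: a rough upper estimate of the entropy rate."""
    if not data:
        raise ValueError("data must be non-empty")
    compressed = zlib.compress(data, level)
    return 8.0 * len(compressed) / len(data)


# Three illustrative sources of equal length (assumed examples, not from the paper).
rng = random.Random(0)
n = 100_000
repetitive = b"ab" * (n // 2)                                # fully predictable
biased = bytes(rng.choice(b"aaaaaaab") for _ in range(n))    # P(b) = 1/8, i.i.d.
uniform = bytes(rng.getrandbits(8) for _ in range(n))        # close to 8 bits per byte

rate_repetitive = compressed_bits_per_symbol(repetitive)
rate_biased = compressed_bits_per_symbol(biased)
rate_uniform = compressed_bits_per_symbol(uniform)

print(f"repetitive: {rate_repetitive:.3f} bits/byte")
print(f"biased:     {rate_biased:.3f} bits/byte")
print(f"uniform:    {rate_uniform:.3f} bits/byte")
```

The estimate is only an upper bound in practice: a real compressor adds header overhead and converges slowly, so short inputs give inflated values.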
Robust Estimators under the Imprecise Dirichlet Model
Walley's Imprecise Dirichlet Model (IDM) for categorical data overcomes
several fundamental problems which other approaches to uncertainty suffer from.
Yet, to be useful in practice, one needs efficient ways for computing the
imprecise=robust sets or intervals. The main objective of this work is to
derive exact, conservative, and approximate, robust and credible interval
estimates under the IDM for a large class of statistical estimators, including
the entropy and mutual information.
Comment: 16 LaTeX pages
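For orientation, the basic IDM probability intervals can be sketched as follows. This only covers the standard lower and upper probabilities of a single category under Walley's model, n_i/(N+s) and (n_i+s)/(N+s), and does not reproduce the paper's interval estimates for entropy or mutual information. The function name `idm_probability_intervals` and the example counts are hypothetical, and the hyperparameter `s` is set to 1 purely as an illustrative default.

```python
from typing import Dict, Tuple


def idm_probability_intervals(
    counts: Dict[str, int], s: float = 1.0
) -> Dict[str, Tuple[float, float]]:
    """Lower and upper IDM probabilities per category for observed counts."""
    if s <= 0:
        raise ValueError("s must be positive")
    if any(n < 0 for n in counts.values()):
        raise ValueError("counts must be non-negative")
    total = sum(counts.values())
    return {
        category: (n / (total + s), (n + s) / (total + s))
        for category, n in counts.items()
    }


# Assumed example data: four observations over three categories.
intervals = idm_probability_intervals({"a": 3, "b": 1, "c": 0}, s=1.0)
for category, (lower, upper) in intervals.items():
    print(f"{category}: [{lower:.2f}, {upper:.2f}]")
```

Every interval has width s/(N+s), so the intervals shrink as more data are observed, and an unobserved category still receives a non-zero upper probability.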
Quantum query complexity of entropy estimation
Estimation of Shannon and Rényi entropies of unknown discrete distributions
is a fundamental problem in statistical property testing and an active research
topic in both theoretical computer science and information theory. Tight bounds
on the number of samples to estimate these entropies have been established in
the classical setting, while little is known about their quantum counterparts.
In this paper, we give the first quantum algorithms for estimating
$\alpha$-Rényi entropies (Shannon entropy being 1-Rényi entropy). In
particular, we demonstrate a quadratic quantum speedup for Shannon entropy
estimation and a generic quantum speedup for $\alpha$-Rényi entropy
estimation for all $\alpha \geq 0$, including a tight bound for the
collision-entropy (2-Rényi entropy). We also provide quantum upper bounds for
extreme cases such as the Hartley entropy (i.e., the logarithm of the support
size of a distribution, corresponding to $\alpha = 0$) and the min-entropy case
(i.e., $\alpha = +\infty$), as well as the Kullback-Leibler divergence between
two distributions. Moreover, we complement our results with quantum lower
bounds on $\alpha$-Rényi entropy estimation for all $\alpha \geq 0$.
Comment: 43 pages, 1 figure
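The entropies named in this abstract can be made concrete with a small classical sketch. It computes the Rényi entropy of order α for a fully known distribution, using the standard definition H_α(p) = log(Σ p_i^α) / (1 − α) with the usual limiting cases; it is not an estimator from samples and has nothing to do with the quantum algorithms of the paper. The function name `renyi_entropy`, the base-2 logarithm, and the example distribution are assumptions made for illustration.

```python
import math
from typing import Sequence


def renyi_entropy(p: Sequence[float], alpha: float) -> float:
    """Rényi entropy of order alpha, in bits, for a known discrete distribution."""
    if any(x < 0 for x in p) or abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("p must be a probability distribution")
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    support = [x for x in p if x > 0]
    if alpha == 0:                      # Hartley entropy: log of the support size
        return math.log2(len(support))
    if alpha == 1:                      # Shannon entropy (limit alpha -> 1)
        return -sum(x * math.log2(x) for x in support)
    if math.isinf(alpha):               # min-entropy (limit alpha -> infinity)
        return -math.log2(max(support))
    return math.log2(sum(x ** alpha for x in support)) / (1.0 - alpha)


# Assumed example distribution.
p = [0.5, 0.25, 0.25]
hartley = renyi_entropy(p, 0)
shannon = renyi_entropy(p, 1)
collision = renyi_entropy(p, 2)
min_entropy = renyi_entropy(p, math.inf)
print(hartley, shannon, collision, min_entropy)
```

For a fixed distribution the value is non-increasing in α, which is why the Hartley entropy and the min-entropy bracket the Shannon and collision entropies.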
Distributional Property Testing in a Quantum World
A fundamental problem in statistics and learning theory is to test properties of distributions. We show that quantum computers can solve such problems with significant speed-ups. We also introduce a novel access model for quantum distributions, enabling the coherent preparation of quantum samples, and propose a general framework that can naturally handle both classical and quantum distributions in a unified manner. Our framework generalizes and improves previous quantum algorithms for testing closeness between unknown distributions, testing independence between two distributions, and estimating the Shannon / von Neumann entropy of distributions. For classical distributions, our algorithms significantly improve the precision dependence of some earlier results. We also show that, in our framework, procedures for classical distributions can be directly lifted to the more general case of quantum distributions, and thus obtain the first speed-ups for testing properties of density operators that can be accessed coherently rather than only via sampling.