Quantum learning: optimal classification of qubit states
Pattern recognition is a central topic in Learning Theory with numerous
applications such as voice and text recognition, image analysis, and computer
diagnosis. The statistical set-up in classification is the following: we are
given an i.i.d. training set $(X_1, Y_1), \dots, (X_n, Y_n)$, where $X_i$
represents a feature and $Y_i \in \{0, 1\}$ is a label attached to that
feature. The underlying joint distribution of $(X, Y)$ is unknown, but we can
learn about it from the training set, and we aim at devising low-error
classifiers used to predict the label of new incoming features.
Here we solve a quantum analogue of this problem, namely the classification
of two arbitrary unknown qubit states. Given a number of `training' copies from
each of the states, we would like to `learn' about them by performing a
measurement on the training set. The outcome is then used to design measurements
for the classification of future systems with unknown labels. We find the
asymptotically optimal classification strategy and show that typically, it
performs strictly better than a plug-in strategy based on state estimation.
The figure of merit is the excess risk, which is the difference between the
probability of error and the probability of error of the optimal measurement
when the states are known, that is, the Helstrom measurement. We show that the
excess risk has rate $n^{-1}$ and compute the exact constant of the rate.
Comment: 24 pages, 4 figures
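The Helstrom measurement used as the baseline above attains the minimal error probability $P_{\mathrm{err}} = \frac{1}{2}(1 - \|p_0\rho_0 - p_1\rho_1\|_1)$ for discriminating two known states. A minimal numerical sketch of this baseline, assuming equal priors and Bloch vectors chosen purely for illustration (the function names `helstrom_error` and `bloch_state` are ours, not the paper's):

```python
import numpy as np

def bloch_state(r):
    """Qubit density matrix 0.5*(I + r . sigma) from a Bloch vector r, |r| <= 1."""
    sx = np.array([[0, 1], [1, 0]], dtype=complex)
    sy = np.array([[0, -1j], [1j, 0]])
    sz = np.array([[1, 0], [0, -1]], dtype=complex)
    return 0.5 * (np.eye(2) + r[0] * sx + r[1] * sy + r[2] * sz)

def helstrom_error(rho0, rho1, p0=0.5):
    """Helstrom bound: minimal error probability for discriminating rho0 vs rho1
    with prior p0 on rho0, i.e. 0.5 * (1 - || p0*rho0 - (1-p0)*rho1 ||_1)."""
    gamma = p0 * rho0 - (1 - p0) * rho1            # Hermitian, so eigvalsh applies
    trace_norm = np.sum(np.abs(np.linalg.eigvalsh(gamma)))
    return 0.5 * (1.0 - trace_norm)

# Illustrative (hypothetical) states; the excess risk of a learned measurement
# is measured relative to this known-states baseline.
rho0 = bloch_state([0.0, 0.0, 0.9])
rho1 = bloch_state([0.6, 0.0, -0.3])
print(helstrom_error(rho0, rho1))
```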
Asymptotic Bayes-optimality under sparsity of some multiple testing procedures
Within a Bayesian decision theoretic framework we investigate some asymptotic
optimality properties of a large class of multiple testing rules. A parametric
setup is considered, in which observations come from a normal scale mixture
model and the total loss is assumed to be the sum of losses for individual
tests. Our model can be used for testing point null hypotheses, as well as to
distinguish large signals from a multitude of very small effects. A rule is
defined to be asymptotically Bayes optimal under sparsity (ABOS), if within our
chosen asymptotic framework the ratio of its Bayes risk and that of the Bayes
oracle (a rule which minimizes the Bayes risk) converges to one. Our main
interest is in the asymptotic scheme where the proportion p of "true"
alternatives converges to zero. We fully characterize the class of fixed
threshold multiple testing rules which are ABOS, and hence derive conditions
for the asymptotic optimality of rules controlling the Bayesian False Discovery
Rate (BFDR). We finally provide conditions under which the popular
Benjamini-Hochberg (BH) and Bonferroni procedures are ABOS and show that for a
wide class of sparsity levels, the threshold of the former can be approximated
by a nonrandom threshold.
Comment: Published at http://dx.doi.org/10.1214/10-AOS869 in the Annals of
Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical
Statistics (http://www.imstat.org)
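For concreteness, here is a minimal sketch of the step-up Benjamini-Hochberg procedure discussed above, assuming m p-values and a target level alpha (the function name `benjamini_hochberg` and the vectorized formulation are ours): BH rejects the k hypotheses with the smallest p-values, where k is the largest index such that the k-th order statistic satisfies p_(k) <= k * alpha / m.

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Step-up BH procedure: reject the hypotheses with the k smallest p-values,
    where k is the largest index with p_(k) <= k * alpha / m."""
    pvals = np.asarray(pvals, dtype=float)
    m = len(pvals)
    order = np.argsort(pvals)                      # indices sorting p-values ascending
    sorted_p = pvals[order]
    below = sorted_p <= np.arange(1, m + 1) * alpha / m
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])           # largest passing index (0-based)
        rejected[order[:k + 1]] = True             # reject all hypotheses up to k
    return rejected

# Example: a few small "signal" p-values among nulls.
print(benjamini_hochberg([0.001, 0.008, 0.04, 0.3, 0.7], alpha=0.05))
```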
On the Bayes-optimality of F-measure maximizers
The F-measure, which was originally introduced in information retrieval,
is nowadays routinely used as a performance metric for problems such as binary
classification, multi-label classification, and structured output prediction.
Optimizing this measure is a statistically and computationally challenging
problem, since no closed-form solution exists. Adopting a decision-theoretic
perspective, this article provides a formal and experimental analysis of
different approaches for maximizing the F-measure. We start with a Bayes-risk
analysis of related loss functions, such as Hamming loss and subset zero-one
loss, showing that optimizing such losses as a surrogate of the F-measure leads
to a high worst-case regret. Subsequently, we perform a similar type of
analysis for F-measure maximizing algorithms, showing that such algorithms are
approximate, while relying on additional assumptions regarding the statistical
distribution of the binary response variables. Furthermore, we present a new
algorithm which is not only computationally efficient but also Bayes-optimal,
regardless of the underlying distribution. To this end, the algorithm requires
only a quadratic (with respect to the number of binary responses) number of
parameters of the joint distribution. We illustrate the practical performance
of all analyzed methods by means of experiments with multi-label classification
problems.
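The approximate maximizers analyzed in the article rely on distributional assumptions such as independence of the binary responses. A minimal sketch of one such strategy, assuming independent labels with known marginals and using a ratio-of-expectations surrogate for the expected F-measure (both the surrogate and the function name `approx_f_maximizer` are our illustration, not the paper's Bayes-optimal algorithm):

```python
import numpy as np

def approx_f_maximizer(p):
    """Approximate F-measure maximizer under a label-independence assumption:
    sort marginal probabilities p_i in decreasing order and, for each candidate
    size k of the predicted positive set, score the top-k labels with the
    ratio-of-expectations surrogate
        E[2|h & y|] / E[|h| + |y|]  ~  2 * sum(top-k p) / (k + sum(p)),
    then keep the best k."""
    p = np.sort(np.asarray(p, dtype=float))[::-1]  # marginals, decreasing
    total = p.sum()
    best_k, best_score = 0, 0.0                    # k = 0: predict all-negative
    for k in range(1, len(p) + 1):
        score = 2.0 * p[:k].sum() / (k + total)
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score

# Example: predict positives for the labels with the largest marginals.
print(approx_f_maximizer([0.9, 0.6, 0.3, 0.1]))
```

Because the surrogate swaps the expectation of a ratio for a ratio of expectations, it is exactly the kind of approximation whose regret the article quantifies against the Bayes-optimal maximizer.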