Quantum Chi-Squared and Goodness of Fit Testing
The density matrix in quantum mechanics parameterizes the statistical
properties of the system under observation, just like a classical probability
distribution does for classical systems. The expectation value of observables
cannot be measured directly; it can only be approximated by applying classical
statistical methods to the frequencies with which certain measurement outcomes
(clicks) are obtained. In this paper, we make a detailed study of the
statistical fluctuations obtained during an experiment in which a hypothesis is
tested, i.e. the hypothesis that a certain setup produces a given quantum
state. Although the classical and quantum problems are very much related to each
other, the quantum problem is much richer due to the additional optimization
over the measurement basis. Just as in the case of classical hypothesis
testing, the confidence in quantum hypothesis testing scales exponentially in
the number of copies. In this paper, we will argue 1) that the physically
relevant data of quantum experiments is only contained in the frequencies of
the measurement outcomes, and that the statistical fluctuations of the
experiment are essential, so that the correct formulation of the conclusions of
a quantum experiment should be given in terms of hypothesis tests, 2) that the
(classical) chi-squared test for distinguishing two quantum states gives rise to
the quantum chi-squared divergence when optimized over the measurement basis, and
3) we present a max-min characterization for the optimal measurement basis for
quantum goodness of fit testing, find the quantum measurement which leads both
to the maximal Pitman and Bahadur efficiency, and determine the associated
divergence rates.
Comment: 22 pages, with a new section on parameter estimation
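As a rough illustration of the classical side of this abstract, the sketch below computes a Pearson chi-squared statistic from measurement click counts and a classical chi-squared divergence between two outcome distributions. It is not the paper's method: the function names `pearson_chi2` and `chi2_divergence` and the sample counts are hypothetical, and the quantum optimization over measurement bases is not attempted.

```python
def pearson_chi2(counts, probs):
    """Pearson chi-squared statistic for observed outcome counts
    against hypothesized outcome probabilities (all assumed > 0)."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    total = sum(counts)
    stat = 0.0
    for n, p in zip(counts, probs):
        expected = total * p  # expected number of clicks under the hypothesis
        stat += (n - expected) ** 2 / expected
    return stat


def chi2_divergence(p, q):
    """Classical chi-squared divergence: sum_i (p_i - q_i)^2 / q_i."""
    return sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))


# Hypothetical data: 100 clicks from a two-outcome measurement,
# tested against the hypothesis of equal outcome probabilities.
statistic = pearson_chi2([48, 52], [0.5, 0.5])
```

With N clicks in total, the statistic equals N times the chi-squared divergence between the observed frequencies and the hypothesized probabilities, which is the sense in which the test and the divergence are related in the classical case.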
Toward Optimal Feature Selection in Naive Bayes for Text Categorization
Automated feature selection is important for text categorization to reduce
the feature size and to speed up the learning process of classifiers. In this
paper, we present a novel and efficient feature selection framework based on
information theory, which aims to rank the features by their
discriminative capacity for classification. We first revisit two information
measures: Kullback-Leibler divergence and Jeffreys divergence for binary
hypothesis testing, and analyze their asymptotic properties relating to type I
and type II errors of a Bayesian classifier. We then introduce a new divergence
measure, called Jeffreys-Multi-Hypothesis (JMH) divergence, to measure
multi-distribution divergence for multi-class classification. Based on the
JMH-divergence, we develop two efficient feature selection methods, termed
maximum discrimination (MD) and MD-χ² methods, for text categorization.
The promising results of extensive experiments demonstrate the effectiveness of
the proposed approaches.
Comment: This paper has been submitted to the IEEE Trans. Knowledge and Data
Engineering. 14 pages, 5 figures
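The two information measures this abstract revisits can be sketched for discrete distributions as follows. This is a minimal illustration using the standard textbook definitions, assuming strictly positive probabilities. The names `kl_divergence` and `jeffreys_divergence` and the example distributions are hypothetical, and the JMH-divergence is not reproduced here because the abstract does not define it.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats.
    Assumes p and q are same-length distributions with all entries > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


def jeffreys_divergence(p, q):
    """Jeffreys divergence: the symmetrized sum KL(p || q) + KL(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


# Hypothetical class-conditional distributions of one binary feature.
p = [0.5, 0.5]
q = [0.25, 0.75]
j_pq = jeffreys_divergence(p, q)
```

Unlike the Kullback-Leibler divergence, the Jeffreys divergence does not depend on the order of its arguments.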