Universal and Composite Hypothesis Testing via Mismatched Divergence
For the universal hypothesis testing problem, where the goal is to decide
between the known null hypothesis distribution and some other unknown
distribution, Hoeffding proposed a universal test in the 1960s.
Hoeffding's universal test statistic can be written in terms of
Kullback-Leibler (K-L) divergence between the empirical distribution of the
observations and the null hypothesis distribution. In this paper a modification
of Hoeffding's test is considered based on a relaxation of the K-L divergence
test statistic, referred to as the mismatched divergence. The resulting
mismatched test is shown to be a generalized likelihood-ratio test (GLRT) for
the case where the alternate distribution lies in a parametric family of
distributions characterized by a finite-dimensional parameter, i.e., it is a
solution to the corresponding composite hypothesis testing problem. For certain
choices of the alternate distribution, it is shown that both the Hoeffding test
and the mismatched test have the same asymptotic performance in terms of error
exponents. A consequence of this result is that the GLRT is optimal in
differentiating a particular distribution from others in an exponential family.
It is also shown that the mismatched test has a significant advantage over the
Hoeffding test in terms of finite sample size performance. This advantage is
due to the difference in the asymptotic variances of the two test statistics
under the null hypothesis. In particular, the variance of the K-L divergence
grows linearly with the alphabet size, making the test impractical for
applications involving large alphabet distributions. The variance of the
mismatched divergence on the other hand grows linearly with the dimension of
the parameter space, and can hence be controlled through a prudent choice of
the function class defining the mismatched divergence.
Comment: Accepted to IEEE Transactions on Information Theory, July 201
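The two statistics contrasted in this abstract can be sketched concretely. The following is a minimal illustration, not the authors' implementation: it computes the Hoeffding K-L statistic between an empirical distribution and the null, and a mismatched divergence for a hypothetical one-dimensional linear function class f_theta(x) = theta * psi(x). The feature psi and all numerical choices are assumptions made purely for illustration.

```python
import math

def kl_divergence(p, q):
    """K-L divergence D(p||q) between finite distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def empirical(samples, m):
    """Empirical distribution of integer-valued samples on {0, ..., m-1}."""
    counts = [0] * m
    for x in samples:
        counts[x] += 1
    n = len(samples)
    return [c / n for c in counts]

def mismatched_divergence(mu, pi, psi, lo=-50.0, hi=50.0, iters=200):
    """Mismatched divergence sup_theta { theta*mu(psi) - log pi(e^{theta*psi}) }
    for a one-dimensional linear function class (an illustrative assumption).
    The objective is concave in theta, so a ternary search suffices."""
    mean_psi = sum(m_x * psi_x for m_x, psi_x in zip(mu, psi))
    def obj(theta):
        log_mgf = math.log(sum(p_x * math.exp(theta * psi_x)
                               for p_x, psi_x in zip(pi, psi)))
        return theta * mean_psi - log_mgf
    for _ in range(iters):
        a = lo + (hi - lo) / 3
        b = hi - (hi - lo) / 3
        if obj(a) < obj(b):
            lo = a
        else:
            hi = b
    return obj((lo + hi) / 2)
```

By the variational construction, the mismatched divergence is a lower bound on the K-L divergence for any function class, which is consistent with its role as a relaxation of the K-L test statistic.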
Feature Extraction for Universal Hypothesis Testing via Rank-constrained Optimization
This paper concerns the construction of tests for universal hypothesis
testing problems, in which the alternate hypothesis is poorly modeled and the
observation space is large. The mismatched universal test is a feature-based
technique for this purpose. In prior work it is shown that its
finite-observation performance can be much better than the (optimal) Hoeffding
test, and good performance depends crucially on the choice of features. The
contributions of this paper include: 1) We obtain bounds on the number of
\epsilon distinguishable distributions in an exponential family. 2) This
motivates a new framework for feature extraction, cast as a rank-constrained
optimization problem. 3) We obtain a gradient-based algorithm to solve the
rank-constrained optimization problem and prove its local convergence.
Comment: 5 pages, 4 figures, submitted to ISIT 201
An Improved Composite Hypothesis Test for Markov Models with Applications in Network Anomaly Detection
Recent work has proposed the use of a composite hypothesis Hoeffding test for
statistical anomaly detection. Setting an appropriate threshold for the test
given a desired false alarm probability involves approximating the false alarm
probability. To that end, a large deviations asymptotic is typically used
which, however, often results in an inaccurate setting of the threshold,
especially for relatively small sample sizes. This, in turn, results in an
anomaly detection test that does not control well for false alarms. In this
paper, we develop a tighter approximation using the Central Limit Theorem (CLT)
under Markovian assumptions. We apply our result to a network anomaly detection
application and demonstrate its advantages over earlier work.
Comment: 6 pages, 6 figures; final version for CDC 201
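The threshold-setting problem this abstract describes can be illustrated in the simpler i.i.d. baseline (the paper's own refinement is for Markov models and is not reproduced here). Under an i.i.d. null with m symbols, the classical weak-convergence result states that 2n times the K-L test statistic converges to a chi-square distribution with m-1 degrees of freedom, which gives a CLT-style threshold to compare against the crude large-deviations rule P_FA ~ exp(-n*eta). The Wilson-Hilferty quantile approximation and all constants below are illustrative assumptions.

```python
import math
from statistics import NormalDist

def chi2_quantile(k, p):
    """Wilson-Hilferty approximation to the chi-square p-quantile with
    k degrees of freedom (accurate to a few percent for moderate k)."""
    z = NormalDist().inv_cdf(p)
    return k * (1 - 2 / (9 * k) + z * math.sqrt(2 / (9 * k))) ** 3

def threshold_large_deviations(alpha, n):
    """Threshold eta from the crude large-deviations rule P_FA ~ exp(-n*eta),
    ignoring polynomial prefactors."""
    return -math.log(alpha) / n

def threshold_clt(alpha, n, m):
    """Threshold eta from the i.i.d. weak-convergence result:
    2n * D(Gamma_n || pi) -> chi-square with m-1 dof under the null."""
    return chi2_quantile(m - 1, 1 - alpha) / (2 * n)
```

For small sample sizes the two rules can differ substantially, which is exactly the gap between the large-deviations asymptotic and a CLT-based refinement that motivates the paper.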
Generalized Error Exponents For Small Sample Universal Hypothesis Testing
The small sample universal hypothesis testing problem is investigated in this
paper, in which the number of samples n is smaller than the number of
possible outcomes m. The goal of this work is to find an appropriate
criterion to analyze statistical tests in this setting. A suitable model for
analysis is the high-dimensional model in which both n and m increase to
infinity, and n = o(m). A new performance criterion based on large deviations
analysis is proposed and it generalizes the classical error exponent applicable
for large sample problems (in which m is fixed and n tends to infinity). This
generalized error exponent criterion provides insights that are not available
from asymptotic consistency or central limit theorem analysis. The following
results are established for the uniform null distribution:
(i) The best achievable probability of error decays as exp{-(n^2/m) J}
for some constant J > 0.
(ii) A class of tests based on separable statistics, including the
coincidence-based test, attains the optimal generalized error exponent.
(iii) Pearson's chi-square test has a zero generalized error exponent and
thus its probability of error is asymptotically larger than that of the optimal test.
Comment: 43 pages, 4 figure
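The two test statistics named in results (ii) and (iii) are simple to state, and a minimal sketch makes the contrast concrete. The coincidence-based statistic counts repeated symbols, a separable statistic that remains informative when n = o(m); Pearson's chi-square compares bin counts to their expectation under the uniform null. This is an illustrative sketch, not the paper's analysis.

```python
from collections import Counter

def coincidence_statistic(samples):
    """Number of coincidences: samples minus distinct symbols observed.
    A separable statistic suited to the small sample regime n = o(m)."""
    return len(samples) - len(set(samples))

def pearson_chi_square(samples, m):
    """Pearson's chi-square statistic against the uniform null on m symbols."""
    n = len(samples)
    expected = n / m
    counts = Counter(samples)
    return sum((counts.get(x, 0) - expected) ** 2 / expected
               for x in range(m))
```

Per result (iii), tests built on the chi-square statistic have a zero generalized error exponent in this regime, whereas coincidence-based tests attain the optimal one.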