Distribution of Mutual Information
The mutual information of two random variables i and j with joint
probabilities t_ij is commonly used in learning Bayesian nets as well as in
many other fields. The chances t_ij are usually estimated by the empirical
sampling frequency n_ij/n leading to a point estimate I(n_ij/n) for the mutual
information. To answer questions like "is I(n_ij/n) consistent with zero?" or
"what is the probability that the true mutual information is much larger than
the point estimate?" one has to go beyond the point estimate. In the Bayesian
framework one can answer these questions by utilizing a (second order) prior
distribution p(t) comprising prior information about t. From the prior p(t) one
can compute the posterior p(t|n), from which the distribution p(I|n) of the
mutual information can be calculated. We derive reliable and quickly computable
approximations for p(I|n). We concentrate on the mean, variance, skewness, and
kurtosis, and non-informative priors. For the mean we also give an exact
expression. Numerical issues and the range of validity are discussed. Comment: 8 pages
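The pipeline described in this abstract (counts n_ij, a prior p(t), the posterior p(t|n), then the distribution p(I|n)) can be illustrated by brute-force sampling. The following is a minimal sketch, not the analytical approximations the paper derives: it assumes a symmetric Dirichlet prior with a hypothetical per-cell weight `alpha`, and the function names are illustrative only.

```python
import math
import random


def mutual_information(p):
    """Mutual information (in nats) of a joint probability table p[i][j]."""
    row = [sum(r) for r in p]
    col = [sum(c) for c in zip(*p)]
    return sum(
        pij * math.log(pij / (row[i] * col[j]))
        for i, r in enumerate(p)
        for j, pij in enumerate(r)
        if pij > 0.0
    )


def plug_in_mi(counts):
    """Point estimate I(n_ij/n) from a table of counts."""
    n = sum(map(sum, counts))
    return mutual_information([[c / n for c in r] for r in counts])


def sample_posterior_mi(counts, alpha=1.0, draws=2000, seed=0):
    """Monte Carlo samples from p(I|n) under an assumed Dirichlet prior.

    A Dirichlet draw is obtained by normalising independent Gamma draws
    with shape n_ij + alpha (alpha is an assumption, not from the abstract).
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(draws):
        g = [[rng.gammavariate(c + alpha, 1.0) for c in r] for r in counts]
        total = sum(map(sum, g))
        samples.append(mutual_information([[x / total for x in r] for r in g]))
    return samples


if __name__ == "__main__":
    counts = [[30, 10], [10, 30]]
    point = plug_in_mi(counts)
    samples = sample_posterior_mi(counts)
    above = sum(s > point for s in samples) / len(samples)
    print("point estimate:", point)
    print("posterior mean (Monte Carlo):", sum(samples) / len(samples))
    print("P(I > point estimate | n):", above)
```

Sampling of this kind is slow compared with closed-form moments, which is presumably why quickly computable approximations are of interest.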
Distribution of Mutual Information from Complete and Incomplete Data
Mutual information is widely used, in a descriptive way, to measure the
stochastic dependence of categorical random variables. In order to address
questions such as the reliability of the descriptive value, one must consider
sample-to-population inferential approaches. This paper deals with the
posterior distribution of mutual information, as obtained in a Bayesian
framework by a second-order Dirichlet prior distribution. The exact analytical
expression for the mean, and analytical approximations for the variance,
skewness and kurtosis are derived. These approximations have a guaranteed
accuracy level of the order O(1/n^3), where n is the sample size. Leading order
approximations for the mean and the variance are derived in the case of
incomplete samples. The derived analytical expressions allow the distribution
of mutual information to be approximated reliably and quickly. In fact, the
derived expressions can be computed with the same order of complexity needed
for descriptive mutual information. This makes the distribution of mutual
information a concrete alternative to descriptive mutual information in
many applications that would benefit from moving to the inductive side. Some
of these prospective applications are discussed, and one of them, namely
feature selection, is shown to perform significantly better when inductive
mutual information is used. Comment: 26 pages, LaTeX, 5 figures, 4 tables
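The abstract mentions an exact expression for the posterior mean but does not reproduce it. The sketch below assumes the standard digamma form that follows from E[p log p] under a Dirichlet distribution, namely E[I] = (1/n) sum_ij n_ij [psi(n_ij+1) - psi(n_i+ +1) - psi(n_+j +1) + psi(n+1)], where n_ij here denotes data counts plus prior weights. The per-cell prior weight `alpha` and the helper `digamma` are illustrative assumptions, not taken from the paper.

```python
import math


def digamma(x):
    """Digamma function psi(x) for x > 0.

    Uses the recurrence psi(x) = psi(x + 1) - 1/x to shift x to at least 6,
    then a truncated asymptotic series.
    """
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result + math.log(x) - 0.5 / x - series


def posterior_mean_mi(counts, alpha=1.0):
    """Posterior mean of mutual information (nats) under a Dirichlet posterior.

    counts[i][j] are observed counts; alpha is an assumed prior weight per cell,
    so every posterior parameter counts[i][j] + alpha must be positive.
    """
    a = [[c + alpha for c in r] for r in counts]
    row = [sum(r) for r in a]
    col = [sum(c) for c in zip(*a)]
    n = sum(row)
    total = 0.0
    for i, r in enumerate(a):
        for j, aij in enumerate(r):
            total += aij * (
                digamma(aij + 1.0)
                - digamma(row[i] + 1.0)
                - digamma(col[j] + 1.0)
                + digamma(n + 1.0)
            )
    return total / n


if __name__ == "__main__":
    print(posterior_mean_mi([[30, 10], [10, 30]]))
```

The cost is one pass over the table, which is consistent with the abstract's remark that the expressions have the same order of complexity as descriptive mutual information.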
Living at the Edge: A Large Deviations Approach to the Outage MIMO Capacity
Using a large deviations approach we calculate the probability distribution
of the mutual information of MIMO channels in the limit of large antenna
numbers. In contrast to previous methods that only focused on the distribution
close to its mean (thus obtaining an asymptotically Gaussian distribution), we
calculate the full distribution, including its tails which strongly deviate
from the Gaussian behavior near the mean. The resulting distribution
interpolates seamlessly between the Gaussian approximation for rates close
to the ergodic value of the mutual information and the approach of Zheng and
Tse for large signal-to-noise ratios. This calculation provides us with
a tool to obtain outage probabilities analytically at any point in the
parameter space, as long as the number of antennas is not too
small. In addition, this method also yields the probability distribution of
eigenvalues constrained in the subspace where the mutual information per
antenna is fixed to a given rate at a given signal-to-noise ratio. Quite
remarkably, this eigenvalue density is of the form of the Marcenko-Pastur
distribution with square-root singularities, and it depends on the values of
the rate and the signal-to-noise ratio. Comment: Accepted for publication, IEEE Transactions on Information Theory
(2010). Part of this work appears in the Proc. IEEE Information Theory
Workshop, June 2009, Volos, Greece
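To make the notion of an outage probability concrete, here is a minimal Monte Carlo sketch. It is not the large deviations calculation of the paper: it assumes a 2x2 channel matrix H with i.i.d. complex Gaussian entries of unit variance, equal power allocation, and the mutual information log det(I + (snr/2) H H^dagger) in nats. All function and parameter names are hypothetical.

```python
import math
import random


def mimo_2x2_mutual_information(snr, rng):
    """One draw of log det(I + (snr/2) H H^dagger) for a random 2x2 channel."""
    s = math.sqrt(0.5)  # real and imaginary parts each have variance 1/2
    h = [
        [complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(2)]
        for _ in range(2)
    ]
    trace = sum(abs(x) ** 2 for r in h for x in r)  # tr(H H^dagger)
    det_h = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    g = snr / 2.0
    # For a 2x2 matrix W: det(I + g W) = 1 + g tr(W) + g^2 det(W),
    # and det(H H^dagger) = |det H|^2.
    return math.log(1.0 + g * trace + g * g * abs(det_h) ** 2)


def outage_probability(rate, snr, draws=20000, seed=0):
    """Fraction of channel draws whose mutual information falls below rate."""
    rng = random.Random(seed)
    hits = sum(mimo_2x2_mutual_information(snr, rng) < rate for _ in range(draws))
    return hits / draws


if __name__ == "__main__":
    for rate in (1.0, 2.0, 3.0, 4.0):
        print(rate, outage_probability(rate, snr=10.0))
```

Estimating very small outage probabilities this way needs a very large number of draws, which is where an analytical description of the tails becomes useful.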
Information In The Non-Stationary Case
Information estimates such as the "direct method" of Strong et al. (1998)
sidestep the difficult problem of estimating the joint distribution of response
and stimulus by instead estimating the difference between the marginal and
conditional entropies of the response. While this is an effective estimation
strategy, it tempts the practitioner to ignore the role of the stimulus and the
meaning of mutual information. We show here that, as the number of trials
increases indefinitely, the direct (or "plug-in") estimate of marginal
entropy converges (with probability 1) to the entropy of the time-averaged
conditional distribution of the response, and the direct estimate of the
conditional entropy converges to the time-averaged entropy of the conditional
distribution of the response. Under joint stationarity and ergodicity of the
response and stimulus, the difference of these quantities converges to the
mutual information. When the stimulus is deterministic or non-stationary, the
direct estimate of information no longer estimates mutual information, which is
no longer meaningful, but it remains a measure of variability of the response
distribution across time.
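The two quantities named in this abstract can be written down directly. The following sketch computes the plug-in marginal entropy (entropy of the time-averaged response distribution) and the plug-in conditional entropy (time-averaged entropy of the per-time-bin response distribution) and returns their difference in bits. The data layout `trials[k][t]` is an assumption, and the sketch omits the extrapolation steps that the full direct method of Strong et al. involves.

```python
import math
from collections import Counter


def entropy_bits(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def direct_information(trials):
    """Plug-in estimate: marginal entropy minus conditional entropy.

    trials[k][t] is the (hashable) response word on trial k in time bin t;
    every trial is assumed to have the same number of time bins.
    """
    n_trials = len(trials)
    n_bins = len(trials[0])
    pooled = Counter()
    cond_entropy = 0.0
    for t in range(n_bins):
        # Empirical response distribution at time t, across trials.
        counts = Counter(trial[t] for trial in trials)
        cond_entropy += entropy_bits(c / n_trials for c in counts.values())
        pooled.update(counts)
    cond_entropy /= n_bins  # time-averaged entropy of the conditional distribution
    total = n_trials * n_bins
    # Entropy of the time-averaged (pooled) response distribution.
    marg_entropy = entropy_bits(c / total for c in pooled.values())
    return marg_entropy - cond_entropy


if __name__ == "__main__":
    # Response varies over time but is identical on every trial.
    print(direct_information([[0, 1, 0, 1]] * 5))
```

In the example the response is perfectly repeatable across trials, so the conditional entropy is zero and the estimate equals the entropy of the response across time, in line with the abstract's reading of the quantity as a measure of variability across time.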
Robust Feature Selection by Mutual Information Distributions
Mutual information is widely used in artificial intelligence, in a
descriptive way, to measure the stochastic dependence of discrete random
variables. In order to address questions such as the reliability of the
empirical value, one must consider sample-to-population inferential approaches.
This paper deals with the distribution of mutual information, as obtained in a
Bayesian framework by a second-order Dirichlet prior distribution. The exact
analytical expression for the mean and an analytical approximation of the
variance are reported. Asymptotic approximations of the distribution are
proposed. The results are applied to the problem of selecting features for
incremental learning and classification of the naive Bayes classifier. A fast,
newly defined method is shown to outperform the traditional approach based on
empirical mutual information on a number of real data sets. Finally, a
theoretical development is reported that allows one to efficiently extend the
above methods to incomplete samples in an easy and effective way. Comment: 8 two-column pages
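The abstract does not spell out the selection rule, so the following is only a hypothetical illustration of how a distribution of mutual information could drive feature selection. It assumes a Gaussian approximation with the leading-order mean I(n_ij/n) and leading-order variance (K - J^2)/(n+1), where J is the plug-in mutual information and K is the corresponding sum of squared log-ratios, both evaluated on counts plus an assumed prior weight `alpha`. A feature is kept when an approximate lower credible bound exceeds a threshold `eps`. The names `eps`, `z`, and `select_features` are assumptions, not the paper's filter.

```python
import math


def mi_mean_and_sd(counts, alpha=1.0):
    """Leading-order posterior mean and standard deviation of mutual information."""
    a = [[c + alpha for c in r] for r in counts]
    row = [sum(r) for r in a]
    col = [sum(c) for c in zip(*a)]
    n = sum(row)
    j_sum = 0.0  # J: plug-in mutual information on posterior counts
    k_sum = 0.0  # K: expected squared log-ratio
    for i, r in enumerate(a):
        for j, aij in enumerate(r):
            if aij > 0.0:
                g = math.log(aij * n / (row[i] * col[j]))
                j_sum += aij / n * g
                k_sum += aij / n * g * g
    variance = max(k_sum - j_sum * j_sum, 0.0) / (n + 1.0)
    return j_sum, math.sqrt(variance)


def select_features(tables, eps=0.0, z=1.645, alpha=1.0):
    """Keep features whose approximate lower bound mean - z * sd exceeds eps.

    tables maps a feature name to its feature-by-class contingency table.
    """
    kept = []
    for name, counts in tables.items():
        mean, sd = mi_mean_and_sd(counts, alpha)
        if mean - z * sd > eps:
            kept.append(name)
    return kept


if __name__ == "__main__":
    tables = {
        "strong": [[40, 10], [10, 40]],
        "weak": [[26, 24], [24, 26]],
    }
    print(select_features(tables))
```

A Gaussian approximation can be poor when the true mutual information is close to zero, since the distribution is bounded below by zero there, so this rule should be read as a rough screening heuristic.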