Distribution of Mutual Information
The mutual information of two random variables i and j with joint
probabilities t_ij is commonly used in learning Bayesian nets as well as in
many other fields. The chances t_ij are usually estimated by the empirical
sampling frequency n_ij/n leading to a point estimate I(n_ij/n) for the mutual
information. To answer questions like "is I(n_ij/n) consistent with zero?" or
"what is the probability that the true mutual information is much larger than
the point estimate?" one has to go beyond the point estimate. In the Bayesian
framework one can answer these questions by utilizing a (second order) prior
distribution p(t) comprising prior information about t. From the prior p(t) one
can compute the posterior p(t|n), from which the distribution p(I|n) of the
mutual information can be calculated. We derive reliable and quickly computable
approximations for p(I|n). We concentrate on the mean, variance, skewness, and
kurtosis, and non-informative priors. For the mean we also give an exact
expression. Numerical issues and the range of validity are discussed. Comment: 8 pages
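
The pipeline described above (counts n_ij, posterior p(t|n), distribution p(I|n)) can be illustrated with a minimal sketch. It is not the paper's analytical approximation: it swaps in plain Monte Carlo sampling, and it assumes a symmetric Dirichlet prior with a hypothetical pseudo-count `alpha` per cell. The function names are illustrative only.

```python
import math
import random


def mutual_information(t):
    """Mutual information (in nats) of a joint probability table t[i][j]."""
    row = [sum(r) for r in t]
    col = [sum(c) for c in zip(*t)]
    mi = 0.0
    for i, r in enumerate(t):
        for j, p in enumerate(r):
            if p > 0.0:  # cells with zero probability contribute nothing
                mi += p * math.log(p / (row[i] * col[j]))
    return mi


def empirical_mi(counts):
    """Point estimate I(n_ij/n) from a table of counts n_ij."""
    n = sum(sum(r) for r in counts)
    return mutual_information([[c / n for c in r] for r in counts])


def sample_posterior_mi(counts, alpha=1.0, draws=2000, seed=0):
    """Draw samples of I from p(I|n) by sampling t ~ Dirichlet(n_ij + alpha).

    ASSUMPTION: symmetric Dirichlet prior with pseudo-count alpha per cell.
    This is Monte Carlo, not the analytical moments derived in the paper.
    """
    rng = random.Random(seed)
    rows, cols = len(counts), len(counts[0])
    params = [c + alpha for r in counts for c in r]
    samples = []
    for _ in range(draws):
        # A Dirichlet draw is a vector of independent Gamma draws, normalised.
        g = [rng.gammavariate(a, 1.0) for a in params]
        s = sum(g)
        t = [[g[i * cols + j] / s for j in range(cols)] for i in range(rows)]
        samples.append(mutual_information(t))
    return samples


if __name__ == "__main__":
    counts = [[12, 8], [7, 13]]
    samples = sample_posterior_mi(counts)
    mean = sum(samples) / len(samples)
    print("point estimate:", empirical_mi(counts))
    print("posterior mean (Monte Carlo):", mean)
```

The sample mean and empirical quantiles of the returned list give rough answers to questions such as "is I(n_ij/n) consistent with zero?", at a much higher cost than the closed-form approximations the abstract refers to.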
Robust Feature Selection by Mutual Information Distributions
Mutual information is widely used in artificial intelligence, in a
descriptive way, to measure the stochastic dependence of discrete random
variables. In order to address questions such as the reliability of the
empirical value, one must consider sample-to-population inferential approaches.
This paper deals with the distribution of mutual information, as obtained in a
Bayesian framework by a second-order Dirichlet prior distribution. The exact
analytical expression for the mean and an analytical approximation of the
variance are reported. Asymptotic approximations of the distribution are
proposed. The results are applied to the problem of selecting features for
incremental learning and classification of the naive Bayes classifier. A fast,
newly defined method is shown to outperform the traditional approach based on
empirical mutual information on a number of real data sets. Finally, a
theoretical development is reported that allows one to efficiently extend the
above methods to incomplete samples in an easy and effective way. Comment: 8 two-column pages
Robust Inference of Trees
This paper is concerned with the reliable inference of optimal
tree-approximations to the dependency structure of an unknown distribution
generating data. The traditional approach to the problem measures the
dependency strength between random variables by the index called mutual
information. In this paper reliability is achieved by Walley's imprecise
Dirichlet model, which generalizes Bayesian learning with Dirichlet priors.
Adopting the imprecise Dirichlet model results in posterior interval
expectation for mutual information, and in a set of plausible trees consistent
with the data. Reliable inference about the actual tree is achieved by focusing
on the substructure common to all the plausible trees. We develop an exact
algorithm that infers the substructure in time O(m^4), m being the number of
random variables. The new algorithm is applied to a set of data sampled from a
known distribution. The method is shown to reliably infer edges of the actual
tree even when the data are very scarce, unlike the traditional approach.
Finally, we provide lower and upper credibility limits for mutual information
under the imprecise Dirichlet model. These enable the previous developments to
be extended to a full inferential method for trees. Comment: 26 pages, 7 figures
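
For orientation, the traditional approach that this abstract contrasts itself with can be sketched as a maximum-weight spanning tree over pairwise mutual information (the Chow-Liu construction). This is the baseline only, not the paper's O(m^4) algorithm for the imprecise Dirichlet model. The function name and the example weights are hypothetical, and the weight matrix is assumed to be symmetric.

```python
def max_spanning_tree(weights):
    """Kruskal's algorithm for a maximum-weight spanning tree.

    weights[i][j] is assumed to be the (symmetric) mutual information
    between variables i and j. Returns a list of edges (i, j) with i < j.
    """
    m = len(weights)
    parent = list(range(m))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Consider the strongest dependencies first.
    edges = sorted(
        ((weights[i][j], i, j) for i in range(m) for j in range(i + 1, m)),
        reverse=True,
    )
    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding the edge does not create a cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree


if __name__ == "__main__":
    # Made-up mutual information values for four variables.
    mi = [
        [0.00, 0.90, 0.10, 0.05],
        [0.90, 0.00, 0.80, 0.10],
        [0.10, 0.80, 0.00, 0.70],
        [0.05, 0.10, 0.70, 0.00],
    ]
    print(max_spanning_tree(mi))
```

Because the tree depends on point estimates of the weights, small changes in scarce data can flip edges, which is the fragility the interval-based approach in the abstract is meant to address.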
Distribution of Mutual Information from Complete and Incomplete Data
Mutual information is widely used, in a descriptive way, to measure the
stochastic dependence of categorical random variables. In order to address
questions such as the reliability of the descriptive value, one must consider
sample-to-population inferential approaches. This paper deals with the
posterior distribution of mutual information, as obtained in a Bayesian
framework by a second-order Dirichlet prior distribution. The exact analytical
expression for the mean, and analytical approximations for the variance,
skewness and kurtosis are derived. These approximations have a guaranteed
accuracy level of the order O(1/n^3), where n is the sample size. Leading order
approximations for the mean and the variance are derived in the case of
incomplete samples. The derived analytical expressions allow the distribution
of mutual information to be approximated reliably and quickly. In fact, the
derived expressions can be computed with the same order of complexity needed
for descriptive mutual information. This makes the distribution of mutual
information a concrete alternative to descriptive mutual information in
many applications that would benefit from moving to the inductive side. Some
of these prospective applications are discussed, and one of them, namely
feature selection, is shown to perform significantly better when inductive
mutual information is used. Comment: 26 pages, LaTeX, 5 figures, 4 tables
Dependency structure matrix, genetic algorithms, and effective recombination
In many different fields, researchers are often confronted by problems arising from complex systems. Simple heuristics or even enumeration work quite well on small and easy problems; however, to efficiently solve large and difficult problems, proper decomposition is the key. In this paper, investigating and analyzing interactions between components of complex systems sheds some light on problem decomposition. By recognizing three bare-bones interactions (modularity, hierarchy, and overlap), facet-wise models are developed to dissect and inspect problem decomposition in the context of genetic algorithms. The proposed genetic algorithm design utilizes a matrix representation of an interaction graph to analyze and explicitly decompose the problem. The results from this paper should benefit research both technically and scientifically. Technically, this paper develops an automated dependency structure matrix clustering technique and utilizes it to design a model-building genetic algorithm that learns and delivers the problem structure. Scientifically, the explicit interaction model describes the problem structure very well and helps researchers gain important insights through the explicitness of the procedure.
This work was sponsored by Taiwan National Science Council under grant NSC97-2218-E-002-020-MY3, U.S. Air Force Office of Scientific Research, Air Force Materiel Command, USAF, under grants FA9550-06-1-0370 and FA9550-06-1-0096, U.S. National Science Foundation under CAREER grant ECS-0547013, ITR grant DMR-03-25939 at Materials Computation Center, grant ISS-02-09199 at US National Center for Supercomputing Applications, UIUC, and the Portuguese Foundation for Science and Technology under grants SFRH/BD/16980/2004 and PTDC/EIA/67776/2006.
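
The idea in the abstract above of a matrix representation of an interaction graph used to decompose a problem can be illustrated with a minimal sketch. It is a deliberately simple stand-in, not the paper's clustering technique: it thresholds a symmetric dependency structure matrix and returns connected components as modules, so it cannot represent hierarchy or overlap. The function name, the threshold, and the example values are hypothetical.

```python
def dsm_modules(dsm, threshold):
    """Group variables into modules from a dependency structure matrix.

    dsm[i][j] is assumed to be a symmetric interaction strength between
    variables i and j. Two variables land in the same module when they are
    linked by a chain of interactions of strength >= threshold.
    """
    n = len(dsm)
    seen = [False] * n
    modules = []
    for start in range(n):
        if seen[start]:
            continue
        # Depth-first search over the thresholded interaction graph.
        stack = [start]
        seen[start] = True
        component = []
        while stack:
            v = stack.pop()
            component.append(v)
            for u in range(n):
                if u != v and not seen[u] and dsm[v][u] >= threshold:
                    seen[u] = True
                    stack.append(u)
        modules.append(sorted(component))
    return modules


if __name__ == "__main__":
    # Made-up interaction strengths: variables 0-1 and 2-3 interact strongly.
    dsm = [
        [0.0, 0.9, 0.1, 0.0],
        [0.9, 0.0, 0.0, 0.1],
        [0.1, 0.0, 0.0, 0.8],
        [0.0, 0.1, 0.8, 0.0],
    ]
    print(dsm_modules(dsm, threshold=0.5))
```

In a model-building genetic algorithm, modules like these would typically be exchanged as whole units during recombination so that strongly interacting variables are not separated.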