The mutual information. Estimation in the sampling without replacement

Author: Gil Maria Angeles
Gil Pedro
Perez Rigoberto
Publication venue: Institute of Information Theory and Automation AS CR
Publication date: 01/01/1987

Independence clustering (without a matrix)

Author: Ryabko Daniil
Publication date: 20/03/2017

The independence clustering problem is considered in the following formulation: given a set $S$ of random variables, it is required to find the finest partitioning $\{U_1,\dots,U_k\}$ of $S$ into clusters such that the clusters $U_1,\dots,U_k$ are mutually independent. Since mutual independence is the target, pairwise similarity measurements are of no use, and thus traditional clustering algorithms are inapplicable. The distribution of the random variables in $S$ is, in general, unknown, but a sample is available. Thus, the problem is cast in terms of time series. Two forms of sampling are considered: i.i.d. and stationary time series, with the main emphasis being on the latter, more general, case. A consistent, computationally tractable algorithm for each of the settings is proposed, and a number of open directions for further research are outlined.

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

Hal-Diderot

Entropy inference and the James-Stein estimator, with application to nonlinear gene association networks

Author: Hausser Jean
Strimmer Korbinian
Publication date: 01/01/2008

We present a procedure for effective estimation of entropy and mutual information from small-sample data, and apply it to the problem of inferring high-dimensional gene association networks. Specifically, we develop a James-Stein-type shrinkage estimator, resulting in a procedure that is highly efficient statistically as well as computationally. Despite its simplicity, we show that it outperforms eight other entropy estimation procedures across a diverse range of sampling scenarios and data-generating models, even in cases of severe undersampling. We illustrate the approach by analyzing E. coli gene expression data and computing an entropy-based gene-association network from gene expression data. A computer program is available that implements the proposed shrinkage estimator.

Comment: 18 pages, 3 figures, 1 table
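As a rough illustration of what a James-Stein-type shrinkage estimator of entropy can look like, the sketch below shrinks the observed cell frequencies toward a uniform target and plugs the result into the entropy formula. The uniform target, the closed-form shrinkage intensity, and the function name `shrinkage_entropy` are assumptions made for this example; this is not the authors' program.

```python
import numpy as np

def shrinkage_entropy(counts):
    """Return (entropy in nats, shrinkage intensity) from cell counts.

    Illustrative sketch: frequencies are shrunk toward the uniform
    distribution (an assumed target) before computing plug-in entropy.
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.sum()
    if n <= 0:
        raise ValueError("counts must contain at least one observation")
    p = counts.size
    theta_ml = counts / n              # maximum-likelihood frequencies
    target = np.full(p, 1.0 / p)       # uniform shrinkage target (assumption)

    # Estimated shrinkage intensity, clipped to the interval [0, 1].
    denom = (n - 1.0) * np.sum((target - theta_ml) ** 2)
    if denom == 0.0:
        lam = 1.0                      # ML estimate already equals the target
    else:
        lam = (1.0 - np.sum(theta_ml ** 2)) / denom
        lam = min(1.0, max(0.0, lam))

    theta = lam * target + (1.0 - lam) * theta_ml
    nz = theta > 0                     # skip empty cells: 0 * log(0) is taken as 0
    entropy = -np.sum(theta[nz] * np.log(theta[nz]))
    return float(entropy), float(lam)
```

With few observations the intensity moves toward 1 and the estimate approaches the entropy of the uniform distribution; with many observations it moves toward 0 and the estimate approaches the ordinary plug-in entropy.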

arXiv.org e-Print Archive

CiteSeerX

Spiral - Imperial College Digital Repository

Advances in Feature Selection with Mutual Information

Author: A. Kraskov
C. Borggaard
C. Krier
D. François
D. Scott
F. Rossi
L.F. Kozachenko
M.N. Goria
T. Cover
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2009

The selection of features that are relevant for a prediction or classification problem is an important problem in many domains involving high-dimensional data. Selecting features helps to fight the curse of dimensionality, improve the performance of prediction or classification methods, and interpret the application. In a nonlinear context, the mutual information is widely used as a relevance criterion for features and sets of features. Nevertheless, it suffers from at least three major limitations: mutual information estimators depend on smoothing parameters, there is no theoretically justified stopping criterion in the greedy feature selection procedure, and the estimation itself suffers from the curse of dimensionality. This chapter shows how to deal with these problems. The first two are addressed by using resampling techniques that provide a statistical basis to select the estimator parameters and to stop the search procedure. The third is addressed by modifying the mutual information criterion into a measure of how features are complementary (and not only informative) for the problem at hand.
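To make the idea of a resampling-based stopping rule concrete, the sketch below uses a permutation test: a candidate feature is kept only if its estimated mutual information with the target exceeds what is seen after the target is shuffled. Whether this matches the chapter's exact procedure is an assumption; the histogram estimator, the `bins` smoothing parameter, and the function names are illustrative stand-ins.

```python
import numpy as np

def histogram_mi(x, y, bins=8):
    """Plug-in mutual information (nats) from a 2-D histogram.

    `bins` plays the role of the smoothing parameter of the estimator.
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y, shape (1, bins)
    nz = pxy > 0                          # empty cells contribute zero
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def permutation_pvalue(x, y, n_perm=200, bins=8, seed=0):
    """Return (observed MI, p-value) under the null of independence."""
    rng = np.random.default_rng(seed)
    observed = histogram_mi(x, y, bins)
    # Shuffling y breaks any dependence on x while keeping both marginals.
    null = np.array([histogram_mi(x, rng.permutation(y), bins)
                     for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (n_perm + 1)
    return observed, float(p_value)
```

In a greedy forward search, one possible stopping rule is to halt as soon as the best remaining candidate has a p-value above a chosen threshold such as 0.05.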

arXiv.org e-Print Archive