Regularized estimation of information via canonical correlation analysis on a finite-dimensional feature space


This paper aims to estimate the information between two random phenomena using consolidated second-order statistics tools. The squared-loss mutual information, a surrogate of the Shannon mutual information, is chosen because it can be expressed as a second-order moment. We first review the rationale for i.i.d. discrete sources, which involves mapping the data onto the simplex space, and we highlight the links with other well-known related concepts in the literature based on local approximations of information-theoretic measures. The problem is then translated to analog sources by mapping the data onto the characteristic space, focusing on the adaptability between the discrete and the analog cases and its limitations. The proposed approach gains interpretability and scalability for use on large data sets, providing a unified rationale for the free regularization parameters. Moreover, the structure of the proposed mapping allows resorting to Szegő's theorem to reduce the complexity of high-dimensional mappings, exhibiting a strong duality with spectral analysis. The performance of the developed estimators is analyzed using Gaussian mixtures.

This work has been supported by the Spanish Ministry of Science and Innovation through project RODIN (PID2019-105717RB-C22/MCIN/AEI/10.13039/501100011033), by grant 2021 SGR 01033 (AGAUR, Generalitat de Catalunya), and by fellowship FI 2019 of the Secretary for University and Research of the Generalitat de Catalunya and the European Social Fund.

Peer reviewed. Postprint (author's final draft).
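As an illustration of the second-order-moment property mentioned above, the following minimal sketch (not taken from the paper) computes the squared-loss mutual information of two discrete sources directly from a joint probability table, using the standard definition SMI(X;Y) = ½ Σ p(x)p(y)(p(x,y)/(p(x)p(y)) − 1)². The example distributions are illustrative assumptions.

```python
import numpy as np

def squared_loss_mi(pxy: np.ndarray) -> float:
    """Squared-loss mutual information of a discrete joint distribution.

    SMI is a second-order moment of the density ratio p(x,y)/(p(x)p(y))
    under the independence model p(x)p(y).
    """
    px = pxy.sum(axis=1, keepdims=True)   # marginal p(x), column vector
    py = pxy.sum(axis=0, keepdims=True)   # marginal p(y), row vector
    prod = px * py                        # independence model p(x)p(y)
    ratio = pxy / prod                    # density ratio
    return 0.5 * float(np.sum(prod * (ratio - 1.0) ** 2))

# Independent variables: the density ratio is 1 everywhere, so SMI = 0.
p_indep = np.outer([0.3, 0.7], [0.4, 0.6])
print(squared_loss_mi(p_indep))  # -> 0.0

# Dependent variables: SMI is strictly positive.
p_dep = np.array([[0.4, 0.1],
                  [0.1, 0.4]])
print(squared_loss_mi(p_dep))  # -> 0.18
```

In a sample-based setting the joint table would be replaced by empirical frequencies (discrete case) or by regularized second-order statistics of feature mappings (analog case), which is the regime the paper addresses.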
