On the Sample Complexity of HGR Maximal Correlation Functions for Large Datasets
The Hirschfeld-Gebelein-R\'{e}nyi (HGR) maximal correlation and the
corresponding functions have been shown useful in many machine learning
scenarios. In this paper, we study the sample complexity of estimating the HGR
maximal correlation functions by the alternating conditional expectation (ACE)
algorithm using training samples from large datasets. Specifically, we develop
a mathematical framework to characterize the learning errors between the
maximal correlation functions computed from the true distribution, and the
functions estimated from the ACE algorithm. For both supervised and
semi-supervised learning scenarios, we establish the analytical expressions for
the error exponents of the learning errors. Furthermore, we demonstrate that
for large datasets, the upper bounds for the sample complexity of learning the
HGR maximal correlation functions by the ACE algorithm can be expressed using
the established error exponents. Moreover, with our theoretical results, we
investigate the sampling strategy for different types of samples in
semi-supervised learning with a total sampling budget constraint, and an
optimal sampling strategy is developed to maximize the error exponent of the
learning error. Finally, numerical simulations are presented to support our
theoretical results.
Comment: Submitted to IEEE Transactions on Information Theory
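The ACE algorithm mentioned above can be illustrated with a minimal sketch: given paired discrete samples, it alternately takes conditional expectations of the current feature functions and renormalizes them, and the resulting inner product estimates the HGR maximal correlation. This is a rough illustration under the empirical distribution of the samples, not the paper's implementation or analysis; the function name and defaults are ours.

```python
import numpy as np

def ace_maximal_correlation(x, y, iters=100, seed=0):
    """Estimate the HGR maximal correlation of paired discrete samples
    (x_i, y_i) by alternating conditional expectations (a sketch)."""
    x, y = np.asarray(x), np.asarray(y)
    _, xi = np.unique(x, return_inverse=True)   # category indices for X
    _, yi = np.unique(y, return_inverse=True)   # category indices for Y
    n = len(x)
    px = np.bincount(xi) / n                    # empirical marginal of X
    py = np.bincount(yi) / n                    # empirical marginal of Y
    rng = np.random.default_rng(seed)
    g = rng.standard_normal(len(py))            # random initial g(y)
    for _ in range(iters):
        # f(x) <- E[g(Y) | X = x] under the empirical distribution,
        # then center and scale to zero mean, unit variance.
        f = np.bincount(xi, weights=g[yi]) / (n * px)
        f -= np.dot(px, f)
        f /= np.sqrt(np.dot(px, f ** 2))
        # g(y) <- E[f(X) | Y = y], centered and scaled the same way.
        g = np.bincount(yi, weights=f[xi]) / (n * py)
        g -= np.dot(py, g)
        g /= np.sqrt(np.dot(py, g ** 2))
    # The correlation of the learned feature pair estimates the
    # HGR maximal correlation of (X, Y).
    return float(np.mean(f[xi] * g[yi]))
```

For perfectly dependent samples (e.g. `y = x`) the estimate is 1, and for binary pairs it matches the empirical Pearson correlation, as the theory predicts for the binary case.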
On Universal Features for High-Dimensional Learning and Inference
We consider the problem of identifying universal low-dimensional features
from high-dimensional data for inference tasks in settings involving learning.
For such problems, we introduce natural notions of universality and we show a
local equivalence among them. Our analysis is naturally expressed via
information geometry, and represents a conceptually and computationally useful
analysis. The development reveals the complementary roles of the singular value
decomposition, Hirschfeld-Gebelein-R\'enyi maximal correlation, the canonical
correlation and principal component analyses of Hotelling and Pearson, Tishby's
information bottleneck, Wyner's common information, Ky Fan $k$-norms, and
Breiman and Friedman's alternating conditional expectations algorithm. We
further illustrate how this framework facilitates understanding and optimizing
aspects of learning systems, including multinomial logistic (softmax)
regression and the associated neural network architecture, matrix factorization
methods for collaborative filtering and other applications, rank-constrained
multivariate linear regression, and forms of semi-supervised learning.
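The linear-algebraic connection alluded to here can be sketched for finite alphabets: forming the matrix with entries $B_{x,y} = P(x,y)/\sqrt{P(x)P(y)}$ and taking its SVD, the top singular value is 1 (constant functions), the second singular value equals the HGR maximal correlation, and the rescaled singular vectors give the optimal feature pair. The code below is our own illustration of this standard decomposition, with names and normalizations chosen for clarity rather than taken from the paper.

```python
import numpy as np

def hgr_features_via_svd(pxy):
    """Given a joint pmf matrix pxy[i, j] = P(X=i, Y=j), recover the
    HGR maximal correlation and an optimal feature pair from the SVD
    of B[i, j] = P(i, j) / sqrt(P_X(i) P_Y(j))  (a sketch)."""
    px = pxy.sum(axis=1)                        # marginal of X
    py = pxy.sum(axis=0)                        # marginal of Y
    B = pxy / np.sqrt(np.outer(px, py))
    U, s, Vt = np.linalg.svd(B)                 # singular values descending
    # s[0] == 1 corresponds to constant functions; s[1] is the
    # maximal correlation, and the rescaled singular vectors are the
    # zero-mean, unit-variance feature functions achieving it.
    rho = s[1]
    f = U[:, 1] / np.sqrt(px)
    g = Vt[1, :] / np.sqrt(py)
    return rho, f, g
```

For a symmetric binary joint pmf `[[0.4, 0.1], [0.1, 0.4]]` this returns maximal correlation 0.6, matching the Pearson correlation in the binary case, and an independent joint pmf gives 0.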