A Comparative Study of Pairwise Learning Methods based on Kernel Ridge Regression
Many machine learning problems can be formulated as predicting labels for a
pair of objects. Problems of that kind are often referred to as pairwise
learning, dyadic prediction or network inference problems. During the last
decade, kernel methods have played a dominant role in pairwise learning. They
still achieve state-of-the-art predictive performance, but the theoretical
analysis of their behavior remains underexplored in the machine learning
literature.
In this work we review and unify existing kernel-based algorithms that are
commonly used in different pairwise learning settings, ranging from matrix
filtering to zero-shot learning. To this end, we focus on closed-form efficient
instantiations of Kronecker kernel ridge regression. We show that independent
task kernel ridge regression, two-step kernel ridge regression and a linear
matrix filter arise naturally as special cases of Kronecker kernel ridge
regression, implying that all these methods implicitly minimize a squared loss.
In addition, we analyze universality, consistency and spectral filtering
properties. Our theoretical results provide valuable insights in assessing the
advantages and limitations of existing pairwise learning methods.Comment: arXiv admin note: text overlap with arXiv:1606.0427
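The closed-form instantiation mentioned in the abstract can be sketched via the standard eigendecomposition shortcut for Kronecker systems, which solves vec(A) = (G ⊗ K + λI)⁻¹ vec(Y) without ever materializing the Kronecker product. This is a minimal illustration under that assumption; the function name, regularization value, and synthetic data are hypothetical and not taken from the paper:

```python
import numpy as np

def kronecker_krr(K, G, Y, lam):
    """Closed-form Kronecker kernel ridge regression (sketch).

    K: kernel matrix over row objects (n x n, symmetric PSD)
    G: kernel matrix over column objects (m x m, symmetric PSD)
    Y: pairwise label matrix (n x m), lam: ridge parameter.
    Solves vec(A) = (G kron K + lam*I)^{-1} vec(Y) using the
    eigendecompositions of the two small kernel matrices.
    """
    lam_k, U = np.linalg.eigh(K)          # K = U diag(lam_k) U^T
    lam_g, V = np.linalg.eigh(G)          # G = V diag(lam_g) V^T
    # Element-wise shrinkage factors: D[i, j] = lam_k[i]*lam_g[j] + lam
    D = np.outer(lam_k, lam_g) + lam
    return U @ ((U.T @ Y @ V) / D) @ V.T  # dual coefficients A

# Tiny synthetic example (illustrative only)
rng = np.random.default_rng(0)
X_u = rng.normal(size=(6, 3))             # row objects
X_v = rng.normal(size=(4, 2))             # column objects
K = X_u @ X_u.T                           # linear kernels
G = X_v @ X_v.T
Y = rng.normal(size=(6, 4))               # pairwise labels
A = kronecker_krr(K, G, Y, lam=1.0)
F = K @ A @ G                             # scores for all training pairs
```

The point of the shortcut is cost: the naive system is nm x nm, while the eigendecomposition route only factorizes the n x n and m x m kernel matrices.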
Conditional Density Estimation by Penalized Likelihood Model Selection and Applications
In this technical report, we consider conditional density estimation with a
maximum likelihood approach. Under weak assumptions, we obtain a theoretical
bound for a Kullback-Leibler type loss for a single model maximum likelihood
estimate. We use a penalized model selection technique to select a best model
within a collection. We give a general condition on the penalty choice that
leads to an oracle-type inequality for the resulting estimate. This construction
is
applied to two examples of partition-based conditional density models, models
in which the conditional density depends on the covariate only in a piecewise
manner. The first example relies on classical piecewise polynomial densities
while the second uses Gaussian mixtures with varying mixing proportion but same
mixture components. We show how this last case is related to an unsupervised
segmentation application that motivated this study.
Comment: No. RR-7596 (2011)
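The partition-based scheme described in the abstract can be sketched with the simplest member of such a collection: piecewise-constant conditional densities on a grid, selected by penalized maximum likelihood. This is an illustrative toy, not the report's estimator; the penalty constant `kappa`, the model grid, and the data are hypothetical choices:

```python
import numpy as np

def penalized_select(x, y, max_I=8, max_J=8, kappa=2.0):
    """Penalized maximum likelihood model selection (sketch).

    Models are indexed by (I, J): the covariate space [0,1] is cut
    into I equal cells and the response space [0,1] into J bins;
    within each covariate cell the conditional density is constant
    on each response bin.  We pick the (I, J) minimizing
    -loglik/n + kappa * dim / n, with dim = I*(J-1) free parameters.
    """
    n = len(x)
    best = None
    for I in range(1, max_I + 1):
        for J in range(1, max_J + 1):
            ix = np.minimum((x * I).astype(int), I - 1)  # covariate cell
            iy = np.minimum((y * J).astype(int), J - 1)  # response bin
            counts = np.zeros((I, J))
            np.add.at(counts, (ix, iy), 1)
            n_i = counts.sum(axis=1)
            # MLE density value at each sample's own cell (always > 0,
            # since every point counts itself)
            dens = counts[ix, iy] / n_i[ix] * J
            crit = -np.log(dens).sum() / n + kappa * I * (J - 1) / n
            if best is None or crit < best[0]:
                best = (crit, I, J)
    return best

rng = np.random.default_rng(1)
x = rng.uniform(size=400)
# conditional law switches at x = 1/2: a (2, 2)-grid model fits exactly
y = np.where(x < 0.5, rng.uniform(0.0, 0.5, 400), rng.uniform(0.5, 1.0, 400))
crit, I, J = penalized_select(x, y)
```

The penalty term plays the role discussed in the report: it must grow with the model dimension fast enough that richer partitions are chosen only when they genuinely improve the likelihood, which is what the oracle-type inequality formalizes.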