Similarity-based Learning via Data Driven Embeddings
We consider the problem of classification using similarity/distance functions
over data. Specifically, we propose a framework for defining the goodness of a
(dis)similarity function with respect to a given learning task and propose
algorithms that have guaranteed generalization properties when working with
such good functions. Our framework unifies and generalizes the frameworks
proposed by [Balcan-Blum ICML 2006] and [Wang et al ICML 2007]. An attractive
feature of our framework is its adaptability to data: we do not promote a
fixed notion of goodness but rather let data dictate it. We show, by giving
theoretical guarantees, that the goodness criterion best suited to a problem can
itself be learned, which makes our approach applicable to a variety of domains
and problems. We propose a landmarking-based approach to obtaining a classifier
from such learned goodness criteria. We then provide a novel diversity-based
heuristic to perform task-driven selection of landmark points instead of random
selection. We demonstrate the effectiveness of our goodness criteria learning
method as well as the landmark selection heuristic on a variety of
similarity-based learning datasets and benchmark UCI datasets on which our
method consistently outperforms existing approaches by a significant margin.
Comment: To appear in the proceedings of NIPS 2011, 14 pages
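
The landmarking idea mentioned in the abstract can be illustrated with a minimal sketch: each point is mapped to the vector of its similarities to a small set of landmark points, and a linear classifier is trained on that embedding. Everything specific here is an assumption made for illustration and is not taken from the paper: the Gaussian similarity, the perceptron learner, the toy data, and the hand-picked landmarks (the paper's learned goodness criteria and its diversity-based selection heuristic are not reproduced).

```python
import math


def gaussian_similarity(x, y, width=1.0):
    """Hypothetical similarity function; any (dis)similarity could be plugged in."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * width ** 2))


def landmark_embedding(x, landmarks, similarity):
    """Map x to the vector of its similarities to each landmark."""
    return [similarity(x, landmark) for landmark in landmarks]


def train_perceptron(features, labels, epochs=20):
    """Plain perceptron on embedded points; labels are -1 or +1."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for phi, y in zip(features, labels):
            score = sum(w * f for w, f in zip(weights, phi)) + bias
            if y * score <= 0:  # misclassified (or on the boundary): update
                weights = [w + y * f for w, f in zip(weights, phi)]
                bias += y
    return weights, bias


def predict(x, landmarks, similarity, weights, bias):
    """Classify x by the sign of the linear score in the landmark space."""
    phi = landmark_embedding(x, landmarks, similarity)
    score = sum(w * f for w, f in zip(weights, phi)) + bias
    return 1 if score > 0 else -1


# Toy data: two well-separated clusters (illustrative only).
points = [(0, 0), (0, 1), (1, 0), (4, 4), (4, 5), (5, 4)]
labels = [-1, -1, -1, 1, 1, 1]

# Landmarks picked by hand, one per cluster, purely for the example.
landmarks = [(0, 0), (4, 4)]

features = [landmark_embedding(p, landmarks, gaussian_similarity) for p in points]
weights, bias = train_perceptron(features, labels)

print(predict((0.5, 0.5), landmarks, gaussian_similarity, weights, bias))
print(predict((4.5, 4.5), landmarks, gaussian_similarity, weights, bias))
```

The classifier never touches the raw coordinates after the embedding step, only similarity values, which is what lets this style of method work with similarity functions that are not valid kernels.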