12 research outputs found
Kernel functions based on triplet comparisons
Given only information in the form of similarity triplets "Object A is more
similar to object B than to object C" about a data set, we propose two ways of
defining a kernel function on the data set. While previous approaches construct
a low-dimensional Euclidean embedding of the data set that reflects the given
similarity triplets, we aim to define kernel functions that correspond to
high-dimensional embeddings. These kernel functions can subsequently be used to
apply any kernel method to the data set.
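The abstract does not spell out the construction; one simple sketch of how a triplet-based kernel could look (the representation and normalization here are assumptions for illustration, not the paper's definitions) is to encode each object by its ±1 answers over a fixed list of comparison pairs (B, C) and take normalized inner products of these answer vectors:

```python
import numpy as np

def triplet_feature_map(answers):
    """Map an object A to a +/-1 vector over a fixed list of (B, C) pairs:
    +1 if A is more similar to B, -1 if A is more similar to C.
    `answers` is a list of booleans (True = "more similar to B")."""
    return np.array([1.0 if a else -1.0 for a in answers])

def triplet_kernel(answers_x, answers_y):
    """Kernel value = normalized agreement between two objects' answers
    (inner product of their feature vectors, scaled to [-1, 1])."""
    fx = triplet_feature_map(answers_x)
    fy = triplet_feature_map(answers_y)
    return float(fx @ fy) / len(fx)

# Two objects that agree on 3 of 4 comparisons:
k = triplet_kernel([True, True, False, True],
                   [True, True, False, False])
```

Any kernel method (SVM, kernel PCA, ...) can then consume the resulting Gram matrix without ever constructing a low-dimensional embedding.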
Less but Better: Generalization Enhancement of Ordinal Embedding via Distributional Margin
In the absence of prior knowledge, ordinal embedding methods obtain new
representations for items in a low-dimensional Euclidean space via a set of
quadruple-wise comparisons. These ordinal comparisons often come from human
annotators, and classical approaches succeed only when sufficiently many
comparisons are available. However, collecting large amounts of labeled data is
known to be hard, and most existing work pays little attention to
generalization ability with insufficient samples. Meanwhile, recent progress in
large margin theory discloses that rather than just maximizing the minimum
margin, both the margin mean and variance, which characterize the margin
distribution, are more crucial to the overall generalization performance. To
address the issue of insufficient training samples, we propose a margin
distribution learning paradigm for ordinal embedding, entitled Distributional
Margin based Ordinal Embedding (\textit{DMOE}). Specifically, we first define
the margin for the ordinal embedding problem. Secondly, we formulate a concise
objective function which avoids maximizing the margin mean and minimizing the
margin variance directly but exhibits a similar effect. Moreover, an Augmented
Lagrange Multiplier based algorithm is customized to seek the optimal solution
of \textit{DMOE} effectively. Experimental studies on both simulated and
real-world datasets are provided to show the effectiveness of the proposed
algorithm.
Comment: Accepted by AAAI 201
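The abstract does not give the margin definition; a plausible toy version (the quadruple encoding and the gap-based margin below are assumptions, not DMOE's actual formulation) takes the margin of a quadruple (i, j, k, l) asserting d(i, j) < d(k, l) to be the gap d(k, l) - d(i, j), so that the mean and variance of these gaps characterize the margin distribution:

```python
import numpy as np

def quadruple_margins(X, quads):
    """For each quadruple (i, j, k, l) meaning "d(i, j) < d(k, l)", compute
    a toy margin d(k, l) - d(i, j) in the embedding X: positive when the
    comparison is satisfied, and larger when it is satisfied more robustly."""
    margins = []
    for i, j, k, l in quads:
        d_ij = np.linalg.norm(X[i] - X[j])
        d_kl = np.linalg.norm(X[k] - X[l])
        margins.append(d_kl - d_ij)
    return np.array(margins)

# Four 2-D points and two satisfied quadruple comparisons:
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [4.0, 0.0]])
m = quadruple_margins(X, [(0, 1, 0, 2), (0, 1, 0, 3)])
mean, var = m.mean(), m.var()
```

A margin-distribution objective in this spirit would push `mean` up while pushing `var` down, rather than maximizing only the minimum margin.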
Efficient Data Analytics on Augmented Similarity Triplets
Many machine learning methods (classification, clustering, etc.) start with a
known kernel that provides similarity or distance measure between two objects.
Recent work has extended this to situations where the information about objects
is limited to comparisons of distances between three objects (triplets). Humans
find the comparison task much easier than the estimation of absolute
similarities, so this kind of data can be easily obtained using crowd-sourcing.
In this work, we give an efficient method of augmenting the triplets data, by
utilizing additional implicit information inferred from the existing data.
Triplet augmentation improves the quality of kernel-based and kernel-free data
analytics tasks. Second, we propose a novel set of algorithms for common
supervised and unsupervised machine learning tasks based on triplets. These
methods work directly with triplets, avoiding kernel evaluations. Experimental
evaluation on real and synthetic datasets shows that our methods are more
accurate than the current best-known techniques.
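One simple inference of the kind such augmentation could use (a sketch, not the paper's exact rule): with a shared anchor a, the triplets (a, b, c) — meaning d(a, b) < d(a, c) — and (a, c, d) together imply (a, b, d) by transitivity of "<" on distances. Closing the triplet set under this rule yields extra triplets at no annotation cost:

```python
def augment_triplets(triplets):
    """Infer new triplets by transitivity with a shared anchor:
    (a, b, c) and (a, c, d) imply (a, b, d), since
    d(a, b) < d(a, c) and d(a, c) < d(a, d) give d(a, b) < d(a, d)."""
    known = set(triplets)
    inferred = set()
    changed = True
    while changed:  # close the set under the transitivity rule
        changed = False
        current = known | inferred
        for a, b, c in current:
            for a2, c2, d in current:
                if a2 == a and c2 == c:
                    new = (a, b, d)
                    if new not in current:
                        inferred.add(new)
                        changed = True
    return inferred

extra = augment_triplets({("a", "b", "c"), ("a", "c", "d")})
```

The quadratic pairing per pass is fine for a sketch; a practical implementation would index triplets by anchor to avoid scanning all pairs.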
Insights into Ordinal Embedding Algorithms: A Systematic Evaluation
The objective of ordinal embedding is to find a Euclidean representation of a
set of abstract items, using only answers to triplet comparisons of the form
"Is item A closer to item B or to item C?". In recent years, numerous
algorithms have been proposed to solve this problem. However, there does not
exist a fair and thorough assessment of these embedding methods and therefore
several key questions remain unanswered: Which algorithms scale better with
increasing sample size or dimension? Which ones perform better when the
embedding dimension is small or few triplet comparisons are available? In our
paper, we address these questions and provide the first comprehensive and
systematic empirical evaluation of existing algorithms as well as a new neural
network approach. In the large triplet regime, we find that simple, relatively
unknown, non-convex methods consistently outperform all other algorithms,
including elaborate approaches based on neural networks or landmark approaches.
This finding can be explained by our insight that many of the non-convex
optimization approaches do not suffer from local optima. In the low triplet
regime, our neural network approach is either competitive or significantly
outperforms all the other methods. Our comprehensive assessment is enabled by
our unified library of popular embedding algorithms that leverages GPU
resources and allows for fast and accurate embeddings of millions of data
points.
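As a toy illustration of the kind of non-convex approach the evaluation favors (a sketch with assumed hyperparameters and loss, not any of the benchmarked algorithms): place points randomly and run plain gradient descent on a hinge loss that asks each triplet (a, b, c) to satisfy d(a, b)² + margin < d(a, c)²:

```python
import numpy as np

def ordinal_embed(n, triplets, dim=2, lr=0.05, iters=1000, margin=0.1, seed=0):
    """Toy non-convex ordinal embedding: gradient descent on a hinge loss
    sum_t max(0, margin + d(a,b)^2 - d(a,c)^2) over triplets t = (a, b, c)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(scale=0.1, size=(n, dim))
    for _ in range(iters):
        grad = np.zeros_like(X)
        for a, b, c in triplets:
            d_ab = X[a] - X[b]
            d_ac = X[a] - X[c]
            # hinge term is active while the triplet is violated or too close
            if d_ab @ d_ab + margin > d_ac @ d_ac:
                grad[a] += 2 * (d_ab - d_ac)
                grad[b] -= 2 * d_ab
                grad[c] += 2 * d_ac
        X -= lr * grad
    return X

# Triplets consistent with four points on a line in the order 0, 1, 2, 3:
triplets = [(0, 1, 2), (0, 2, 3), (1, 2, 3), (1, 0, 3),
            (2, 1, 0), (2, 3, 0), (3, 2, 1), (3, 2, 0)]
X = ordinal_embed(4, triplets)
n_violated = sum(np.linalg.norm(X[a] - X[b]) >= np.linalg.norm(X[a] - X[c])
                 for a, b, c in triplets)
```

Despite the non-convex objective, such simple formulations tend to find good solutions in the large-triplet regime, consistent with the paper's observation that local optima are rarely a problem in practice.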