Efficient Data Analytics on Augmented Similarity Triplets
Many machine learning methods (classification, clustering, etc.) start with a
known kernel that provides a similarity or distance measure between two objects.
Recent work has extended this to situations where the information about objects
is limited to comparisons of distances between three objects (triplets). Humans
find the comparison task much easier than the estimation of absolute
similarities, so this kind of data can be easily obtained using crowd-sourcing.
In this work, we give an efficient method of augmenting the triplet data by
utilizing additional implicit information inferred from the existing data.
Triplet augmentation improves the quality of kernel-based and kernel-free data
analytics tasks. We also propose a novel set of algorithms for common
supervised and unsupervised machine learning tasks based on triplets. These
methods work directly with triplets, avoiding kernel evaluations. Experimental
evaluation on real and synthetic datasets shows that our methods are more
accurate than the current best-known techniques.
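One way to picture "implicit information inferred from the existing data" is transitivity: if a is closer to b than to c, and a is closer to c than to d, then a is closer to b than to d. The sketch below illustrates only that idea and is not the authors' algorithm. The function name `augment_triplets` and the tuple convention `(a, b, c)`, meaning "a is closer to b than to c", are assumptions made for this example.

```python
from collections import defaultdict


def augment_triplets(triplets):
    """Add triplets implied by transitivity (illustrative sketch only).

    Each triplet (a, b, c) is read as "a is closer to b than to c".
    This convention is an assumption of the example.
    """
    # For every anchor a, store a directed graph with an edge b -> c
    # whenever the triplet (a, b, c) was observed.
    closer = defaultdict(lambda: defaultdict(set))
    for a, b, c in triplets:
        closer[a][b].add(c)

    augmented = set()
    for a, graph in closer.items():
        for start in list(graph):
            # Depth-first search: every node reachable from `start`
            # is farther from the anchor than `start` itself.
            seen = set()
            stack = list(graph[start])
            while stack:
                node = stack.pop()
                if node in seen:
                    continue
                seen.add(node)
                stack.extend(graph.get(node, ()))
            # Skip self-pairs, which can only arise from inconsistent input.
            augmented.update((a, start, far) for far in seen if far != start)
    return augmented


observed = [("a", "b", "c"), ("a", "c", "d")]
print(sorted(augment_triplets(observed)))
```

The result contains the two observed triplets plus the inferred `("a", "b", "d")`. Noisy or contradictory triplets would create cycles in the per-anchor graph; this sketch does not try to resolve them.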
A Revenue Function for Comparison-Based Hierarchical Clustering
Comparison-based learning addresses the problem of learning when, instead of
explicit features or pairwise similarities, one only has access to comparisons
of the form: "Object $A$ is more similar to $B$ than to $C$." Recently, it
has been shown that, in Hierarchical Clustering, single and complete linkage
can be directly implemented using only such comparisons while several
algorithms have been proposed to emulate the behaviour of average linkage.
Hence, finding hierarchies (or dendrograms) using only comparisons is a well
understood problem. However, evaluating their meaningfulness when neither
ground truth nor explicit similarities are available remains an open question.
In this paper, we bridge this gap by proposing a new revenue function that
allows one to measure the goodness of dendrograms using only comparisons. We
show that this function is closely related to Dasgupta's cost for hierarchical
clustering that uses pairwise similarities. On the theoretical side, we use the
proposed revenue function to resolve the open problem of whether one can
approximately recover a latent hierarchy using few triplet comparisons. On the
practical side, we present principled algorithms for comparison-based
hierarchical clustering based on the maximisation of the revenue and we
empirically compare them with existing methods.
Comment: 26 pages, 6 figures, 5 tables. Transactions on Machine Learning
Research (2023)
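The abstract does not define the proposed revenue function, so it is not reproduced here. As background for the comparison it draws, Dasgupta's cost of a hierarchy sums, over every pair of objects, their similarity times the number of leaves under their lowest common ancestor; lower cost is better. The sketch below computes that cost under assumptions of this example: the tree is a binary nested tuple with integer leaves, and `similarity` is a symmetric matrix given as a list of lists.

```python
def dasgupta_cost(tree, similarity):
    """Dasgupta's cost of a binary hierarchy (illustrative sketch).

    tree: nested 2-tuples with integer leaves, e.g. ((0, 1), 2).
    similarity: symmetric matrix, similarity[i][j] >= 0.
    Both representations are assumptions of this example.
    """
    def visit(node):
        # A leaf contributes no cost on its own.
        if not isinstance(node, tuple):
            return [node], 0.0
        left_leaves, left_cost = visit(node[0])
        right_leaves, right_cost = visit(node[1])
        size = len(left_leaves) + len(right_leaves)
        # Pairs separated at this node have it as lowest common ancestor,
        # so each is charged the number of leaves below this node.
        split = sum(similarity[i][j] for i in left_leaves for j in right_leaves)
        return left_leaves + right_leaves, left_cost + right_cost + size * split

    return visit(tree)[1]


# Objects 0 and 1 are very similar; object 2 is dissimilar to both.
w = [[0, 3, 1],
     [3, 0, 1],
     [1, 1, 0]]
print(dasgupta_cost(((0, 1), 2), w))  # merges the similar pair first
print(dasgupta_cost(((0, 2), 1), w))  # merges a dissimilar pair first
```

The first tree, which groups the similar pair at the bottom, gets the lower cost (12 versus 14). A comparison-based criterion has to judge the same kind of quality without access to the matrix `w`.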
Boosting for Comparison-Based Learning
We consider the problem of classification in a comparison-based setting:
given a set of objects, we only have access to triplet comparisons of the form
"object $x_i$ is closer to object $x_j$ than to object $x_k$." In this paper we
introduce TripletBoost, a new method that can learn a classifier just from such
triplet comparisons. The main idea is to aggregate the triplet information
into weak classifiers, which can subsequently be boosted to a strong
classifier. Our method has two main advantages: (i) it is applicable to data
from any metric space, and (ii) it can deal with large-scale problems using
only passively obtained and noisy triplets. We derive theoretical
generalization guarantees and a lower bound on the number of necessary
triplets, and we empirically show that our method is both competitive with
state-of-the-art approaches and resistant to noise.
Comment: This is the extended version (38 pages) of a paper accepted to the
International Joint Conference on Artificial Intelligence (IJCAI) 201
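To make "aggregate the triplet information into weak classifiers" concrete, here is a simplified sketch of a single weak classifier built from a pair of reference objects. It is not the TripletBoost algorithm and omits the boosting step entirely. The function name `fit_weak_classifier`, the tuple convention `(i, j, k)` meaning "i is closer to j than to k", and the choice to abstain when no triplet is available are all assumptions of this example.

```python
from collections import Counter


def fit_weak_classifier(triplets, labels, ref_j, ref_k):
    """Build one weak classifier from triplets (illustrative sketch).

    triplets: set of (i, j, k), read as "i is closer to j than to k".
    labels: dict mapping training objects to class labels.
    ref_j, ref_k: the two reference objects defining the classifier.
    """
    # Count training labels on each side of the reference pair.
    votes = {True: Counter(), False: Counter()}
    for i, y in labels.items():
        if (i, ref_j, ref_k) in triplets:
            votes[True][y] += 1
        elif (i, ref_k, ref_j) in triplets:
            votes[False][y] += 1

    # Each side predicts its majority label, or None if it saw no data.
    near_j = votes[True].most_common(1)[0][0] if votes[True] else None
    near_k = votes[False].most_common(1)[0][0] if votes[False] else None

    def predict(i):
        if (i, ref_j, ref_k) in triplets:
            return near_j
        if (i, ref_k, ref_j) in triplets:
            return near_k
        return None  # abstain: no comparison is known for this object

    return predict


# Toy data: objects on a line; triplets are generated from hidden positions.
positions = {0: 0.0, 1: 1.0, 2: 9.0, 3: 10.0, 4: 2.0, 5: 8.0}
triplets = set()
for i in (1, 2, 4, 5):
    if abs(positions[i] - positions[0]) < abs(positions[i] - positions[3]):
        triplets.add((i, 0, 3))
    else:
        triplets.add((i, 3, 0))

predict = fit_weak_classifier(triplets, {1: "left", 2: "right"}, ref_j=0, ref_k=3)
print(predict(4), predict(5), predict(7))
```

Objects 4 and 5 receive the labels of the training objects on their side of the reference pair, and object 7 gets no prediction because no triplet mentions it. A boosting procedure would then weight and combine many such classifiers built from different reference pairs.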