15 research outputs found

    Kernel functions based on triplet comparisons

    Given only information about a data set in the form of similarity triplets "Object A is more similar to object B than to object C", we propose two ways of defining a kernel function on the data set. While previous approaches construct a low-dimensional Euclidean embedding of the data set that reflects the given similarity triplets, we aim at defining kernel functions that correspond to high-dimensional embeddings. These kernel functions can subsequently be used to apply any kernel method to the data set.
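To make the idea of a triplet-derived kernel concrete, here is a minimal sketch. It is not necessarily either of the two constructions proposed in the paper. It assumes a simple feature map in which each object is described by the outcomes of the comparisons it anchors, and takes the kernel to be the normalized inner product of those feature vectors. The names `triplet_features` and `triplet_kernel` are hypothetical.

```python
from math import sqrt

def triplet_features(n, triplets):
    """Map each object to a sparse +/-1 vector indexed by unordered pairs (i, j), i < j.

    A triplet (a, b, c) is read as "a is more similar to b than to c".
    Assumption: this feature map is illustrative, not taken from the paper.
    """
    feats = [dict() for _ in range(n)]
    for a, b, c in triplets:
        if b < c:
            feats[a][(b, c)] = 1.0   # a prefers the first element of the pair
        else:
            feats[a][(c, b)] = -1.0  # a prefers the second element of the pair
    return feats

def triplet_kernel(n, triplets):
    """Gram matrix of normalized inner products between the feature vectors."""
    feats = triplet_features(n, triplets)
    # Objects that anchor no triplet keep a zero vector; avoid dividing by zero.
    norms = [sqrt(sum(v * v for v in f.values())) or 1.0 for f in feats]
    K = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            dot = sum(v * feats[y].get(key, 0.0) for key, v in feats[x].items())
            K[x][y] = dot / (norms[x] * norms[y])
    return K

# Usage: objects 0 and 3 give the same answer about the pair (1, 2).
K = triplet_kernel(4, [(0, 1, 2), (1, 0, 2), (2, 1, 0), (3, 1, 2)])
```

Because the matrix is a Gram matrix of explicit feature vectors, it is symmetric and positive semidefinite, so it could in principle be passed to a kernel method that accepts a precomputed kernel.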

    Routing and search on large scale networks

    In this thesis, we address two seemingly unrelated problems, namely routing in large wireless ad hoc networks and comparison-based search in image databases. The underlying problem is in essence similar, however, and we can use the same strategy to attack both. In both cases, the intrinsic complexity of the problem is in some sense low, and we can exploit this fact to design efficient algorithms. A wireless ad hoc network is a communication network consisting of wireless devices such as laptops or cell phones. The network does not have any fixed infrastructure, and hence nodes which cannot communicate directly over the wireless medium must use intermediate nodes as relays. This immediately raises the question of how to select the relay nodes. Ideally, one would like to find a path from the source to the destination which is as short as possible. The length of the path found, also called the route, typically depends on how much signaling traffic is generated in order to establish the route. This is the fundamental trade-off that we investigate in this thesis. As mentioned above, we try to exploit the fact that the communication network is intrinsically low-dimensional, or in other words has low complexity. We show that this is indeed the case for a large class of models and that we can design efficient routing algorithms that use this property. Low dimensionality implies that we can embed the network well in a low-dimensional space, or build simple hierarchical decompositions of the network. We use both of these techniques to design routing algorithms. Comparison-based search in image databases is a new problem that can be defined as follows. Given a large database of images, can a human user retrieve an image which he has in mind, or at least an image similar to that image, without going sequentially through all images? More precisely, we ask whether we can search a database of images only by making comparisons between images. As a case in point, we ask whether we can find a query image q only by asking questions of the type "does image q look more like image A or image B"? The analogue of signaling traffic in wireless networks would here be the questions we can ask human users in a learning phase prior to the search. In other words, we would like to ask as few questions as possible to pre-process and prepare the database, while guaranteeing a certain quality of the results obtained in the search phase. As the underlying image space is not necessarily metric, this raises new questions on how to search spaces for which only rank information can be obtained. The rank of A with respect to B is k if A is B's kth nearest neighbor. In this setup, low dimensionality is analogous to the homogeneity of the image space. As we will see, the homogeneity can be captured by properties of the rank relationships. In turn, homogeneous spaces can be well decomposed hierarchically using comparisons. Further, homogeneity allows us to design good hash functions. To design efficient algorithms for these two problems, we can apply the same techniques mutatis mutandis. In both cases, we rely on the intuition that the problem has a low intrinsic complexity, and that we can exploit this fact. Our results come in the form of simulation results and asymptotic bounds.
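The rank definition above can be computed from comparisons alone, without any distances. The following is a minimal sketch under stated assumptions: `closer(ref, x, y)` is a hypothetical oracle that answers "is x more similar to ref than y is?", and the positions in the usage example exist only to simulate such an oracle.

```python
def rank_of(a, b, objects, closer):
    """Rank of `a` with respect to `b`: k if `a` is b's k-th nearest neighbor.

    `closer(ref, x, y)` is an assumed comparison oracle returning True when
    x is more similar to ref than y is. Ties are not handled in this sketch.
    """
    # Count the objects that b considers nearer than a, then add one.
    return 1 + sum(1 for x in objects if x not in (a, b) and closer(b, x, a))

# Usage: simulate the oracle with hidden positions on a line.
pos = {"A": 0.0, "B": 1.0, "C": 3.0, "D": 1.5}

def closer(ref, x, y):
    return abs(pos[x] - pos[ref]) < abs(pos[y] - pos[ref])

rank_a = rank_of("A", "B", pos, closer)  # D is nearer to B than A is
```

Each call uses one comparison per remaining object, so computing a single rank this way costs a number of oracle questions linear in the database size.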

    Efficient Data Analytics on Augmented Similarity Triplets

    Many machine learning methods (classification, clustering, etc.) start with a known kernel that provides a similarity or distance measure between two objects. Recent work has extended this to situations where the information about objects is limited to comparisons of distances between three objects (triplets). Humans find the comparison task much easier than the estimation of absolute similarities, so this kind of data can be easily obtained using crowd-sourcing. In this work, we give an efficient method of augmenting the triplet data by utilizing additional implicit information inferred from the existing data. Triplet augmentation improves the quality of kernel-based and kernel-free data analytics tasks. We also propose a novel set of algorithms for common supervised and unsupervised machine learning tasks based on triplets. These methods work directly with triplets, avoiding kernel evaluations. Experimental evaluation on real and synthetic datasets shows that our methods are more accurate than the current best-known techniques.
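One kind of implicit information that triplets carry is transitivity: if a is closer to b than to c, and closer to c than to d, then a is also closer to b than to d. The sketch below derives such implied triplets. Whether this matches the augmentation method of the paper is an assumption, since the abstract does not spell it out, and the name `augment_triplets` is hypothetical.

```python
from collections import defaultdict

def augment_triplets(triplets):
    """Add triplets implied by transitivity for each anchor.

    A triplet (a, b, c) is read as d(a, b) < d(a, c).
    Assumption: transitivity is the only inference rule used here.
    """
    # For each anchor a, store the edges b -> c meaning "b is nearer to a than c".
    farther = defaultdict(lambda: defaultdict(set))
    for a, b, c in triplets:
        farther[a][b].add(c)

    result = set(triplets)
    for a, graph in farther.items():
        for start in list(graph):
            # Depth-first search collects everything reachable from `start`.
            stack = list(graph[start])
            seen = set()
            while stack:
                node = stack.pop()
                if node in seen:
                    continue
                seen.add(node)
                stack.extend(graph.get(node, ()))
            # Skip c == start, which can only arise from inconsistent input.
            result.update((a, start, c) for c in seen if c != start)
    return result

# Usage: the third triplet (0, 1, 3) is inferred from the two given ones.
augmented = augment_triplets([(0, 1, 2), (0, 2, 3)])
```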

    Insights into Ordinal Embedding Algorithms: A Systematic Evaluation

    The objective of ordinal embedding is to find a Euclidean representation of a set of abstract items, using only answers to triplet comparisons of the form "Is item i closer to item j or item k?". In recent years, numerous algorithms have been proposed to solve this problem. However, there does not exist a fair and thorough assessment of these embedding methods and therefore several key questions remain unanswered: Which algorithms scale better with increasing sample size or dimension? Which ones perform better when the embedding dimension is small or few triplet comparisons are available? In our paper, we address these questions and provide the first comprehensive and systematic empirical evaluation of existing algorithms as well as a new neural network approach. In the large triplet regime, we find that simple, relatively unknown, non-convex methods consistently outperform all other algorithms, including elaborate approaches based on neural networks or landmark approaches. This finding can be explained by our insight that many of the non-convex optimization approaches do not suffer from local optima. In the low triplet regime, our neural network approach is either competitive or significantly outperforms all the other methods. Our comprehensive assessment is enabled by our unified library of popular embedding algorithms that leverages GPU resources and allows for fast and accurate embeddings of millions of data points.
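A natural way to check how well a candidate Euclidean representation agrees with the triplet answers is to count the triplets it violates. The sketch below computes that fraction. The abstract does not say which metric the evaluation uses, so treating the triplet error as the criterion is an assumption, and `triplet_error` is a hypothetical name.

```python
from math import dist

def triplet_error(embedding, triplets):
    """Fraction of triplets that the embedding violates.

    A triplet (i, j, k) is read as "item i is closer to item j than to item k".
    `embedding` maps each item index to a coordinate tuple.
    """
    if not triplets:
        raise ValueError("need at least one triplet")
    violated = sum(
        1
        for i, j, k in triplets
        if dist(embedding[i], embedding[j]) >= dist(embedding[i], embedding[k])
    )
    return violated / len(triplets)

# Usage: three points on a line; the second triplet contradicts the layout.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
error = triplet_error(points, [(0, 1, 2), (2, 0, 1)])
```

An error of zero means the embedding is consistent with every given comparison. It says nothing about comparisons that were never asked.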

    A Revenue Function for Comparison-Based Hierarchical Clustering

    Comparison-based learning addresses the problem of learning when, instead of explicit features or pairwise similarities, one only has access to comparisons of the form: \emph{Object A is more similar to B than to C.} Recently, it has been shown that, in Hierarchical Clustering, single and complete linkage can be directly implemented using only such comparisons while several algorithms have been proposed to emulate the behaviour of average linkage. Hence, finding hierarchies (or dendrograms) using only comparisons is a well understood problem. However, evaluating their meaningfulness when neither ground truth nor explicit similarities are available remains an open question. In this paper, we bridge this gap by proposing a new revenue function that allows one to measure the goodness of dendrograms using only comparisons. We show that this function is closely related to Dasgupta's cost for hierarchical clustering that uses pairwise similarities. On the theoretical side, we use the proposed revenue function to resolve the open problem of whether one can approximately recover a latent hierarchy using few triplet comparisons. On the practical side, we present principled algorithms for comparison-based hierarchical clustering based on the maximisation of the revenue and we empirically compare them with existing methods. Comment: 26 pages, 6 figures, 5 tables. Transactions on Machine Learning Research (2023)