14 research outputs found

    metric-learn: Metric Learning Algorithms in Python

    Get PDF
    GitHub repository: https://github.com/scikit-learn-contrib/metric-learnmetric-learn is an open source Python package implementing supervised and weakly-supervised distance metric learning algorithms. As part of scikit-learn-contrib, it provides a unified interface compatible with scikit-learn which allows to easily perform cross-validation, model selection, and pipelining with other machine learning estimators. metric-learn is thoroughly tested and available on PyPi under the MIT licence

    Similarity Learning for High-Dimensional Sparse Data

    Get PDF
    A good measure of similarity between data points is crucial to many tasks in machine learning. Similarity and metric learning methods learn such measures automatically from data, but they do not scale well respect to the dimensionality of the data. In this paper, we propose a method that can learn efficiently similarity measure from high-dimensional sparse data. The core idea is to parameterize the similarity measure as a convex combination of rank-one matrices with specific sparsity structures. The parameters are then optimized with an approximate Frank-Wolfe procedure to maximally satisfy relative similarity constraints on the training data. Our algorithm greedily incorporates one pair of features at a time into the similarity measure, providing an efficient way to control the number of active features and thus reduce overfitting. It enjoys very appealing convergence guarantees and its time and memory complexity depends on the sparsity of the data instead of the dimension of the feature space. Our experiments on real-world high-dimensional datasets demonstrate its potential for classification, dimensionality reduction and data exploration.Comment: 14 pages. Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS 2015). Matlab code: https://github.com/bellet/HDS

    metric-learn: Metric Learning Algorithms in Python

    Get PDF
    International audiencemetric-learn is an open source Python package implementing supervised and weaklysupervised distance metric learning algorithms. As part of scikit-learn-contrib, it provides a unified interface compatible with scikit-learn which allows to easily perform cross-validation, model selection, and pipelining with other machine learning estimators. metric-learn is thoroughly tested and available on PyPi under the MIT license

    Accurate and Fast Retrieval for Complex Non-metric Data via Neighborhood Graphs

    Full text link
    We demonstrate that a graph-based search algorithm-relying on the construction of an approximate neighborhood graph-can directly work with challenging non-metric and/or non-symmetric distances without resorting to metric-space mapping and/or distance symmetrization, which, in turn, lead to substantial performance degradation. Although the straightforward metrization and symmetrization is usually ineffective, we find that constructing an index using a modified, e.g., symmetrized, distance can improve performance. This observation paves a way to a new line of research of designing index-specific graph-construction distance functions

    Sensor Technologies for Intelligent Transportation Systems

    Get PDF
    Modern society faces serious problems with transportation systems, including but not limited to traffic congestion, safety, and pollution. Information communication technologies have gained increasing attention and importance in modern transportation systems. Automotive manufacturers are developing in-vehicle sensors and their applications in different areas including safety, traffic management, and infotainment. Government institutions are implementing roadside infrastructures such as cameras and sensors to collect data about environmental and traffic conditions. By seamlessly integrating vehicles and sensing devices, their sensing and communication capabilities can be leveraged to achieve smart and intelligent transportation systems. We discuss how sensor technology can be integrated with the transportation infrastructure to achieve a sustainable Intelligent Transportation System (ITS) and how safety, traffic control and infotainment applications can benefit from multiple sensors deployed in different elements of an ITS. Finally, we discuss some of the challenges that need to be addressed to enable a fully operational and cooperative ITS environment

    Escaping the Curse of Dimensionality in Similarity Learning: Efficient Frank-Wolfe Algorithm and Generalization Bounds

    Get PDF
    International audienceSimilarity and metric learning provides a principled approach to construct a task-specific similarity from weakly supervised data. However, these methods are subject to the curse of dimensionality: as the number of features grows large, poor generalization is to be expected and training becomes intractable due to high computational and memory costs. In this paper, we propose a similarity learning method that can efficiently deal with high-dimensional sparse data. This is achieved through a parameterization of similarity functions by convex combinations of sparse rank-one matrices, together with the use of a greedy approximate Frank-Wolfe algorithm which provides an efficient way to control the number of active features. We show that the convergence rate of the algorithm, as well as its time and memory complexity, are independent of the data dimension. We further provide a theoretical justification of our modeling choices through an analysis of the generalization error, which depends logarithmically on the sparsity of the solution rather than on the number of features. Our experiments on datasets with up to one million features demonstrate the ability of our approach to generalize well despite the high dimensionality as well as its superiority compared to several competing methods
    corecore