LIPIcs, Volume 251, ITCS 2023, Complete Volume
Machine learning applications in search algorithms for gravitational waves from compact binary mergers
Gravitational waves from compact binary mergers are now routinely observed by Earth-bound detectors. These observations enable exciting new science, as they have opened a new window to the Universe.
However, extracting gravitational-wave signals from the noisy detector data is a challenging problem. The most sensitive search algorithms for compact binary mergers use matched filtering, an algorithm that compares the data with a set of expected template signals. As detectors are upgraded and more sophisticated signal models become available, the number of required templates will increase, which can make some sources computationally prohibitive to search for. The computational cost is of particular concern when low-latency alerts should be issued to maximize the time for electromagnetic follow-up observations. One potential solution to reduce computational requirements that has started to be explored in the last decade is machine learning. However, different proposed deep learning searches target varying parameter spaces and use metrics that are not always comparable to existing literature. Consequently, a clear picture of the capabilities of machine learning searches has been sorely missing.
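The core matched-filtering operation described above can be sketched in a few lines. The toy example below (a hypothetical sinusoidal template in white Gaussian noise; real pipelines whiten the data by the detector noise spectrum and correlate against banks of relativistic waveform templates) recovers an injected signal's time as the peak of the correlation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: a short sinusoidal "template" injected into white Gaussian noise.
n, m = 4096, 256
template = np.sin(2 * np.pi * 0.05 * np.arange(m)) * np.hanning(m)
template /= np.linalg.norm(template)                 # unit-norm template

data = rng.normal(size=n)
inject_at = 1500
data[inject_at:inject_at + m] += 5.0 * np.sqrt(m) * template   # loud injection

# Matched filter: correlate the data against the template at every offset.
# For white noise, this normalized cross-correlation is the SNR time series;
# real searches do the equivalent operation in the frequency domain,
# weighting by the inverse noise power spectral density.
snr = np.correlate(data, template, mode="valid")
peak = int(np.argmax(np.abs(snr)))
print(peak)   # at (or within a sample or two of) inject_at
```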
In this thesis, we closely examine the sensitivity of various deep learning gravitational-wave search algorithms and introduce new methods to detect signals from binary black hole and binary neutron star mergers at previously untested statistical confidence levels. By using the sensitive distance as our core metric, we allow for a direct comparison of our algorithms to state-of-the-art search pipelines. As part of this thesis, we organized a global mock data challenge to create a benchmark for machine learning search algorithms targeting compact binaries. This way, the tools developed in this thesis are made available to the greater community by publishing them as open source software.
Our studies show that, depending on the parameter space, deep learning gravitational-wave search algorithms are already competitive with current production search pipelines. We also find that strategies developed for traditional searches can be effectively adapted to their machine learning counterparts. In regions where matched filtering becomes computationally expensive, however, available deep learning algorithms are also limited in their capability. We find reduced sensitivity to long-duration signals, in contrast to the excellent results for short-duration binary black hole signals.
Analytical validation of innovative magneto-inertial outcomes: a controlled environment study.
Machine learning for the sustainable energy transition: a data-driven perspective along the value chain from manufacturing to energy conversion
According to the IPCC special report Global Warming of 1.5 °C, climate action is not only necessary but more urgent than ever. The world is witnessing rising sea levels, heat waves, flooding, droughts, and desertification, resulting in the loss of lives and damage to livelihoods, especially in countries of the Global South. To mitigate climate change and comply with the Paris Agreement, it is of the utmost importance to reduce greenhouse gas emissions from the most emitting sector, namely the energy sector. To this end, large-scale penetration of renewable energy systems into the energy market is crucial for the energy transition toward a sustainable future, replacing fossil fuels and improving access to energy with socio-economic benefits. With the advent of Industry 4.0, Internet of Things technologies have been increasingly applied to the energy sector, introducing the concepts of the smart grid and, more generally, the Internet of Energy. These paradigms are steering the energy sector towards more efficient, reliable, flexible, resilient, safe, and sustainable solutions with huge potential environmental and social benefits. To realize these concepts, new information technologies are required, and among the most promising possibilities are Artificial Intelligence and Machine Learning, which in many countries have already revolutionized the energy industry. This thesis presents different Machine Learning algorithms and methods for the implementation of new strategies to make renewable energy systems more efficient and reliable. It presents various learning algorithms, highlighting their advantages and limits, and evaluating their application to different tasks in the energy context. In addition, different techniques are presented for the preprocessing and cleaning of time series, nowadays collected by sensor networks mounted on every renewable energy system.
With the possibility to install large numbers of sensors that collect vast amounts of time series, it is vital to detect and remove irrelevant, redundant, or noisy features and alleviate the curse of dimensionality, thus improving the interpretability of predictive models, speeding up their learning process, and enhancing their generalization properties. Therefore, this thesis discusses the importance of dimensionality reduction in sensor networks mounted on renewable energy systems and, to this end, presents two novel unsupervised algorithms. The first approach maps time series into the network domain through visibility graphs and uses a community detection algorithm to identify clusters of similar time series and select representative parameters. This method can group both homogeneous and heterogeneous physical parameters, even when related to different functional areas of a system. The second approach proposes the Combined Predictive Power Score, a feature selection method with a multivariate formulation that explores multiple expanding subsets of variables and identifies the combination of features with the highest predictive power over specified target variables. This method includes a selection algorithm for the optimal combination of variables that converges to the smallest set of predictors with the highest predictive power. Once this combination is identified, the most relevant parameters in a sensor network can be selected to perform dimensionality reduction. Data-driven methods open the possibility to support strategic decision-making, resulting in a reduction of Operation & Maintenance costs, machine faults, repair stops, and spare parts inventory size. Therefore, this thesis presents two approaches in the context of predictive maintenance to improve the lifetime and efficiency of equipment, based on anomaly detection algorithms.
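As an illustration of the forward-selection idea behind methods like the Combined Predictive Power Score, here is a generic greedy sketch; the in-sample R² score, toy data, and stopping tolerance are illustrative assumptions, not the thesis's actual algorithm:

```python
import numpy as np

def r2_score_lin(X, y):
    # In-sample R^2 of an ordinary least-squares fit (illustrative score only;
    # a predictive power score would use out-of-sample evaluation).
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - resid.var() / y.var()

def greedy_forward_selection(X, y, max_features=3, tol=1e-3):
    """Grow the feature set one variable at a time, keeping the addition that
    most improves the score, and stop once the gain becomes negligible."""
    selected, best_score = [], -np.inf
    while len(selected) < max_features:
        gains = {j: r2_score_lin(X[:, selected + [j]], y)
                 for j in range(X.shape[1]) if j not in selected}
        j_best = max(gains, key=gains.get)
        if selected and gains[j_best] - best_score < tol:
            break
        selected.append(j_best)
        best_score = gains[j_best]
    return selected, best_score

# Toy sensor data: the target depends on columns 0 and 2 only.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 6))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + 0.1 * rng.normal(size=500)
sel, score = greedy_forward_selection(X, y)
print(sorted(sel))   # the two informative columns should be recovered
```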
The first approach proposes an anomaly detection model based on Principal Component Analysis that is robust to false alarms, can isolate anomalous conditions, and can anticipate equipment failures. The second approach has at its core a neural architecture, namely a Graph Convolutional Autoencoder, which models the sensor network as a dynamical functional graph by simultaneously considering the information content of individual sensor measurements (graph node features) and the nonlinear correlations existing between all pairs of sensors (graph edges). The proposed neural architecture can capture hidden anomalies even when the turbine continues to deliver the power requested by the grid and can anticipate equipment failures. Since the model is unsupervised and completely data-driven, this approach can be applied to any wind turbine equipped with a SCADA system. When it comes to renewable energies, the unschedulable uncertainty due to their intermittent nature represents an obstacle to the reliability and stability of energy grids, especially when dealing with large-scale integration. Nevertheless, these challenges can be alleviated if the natural sources or the power output of renewable energy systems can be forecasted accurately, allowing power system operators to plan optimal power management strategies to balance the dispatch between intermittent power generations and the load demand. To this end, this thesis proposes a multi-modal spatio-temporal neural network for multi-horizon wind power forecasting. In particular, the model combines high-resolution Numerical Weather Prediction forecast maps with turbine-level SCADA data and explores how meteorological variables on different spatial scales together with the turbines' internal operating conditions impact wind power forecasts. The world is undergoing a third energy transition with the main goal to tackle global climate change through decarbonization of the energy supply and consumption patterns. 
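The first idea, flagging anomalies by their reconstruction error under a PCA model of healthy operation, can be sketched as follows; the subspace dimension, the empirical 99% control limit, and the simulated sensor data are assumptions for illustration only, not the thesis's actual model:

```python
import numpy as np

def fit_pca_detector(X_train, n_components=2, quantile=0.99):
    """Fit PCA on healthy-operation data; anomalies are then flagged by a
    large reconstruction error (squared prediction error, SPE)."""
    mu = X_train.mean(axis=0)
    Xc = X_train - mu
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                       # retained loading directions
    spe_train = ((Xc - Xc @ P @ P.T) ** 2).sum(axis=1)
    threshold = np.quantile(spe_train, quantile)  # simple empirical control limit
    return mu, P, threshold

def spe(X, mu, P):
    Xc = X - mu
    return ((Xc - Xc @ P @ P.T) ** 2).sum(axis=1)

# Simulated sensors: healthy data lives near a 2-D subspace of a 6-D space.
rng = np.random.default_rng(2)
latent = rng.normal(size=(1000, 2))
A = rng.normal(size=(2, 6))
X_train = latent @ A + 0.05 * rng.normal(size=(1000, 6))

mu, P, thr = fit_pca_detector(X_train)
healthy = latent[:5] @ A + 0.05 * rng.normal(size=(5, 6))
faulty = healthy + np.array([0.0, 0.0, 2.0, 0.0, 0.0, 0.0])  # offset fault on one sensor
# Healthy errors sit near the limit's scale; faulted ones lie far above it.
print(spe(healthy, mu, P).mean(), thr, spe(faulty, mu, P).min())
```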
This is not only possible thanks to global cooperation and agreements between parties, advancements in power generation systems, and Internet of Things and Artificial Intelligence technologies, but also necessary to prevent the severe and irreversible consequences of climate change that are threatening life on the planet as we know it. This thesis is intended as a reference for researchers who want to contribute to the sustainable energy transition and are approaching the field of Artificial Intelligence in the context of renewable energy systems.
Analysis and forecasting of asset quality, risk management and financial stability for the Greek banking system
The increase in non-performing loans (NPLs) during the financial crisis of 2008, which subsequently evolved into a fiscal crisis, as well as the risk of a medium-term increase due to the COVID-19 pandemic, has called into question the robustness of many banks and the financial stability of the whole sector. As far as the banking sector is concerned, the management of non-performing loans represents the most significant challenge, as their stock reached unprecedented levels and the deterioration in asset quality was widespread. Addressing the problem of non-performing loans with the assistance of credit risk modeling is important from both a micro- and a macro-prudential perspective, since it would not only improve the financial soundness and the capital adequacy of the banking sector, but also free up funds to be directed to other, more productive sectors of the economy.
This Thesis extends earlier research by employing a short-term monitoring system with the aim of forecasting “failures”, i.e. NPL creation. Such a monitoring system allows the risk of a “failure” to change over time, measuring the likelihood of “failure” given the survival time and a set of explanatory variables. The application of Cox proportional hazards models and survival trees to forecast NPLs can be usefully employed in the Greek corporate sectors.
The research aim of this thesis spans two domains. The first aim is the investigation of the determinants that contribute to NPL formation. Two GAMLSS models are tested: a linear GAMLSS model and a nonlinear semi-parametric GAMLSS model that includes smoothing functions to capture potential nonlinear relationships between the explanatory variables and model the parameters favorably. The explanatory variables of the models consist of credit risk variables, macroeconomic variables, bank-specific variables, and supervisory and market variables, while the response variable is the non-performing loans.
The second aim is to determine whether Cox proportional hazards models and survival tree models can forecast NPLs of loans provided to specific corporate sectors in Greece, using the most granular data set of corporate borrowers available. By evaluating a series of Cox models, a short-term monitoring system has been created with the aim of forecasting “failures”, i.e. NPL creation. The Cox proportional hazards regression models incorporate time-to-event information, with a timeline described by the survival function, indicating the probability that a loan becomes an NPL by time t. The time period runs from the origination of the loan until the “death” of the loan, i.e. its termination, incorporating an “in between” observation point. The event occurs when the loan first becomes “infected”, i.e. becomes an NPL. Regarding survival trees, the data set was divided into multiple subsets, which are easier to model separately and hence yield improved overall performance. Such models can then be combined with different machine learning techniques. Predictors (or covariates) are defined as the sectors of the Greek economy, and the model is fitted both for the whole sample and for the sample of early terminated loans.
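The partial likelihood at the heart of the Cox model can be written down compactly. The sketch below, with a single hypothetical "risky sector" covariate and no censoring, shows the risk-set comparison made at each failure time; the simulated portfolio and grid of coefficient values are illustrative assumptions:

```python
import numpy as np

def cox_partial_log_likelihood(beta, times, events, x):
    """Cox partial log-likelihood for one covariate: at each observed failure
    (NPL) time, the failing loan's risk score exp(beta*x) is compared against
    the scores of every loan still in the risk set."""
    eta = beta * x
    ll = 0.0
    for i in np.where(events == 1)[0]:
        at_risk = times >= times[i]            # loans still performing at t_i
        ll += eta[i] - np.log(np.exp(eta[at_risk]).sum())
    return ll

# Toy portfolio: loans in a hypothetical "risky sector" (x = 1) become NPLs
# at exp(1) times the baseline hazard, i.e. the true beta is 1.
rng = np.random.default_rng(4)
x = rng.integers(0, 2, size=300).astype(float)
times = rng.exponential(scale=np.exp(-1.0 * x))
events = np.ones(300, dtype=int)               # no censoring, for simplicity

lls = {b: cox_partial_log_likelihood(b, times, events, x) for b in (-1.0, 0.0, 1.0)}
print(max(lls, key=lls.get))   # the true effect size should score best
```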
The Thesis is organized as follows: Chapter 1 addresses the role of banks in financial intermediation, the evolution of credit risk, and some issues regarding the Greek banking sector. Chapter 2 constitutes a literature review of research focused on improving the predictive performance of different credit risk assessment methods. Chapter 3 outlines the competitive conditions in the banking sector to examine whether the increase in concentration has affected the competitive conditions in the Greek banking system. Chapter 4 addresses the funding and liquidity conditions in the Greek banking sector. Chapter 5 contains the selection of the aggregate sample and the results and analysis of the GAMLSS models used for determining NPLs. Chapter 6 provides an introduction to the granular database on Large Exposures, which is used for deriving the panel sample of corporate borrowers on which forecasting and prediction models are employed. Chapter 7 contains the application of Cox models and decision trees, the estimation procedure, parameters, model fit, estimation results, and empirical findings. Chapter 8 provides an evaluation of the applicability of the models as well as the implications for further research. Finally, a conclusion summarizes my contribution to the research community and my recommendations to the banking industry.
A Modified EM Algorithm for Shrinkage Estimation in Multivariate Hidden Markov Models
Hidden Markov models are used in a wide range of applications due to their construction that
renders them mathematically tractable and allows for the use of efficient computational techniques. There are methods for the estimation of the model’s parameters, such as the EM algorithm, but also for the estimation of the hidden states of the underlying Markov chain, such as the Viterbi algorithm.
In applications where the dimension of the data is comparable to the sample size, the sample covariance matrix is known to be ill-conditioned, which directly affects the maximisation step (M-step) of the EM algorithm, where its inverse is involved in the computations. This problem can be amplified if there are rarely visited states, resulting in a small sample size for the estimation of the corresponding parameters. Therefore, the direct implementation of these methods can prove troublesome, as many computational problems might occur in the estimation of the covariance matrix and its inverse, further affecting the estimation of the one-step transition probability matrix and the reconstruction of the hidden Markov chain.
In this paper, a modified version of the EM algorithm is studied, both theoretically and computationally, in order to obtain the shrinkage estimator of the covariance matrix during the maximisation step. This is achieved by maximising a penalised log-likelihood function, which is also used in the estimation step (E-step). A variant of this modified version, where the penalised log-likelihood function is only used in the maximisation step (M-step), is also studied computationally.
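A minimal sketch of the kind of shrinkage estimator targeted by the penalised M-step: a generic convex combination of the sample covariance with a scaled-identity target. The fixed shrinkage intensity and simulated data are illustrative assumptions; the paper derives the estimator from the penalised likelihood instead.

```python
import numpy as np

def shrinkage_covariance(X, rho):
    """Convex combination of the sample covariance with a scaled-identity
    target -- the kind of estimator a penalised M-step can be made to yield."""
    S = np.cov(X, rowvar=False)
    target = np.trace(S) / S.shape[0] * np.eye(S.shape[0])
    return (1 - rho) * S + rho * target

# Dimension comparable to the sample size: the sample covariance is
# invertible here but badly conditioned, and shrinkage tames it.
rng = np.random.default_rng(3)
p, n = 50, 60
X = rng.normal(size=(n, p))
S = np.cov(X, rowvar=False)
Sigma = shrinkage_covariance(X, rho=0.3)
print(np.linalg.cond(S), np.linalg.cond(Sigma))   # conditioning improves
```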
Aggregation Strategies for Distributed Gaussian Processes
Gaussian processes are robust and flexible non-parametric statistical models that make use of Bayes' theorem by assigning a Gaussian prior distribution to the unknown function. Despite their capability to provide high-accuracy predictions, they suffer from high computational costs. Various solutions have been proposed in the literature to deal with this computational complexity. The main idea is to reduce the training cost, which is cubic in the size of the training set.
A distributed Gaussian process is a divide-and-conquer approach that divides the entire training data set into several partitions and employs a local approximation scenario to train a Gaussian process at each data partition. An ensemble technique combines the local Gaussian experts to provide final aggregated predictions. Available baselines aggregate local predictions assuming perfect diversity between experts. However, this assumption is often violated in practice and leads to sub-optimal solutions.
This thesis deals with dependency issues between experts. Aggregation based on experts' interactions improves accuracy and can lead to statistically consistent results. Few works have considered modeling dependencies between experts. Despite their theoretical advantages, their prediction steps are costly and cubically depend on the number of experts. We benefit from the experts' interactions in both dependence and independence-based aggregations. In conventional aggregation methods that combine experts using a conditional independence assumption, we transform the available experts set into clusters of highly correlated experts using spectral clustering. The final aggregation uses these clusters instead of the original experts. It reduces the effect of the independence assumption in the ensemble technique. Moreover, we develop a novel aggregation method for dependent experts using the latent variable graphical model and define the target function as a latent variable in a connected undirected graph.
Besides, we propose two novel expert selection strategies in distributed learning. They improve the efficiency and accuracy of the prediction step by excluding weak experts in the ensemble method. The first is a static selection method that assigns a fixed set of experts to all new entry points in the prediction step using the Markov random field model. The second solution increases the flexibility of the selection step by converting it into a multi-label classification problem. It provides an entry-dependent selection model and assigns the most relevant experts to each data point.
We address all related theoretical and practical aspects of the proposed solutions. The findings present valuable insights for distributed learning models and advance the state-of-the-art in several directions. Indeed, the proposed solutions do not need restrictive assumptions and can be easily extended to non-Gaussian experts in distributed and federated learning.
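The independence-based baseline discussed in this abstract, a precision-weighted product-of-experts rule, can be stated in a few lines. This is a generic form of the baseline, not the thesis's dependency-aware aggregation; the expert predictions are made-up numbers:

```python
import numpy as np

def aggregate_experts(means, variances):
    """Precision-weighted product-of-experts aggregation: each local Gaussian
    expert contributes proportionally to its precision (inverse variance).
    This assumes the experts are perfectly diverse (conditionally independent)."""
    precisions = 1.0 / np.asarray(variances, dtype=float)
    var = 1.0 / precisions.sum()
    mean = var * (precisions * np.asarray(means, dtype=float)).sum()
    return mean, var

# Three hypothetical local experts predicting at the same test point.
m, v = aggregate_experts([1.0, 1.2, 0.8], [0.5, 0.25, 1.0])
print(m, v)   # the most confident expert (variance 0.25) dominates
```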
Sparse model-based clustering of three-way data via lasso-type penalties
Mixtures of matrix Gaussian distributions provide a probabilistic framework for clustering continuous matrix-variate data, which are becoming increasingly prevalent in various fields. Despite its widespread adoption and successful application, this approach suffers from over-parameterization issues, making it less suitable even for matrix-variate data of moderate size. To overcome this drawback, we introduce a sparse model-based clustering approach for three-way data. Our approach assumes that the matrix mixture parameters are sparse, with different degrees of sparsity across clusters, allowing parsimony to be induced in a flexible manner. Estimation of the model relies on the maximization of a penalized likelihood with specifically tailored group and graphical lasso penalties. These penalties enable the selection of the most informative features for clustering three-way data where variables are recorded over multiple occasions, and allow cluster-specific association structures to be captured. The proposed methodology is tested extensively on synthetic data, and its validity is demonstrated in application to time-dependent crime patterns in different US cities.
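The sparsity in lasso-type penalized estimation comes from proximal (thresholding) operators. Generic sketches of the two operators behind group and graphical lasso penalties follow; these are standard textbook forms, separate from the paper's actual penalized-EM updates:

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of the l1 (lasso) penalty: shrink every entry towards
    zero and set small entries exactly to zero -- this induces sparsity."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def group_soft_threshold(x, lam):
    """Group-lasso proximal operator: shrink the norm of a whole block of
    parameters, zeroing the entire group when its norm is below the penalty."""
    norm = np.linalg.norm(x)
    return np.zeros_like(x) if norm <= lam else (1.0 - lam / norm) * x

print(soft_threshold(np.array([2.0, -0.3, 0.7]), 0.5))     # middle entry zeroed
print(group_soft_threshold(np.array([3.0, 4.0]), 1.0))     # block shrunk, kept
print(group_soft_threshold(np.array([0.3, 0.4]), 1.0))     # whole block zeroed
```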
Scalable Learning of Bayesian Networks Using Feedback Arc Set-Based Heuristics
Bayesian networks form an important class of probabilistic graphical models. They consist of a structure (a directed acyclic graph) expressing conditional independencies among random variables, as well as parameters (local probability distributions). As such, Bayesian networks are generative models encoding joint probability distributions in a compact form.
The main difficulty in learning a Bayesian network comes from the structure itself, owing to the combinatorial nature of the acyclicity property; it is well known and does not come as a surprise that the structure learning problem is NP-hard in general. Exact algorithms solving this problem exist: dynamic programming and integer linear programming are prime contenders when one seeks to recover the structure of small-to-medium sized Bayesian networks from data. On the other hand, heuristics such as hill climbing variants are commonly used when attempting to approximately learn the structure of larger networks with thousands of variables, although these heuristics typically lack theoretical guarantees and their performance in practice may become unreliable when dealing with large scale learning.
This thesis is concerned with the development of scalable methods tackling the Bayesian network structure learning problem, while attempting to maintain a level of theoretical control. This was achieved via the use of related combinatorial problems, namely the maximum acyclic subgraph problem and its dual, the minimum feedback arc set problem. Although these problems are NP-hard themselves, they exhibit significantly better tractability in practice. This thesis explores ways to map Bayesian network structure learning into maximum acyclic subgraph instances and extract approximate solutions for the former problem, based on the solutions obtained for the latter.
Our research suggests that although increased scalability can be achieved this way, maintaining theoretical understanding based on this approach is much more challenging. Furthermore, we found that learning the structure of Bayesian networks based on maximum acyclic subgraph/minimum feedback arc set may not be the go-to method in general, but we identified a setting - linear structural equation models - in which we could experimentally validate the benefits of this approach, leading to fast and scalable structure recovery with the ability to learn complex structures in a competitive way compared to state-of-the-art baselines.
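As an illustration of the combinatorial core, here is a hypothetical greedy ordering heuristic for the maximum acyclic subgraph; edges pointing backwards in the ordering form a feedback arc set. The score function and the toy weighted digraph are illustrative, and the thesis's actual mapping from structure-learning scores is more involved:

```python
def greedy_acyclic_ordering(n, weights):
    """Greedy heuristic for the maximum acyclic subgraph: repeatedly place the
    vertex whose remaining outgoing weight most exceeds its incoming weight.
    Edges pointing backwards in the final ordering form a feedback arc set."""
    remaining = set(range(n))
    order = []
    while remaining:
        best = max(remaining,
                   key=lambda v: sum(weights.get((v, u), 0.0) for u in remaining)
                              - sum(weights.get((u, v), 0.0) for u in remaining))
        order.append(best)
        remaining.remove(best)
    return order

# Toy digraph with the cycle 0 -> 1 -> 2 -> 0; the edge (2, 0) is the weakest.
w = {(0, 1): 3.0, (1, 2): 2.0, (2, 0): 1.0}
order = greedy_acyclic_ordering(3, w)
pos = {v: i for i, v in enumerate(order)}
fas = [e for e in w if pos[e[0]] > pos[e[1]]]
print(order, fas)   # only the weakest edge needs to be removed
```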
Computational modelling and optimal control of interacting particle systems: connecting dynamic density functional theory and PDE-constrained optimization
Processes that can be described by systems of interacting particles are ubiquitous in nature, society, and industry, ranging from animal flocking, the spread of diseases, and formation of opinions to nano-filtration, brewing, and printing. In real-world applications it is often relevant to not only model a process of interest, but to also optimize it in order to achieve a desired outcome with minimal resources, such as time, money, or energy.
Mathematically, the dynamics of interacting particle systems can be described using Dynamic Density Functional Theory (DDFT). The resulting models are nonlinear, nonlocal partial differential equations (PDEs) that include convolution integral terms. Such terms also enter the naturally arising no-flux boundary conditions. Due to the nonlocal, nonlinear nature of such problems they are challenging both to analyse and solve numerically.
In order to optimize processes that are modelled by PDEs, one can apply tools from PDE-constrained optimization. The aim here is to drive a quantity of interest towards a target state by varying a control variable. This is constrained by a PDE describing the process of interest, in which the control enters as a model parameter. Such problems can be tackled by deriving and solving the (first-order) optimality system, which couples the PDE model with a second PDE and an algebraic equation. Solving such a system numerically is challenging, since large matrices arise in its discretization, for which efficient solution strategies have to be found. Most work in PDE-constrained optimization addresses problems in which the control is applied linearly, and which are constrained by local, often linear PDEs, since introducing nonlinearity significantly increases the complexity in both the analysis and numerical solution of the optimization problem.
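For a simple tracking-type problem with a linear (Poisson) state equation and distributed control, an illustrative stand-in for the DDFT setting rather than the models used in this thesis, the first-order optimality system takes the familiar three-equation form:

```latex
% Illustrative problem:
%   min_{y,u}  1/2 ||y - \hat y||^2_{L^2(\Omega)} + (beta/2) ||u||^2_{L^2(\Omega)}
%   s.t.       -\nabla^2 y = u  in \Omega   (plus boundary conditions)
%
% Its first-order optimality (KKT) system:
\begin{aligned}
  -\nabla^2 y &= u          &&\text{(state equation)}\\
  -\nabla^2 p &= \hat y - y &&\text{(adjoint equation, driven by the misfit)}\\
  \beta u - p &= 0          &&\text{(gradient equation linking control and adjoint)}
\end{aligned}
```

In the nonlocal DDFT setting, the state and adjoint equations additionally carry convolution terms and couple nonlocally in space and time, which is what makes the system above substantially harder there.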
However, in order to optimize real-world processes described by nonlinear, nonlocal DDFT models, one has to develop an optimal control framework for such models. The aim is to drive the particles to some desired distribution by applying the control either linearly, through a particle source, or bilinearly, through an advective field. The optimization process is constrained by the DDFT model that describes how the particles move under the influence of advection, diffusion, external forces, and particle–particle interactions. In order to tackle this, the (first-order) optimality system is derived, which, since it involves nonlinear (integro-)PDEs that are coupled nonlocally in space and time, is significantly harder than in the standard case. Novel numerical methods are developed, effectively combining pseudospectral methods and iterative solvers, to efficiently and accurately solve such a system.
In the next step this framework is extended so that it can capture and optimize industrially relevant processes, such as brewing and nano-filtration. In order to do so, extensions to both the DDFT model and the numerical method are made. Firstly, since industrial processes often involve tubes, funnels, channels, or tanks of various shapes, the PDE model itself, as well as the optimization problem, needs to be solved on complicated domains. This is achieved by developing a novel spectral element approach that is compatible with both the PDE solver and the optimal control framework. Secondly, many industrial processes, such as nano-filtration, involve more than one type of particle. Therefore, the DDFT model is extended to describe multiple particle species. Finally, depending on the application of interest, additional physical effects need to be included in the model. In this thesis, to model sedimentation processes in brewing, the model is modified to capture volume exclusion effects.