Large-Margin Metric Learning for Constrained Partitioning Problems
We consider unsupervised partitioning problems based explicitly or implicitly on the minimization of Euclidean distortions, such as clustering, image or video segmentation, and other change-point detection problems. We emphasize cases with specific structure, which include many practical situations ranging from mean-based change-point detection to image segmentation problems. We aim at learning a Mahalanobis metric for these unsupervised problems, leading to feature weighting and/or selection. This is done in a supervised way by assuming the availability of several (partially) labeled datasets that share the same metric. We cast the metric learning problem as a large-margin structured prediction problem, with proper definition of regularizers and losses, leading to a convex optimization problem which can be solved efficiently. Our experiments show how learning the metric can significantly improve performance on bioinformatics, video or image segmentation problems.
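To make the link between a Mahalanobis metric and feature weighting concrete, here is a minimal sketch in plain Python. It only evaluates the distance sqrt((x - y)^T M (x - y)) for a given matrix M; it does not implement the large-margin learning procedure described in the abstract. The function name and the example matrices are illustrative assumptions, not taken from the paper.

```python
from math import sqrt


def mahalanobis(x, y, M):
    """Return sqrt((x - y)^T M (x - y)) for a symmetric PSD matrix M (list of rows)."""
    d = [a - b for a, b in zip(x, y)]
    # Matrix-vector product M d, written out to stay dependency-free.
    Md = [sum(M[i][j] * d[j] for j in range(len(d))) for i in range(len(d))]
    return sqrt(sum(di * mi for di, mi in zip(d, Md)))


# The identity matrix recovers the ordinary Euclidean distance.
identity = [[1.0, 0.0], [0.0, 1.0]]
# Hypothetical diagonal metric: feature 0 is up-weighted, feature 1 is dropped
# (a zero on the diagonal acts as feature selection).
weights = [[4.0, 0.0], [0.0, 0.0]]

x, y = [0.0, 0.0], [3.0, 4.0]
euclid = mahalanobis(x, y, identity)
weighted = mahalanobis(x, y, weights)
print(euclid, weighted)
```

With a diagonal M the metric reduces to per-feature weights, which is why learning M can double as feature weighting or selection.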
Metric Learning for Temporal Sequence Alignment
In this paper, we propose to learn a Mahalanobis distance to perform
alignment of multivariate time series. The learning examples for this task are
time series for which the true alignment is known. We cast the alignment
problem as a structured prediction task, and propose realistic losses between
alignments for which the optimization is tractable. We provide experiments on
real data in the audio to audio context, where we show that the learning of a
similarity measure leads to improvements in the performance of the alignment
task. We also propose to use this metric learning framework to perform feature
selection and, from basic audio features, build a combination of these with
better performance for the alignment.
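As an illustration of how a learned metric could plug into an alignment, the sketch below runs standard dynamic time warping (DTW) with a pluggable frame-to-frame distance. It is an assumption-laden toy: the abstract does not state which alignment algorithm or loss is used, and the diagonal weights here are fixed by hand rather than learned. All names (`dtw`, `weighted_sq_dist`) are hypothetical.

```python
def dtw(a, b, dist):
    """Align sequences a and b; return (total cost, list of matched index pairs)."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = dist(a[i - 1], b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end, always moving to the cheapest predecessor.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


def weighted_sq_dist(weights):
    """Squared distance under a diagonal metric (hand-picked weights, not learned)."""
    return lambda p, q: sum(w * (u - v) ** 2 for w, u, v in zip(weights, p, q))


# Feature 0 carries the signal; feature 1 is noise, so its weight is set to 0.
a = [(0, 7), (1, 3), (2, 9)]
b = [(0, 1), (0, 5), (1, 8), (2, 2)]
total, path = dtw(a, b, weighted_sq_dist((1.0, 0.0)))
print(total, path)
```

Changing the weights changes which alignment is cheapest, which is the lever a metric learning method would tune against known ground-truth alignments.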
Optimizing class partitioning in multi-class classification using a descriptive control language
Many of the best statistical classification algorithms are binary
classifiers, that is, they can only distinguish between two classes. The
number of possible ways of generalizing binary classification to multi-class
increases exponentially with the number of classes. There is some indication
that the best method of doing so will depend on the dataset. As such, we are
particularly interested in data-driven solution design, whether based on prior
considerations or on empirical examination of the data. Here we demonstrate how
a recursive control language can be used to describe a multitude of different
partitioning strategies in multi-class classification, including those in most
common use. We use it both to manually construct new partitioning
configurations as well as to examine those that have been automatically
designed. Eight different strategies are tested on eight different datasets
using both support vector machines (SVM) as well as logistic regression as the
base binary classifiers. Numerical tests suggest that a one-size-fits-all
solution consisting of one-versus-one is appropriate for most datasets; however,
one dataset benefitted from the techniques applied in this paper. The best
solution exploited a property of the dataset to produce an uncertainty
coefficient 36% higher (0.016 absolute gain) than one-vs.-one. Adaptive
solutions that empirically examined the data also produced gains over
one-vs.-one while also being faster.
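For readers unfamiliar with the one-versus-one baseline mentioned above, here is a minimal sketch: one binary classifier is trained per pair of classes, and the prediction is the class with the most votes. The binary learner is a toy nearest-mean rule on one-dimensional data, standing in for the SVM or logistic regression used in the paper; the function names are hypothetical and this is not the paper's control language.

```python
from collections import Counter
from itertools import combinations


def train_binary(xs, ys, a, b):
    """Toy binary classifier for classes a and b: assign to the nearer class mean."""
    mean_a = sum(x for x, y in zip(xs, ys) if y == a) / ys.count(a)
    mean_b = sum(x for x, y in zip(xs, ys) if y == b) / ys.count(b)
    return lambda x: a if abs(x - mean_a) <= abs(x - mean_b) else b


def train_one_vs_one(xs, ys):
    """Train one binary classifier per unordered pair of classes: k(k-1)/2 in total."""
    classes = sorted(set(ys))
    return [train_binary(xs, ys, a, b) for a, b in combinations(classes, 2)]


def predict_one_vs_one(models, x):
    """Each pairwise classifier casts one vote; the most voted class wins."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


xs = [0.0, 1.0, 4.0, 5.0, 9.0, 10.0]
ys = ["a", "a", "b", "b", "c", "c"]
models = train_one_vs_one(xs, ys)
print(len(models), predict_one_vs_one(models, 4.2), predict_one_vs_one(models, 8.0))
```

Other partitioning strategies (one-versus-rest, hierarchical splits) differ only in which subsets of classes each binary classifier separates and how the binary outputs are combined.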
Deep Transductive Semi-supervised Maximum Margin Clustering
Semi-supervised clustering is a very important topic in machine learning and
computer vision. The key challenge of this problem is how to learn a metric
such that the instances sharing the same label are more likely to be close to each
other in the embedded space. However, little attention has been paid to learning
better representations when the data lie on a non-linear manifold. Fortunately,
deep learning has led to great success on feature learning recently. Inspired
by the advances of deep learning, we propose a deep transductive
semi-supervised maximum margin clustering approach. More specifically, given
pairwise constraints, we exploit both labeled and unlabeled data to learn a
non-linear mapping under a maximum margin framework for clustering analysis.
Thus, our model unifies transductive learning, feature learning and maximum
margin techniques in the semi-supervised clustering framework. We pretrain the
deep network structure with restricted Boltzmann machines (RBMs) layer by layer
greedily, and optimize our objective function with gradient descent. By
checking the most violated constraints, our approach updates the model
parameters through error backpropagation, in which deep features are learned
automatically. The experimental results show that our model is significantly
better than the state of the art on semi-supervised clustering.
Scalable Similarity Learning using Large Margin Neighborhood Embedding
Classifying large-scale image data into object categories is an important
problem that has received increasing research attention. Given the huge amount
of data, non-parametric approaches such as nearest neighbor classifiers have
shown promising results, especially when they are underpinned by a learned
distance or similarity measurement. Although metric learning has been well
studied in the past decades, most existing algorithms cannot practically handle
large-scale data sets. In this paper, we present an image similarity learning
method that can scale well in both the number of images and the dimensionality
of image descriptors. To this end, similarity comparison is restricted to each
sample's local neighbors and a discriminative similarity measure is induced
from large margin neighborhood embedding. We also exploit the ensemble of
projections so that high-dimensional features can be processed in a set of
lower-dimensional subspaces in parallel without much performance compromise.
The similarity function is learned online using a stochastic gradient descent
algorithm in which the triplet sampling strategy is customized for quick
convergence of classification performance. The effectiveness of our proposed
model is validated on several data sets with scales varying from tens of
thousands to one million images. Recognition accuracies competitive with the
state-of-the-art performance are achieved with much higher efficiency and
scalability.
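The triplet-based stochastic gradient step mentioned above can be sketched as follows. This is a generic hinge-loss triplet update on a diagonal (per-feature) weight vector, written under stated assumptions: the abstract does not give the paper's exact similarity function, sampling strategy, or update rule, so the loss form, learning rate, and all names here are hypothetical.

```python
def sq_dist(w, p, q):
    """Weighted squared distance with non-negative per-feature weights w."""
    return sum(wk * (pk - qk) ** 2 for wk, pk, qk in zip(w, p, q))


def triplet_loss(w, anchor, pos, neg, margin=1.0):
    """Hinge loss: the positive should be closer to the anchor than the negative by a margin."""
    return max(0.0, margin + sq_dist(w, anchor, pos) - sq_dist(w, anchor, neg))


def sgd_step(w, anchor, pos, neg, lr=0.1, margin=1.0):
    """One stochastic (sub)gradient step on a single triplet; weights are clipped at zero."""
    if triplet_loss(w, anchor, pos, neg, margin) == 0.0:
        return list(w)  # margin already satisfied, no update
    grad = [(a - p) ** 2 - (a - n) ** 2 for a, p, n in zip(anchor, pos, neg)]
    return [max(0.0, wk - lr * gk) for wk, gk in zip(w, grad)]


w = [1.0, 1.0]
anchor, pos, neg = [0.0, 0.0], [0.0, 2.0], [1.0, 0.0]
loss_before = triplet_loss(w, anchor, pos, neg)
w_new = sgd_step(w, anchor, pos, neg)
loss_after = triplet_loss(w_new, anchor, pos, neg)
print(loss_before, w_new, loss_after)
```

In this toy triplet the update shrinks the weight of the feature that separates the anchor from its positive and grows the weight of the feature that separates it from the negative, so the loss decreases after one step.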
Estimating Maximally Probable Constrained Relations by Mathematical Programming
Estimating a constrained relation is a fundamental problem in machine
learning. Special cases are classification (the problem of estimating a map
from a set of to-be-classified elements to a set of labels), clustering (the
problem of estimating an equivalence relation on a set) and ranking (the
problem of estimating a linear order on a set). We contribute a family of
probability measures on the set of all relations between two finite, non-empty
sets, which offers a joint abstraction of multi-label classification,
correlation clustering and ranking by linear ordering. Estimating (learning) a
maximally probable measure, given (a training set of) related and unrelated
pairs, is a convex optimization problem. Estimating (inferring) a maximally
probable relation, given a measure, is a 0-1 linear program. It is solved in
linear time for maps. It is NP-hard for equivalence relations and linear
orders. Practical solutions for all three cases are shown in experiments with
real data. Finally, estimating a maximally probable measure and relation
jointly is posed as a mixed-integer nonlinear program. This formulation
suggests a mathematical programming approach to semi-supervised learning.
PyTorch-BigGraph: A Large-scale Graph Embedding System
Graph embedding methods produce unsupervised node features from graphs that
can then be used for a variety of machine learning tasks. Modern graphs,
particularly in industrial applications, contain billions of nodes and
trillions of edges, which exceeds the capability of existing embedding systems.
We present PyTorch-BigGraph (PBG), an embedding system that incorporates
several modifications to traditional multi-relation embedding systems that
allow it to scale to graphs with billions of nodes and trillions of edges. PBG
uses graph partitioning to train arbitrarily large embeddings on either a
single machine or in a distributed environment. We demonstrate comparable
performance with existing embedding systems on common benchmarks, while
allowing for scaling to arbitrarily large graphs and parallelization on
multiple machines. We train and evaluate embeddings on several large social
network graphs as well as the full Freebase dataset, which contains over 100
million nodes and 2 billion edges.
ruptures: change point detection in Python
ruptures is a Python library for offline change point detection. This package
provides methods for the analysis and segmentation of non-stationary signals.
Implemented algorithms include exact and approximate detection for various
parametric and non-parametric models. ruptures focuses on ease of use by
providing a well-documented and consistent interface. In addition, thanks to
its modular structure, different algorithms and models can be connected and
extended within this package.
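To show what offline change point detection means in the simplest case, the sketch below finds a single mean shift by exhaustive least-squares search. It is written in plain Python and deliberately does not use or imitate the ruptures API; the function name is hypothetical, and real signals would typically need multiple breakpoints and a penalty or model selection step.

```python
def best_single_change_point(signal):
    """Return the index t (1 <= t < n) that best splits signal into two constant-mean segments.

    The cost of a segment is its sum of squared deviations from its own mean.
    """
    def sse(segment):
        mean = sum(segment) / len(segment)
        return sum((v - mean) ** 2 for v in segment)

    n = len(signal)
    return min(range(1, n), key=lambda t: sse(signal[:t]) + sse(signal[t:]))


# Toy signal: mean near 0 for the first four samples, near 5 afterwards.
signal = [0.1, -0.2, 0.0, 0.2, 5.1, 4.9, 5.0, 5.2]
bkp = best_single_change_point(signal)
print(bkp)
```

The returned index is the first sample of the second segment, so the signal is modeled as `signal[:bkp]` followed by `signal[bkp:]`.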
Socially Constrained Structural Learning for Groups Detection in Crowd
Modern crowd theories agree that collective behavior is the result of the
underlying interactions among small groups of individuals. In this work, we
propose a novel algorithm for detecting social groups in crowds by means of a
Correlation Clustering procedure on people trajectories. The affinity between
crowd members is learned through an online formulation of the Structural SVM
framework and a set of specifically designed features characterizing both their
physical and social identity, inspired by Proxemic theory, Granger causality,
DTW and Heat-maps. To adhere to sociological observations, we introduce a loss
function (G-MITRE) able to deal with the complexity of evaluating group
detection performance. We show our algorithm achieves state-of-the-art results
when relying on both ground truth trajectories and tracklets previously
extracted by available detector/tracker systems.
Scalable Multilabel Prediction via Randomized Methods
Modeling the dependence between outputs is a fundamental challenge in
multilabel classification. In this work we show that a generic regularized
nonlinearity mapping independent predictions to joint predictions is sufficient
to achieve state-of-the-art performance on a variety of benchmark problems.
Crucially, we compute the joint predictions without ever obtaining any
independent predictions, while incorporating low-rank and smoothness
regularization. We achieve this by leveraging randomized algorithms for matrix
decomposition and kernel approximation. Furthermore, our techniques are
applicable to the multiclass setting. We apply our method to a variety of
multiclass and multilabel data sets, obtaining state-of-the-art results.