189 research outputs found
PAC-Bayesian Majority Vote for Late Classifier Fusion
A lot of attention has been devoted to multimedia indexing over the past few
years. In the literature, we often consider two kinds of fusion schemes: The
early fusion and the late fusion. In this paper we focus on late classifier
fusion, where one combines the scores of each modality at the decision level.
To tackle this problem, we investigate a recent and elegant well-founded
quadratic program named MinCq coming from the Machine Learning PAC-Bayes
theory. MinCq looks for the weighted combination, over a set of real-valued
functions seen as voters, leading to the lowest misclassification rate, while
making use of the voters' diversity. We provide evidence that this method is
naturally adapted to late fusion procedure. We propose an extension of MinCq by
adding an order- preserving pairwise loss for ranking, helping to improve Mean
Averaged Precision measure. We confirm the good behavior of the MinCq-based
fusion approaches with experiments on a real image benchmark.Comment: 7 pages, Research repor
A Survey on Metric Learning for Feature Vectors and Structured Data
The need for appropriate ways to measure the distance or similarity between
data is ubiquitous in machine learning, pattern recognition and data mining,
but handcrafting such good metrics for specific problems is generally
difficult. This has led to the emergence of metric learning, which aims at
automatically learning a metric from data and has attracted a lot of interest
in machine learning and related fields for the past ten years. This survey
paper proposes a systematic review of the metric learning literature,
highlighting the pros and cons of each approach. We pay particular attention to
Mahalanobis distance metric learning, a well-studied and successful framework,
but additionally present a wide range of methods that have recently emerged as
powerful alternatives, including nonlinear metric learning, similarity learning
and local metric learning. Recent trends and extensions, such as
semi-supervised metric learning, metric learning for histogram data and the
derivation of generalization guarantees, are also covered. Finally, this survey
addresses metric learning for structured data, in particular edit distance
learning, and attempts to give an overview of the remaining challenges in
metric learning for the years to come.Comment: Technical report, 59 pages. Changes in v2: fixed typos and improved
presentation. Changes in v3: fixed typos. Changes in v4: fixed typos and new
method
Similarity Learning for Provably Accurate Sparse Linear Classification
In recent years, the crucial importance of metrics in machine learning
algorithms has led to an increasing interest for optimizing distance and
similarity functions. Most of the state of the art focus on learning
Mahalanobis distances (requiring to fulfill a constraint of positive
semi-definiteness) for use in a local k-NN algorithm. However, no theoretical
link is established between the learned metrics and their performance in
classification. In this paper, we make use of the formal framework of good
similarities introduced by Balcan et al. to design an algorithm for learning a
non PSD linear similarity optimized in a nonlinear feature space, which is then
used to build a global linear classifier. We show that our approach has uniform
stability and derive a generalization bound on the classification error.
Experiments performed on various datasets confirm the effectiveness of our
approach compared to state-of-the-art methods and provide evidence that (i) it
is fast, (ii) robust to overfitting and (iii) produces very sparse classifiers.Comment: Appears in Proceedings of the 29th International Conference on
Machine Learning (ICML 2012
A New PAC-Bayesian Perspective on Domain Adaptation
We study the issue of PAC-Bayesian domain adaptation: We want to learn, from
a source domain, a majority vote model dedicated to a target one. Our
theoretical contribution brings a new perspective by deriving an upper-bound on
the target risk where the distributions' divergence---expressed as a
ratio---controls the trade-off between a source error measure and the target
voters' disagreement. Our bound suggests that one has to focus on regions where
the source data is informative.From this result, we derive a PAC-Bayesian
generalization bound, and specialize it to linear classifiers. Then, we infer a
learning algorithmand perform experiments on real data.Comment: Published at ICML 201
An Improvement to the Domain Adaptation Bound in a PAC-Bayesian context
This paper provides a theoretical analysis of domain adaptation based on the
PAC-Bayesian theory. We propose an improvement of the previous domain
adaptation bound obtained by Germain et al. in two ways. We first give another
generalization bound tighter and easier to interpret. Moreover, we provide a
new analysis of the constant term appearing in the bound that can be of high
interest for developing new algorithmic solutions.Comment: NIPS 2014 Workshop on Transfer and Multi-task learning: Theory Meets
Practice, Dec 2014, Montr{\'e}al, Canad
Learning Multipicity Tree Automata
International audienceIn this paper, we present a theoretical approach for the problem of learning multiplicity tree automata. These automata allows one to define functions which compute a number for each tree. They can be seen as a strict generalization of stochastic tree automata since they allow to define functions over any field K. A multiplicity automaton admits a support which is a non deterministic automaton. From a grammatical inference point of view, this paper presents a contribution which is original due to the combination of two important aspects. This is the first time, as far as we now, that a learning method focuses on non deterministic tree automata which computes functions over a field. The algorithm proposed in this paper stands in Angluin's exact model where a learner is allowed to use membership and equivalence queries. We show that this algorithm is polynomial in time in function of the size of the representation
Joint Distribution Optimal Transportation for Domain Adaptation
This paper deals with the unsupervised domain adaptation problem, where one
wants to estimate a prediction function in a given target domain without
any labeled sample by exploiting the knowledge available from a source domain
where labels are known. Our work makes the following assumption: there exists a
non-linear transformation between the joint feature/label space distributions
of the two domain and . We propose a solution of
this problem with optimal transport, that allows to recover an estimated target
by optimizing simultaneously the optimal coupling
and . We show that our method corresponds to the minimization of a bound on
the target error, and provide an efficient algorithmic solution, for which
convergence is proved. The versatility of our approach, both in terms of class
of hypothesis or loss functions is demonstrated with real world classification
and regression problems, for which we reach or surpass state-of-the-art
results.Comment: Accepted for publication at NIPS 201
PAC-Bayes and Domain Adaptation
We provide two main contributions in PAC-Bayesian theory for domain
adaptation where the objective is to learn, from a source distribution, a
well-performing majority vote on a different, but related, target distribution.
Firstly, we propose an improvement of the previous approach we proposed in
Germain et al. (2013), which relies on a novel distribution pseudodistance
based on a disagreement averaging, allowing us to derive a new tighter domain
adaptation bound for the target risk. While this bound stands in the spirit of
common domain adaptation works, we derive a second bound (introduced in Germain
et al., 2016) that brings a new perspective on domain adaptation by deriving an
upper bound on the target risk where the distributions' divergence-expressed as
a ratio-controls the trade-off between a source error measure and the target
voters' disagreement. We discuss and compare both results, from which we obtain
PAC-Bayesian generalization bounds. Furthermore, from the PAC-Bayesian
specialization to linear classifiers, we infer two learning algorithms, and we
evaluate them on real data.Comment: Neurocomputing, Elsevier, 2019. arXiv admin note: substantial text
overlap with arXiv:1503.0694
PAC-Bayesian Learning and Domain Adaptation
In machine learning, Domain Adaptation (DA) arises when the distribution gen-
erating the test (target) data differs from the one generating the learning
(source) data. It is well known that DA is an hard task even under strong
assumptions, among which the covariate-shift where the source and target
distributions diverge only in their marginals, i.e. they have the same labeling
function. Another popular approach is to consider an hypothesis class that
moves closer the two distributions while implying a low-error for both tasks.
This is a VC-dim approach that restricts the complexity of an hypothesis class
in order to get good generalization. Instead, we propose a PAC-Bayesian
approach that seeks for suitable weights to be given to each hypothesis in
order to build a majority vote. We prove a new DA bound in the PAC-Bayesian
context. This leads us to design the first DA-PAC-Bayesian algorithm based on
the minimization of the proposed bound. Doing so, we seek for a \rho-weighted
majority vote that takes into account a trade-off between three quantities. The
first two quantities being, as usual in the PAC-Bayesian approach, (a) the
complexity of the majority vote (measured by a Kullback-Leibler divergence) and
(b) its empirical risk (measured by the \rho-average errors on the source
sample). The third quantity is (c) the capacity of the majority vote to
distinguish some structural difference between the source and target samples.Comment: https://sites.google.com/site/multitradeoffs2012
- …