Stationary Mixing Bandits
We study the bandit problem where arms are associated with stationary
phi-mixing processes and where rewards are therefore dependent: the question
that arises from this setting is that of recovering some independence by
ignoring the value of some rewards. As we shall see, the bandit problem we
tackle requires us to address the exploration/exploitation/independence
trade-off. To do so, we provide a UCB strategy together with a general regret
analysis for the case where the size of the independence blocks (the ignored
rewards) is fixed and we go a step beyond by providing an algorithm that is
able to compute the size of the independence blocks from the data. Finally, we
give an analysis of our bandit problem in the restless case, i.e., in the
situation where the time counters for all mixing processes simultaneously
evolve.
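The fixed-block-size idea in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the index is the textbook UCB1 bonus, and the names `blocked_ucb`, `sticky_arm`, and `block_size` are hypothetical. After each retained reward of an arm, the next `block_size` rewards of that arm are still collected but left out of the mean estimate, so the retained samples are further apart in time and closer to independent.

```python
import math
import random


def sticky_arm(mean, stay):
    """Hypothetical dependent arm: a two-state Markov chain that repeats its
    previous reward with probability `stay`, else redraws a Bernoulli(mean)."""
    state = [None]

    def draw(rng):
        if state[0] is None or rng.random() >= stay:
            state[0] = 1.0 if rng.random() < mean else 0.0
        return state[0]

    return draw


def blocked_ucb(arms, horizon, block_size, seed=0):
    """UCB1-style play that ignores `block_size` rewards after each kept one."""
    rng = random.Random(seed)
    k = len(arms)
    kept_sum = [0.0] * k    # sum of retained rewards per arm
    kept_count = [0] * k    # number of retained rewards per arm
    skip_left = [0] * k     # rewards still to ignore for each arm
    pulls = [0] * k         # total number of plays per arm
    for t in range(1, horizon + 1):
        untried = [i for i in range(k) if kept_count[i] == 0]
        if untried:
            arm = untried[0]
        else:
            # Standard UCB1 index computed on the retained samples only.
            arm = max(
                range(k),
                key=lambda i: kept_sum[i] / kept_count[i]
                + math.sqrt(2.0 * math.log(t) / kept_count[i]),
            )
        reward = arms[arm](rng)
        pulls[arm] += 1
        if skip_left[arm] > 0:
            skip_left[arm] -= 1          # reward collected but not used
        else:
            kept_sum[arm] += reward      # reward kept for the estimate
            kept_count[arm] += 1
            skip_left[arm] = block_size
    return pulls, kept_count
```

A call such as `blocked_ucb([sticky_arm(0.3, 0.8), sticky_arm(0.7, 0.8)], 300, 2)` returns the number of plays and the number of retained rewards per arm. A larger `block_size` reduces dependence between retained samples but discards more data, which is the trade-off the abstract describes.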
From Cutting Planes Algorithms to Compression Schemes and Active Learning
Cutting-plane methods are well-studied localization (and optimization)
algorithms. We show that they provide a natural framework to perform
machine learning itself, and not just to solve the optimization problems
that machine learning poses. In
particular, they allow one to learn sparse classifiers and provide good
compression schemes. Moreover, we show that very little effort is required to
turn them into effective active learning methods. This last property provides a
generic way to design a whole family of active learning algorithms from existing
passive methods. We present numerical simulations testifying to the relevance
of cutting-plane methods for passive and active learning tasks.
Comment: IJCNN 2015, Jul 2015, Killarney, Ireland. 2015, <http://www.ijcnn.org/>
Confusion Matrix Stability Bounds for Multiclass Classification
In this paper, we provide new theoretical results on the generalization
properties of learning algorithms for multiclass classification problems. The
originality of our work is that we propose to use the confusion matrix of a
classifier as a measure of its quality; our contribution is in the line of work
which attempts to set up and study the statistical properties of new evaluation
measures such as ROC curves. In the confusion-based learning framework we
propose, we claim that a targeted objective is to minimize the size of the
confusion matrix C, measured through its operator norm ||C||. We derive
generalization bounds on the (size of the) confusion matrix in an extended
framework of uniform stability, adapted to the case of matrix valued loss.
Pivotal to our study is a very recent matrix concentration inequality that
generalizes McDiarmid's inequality. As an illustration of the relevance of our
theoretical results, we show how two SVM learning procedures can be proved to
be confusion-friendly. To the best of our knowledge, the present paper is the
first that focuses on the confusion matrix from a theoretical point of view.
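As a rough illustration of the quantity ||C|| mentioned in this abstract, the sketch below builds an empirical confusion matrix and estimates its operator (spectral) norm by power iteration. Two choices are assumptions made for illustration rather than details taken from the paper: rows are normalized by the true-class counts, and the diagonal is zeroed so that only misclassifications contribute. The function names are hypothetical.

```python
import math


def confusion_matrix(y_true, y_pred, n_classes):
    """Row-normalized confusion matrix with a zero diagonal (assumption):
    entry (q, l) is the fraction of class-q examples predicted as class l != q."""
    counts = [[0.0] * n_classes for _ in range(n_classes)]
    totals = [0] * n_classes
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t != p:
            counts[t][p] += 1.0
    return [
        [counts[q][l] / totals[q] if totals[q] else 0.0 for l in range(n_classes)]
        for q in range(n_classes)
    ]


def operator_norm(matrix, iters=500):
    """Largest singular value of `matrix`, via power iteration on M^T M."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0] * n_cols
    for _ in range(iters):
        mv = [sum(matrix[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
        w = [sum(matrix[i][j] * mv[i] for i in range(n_rows)) for j in range(n_cols)]
        length = math.sqrt(sum(x * x for x in w))
        if length == 0.0:
            return 0.0  # zero matrix: no confusion at all
        v = [x / length for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(n_cols)) for i in range(n_rows)]
    return math.sqrt(sum(x * x for x in mv))
```

Under these assumptions a perfect classifier has norm 0, and the norm grows as errors concentrate in particular class pairs.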
Decoy Bandits Dueling on a Poset
We address the problem of dueling bandits defined on partially ordered sets,
or posets. In this setting, arms may not be comparable, and there may be
several (incomparable) optimal arms. We propose an algorithm, UnchainedBandits,
that efficiently finds the set of optimal arms of any poset even when pairs of
comparable arms cannot be distinguished from pairs of incomparable arms, with a
set of minimal assumptions. This algorithm relies on the concept of decoys,
which stems from social psychology. For the easier case where the
incomparability information may be accessible, we propose a second algorithm,
SlicingBandits, which takes advantage of this information and achieves a very
significant gain of performance compared to UnchainedBandits. We provide
theoretical guarantees and experimental evaluation for both algorithms.
Unconfused Ultraconservative Multiclass Algorithms
We tackle the problem of learning linear classifiers from noisy datasets in a
multiclass setting. The two-class version of this problem was studied a few
years ago by, e.g., Bylander (1994) and Blum et al. (1996): in these
contributions, the proposed approaches to fight the noise revolve around a
Perceptron learning scheme fed with peculiar examples computed through a
weighted average of points from the noisy training set. We propose to build
upon these approaches and we introduce a new algorithm called UMA (for
Unconfused Multiclass additive Algorithm) which may be seen as a generalization
to the multiclass setting of the previous approaches. In order to characterize
the noise we use the confusion matrix as a multiclass extension of the
classification noise studied in the aforementioned literature. Theoretically
well-founded, UMA furthermore displays very good empirical noise robustness, as
evidenced by numerical simulations conducted on both synthetic and real data.
Keywords: Multiclass classification, Perceptron, Noisy labels, Confusion Matrix
Comment: ACML, Australia (2013
Chromatic PAC-Bayes Bounds for Non-IID Data: Applications to Ranking and Stationary β-Mixing Processes
Pac-Bayes bounds are among the most accurate generalization bounds for
classifiers learned from independently and identically distributed (IID) data,
and it is particularly so for margin classifiers: there have been recent
contributions showing how practical these bounds can be either to perform model
selection (Ambroladze et al., 2007) or even to directly guide the learning of
linear classifiers (Germain et al., 2009). However, there are many practical
situations where the training data show some dependencies and where the
traditional IID assumption does not hold. Stating generalization bounds for
such frameworks is therefore of the utmost interest, both from theoretical and
practical standpoints. In this work, we propose the first - to the best of our
knowledge - Pac-Bayes generalization bounds for classifiers trained on data
exhibiting interdependencies. The approach undertaken to establish our results
is based on the decomposition of a so-called dependency graph that encodes the
dependencies within the data, in sets of independent data, thanks to graph
fractional covers. Our bounds are very general, since being able to find an
upper bound on the fractional chromatic number of the dependency graph is
sufficient to get new Pac-Bayes bounds for specific settings. We show how our
results can be used to derive bounds for ranking statistics (such as AUC) and
classifiers trained on data distributed according to a stationary β-mixing
process. Along the way, we show how our approach seamlessly allows us to deal with
U-processes. As a side note, we also provide a Pac-Bayes generalization bound
for classifiers learned on data from stationary φ-mixing distributions.
Comment: Long version of the AISTATS 09 paper:
http://jmlr.csail.mit.edu/proceedings/papers/v5/ralaivola09a/ralaivola09a.pd
Confusion-Based Online Learning and a Passive-Aggressive Scheme
This paper provides the first (to the best of our knowledge) analysis of online learning algorithms for multiclass problems when the confusion matrix is taken as a performance measure. The work builds upon recent and elegant results on noncommutative concentration inequalities, i.e., concentration inequalities that apply to matrices and, more precisely, to matrix martingales. We establish generalization bounds for online learning algorithms and show how the theoretical study motivates a new confusion-friendly learning procedure. This learning algorithm, called COPA (for COnfusion Passive-Aggressive), is a passive-aggressive learning algorithm; it is shown that the update equations for COPA can be computed analytically, so there is no need to resort to any optimization package to implement it.
Recovery and convergence rate of the Frank-Wolfe Algorithm for the m-EXACT-SPARSE Problem
We study the properties of the Frank-Wolfe algorithm to solve the
m-EXACT-SPARSE reconstruction problem, where a signal y must be expressed as a
sparse linear combination of a predefined set of atoms, called the dictionary. We
prove that when the signal is sparse enough with respect to the coherence of
the dictionary, then the iterative process implemented by the Frank-Wolfe
algorithm only recruits atoms from the support of the signal, that is the
smallest set of atoms from the dictionary that allows for a perfect
reconstruction of y. We also prove that under this same condition, there exists
an iteration beyond which the algorithm converges exponentially.
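The iterative process this abstract refers to can be sketched as follows, under stated assumptions. The sketch minimizes 0.5 * ||y - Dx||^2 over the l1 ball of radius `beta` with an exact line search. The constraint set, the step-size rule, and the names `frank_wolfe_sparse`, `atoms`, and `beta` are illustrative choices that may differ from the paper's formulation. At each step, the atom most correlated with the current residual is recruited.

```python
def frank_wolfe_sparse(atoms, y, beta, iters=200, tol=1e-12):
    """Frank-Wolfe on min 0.5*||y - D x||^2 subject to ||x||_1 <= beta.

    `atoms` is a list of dictionary atoms (the columns of D), each of len(y).
    Returns the coefficient vector and the list of atom indices recruited.
    """
    n, m = len(y), len(atoms)
    x = [0.0] * m
    selected = []
    for _ in range(iters):
        approx = [sum(x[j] * atoms[j][i] for j in range(m)) for i in range(n)]
        r = [y[i] - approx[i] for i in range(n)]           # current residual
        if sum(v * v for v in r) < tol:
            break
        # Correlations D^T r, i.e. the negative gradient of the objective.
        corr = [sum(atoms[j][i] * r[i] for i in range(n)) for j in range(m)]
        j_star = max(range(m), key=lambda j: abs(corr[j]))
        if corr[j_star] == 0.0:
            break
        selected.append(j_star)
        sign = 1.0 if corr[j_star] > 0 else -1.0
        # Direction d = s - x, where s is the l1-ball vertex sign*beta*e_{j*}.
        d = [-xj for xj in x]
        d[j_star] += sign * beta
        dd = [sum(d[j] * atoms[j][i] for j in range(m)) for i in range(n)]
        denom = sum(v * v for v in dd)
        if denom == 0.0:
            break
        # Exact line search for the quadratic objective, clipped to [0, 1].
        gamma = max(0.0, min(1.0, sum(r[i] * dd[i] for i in range(n)) / denom))
        x = [x[j] + gamma * d[j] for j in range(m)]
    return x, selected
```

With an orthonormal dictionary (coherence zero) and a signal supported on two atoms, only those two indices appear in `selected`. This is a toy instance of the support-recovery behaviour the abstract describes, not a verification of the paper's theorem.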
The pharmacophore kernel for virtual screening with support vector machines
We introduce a family of positive definite kernels specifically optimized for
the manipulation of 3D structures of molecules with kernel methods. The kernels
are based on the comparison of the three-point pharmacophores present in the
3D structures of molecules, a set of molecular features known to be
particularly relevant for virtual screening applications. We present a
computationally demanding exact implementation of these kernels, as well as
fast approximations related to the classical fingerprint-based approaches.
Experimental results suggest that this new approach outperforms
state-of-the-art algorithms based on the 2D structure of molecules for the
detection of inhibitors of several drug targets.