Reliable ABC model choice via random forests
Approximate Bayesian computation (ABC) methods provide an elaborate approach
to Bayesian inference on complex models, including model choice. Both
theoretical arguments and simulation experiments indicate, however, that model
posterior probabilities may be poorly evaluated by standard ABC techniques. We
propose a novel approach based on a machine learning tool named random forests
to conduct selection among the highly complex models covered by ABC algorithms.
We thus modify the way Bayesian model selection is both understood and
operated, in that we rephrase the inferential goal as a classification problem,
first predicting the model that best fits the data with random forests and
postponing the approximation of the posterior probability of the predicted MAP
model to a second stage that also relies on random forests. Compared with earlier
implementations of ABC model choice, the ABC random forest approach offers
several potential improvements: (i) it often has a larger discriminative power
among the competing models, (ii) it is more robust against the number and
choice of statistics summarizing the data, (iii) the computing effort is
drastically reduced (with a gain in computational efficiency of at least a factor of fifty),
and (iv) it includes an approximation of the posterior probability of the
selected model. Relying on random forests should extend the range of dataset
sizes and model complexity that ABC can handle. We illustrate
the power of this novel methodology by analyzing controlled experiments as well
as genuine population genetics datasets. The proposed methodologies are
implemented in the R package abcrf, available on CRAN.
Comment: 39 pages, 15 figures, 6 tables
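The first stage described above treats model choice as classification. The next step builds an ABC reference table by drawing a model index from its prior, simulating a dataset, and reducing it to summary statistics. A random forest is then trained on that table, and the observed summaries are passed through the forest. What follows is a minimal, self-contained Python sketch of that idea. It is not the abcrf API, and it assumes a hypothetical toy problem: Normal versus Laplace data, with log-variance and sample kurtosis as the summaries. The forest is a deliberately small from-scratch one, using bootstrap samples, one random feature per split, and Gini impurity. The vote shares it returns are not posterior probabilities. The abstract's second stage, which approximates the posterior probability of the selected model with another forest, is not sketched here.

```python
import math
import random
from collections import Counter

# Hypothetical toy problem (not from the paper): model 0 = Normal(0, s), model 1 = Laplace(0, s),
# each with a Uniform(0.5, 2) prior on the scale s.
def simulate(model, n, rng):
    s = rng.uniform(0.5, 2.0)
    if model == 0:
        return [rng.gauss(0.0, s) for _ in range(n)]
    # A Laplace(0, s) draw is the difference of two independent Exponential draws with mean s
    return [rng.expovariate(1 / s) - rng.expovariate(1 / s) for _ in range(n)]

def summaries(x):
    """Summary statistics: log sample variance and sample kurtosis."""
    n = len(x)
    m = sum(x) / n
    m2 = sum((v - m) ** 2 for v in x) / n
    m4 = sum((v - m) ** 4 for v in x) / n
    return [math.log(m2), m4 / m2 ** 2]

def reference_table(n_sims, n_obs, rng):
    """ABC reference table: model index from a uniform prior, simulate, summarize."""
    X, y = [], []
    for _ in range(n_sims):
        m = rng.randrange(2)
        X.append(summaries(simulate(m, n_obs, rng)))
        y.append(m)
    return X, y

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(X, y, rng, max_depth, n_feat):
    """CART-style tree; leaves are model labels, internal nodes are (feature, threshold, left, right)."""
    if max_depth == 0 or len(set(y)) == 1:
        return Counter(y).most_common(1)[0][0]
    best = None
    for f in rng.sample(range(len(X[0])), n_feat):  # random feature subset at each split
        for thr in rng.sample([row[f] for row in X], min(10, len(X))):  # a few candidate cuts
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            if left and right:
                score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
                if best is None or score < best[0]:
                    best = (score, f, thr)
    if best is None:
        return Counter(y).most_common(1)[0][0]
    _, f, thr = best
    li = [i for i, row in enumerate(X) if row[f] <= thr]
    ri = [i for i, row in enumerate(X) if row[f] > thr]
    return (f, thr,
            build_tree([X[i] for i in li], [y[i] for i in li], rng, max_depth - 1, n_feat),
            build_tree([X[i] for i in ri], [y[i] for i in ri], rng, max_depth - 1, n_feat))

def predict_tree(node, row):
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if row[f] <= thr else right
    return node

def fit_forest(X, y, n_trees, rng, max_depth=6, n_feat=1):
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng, max_depth, n_feat))
    return forest

def vote(forest, row):
    """Counter of tree votes per model; the most common entry is the predicted model."""
    return Counter(predict_tree(t, row) for t in forest)

if __name__ == "__main__":
    rng = random.Random(1)
    X, y = reference_table(600, 200, rng)
    forest = fit_forest(X, y, n_trees=25, rng=rng)
    observed = [rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(200)]
    votes = vote(forest, summaries(observed))
    print("predicted model:", votes.most_common(1)[0][0], "votes:", dict(votes))
```

The forest only ever sees summary statistics, which is why the choice and number of summaries matter. Adding further, possibly uninformative, summaries to `summaries` is the kind of change the abstract says random forests are comparatively robust to.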
Recommending with an Agenda: Active Learning of Private Attributes using Matrix Factorization
Recommender systems leverage user demographic information, such as age,
gender, etc., to personalize recommendations and better place their targeted
ads. Often, users do not volunteer this information because of privacy
concerns or a lack of initiative in filling out their online profiles.
We illustrate a new threat in which a recommender learns private attributes of
users who do not voluntarily disclose them. We design both passive and active
attacks that solicit ratings for strategically selected items and could thus
be used by a recommender system to pursue this hidden agenda. Our methods are
based on a novel usage of Bayesian matrix factorization in an active learning
setting. Evaluations on multiple datasets illustrate that such attacks are
indeed feasible and use significantly fewer rated items than static inference
methods. Importantly, they succeed without sacrificing the quality of
recommendations to users.
Comment: This is the extended version of a paper that appeared in ACM RecSys
201
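To make the threat concrete, here is a minimal Python sketch of an active attack of this general kind. It is not the authors' algorithm and rests on several assumptions. Item latent factors are taken as already known from a matrix factorization. A user's latent vector has a Gaussian prior whose mean depends on a hypothetical binary private attribute. Each rating is the inner product of user and item factors plus Gaussian noise. For each attribute value the sketch keeps an exact Gaussian posterior over the user vector and accumulates the log evidence of the ratings seen so far. `choose_item` is a hypothetical selection heuristic that asks for the unrated item whose predicted ratings differ most between the two attribute values.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(S, v):
    return [dot(row, v) for row in S]

class ClassPosterior:
    """Gaussian posterior over one user's latent vector u, assuming attribute class c.

    Assumed model: u ~ N(mean_c, prior_var * I) and rating r_j = u . v_j + N(0, noise_var).
    """

    def __init__(self, prior_mean, prior_var, noise_var):
        d = len(prior_mean)
        self.m = list(prior_mean)
        self.S = [[prior_var if i == j else 0.0 for j in range(d)] for i in range(d)]
        self.noise = noise_var
        self.log_evidence = 0.0  # log p(ratings so far | c)

    def predictive(self, v):
        """Mean and variance of the next rating for an item with factors v."""
        return dot(v, self.m), dot(v, matvec(self.S, v)) + self.noise

    def update(self, v, r):
        mean, var = self.predictive(v)
        # Chain rule: the evidence is the product of one-step predictive densities
        self.log_evidence -= 0.5 * (math.log(2 * math.pi * var) + (r - mean) ** 2 / var)
        Sv = matvec(self.S, v)
        gain = [x / var for x in Sv]
        d = len(v)
        # Rank-one (Kalman-style) update of the Gaussian posterior
        self.m = [mi + g * (r - mean) for mi, g in zip(self.m, gain)]
        self.S = [[self.S[i][j] - gain[i] * Sv[j] for j in range(d)] for i in range(d)]

def attribute_posterior(posts):
    """p(c | ratings) under a uniform prior over classes, via a stable softmax of log evidences."""
    top = max(p.log_evidence for p in posts)
    w = [math.exp(p.log_evidence - top) for p in posts]
    z = sum(w)
    return [x / z for x in w]

def choose_item(posts, items, rated):
    """Hypothetical active heuristic: the unrated item whose predicted ratings
    differ most between the two classes, relative to their predictive spread."""
    best, best_score = None, -1.0
    for j, v in enumerate(items):
        if j in rated:
            continue
        (m0, s0), (m1, s1) = posts[0].predictive(v), posts[1].predictive(v)
        score = (m0 - m1) ** 2 / (s0 + s1)
        if score > best_score:
            best, best_score = j, score
    return best

def infer_attribute(rate, items, class_means, prior_var, noise_var, budget):
    """Solicit `budget` ratings through `rate(item_index)` and return p(c | ratings)."""
    posts = [ClassPosterior(mu, prior_var, noise_var) for mu in class_means]
    rated = set()
    for _ in range(budget):
        j = choose_item(posts, items, rated)
        rated.add(j)
        r = rate(j)
        for p in posts:
            p.update(items[j], r)
    return attribute_posterior(posts)
```

A passive baseline would replace `choose_item` with a random unrated item. The active variant spends its rating budget on the items that best separate the two hypotheses. Those items can still be recommendation-worthy, which is how such an attack can stay unobtrusive.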
On the consistency of Multithreshold Entropy Linear Classifier
Multithreshold Entropy Linear Classifier (MELC) is a recently proposed classifier
that employs an information-theoretic concept to create a multithreshold
maximum-margin model. In this paper we analyze its consistency over
multithreshold linear models and show that its objective function upper-bounds
the number of misclassified points, much as the hinge loss does in
support vector machines. For further confirmation, we also conduct
numerical experiments on five datasets.
Comment: Presented at Theoretical Foundations of Machine Learning 2015
(http://tfml.gmum.net), final version published in Schedae Informaticae
Journal
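The abstract draws an analogy with the hinge loss in support vector machines. The Python sketch below illustrates only that analogy and the model class. It does not reproduce MELC's entropy-based objective. It assumes that a multithreshold linear model projects a point onto a direction `w` and that sorted thresholds cut the resulting line into intervals with alternating labels. `multithreshold_predict`, `misclassified` and `total_hinge` are hypothetical helper names. With a single threshold at `-b`, the model reduces to the usual linear rule, and the summed hinge loss is always at least the number of misclassified points.

```python
import random

def project(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def multithreshold_predict(w, thresholds, first_label, x):
    """Hypothetical multithreshold linear model: project x onto w; the sorted thresholds
    cut the line into intervals whose labels alternate, starting with first_label."""
    k = sum(project(w, x) > t for t in thresholds)  # number of thresholds passed
    return first_label if k % 2 == 0 else -first_label

def misclassified(w, thresholds, first_label, X, y):
    """0-1 loss: how many points the multithreshold model gets wrong."""
    return sum(multithreshold_predict(w, thresholds, first_label, x) != t for x, t in zip(X, y))

def total_hinge(w, b, X, y):
    """SVM hinge loss summed over the sample: sum of max(0, 1 - y (w.x + b))."""
    return sum(max(0.0, 1.0 - t * (project(w, x) + b)) for x, t in zip(X, y))

if __name__ == "__main__":
    rng = random.Random(0)
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
    y = [1 if x[0] + 0.3 * x[1] > 0 else -1 for x in X]
    w, b = [1.0, 0.0], 0.1
    # One threshold at -b with first_label=-1 is the usual linear rule sign(w.x + b)
    print(misclassified(w, [-b], -1, X, y), "<=", total_hinge(w, b, X, y))
```

The bound holds point by point. Whenever y(w.x + b) <= 0, the hinge term is at least 1, and it is never negative. The abstract's claim is that MELC's objective plays the same role for the multithreshold family.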
PAC-Bayes and Domain Adaptation
We provide two main contributions in PAC-Bayesian theory for domain
adaptation where the objective is to learn, from a source distribution, a
well-performing majority vote on a different, but related, target distribution.
First, we propose an improvement of the approach we previously proposed in
Germain et al. (2013), which relies on a novel distribution pseudodistance
based on a disagreement averaging, allowing us to derive a new tighter domain
adaptation bound for the target risk. While this bound is in the spirit of
common domain adaptation work, we derive a second bound (introduced in Germain
et al., 2016) that brings a new perspective on domain adaptation: it is an
upper bound on the target risk in which the distributions' divergence, expressed as
a ratio, controls the trade-off between a source error measure and the target
voters' disagreement. We discuss and compare both results, from which we obtain
PAC-Bayesian generalization bounds. Furthermore, from the PAC-Bayesian
specialization to linear classifiers, we derive two learning algorithms, and we
evaluate them on real data.
Comment: Neurocomputing, Elsevier, 2019. arXiv admin note: substantial text
overlap with arXiv:1503.0694
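The bounds above are stated in terms of a few empirical quantities for a ρ-weighted majority vote. The first is the Gibbs risk, which needs labeled source data. The second is the expected disagreement between two voters drawn independently from ρ. That quantity needs no labels, so it can also be estimated on the target domain. The Python sketch below computes these quantities for a finite set of ±1 voters. It assumes the disagreement-based pseudodistance takes the form |d_T(ρ) − d_S(ρ)|, in the spirit of the disagreement averaging described above. It does not evaluate either of the paper's bounds, and the threshold "stump" voters are hypothetical.

```python
import random

def gibbs_risk(voters, rho, X, y):
    """Empirical Gibbs risk: rho-weighted average of each voter's error rate (needs labels)."""
    return sum(w * sum(h(x) != t for x, t in zip(X, y)) / len(X) for h, w in zip(voters, rho))

def expected_disagreement(voters, rho, X):
    """Probability that two voters drawn independently from rho disagree on a
    random point of X; needs no labels, so it can be estimated on the target."""
    total = 0.0
    for h, wh in zip(voters, rho):
        for g, wg in zip(voters, rho):
            total += wh * wg * sum(h(x) != g(x) for x in X) / len(X)
    return total

def domain_disagreement(voters, rho, X_source, X_target):
    """Assumed disagreement-based pseudodistance |d_T(rho) - d_S(rho)| between domains."""
    return abs(expected_disagreement(voters, rho, X_target)
               - expected_disagreement(voters, rho, X_source))

def majority_vote(voters, rho, x):
    """rho-weighted majority vote over +/-1 voters (ties go to -1)."""
    return 1 if sum(w * h(x) for h, w in zip(voters, rho)) > 0 else -1

def stump(threshold):
    """Hypothetical voter: +1 to the right of the threshold, -1 otherwise."""
    return lambda x: 1 if x > threshold else -1

if __name__ == "__main__":
    rng = random.Random(0)
    voters = [stump(t) for t in (-0.5, 0.0, 0.5)]
    rho = [0.25, 0.5, 0.25]
    Xs = [rng.gauss(0.0, 1.0) for _ in range(500)]  # labeled source sample
    ys = [1 if x > 0 else -1 for x in Xs]
    Xt = [rng.gauss(1.0, 1.0) for _ in range(500)]  # unlabeled, shifted target sample
    print("source Gibbs risk:", gibbs_risk(voters, rho, Xs, ys))
    print("domain disagreement:", domain_disagreement(voters, rho, Xs, Xt))
```

A classic fact links these quantities to the majority vote: its risk is at most twice the Gibbs risk. The reason is that the vote can only err on a point where at least half of the ρ-weight errs.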