On Causal and Anticausal Learning
We consider the problem of function estimation in the case where an
underlying causal model can be inferred. This has implications for popular
scenarios such as covariate shift, concept drift, transfer learning and
semi-supervised learning. We argue that causal knowledge may facilitate some
approaches for a given problem, and rule out others. In particular, we
formulate a hypothesis for when semi-supervised learning can help, and
corroborate it with empirical results.
Comment: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012). arXiv admin note: substantial text overlap with arXiv:1112.273
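The hypothesis is easiest to see in a toy anticausal setting: if the label Y causes the feature X, the marginal P(X) is a mixture whose structure is informative about P(Y | X), so unlabelled data can help. The sketch below is an illustration under that assumption, not the paper's experiments; the generating process and all names are invented for the example.

```python
# Toy illustration of the anticausal SSL hypothesis (assumptions mine):
# the label Y causes the feature X, so P(X) reveals the class structure.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

def sample(n):
    # Anticausal generating process: Y -> X.
    y = rng.integers(0, 2, size=n)
    x = rng.normal(np.where(y == 0, -1.5, 1.5), 1.0)
    return x.reshape(-1, 1), y

# A handful of labelled points (labels fixed so both classes are present).
y_lab = np.array([0, 0, 0, 0, 1, 1, 1, 1])
x_lab = rng.normal(np.where(y_lab == 0, -1.5, 1.5), 1.0).reshape(-1, 1)
x_unlab, _ = sample(5000)
x_test, y_test = sample(5000)

# Supervised baseline: threshold halfway between the labelled class means.
mid = (x_lab[y_lab == 0].mean() + x_lab[y_lab == 1].mean()) / 2
acc_sup = np.mean((x_test.ravel() > mid) == y_test)

# Semi-supervised: P(X) is a two-component mixture; fit it on unlabelled
# data and use the few labels only to name the components.
gmm = GaussianMixture(n_components=2, random_state=0).fit(x_unlab)
comp1 = int(np.argmin(np.abs(gmm.means_.ravel() - x_lab[y_lab == 1].mean())))
acc_ssl = np.mean((gmm.predict(x_test) == comp1) == y_test)

print(f"supervised: {acc_sup:.3f}  semi-supervised: {acc_ssl:.3f}")
```

In the causal direction, where X causes Y, P(X) and P(Y | X) correspond to independent mechanisms, so the paper's hypothesis predicts no comparable gain from unlabelled data.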
Invariant Models for Causal Transfer Learning
Methods of transfer learning try to combine knowledge from several related
tasks (or domains) to improve performance on a test task. Inspired by causal
methodology, we relax the usual covariate shift assumption and assume that it
holds true for a subset of predictor variables: the conditional distribution of
the target variable given this subset of predictors is invariant over all
tasks. We show how this assumption can be motivated from ideas in the field of
causality. We focus on the problem of Domain Generalization, in which no
examples from the test task are observed. We prove that, in an adversarial
setting, using this subset for prediction is optimal in Domain Generalization;
we further provide examples in which the tasks are sufficiently diverse that
the estimator outperforms pooling the data, even on average. If
examples from the test task are available, we also provide a method to transfer
knowledge from the training tasks and exploit all available features for
prediction. However, we provide no guarantees for this method. We introduce a
practical method which allows for automatic inference of the above subset and
provide corresponding code. We present results on synthetic data sets and a
gene deletion data set.
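A minimal sketch of the subset-search idea follows, under assumptions of my own (linear models, exhaustive search, and a Levene test on per-task residuals as the invariance check); the paper's actual inference procedure and tests differ.

```python
# Sketch: find a subset S of predictors whose conditional of the target
# given X_S looks invariant across training tasks (assumptions mine).
from itertools import combinations
import numpy as np
from scipy.stats import levene

def invariant_subset(Xs, ys, alpha=0.05):
    """Xs, ys: lists of per-task arrays of shape (n_t, d) and (n_t,)."""
    d = Xs[0].shape[1]
    best, best_p = None, -1.0
    for k in range(1, d + 1):
        for S in combinations(range(d), k):
            # Pool all tasks and fit one linear model on the candidate subset.
            X = np.vstack([x[:, S] for x in Xs])
            y = np.concatenate(ys)
            w, *_ = np.linalg.lstsq(X, y, rcond=None)
            # Invariance check: are per-task residual spreads exchangeable?
            resid = [yt - xt[:, S] @ w for xt, yt in zip(Xs, ys)]
            p = levene(*resid).pvalue
            if p > alpha and p > best_p:
                best, best_p = S, p
    return best  # None if no subset passes the test
```

On an unseen task, prediction then uses the pooled fit restricted to the returned subset; if no subset passes the test, pooling all features is the natural fallback.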
Efficient Training of Graph-Regularized Multitask SVMs
We present an optimization framework for graph-regularized multi-task SVMs based on the primal formulation of the problem. Previous approaches employ a so-called multi-task kernel (MTK) and thus become inapplicable when the number of training examples n is large (in practice they are limited to roughly n < 20,000, even for just a few tasks). In this paper, we present a primal optimization criterion, allowing for general loss functions, and derive its dual representation. Building on the work of Hsieh et al. [1,2], we derive an algorithm for optimizing the large-margin objective and prove its convergence. Our computational experiments show a speedup of up to three orders of magnitude over LibSVM and SVMLight for several standard benchmarks as well as challenging data sets from the application domain of computational biology. Combining our optimization methodology with the COFFIN large-scale learning framework [3], we are able to train a multi-task SVM using over 1,000,000 training points stemming from 4 different tasks. An efficient C++ implementation of our algorithm is made publicly available as part of the SHOGUN machine learning toolbox [4].
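For intuition, the primal couples the per-task weight vectors through the task graph. The sketch below writes that objective down and runs plain subgradient descent on it; this is an illustration of the objective only, not the paper's dual coordinate-descent algorithm, and all names are invented.

```python
# Graph-regularized multi-task SVM primal, optimized by plain subgradient
# descent (illustrative sketch; the paper uses dual coordinate descent).
import numpy as np

def mtl_svm_subgrad(X, y, task, A, C=1.0, mu=1.0, lr=1e-3, epochs=200):
    """X: (n, d) inputs; y in {-1, +1}; task: (n,) task indices;
    A: (T, T) symmetric task-similarity graph with zero diagonal.

    Objective: C * sum_i hinge(y_i * <w_task(i), x_i>)
               + (1/2) * ||W||_F^2
               + (mu/2) * sum_{s,t} A[s,t] * ||w_s - w_t||^2
    """
    T, d = A.shape[0], X.shape[1]
    W = np.zeros((T, d))
    L = np.diag(A.sum(axis=1)) - A                 # graph Laplacian
    for _ in range(epochs):
        margins = y * np.einsum('ij,ij->i', W[task], X)
        viol = margins < 1                         # active hinge terms
        G = np.zeros_like(W)                       # subgradient of the loss
        np.add.at(G, task[viol], -C * y[viol, None] * X[viol])
        G += W + 2.0 * mu * (L @ W)                # ridge + graph coupling
        W -= lr * G
    return W
```

The Laplacian term is what makes the problem multi-task: setting mu = 0 decouples the objective into T independent SVMs.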
A review of domain adaptation without target labels
Domain adaptation has become a prominent problem setting in machine learning
and related fields. This review asks the question: how can a classifier learn
from a source domain and generalize to a target domain? We present a
categorization of approaches, divided into what we refer to as sample-based,
feature-based, and inference-based methods. Sample-based methods focus on
weighting individual observations during training based on their importance to
the target domain. Feature-based methods revolve around mapping, projecting,
and representing features such that a source classifier performs well on the
target domain. Inference-based methods incorporate adaptation into the
parameter estimation procedure, for instance through constraints on the
optimization procedure. Additionally, we review a number of conditions that
allow for formulating bounds on the cross-domain generalization error. Our
categorization highlights recurring ideas and raises questions important to
further research.
Comment: 20 pages, 5 figures
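As one concrete instance of the sample-based family (a sketch of standard importance weighting, not a method singled out by the review), weights approximating p_target(x)/p_source(x) can be estimated with a logistic domain discriminator and passed to a weighted source classifier:

```python
# Sketch of importance weighting for covariate shift (illustrative example;
# the function name and two-stage recipe are mine, not from the review).
import numpy as np
from sklearn.linear_model import LogisticRegression

def importance_weighted_fit(Xs, ys, Xt):
    """Xs, ys: labelled source data; Xt: unlabelled target inputs."""
    # Stage 1: domain discriminator, label 0 = source, 1 = target.
    Xd = np.vstack([Xs, Xt])
    yd = np.concatenate([np.zeros(len(Xs)), np.ones(len(Xt))])
    disc = LogisticRegression(max_iter=1000).fit(Xd, yd)
    # Odds of "target" on the source points approximate p_t(x) / p_s(x).
    p = disc.predict_proba(Xs)[:, 1]
    w = p / np.clip(1.0 - p, 1e-6, None)
    w *= len(Xs) / w.sum()                 # normalize to mean weight 1
    # Stage 2: weighted source classifier.
    return LogisticRegression(max_iter=1000).fit(Xs, ys, sample_weight=w)
```

The discriminator's odds ratio approximates the density ratio only up to a constant, which the normalization absorbs; this covariate-shift correction is the building block that many sample-based methods refine.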