Stable Feature Selection for Biomarker Discovery
Feature selection techniques have been used as the workhorse in biomarker
discovery applications for a long time. Surprisingly, the stability of feature
selection with respect to sampling variations has long been under-considered.
Only recently has this issue begun to receive increasing attention.
In this article, we review existing stable feature selection methods for
biomarker discovery using a generic hierarchical framework. We have two
objectives: (1) to provide an overview of this new yet fast-growing topic for
convenient reference; (2) to categorize existing methods under an expandable
framework for future research and development.
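The stability issue the abstract raises can be made concrete with a small experiment: select the top-k features on repeated bootstrap resamples and measure how much the selected sets agree, e.g. via mean pairwise Jaccard similarity. The sketch below is only illustrative, not any specific method from the survey; the scoring rule (absolute Pearson correlation), the value of k, and the number of bootstraps are assumed defaults.

```python
import numpy as np

def select_top_k(X, y, k):
    """Rank features by absolute Pearson correlation with y; keep the top k."""
    scores = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return set(np.argsort(scores)[-k:])

def selection_stability(X, y, k=3, n_boot=10, seed=0):
    """Mean pairwise Jaccard similarity of the feature sets selected
    on bootstrap resamples of (X, y); 1.0 means perfectly stable."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    sets = []
    for _ in range(n_boot):
        idx = rng.choice(n, size=n, replace=True)  # bootstrap resample
        sets.append(select_top_k(X[idx], y[idx], k))
    sims = [len(a & b) / len(a | b)
            for i, a in enumerate(sets) for b in sets[i + 1:]]
    return float(np.mean(sims))
```

With a strong signal concentrated in a few features, the measured stability is close to 1; with noisy or highly redundant features it drops, which is exactly the sampling-variation problem the survey addresses.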
A bagging SVM to learn from positive and unlabeled examples
We consider the problem of learning a binary classifier from a training set
of positive and unlabeled examples, both in the inductive and in the
transductive setting. This problem, often referred to as \emph{PU learning},
differs from the standard supervised classification problem by the lack of
negative examples in the training set. It corresponds to a ubiquitous
situation in many applications such as information retrieval or gene ranking,
when we have identified a set of data of interest sharing a particular
property, and we wish to automatically retrieve additional data sharing the
same property among a large and easily available pool of unlabeled data. We
propose a conceptually simple method, akin to bagging, to approach both
inductive and transductive PU learning problems, by converting them into series
of supervised binary classification problems discriminating the known positive
examples from random subsamples of the unlabeled set. We empirically
demonstrate the relevance of the method on simulated and real data, where it
performs at least as well as existing methods while being faster.
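The procedure described above can be sketched as follows: repeatedly draw a random subsample of the unlabeled pool, train a classifier to separate the known positives from that subsample, and aggregate each unlabeled example's out-of-bag decision scores. This is a minimal sketch of the bagging idea under assumed defaults (a linear SVM from scikit-learn, subsamples the size of the positive set), not the authors' exact implementation.

```python
import numpy as np
from sklearn.svm import SVC

def pu_bagging_scores(X_pos, X_unlabeled, n_estimators=20,
                      subsample_size=None, seed=0):
    """PU learning by bagging: each round, treat a random subsample of the
    unlabeled pool as provisional negatives, train positives-vs-subsample,
    and accumulate scores on the out-of-bag unlabeled examples."""
    rng = np.random.default_rng(seed)
    n_u = len(X_unlabeled)
    k = subsample_size or len(X_pos)
    scores = np.zeros(n_u)
    counts = np.zeros(n_u)
    for _ in range(n_estimators):
        idx = rng.choice(n_u, size=k, replace=False)
        X_train = np.vstack([X_pos, X_unlabeled[idx]])
        y_train = np.concatenate([np.ones(len(X_pos)), np.zeros(k)])
        clf = SVC(kernel="linear").fit(X_train, y_train)
        # only score unlabeled points NOT used as provisional negatives
        oob = np.setdiff1d(np.arange(n_u), idx)
        scores[oob] += clf.decision_function(X_unlabeled[oob])
        counts[oob] += 1
    counts[counts == 0] = 1  # guard: points never out-of-bag keep score 0
    return scores / counts
```

Unlabeled examples that are truly positive tend to receive higher averaged scores, which yields a ranking usable in both the inductive and the transductive setting.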
Transductive Ranking on Graphs
In ranking, one is given examples of order relationships among objects, and the goal is to learn from these examples a real-valued ranking function that induces a ranking or ordering over the object space. We consider the problem of learning such a ranking function in a transductive, graph-based setting, where the object space is finite and is represented as a graph in which vertices correspond to objects and edges encode similarities between objects. Building on recent developments in regularization theory for graphs and corresponding Laplacian-based learning methods, we develop an algorithmic framework for learning ranking functions on graphs. We derive generalization bounds for our algorithms in transductive models similar to those used to study other transductive learning problems, and give experimental evidence of the potential benefits of our framework.
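One concrete instance of such Laplacian-regularized ranking can be sketched as follows: given the graph Laplacian L and observed preference pairs (i, j) meaning vertex i should outrank vertex j, minimize a squared-hinge loss on the pairwise margins plus a smoothness penalty lam * f^T L f by gradient descent. This sketch illustrates the general idea only, not the authors' specific algorithms; the loss, penalty weight, and optimizer are assumed defaults.

```python
import numpy as np

def laplacian(adj):
    """Unnormalized graph Laplacian L = D - W."""
    return np.diag(adj.sum(axis=1)) - adj

def rank_on_graph(adj, pairs, lam=0.1, lr=0.1, n_iter=500):
    """Learn one ranking score per vertex by gradient descent on a
    squared-hinge pairwise loss plus the smoothness term lam * f^T L f."""
    L = laplacian(adj)
    f = np.zeros(adj.shape[0])
    for _ in range(n_iter):
        grad = 2.0 * lam * (L @ f)            # gradient of lam * f^T L f
        for i, j in pairs:                    # pair (i, j): want f[i] > f[j]
            margin = f[i] - f[j]
            if margin < 1.0:                  # pair violates the unit margin
                grad[i] -= 2.0 * (1.0 - margin)
                grad[j] += 2.0 * (1.0 - margin)
        f -= lr * grad
    return f
```

On a path graph with a single preference between the two endpoints, the smoothness term spreads the ordering along the path, so intermediate vertices inherit consistent relative ranks; this transductive propagation is the benefit graph regularization buys.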
MEG Decoding Across Subjects
Brain decoding is a data analysis paradigm for neuroimaging experiments that
is based on predicting the stimulus presented to the subject from the
concurrent brain activity. In order to make inference at the group level, a
straightforward but sometimes unsuccessful approach is to train a classifier on
the trials of a group of subjects and then to test it on unseen trials from new
subjects. The main difficulty stems from the structural and functional
variability across subjects. We call this approach "decoding across
subjects". In this work, we address the problem of decoding across subjects for
magnetoencephalographic (MEG) experiments and we provide the following
contributions: first, we formally describe the problem and show that it belongs
to a machine learning sub-field called transductive transfer learning (TTL).
Second, we propose to use a simple TTL technique that accounts for the
differences between train data and test data. Third, we propose the use of
ensemble learning, and specifically of stacked generalization, to address the
variability across subjects within train data, with the aim of producing more
stable classifiers. On a face vs. scramble task MEG dataset of 16 subjects, we
compare the standard approach of not modelling the differences across subjects,
to the proposed one of combining TTL and ensemble learning. We show that the
proposed approach is consistently more accurate than the standard one.
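A holdout variant of the stacking idea can be sketched as follows: fit one base classifier per training subject on the first half of that subject's trials, build meta-features from all base classifiers' decision scores on the held-out halves, and fit a combiner on top. This is a simplified sketch under assumed choices (logistic-regression base and meta learners, a 50/50 holdout split), not the exact pipeline from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_stacked(subject_data):
    """subject_data: list of (X, y) pairs, one per training subject.
    Returns (base_classifiers, meta_classifier)."""
    bases, halves, meta_X, meta_y = [], [], [], []
    for X, y in subject_data:
        m = len(y) // 2
        bases.append(LogisticRegression().fit(X[:m], y[:m]))  # per-subject model
        halves.append((X[m:], y[m:]))                         # held out for meta level
    for X_h, y_h in halves:
        # meta-features: decision scores of every base model on held-out trials
        meta_X.append(np.column_stack([c.decision_function(X_h) for c in bases]))
        meta_y.append(y_h)
    meta = LogisticRegression().fit(np.vstack(meta_X), np.concatenate(meta_y))
    return bases, meta

def predict_stacked(bases, meta, X):
    """Predict labels for trials of an unseen subject."""
    feats = np.column_stack([c.decision_function(X) for c in bases])
    return meta.predict(feats)
```

The meta-learner weights each subject-specific classifier by how well its scores generalize beyond its own subject, which is what makes the combined classifier more stable under across-subject variability.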