Generalization Bounds for Representative Domain Adaptation
In this paper, we propose a novel framework to analyze the theoretical
properties of the learning process for a representative type of domain
adaptation, which combines data from multiple sources and one target
(called representative domain adaptation for short). In particular, we use
the integral probability metric to measure the difference between the
distributions of two domains, and compare it with the H-divergence and the
discrepancy distance. We develop Hoeffding-type, Bennett-type, and
McDiarmid-type deviation inequalities for multiple domains, and then
present the symmetrization inequality for representative domain
adaptation. Next, we use the derived inequalities to obtain Hoeffding-type
and Bennett-type generalization bounds, both of which are
based on the uniform entropy number. Moreover, we present the generalization
bounds based on the Rademacher complexity. Finally, we analyze the asymptotic
convergence and the rate of convergence of the learning process for
representative domain adaptation. We discuss the factors that affect the
asymptotic behavior of the learning process, and numerical experiments
support our theoretical findings. We also compare our results with
existing results on domain adaptation and with classical results under the
same-distribution assumption.
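As background, for a class F of functions, the integral probability metric between distributions P and Q is the supremum over f in F of |E_P f - E_Q f|. For the simple class of unit-norm linear functions, this supremum has a closed form: the Euclidean distance between the two mean vectors. The sketch below uses that special case to compare an illustrative source and target domain; the function names and data are illustrative, not from the paper.

```python
import numpy as np

def linear_ipm(X, Z):
    """Empirical IPM over F = {x -> <w, x> : ||w|| <= 1}.
    For this class the supremum has a closed form: the Euclidean
    distance between the two sample means."""
    return np.linalg.norm(X.mean(axis=0) - Z.mean(axis=0))

rng = np.random.default_rng(0)
source = rng.normal(loc=0.0, size=(500, 3))          # source domain sample
target = rng.normal(loc=1.0, size=(500, 3))          # shifted target domain

d_shifted = linear_ipm(source, target)
d_same = linear_ipm(source, rng.normal(loc=0.0, size=(500, 3)))
# The metric is larger between the shifted domains than between
# two samples drawn from the same distribution.
```

Richer function classes (e.g. bounded Lipschitz functions, or unit balls of an RKHS) give other familiar IPMs, which is what makes the metric a flexible way to quantify the source-target gap.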
Combining heterogeneous data sources for neuroimaging based diagnosis: re-weighting and selecting what is important
Combining neuroimaging and clinical information for diagnosis, such as behavioral tasks and genetic characteristics, is potentially beneficial but presents challenges in finding the best data representation for the different sources of information. Simply combining them usually does not improve on using the best source alone. In this paper, we propose a framework based on a recent multiple kernel learning algorithm called EasyMKL, and we investigate the benefits of this approach for diagnosing two different mental health diseases: the well-known Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, tackling the task of classifying Alzheimer's Disease (AD) patients versus healthy controls, and a second dataset, tackling the task of classifying a heterogeneous group of depressed patients versus healthy controls. We use EasyMKL to combine a large number of basic kernels alongside a feature selection methodology, pursuing an optimal and sparse solution to facilitate interpretability. Our results show that the proposed approach, called EasyMKLFS, outperforms baselines (e.g. SVM and SimpleMKL), state-of-the-art random forests (RF), and feature selection (FS) methods.
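To illustrate the multiple-kernel idea underlying this approach: a convex combination of base Gram matrices is itself a valid kernel, so heterogeneous sources can each contribute their own kernel. EasyMKL learns the combination weights; the toy sketch below simply fixes them by hand, and all names and values are illustrative rather than the paper's implementation.

```python
import numpy as np

def rbf_kernel(X1, X2, gamma):
    """Gram matrix of the RBF kernel k(x, z) = exp(-gamma * ||x - z||^2)."""
    sq = (np.sum(X1**2, axis=1)[:, None]
          + np.sum(X2**2, axis=1)[None, :]
          - 2.0 * X1 @ X2.T)
    return np.exp(-gamma * sq)

def combine_kernels(kernels, weights):
    """Convex combination sum_r eta_r * K_r of base Gram matrices."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()        # normalize onto the simplex
    return sum(w * K for w, K in zip(weights, kernels))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))                 # one toy data source
Ks = [rbf_kernel(X, X, g) for g in (0.1, 1.0, 10.0)]  # basic kernels
K = combine_kernels(Ks, [0.5, 0.3, 0.2])     # hand-fixed weights
# K is again a symmetric positive semidefinite kernel matrix.
```

In the MKL setting the weights would be optimized jointly with the classifier, and a sparse weight vector reveals which sources (or which basic kernels) actually matter, which is the interpretability benefit pursued here.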
Machine learning via transitions
This thesis presents a clear conceptual basis for theoretically studying machine learning
problems. Machine learning methods afford means to automate the discovery of
relationships in data sets. A relationship between quantities X and Y allows the prediction
of one quantity given information about the other. It is these relationships that
we make the central object of study. We call these relationships transitions.
A transition from a set X to a set Y is a function from X into the probability distributions
on Y. Beginning with this simple notion, the thesis proceeds as follows:
Utilizing tools from statistical decision theory, we develop an abstract language
for quantifying the information present in a transition.
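When X and Y are finite, a transition can be written down concretely as a row-stochastic matrix, with row x holding the distribution on Y given x; transitions then compose by matrix multiplication. A minimal sketch, where the representation and names are illustrative rather than the thesis's notation:

```python
import numpy as np

# A transition from X = {0, 1, 2} to Y = {0, 1}: each row is a
# probability distribution on Y conditioned on the corresponding x.
T = np.array([
    [0.9, 0.1],   # x = 0 almost surely maps to y = 0
    [0.5, 0.5],   # x = 1 is maximally uninformative about y
    [0.2, 0.8],   # x = 2 favors y = 1
])
assert np.allclose(T.sum(axis=1), 1.0)   # rows are distributions

def compose(T1, T2):
    """Compose transitions X -> Y and Y -> Z into X -> Z
    (marginalize over the intermediate variable)."""
    return T1 @ T2

# Pushing a distribution on X through T yields a distribution on Y.
p_x = np.array([1/3, 1/3, 1/3])
p_y = p_x @ T
```

Deterministic functions are the special case where each row puts all its mass on one column, so this single notion subsumes both functions and noisy channels.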
We attack the problem of generalized supervision. Generalized supervision
is the learning of classifiers from non-ideal data. An important example of
this is the learning of classifiers from noisily labelled data. We demonstrate
the virtues of our abstract treatment by producing generic methods for
solving these problems, together with generic upper bounds for our methods
and lower bounds for any method that attempts to solve these problems.
As a result of our study in generalized supervision, we produce means to define
procedures that are robust to certain forms of corruption. We explore, in detail,
procedures for learning classifiers that are robust to the effects of symmetric
label noise. The result is a classification algorithm that is easier to understand,
implement and parallelize than standard kernel based classification schemes,
such as the support vector machine and logistic regression. Furthermore, we
demonstrate the uniqueness of this method.
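One classical way such robustness can arise, shown here purely as a toy illustration and not as the thesis's exact algorithm, is a mean-based linear rule w = (1/n) * sum_i y_i x_i: flipping each label independently with probability rho < 1/2 only rescales the expected weight vector by (1 - 2*rho), leaving the direction of the decision boundary unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two Gaussian classes with labels in {-1, +1}.
n = 2000
y = rng.choice([-1, 1], size=n)
X = rng.normal(size=(n, 2)) + 2.0 * y[:, None]   # class means at +/-(2, 2)

def mean_classifier(X, y):
    """Weight vector w = mean_i y_i x_i; predict sign(<w, x>)."""
    return (y[:, None] * X).mean(axis=0)

# Corrupt the labels with symmetric noise: flip each with probability rho.
rho = 0.3
flips = rng.random(n) < rho
y_noisy = np.where(flips, -y, y)

w_clean = mean_classifier(X, y)
w_noisy = mean_classifier(X, y_noisy)

# Accuracy on the clean labels of the classifier trained on noisy ones.
acc = np.mean(np.sign(X @ w_noisy) == y)
```

The weight vector is a single pass over the data (a sum), which is why this style of classifier is simpler to implement and to parallelize than solving the optimization problems behind the SVM or logistic regression.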
Finally, we show how many feature learning schemes can be understood via
our language. We present well motivated objectives for the task of learning features
from unlabelled data, before showing how many standard feature learning
methods (such as PCA, sparse coding, auto-encoders and so on) can be seen
as minimizing surrogates to our objective functions.
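As a concrete instance of this reading, PCA can be viewed as minimizing a reconstruction-error surrogate: projecting centered data onto its top-k principal directions minimizes the Frobenius reconstruction error among all rank-k linear projections. A minimal sketch, illustrative and not taken from the thesis:

```python
import numpy as np

def pca_reconstruction_error(X, k):
    """Reconstruction error of the best rank-k linear projection,
    i.e. projection onto the top-k principal directions."""
    Xc = X - X.mean(axis=0)                  # center the data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:k].T @ Vt[:k]                    # rank-k orthogonal projector
    return np.linalg.norm(Xc - Xc @ P) ** 2

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
errors = [pca_reconstruction_error(X, k) for k in range(1, 11)]
# The surrogate objective is non-increasing in k and reaches 0 at full rank.
```

Sparse coding and auto-encoders fit the same template with different reconstruction maps and regularizers, which is what lets one language cover these apparently disparate feature learning methods.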