
    Generalization Bounds for Representative Domain Adaptation

    In this paper, we propose a novel framework for analyzing the theoretical properties of the learning process for a representative type of domain adaptation, which combines data from multiple sources and one target domain (briefly, representative domain adaptation). In particular, we use the integral probability metric to measure the difference between the distributions of two domains, and we compare it with the H-divergence and the discrepancy distance. We develop Hoeffding-type, Bennett-type and McDiarmid-type deviation inequalities for multiple domains, and then present the symmetrization inequality for representative domain adaptation. Next, we use the derived inequalities to obtain Hoeffding-type and Bennett-type generalization bounds, both of which are based on the uniform entropy number. Moreover, we present generalization bounds based on the Rademacher complexity. Finally, we analyze the asymptotic convergence and the rate of convergence of the learning process for representative domain adaptation. We discuss the factors that affect the asymptotic behavior of the learning process, and numerical experiments support our theoretical findings. We also compare our results with existing results on domain adaptation and with classical results under the same-distribution assumption.
    Comment: arXiv admin note: substantial text overlap with arXiv:1304.157
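    For context, the integral probability metric the abstract refers to has the standard form below; the paper's specific choice of function class is not reproduced here, so \(\mathcal{F}\) is left generic.

```latex
% Integral probability metric between the distributions P and Q of two
% domains, taken over a function class \mathcal{F}:
D_{\mathcal{F}}(P, Q) \;=\; \sup_{f \in \mathcal{F}}
  \left| \mathbb{E}_{x \sim P} f(x) \;-\; \mathbb{E}_{x \sim Q} f(x) \right|
```

    Restricting \(\mathcal{F}\) to particular function classes yields quantities closely related to the H-divergence and the discrepancy distance, which is the sense in which the abstract compares the three.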

    Combining heterogeneous data sources for neuroimaging based diagnosis: re-weighting and selecting what is important

    Combining neuroimaging and clinical information for diagnosis, such as behavioural tasks and genetic characteristics, is potentially beneficial but presents challenges in finding the best data representation for the different sources of information. Simply combining the sources usually provides no improvement over using the best source alone. In this paper, we propose a framework based on a recent multiple kernel learning algorithm called EasyMKL and investigate the benefits of this approach for diagnosing two different mental health conditions: the well-known Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset, for the task of classifying Alzheimer's Disease (AD) patients versus healthy controls, and a second dataset, for the task of classifying a heterogeneous group of depressed patients versus healthy controls. We use EasyMKL to combine a large number of basic kernels alongside a feature selection methodology, pursuing an optimal and sparse solution to facilitate interpretability. Our results show that the proposed approach, called EasyMKLFS, outperforms baselines (e.g. SVM and SimpleMKL), state-of-the-art random forests (RF) and feature selection (FS) methods.
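    A minimal sketch of the general multiple-kernel-learning idea the abstract relies on: build one base kernel per data source and train a kernel machine on a weighted combination. The data, kernel parameters and uniform weights below are placeholders; EasyMKL itself learns the weights by solving a margin maximization problem, which is not reproduced here.

```python
import numpy as np
from sklearn.svm import SVC

# Toy stand-ins for two heterogeneous sources describing the same subjects.
rng = np.random.default_rng(0)
X_imaging = rng.normal(size=(100, 50))    # e.g. neuroimaging features
X_clinical = rng.normal(size=(100, 10))   # e.g. clinical/behavioural features
y = rng.integers(0, 2, size=100)

def rbf_kernel(X, gamma):
    """Gram matrix of a Gaussian (RBF) kernel on the rows of X."""
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

# One (or many) base kernels per source; MKL methods learn how to mix them.
base_kernels = [rbf_kernel(X_imaging, 0.01), rbf_kernel(X_clinical, 0.1)]

# Placeholder for the learned combination: uniform weights.
weights = np.full(len(base_kernels), 1.0 / len(base_kernels))
K = sum(w * Kb for w, Kb in zip(weights, base_kernels))

# Train any kernel machine on the combined Gram matrix.
clf = SVC(kernel="precomputed")
clf.fit(K, y)
print("training accuracy:", clf.score(K, y))
```

    A sparse weight vector over many base kernels is what makes the combination interpretable: sources whose kernels receive zero weight are effectively discarded.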

    Machine learning via transitions

    This thesis presents a clear conceptual basis for theoretically studying machine learning problems. Machine learning methods afford means to automate the discovery of relationships in data sets. A relationship between quantities X and Y allows the prediction of one quantity given information about the other. It is these relationships that we make the central object of study; we call them transitions. A transition from a set X to a set Y is a function from X into the probability distributions on Y. Beginning with this simple notion, the thesis proceeds as follows. Utilizing tools from statistical decision theory, we develop an abstract language for quantifying the information present in a transition. We attack the problem of generalized supervision, the learning of classifiers from non-ideal data; an important example is the learning of classifiers from noisily labelled data. We demonstrate the virtues of our abstract treatment by producing generic methods for solving these problems, as well as generic upper bounds for our methods and lower bounds for any method that attempts to solve these problems. As a result of our study of generalized supervision, we produce means to define procedures that are robust to certain forms of corruption. We explore, in detail, procedures for learning classifiers that are robust to the effects of symmetric label noise. The result is a classification algorithm that is easier to understand, implement and parallelize than standard kernel-based classification schemes, such as the support vector machine and logistic regression. Furthermore, we demonstrate the uniqueness of this method. Finally, we show how many feature learning schemes can be understood via our language. We present well-motivated objectives for the task of learning features from unlabelled data, before showing how many standard feature learning methods (such as PCA, sparse coding and auto-encoders) can be seen as minimizing surrogates to our objective functions.
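    A minimal illustration of the notion of a transition, under the assumption that X and Y are finite: the transition is then just a row-stochastic matrix, and symmetric label noise is one concrete example. The noise rate and distributions below are illustrative values, not taken from the thesis.

```python
import numpy as np

# A transition from a set X to a set Y sends each x in X to a probability
# distribution on Y; for finite X and Y it is a row-stochastic matrix.
T_noise = np.array([[0.8, 0.2],   # P(observed label | true label = 0)
                    [0.2, 0.8]])  # symmetric label noise at rate 0.2

# Composing a distribution with a transition is matrix multiplication:
# pushing the clean label distribution through the noise gives the
# observed (noisy) label distribution.
p_clean = np.array([0.6, 0.4])
p_noisy = p_clean @ T_noise

# When the noise transition is invertible, the clean distribution is
# recoverable from the noisy one; this reversibility is the basic reason
# classifiers can still be learned from noisily labelled data.
p_recovered = p_noisy @ np.linalg.inv(T_noise)
print(p_noisy, p_recovered)   # p_recovered matches p_clean up to rounding
```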