
    Pareto-Path Multi-Task Multiple Kernel Learning

    A traditional and intuitively appealing Multi-Task Multiple Kernel Learning (MT-MKL) method is to optimize the sum (thus, the average) of objective functions with a (partially) shared kernel function, which allows information sharing amongst tasks. We point out that the obtained solution corresponds to a single point on the Pareto Front (PF) of a Multi-Objective Optimization (MOO) problem, which considers the concurrent optimization of all task objectives involved in the Multi-Task Learning (MTL) problem. Motivated by this observation, and arguing that the former approach is heuristic, we propose a novel Support Vector Machine (SVM) MT-MKL framework that considers an implicitly-defined set of conic combinations of task objectives. We show that solving our framework produces solutions along a path on the aforementioned PF and that it subsumes the optimization of the average of objective functions as a special case. Using the algorithms we derive, we demonstrate through a series of experiments that the framework achieves better classification performance than comparable MTL approaches. Comment: Accepted by IEEE Transactions on Neural Networks and Learning Systems
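    To make the weighted-sum picture concrete, here is a minimal Python sketch using toy quadratic stand-ins for the per-task risks (the objectives and names are illustrative assumptions, not the paper's SVM MT-MKL formulation): sweeping nonnegative (conic) weights traces a path of Pareto-optimal trade-offs, of which the plain average is a single point.

    import numpy as np
    from scipy.optimize import minimize

    # Two toy task objectives sharing a parameter vector w; these quadratics
    # stand in for per-task SVM risks with a (partially) shared kernel.
    def task_losses(w):
        l1 = np.sum((w - np.array([1.0, 0.0])) ** 2)  # task 1 risk
        l2 = np.sum((w - np.array([0.0, 1.0])) ** 2)  # task 2 risk
        return l1, l2

    # Averaging the objectives (lam = 0.5) yields one Pareto-front point;
    # sweeping conic combinations traces a path along the front.
    pareto_path = []
    for lam in np.linspace(0.05, 0.95, 10):
        obj = lambda w: lam * task_losses(w)[0] + (1 - lam) * task_losses(w)[1]
        res = minimize(obj, x0=np.zeros(2))
        pareto_path.append(task_losses(res.x))

    print(pareto_path)  # each pair is Pareto-optimal for its weighting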

    Doctor of Philosophy

    The contributions in the area of kernelized learning techniques have expanded beyond a few basic kernel functions to general kernel functions that can be learned along with the rest of a statistical learning model. This dissertation explores various directions in \emph{kernel learning}, a setting where we learn not only a model but also glean information about the geometry of the data, by learning a positive definite (p.d.) kernel. Throughout, we exploit several properties of kernels that relate to their \emph{geometry} -- a facet that is often overlooked. We revisit the mathematical background required to understand kernel learning in context, such as reproducing kernel Hilbert spaces (RKHSs), the reproducing property, and the representer theorem. We then cover kernelized learning with support vector machines (SVMs), multiple kernel learning (MKL), and localized kernel learning (LKL). We move on to Bochner's theorem, a tool vital to one of the kernel learning areas we explore. The main portion of the thesis is divided into two parts: (1) kernel learning with SVMs, a.k.a. MKL, and (2) learning based on Bochner's theorem. In the first part, we present efficient, accurate, and scalable algorithms based on the SVM: one that exploits multiplicative weight updates (MWU), and another that exploits local geometry. In the second part, we use Bochner's theorem to incorporate a kernel into a neural network and find that kernel learning in this fashion, continuous kernel learning (CKL), is superior even to MKL.
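    As a hedged illustration of the Bochner's-theorem route, the following PyTorch sketch treats the spectral frequencies of random Fourier features as trainable parameters, so the (shift-invariant) kernel is learned jointly with the model; the class name, sizes, and initialization are assumptions for illustration, not the dissertation's CKL implementation.

    import torch
    import torch.nn as nn

    # Bochner's theorem: a continuous shift-invariant p.d. kernel is the
    # Fourier transform of a nonnegative spectral density. Learning the
    # frequencies below therefore amounts to learning the kernel itself.
    class BochnerFeatures(nn.Module):
        def __init__(self, in_dim, n_features):
            super().__init__()
            # Gaussian initialization = RBF kernel at the start of training.
            self.W = nn.Parameter(torch.randn(in_dim, n_features))
            self.b = nn.Parameter(2 * torch.pi * torch.rand(n_features))

        def forward(self, x):
            # phi(x) such that k(x, y) ~= phi(x) . phi(y)
            z = x @ self.W + self.b
            return torch.cos(z) * (2.0 / self.W.shape[1]) ** 0.5

    # Train end-to-end: the kernel's spectrum adapts with the predictor.
    model = nn.Sequential(BochnerFeatures(in_dim=10, n_features=256),
                          nn.Linear(256, 1))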

    Complexity-Free Generalization via Distributionally Robust Optimization

    Established approaches to obtaining generalization bounds in data-driven optimization and machine learning mostly build on solutions from empirical risk minimization (ERM), which depend crucially on the functional complexity of the hypothesis class. In this paper, we present an alternate route to obtaining these bounds, on the solution from distributionally robust optimization (DRO), a recent data-driven optimization framework based on worst-case analysis and the notion of an ambiguity set to capture statistical uncertainty. In contrast to the hypothesis-class complexity in ERM, our DRO bounds depend on the geometry of the ambiguity set and its compatibility with the true loss function. Notably, when maximum mean discrepancy is used as the DRO distance metric, our analysis implies, to the best of our knowledge, the first generalization bound in the literature that depends solely on the true loss function, entirely free of any complexity measures or bounds on the hypothesis class.
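    The MMD ambiguity set at the heart of that result can be sketched directly; below is a minimal NumPy estimate of MMD^2 with an RBF kernel (the bandwidth and the biased V-statistic estimator are assumptions for illustration, not choices from the paper), where the DRO ambiguity set is { P : MMD(P, P_n) <= eps } around the empirical distribution P_n.

    import numpy as np

    def rbf(a, b, sigma=1.0):
        # Pairwise RBF kernel matrix between rows of a and b.
        d2 = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
        return np.exp(-d2 / (2 * sigma**2))

    def mmd2(x, y, sigma=1.0):
        # Biased (V-statistic) estimate of squared maximum mean discrepancy.
        return rbf(x, x, sigma).mean() + rbf(y, y, sigma).mean() \
            - 2 * rbf(x, y, sigma).mean()

    rng = np.random.default_rng(0)
    x = rng.normal(size=(100, 2))             # draws from P_n
    y = rng.normal(1.0, 1.0, size=(100, 2))   # draws from a shifted P
    print(mmd2(x, y))  # large MMD^2 -> P falls outside a small ambiguity set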