Pareto-Path Multi-Task Multiple Kernel Learning
A traditional and intuitively appealing Multi-Task Multiple Kernel Learning
(MT-MKL) method is to optimize the sum (thus, the average) of objective
functions with a (partially) shared kernel function, which allows information
sharing amongst tasks. We point out that the obtained solution corresponds to a
single point on the Pareto Front (PF) of a Multi-Objective Optimization (MOO)
problem, which considers the concurrent optimization of all task objectives
involved in the Multi-Task Learning (MTL) problem. Motivated by this last
observation and arguing that the former approach is heuristic, we propose a
novel Support Vector Machine (SVM) MT-MKL framework that considers an
implicitly-defined set of conic combinations of task objectives. We show that
solving our framework produces solutions along a path on the aforementioned PF
and that it subsumes the optimization of the average of objective functions as
a special case. Using algorithms we derived, we demonstrate through a series of
experimental results that the framework is capable of achieving better
classification performance when compared to other similar MTL approaches.
Comment: Accepted by IEEE Transactions on Neural Networks and Learning Systems
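The link between conic combinations of task objectives and a path on the Pareto front can be illustrated with a toy problem. The sketch below is not the SVM MT-MKL framework from the abstract: it assumes two hypothetical one-dimensional quadratic "task objectives" and simply shows that sweeping nonnegative weights traces a sequence of trade-off points, with equal weights recovering the minimizer of the average.

```python
# Toy illustration only: two hypothetical task objectives
#   f1(x) = (x - a)^2 and f2(x) = (x - b)^2
# stand in for the per-task objectives; they are assumptions, not the paper's model.

def scalarized_minimizer(w1, w2, a=0.0, b=1.0):
    """Closed-form minimizer of w1*f1(x) + w2*f2(x) for w1, w2 >= 0, w1 + w2 > 0."""
    return (w1 * a + w2 * b) / (w1 + w2)

def objectives(x, a=0.0, b=1.0):
    """Return the pair (f1(x), f2(x))."""
    return (x - a) ** 2, (x - b) ** 2

# Sweep conic (nonnegative) weights; each weight pair gives one trade-off point.
weights = [(1.0, 0.0), (0.75, 0.25), (0.5, 0.5), (0.25, 0.75), (0.0, 1.0)]
path = [objectives(scalarized_minimizer(w1, w2)) for w1, w2 in weights]

# Equal weights minimize the sum, and therefore the average, of the objectives.
x_avg = scalarized_minimizer(0.5, 0.5)
```

Along `path`, the first objective only gets worse as the second improves, which is the defining trade-off of Pareto-optimal points; `x_avg` is the single point that the averaged formulation would select.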
Doctor of Philosophy
dissertation
The contributions in the area of kernelized learning techniques have expanded beyond a few basic kernel functions to general kernel functions that can be learned along with the rest of a statistical learning model. This dissertation explores various directions in kernel learning, a setting where we can learn not only a model, but also glean information about the geometry of the data from which we learn, by learning a positive definite (p.d.) kernel. Throughout, we exploit several properties of kernels that relate to their geometry, a facet that is often overlooked. We revisit the mathematical background required to understand kernel learning in context, such as reproducing kernel Hilbert spaces (RKHSs), the reproducing property, and the representer theorem. We then cover kernelized learning with support vector machines (SVMs), multiple kernel learning (MKL), and localized kernel learning (LKL). We move on to Bochner's theorem, a tool vital to one of the kernel learning areas we explore. The main portion of the thesis is divided into two parts: (1) kernel learning with SVMs, a.k.a. MKL, and (2) learning based on Bochner's theorem. In the first part, we present efficient, accurate, and scalable algorithms based on the SVM: one that exploits multiplicative weight updates (MWU), and another that exploits local geometry. In the second part, we use Bochner's theorem to incorporate a kernel into a neural network and discover that kernel learning in this fashion, continuous kernel learning (CKL), is superior even to MKL.
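Bochner's theorem states that a continuous shift-invariant p.d. kernel is the Fourier transform of a nonnegative measure, which is what allows a kernel to be represented by sampled frequencies. The sketch below shows the standard random Fourier feature construction for a Gaussian kernel as one concrete instance of that idea. It is not the dissertation's CKL method; the function names, the choice of Gaussian kernel, and the parameter `gamma` are assumptions made for illustration.

```python
import math
import random

def gaussian_kernel(x, y, gamma):
    """Exact kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def make_rff(dim, n_features, gamma, seed=0):
    """Build a random feature map z with z(x) . z(y) ~= k(x, y).

    By Bochner's theorem, the spectral measure of this Gaussian kernel
    is N(0, 2 * gamma * I), so frequencies are drawn from that distribution.
    """
    rng = random.Random(seed)
    sigma = math.sqrt(2.0 * gamma)
    freqs = [[rng.gauss(0.0, sigma) for _ in range(dim)]
             for _ in range(n_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    scale = math.sqrt(2.0 / n_features)

    def features(x):
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(freqs, phases)]

    return features

# Compare the exact kernel value with its random-feature approximation.
x = [0.1, 0.2, 0.3]
y = [0.4, 0.0, -0.1]
gamma = 0.5
z = make_rff(dim=3, n_features=2000, gamma=gamma)
approx = sum(a * b for a, b in zip(z(x), z(y)))
exact = gaussian_kernel(x, y, gamma)
```

In this sketch the frequencies are fixed random draws; making them trainable parameters of a network layer is one way a kernel could be learned in this representation, though the abstract does not specify how CKL does so.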
Complexity-Free Generalization via Distributionally Robust Optimization
Established approaches to obtain generalization bounds in data-driven
optimization and machine learning mostly build on solutions from empirical risk
minimization (ERM), which depend crucially on the functional complexity of the
hypothesis class. In this paper, we present an alternate route to obtain these
bounds on the solution from distributionally robust optimization (DRO), a
recent data-driven optimization framework based on worst-case analysis and the
notion of ambiguity set to capture statistical uncertainty. In contrast to the
hypothesis class complexity in ERM, our DRO bounds depend on the ambiguity set
geometry and its compatibility with the true loss function. Notably, when using
maximum mean discrepancy as a DRO distance metric, our analysis implies, to the
best of our knowledge, the first generalization bound in the literature that
depends solely on the true loss function, entirely free of any complexity
measures or bounds on the hypothesis class.
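For readers unfamiliar with the maximum mean discrepancy (MMD) named above, the following is a minimal sketch of its standard biased empirical estimate between two samples. It illustrates only the distance itself, not the DRO formulation or the generalization bound; the Gaussian kernel and the parameter `gamma` are assumptions chosen for the example.

```python
import math

def gaussian_kernel(x, y, gamma=1.0):
    """k(x, y) = exp(-gamma * ||x - y||^2); the kernel choice is an assumption."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def mean_kernel(xs, ys, gamma):
    """Average kernel value over all pairs drawn from xs and ys."""
    total = sum(gaussian_kernel(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))

def mmd_squared(xs, ys, gamma=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    return (mean_kernel(xs, xs, gamma)
            + mean_kernel(ys, ys, gamma)
            - 2.0 * mean_kernel(xs, ys, gamma))

# Two small samples: `shifted` is `base` translated by 2 in each coordinate.
base = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
shifted = [[a + 2.0, b + 2.0] for a, b in base]

same = mmd_squared(base, base)
different = mmd_squared(base, shifted)
```

Here `same` is zero up to floating-point error, while `different` is strictly positive. In a DRO setting, an ambiguity set would typically collect all distributions within some MMD radius of the empirical one.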