Learning Landmark-Based Ensembles with Random Fourier Features and Gradient Boosting
We propose a Gradient Boosting algorithm for learning an ensemble of kernel functions adapted to the task at hand. Unlike state-of-the-art Multiple Kernel Learning techniques that select from a pre-computed dictionary of kernel functions, at each iteration we fit a kernel by approximating it as a weighted sum of Random Fourier Features (RFF) and by optimizing their barycenter. This yields a more versatile method that is easier to set up and likely to perform better. Our study builds on a recent result showing that one can learn a kernel from RFF by minimizing a PAC-Bayesian bound on the kernel alignment generalization loss, which is obtained efficiently from a closed-form solution. We conduct an experimental analysis to highlight the advantages of our method w.r.t. both Boosting-based and kernel-learning state-of-the-art methods.
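The RFF approximation underlying this line of work can be sketched as follows. This is a generic illustration of Random Fourier Features for the RBF kernel, not the paper's boosting algorithm; function names and parameters are illustrative:

```python
import numpy as np

def rff_features(X, n_features=500, gamma=1.0, seed=0):
    """Map inputs to Random Fourier Features approximating the
    RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Frequencies sampled from the Fourier transform of the RBF kernel.
    omega = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ omega + b)

# Inner products of feature maps approximate the exact kernel values.
X = np.random.default_rng(1).normal(size=(5, 3))
Z = rff_features(X, n_features=20000)
K_approx = Z @ Z.T
K_exact = np.exp(-np.sum((X[:, None] - X[None, :]) ** 2, axis=-1))
```

With enough features, `K_approx` converges to `K_exact` at the usual Monte Carlo rate; methods like the one above then optimize over such features rather than fixing the kernel in advance.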
Fast Kernel Approximations for Latent Force Models and Convolved Multiple-Output Gaussian processes
A latent force model is a Gaussian process with a covariance function inspired by a differential operator. Such a covariance function is obtained by performing convolution integrals between Green's functions associated with the differential operators and covariance functions associated with the latent functions. In the classical formulation of latent force models, the covariance functions are obtained analytically by solving a double integral, leading to expressions that involve numerical evaluation of different types of error functions. As a consequence, computing the covariance matrix is considerably expensive, because it requires evaluating one or more of these error functions. In this paper, we use random Fourier features to approximate the solution of these double integrals, obtaining simpler analytical expressions for such covariance functions. We show experimental results using ordinary differential operators and provide an extension to build general kernel functions for convolved multiple-output Gaussian processes.
Comment: 10 pages, 4 figures, accepted by UAI 201
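A random-feature map also explains why such approximations reduce cost: the covariance matrix factorizes as K ≈ ΦΦᵀ, so GP linear solves can use the Woodbury identity on a small D×D system instead of the full n×n matrix. This is a generic sketch of that trick, not the paper's LFM-specific construction; names are illustrative:

```python
import numpy as np

def fast_gp_solve(phi, y, noise=0.1):
    """Solve (Phi @ Phi.T + noise * I) alpha = y in O(n * D^2) via the
    Woodbury identity, instead of O(n^3) with the full kernel matrix."""
    n, D = phi.shape
    A = phi.T @ phi + noise * np.eye(D)              # small D x D system
    alpha = (y - phi @ np.linalg.solve(A, phi.T @ y)) / noise
    return alpha

# Compare against the direct n x n solve on a small example.
rng = np.random.default_rng(0)
phi = rng.normal(size=(50, 5))                       # n=50 points, D=5 features
y = rng.normal(size=50)
alpha = fast_gp_solve(phi, y)
alpha_direct = np.linalg.solve(phi @ phi.T + 0.1 * np.eye(50), y)
```

The two solutions agree to machine precision; the saving grows as the number of data points n exceeds the number of features D.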
Bayesian inference of ODEs with Gaussian processes
Recent machine learning advances have proposed black-box estimation of unknown continuous-time system dynamics directly from data. However, earlier works are based on approximate ODE solutions or point estimates. We propose a novel Bayesian nonparametric model that uses Gaussian processes to infer posteriors of unknown ODE systems directly from data. We derive sparse variational inference with decoupled functional sampling to represent vector field posteriors. We also introduce a probabilistic shooting augmentation to enable efficient inference from arbitrarily long trajectories. The method demonstrates the benefit of computing vector field posteriors, with predictive uncertainty scores outperforming alternative methods on multiple ODE learning tasks.
Error Bounds for Learning with Vector-Valued Random Features
This paper provides a comprehensive error analysis of learning with vector-valued random features (RF). The theory is developed for RF ridge regression in a fully general infinite-dimensional input-output setting, but nonetheless applies to and improves existing finite-dimensional analyses. In contrast to comparable work in the literature, the approach proposed here relies on a direct analysis of the underlying risk functional and completely avoids the explicit RF ridge regression solution formula in terms of random matrices. This removes the need for concentration results in random matrix theory or their generalizations to random operators. The main results established in this paper include strong consistency of vector-valued RF estimators under model misspecification and minimax optimal convergence rates in the well-specified setting. The parameter complexity (number of random features) and sample complexity (number of labeled data) required to achieve such rates are consistent with Monte Carlo intuition and free from logarithmic factors.
Comment: 25 pages, 1 table
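The estimator this analysis covers can be sketched in the simplest scalar-output case (the paper's setting is vector-valued and infinite-dimensional; the function names and parameters here are illustrative, not the paper's):

```python
import numpy as np

def rf_ridge_fit(X, y, n_features=300, gamma=1.0, reg=1e-3, seed=0):
    """Fit ridge regression on random Fourier features approximating an
    RBF kernel; returns the feature map and the learned weight vector."""
    rng = np.random.default_rng(seed)
    omega = rng.normal(scale=np.sqrt(2.0 * gamma),
                       size=(X.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    featurize = lambda Z: np.sqrt(2.0 / n_features) * np.cos(Z @ omega + b)
    phi = featurize(X)
    # Regularized least squares in feature space (D x D normal equations).
    w = np.linalg.solve(phi.T @ phi + reg * np.eye(n_features), phi.T @ y)
    return featurize, w

# Fit a noisy sine and predict at the training inputs.
rng = np.random.default_rng(1)
X = np.linspace(-3, 3, 200)[:, None]
y = np.sin(X[:, 0]) + 0.05 * rng.normal(size=200)
featurize, w = rf_ridge_fit(X, y)
y_hat = featurize(X) @ w
```

The error bounds in the paper quantify how the number of random features and the number of samples jointly control the excess risk of exactly this kind of estimator.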
Novel methods for multi-view learning with applications in cyber security
Modern data is complex. It exists in many different forms, shapes and kinds. Vectors, graphs, histograms, sets, intervals, etc.: they each have distinct and varied structural properties. Tailoring models to the characteristics of various feature representations has been the subject of considerable research. In this thesis, we address the challenge of learning from data that is described by multiple heterogeneous feature representations.
This situation arises often in cyber security contexts. Data from a computer network can be represented by a graph of user authentications, a time series of network traffic, a tree of process events, etc. Each representation provides a complementary view of the holistic state of the network, and so data of this type is referred to as multi-view data. Our motivating problem in cyber security is anomaly detection: identifying unusual observations in a joint feature space, which may not appear anomalous marginally.
Our contributions include the development of novel supervised and unsupervised methods, which are applicable not only to cyber security but to multi-view data in general. We extend the generalised linear model to operate in a vector-valued reproducing kernel Hilbert space implied by an operator-valued kernel function, which can be tailored to the structural characteristics of multiple views of data. This is a highly flexible algorithm, able to predict a wide variety of response types. A distinguishing feature is the ability to simultaneously identify outlier observations with respect to the fitted model. Our proposed unsupervised learning model extends multidimensional scaling to directly map multi-view data into a shared latent space. This vector embedding captures both commonalities and disparities that exist between multiple views of the data. Throughout the thesis, we demonstrate our models using real-world cyber security datasets.
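A simple way to merge heterogeneous views into a single kernel, as kernel-based multi-view methods like the above do in more sophisticated operator-valued form, is a convex combination of per-view Gram matrices. This generic sketch is not the thesis's construction; the view data and names are illustrative:

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    """Gram matrix of the RBF kernel on a single view."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq)

def multi_view_kernel(views, weights=None):
    """Convex combination of per-view Gram matrices: a standard way to
    combine heterogeneous feature representations into one kernel."""
    if weights is None:
        weights = np.ones(len(views)) / len(views)
    return sum(w * rbf_kernel(V) for w, V in zip(weights, views))

rng = np.random.default_rng(0)
view_a = rng.normal(size=(10, 4))   # e.g. network-traffic features
view_b = rng.normal(size=(10, 7))   # e.g. authentication-graph embedding
K = multi_view_kernel([view_a, view_b])
```

The combined matrix remains a valid (symmetric positive semi-definite) kernel, so any kernel method, including anomaly detectors over the joint feature space, can consume it directly.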