Online Bayesian Multiple Kernel Bipartite Ranking
Bipartite ranking aims to maximize the area under the ROC curve (AUC) of a decision function. To tackle this problem when the data appears sequentially, existing online AUC maximization methods focus on seeking a point estimate of the decision function in a linear or predefined single kernel space, and cannot learn effective kernels automatically from the streaming data. In this paper, we first develop a Bayesian multiple kernel bipartite ranking model, which circumvents the kernel selection problem by estimating a posterior distribution over the model weights. To make our model applicable to streaming data, we then present a kernelized online Bayesian passive-aggressive learning framework by maintaining a variational approximation to the posterior based on data augmentation. Furthermore, to efficiently deal with large-scale data, we design a fixed budget strategy which can effectively control online model complexity. Extensive experimental studies confirm the superiority of our Bayesian multi-kernel approach.
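As a point of reference for the objective named in this abstract, the empirical AUC can be read as the fraction of positive-negative pairs that a scoring function orders correctly. The sketch below illustrates only that pairwise definition, not the paper's Bayesian multiple kernel method; the function name `empirical_auc` and the toy scores are assumptions made for illustration.

```python
def empirical_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5.

    Hypothetical helper for illustration: labels equal to 1 are treated as
    positives, everything else as negatives.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y != 1]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")

    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


# Toy data (made up): two positives, two negatives.
toy_auc = empirical_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

In this toy case three of the four positive-negative pairs are ordered correctly, so `toy_auc` is 0.75. The double loop is quadratic in the number of examples, which suggests why streaming settings call for online updates rather than recomputation over all pairs.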
On the Design of LQR Kernels for Efficient Controller Learning
Finding optimal feedback controllers for nonlinear dynamic systems from data
is hard. Recently, Bayesian optimization (BO) has been proposed as a powerful
framework for direct controller tuning from experimental trials. For selecting
the next query point and finding the global optimum, BO relies on a
probabilistic description of the latent objective function, typically a
Gaussian process (GP). As is shown herein, GPs with a common kernel choice can,
however, lead to poor learning outcomes on standard quadratic control problems.
For a first-order system, we construct two kernels that specifically leverage
the structure of the well-known Linear Quadratic Regulator (LQR), yet retain
the flexibility of Bayesian nonparametric learning. Simulations of uncertain
linear and nonlinear systems demonstrate that the LQR kernels yield superior
learning performance. Comment: 8 pages, 5 figures, to appear in 56th IEEE Conference on Decision and
Control (CDC 2017)
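To make the "standard quadratic control problem" for a first-order system concrete, the sketch below evaluates the infinite-horizon quadratic cost of a static feedback gain and compares it with the gain from the scalar Riccati recursion. It assumes a discrete-time system x_{k+1} = a x_k + b u_k with cost sum of q x_k^2 + r u_k^2; the parameter values and function names are hypothetical, and the code does not reproduce the LQR kernels proposed in the paper.

```python
def lqr_cost(theta, a=0.9, b=1.0, q=1.0, r=1.0, x0=1.0):
    """Infinite-horizon cost of u_k = -theta * x_k (assumed discrete-time model).

    With x_k = (a - b*theta)^k * x0 the cost is a geometric series, which is
    finite only if the closed loop is stable.
    """
    closed_loop = a - b * theta
    if abs(closed_loop) >= 1.0:
        return float("inf")  # unstable closed loop: cost diverges
    return x0 ** 2 * (q + r * theta ** 2) / (1.0 - closed_loop ** 2)


def riccati_gain(a=0.9, b=1.0, q=1.0, r=1.0, iterations=200):
    """Fixed-point iteration of the scalar discrete-time Riccati equation."""
    p = q
    for _ in range(iterations):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    gain = a * b * p / (r + b * b * p)
    return gain, p


best_gain, riccati_p = riccati_gain()
cost_at_best = lqr_cost(best_gain)
```

Under these assumptions `cost_at_best` agrees with `riccati_p` (for x0 = 1), and nearby gains give a larger cost. The map from gain to cost is the kind of latent objective that a controller-tuning method would query one trial at a time.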
Bayesian kernel-based system identification with quantized output data
In this paper we introduce a novel method for linear system identification
with quantized output data. We model the impulse response as a zero-mean
Gaussian process whose covariance (kernel) is given by the recently proposed
stable spline kernel, which encodes information on regularity and exponential
stability. This serves as a starting point to cast our system identification
problem into a Bayesian framework. We employ Markov Chain Monte Carlo (MCMC)
methods to provide an estimate of the system. In particular, we show how to
design a Gibbs sampler which quickly converges to the target distribution.
Numerical simulations show a substantial improvement in the accuracy of the
estimates over state-of-the-art kernel-based methods when employed in
identification of systems with quantized data. Comment: Submitted to IFAC SysId 201
A new kernel-based approach to system identification with quantized output data
In this paper we introduce a novel method for linear system identification
with quantized output data. We model the impulse response as a zero-mean
Gaussian process whose covariance (kernel) is given by the recently proposed
stable spline kernel, which encodes information on regularity and exponential
stability. This serves as a starting point to cast our system identification
problem into a Bayesian framework. We employ Markov Chain Monte Carlo methods
to provide an estimate of the system. In particular, we design two methods
based on the so-called Gibbs sampler that also allow estimation of the kernel
hyperparameters by marginal likelihood maximization via the
expectation-maximization method. Numerical simulations show the effectiveness
of the proposed scheme, as compared to the state-of-the-art kernel-based
methods when these are employed in system identification with quantized data. Comment: 10 pages, 4 figures
Bayesian Approximate Kernel Regression with Variable Selection
Nonlinear kernel regression models are often used in statistics and machine
learning because they are more accurate than linear models. Variable selection
for kernel regression models is a challenge partly because, unlike the linear
regression setting, there is no clear concept of an effect size for regression
coefficients. In this paper, we propose a novel framework that provides an
effect size analog of each explanatory variable for Bayesian kernel regression
models when the kernel is shift-invariant --- for example, the Gaussian kernel.
We use function analytic properties of shift-invariant reproducing kernel
Hilbert spaces (RKHS) to define a linear vector space that: (i) captures
nonlinear structure, and (ii) can be projected onto the original explanatory
variables. The projection onto the original explanatory variables serves as an
analog of effect sizes. The specific function analytic property we use is that
shift-invariant kernel functions can be approximated via random Fourier bases.
Based on the random Fourier expansion we propose a computationally efficient
class of Bayesian approximate kernel regression (BAKR) models for both
nonlinear regression and binary classification for which one can compute an
analog of effect sizes. We illustrate the utility of BAKR by examining two
important problems in statistical genetics: genomic selection (i.e. phenotypic
prediction) and association mapping (i.e. inference of significant variants or
loci). State-of-the-art methods for genomic selection and association mapping
are based on kernel regression and linear models, respectively. BAKR is the
first method that is competitive in both settings. Comment: 22 pages, 3 figures, 3 tables; theory added; new simulations
presented; references added
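The random Fourier approximation mentioned in this abstract can be sketched as follows: for the Gaussian kernel, an inner product of randomized cosine features approximates the exact kernel value. This is a generic random Fourier feature construction under assumed settings (bandwidth `sigma`, 4000 features, fixed seed), not the BAKR model itself; all names below are hypothetical.

```python
import math
import random


def gaussian_kernel(x, y, sigma=1.0):
    """Exact Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def make_rff(dim, num_features, sigma=1.0, seed=0):
    """Return a feature map z with z(x) . z(y) approximating the Gaussian kernel."""
    rng = random.Random(seed)
    # Frequencies are drawn from the kernel's spectral density, N(0, I / sigma^2).
    freqs = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
             for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def features(x):
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(freqs, phases)]

    return features


features = make_rff(dim=3, num_features=4000)
x = [0.2, -0.1, 0.5]
y = [0.0, 0.3, 0.4]
approx = sum(u * v for u, v in zip(features(x), features(y)))
exact = gaussian_kernel(x, y)
```

Because the feature map is an explicit finite-dimensional vector, a model that is linear in these features can be related back to the original explanatory variables, which is the role the abstract assigns to the projection step.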
Regularized linear system identification using atomic, nuclear and kernel-based norms: the role of the stability constraint
Inspired by ideas taken from the machine learning literature, new
regularization techniques have been recently introduced in linear system
identification. In particular, all the adopted estimators solve a regularized
least squares problem, differing in the nature of the penalty term assigned to
the impulse response. Popular choices include atomic and nuclear norms (applied
to Hankel matrices) as well as norms induced by the so-called stable spline
kernels. In this paper, a comparative study of estimators based on these
different types of regularizers is reported. Our findings reveal that stable
spline kernels outperform approaches based on atomic and nuclear norms since
they suitably embed information on impulse response stability and smoothness.
This point is illustrated using the Bayesian interpretation of regularization.
We also design a new class of regularizers defined by "integral" versions of
stable spline/TC kernels. Under quite realistic experimental conditions, the
new estimators outperform classical prediction error methods also when the
latter are equipped with an oracle for model order selection.
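For readers unfamiliar with the stable spline/TC kernels named above, one commonly used form of the TC (first-order stable spline) kernel is K(i, j) = c * alpha^max(i, j) with 0 <= alpha < 1. The sketch below builds that matrix; the values of `alpha` and `scale` are assumptions chosen for illustration, and the "integral" versions proposed in the paper are not reproduced here.

```python
def tc_kernel(n, alpha=0.8, scale=1.0):
    """Build the n x n TC kernel matrix K[i][j] = scale * alpha**max(i, j).

    Indices run from 1 to n (impulse response lags). Hypothetical helper:
    alpha controls the assumed exponential decay rate of the impulse response.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1) to encode exponential decay")
    return [[scale * alpha ** max(i, j) for j in range(1, n + 1)]
            for i in range(1, n + 1)]


K = tc_kernel(5)
prior_variances = [K[i][i] for i in range(5)]  # variance assigned to each lag
```

The diagonal decays geometrically with the lag, so the prior encodes an exponentially stable impulse response, while the off-diagonal entries correlate neighbouring lags and thereby encode smoothness.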
Maximum Entropy Vector Kernels for MIMO system identification
Recent contributions have framed linear system identification as a
nonparametric regularized inverse problem. Relying on $\ell_2$-type
regularization which accounts for the stability and smoothness of the impulse
response to be estimated, these approaches have been shown to be competitive
w.r.t. classical parametric methods. In this paper, adopting Maximum Entropy
arguments, we derive a new $\ell_2$ penalty deriving from a vector-valued
kernel; to do so we exploit the structure of the Hankel matrix, thus
controlling at the same time complexity, measured by the McMillan degree,
stability and smoothness of the identified models. As a special case we recover
the nuclear norm penalty on the squared block Hankel matrix. In contrast with
previous literature on reweighted nuclear norm penalties, our kernel is
described by a small number of hyper-parameters, which are iteratively updated
through marginal likelihood maximization; constraining the structure of the
kernel acts as a (hyper)regularizer which helps control the effective
degrees of freedom of our estimator. To optimize the marginal likelihood we
adapt a Scaled Gradient Projection (SGP) algorithm which is proved to be
significantly computationally cheaper than other first and second order
off-the-shelf optimization methods. The paper also contains an extensive
comparison with many state-of-the-art methods on several Monte-Carlo studies,
which confirms the effectiveness of our procedure.