Competitive on-line learning with a convex loss function
We consider the problem of sequential decision making under uncertainty in
which the loss caused by a decision depends on the following binary
observation. In competitive on-line learning, the goal is to design decision
algorithms that are almost as good as the best decision rules in a wide
benchmark class, without making any assumptions about the way the observations
are generated. However, standard algorithms in this area can only deal with
finite-dimensional (often countable) benchmark classes. In this paper we give
similar results for decision rules ranging over an arbitrary reproducing kernel
Hilbert space. For example, it is shown that for a wide class of loss functions
(including the standard square, absolute, and log loss functions) the average
loss of the master algorithm, over the first $N$ observations, does not exceed
the average loss of the best decision rule with a bounded norm plus
$O(N^{-1/4})$. Our proof technique is very different from the standard ones and
is based on recent results about defensive forecasting. Given the probabilities
produced by a defensive forecasting algorithm, which are known to be well
calibrated and to have good resolution in the long run, we use the expected
loss minimization principle to find a suitable decision.
Comment: 26 pages
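As a concrete illustration of the expected loss minimization step, here is a minimal Python sketch (the grid search, the square loss, and the function name are illustrative assumptions, not the paper's master algorithm): given a forecast probability p for the binary outcome, it returns the decision minimizing the expected loss.

```python
import numpy as np

# Minimal sketch (not the paper's algorithm): given a forecast probability p
# for a binary outcome, choose the decision d minimizing the expected loss
# p * loss(d, 1) + (1 - p) * loss(d, 0) over a finite decision grid.
def expected_loss_decision(p, loss, grid):
    exp_loss = p * loss(grid, 1.0) + (1.0 - p) * loss(grid, 0.0)
    return grid[np.argmin(exp_loss)]

square_loss = lambda d, y: (d - y) ** 2
grid = np.linspace(0.0, 1.0, 1001)
print(expected_loss_decision(0.3, square_loss, grid))  # ~0.3: square loss is minimized at d = p
```

For the square loss the expected-loss minimizer is d = p; the other losses in the paper's class (absolute, log) can be swapped in for `square_loss` without changing the minimization step.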
Koopman Kernel Regression
Many machine learning approaches for decision making, such as reinforcement
learning, rely on simulators or predictive models to forecast the
time-evolution of quantities of interest, e.g., the state of an agent or the
reward of a policy. Forecasts of such complex phenomena are commonly described
by highly nonlinear dynamical systems, making their use in optimization-based
decision-making challenging. Koopman operator theory offers a beneficial
paradigm for addressing this problem by characterizing forecasts via linear
time-invariant (LTI) ODEs, turning multi-step forecasts into sparse matrix
multiplication. Though there exists a variety of learning approaches, they
usually lack crucial learning-theoretic guarantees, making the behavior of the
obtained models with increasing data and dimensionality unclear. We address
this gap by deriving a universal Koopman-invariant reproducing kernel
Hilbert space (RKHS) that solely spans transformations into LTI dynamical
systems. The resulting Koopman Kernel Regression (KKR) framework enables the
use of statistical learning tools from function approximation for novel
convergence results and generalization error bounds under weaker assumptions
than existing work. Our experiments demonstrate superior forecasting
performance compared to Koopman operator and sequential data predictors in
RKHS.
Comment: Accepted to the Thirty-seventh Conference on Neural Information Processing Systems (NeurIPS 2023)
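To make the "multi-step forecasts as matrix multiplication" point concrete, here is a toy EDMD-style sketch (an illustrative stand-in, not the paper's KKR estimator; the dictionary, toy map, and variable names are assumptions): lift states with a fixed feature dictionary, fit a linear operator by least squares, and forecast by repeated matrix-vector products.

```python
import numpy as np

# Toy EDMD-style sketch (illustrative, not the paper's KKR estimator):
# lift states, fit a linear operator by least squares, iterate it to forecast.
def lift(x):
    # Hypothetical polynomial dictionary; it is NOT Koopman-invariant for the
    # toy map below, so multi-step forecasts are only approximate.
    return np.array([1.0, x, x ** 2])

xs = np.linspace(-1.0, 1.0, 200)
ys = 0.9 * xs - 0.1 * xs ** 3                        # one step of a toy nonlinear map
Phi_x = np.stack([lift(x) for x in xs])              # (n, d) lifted inputs
Phi_y = np.stack([lift(y) for y in ys])              # (n, d) lifted one-step outputs
K = np.linalg.lstsq(Phi_x, Phi_y, rcond=None)[0].T   # lift(y) ~= K @ lift(x)

z = lift(0.5)                                        # lifted initial state
for _ in range(5):                                   # 5-step forecast: z <- K z
    z = K @ z
print(z[1])                                          # approximate state after 5 steps
```

The approximation error here comes precisely from the dictionary not spanning a Koopman-invariant subspace, which is the gap the paper's invariant RKHS construction is designed to close.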
Learning from Distributions via Support Measure Machines
This paper presents a kernel-based discriminative learning framework on
probability measures. Rather than relying on large collections of vectorial
training examples, our framework learns using a collection of probability
distributions that have been constructed to meaningfully represent training
data. By representing these probability distributions as mean embeddings in the
reproducing kernel Hilbert space (RKHS), we are able to apply many standard
kernel-based learning techniques in straightforward fashion. To accomplish
this, we construct a generalization of the support vector machine (SVM) called
a support measure machine (SMM). Our analyses of SMMs provide several insights
into their relationship to traditional SVMs. Based on such insights, we propose
a flexible SVM (Flex-SVM) that places different kernel functions on each
training example. Experimental results on both synthetic and real-world data
demonstrate the effectiveness of our proposed framework.
Comment: Advances in Neural Information Processing Systems 25 (NIPS 2012)
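The mean-embedding machinery is easy to sketch: the kernel between two distributions is the RKHS inner product of their mean embeddings, which for sample sets reduces to an average of pairwise kernel evaluations. The snippet below is a minimal sketch under that reading; the RBF kernel, bandwidth, and sample data are illustrative assumptions.

```python
import numpy as np

def rbf(x, y, gamma=1.0):
    # Gaussian kernel on individual points; gamma is an illustrative choice.
    return np.exp(-gamma * np.sum((x - y) ** 2))

def mean_embedding_kernel(X, Y, gamma=1.0):
    # Empirical inner product <mu_P, mu_Q> of the mean embeddings of two
    # sample sets: the average over all pairwise kernel evaluations.
    return float(np.mean([[rbf(x, y, gamma) for y in Y] for x in X]))

rng = np.random.default_rng(0)
P = rng.normal(0.0, 1.0, size=(50, 2))   # samples representing distribution P
Q = rng.normal(0.5, 1.0, size=(60, 2))   # samples representing distribution Q
print(mean_embedding_kernel(P, Q))       # one entry of an SMM Gram matrix
```

Stacking such entries over all pairs of training distributions yields a Gram matrix that standard kernel machinery (e.g., an SVM solver) can consume directly, which is what lets SMMs reuse the SVM toolchain.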
Nonparametric likelihood based estimation of linear filters for point processes
We consider models for multivariate point processes where the intensity is
given nonparametrically in terms of functions in a reproducing kernel Hilbert
space. The likelihood function involves a time integral and is consequently not
given in terms of a finite number of kernel evaluations. The main result is a
representation of the gradient of the log-likelihood, which we use to derive
computable approximations of the log-likelihood and the gradient by time
discretization. These approximations are then used to minimize the approximate
penalized negative log-likelihood. For time and memory efficiency the implementation
relies crucially on the use of sparse matrices. As an illustration we consider
neuron network modeling, and we use this example to investigate how the
computational costs of the approximations depend on the resolution of the time
discretization. The implementation is available in the R package ppstat.
Comment: 10 pages, 3 figures
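A minimal sketch of the kind of time discretization involved (illustrative Python rather than the paper's R implementation; the toy intensity, event times, and grid size are assumptions): the point-process log-likelihood $\sum_i \log\lambda(t_i) - \int_0^T \lambda(t)\,dt$ is approximated by replacing the integral with a Riemann sum on a regular grid.

```python
import numpy as np

# Hedged sketch (illustrative, not the ppstat implementation): approximate the
# point-process log-likelihood  sum_i log lam(t_i) - integral_0^T lam(t) dt
# by replacing the time integral with a midpoint Riemann sum on a regular grid.
def approx_log_lik(event_times, lam, T, n_grid=1000):
    dt = T / n_grid
    mid = (np.arange(n_grid) + 0.5) * dt          # midpoints of the grid cells
    integral = lam(mid).sum() * dt                # time-discretized integral
    return np.sum(np.log(lam(np.asarray(event_times)))) - integral

lam = lambda t: 1.0 + 0.5 * np.sin(t)             # toy intensity function
print(approx_log_lik([0.4, 1.7, 2.9, 4.2], lam, T=5.0))
```

Refining `n_grid` trades computation for accuracy, which mirrors the abstract's question of how the cost of the approximations scales with the resolution of the time discretization.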