16 research outputs found
Consistent Online Gaussian Process Regression Without the Sample Complexity Bottleneck
Gaussian processes provide a framework for nonlinear nonparametric Bayesian
inference widely applicable across science and engineering. Unfortunately,
their computational burden scales cubically with the training sample size,
which, in the case that samples arrive in perpetuity, approaches infinity. This
issue necessitates approximations for use with streaming data, which to date
mostly lack convergence guarantees. Thus, we develop the first online Gaussian
process approximation that preserves convergence to the population posterior,
i.e., asymptotic posterior consistency, while ameliorating its intractable
complexity growth with the sample size. We propose an online compression scheme
that, following each a posteriori update, fixes an error neighborhood with
respect to the Hellinger metric centered at the current posterior, and greedily
tosses out past kernel dictionary elements until its boundary is hit. We call
the resulting method Parsimonious Online Gaussian Processes (POG). For
diminishing error radius, exact asymptotic consistency is preserved (Theorem
1(i)) at the cost of unbounded memory in the limit. On the other hand, for
constant error radius, POG converges to a neighborhood of the population
posterior (Theorem 1(ii)) but with finite memory at worst determined by the
metric entropy of the feature space (Theorem 2). Experimental results are
presented on several nonlinear regression problems, which illuminate the merits
of this approach as compared with alternatives that fix the subspace dimension
defining the history of past points.
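
The compression step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Gaussian (RBF) kernel on 1-D inputs, and it measures the Hellinger distance only between posterior marginals at a fixed set of query points `Xq` (a simplification introduced here). The helper names `rbf`, `gp_posterior`, `hellinger`, and `compress` are hypothetical.

```python
import numpy as np

def rbf(a, b, ls=1.0):
    """Gaussian (RBF) kernel matrix between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(X, y, Xq, noise=0.1):
    """Posterior mean and variance of the latent function at query points Xq."""
    K = rbf(X, X) + noise**2 * np.eye(len(X))
    Kq = rbf(Xq, X)
    mean = Kq @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(Kq * np.linalg.solve(K, Kq.T).T, axis=1)
    return mean, np.maximum(var, 1e-12)

def hellinger(m1, v1, m2, v2):
    """Closed-form Hellinger distance between N(m1, v1) and N(m2, v2)."""
    bc = np.sqrt(2.0 * np.sqrt(v1 * v2) / (v1 + v2)) \
        * np.exp(-0.25 * (m1 - m2) ** 2 / (v1 + v2))
    return np.sqrt(np.maximum(1.0 - bc, 0.0))

def compress(X, y, Xq, eps):
    """Greedily drop dictionary points while the largest Hellinger distance
    (over Xq, an assumed proxy) to the uncompressed posterior stays <= eps."""
    m_ref, v_ref = gp_posterior(X, y, Xq)
    keep = list(range(len(X)))
    while len(keep) > 1:
        best_j, best_d = None, np.inf
        for j in keep:
            trial = [i for i in keep if i != j]
            m, v = gp_posterior(X[trial], y[trial], Xq)
            d = hellinger(m_ref, v_ref, m, v).max()
            if d < best_d:
                best_j, best_d = j, d
        if best_d > eps:  # the boundary of the error neighborhood is hit
            break
        keep.remove(best_j)
    return X[keep], y[keep]

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, 30)
y = np.sin(X) + 0.1 * rng.standard_normal(30)
Xq = np.linspace(-3, 3, 25)
Xc, yc = compress(X, y, Xq, eps=0.05)
print(len(Xc), "of", len(X), "dictionary points retained")
```

A larger `eps` generally allows more points to be discarded, which mirrors the memory/accuracy trade-off between the diminishing and constant error radius regimes mentioned above.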
Intelligent Autonomous Things on the Battlefield
Numerous, artificially intelligent, networked things will populate the
battlefield of the future, operating in close collaboration with human
warfighters, and fighting as teams in highly adversarial environments. This
chapter explores the characteristics, capabilities and intelligence required
of such a network of intelligent things and humans - Internet of Battle Things
(IOBT). The IOBT will experience unique challenges that are not yet well
addressed by the current generation of AI and machine learning.
Comment: This is a much expanded version of an earlier conference paper
available at arXiv:803.1125
Sparse multiresolution representations with adaptive kernels
Reproducing kernel Hilbert spaces (RKHSs) are key elements of many
non-parametric tools successfully used in signal processing, statistics, and
machine learning. In this work, we aim to address three issues of the classical
RKHS based techniques. First, they require the RKHS to be known a priori, which
is unrealistic in many applications. Furthermore, the choice of RKHS affects
the shape and smoothness of the solution, thus impacting its performance.
Second, RKHSs are ill-equipped to deal with heterogeneous degrees of
smoothness, i.e., with functions that are smooth in some parts of their domain
but vary rapidly in others. Finally, the computational complexity of evaluating
the solution of these methods grows with the number of data points, rendering
these techniques infeasible for many applications. Though kernel learning,
local kernel adaptation, and sparsity have been used to address these issues,
many of these approaches are computationally intensive or forgo optimality
guarantees. We tackle these problems by leveraging a novel integral
representation of functions in RKHSs that allows for arbitrary centers and
different kernels at each center. To address the complexity issues, we then
write the function estimation problem as a sparse functional program that
explicitly minimizes the support of the representation leading to low
complexity solutions. Despite their non-convexity and infinite dimensionality,
we show these problems can be solved exactly and efficiently by leveraging
duality, and we illustrate this new approach in simulated and real data.
Optimally Compressed Nonparametric Online Learning
Batch training of machine learning models based on neural networks is now
well established, whereas to date streaming methods are largely based on linear
models. To go beyond linear in the online setting, nonparametric methods are of
interest due to their universality and ability to stably incorporate new
information via convexity or Bayes' Rule. Unfortunately, when used online,
nonparametric methods suffer a "curse of dimensionality" which precludes their
use: their complexity scales at least with the time index. We survey online
compression tools which bring their memory under control and attain approximate
convergence. The asymptotic bias depends on a compression parameter that trades
off memory and accuracy. Further, the applications to robotics, communications,
economics, and power are discussed, as well as extensions to multi-agent
systems.
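
As a rough illustration of how a compression parameter can trade memory for accuracy in online kernel learning, the sketch below runs functional stochastic gradient descent on a regularized squared loss and, after each step, prunes kernel centers while the RKHS-norm error of the re-projected function stays within a budget `eps`. This is a simplified destructive pursuit written for this listing, not the procedure from the survey. The Gaussian kernel, the step size, the regularizer, and the names `kern`, `prune`, and `compressed_kernel_sgd` are all assumptions.

```python
import numpy as np

def kern(A, B, bw=0.5):
    """Gaussian kernel matrix between rows of A and rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * bw**2))

def prune(D, w, eps):
    """Drop centers one at a time while the RKHS-norm distance between the
    original function and its projection onto the remaining centers <= eps."""
    D0, w0 = D.copy(), w.copy()
    norm0 = w0 @ kern(D0, D0) @ w0
    while len(D) > 1:
        best = None
        for j in range(len(D)):
            Dj = np.delete(D, j, axis=0)
            Kjj = kern(Dj, Dj) + 1e-8 * np.eye(len(Dj))  # jitter for stability
            b = kern(Dj, D0) @ w0
            wj = np.linalg.solve(Kjj, b)  # projection of the original function
            err = np.sqrt(max(norm0 - 2 * wj @ b + wj @ Kjj @ wj, 0.0))
            if best is None or err < best[0]:
                best = (err, Dj, wj)
        if best[0] > eps:
            break
        D, w = best[1], best[2]
    return D, w

def compressed_kernel_sgd(X, y, step=0.5, reg=1e-3, eps=0.05):
    """Functional SGD for squared loss with per-step dictionary pruning."""
    D = np.empty((0, X.shape[1]))
    w = np.empty(0)
    for x_t, y_t in zip(X, y):
        pred = kern(x_t[None, :], D)[0] @ w      # current estimate f(x_t)
        w = (1 - step * reg) * w                 # shrinkage from the regularizer
        D = np.vstack([D, x_t[None, :]])         # new kernel center
        w = np.append(w, -step * (pred - y_t))   # its weight from the gradient
        D, w = prune(D, w, eps)
    return D, w

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 1))
y = np.sin(2 * X[:, 0]) + 0.1 * rng.standard_normal(200)
D, w = compressed_kernel_sgd(X, y)
print(len(D), "kernel centers retained out of", len(X))
```

Without the `prune` call the dictionary would gain one center per sample, which is the growth with the time index that the abstract describes.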
Policy Evaluation in Continuous MDPs with Efficient Kernelized Gradient Temporal Difference
We consider policy evaluation in infinite-horizon discounted Markov decision
problems (MDPs) with infinite spaces. We reformulate this task as a compositional
stochastic program with a function-valued decision variable that belongs to a
reproducing kernel Hilbert space (RKHS). We approach this problem via a new
functional generalization of stochastic quasi-gradient methods operating in
tandem with stochastic sparse subspace projections. The result is an extension
of gradient temporal difference learning that yields nonlinearly parameterized
value function estimates of the solution to the Bellman evaluation equation.
Our main contribution is a memory-efficient non-parametric stochastic method
guaranteed to converge exactly to the Bellman fixed point with probability 1
with attenuating step-sizes. Further, with constant step-sizes, we obtain mean
convergence to a neighborhood and show that the value function estimates have
finite complexity. In the Mountain Car domain, we observe faster convergence to
lower Bellman error solutions than existing approaches with a fraction of the
required memory.
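
For reference, the Bellman evaluation equation that the abstract refers to can be written in its standard form as follows. The notation is assumed here rather than taken from the paper: state s, action a drawn from the policy π, next state s', reward r, and discount factor γ.

```latex
V^{\pi}(s) \;=\; \mathbb{E}\big[\, r(s, a, s') + \gamma\, V^{\pi}(s') \;\big|\; s,\ a \sim \pi(\cdot \mid s) \,\big],
\qquad \gamma \in (0, 1)
```

The unknown V^π appears inside the expectation on the right-hand side. When the fixed point is sought by minimizing a squared Bellman error, an expectation ends up nested inside a nonlinear function, which is consistent with the abstract's compositional stochastic program formulation.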
Stochastic Policy Gradient Ascent in Reproducing Kernel Hilbert Spaces
Reinforcement learning consists of finding policies that maximize an expected
cumulative long-term reward in a Markov decision process with unknown
transition probabilities and instantaneous rewards. In this paper, we consider
the problem of finding such optimal policies while assuming they are continuous
functions belonging to a reproducing kernel Hilbert space (RKHS). To learn the
optimal policy we introduce a stochastic policy gradient ascent algorithm with
three novel features: (i) The stochastic estimates of policy gradients
are unbiased. (ii) The variance of stochastic gradients is reduced by drawing
on ideas from numerical differentiation. (iii) Policy complexity is controlled
using sparse RKHS representations. Novel feature (i) is instrumental in proving
convergence to a stationary point of the expected cumulative reward. Novel
feature (ii) facilitates reasonable convergence times. Novel feature (iii) is a
necessity in practical implementations which we show can be done in a way that
does not eliminate convergence guarantees. Numerical examples in standard
problems illustrate successful learning of policies with low complexity
representations which are close to stationary points of the expected cumulative
reward.
Asynchronous Incremental Stochastic Dual Descent Algorithm for Network Resource Allocation
Stochastic network optimization problems entail finding resource allocation
policies that are optimal on average but must be designed in an online
fashion. Such problems are ubiquitous in communication networks, where
resources such as energy and bandwidth are divided among nodes to satisfy
certain long-term objectives. This paper proposes an asynchronous incremental
dual descent resource allocation algorithm that utilizes delayed stochastic
gradients for carrying out its updates. The proposed algorithm is well-suited
to heterogeneous networks as it allows the computationally-challenged or
energy-starved nodes to, at times, postpone the updates. The asymptotic
analysis of the proposed algorithm is carried out, establishing dual
convergence under both constant and diminishing step sizes. It is also shown
that with constant step size, the proposed resource allocation policy is
asymptotically near-optimal. An application involving multi-cell coordinated
beamforming is detailed, demonstrating the usefulness of the proposed
algorithm.
Nonparametric Compositional Stochastic Optimization for Risk-Sensitive Kernel Learning
In this work, we address optimization problems where the objective function
is a nonlinear function of an expected value, i.e., compositional stochastic
strongly convex programs. We consider the case where the decision variable is
not vector-valued but instead belongs to a reproducing kernel Hilbert space
(RKHS), motivated by risk-aware formulations of supervised learning and Markov
Decision Processes defined over continuous spaces.
We develop the first memory-efficient stochastic algorithm for this setting,
which we call Compositional Online Learning with Kernels (COLK). COLK, at its
core a two-time-scale stochastic approximation method, addresses the fact that
(i) compositions of expected value problems cannot be addressed by classical
stochastic gradient due to the presence of the inner expectation; and (ii) the
RKHS-induced parameterization has complexity proportional to the
iteration index, which is mitigated through greedily constructed subspace
projections. We establish almost sure convergence of COLK with attenuating
step-sizes, and linear convergence in mean to a neighborhood with constant
step-sizes, as well as the fact that its complexity is at-worst finite. The
experiments with robust formulations of supervised learning demonstrate that
COLK reliably converges, attains consistent performance across training runs,
and thus overcomes overfitting.
Continuous-time Value Function Approximation in Reproducing Kernel Hilbert Spaces
Motivated by the success of reinforcement learning (RL) for discrete-time
tasks such as AlphaGo and Atari games, there has been a recent surge of
interest in using RL for continuous-time control of physical systems (cf. many
challenging tasks in OpenAI Gym and DeepMind Control Suite). Since
discretization of time is susceptible to error, it is methodologically more
desirable to handle the system dynamics directly in continuous time. However,
very few techniques exist for continuous-time RL and they lack flexibility in
value function approximation. In this paper, we propose a novel framework for
model-based continuous-time value function approximation in reproducing kernel
Hilbert spaces. The resulting framework is so flexible that it can accommodate
any kind of kernel-based approach, such as Gaussian processes and kernel
adaptive filters, and it allows us to handle uncertainties and nonstationarity
without prior knowledge about the environment or what basis functions to
employ. We demonstrate the validity of the presented framework through
experiments.
Comment: NeurIPS 2018 - Advances in Neural Information Processing Systems
(with the supplementary document)
Federated Classification using Parsimonious Functions in Reproducing Kernel Hilbert Spaces
Federated learning forms a global model using data collected from a
federation of agents. This type of learning has two main challenges: the agents
generally don't collect data over the same distribution, and the agents have
limited capabilities of storing and transmitting data. Therefore, it is
impractical for each agent to send the entire data over the network. Instead,
each agent must form a local model and decide what information is fundamental
to the learning problem, which will be sent to a central unit. The central unit
can then form the global model using only the information received from the
agents. We propose a method that tackles these challenges. First, each agent
forms a local model using a low complexity reproducing kernel Hilbert space
representation. From the model, the agents identify the fundamental samples
which are sent to the central unit. The fundamental samples are obtained by
solving the dual problem. The central unit then forms the global model. We show
that the solution of the federated learner converges to that of the centralized
learner asymptotically as the sample size increases. The performance of the
proposed algorithm is evaluated using experiments with both simulated data and
real data sets from an activity recognition task, for which the data is
collected from a wearable device. The experimental results show that the
accuracy of our method converges to that of a centralized learner with
increasing sample size.