38,183 research outputs found
?????? ?????? ??????????????? ?????? ????????????
Department of Computer Science and EngineeringRecently deep reinforcement learning (DRL) algorithms show super human performances in the simulated game domains. In practical points, the sample efficiency is also one of the most important measures to determine the performance of a model. Especially for the environment of large search spaces (e.g. continuous action space), it is very critical condition to achieve the state-of-the-art performance.
In this thesis, we design a model to be applicable to multi-end games in continuous space with high sample efficiency. A multi-end game has several sub-games which are independent each other but affect the result of the game by some rules of its domain. We verify the algorithm in the environment of simulated curling.clos
On the Design of LQR Kernels for Efficient Controller Learning
Finding optimal feedback controllers for nonlinear dynamic systems from data
is hard. Recently, Bayesian optimization (BO) has been proposed as a powerful
framework for direct controller tuning from experimental trials. For selecting
the next query point and finding the global optimum, BO relies on a
probabilistic description of the latent objective function, typically a
Gaussian process (GP). As is shown herein, GPs with a common kernel choice can,
however, lead to poor learning outcomes on standard quadratic control problems.
For a first-order system, we construct two kernels that specifically leverage
the structure of the well-known Linear Quadratic Regulator (LQR), yet retain
the flexibility of Bayesian nonparametric learning. Simulations of uncertain
linear and nonlinear systems demonstrate that the LQR kernels yield superior
learning performance.Comment: 8 pages, 5 figures, to appear in 56th IEEE Conference on Decision and
Control (CDC 2017
Sequential Design for Ranking Response Surfaces
We propose and analyze sequential design methods for the problem of ranking
several response surfaces. Namely, given response surfaces over a
continuous input space , the aim is to efficiently find the index of
the minimal response across the entire . The response surfaces are not
known and have to be noisily sampled one-at-a-time. This setting is motivated
by stochastic control applications and requires joint experimental design both
in space and response-index dimensions. To generate sequential design
heuristics we investigate stepwise uncertainty reduction approaches, as well as
sampling based on posterior classification complexity. We also make connections
between our continuous-input formulation and the discrete framework of pure
regret in multi-armed bandits. To model the response surfaces we utilize
kriging surrogates. Several numerical examples using both synthetic data and an
epidemics control problem are provided to illustrate our approach and the
efficacy of respective adaptive designs.Comment: 26 pages, 7 figures (updated several sections and figures
Learning from Distributions via Support Measure Machines
This paper presents a kernel-based discriminative learning framework on
probability measures. Rather than relying on large collections of vectorial
training examples, our framework learns using a collection of probability
distributions that have been constructed to meaningfully represent training
data. By representing these probability distributions as mean embeddings in the
reproducing kernel Hilbert space (RKHS), we are able to apply many standard
kernel-based learning techniques in straightforward fashion. To accomplish
this, we construct a generalization of the support vector machine (SVM) called
a support measure machine (SMM). Our analyses of SMMs provides several insights
into their relationship to traditional SVMs. Based on such insights, we propose
a flexible SVM (Flex-SVM) that places different kernel functions on each
training example. Experimental results on both synthetic and real-world data
demonstrate the effectiveness of our proposed framework.Comment: Advances in Neural Information Processing Systems 2
Adaptive Multiple Importance Sampling for Gaussian Processes
In applications of Gaussian processes where quantification of uncertainty is
a strict requirement, it is necessary to accurately characterize the posterior
distribution over Gaussian process covariance parameters. Normally, this is
done by means of standard Markov chain Monte Carlo (MCMC) algorithms. Motivated
by the issues related to the complexity of calculating the marginal likelihood
that can make MCMC algorithms inefficient, this paper develops an alternative
inference framework based on Adaptive Multiple Importance Sampling (AMIS). This
paper studies the application of AMIS in the case of a Gaussian likelihood, and
proposes the Pseudo-Marginal AMIS for non-Gaussian likelihoods, where the
marginal likelihood is unbiasedly estimated. The results suggest that the
proposed framework outperforms MCMC-based inference of covariance parameters in
a wide range of scenarios and remains competitive for moderately large
dimensional parameter spaces.Comment: 27 page
- âŠ