17,491 research outputs found
Deterministic Policy Optimization by Combining Pathwise and Score Function Estimators for Discrete Action Spaces
Policy optimization methods have shown great promise in solving complex
reinforcement and imitation learning tasks. While model-free methods are
broadly applicable, they often require many samples to optimize complex
policies. Model-based methods greatly improve sample-efficiency but at the cost
of poor generalization, requiring a carefully handcrafted model of the system
dynamics for each task. Recently, hybrid methods have been successful in
trading off applicability for improved sample-complexity. However, these have
been limited to continuous action spaces. In this work, we present a new hybrid
method based on an approximation of the dynamics as an expectation over the
next state under the current policy. This relaxation allows us to derive a
novel hybrid policy gradient estimator, combining score function and pathwise
derivative estimators, that is applicable to discrete action spaces. We show
significant gains in sample complexity, ranging between and ,
when learning parameterized policies on Cart Pole, Acrobot, Mountain Car and
Hand Mass. Our method is applicable to both discrete and continuous action
spaces, when competing pathwise methods are limited to the latter.Comment: In AAAI 2018 proceeding
Imitating Driver Behavior with Generative Adversarial Networks
The ability to accurately predict and simulate human driving behavior is
critical for the development of intelligent transportation systems. Traditional
modeling methods have employed simple parametric models and behavioral cloning.
This paper adopts a method for overcoming the problem of cascading errors
inherent in prior approaches, resulting in realistic behavior that is robust to
trajectory perturbations. We extend Generative Adversarial Imitation Learning
to the training of recurrent policies, and we demonstrate that our model
outperforms rule-based controllers and maximum likelihood models in realistic
highway simulations. Our model both reproduces emergent behavior of human
drivers, such as lane change rate, while maintaining realistic control over
long time horizons.Comment: 8 pages, 6 figure
Mechanical MNIST: A benchmark dataset for mechanical metamodels
Metamodels, or models of models, map defined model inputs to defined model outputs. Typically, metamodels are constructed by generating a dataset through sampling a direct model and training a machine learning algorithm to predict a limited number of model outputs from varying model inputs. When metamodels are constructed to be computationally cheap, they are an invaluable tool for applications ranging from topology optimization, to uncertainty quantification, to multi-scale simulation. By nature, a given metamodel will be tailored to a specific dataset. However, the most pragmatic metamodel type and structure will often be general to larger classes of problems. At present, the most pragmatic metamodel selection for dealing with mechanical data has not been thoroughly explored. Drawing inspiration from the benchmark datasets available to the computer vision research community, we introduce a benchmark data set (Mechanical MNIST) for constructing metamodels of heterogeneous material undergoing large deformation. We then show examples of how our benchmark dataset can be used, and establish baseline metamodel performance. Because our dataset is readily available, it will enable the direct quantitative comparison between different metamodeling approaches in a pragmatic manner. We anticipate that it will enable the broader community of researchers to develop improved metamodeling techniques for mechanical data that will surpass the baseline performance that we show here.Accepted manuscrip
- …