Adaptive Simulation-based Training of AI Decision-makers using Bayesian Optimization
This work studies how an AI-controlled dog-fighting agent with tunable
decision-making parameters can learn to optimize performance against an
intelligent adversary, as measured by a stochastic objective function evaluated
on simulated combat engagements. Gaussian process Bayesian optimization (GPBO)
techniques are developed to automatically learn global Gaussian Process (GP)
surrogate models, which provide statistical performance predictions in both
explored and unexplored areas of the parameter space. This allows a learning
engine to sample full-combat simulations at parameter values that are most
likely to optimize performance and also provide highly informative data points
for improving future predictions. However, standard GPBO methods do not provide
a reliable surrogate model for the highly volatile objective functions found in
aerial combat, and thus do not reliably identify global maxima. These issues
are addressed by novel Repeat Sampling (RS) and Hybrid Repeat/Multi-point
Sampling (HRMS) techniques. Simulation studies show that HRMS improves the
accuracy of GP surrogate models, allowing AI decision-makers to more accurately
predict performance and efficiently tune parameters. Comment: submitted to JAIS for review
Deterministic Policy Optimization by Combining Pathwise and Score Function Estimators for Discrete Action Spaces
Policy optimization methods have shown great promise in solving complex
reinforcement and imitation learning tasks. While model-free methods are
broadly applicable, they often require many samples to optimize complex
policies. Model-based methods greatly improve sample-efficiency but at the cost
of poor generalization, requiring a carefully handcrafted model of the system
dynamics for each task. Recently, hybrid methods have been successful in
trading off applicability for improved sample-complexity. However, these have
been limited to continuous action spaces. In this work, we present a new hybrid
method based on an approximation of the dynamics as an expectation over the
next state under the current policy. This relaxation allows us to derive a
novel hybrid policy gradient estimator, combining score function and pathwise
derivative estimators, that is applicable to discrete action spaces. We show
significant gains in sample complexity, ranging between and ,
when learning parameterized policies on Cart Pole, Acrobot, Mountain Car and
Hand Mass. Our method is applicable to both discrete and continuous action
spaces, whereas competing pathwise methods are limited to the latter. Comment: In AAAI 2018 proceedings
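For readers unfamiliar with the score function estimator named in the abstract above, here is a minimal sketch for a softmax policy over a small discrete action set. It is not the paper's hybrid estimator; the function names, the one-step reward setting, and the sample size are assumptions chosen for illustration only.

```python
import math
import random


def softmax(logits):
    """Convert logits into a categorical probability vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def exact_gradient(logits, rewards):
    """Analytic gradient of E[r(a)] w.r.t. the logits: pi_k * (r_k - E[r])."""
    pi = softmax(logits)
    expected = sum(p * r for p, r in zip(pi, rewards))
    return [p * (r - expected) for p, r in zip(pi, rewards)]


def score_function_gradient(logits, rewards, n_samples, seed=0):
    """Monte Carlo estimate: average of r(a) * grad log pi(a) over sampled a."""
    rng = random.Random(seed)
    pi = softmax(logits)
    grad = [0.0] * len(logits)
    for _ in range(n_samples):
        a = rng.choices(range(len(pi)), weights=pi)[0]
        for k in range(len(pi)):
            # For a softmax policy, d log pi(a) / d logit_k = 1[k == a] - pi_k.
            grad[k] += rewards[a] * ((1.0 if k == a else 0.0) - pi[k])
    return [g / n_samples for g in grad]


if __name__ == "__main__":
    logits, rewards = [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]
    print(exact_gradient(logits, rewards))
    print(score_function_gradient(logits, rewards, n_samples=20000, seed=1))
```

The sampled estimate needs only action samples and their rewards, which is why it applies to discrete actions; its variance is the usual price, and the analytic gradient is included here only as a check.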
Imitating Driver Behavior with Generative Adversarial Networks
The ability to accurately predict and simulate human driving behavior is
critical for the development of intelligent transportation systems. Traditional
modeling methods have employed simple parametric models and behavioral cloning.
This paper adopts a method for overcoming the problem of cascading errors
inherent in prior approaches, resulting in realistic behavior that is robust to
trajectory perturbations. We extend Generative Adversarial Imitation Learning
to the training of recurrent policies, and we demonstrate that our model
outperforms rule-based controllers and maximum likelihood models in realistic
highway simulations. Our model reproduces emergent behavior of human
drivers, such as lane change rate, while maintaining realistic control over
long time horizons. Comment: 8 pages, 6 figures
Mechanical MNIST: A benchmark dataset for mechanical metamodels
Metamodels, or models of models, map defined model inputs to defined model outputs. Typically, metamodels are constructed by generating a dataset through sampling a direct model and training a machine learning algorithm to predict a limited number of model outputs from varying model inputs. When metamodels are constructed to be computationally cheap, they are an invaluable tool for applications ranging from topology optimization, to uncertainty quantification, to multi-scale simulation. By nature, a given metamodel will be tailored to a specific dataset. However, the most pragmatic metamodel type and structure will often be general to larger classes of problems. At present, the most pragmatic metamodel selection for dealing with mechanical data has not been thoroughly explored. Drawing inspiration from the benchmark datasets available to the computer vision research community, we introduce a benchmark dataset (Mechanical MNIST) for constructing metamodels of heterogeneous material undergoing large deformation. We then show examples of how our benchmark dataset can be used, and establish baseline metamodel performance. Because our dataset is readily available, it will enable the direct quantitative comparison between different metamodeling approaches in a pragmatic manner. We anticipate that it will enable the broader community of researchers to develop improved metamodeling techniques for mechanical data that will surpass the baseline performance that we show here. Accepted manuscript
Using Machine Learning to Emulate Agent-Based Simulations
In this proof-of-concept work, we evaluate the performance of multiple
machine-learning methods as statistical emulators for use in the analysis of
agent-based models (ABMs). Analysing ABM outputs can be challenging, as the
relationships between input parameters can be non-linear or chaotic even
in relatively simple models, and each model run can require significant CPU
time. Statistical emulation, in which a statistical model of the ABM is
constructed to facilitate detailed model analyses, has been proposed as an
alternative to computationally costly Monte Carlo methods. Here we compare
multiple machine-learning methods for ABM emulation in order to determine the
approaches best suited to emulating the complex behaviour of ABMs. Our results
suggest that, in most scenarios, artificial neural networks (ANNs) and
gradient-boosted trees outperform Gaussian process emulators, currently the
most commonly used method for the emulation of complex computational models.
ANNs produced the most accurate model replications in scenarios with high
numbers of model runs, although training times were longer than the other
methods. We propose that agent-based modelling would benefit from using
machine-learning methods for emulation, as this can facilitate more robust
sensitivity analyses for the models while also reducing CPU time consumption
when calibrating and analysing the simulation.
InfoGAIL: Interpretable Imitation Learning from Visual Demonstrations
The goal of imitation learning is to mimic expert behavior without access to
an explicit reward signal. Expert demonstrations provided by humans, however,
often show significant variability due to latent factors that are typically not
explicitly modeled. In this paper, we propose a new algorithm that can infer
the latent structure of expert demonstrations in an unsupervised way. Our
method, built on top of Generative Adversarial Imitation Learning, can not only
imitate complex behaviors, but also learn interpretable and meaningful
representations of complex behavioral data, including visual demonstrations. In
the driving domain, we show that a model learned from human demonstrations is
able to both accurately reproduce a variety of behaviors and accurately
anticipate human actions using raw visual inputs. Compared with various
baselines, our method can better capture the latent structure underlying expert
demonstrations, often recovering semantically meaningful factors of variation
in the data. Comment: 14 pages, NIPS 201
Reinforcement Learning for Robotics and Control with Active Uncertainty Reduction
Model-free reinforcement learning methods such as Proximal Policy
Optimization or Q-learning typically require thousands of interactions with
the environment to approximate the optimal controller, which may not always be
feasible in robotics due to safety concerns and time cost. Model-based methods
such as PILCO or BlackDrops, while data-efficient, provide solutions with
limited robustness and complexity. To address this tradeoff, we introduce
active uncertainty reduction-based virtual environments, which are formed
through limited trials conducted in the original environment. We provide an
efficient method for uncertainty management, which is used as a metric for
self-improvement by identification of the points with maximum expected
improvement through adaptive sampling. Capturing the uncertainty also allows
for better mimicking of the reward responses of the original system. Our
approach enables the use of complex policy structures and reward functions
through a unique combination of model-based and model-free methods, while still
retaining the data efficiency. We demonstrate the validity of our method on
several classic reinforcement learning problems in OpenAI gym. We prove that
our approach offers a better modeling capacity for complex system dynamics as
compared to established methods.
Preventing Posterior Collapse with Levenshtein Variational Autoencoder
Variational autoencoders (VAEs) are a standard framework for inducing latent
variable models that have been shown effective in learning text representations
as well as in text generation. The key challenge with using VAEs is the
posterior collapse problem: learning tends to converge to trivial solutions
where the generators ignore latent variables. In our Levenshtein VAE, we propose
to replace the evidence lower bound (ELBO) with a new objective which is simple
to optimize and prevents posterior collapse. Intuitively, it corresponds to
generating a sequence from the autoencoder and encouraging the model to predict
an optimal continuation according to the Levenshtein distance (LD) with the
reference sentence at each time step in the generated sequence. We motivate the
method from the probabilistic perspective by showing that it is closely related
to optimizing a bound on the intractable Kullback-Leibler divergence of an
LD-based kernel density estimator from the model distribution. With this
objective, any generator disregarding latent variables will incur large
penalties and hence posterior collapse does not happen. We relate our approach
to policy distillation \cite{RossGB11} and dynamic oracles \cite{GoldbergN12}.
By considering Yelp and SNLI benchmarks, we show that Levenshtein VAE produces
more informative latent representations than alternative approaches to
preventing posterior collapse.
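The Levenshtein distance that the objective above is built on can be computed with the standard dynamic program. The sketch below shows only that distance, not the paper's training objective; the function name `levenshtein` and the token-list usage are assumptions for illustration.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions turning a into b.

    Works on any two sequences (strings or lists of tokens).
    """
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    reference = "the food was good".split()
    generated = "the food is good".split()
    print(levenshtein(generated, reference))
```

Applied to token lists, as in the usage lines, the distance counts word-level edits between a generated sentence and its reference.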
Surrogate-based toll optimization in a large-scale heterogeneously congested network
Toll optimization in a large-scale dynamic traffic network is typically
characterized by an expensive-to-evaluate objective function. In this paper, we
propose two toll level problems (TLPs) integrated with a large-scale
simulation-based dynamic traffic assignment (DTA) model of Melbourne,
Australia. The first TLP aims to control the pricing zone (PZ) through a
time-varying joint distance and delay toll (JDDT) such that the network
fundamental diagram (NFD) of the PZ does not enter the congested regime. The
second TLP is built upon the first TLP by further considering the minimization
of the heterogeneity of congestion distribution in the PZ. To solve the two
TLPs, a computationally efficient surrogate-based optimization method, i.e.,
regressing kriging (RK) with expected improvement (EI) sampling, is applied to
approximate the simulation input-output mapping, which can balance well between
local exploitation and global exploration. Results show that the two optimal
TLP solutions reduce the average travel time in the PZ (entire network) by
29.5% (1.4%) and 21.6% (2.5%), respectively. Reducing the heterogeneity of
congestion distribution achieves higher network flows in the PZ and a lower
average travel time or a larger total travel time saving in the entire network. Comment: 16 pages, 7 figures
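The expected improvement (EI) criterion mentioned above has a closed form when the surrogate's prediction at a candidate point is Gaussian. The sketch below assumes a maximization convention (negate the objective to minimize, for example for travel time) and takes the predictive mean and standard deviation as given; the function name and the candidate values are hypothetical, and no kriging model is fitted here.

```python
import math


def expected_improvement(mean, std, best_so_far):
    """Closed-form EI for a Gaussian prediction N(mean, std^2), maximization."""
    if std <= 0.0:
        # No predictive uncertainty: improvement is deterministic.
        return max(mean - best_so_far, 0.0)
    z = (mean - best_so_far) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    # First term rewards a high predicted mean (exploitation),
    # second term rewards high uncertainty (exploration).
    return (mean - best_so_far) * cdf + std * pdf


if __name__ == "__main__":
    # Hypothetical (mean, std) predictions at three candidate points.
    candidates = {"A": (0.9, 0.05), "B": (0.7, 0.40), "C": (1.0, 0.00)}
    best = 1.0
    scores = {k: expected_improvement(m, s, best) for k, (m, s) in candidates.items()}
    print(max(scores, key=scores.get), scores)
```

The two terms make the exploitation and exploration trade-off explicit: a candidate can score well either through a high predicted mean or through large predictive uncertainty.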
IRLAS: Inverse Reinforcement Learning for Architecture Search
In this paper, we propose an inverse reinforcement learning method for
architecture search (IRLAS), which trains an agent to learn to search network
structures that are topologically inspired by human-designed networks. Most
existing architecture search approaches totally neglect the topological
characteristics of architectures, which results in complicated architecture
with a high inference latency. Motivated by the fact that human-designed
networks are elegant in topology with a fast inference speed, we propose a
mirror stimuli function inspired by biological cognition theory to extract the
abstract topological knowledge of an expert human-designed network (ResNeXt). To
avoid imposing too strong a prior over the search space, we introduce inverse
reinforcement learning to train the mirror stimuli function and exploit it as a
heuristic guidance for architecture search, easily generalized to different
architecture search algorithms. On CIFAR-10, the best architecture searched by
our proposed IRLAS achieves a 2.60% error rate. In the ImageNet mobile setting, our
model achieves a state-of-the-art top-1 accuracy of 75.28%, while being 2-4x
faster than most auto-generated architectures. A fast version of this model
is 10% faster than MobileNetV2 while maintaining a higher accuracy.