14,707 research outputs found
Differentiable Algorithm Networks for Composable Robot Learning
This paper introduces the Differentiable Algorithm Network (DAN), a
composable architecture for robot learning systems. A DAN is composed of neural
network modules, each encoding a differentiable robot algorithm and an
associated model; and it is trained end-to-end from data. DAN combines the
strengths of model-driven modular system design and data-driven end-to-end
learning. The algorithms and models act as structural assumptions to reduce the
data requirements for learning; end-to-end learning allows the modules to adapt
to one another and compensate for imperfect models and algorithms, in order to
achieve the best overall system performance. We illustrate the DAN methodology
through a case study on a simulated robot system, which learns to navigate in
complex 3-D environments with only local visual observations and an image of a
partially correct 2-D floor map.Comment: RSS 2019 camera ready. Video is available at
https://youtu.be/4jcYlTSJF4
ELM regime classification by conformal prediction on an information manifold
Characterization and control of plasma instabilities known as edge-localized modes (ELMs) is crucial for the operation of fusion reactors. Recently, machine learning methods have demonstrated good potential in making useful inferences from stochastic fusion data sets. However, traditional classification methods do not offer an inherent estimate of the goodness of their prediction. In this paper, a distance-based conformal predictor classifier integrated with a geometric-probabilistic framework is presented. The first benefit of the approach lies in its comprehensive treatment of highly stochastic fusion data sets, by modeling the measurements with probability distributions in a metric space. This enables calculation of a natural distance measure between probability distributions: the Rao geodesic distance. Second, the predictions are accompanied by estimates of their accuracy and reliability. The method is applied to the classification of regimes characterized by different types of ELMs based on the measurements of global parameters and their error bars. This yields promising success rates and outperforms state-of-the-art automatic techniques for recognizing ELM signatures. The estimates of goodness of the predictions increase the confidence of classification by ELM experts, while allowing more reliable decisions regarding plasma control and at the same time increasing the robustness of the control system
Improving Automated Driving through Planning with Human Internal States
This work examines the hypothesis that partially observable Markov decision
process (POMDP) planning with human driver internal states can significantly
improve both safety and efficiency in autonomous freeway driving. We evaluate
this hypothesis in a simulated scenario where an autonomous car must safely
perform three lane changes in rapid succession. Approximate POMDP solutions are
obtained through the partially observable Monte Carlo planning with observation
widening (POMCPOW) algorithm. This approach outperforms over-confident and
conservative MDP baselines and matches or outperforms QMDP. Relative to the MDP
baselines, POMCPOW typically cuts the rate of unsafe situations in half or
increases the success rate by 50%.Comment: Preprint before submission to IEEE Transactions on Intelligent
Transportation Systems. arXiv admin note: text overlap with arXiv:1702.0085
Model-based Reinforcement Learning with Parametrized Physical Models and Optimism-Driven Exploration
In this paper, we present a robotic model-based reinforcement learning method
that combines ideas from model identification and model predictive control. We
use a feature-based representation of the dynamics that allows the dynamics
model to be fitted with a simple least squares procedure, and the features are
identified from a high-level specification of the robot's morphology,
consisting of the number and connectivity structure of its links. Model
predictive control is then used to choose the actions under an optimistic model
of the dynamics, which produces an efficient and goal-directed exploration
strategy. We present real time experimental results on standard benchmark
problems involving the pendulum, cartpole, and double pendulum systems.
Experiments indicate that our method is able to learn a range of benchmark
tasks substantially faster than the previous best methods. To evaluate our
approach on a realistic robotic control task, we also demonstrate real time
control of a simulated 7 degree of freedom arm.Comment: 8 page
- …