Neural Network Memory Architectures for Autonomous Robot Navigation
This paper highlights the significance of including memory structures in
neural networks when the latter are used to learn perception-action loops for
autonomous robot navigation. Traditional navigation approaches rely on global
maps of the environment to overcome cul-de-sacs and plan feasible motions. Yet,
maintaining an accurate global map may be challenging in real-world settings. A
possible way to mitigate this limitation is to use learning techniques that
forgo hand-engineered map representations and infer appropriate control
responses directly from sensed information. An important but unexplored aspect
of such approaches is the effect of memory on their performance. This work is a
first thorough study of memory structures for deep-neural-network-based robot
navigation, and offers novel tools to train such networks from supervision and
quantify their ability to generalize to unseen scenarios. We analyze the
separation and generalization abilities of feedforward, long short-term memory,
and differentiable neural computer networks. We introduce a new method to
evaluate the generalization ability by estimating the VC-dimension of networks
with a final linear readout layer. We validate that the VC estimates are good
predictors of actual test performance. The reported method can be applied to
deep learning problems beyond robotics.
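The VC-dimension estimate above targets networks with a final linear readout layer. A linear classifier over d-dimensional features has VC-dimension d + 1, which can be checked empirically with a shattering test on the readout's input features. The sketch below is an illustration of that idea, not the paper's exact estimator; the function names and the LP-based separability check are assumptions.

```python
# Minimal sketch: empirically test which point sets a *linear readout* can
# shatter. A linear classifier on d-dimensional inputs has VC-dimension d + 1,
# so no 4 points in R^2 can be shattered while 3 generic points can.
import itertools
import numpy as np
from scipy.optimize import linprog

def linearly_separable(X, y):
    """Check via a feasibility LP whether labels y in {-1,+1} are separable."""
    n, d = X.shape
    # Variables: w (d entries) and b. Constraint: y_i * (w.x_i + b) >= 1.
    A_ub = -(y[:, None] * np.hstack([X, np.ones((n, 1))]))
    b_ub = -np.ones(n)
    res = linprog(c=np.zeros(d + 1), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (d + 1), method="highs")
    return res.status == 0  # feasible => a separating hyperplane exists

def shatters(X):
    """True if a linear readout realises every labelling of the rows of X."""
    n = X.shape[0]
    return all(linearly_separable(X, np.array(lab))
               for lab in itertools.product([-1, 1], repeat=n))

rng = np.random.default_rng(0)
X3 = rng.normal(size=(3, 2))  # 3 generic points in R^2: shatterable
X4 = rng.normal(size=(4, 2))  # 4 points in R^2: some labelling must fail
print(shatters(X3), shatters(X4))  # -> True False
```

In the paper's setting the rows of X would be the learned features feeding the readout, so the test bounds the capacity of the final layer rather than of the whole network.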
Chain: A Dynamic Double Auction Framework for Matching Patient Agents
In this paper we present and evaluate a general framework for the design of
truthful auctions for matching agents in a dynamic, two-sided market. A single
commodity, such as a resource or a task, is bought and sold by multiple buyers
and sellers that arrive and depart over time. Our algorithm, Chain, provides
the first framework that allows a truthful dynamic double auction (DA) to be
constructed from a truthful, single-period (i.e. static) double-auction rule.
The pricing and matching method of the Chain construction is unique amongst
dynamic-auction rules that adopt the same building block. We examine
experimentally the allocative efficiency of Chain when instantiated on various
single-period rules, including the canonical McAfee double-auction rule. For a
baseline we also consider non-truthful double auctions populated with
"zero-intelligence plus"-style learning agents. Chain-based auctions perform
well in comparison with other schemes, especially as arrival intensity falls
and agent valuations become more volatile.
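The canonical McAfee rule used as a single-period building block is the trade-reduction double auction: match the highest bids against the lowest asks, and either clear all breakeven pairs at a single budget-balanced price or drop the marginal pair so the remaining trades stay truthful. A minimal sketch of that rule (the function name and return format are my own):

```python
def mcafee(bids, asks):
    """McAfee's truthful single-period double-auction rule (a sketch).

    bids: buyer valuations; asks: seller valuations. Returns the number of
    matched pairs and the (buyer_price, seller_price) charged to each pair.
    """
    b = sorted(bids, reverse=True)   # highest bids first
    s = sorted(asks)                 # lowest asks first
    # Breakeven quantity k: largest k with b[k-1] >= s[k-1].
    k = 0
    while k < min(len(b), len(s)) and b[k] >= s[k]:
        k += 1
    if k == 0:
        return 0, None                       # no profitable trade exists
    if k < min(len(b), len(s)):
        p = (b[k] + s[k]) / 2                # price from first excluded pair
        if s[k - 1] <= p <= b[k - 1]:
            return k, (p, p)                 # all k pairs trade at p
    # Trade reduction: drop the k-th pair; buyers pay b[k-1], sellers get s[k-1].
    return k - 1, (b[k - 1], s[k - 1])

print(mcafee([9, 7, 5, 3], [2, 4, 6, 8]))   # -> (2, (5.5, 5.5))
```

When the single clearing price falls outside the marginal pair's valuations, the rule sacrifices one efficient trade (the reduction) to keep the prices independent of the trading agents' own reports, which is what makes it truthful.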
Probabilistic inverse reinforcement learning in unknown environments
We consider the problem of learning by demonstration from agents acting in
unknown stochastic Markov environments or games. Our aim is to estimate agent
preferences in order to construct improved policies for the same task that the
agents are trying to solve. To do so, we extend previous probabilistic
approaches for inverse reinforcement learning in known MDPs to the case of
unknown dynamics or opponents. We do this by deriving two simplified
probabilistic models of the demonstrator's policy and utility. For
tractability, we use maximum a posteriori estimation rather than full Bayesian
inference. Under a flat prior, this results in a convex optimisation problem.
We find that the resulting algorithms are highly competitive against a variety
of other methods for inverse reinforcement learning that do have knowledge of
the dynamics.
Comment: Appears in Proceedings of the Twenty-Ninth Conference on Uncertainty
in Artificial Intelligence (UAI 2013).
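Under a flat prior, the MAP estimate described above coincides with maximum likelihood, and with a policy model that is softmax-linear in reward features the negative log-likelihood is convex. The sketch below illustrates that reduction with plain gradient ascent; the softmax-linear policy model and all names here are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch: MAP estimation of demonstrator preferences under a flat prior.
# Model the policy as pi(a|s) proportional to exp(w . phi(s, a)); maximising
# the log-likelihood of demonstrated actions is then a convex problem in w.
import numpy as np

def map_policy_weights(phi, demos, lr=0.5, iters=500):
    """phi: array (S, A, d) of state-action features.
    demos: list of (state, action) pairs. Returns the MAP weight vector w."""
    S, A, d = phi.shape
    w = np.zeros(d)
    for _ in range(iters):
        grad = np.zeros(d)
        for s, a in demos:
            logits = phi[s] @ w                    # (A,) utilities
            p = np.exp(logits - logits.max())
            p /= p.sum()                           # softmax policy at s
            grad += phi[s, a] - p @ phi[s]         # gradient of log pi(a|s)
        w += lr * grad / len(demos)
    return w

# One state, two actions with indicator features; demos always pick action 0,
# so the learned weights make action 0 much more likely than action 1.
phi = np.array([[[1.0, 0.0], [0.0, 1.0]]])
w = map_policy_weights(phi, [(0, 0)] * 10)
```

Because the gradient is the difference between demonstrated and model-expected features, any convex-optimisation routine could replace the hand-rolled loop; the point is that no posterior sampling is needed.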
Beauty and the Beast: Optimal Methods Meet Learning for Drone Racing
Autonomous micro aerial vehicles still struggle with fast and agile
maneuvers, dynamic environments, imperfect sensing, and state estimation drift.
Autonomous drone racing brings these challenges to the fore. Human pilots can
fly a previously unseen track after a handful of practice runs. In contrast,
state-of-the-art autonomous navigation algorithms require either a precise
metric map of the environment or a large amount of training data collected in
the track of interest. To bridge this gap, we propose an approach that can fly
a new track in a previously unseen environment without a precise map or
expensive data collection. Our approach represents the global track layout with
coarse gate locations, which can be easily estimated from a single
demonstration flight. At test time, a convolutional network predicts the poses
of the closest gates along with their uncertainty. These predictions are
incorporated by an extended Kalman filter to maintain optimal
maximum-a-posteriori estimates of gate locations. This allows the framework to
cope with misleading high-variance estimates that could stem from poor
observability or lack of visible gates. Given the estimated gate poses, we use
model predictive control to quickly and accurately navigate through the track.
We conduct extensive experiments in the physical world, demonstrating agile and
robust flight through complex and diverse previously-unseen race tracks. The
presented approach was used to win the IROS 2018 Autonomous Drone Race
Competition, outracing the second-placed team by a factor of two.
Comment: 6 pages (+1 references).
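The fusion step described above, where network predictions with high predicted variance barely move the gate estimate, is the standard Kalman measurement update with a measurement covariance supplied by the network. A minimal linear sketch (the full system uses an extended Kalman filter over gate poses; the direct-position observation model here is a simplifying assumption):

```python
# Sketch: fusing a network's gate prediction (with its predicted variance R)
# into a Kalman filter over a gate position. A high-variance, low-confidence
# prediction barely moves the estimate, which is how misleading detections
# from poor observability are down-weighted.
import numpy as np

def kf_update(x, P, z, R):
    """x: gate position estimate, P: its covariance,
    z: network-predicted position, R: predicted measurement covariance."""
    H = np.eye(len(x))                 # assume we observe the position directly
    S = H @ P @ H.T + R                # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    x_new = x + K @ (z - H @ x)        # corrected estimate
    P_new = (np.eye(len(x)) - K @ H) @ P
    return x_new, P_new

x = np.array([2.0, 0.0])
P = np.eye(2) * 0.5
z = np.array([3.0, 1.0])
x_conf, _ = kf_update(x, P, z, np.eye(2) * 0.1)    # confident: large correction
x_unc, _ = kf_update(x, P, z, np.eye(2) * 100.0)   # uncertain: tiny correction
```

With Gaussian models this update is exactly the maximum-a-posteriori estimate the abstract refers to, so trusting the network's own uncertainty falls out of the filter rather than needing a separate outlier-rejection rule.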
Driving with Style: Inverse Reinforcement Learning in General-Purpose Planning for Automated Driving
Behavior and motion planning play an important role in automated driving.
Traditionally, behavior planners instruct local motion planners with predefined
behaviors. Due to the high scene complexity in urban environments,
unpredictable situations may occur in which behavior planners fail to match
predefined behavior templates. Recently, general-purpose planners have been
introduced, combining behavior and local motion planning. These general-purpose
planners allow behavior-aware motion planning given a single reward function.
However, two challenges arise: First, this function has to map a complex
feature space into rewards. Second, the reward function has to be manually
tuned by an expert. Manually tuning this reward function becomes a tedious
task. In this paper, we propose an approach that relies on human driving
demonstrations to automatically tune reward functions. This study offers
important insights into the driving style optimization of general-purpose
planners with maximum entropy inverse reinforcement learning. We evaluate our
approach based on the expected value difference between learned and
demonstrated policies. Furthermore, we compare the similarity of human driven
trajectories with optimal policies of our planner under learned and
expert-tuned reward functions. Our experiments show that we are able to learn
reward functions exceeding the level of manual expert tuning without prior
domain knowledge.
Comment: Appeared at IROS 2019. Accepted version. Added/updated footnote,
minor correction in preliminaries.
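Maximum entropy IRL, as used above to replace manual expert tuning, updates reward weights toward matching the demonstrators' feature expectations: a backward soft-value-iteration pass yields a stochastic policy under the current reward, a forward pass yields its expected feature counts, and the gradient is the difference. The tabular sketch below illustrates that loop on a toy MDP; it is a simplified version of Ziebart et al.'s formulation, not the paper's planner-specific implementation, and all names are assumptions.

```python
# Sketch: the core feature-matching loop of maximum entropy IRL on a small
# tabular MDP. Reward weights move toward reproducing the demonstrations'
# feature expectations, which is what lets demonstrations replace manual
# reward tuning.
import numpy as np

def maxent_irl(T, phi, demos, horizon, lr=0.1, iters=100):
    """T: transition tensor (S, A, S); phi: state features (S, d);
    demos: list of state trajectories of length `horizon`. Returns weights w."""
    S, A, _ = T.shape
    mu_demo = np.mean([phi[traj].sum(axis=0) for traj in demos], axis=0)
    w = np.zeros(phi.shape[1])
    for _ in range(iters):
        r = phi @ w
        # Backward pass: soft value iteration -> stochastic policy.
        V = np.zeros(S)
        for _ in range(horizon):
            Q = r[:, None] + T @ V               # (S, A) soft action values
            V = np.log(np.exp(Q).sum(axis=1))    # soft maximum over actions
        pi = np.exp(Q - V[:, None])              # softmax policy
        # Forward pass: expected state visitation from the demo start states.
        d = np.zeros(S)
        for traj in demos:
            d[traj[0]] += 1.0 / len(demos)
        mu = d @ phi
        for _ in range(horizon - 1):
            d = np.einsum("s,sa,sat->t", d, pi, T)
            mu += d @ phi
        w += lr * (mu_demo - mu)                 # feature-matching gradient
    return w

# Toy MDP: two states, action 0 goes to state 0, action 1 goes to state 1.
T = np.zeros((2, 2, 2))
T[:, 0, 0] = 1.0
T[:, 1, 1] = 1.0
phi = np.eye(2)                                  # one indicator per state
w = maxent_irl(T, phi, [[0, 1, 1, 1]], horizon=4)
```

The demonstrations head for state 1 and stay there, so the learned weights rank state 1 above state 0; the expected-value-difference evaluation mentioned in the abstract would then compare policies optimal under these learned weights against the demonstrated ones.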