Implementing the Deep Q-Network
The Deep Q-Network proposed by Mnih et al. [2015] has become a benchmark and
building block for much deep reinforcement learning research. However,
replicating results for complex systems is often challenging since original
scientific publications are not always able to describe in detail every
important parameter setting and software engineering solution. In this paper,
we present results from our work reproducing the results of the DQN paper. We
highlight key areas in the implementation that were not covered in great detail
in the original paper to make it easier for researchers to replicate these
results, including termination conditions and gradient descent algorithms.
Finally, we discuss methods for improving the computational performance and
provide our own implementation that is designed to work with a range of
domains, and not just the original Arcade Learning Environment [Bellemare et
al., 2013].
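To make the termination-condition detail concrete, here is a minimal sketch (ours, not the authors' code) of how terminal transitions cut off bootstrapping in the Q-learning target; the array names are illustrative:

```python
import numpy as np

def q_targets(rewards, next_q_max, terminal, gamma=0.99):
    """Q-learning targets over a minibatch: bootstrapping is cut off
    at terminal transitions. Whether, e.g., a loss of life counts as
    terminal is exactly the kind of implementation detail the paper
    discusses; this sketch just shows where that choice enters."""
    # y = r                       if the transition is terminal
    # y = r + gamma * max_a Q'(s',a)   otherwise
    return rewards + gamma * next_q_max * (1.0 - terminal.astype(np.float32))
```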
Deep Abstract Q-Networks
We examine the problem of learning and planning on high-dimensional domains
with long horizons and sparse rewards. Recent approaches have shown great
successes in many Atari 2600 domains. However, domains with long horizons and
sparse rewards, such as Montezuma's Revenge and Venture, remain challenging for
existing methods. Methods using abstraction (Dietterich 2000; Sutton, Precup,
and Singh 1999) have been shown to be useful in tackling long-horizon problems. We
combine recent techniques of deep reinforcement learning with existing
model-based approaches using an expert-provided state abstraction. We construct
toy domains that elucidate the problem of long horizons, sparse rewards and
high-dimensional inputs, and show that our algorithm significantly outperforms
previous methods on these domains. Our abstraction-based approach outperforms
Deep Q-Networks (Mnih et al. 2015) on Montezuma's Revenge and Venture, and
exhibits backtracking behavior that is absent from previous methods.
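As a rough illustration of the expert-provided abstraction (our sketch, not the paper's code), the mapping from raw observations to abstract states might look like this:

```python
def abstract_state(info):
    """Hypothetical expert-provided state abstraction for a
    Montezuma-style domain: collapse a high-dimensional observation
    into a small discrete state (which room the agent occupies and
    what it holds). The fields of `info` are illustrative assumptions."""
    return (info["room_id"], frozenset(info["inventory"]))

# A model-based planner operates over these abstract states, while a
# DQN-style learner is trained to achieve each abstract transition
# (e.g., "move from room 3 to room 4") as a low-level subgoal.
```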
Estimation for Quadrotors
This document describes standard approaches for filtering and estimation for
quadrotors, created for the Udacity Flying Cars course. We assume previous
knowledge of probability and some knowledge of linear algebra. We do not assume
previous knowledge of Kalman filters or Bayes filters. This document derives an
EKF for various models of drones in 1D, 2D, and 3D. We use the EKF and notation
as defined in Thrun et al. [13]. We also give pseudocode for the Bayes filter,
the EKF, and the Unscented Kalman filter [14]. The motivation behind this
document is the lack of a step-by-step EKF tutorial that provides the
derivations for a quadrotor helicopter. The goal of estimation is to infer the
drone's state (pose, velocity, acceleration, and biases) from its sensor values
and control inputs. This problem is challenging because sensors are noisy.
Additionally, because of weight and cost issues, many drones have limited
on-board computation so we want to estimate these values as quickly as
possible. The standard method for this estimation problem is the Extended
Kalman filter, a nonlinear extension of the Kalman filter that linearizes the
nonlinear transition and measurement models around the current state. However,
the Unscented Kalman filter is better in almost every respect: it is simpler to
implement, more accurate in its estimates, and has comparable runtime.
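As a reference point for the derivations, here is a generic EKF step in the notation of Thrun et al. (our sketch; the document itself gives full pseudocode for each vehicle model):

```python
import numpy as np

def ekf_step(mu, Sigma, u, z, g, G, h, H, R, Q):
    """One EKF step: g/h are the nonlinear transition and measurement
    models, G/H their Jacobians evaluated at the current estimate, and
    R/Q the process and measurement noise covariances."""
    # Predict: push the mean through g, linearize for the covariance.
    mu_bar = g(mu, u)
    G_t = G(mu, u)
    Sigma_bar = G_t @ Sigma @ G_t.T + R
    # Update: Kalman gain from the linearized measurement model h.
    H_t = H(mu_bar)
    K = Sigma_bar @ H_t.T @ np.linalg.inv(H_t @ Sigma_bar @ H_t.T + Q)
    mu_new = mu_bar + K @ (z - h(mu_bar))
    Sigma_new = (np.eye(len(mu)) - K @ H_t) @ Sigma_bar
    return mu_new, Sigma_new
```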
Advantages and Limitations of using Successor Features for Transfer in Reinforcement Learning
One question central to Reinforcement Learning is how to learn a feature
representation that supports algorithm scaling and re-use of learned
information from different tasks. Successor Features approach this problem by
learning a feature representation that satisfies a temporal constraint. We
present an implementation of an approach that decouples the feature
representation from the reward function, making it suitable for transferring
knowledge between domains. We then assess the advantages and limitations of
using Successor Features for transfer.
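The temporal constraint that Successor Features satisfy can be sketched as a TD update on feature expectations (illustrative tabular form; implementations typically use function approximation):

```python
import numpy as np

def sf_td_update(psi, phi, s, a, s_next, a_next, w, alpha=0.1, gamma=0.99):
    """One temporal-difference update of successor features psi, where
    psi[s, a] estimates the discounted sum of future features phi.
    Rewards are assumed linear in the features, r = phi . w, so the
    same psi transfers across tasks that differ only in w."""
    target = phi[s, a] + gamma * psi[s_next, a_next]
    psi[s, a] += alpha * (target - psi[s, a])
    # The task-specific value is recovered by re-weighting:
    return psi[s, a] @ w
```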
Scanning the Internet for ROS: A View of Security in Robotics Research
Because robots can directly perceive and affect the physical world, security
issues take on particular importance. In this paper, we describe the results of
our work on scanning the entire IPv4 address space of the Internet for
instances of the Robot Operating System (ROS), a widely used robotics platform
for research. Our scan identified a number of hosts running ROS that are
exposed to the public Internet, thereby allowing anyone to access robotic
sensors and actuators. As a proof of concept, and with consent, we were able to
read image sensor information and move the robot of a research group in a US
university. This paper gives an overview of our findings, including the
geographic distribution of publicly accessible platforms, the sorts of sensor
and actuator data that are available, as well as the different kinds of robots
and sensors that our scan uncovered. Additionally, we offer recommendations on
best practices to mitigate these security issues in the future.
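For illustration, probing a single host for an exposed ROS 1 master takes only a few lines of Python, since the master speaks XML-RPC (conventionally on port 11311). This sketch is ours, and should only be run against hosts you are authorized to scan:

```python
import socket
import xmlrpc.client

socket.setdefaulttimeout(3)  # don't let a large scan hang on dead hosts

def check_ros_master(host, port=11311):
    """Probe one host for an exposed ROS 1 master. getSystemState lists
    all publishers, subscribers, and services, which is how a scan can
    tell what sensors and actuators a robot exposes."""
    proxy = xmlrpc.client.ServerProxy(f"http://{host}:{port}")
    try:
        code, msg, state = proxy.getSystemState("/scanner")
    except (OSError, xmlrpc.client.Fault):
        return None  # closed port, no route, or not a ROS master
    publishers, subscribers, services = state
    return publishers, subscribers, services
```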
pomdp_py: A Framework to Build and Solve POMDP Problems
In this paper, we present pomdp_py, a general purpose Partially Observable
Markov Decision Process (POMDP) library written in Python and Cython. Existing
POMDP libraries often hinder accessibility and efficient prototyping due to the
underlying programming language or interfaces, and require extra complexity in
the software toolchain to integrate with robotics systems. pomdp_py features simple
and comprehensive interfaces capable of describing large discrete or continuous
(PO)MDP problems. Here, we summarize the design principles and describe in
detail the programming model and interfaces in pomdp_py. We also describe
intuitive integration of this library with ROS (Robot Operating System), which
enabled our torso-actuated robot to perform object search in 3D. Finally, we
note directions to improve and extend this library for POMDP planning and
beyond.
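The flavor of such interfaces (a hedged sketch in plain Python, not pomdp_py's exact class names or signatures) is generative: large or continuous problems are specified by sampling functions rather than explicit probability matrices:

```python
import random

class TransitionModel:
    """Generative transition model: T(s' | s, a) given by sampling.
    A sketch of the interface style described in the paper, not the
    library's actual API."""
    def sample(self, state, action):
        raise NotImplementedError

class ObservationModel:
    """Generative observation model: O(o | s', a) given by sampling."""
    def sample(self, next_state, action):
        raise NotImplementedError

class NoisyMoveTransition(TransitionModel):
    # Hypothetical toy dynamics: a commanded move succeeds 90% of the time.
    def sample(self, state, action):
        return action if random.random() < 0.9 else state
```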
Accurately and Efficiently Interpreting Human-Robot Instructions of Varying Granularities
Humans can ground natural language commands to tasks at both abstract and
fine-grained levels of specificity. For instance, a human forklift operator can
be instructed to perform a high-level action, like "grab a pallet" or a
low-level action like "tilt back a little bit." While robots are also capable
of grounding language commands to tasks, previous methods implicitly assume
that all commands and tasks reside at a single, fixed level of abstraction.
Additionally, methods that operate at only a single level of abstraction plan
and execute inefficiently, because they must solve tasks in large, intractable
state-action spaces that closely resemble real-world complexity. In this work,
by grounding commands to all the tasks or
subtasks available in a hierarchical planning framework, we arrive at a model
capable of interpreting language at multiple levels of specificity ranging from
coarse to more granular. We show that the accuracy of the grounding procedure
is improved when simultaneously inferring the degree of abstraction in language
used to communicate the task. Leveraging hierarchy also improves efficiency:
our proposed approach enables a robot to respond to a command within one second
on 90% of our tasks, while baselines take over twenty seconds on half the
tasks. Finally, we demonstrate that a real, physical robot can ground commands
at multiple levels of abstraction allowing it to efficiently plan different
subtasks within the same planning hierarchy.
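The core idea of inferring the abstraction level jointly with the task can be sketched as a search over (task, level) pairs; the scoring function below is an assumed stand-in for the paper's learned grounding model:

```python
def ground(command, candidates, score):
    """Ground a command while inferring its abstraction level: score
    every (task, level) pair jointly instead of fixing the level in
    advance. `candidates` maps each level of the planning hierarchy to
    its tasks/subtasks; `score` is an assumed learned model of
    p(task, level | command)."""
    return max(
        ((task, level) for level, tasks in candidates.items() for task in tasks),
        key=lambda tl: score(command, tl[0], tl[1]),
    )
```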
Planning with State Abstractions for Non-Markovian Task Specifications
Oftentimes, we specify tasks for a robot using temporal language that can
also span different levels of abstraction. The example command "go to the
kitchen before going to the second floor" contains spatial abstraction, given
that "floor" consists of individual rooms that can also be referred to in
isolation ("kitchen", for example). There is also a temporal ordering of
events, defined by the word "before". Previous works have used Linear Temporal
Logic (LTL) to interpret temporal language (such as "before"), and Abstract
Markov Decision Processes (AMDPs) to interpret hierarchical abstractions (such
as "kitchen" and "second floor"), separately. To handle both types of commands
at once, we introduce the Abstract Product Markov Decision Process (AP-MDP), a
novel approach capable of representing non-Markovian reward functions at
different levels of abstraction. The AP-MDP framework translates LTL into its
corresponding automata, creates a product Markov Decision Process (MDP) of the
LTL specification and the environment MDP, and decomposes the problem into
subproblems to enable efficient planning with abstractions. AP-MDP performs
faster than a non-hierarchical method of solving LTL problems in over 95% of
tasks, and this number only increases as the size of the environment domain
increases. We also present a neural sequence-to-sequence model trained to
translate language commands into LTL expressions, and a new corpus of
non-Markovian language commands spanning different levels of abstraction. We
test our framework with the collected language commands on a drone,
demonstrating that our approach enables a robot to efficiently solve temporal
commands at different levels of abstraction.
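The product construction at the heart of the AP-MDP can be sketched as follows (our illustration; the paper's decomposition and abstraction machinery are omitted):

```python
from itertools import product

def product_states(env_states, automaton_states):
    # Product state space: every (environment state, automaton state) pair.
    return list(product(env_states, automaton_states))

def make_product_step(env_step, automaton_step, label):
    """One transition of the product MDP: the environment evolves under
    its own dynamics, and the LTL automaton advances on the labels
    (atomic propositions) of the state the environment reaches. Reaching
    an accepting automaton state then means the formula is satisfied.
    Function names are illustrative."""
    def step(s, q, a):
        s_next = env_step(s, a)                     # environment MDP dynamics
        q_next = automaton_step(q, label(s_next))   # automaton transition
        return s_next, q_next
    return step
```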
Grounding Language Attributes to Objects using Bayesian Eigenobjects
We develop a system to disambiguate object instances within the same class
based on simple physical descriptions. The system takes as input a natural
language phrase and a depth image containing a segmented object and predicts
how similar the observed object is to the object described by the phrase. Our
system is designed to learn from only a small amount of human-labeled language
data and generalize to viewpoints not represented in the language-annotated
depth image training set. By decoupling 3D shape representation from language
representation, this method is able to ground language to novel objects using a
small amount of language-annotated depth data and a larger corpus of unlabeled
3D object meshes, even when these objects are partially observed from unusual
viewpoints. Our system is able to disambiguate between novel objects, observed
via depth images, based on natural language descriptions. Our method also
enables view-point transfer; trained on human-annotated data on a small set of
depth images captured from frontal viewpoints, our system successfully
predicted object attributes from rear views despite having no such depth images
in its training set. Finally, we demonstrate our approach on a Baxter robot,
enabling it to pick specific objects based on human-provided natural language
descriptions.
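The decoupling can be sketched as two separately trained embeddings related by a small learned map; the linear map and cosine similarity below are assumed choices for illustration, not necessarily the paper's:

```python
import numpy as np

def shape_language_similarity(shape_embedding, language_embedding, W):
    """Score how well a described object matches an observed one: a 3-D
    shape embedding (learned from unlabeled meshes) and a language
    embedding are trained separately, then related by a small map W fit
    on the limited language-annotated depth data."""
    projected = W @ language_embedding
    return float(
        shape_embedding @ projected
        / (np.linalg.norm(shape_embedding) * np.linalg.norm(projected))
    )
```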
Generating Handwriting via Decoupled Style Descriptors
Representing a space of handwriting stroke styles includes the challenge of
representing both the style of each character and the overall style of the
human writer. Existing VRNN approaches to representing handwriting often do not
distinguish between these different style components, which can reduce model
capability. Instead, we introduce the Decoupled Style Descriptor (DSD) model
for handwriting, which factors both character- and writer-level styles and
allows our model to represent a greater overall space of styles. This approach
also increases flexibility: given a few examples, we can generate handwriting
in new writer styles, and also now generate handwriting of new characters
across writer styles. In experiments, our generated results were preferred over
a state-of-the-art baseline method 88% of the time, and in a writer
identification task on 20 held-out writers, our DSDs achieved 89.38% accuracy
from a single sample word. Overall, DSDs allow us to improve both the quality
and flexibility of handwriting stroke generation over existing approaches.
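The factorization can be sketched as a writer-level style vector combined with a character-level transform (the shapes and the linear form are illustrative assumptions, not the paper's exact architecture):

```python
import numpy as np

def character_writer_descriptor(C_char, w_writer):
    """Combine a writer-level style vector with a learned
    character-level transform to produce the decoupled descriptor
    that drives stroke generation for that character."""
    return C_char @ w_writer

# New-writer generalization: estimate w_writer from a few samples and
# reuse the character transforms; new-character generalization: learn
# C_char once and reuse every existing writer vector.
```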