3,602 research outputs found
Learning of Generalized Manipulation Strategies in Service Robotics
This thesis makes a contribution to autonomous robotic manipulation. The core is a novel constraint-based representation of manipulation tasks suitable for flexible online motion planning. Interactive learning from natural human demonstrations is combined with parallelized optimization to enable efficient learning of complex manipulation tasks with limited training data. Prior planning results are encoded automatically into the model to reduce planning time and solve the correspondence problem
Human-Machine Collaborative Optimization via Apprenticeship Scheduling
Coordinating agents to complete a set of tasks with intercoupled temporal and
resource constraints is computationally challenging, yet human domain experts
can solve these difficult scheduling problems using paradigms learned through
years of apprenticeship. A process for manually codifying this domain knowledge
within a computational framework is necessary to scale beyond the
``single-expert, single-trainee" apprenticeship model. However, human domain
experts often have difficulty describing their decision-making processes,
causing the codification of this knowledge to become laborious. We propose a
new approach for capturing domain-expert heuristics through a pairwise ranking
formulation. Our approach is model-free and does not require enumerating or
iterating through a large state space. We empirically demonstrate that this
approach accurately learns multifaceted heuristics on a synthetic data set
incorporating job-shop scheduling and vehicle routing problems, as well as on
two real-world data sets consisting of demonstrations of experts solving a
weapon-to-target assignment problem and a hospital resource allocation problem.
We also demonstrate that policies learned from human scheduling demonstration
via apprenticeship learning can substantially improve the efficiency of a
branch-and-bound search for an optimal schedule. We employ this human-machine
collaborative optimization technique on a variant of the weapon-to-target
assignment problem. We demonstrate that this technique generates solutions
substantially superior to those produced by human domain experts at a rate up
to 9.5 times faster than an optimization approach and can be applied to
optimally solve problems twice as complex as those solved by a human
demonstrator.Comment: Portions of this paper were published in the Proceedings of the
International Joint Conference on Artificial Intelligence (IJCAI) in 2016 and
in the Proceedings of Robotics: Science and Systems (RSS) in 2016. The paper
consists of 50 pages with 11 figures and 4 table
Recommended from our members
End-to-end deep reinforcement learning in computer systems
Abstract
The growing complexity of data processing systems has long led systems designers to imagine systems (e.g. databases, schedulers) which can self-configure and adapt based on environmental cues. In this context, reinforcement learning (RL) methods have since their inception appealed to systems developers. They promise to acquire complex decision policies from raw feedback signals. Despite their conceptual popularity, RL methods are scarcely found in real-world data processing systems. Recently, RL has seen explosive growth in interest due to high profile successes when utilising large neural networks (deep reinforcement learning). Newly emerging machine learning frameworks and powerful hardware accelerators have given rise to a plethora of new potential applications.
In this dissertation, I first argue that in order to design and execute deep RL algorithms efficiently, novel software abstractions are required which can accommodate the distinct computational patterns of communication-intensive and fast-evolving algorithms. I propose an architecture which decouples logical algorithm construction from local and distributed execution semantics. I further present RLgraph, my proof-of-concept implementation of this architecture. In RLgraph, algorithm developers can explore novel designs by constructing a high-level data flow graph through combination of logical components. This dataflow graph is independent of specific backend frameworks or notions of execution, and is only later mapped to execution semantics via a staged build process. RLgraph enables high-performing algorithm implementations while maintaining flexibility for rapid prototyping.
Second, I investigate reasons for the scarcity of RL applications in systems themselves. I argue that progress in applied RL is hindered by a lack of tools for task model design which bridge the gap between systems and algorithms, and also by missing shared standards for evaluation of model capabilities. I introduce Wield, a first-of-its-kind tool for incremental model design in applied RL. Wield provides a small set of primitives which decouple systems interfaces and deployment-specific configuration from representation. Core to Wield is a novel instructive experiment protocol called progressive randomisation which helps practitioners to incrementally evaluate different dimensions of non-determinism. I demonstrate how Wield and progressive randomisation can be used to reproduce and assess prior work, and to guide implementation of novel RL applications
Planning For Non-Player Characters By Learning From Demonstration
In video games, state of the art non-player character (NPC) behavior generation typically depends on hard-coding NPC actions. In many game situations however, it is hard to foresee how an NPC should behave to appear intelligent or to accommodate human preferences for NPC behavior. We advocate the creation of a more flexible method to allow players (and developers) to train NPCs to execute novel behaviors which are not hard-coded. In particular, we investigate search-based planning approaches using demonstration to guide the search through high-dimensional spaces that represent the full state of the game. To this end, we developed the Training Graph heuristic, an extension of the Experience Graph heuristic, that guides a search smoothly and effectively even when a demonstration is unreachable in the search space, and ensures that more of the demonstrations are utilized to better train the NPC\u27s behavior. To deal with variance in the initial conditions of such planning problems, we have developed heuristics in the Multi-Heuristic A* framework to adapt demonstration trace data to new problems. We evaluate our approach in the Creation Engine game engine by modifying The Elder Scrolls V: Skyrim (Skyrim) to accommodate our NPC behavior generators and experiments. In Skyrim, players are given quests which are composed of several objectives. NPCs in the game sometimes accompany the player on quests, but state-of-the-art companion NPC AI is not sophisticated enough to behave according to arbitrary player desires. We hope that our work will lead to the creation of trainable NPC AI. This will enable novel gameplay mechanics for video game players and may augment video game production by allowing developers to train NPCs instead of hard-coding complex behaviors
A Review of Symbolic, Subsymbolic and Hybrid Methods for Sequential Decision Making
The field of Sequential Decision Making (SDM) provides tools for solving
Sequential Decision Processes (SDPs), where an agent must make a series of
decisions in order to complete a task or achieve a goal. Historically, two
competing SDM paradigms have view for supremacy. Automated Planning (AP)
proposes to solve SDPs by performing a reasoning process over a model of the
world, often represented symbolically. Conversely, Reinforcement Learning (RL)
proposes to learn the solution of the SDP from data, without a world model, and
represent the learned knowledge subsymbolically. In the spirit of
reconciliation, we provide a review of symbolic, subsymbolic and hybrid methods
for SDM. We cover both methods for solving SDPs (e.g., AP, RL and techniques
that learn to plan) and for learning aspects of their structure (e.g., world
models, state invariants and landmarks). To the best of our knowledge, no other
review in the field provides the same scope. As an additional contribution, we
discuss what properties an ideal method for SDM should exhibit and argue that
neurosymbolic AI is the current approach which most closely resembles this
ideal method. Finally, we outline several proposals to advance the field of SDM
via the integration of symbolic and subsymbolic AI
Reducing the Barrier to Entry of Complex Robotic Software: a MoveIt! Case Study
Developing robot agnostic software frameworks involves synthesizing the
disparate fields of robotic theory and software engineering while
simultaneously accounting for a large variability in hardware designs and
control paradigms. As the capabilities of robotic software frameworks increase,
the setup difficulty and learning curve for new users also increase. If the
entry barriers for configuring and using the software on robots is too high,
even the most powerful of frameworks are useless. A growing need exists in
robotic software engineering to aid users in getting started with, and
customizing, the software framework as necessary for particular robotic
applications. In this paper a case study is presented for the best practices
found for lowering the barrier of entry in the MoveIt! framework, an
open-source tool for mobile manipulation in ROS, that allows users to 1)
quickly get basic motion planning functionality with minimal initial setup, 2)
automate its configuration and optimization, and 3) easily customize its
components. A graphical interface that assists the user in configuring MoveIt!
is the cornerstone of our approach, coupled with the use of an existing
standardized robot model for input, automatically generated robot-specific
configuration files, and a plugin-based architecture for extensibility. These
best practices are summarized into a set of barrier to entry design principles
applicable to other robotic software. The approaches for lowering the entry
barrier are evaluated by usage statistics, a user survey, and compared against
our design objectives for their effectiveness to users
Artificial Intelligence Research Branch future plans
This report contains information on the activities of the Artificial Intelligence Research Branch (FIA) at NASA Ames Research Center (ARC) in 1992, as well as planned work in 1993. These activities span a range from basic scientific research through engineering development to fielded NASA applications, particularly those applications that are enabled by basic research carried out in FIA. Work is conducted in-house and through collaborative partners in academia and industry. All of our work has research themes with a dual commitment to technical excellence and applicability to NASA short, medium, and long-term problems. FIA acts as the Agency's lead organization for research aspects of artificial intelligence, working closely with a second research laboratory at the Jet Propulsion Laboratory (JPL) and AI applications groups throughout all NASA centers. This report is organized along three major research themes: (1) Planning and Scheduling: deciding on a sequence of actions to achieve a set of complex goals and determining when to execute those actions and how to allocate resources to carry them out; (2) Machine Learning: techniques for forming theories about natural and man-made phenomena; and for improving the problem-solving performance of computational systems over time; and (3) Research on the acquisition, representation, and utilization of knowledge in support of diagnosis design of engineered systems and analysis of actual systems
- …