Search CORE

378 research outputs found

Evaluation of automated decisionmaking methodologies and development of an integrated robotic system simulation

Author: Almand B. J.
Depkovich T. M.
Gremban K. D.
Haley D. C.
Kelly J. H.
Krauze L. D.
Sanborn J. C.
Thomas M. M.
Publication venue
Publication date
Field of study

A generic computer simulation for manipulator systems (ROBSIM) was implemented and the specific technologies necessary to increase the role of automation in various missions were developed. The specific items developed are: (1) capability for definition of a manipulator system consisting of multiple arms, load objects, and an environment; (2) capability for kinematic analysis, requirements analysis, and response simulation of manipulator motion; (3) postprocessing options such as graphic replay of simulated motion and manipulator parameter plotting; (4) investigation and simulation of various control methods including manual force/torque and active compliances control; (5) evaluation and implementation of three obstacle avoidance methods; (6) video simulation and edge detection; and (7) software simulation validation

NASA Technical Reports Server

Learning with Training Wheels: Speeding up Training with a Simple Controller for Deep Reinforcement Learning

Author: Markham Andrew
Rosa Stefano
Trigoni Niki
Wang Sen
Xie Linhai
Publication venue
Publication date: 01/01/2018
Field of study

Deep Reinforcement Learning (DRL) has been applied successfully to many robotic applications. However, the large number of trials needed for training is a key issue. Most of existing techniques developed to improve training efficiency (e.g. imitation) target on general tasks rather than being tailored for robot applications, which have their specific context to benefit from. We propose a novel framework, Assisted Reinforcement Learning, where a classical controller (e.g. a PID controller) is used as an alternative, switchable policy to speed up training of DRL for local planning and navigation problems. The core idea is that the simple control law allows the robot to rapidly learn sensible primitives, like driving in a straight line, instead of random exploration. As the actor network becomes more advanced, it can then take over to perform more complex actions, like obstacle avoidance. Eventually, the simple controller can be discarded entirely. We show that not only does this technique train faster, it also is less sensitive to the structure of the DRL network and consistently outperforms a standard Deep Deterministic Policy Gradient network. We demonstrate the results in both simulation and real-world experiments.Comment: Published in ICRA2018. The code is now available at https://github.com/xie9187/AsDDP

arXiv.org e-Print Archive

Heriot Watt Pure

Crossref

Oxford University Research Archive

Recommended from our members

Transformational maintenance by reuse of design histories

Author: Baxter Ira D.
Publication venue: eScholarship, University of California
Publication date: 01/01/1990
Field of study

This thesis provides theory and procedures for modifying software artifacts implemented by a formal transformation process. Installing modifications requires knowing not only what transformations were applied (a derivation history) to construct the artifact, but also why the application sequence ensures that the artifact meets its specification. The derivation history and the justification are collectively called a design history. A Design Maintenance System (DMS), when provided with a formal change called a maintenance delta, revises a design history to guide construction of a new artifact. A DMS can be used to integrate a stream of deltas into a history, providing implementations as a side effect, leading to an incremental-evolution model for software construction.We provide a broadly applicable formal model of transformation systems in which specifications are performance predicates, subsuming the functional specifications which are traditional for transformation systems. Such performance predicates provide vocabulary used in the design history to describe the effect of applying sets of transformations.A nonprocedural, performance-goal-oriented Transformation Control Language (TCL) is defined to control navigation of the design space for a transformation system. Recording the execution of a TCL metaprogram directly provides a design history.A complete classification of, and representation for, the set of possible maintenance deltas is given in terms of the inputs defined by the transformation system model. Such deltas include not only specification changes, but also changes to implementation support technologies. Delta integration procedures for revising derivation histories given functional or support technology deltas are provided, based on rearranging the order of transformations in the design space. Building on these operations, integration procedures that revise the design history for each type of delta are described. An agenda-oriented TCL execution process dovetails smoothly with the integration procedures.Our DMS is compared to a number of other maintenance systems. By using an explicit delta and verified commutativity, our DMS often reuses transformations correctly when others fail

eScholarship - University of California

Using cases utility for heuristic planning improvement

Author: Borrajo Millán Daniel
García Olaya Ángel
La Rosa Tomas de
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2007
Field of study

Proceedings of: 7th International Conference on Case-Based Reasoning (ICCBR07), Belfast, Northern Ireland, UK, 13 - 16 August 2007Current efficient planners employ an informed search guided by a heuristic function that is quite expensive to compute. Thus, ordering nodes in the search tree becomes a key issue, in order to select efficiently nodes to evaluate from the successors of the current search node. In a previous work, we successfully applied a CBR approach to order nodes for evaluation, thus reducing the number of calls to the heuristic function. However, once cases were learned, they were not modified according to their utility on solving planning problems. We present in this work a scheme for learning case quality based on its utility during a validation phase. The qualities obtained determine the way in which these cases are preferred in the retrieval and replay processes. Then, the paper shows some experimental results for several benchmarks taken from the International Planning Competition (IPC). These results show the planning performance improvement when case utilities are used.Publicad

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo

Human Management of the Hierarchical System for the Control of Multiple Mobile Robots

Author: Adams Julie A
Publication venue: ScholarlyCommons
Publication date: 01/01/1995
Field of study

In order to take advantage of autonomous robotic systems, and yet ensure successful completion of all feasible tasks, we propose a mediation hierarchy in which an operator can interact at all system levels. Robotic systems are not robust in handling un-modeled events. Reactive behaviors may be able to guide the robot back into a modeled state and to continue. Reasoning systems may simply fail. Once a system has failed it is difficult to re-start the task from the failed state. Rather, the rule base is revised, programs altered, and the task re-tried from the beginning

ScholarlyCommons@Penn

my Human Brain Project (mHBP)

Author: Salvador José António Guerreiro Nunes Sanches
Publication venue
Publication date: 15/10/2021
Field of study

How can we make an agent that thinks like us humans? An agent that can have proprioception, intrinsic motivation, identify deception, use small amounts of energy, transfer knowledge between tasks and evolve? This is the problem that this thesis is focusing on. Being able to create a piece of software that can perform tasks like a human being, is a goal that, if achieved, will allow us to extend our own capabilities to a very high level, and have more tasks performed in a predictable fashion. This is one of the motivations for this thesis. To address this problem, we have proposed a modular architecture for Reinforcement Learning computation and developed an implementation to have this architecture exercised. This software, that we call mHBP, is created in Python using Webots as an environment for the agent, and Neo4J, a graph database, as memory. mHBP takes the sensory data or other inputs, and produces, based on the body parts / tools that the agent has available, an output consisting of actions to perform. This thesis involves experimental design with several iterations, exploring a theoretical approach to RL based on graph databases. We conclude, with our work in this thesis, that it is possible to represent episodic data in a graph, and is also possible to interconnect Webots, Python and Neo4J to support a stable architecture for Reinforcement Learning. In this work we also find a way to search for policies using the Neo4J querying language: Cypher. Another key conclusion of this work is that state representation needs to have further research to find a state definition that enables policy search to produce more useful policies. The article “REINFORCEMENT LEARNING: A LITERATURE REVIEW (2020)” at Research Gate with doi 10.13140/RG.2.2.30323.76327 is an outcome of this thesis.Como podemos criar um agente que pense como nós humanos? Um agente que tenha propriocepção, motivação intrínseca, seja capaz de identificar ilusão, usar pequenas quantidades de energia, transferir conhecimento entre tarefas e evoluir? Este é o problema em que se foca esta tese. Ser capaz de criar uma peça de software que desempenhe tarefas como um ser humano é um objectivo que, se conseguido, nos permitirá estender as nossas capacidades a um nível muito alto, e conseguir realizar mais tarefas de uma forma previsível. Esta é uma das motivações desta tese. Para endereçar este problema, propomos uma arquitectura modular para computação de aprendizagem por reforço e desenvolvemos uma implementação para exercitar esta arquitetura. Este software, ao qual chamamos mHBP, foi criado em Python usando o Webots como um ambiente para o agente, e o Neo4J, uma base de dados de grafos, como memória. O mHBP recebe dados sensoriais ou outros inputs, e produz, baseado nas partes do corpo / ferramentas que o agente tem disponíveis, um output que consiste em ações a desempenhar. Uma boa parte desta tese envolve desenho experimental com diversas iterações, explorando uma abordagem teórica assente em bases de dados de grafos. Concluímos, com o trabalho nesta tese, que é possível representar episódios em um grafo, e que é, também, possível interligar o Webots, com o Python e o Neo4J para suportar uma arquitetura estável para a aprendizagem por reforço. Neste trabalho, também, encontramos uma forma de procurar políticas usando a linguagem de pesquisa do Neo4J: Cypher. Outra conclusão chave deste trabalho é que a representação de estados necessita de mais investigação para encontrar uma definição de estado que permita à pesquisa de políticas produzir políticas que sejam mais úteis. O artigo “REINFORCEMENT LEARNING: A LITERATURE REVIEW (2020)” no Research Gate com o doi 10.13140/RG.2.2.30323.76327 é um sub-produto desta tese

Repositório Institucional do ISCTE-IUL