    Towards effective planning strategies for robots in recycling

    This work presents several ideas for planning under uncertainty in the context of recycling electromechanical devices with a robotic arm. We formulate the task as a Markov Decision Process and, to avoid scalability issues, employ determinization techniques and hierarchical planning.
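    The abstract names its two scalability levers without illustrating them. Below is a minimal sketch of one common determinization technique, most-likely-outcome determinization, which keeps only an action's most probable effect so a classical planner can be used. The Outcome/Action classes and the unscrew-casing action are hypothetical stand-ins, not the paper's actual model.

```python
from dataclasses import dataclass

@dataclass
class Outcome:
    """One probabilistic effect: facts it adds and deletes, and its probability."""
    prob: float
    add: frozenset
    delete: frozenset

@dataclass
class Action:
    name: str
    preconditions: frozenset
    outcomes: list  # outcome probabilities sum to 1

def determinize_most_likely(action):
    """Most-likely-outcome determinization: keep only the single most probable
    outcome, so the action can be handed to a classical (deterministic) planner."""
    best = max(action.outcomes, key=lambda o: o.prob)
    return Action(action.name, action.preconditions, [Outcome(1.0, best.add, best.delete)])

# Hypothetical recycling action: unscrewing a casing usually works, sometimes strips the screw.
unscrew = Action(
    name="unscrew-casing",
    preconditions=frozenset({"casing-attached", "screwdriver-in-gripper"}),
    outcomes=[
        Outcome(0.8, add=frozenset({"casing-loose"}), delete=frozenset({"casing-attached"})),
        Outcome(0.2, add=frozenset({"screw-stripped"}), delete=frozenset()),
    ],
)
print(determinize_most_likely(unscrew))
```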

    Self-Management in Urban Traffic Control – an Automated Planning Perspective

    Advanced urban traffic control systems are often based on feedback algorithms. They use road traffic data gathered over periods ranging from a few minutes to several years. For instance, current traffic control systems often operate on the basis of adaptive green phases and flexible coordination in road (sub)networks, based on measured traffic conditions. However, these approaches are still not very efficient during unforeseen situations such as road incidents, when changes to traffic management are required within a short time interval. For such anomalies, we argue that systems are needed that can sense, interpret and deliberate about their actions and the goals to be achieved, taking into consideration continuous changes in state, the required service level and environmental constraints. The requirement on such systems is that they can plan and act effectively after such deliberation, so that behaviourally they appear self-aware. This chapter focuses on the design of a generic architecture for autonomic urban traffic control that enables the network to manage itself both in normal operation and in unexpected scenarios. The reasoning and self-management aspects are implemented using automated planning techniques inspired by both symbolic artificial intelligence and traditional control engineering. Preliminary test results of the plan generation phase of the architecture are considered and evaluated.
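    As a rough illustration of the sense-interpret-deliberate-act loop described above, here is a minimal sketch; the junction names, the flow threshold, and the extend-green-phase action are invented placeholders, and deliberate() merely stands in for the automated planner in the chapter's architecture.

```python
import random

def sense(network):
    """Poll detectors; here, simulated flow readings per junction (vehicles/min)."""
    return {j: random.randint(0, 60) for j in network}

def interpret(readings, threshold=45):
    """Flag junctions whose measured flow exceeds the acceptable service level."""
    return [j for j, flow in readings.items() if flow > threshold]

def deliberate(anomalies):
    """Stand-in for the automated planner: map each flagged junction to a goal
    and return a plan as an ordered list of control actions."""
    return [("extend-green-phase", j) for j in anomalies]

def act(plan):
    for action, junction in plan:
        print(f"executing {action} at {junction}")

network = ["J1", "J2", "J3"]
for _ in range(3):                 # three control cycles
    anomalies = interpret(sense(network))
    if anomalies:                  # deliberate only when the service level is violated
        act(deliberate(anomalies))
```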

    Learning relational models with human interaction for planning in robotics

    Automated planning has proven useful for solving problems where an agent has to maximize a reward function by executing actions. As planners have been improved to solve more expressive and difficult problems, there is an increasing interest in using planning to improve efficiency in robotic tasks. However, planners rely on a domain model, which has to be either handcrafted or learned. Although learning domain models can be very costly, recent approaches provide generalization capabilities and integrate human feedback to reduce the amount of experience required to learn. In this thesis we propose new methods that allow an agent with no previous knowledge to solve certain problems more efficiently by using task planning. First, we show how to apply probabilistic planning to improve robot performance in manipulation tasks (such as cleaning dirt or clearing tableware from a table). Planners obtain sequences of actions that get the best result in the long term, beating reactive strategies. Second, we introduce new reinforcement learning algorithms in which the agent can actively request demonstrations from a teacher to learn new actions and speed up the learning process. In particular, we propose an algorithm that allows the user to set the minimum quality to be achieved, where a higher quality also implies that a larger number of demonstrations will be requested. Moreover, the learned model is analyzed to extract the unlearned or problematic parts of the model. This information allows the agent to provide guidance to the teacher when a demonstration is requested, and to avoid irrecoverable errors. Finally, a new domain model learner is introduced that, in addition to relational probabilistic action models, can also learn exogenous effects. This learner can be integrated with existing planners and reinforcement learning algorithms to solve a wide range of problems. In summary, we improve the use of learning and task planning to solve unknown tasks. The improvements allow an agent to obtain a larger benefit from planners, learn faster, balance the number of action executions and teacher demonstrations, avoid irrecoverable errors, interact with a teacher to solve difficult problems, and adapt to the behavior of other agents by learning their dynamics. All the proposed methods were compared with state-of-the-art approaches, and were also demonstrated in different scenarios, including challenging robotic tasks.
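    The active demonstration-request idea can be sketched compactly. The toy learner below asks a teacher for a demonstration whenever a state is under-explored or its Q-values do not yet discriminate between actions; the min_visits knob is a simplified stand-in for the thesis's user-set quality threshold, and the chain-world task and teacher policy are invented for illustration.

```python
from collections import defaultdict

class ActiveDemoLearner:
    """Tabular Q-learner that requests a teacher demonstration when a state
    is under-explored or its Q-values are still uninformative."""

    def __init__(self, actions, min_visits=3, alpha=0.5, gamma=0.95):
        self.Q = defaultdict(float)    # (state, action) -> value estimate
        self.visits = defaultdict(int)
        self.actions = actions
        self.min_visits = min_visits   # raising this trades extra demonstrations for fewer own trials
        self.alpha, self.gamma = alpha, gamma

    def choose(self, state, teacher):
        qs = [self.Q[(state, a)] for a in self.actions]
        if self.visits[state] < self.min_visits or max(qs) == min(qs):
            self.visits[state] += 1
            return teacher(state)      # actively request a demonstration
        return max(self.actions, key=lambda a: self.Q[(state, a)])

    def update(self, s, a, r, s2):
        target = r + self.gamma * max(self.Q[(s2, a2)] for a2 in self.actions)
        self.Q[(s, a)] += self.alpha * (target - self.Q[(s, a)])

# Toy chain world: states 0..4, reward on reaching 4; the teacher always moves right.
learner = ActiveDemoLearner(actions=["left", "right"])
teacher = lambda s: "right"
for _ in range(20):                    # episodes
    s = 0
    while s < 4:
        a = learner.choose(s, teacher)
        s2 = min(s + 1, 4) if a == "right" else max(s - 1, 0)
        learner.update(s, a, 1.0 if s2 == 4 else 0.0, s2)
        s = s2
```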

    Relational knowledge and representation for reinforcement learning

    In reinforcement learning, an agent interacts with the environment, learns from feedback about the quality of its actions, and improves its behaviour or policy in order to maximise its expected utility. Learning efficiently in large-scale problems is a major challenge. State aggregation is possible in problems with a first-order structure, allowing the agent to learn in an abstraction of the original problem which is of considerably smaller scale. One approach is to learn the Q-values of actions, approximated by a relational function approximator. This is the basis for relational reinforcement learning (RRL). We abstract the state with first-order features which consist only of variables, thereby aggregating similar states from all problems of the same domain into abstract states. We study the limitations of RRL due to this abstraction and introduce the concepts of consistent abstraction, subsumption of problems, and abstract-equivalent problems. We propose three methods to overcome these limitations, extending the types of problems our RRL method can solve. Next, to further improve learning efficiency, we propose to learn different types of generalised knowledge. The policy is influenced by directed exploration based on multiple types of intrinsic rewards, and avoids previously encountered dead ends. In addition, we incorporate model-based techniques to provide better-quality estimates of the Q-values. Transfer learning is possible by directly leveraging the generalised knowledge to accelerate learning in a new problem. Lastly, we introduce a new class of problems which considers dynamic objects and time-bounded goals. We discuss the complications these bring to RRL and present some solutions. We also propose a framework for multi-agent coordination to achieve joint goals represented by time-bounded goals, by decomposing a multi-agent problem into single-agent problems. We evaluate our work empirically in six domains to demonstrate its efficacy in solving large-scale problems and in transfer learning.
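    To make the variables-only abstraction concrete, here is a minimal sketch: ground facts are rewritten with canonical variables so that structurally identical states share one abstract state and hence one Q-value. This is a deliberate simplification that ignores the consistency and subsumption issues the thesis studies, and all predicate and action names are illustrative.

```python
from collections import defaultdict

def abstract_state(ground_facts):
    """Rewrite ground facts with canonical variables, so on(a,b) and on(c,d)
    both map to on(X0,X1): structurally identical states are aggregated."""
    mapping, abstract = {}, []
    for pred, args in sorted(ground_facts):
        vars_ = []
        for obj in args:
            if obj not in mapping:
                mapping[obj] = f"X{len(mapping)}"
            vars_.append(mapping[obj])
        abstract.append((pred, tuple(vars_)))
    return frozenset(abstract)

Q = defaultdict(float)  # Q-values indexed by (abstract state, action schema)

s1 = {("on", ("a", "b")), ("clear", ("a",))}
s2 = {("on", ("c", "d")), ("clear", ("c",))}
assert abstract_state(s1) == abstract_state(s2)  # both aggregate to one abstract state

# An update made while experiencing s1 is immediately reused in s2.
Q[(abstract_state(s1), "unstack")] += 0.1
print(Q[(abstract_state(s2), "unstack")])  # 0.1
```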