Metareasoning for Planning and Execution in Autonomous Systems
Metareasoning is the process by which an autonomous system optimizes, specifically by monitoring and controlling, its own planning and execution processes in order to operate more effectively in its environment. As autonomous systems rapidly grow in sophistication and autonomy, metareasoning has become critical for efficient and reliable operation in noisy, stochastic, unstructured domains over long periods of time, because of the uncertainty over the limits of their reasoning capabilities and the range of circumstances they may encounter. However, despite considerable progress in metareasoning as a whole over the last thirty years, work on metareasoning for planning relies on several assumptions that diminish its accuracy and practical utility in autonomous systems that operate in the real world, while metareasoning for execution has received little attention. This dissertation therefore proposes more effective metareasoning for planning and expands the scope of metareasoning to execution, with the goal of improving the efficiency of planning and the reliability of execution in autonomous systems.
In particular, we offer a two-pronged framework that introduces metareasoning for efficient planning and reliable execution in autonomous systems. We begin by proposing two forms of metareasoning for efficient planning: (1) a method that determines when to interrupt an anytime algorithm and act on its current solution by using online performance prediction, and (2) a method that tunes the hyperparameters of the anytime algorithm at runtime by using deep reinforcement learning. We then propose two forms of metareasoning for reliable execution: (3) a method that recovers from exceptions encountered during operation by using belief space planning, and (4) a method that maintains and restores safety during operation by using probabilistic planning.
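Method (1), deciding when to interrupt an anytime algorithm, can be illustrated with a simple monitoring loop that compares the predicted utility of continuing deliberation against the utility of acting on the current solution. The sketch below is an illustration of that general idea only, not the dissertation's implementation; the hooks `anytime_step`, `predict_next_quality`, and `time_cost` are hypothetical placeholders for the planner's iteration, the online performance predictor, and the time-dependent cost of delaying action.

```python
# Minimal sketch (illustrative, not the dissertation's code) of metareasoning
# that decides when to interrupt an anytime algorithm and act on its result.
import time

def run_with_monitor(anytime_step, predict_next_quality, time_cost,
                     step_duration, deadline=None):
    """anytime_step() returns (solution, quality), improving over calls.

    predict_next_quality(quality, elapsed): predicted quality after one more
    step (the online performance predictor, assumed here).
    time_cost(elapsed): monotonically increasing cost of the time spent.
    """
    start = time.monotonic()
    solution, quality = anytime_step()
    while True:
        elapsed = time.monotonic() - start
        if deadline is not None and elapsed + step_duration > deadline:
            return solution            # no time budget left for another step
        # Time-dependent utility: solution quality minus the cost of the
        # computation time spent so far.
        current_utility = quality - time_cost(elapsed)
        predicted_utility = (predict_next_quality(quality, elapsed)
                             - time_cost(elapsed + step_duration))
        if predicted_utility <= current_utility:
            return solution            # stop: further deliberation not worth it
        solution, quality = anytime_step()
```

The design choice here is the standard one for monitoring anytime algorithms: keep deliberating only while the predicted gain in solution quality outweighs the additional time cost of computing it.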
Performing a piece collecting task with a Q-Learning agent
Since the early days of Artificial Intelligence (AI), researchers have tried to design intelligent machines capable of performing specific tasks with few instructions. In the 1950s, Machine Learning (ML) emerged and proposed that the goal might not be to design intelligent machines directly but machines that are able to learn from data. Within ML, Reinforcement Learning (RL) focuses on designing machines, referred to as agents, that learn not from external data but from data derived from their own experiences. The key idea of RL is that agents learn from rewards given for the outcome of each of their experiences. Many studies have proposed different approaches to RL systems and found applications in industrial and manufacturing domains such as supply chain management, robot navigation and control, and chemical reaction optimization. The main aim of this thesis is to design an agent whose behaviour is based on Reinforcement Learning, capable of performing tasks that can be extrapolated to activities and processes in an industrial environment. Specifically, the studied activity is the navigation control of a robot tasked with collecting pieces placed in a two-dimensional environment. The algorithm used to guide the agent's learning process is one of the best-known and most widely used RL methods, Q-Learning. An Artificial Neural Network (ANN) structure, the MultiLayer Perceptron (MLP), is used to approximate the values the agent uses to decide which action to take in each situation. The experiments are designed to validate the agent's capability to perform the task and to compare the effects and results of several implemented improvements. The results validate the agent's capacity to perform the task with acceptable results but indicate that the agent is able to collect all the pieces in different environment configurations only when the improvements are implemented. These improvements are the addition of an experience replay memory and an observation strategy thanks to which the agent knows what surrounds it. During the experimentation, comparisons between environment configurations and task complexities are made.
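The combination the abstract describes, Q-Learning with an MLP value approximator, epsilon-greedy action selection, and an experience replay memory, can be sketched in a few dozen lines. The code below is a minimal illustrative sketch, not the thesis implementation; the class names, network sizes, and hyperparameters are assumptions chosen for brevity.

```python
# Minimal sketch (assumed names and sizes) of a Q-Learning agent with an MLP
# Q-value approximator and an experience replay memory.
import random
from collections import deque
import numpy as np

class MLPQNetwork:
    """One-hidden-layer MLP mapping a state vector to one Q-value per action."""
    def __init__(self, n_inputs, n_hidden, n_actions, lr=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0, 0.1, (n_inputs, n_hidden)); self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0, 0.1, (n_hidden, n_actions)); self.b2 = np.zeros(n_actions)
        self.lr = lr

    def forward(self, s):
        self.h = np.maximum(0.0, s @ self.W1 + self.b1)   # ReLU hidden layer
        return self.h @ self.W2 + self.b2                  # Q-values

    def update(self, s, a, td_target):
        """One SGD step on the squared TD error for the chosen action."""
        q = self.forward(s)
        err = q[a] - td_target
        one_hot = np.eye(len(q))[a]
        dW2 = np.outer(self.h, one_hot) * err
        db2 = one_hot * err
        dh = self.W2[:, a] * err * (self.h > 0)            # backprop through ReLU
        dW1 = np.outer(s, dh)
        self.W2 -= self.lr * dW2; self.b2 -= self.lr * db2
        self.W1 -= self.lr * dW1; self.b1 -= self.lr * dh if False else self.lr * dW1 * 0 + self.lr * dW1  # see note below

class QLearningAgent:
    def __init__(self, n_inputs, n_actions, gamma=0.95, epsilon=0.1,
                 buffer_size=10_000, batch_size=32):
        self.net = MLPQNetwork(n_inputs, 64, n_actions)
        self.n_actions, self.gamma, self.epsilon = n_actions, gamma, epsilon
        self.memory = deque(maxlen=buffer_size)            # experience replay memory
        self.batch_size = batch_size

    def act(self, state):
        """Epsilon-greedy action selection over the MLP's Q-values."""
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        return int(np.argmax(self.net.forward(state)))

    def remember(self, s, a, r, s_next, done):
        self.memory.append((s, a, r, s_next, done))

    def replay(self):
        """Sample stored transitions and apply a Q-Learning update to each."""
        if len(self.memory) < self.batch_size:
            return
        for s, a, r, s_next, done in random.sample(list(self.memory), self.batch_size):
            target = r if done else r + self.gamma * np.max(self.net.forward(s_next))
            self.net.update(s, a, target)
```

In the thesis the state would encode the robot's observation of its surroundings in the two-dimensional environment; in this sketch any fixed-length feature vector works, and the replay memory mirrors the experience-replay improvement the abstract highlights.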