
    Performing a piece collecting task with a Q-Learning agent

    Since the early days of Artificial Intelligence (AI), researchers have tried to design intelligent machines capable of performing specific tasks with few instructions. In the 1950s, Machine Learning (ML) emerged and proposed that the goal might not be to design intelligent machines, but machines able to learn from data. Within ML, Reinforcement Learning (RL) focuses on designing machines, referred to as agents, that learn not from external data but from data derived from the machine's own experiences. The key concept of RL is to make agents learn by providing them with rewards depending on the outcome of each of their experiences. Many studies have proposed different approaches to RL systems and found applications in the industrial and manufacturing domain, such as supply chain management, robot navigation and control, and chemical reaction optimization.

    The main aim of this thesis is to design an agent with a behaviour based on Reinforcement Learning, capable of performing tasks that could be extrapolated to activities and processes in an industrial environment. Specifically, the studied activity is the navigation control of a robot tasked with collecting pieces placed in a two-dimensional environment. The algorithm used to guide the agent's learning process is one of the best-known and most widely used RL methods, Q-Learning. An Artificial Neural Network (ANN) structure, the MultiLayer Perceptron (MLP), is used to approximate the values the agent relies on to decide which action to take in each situation. The experiments are designed to validate the agent's capability to perform the task and to compare the effects and results of the several improvements implemented. The results validate the agent's capacity to perform the task with acceptable results, but indicate that the agent is able to collect all the pieces in different environment configurations only when the improvements are implemented. These improvements are the addition of an experience replay memory and an observation strategy, thanks to which the agent knows what surrounds it. During the experimentation, comparisons between environment configurations and task complexities are made.
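    The combination described above, a Q-Learning update applied to transitions drawn from an experience replay memory, can be sketched in miniature. The toy below is purely illustrative: it uses a one-dimensional corridor with a single piece and a tabular Q function rather than the thesis's two-dimensional environment and MLP approximator, and all sizes and hyperparameters are assumed for the example.

    ```python
    import random
    from collections import deque

    # Illustrative sketch only: a 1-D piece-collecting toy, not the thesis's
    # actual 2-D environment or MLP value approximator. The agent moves
    # left/right along a line of cells and is rewarded when it reaches the
    # piece. It demonstrates the Q-Learning update combined with an
    # experience replay memory.

    SIZE = 5                            # number of cells; piece sits in the last one
    ACTIONS = (-1, +1)                  # move left / move right
    ALPHA, GAMMA, EPS = 0.5, 0.9, 0.1   # learning rate, discount, exploration rate

    def step(state, action):
        """Apply a move; reward 1 and terminate when the piece is collected."""
        nxt = max(0, min(SIZE - 1, state + action))
        done = (nxt == SIZE - 1)
        return nxt, (1.0 if done else 0.0), done

    def train(episodes=300, seed=0):
        rng = random.Random(seed)
        q = [[0.0, 0.0] for _ in range(SIZE)]   # Q[state][action index]
        memory = deque(maxlen=200)              # experience replay buffer

        for _ in range(episodes):
            state, done = 0, False
            for _ in range(100):                # cap episode length
                if done:
                    break
                # epsilon-greedy action selection with random tie-breaking
                if rng.random() < EPS:
                    a = rng.randrange(2)
                else:
                    best = max(q[state])
                    a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
                nxt, reward, done = step(state, ACTIONS[a])
                memory.append((state, a, reward, nxt, done))
                # replay a small batch: the Q-Learning update is applied to
                # stored transitions, not only the most recent one
                for s, ai, r, sn, d in rng.sample(memory, min(8, len(memory))):
                    target = r if d else r + GAMMA * max(q[sn])
                    q[s][ai] += ALPHA * (target - q[s][ai])
                state = nxt
        return q

    q = train()
    ```

    After training, the learned values should favour moving toward the piece from every non-terminal cell, which is the behaviour the replay memory helps stabilise by reusing past transitions instead of discarding them after a single update.
    
    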