6 research outputs found

    A hierarchical two-tier approach to hyper-parameter optimization in reinforcement learning

    Optimization of hyper-parameters in real-world applications of reinforcement learning (RL) is a key issue, because their settings determine how fast the agent learns its policy by interacting with its environment, given the information content of the data gathered. In this work, an approach that uses Bayesian optimization to perform an autonomous two-tier optimization of both representation decisions and algorithm hyper-parameters is proposed: first, categorical/structural RL hyper-parameters are treated as binary variables and optimized with an acquisition function tailored to this type of variable. Then, at a lower level of abstraction, solution-level hyper-parameters are optimized using the expected improvement acquisition function, while the categorical hyper-parameters found in the optimization at the upper level of abstraction are held fixed. This two-tier approach is validated with both tabular and neural network representations of the value function in a classic simulated control task. The results obtained are promising and open the way for more user-independent applications of reinforcement learning.
    Affiliation: Barsce, Juan Cruz. Universidad Tecnologica Nacional. Facultad Regional Villa Maria; Argentina
    Affiliation: Palombarini, Jorge Andrés. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Córdoba. Centro de Investigaciones y Transferencia de Villa María. Universidad Nacional de Villa María. Centro de Investigaciones y Transferencia de Villa María; Argentina. Universidad Tecnologica Nacional. Facultad Regional Villa Maria; Argentina
    Affiliation: Martínez, Ernesto Carlos. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Desarrollo y Diseño. Universidad Tecnológica Nacional. Facultad Regional Santa Fe. Instituto de Desarrollo y Diseño; Argentina
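
    As a minimal sketch of the two-tier scheme, the code below enumerates the binary structural choices at the upper tier and runs expected-improvement Bayesian optimization over continuous hyper-parameters at the lower tier. The evaluate_rl objective, the (alpha, epsilon) search bounds, and the use of plain enumeration in place of the paper's binary-variable acquisition function are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np
from itertools import product
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Illustrative bounds for two continuous hyper-parameters: alpha, epsilon.
LOW, HIGH = np.array([1e-3, 0.01]), np.array([1.0, 0.5])

def expected_improvement(X_cand, gp, y_best, xi=0.01):
    # Standard EI for maximization.
    mu, sigma = gp.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    imp = mu - y_best - xi
    z = imp / sigma
    return imp * norm.cdf(z) + sigma * norm.pdf(z)

def lower_tier(structure, evaluate_rl, n_init=5, n_iter=20, seed=0):
    # EI-driven Bayesian optimization of the continuous hyper-parameters,
    # with the categorical/structural choices held fixed.
    rng = np.random.default_rng(seed)
    X = rng.uniform(LOW, HIGH, size=(n_init, 2))
    y = np.array([evaluate_rl(structure, x) for x in X])
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    for _ in range(n_iter):
        gp.fit(X, y)
        cand = rng.uniform(LOW, HIGH, size=(500, 2))
        x_next = cand[np.argmax(expected_improvement(cand, gp, y.max()))]
        X = np.vstack([X, x_next])
        y = np.append(y, evaluate_rl(structure, x_next))
    best = int(np.argmax(y))
    return X[best], y[best]

def two_tier(evaluate_rl, n_binary=2):
    # Upper tier: score each binary structural configuration by the best
    # return the lower tier finds for it. Plain enumeration stands in for
    # the paper's acquisition function tailored to binary variables.
    results = {s: lower_tier(s, evaluate_rl) for s in product([0, 1], repeat=n_binary)}
    return max(results.items(), key=lambda kv: kv[1][1])
```

    Here evaluate_rl(structure, params) would train the RL agent under the given configuration and return its mean performance, which is the expensive black-box the Gaussian process models.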

    SmartGantt - An intelligent system for real time rescheduling based on relational reinforcement learning

    With the current trend towards cognitive manufacturing systems that must deal with unforeseen events and disturbances constantly demanding real-time repair decisions, learning/reasoning skills and interactive capabilities are important functionalities for rescheduling a shop floor on the fly while taking into account several objectives and goal states. In this work, the automatic generation and update, through learning, of rescheduling knowledge using simulated transitions of abstract schedule states is proposed. Deictic representations of schedules based on focal points are used to define a repair policy that generates a goal-directed sequence of repair operators to face unplanned events and operational disturbances. An industrial example where rescheduling is needed due to the arrival of a new/rush order, or whenever raw material delay/shortage or machine breakdown events occur, is discussed using the SmartGantt prototype for interactive rescheduling in real time. SmartGantt demonstrates that due-date compliance of orders in progress, negotiation of delivery conditions for new orders, and distributed production control can be dramatically improved by means of relational reinforcement learning and a deictic representation of rescheduling tasks.
    Affiliation: Palombarini, Jorge Andrés. Universidad Tecnologica Nacional. Facultad Regional Villa Maria; Argentina
    Affiliation: Martínez, Ernesto Carlos. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Desarrollo y Diseño. Universidad Tecnológica Nacional. Facultad Regional Santa Fe. Instituto de Desarrollo y Diseño; Argentina
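
    To make the deictic, focal-point representation concrete, the following is a minimal sketch of how a schedule state might be abstracted relative to a focal task. The Task fields and the particular predicates are illustrative assumptions, not SmartGantt's actual schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Task:
    name: str
    resource: str
    start: float
    end: float
    due: float

def abstract_state(schedule, focal):
    """Describe the schedule relative to a focal task (e.g. the task that
    triggered the disturbance), discarding absolute task identities so that
    many concrete schedules map to the same abstract state."""
    same = [t for t in schedule if t.resource == focal.resource and t != focal]
    slack = max(focal.due - focal.end, 0.0)
    return (
        ("tardy", focal.end > focal.due),
        ("slack_bucket", min(int(slack // 4), 3)),   # coarse-grained slack
        ("left_neighbour", any(t.end <= focal.start for t in same)),
        ("right_neighbour", any(t.start >= focal.end for t in same)),
    )
```

    Because the state is a small tuple of relations rather than a full schedule, a repair policy indexed by such states can generalize across plants and order mixes it never saw during training.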

    End-to-end on-line rescheduling from Gantt chart images using deep reinforcement learning

    No full text
    With the advent of the socio-technical manufacturing paradigm, the way rescheduling decisions are taken at the shop floor has changed radically in order to guarantee highly efficient production under increasingly dynamic conditions. To cope with uncertain production environments, a drastic increase in the type and degree of automation used at the shop floor for handling unforeseen events and unplanned disturbances is required. In this work, the on-line rescheduling task is modelled as a closed-loop control problem in which an artificial autonomous agent implements a control policy generated off-line using a schedule simulator, learning schedule repair policies directly from high-dimensional sensory inputs. The rescheduling control policy is stored in a deep neural network, which is used to select repair actions in order to reach a small set of repaired goal states. The rescheduling agent is trained with Proximal Policy Optimisation on a wide variety of simulated transitions between schedule states, using colour-rich Gantt chart images and negligible prior knowledge as inputs. An industrial example is discussed to highlight that the proposed approach enables end-to-end deep learning of successful rescheduling policies that encode task-specific control knowledge understandable by human experts.
    Affiliation: Palombarini, Jorge Andrés. Universidad Tecnologica Nacional. Facultad Regional Villa Maria; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina
    Affiliation: Martínez, Ernesto Carlos. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Desarrollo y Diseño. Universidad Tecnológica Nacional. Facultad Regional Santa Fe. Instituto de Desarrollo y Diseño; Argentina
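
    As an illustration of such a training setup, the sketch below wires a CNN policy trained with Proximal Policy Optimisation over image observations, using stable-baselines3 as a stand-in for the authors' implementation. GanttImageEnv, its rendering placeholder, and the reward shaping are assumptions; the paper's simulator and repair operators are not reproduced here.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO

class GanttImageEnv(gym.Env):
    """Hypothetical environment: observations are RGB renderings of the
    current schedule; discrete actions are schedule repair operators."""
    N_REPAIR_OPS = 6  # e.g. left/right shift, swap, reassign resource, ...

    def __init__(self):
        self.observation_space = spaces.Box(0, 255, shape=(64, 64, 3), dtype=np.uint8)
        self.action_space = spaces.Discrete(self.N_REPAIR_OPS)
        self._steps = 0

    def _render_gantt(self):
        # Placeholder for rendering the schedule as a colour Gantt image.
        return np.zeros((64, 64, 3), dtype=np.uint8)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self._steps = 0
        return self._render_gantt(), {}

    def step(self, action):
        self._steps += 1
        # Placeholder reward: in the paper the signal reflects progress
        # towards a repaired goal state (e.g. restored due-date compliance).
        reward = 0.0
        terminated = self._steps >= 50
        return self._render_gantt(), reward, terminated, False, {}

model = PPO("CnnPolicy", GanttImageEnv(), verbose=0)
model.learn(total_timesteps=10_000)
```

    The point of the image interface is that the agent needs no hand-crafted schedule features: everything it knows about the shop floor arrives through pixels, mirroring the end-to-end claim in the abstract.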

    Real-time rescheduling of production systems using relational reinforcement learning

    Most scheduling methodologies developed so far have laid down good theoretical foundations, but there is still a need for real-time rescheduling methods that can work effectively in disruption management. In this work, a novel approach for the automatic generation of rescheduling knowledge using Relational Reinforcement Learning (RRL) is presented. Relational representations of schedule states and repair operators make it possible to encode compactly, and use in real time, rescheduling knowledge learned through intensive simulations of state transitions. An industrial example where a current schedule must be repaired following the arrival of a new order is discussed using a prototype application, SmartGantt®, for interactive rescheduling in a reactive way. SmartGantt® demonstrates the advantages of resorting to RRL and abstract states for real-time rescheduling. A small number of training episodes suffices to define a repair policy that can handle on-the-fly events such as order insertions, resource breakdowns, raw material delays or shortages, and rush order arrivals using a sequence of operators to achieve a selected goal.
    Affiliation: Palombarini, Jorge Andrés. Universidad Tecnologica Nacional. Facultad Regional Villa Maria; Argentina
    Affiliation: Martínez, Ernesto Carlos. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Desarrollo y Diseño. Universidad Tecnológica Nacional. Facultad Regional Santa Fe. Instituto de Desarrollo y Diseño; Argentina
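
    A minimal sketch of such a repair-policy learning loop is shown below. The operator names, the reward shaping, and the helper functions sample_disrupted_schedule, abstract, apply_op, and goal are hypothetical stand-ins for the simulator and relational machinery described in the abstract.

```python
import random
from collections import defaultdict

REPAIR_OPS = ["left_shift", "right_shift", "swap_adjacent", "move_to_alt_resource"]

def learn_repair_policy(sample_disrupted_schedule, abstract, apply_op, goal,
                        episodes=200, alpha=0.1, gamma=0.95, eps=0.2):
    """Tabular Q-learning over (abstract state, repair operator) pairs."""
    Q = defaultdict(float)
    for _ in range(episodes):
        schedule = sample_disrupted_schedule()   # simulate a disturbance
        for _ in range(30):                      # bounded repair sequence
            s = abstract(schedule)
            if random.random() < eps:            # epsilon-greedy exploration
                op = random.choice(REPAIR_OPS)
            else:
                op = max(REPAIR_OPS, key=lambda o: Q[(s, o)])
            schedule = apply_op(schedule, op)
            s2, done = abstract(schedule), goal(schedule)
            reward = 1.0 if done else -0.01      # prefer short repair sequences
            best_next = 0.0 if done else max(Q[(s2, o)] for o in REPAIR_OPS)
            Q[(s, op)] += alpha * (reward + gamma * best_next - Q[(s, op)])
            if done:
                break
    return Q
```

    Because the Q-table is keyed by abstract relational states, the same few hundred simulated episodes can cover disturbance types (insertions, breakdowns, delays) that would require far more data under a propositional encoding.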

    Task Rescheduling using Relational Reinforcement Learning

    Generating and representing knowledge about heuristics for repair-based scheduling is a key issue in any rescheduling strategy designed to deal with unforeseen events and disturbances. Resorting to a feature-based propositional representation of schedule states is very inefficient: generalization to unseen states is highly unreliable, and knowledge transfer to similar scheduling domains is difficult. In contrast, first-order relational representations exploit the existence of domain objects and relations over those objects, and enable quantification over objectives (goals), action effects, and properties of states. In this work, a novel approach that formalizes the rescheduling problem as a Relational Markov Decision Process integrating first-order (deictic) representations of (abstract) schedule states is presented. Task rescheduling is solved using a relational reinforcement learning algorithm implemented in a real-time prototype system, which makes room for an interactive scheduling strategy that successfully handles different repair goals and disruption scenarios. An industrial case study vividly shows how relational abstractions provide compact repair policies with minor computational effort.
    Affiliation: Palombarini, Jorge Andrés. Universidad Tecnologica Nacional. Facultad Regional Villa Maria; Argentina
    Affiliation: Martínez, Ernesto Carlos. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Santa Fe. Instituto de Desarrollo y Diseño. Universidad Tecnológica Nacional. Facultad Regional Santa Fe. Instituto de Desarrollo y Diseño; Argentina
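
    For readers unfamiliar with the formalism, a Relational Markov Decision Process can be written compactly as follows; the notation here is a shorthand for exposition, not the paper's own.

```latex
% A Relational MDP, written over a relational vocabulary of predicates P
% and domain objects O rather than an enumerated state set:
\[
  \mathcal{M} = \langle S, A, T, R \rangle
\]
% where
%   S : abstract schedule states, each a conjunction of ground atoms
%       p(o_1, \dots, o_k) with p \in P and o_i \in O;
%   A : repair operators, parameterized by the objects they act on;
%   T(s' \mid s, a) : transition distribution induced by the schedule simulator;
%   R(s, a) : reward encoding progress towards the selected repair goal.
```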

    Towards Autonomous Reinforcement Learning: Automatic Setting of Hyper-parameters using Bayesian Optimization

    With the increasing use of machine learning by industry and scientific communities in a variety of tasks such as text mining, image recognition, and self-driving cars, automatic setting of hyper-parameters in learning algorithms is a key factor for obtaining good performance regardless of user expertise in the inner workings of the techniques and methodologies. In particular, for a reinforcement learning algorithm, the efficiency with which an agent learns a control policy in an uncertain environment depends heavily on the hyper-parameters used to balance exploration with exploitation. In this work, an autonomous learning framework that integrates Bayesian optimization with Gaussian process regression to optimize the hyper-parameters of a reinforcement learning algorithm is proposed. Also, a bandits-based approach to balancing computational cost against decreasing uncertainty about the Q-values is presented. A gridworld example is used to highlight how hyper-parameter configurations of a learning algorithm (SARSA) are iteratively improved based on two performance functions.
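
    To make the tuning target concrete, the following sketch shows one plausible form of the objective such an optimizer evaluates: a SARSA run on a small gridworld that returns a performance score for a given (alpha, epsilon) pair. The grid size, rewards, and episode budget are illustrative assumptions, and the Bayesian optimization layer itself would resemble the lower-tier sketch given with the first abstract above.

```python
import numpy as np

def run_sarsa(alpha, epsilon, episodes=300, size=5, gamma=0.95, seed=0):
    """Train SARSA on a size x size gridworld (start top-left, goal
    bottom-right) and return a scalar performance score."""
    rng = np.random.default_rng(seed)
    n_states, n_actions = size * size, 4          # up / down / left / right
    goal = n_states - 1
    Q = np.zeros((n_states, n_actions))
    returns = []

    def step(s, a):
        r, c = divmod(s, size)
        dr, dc = [(-1, 0), (1, 0), (0, -1), (0, 1)][a]
        r, c = min(max(r + dr, 0), size - 1), min(max(c + dc, 0), size - 1)
        s2 = r * size + c
        return s2, (1.0 if s2 == goal else -0.01), s2 == goal

    def policy(s):
        # Epsilon-greedy action selection: the exploration knob being tuned.
        return int(rng.integers(n_actions)) if rng.random() < epsilon \
            else int(np.argmax(Q[s]))

    for _ in range(episodes):
        s, total = 0, 0.0
        a = policy(s)
        for _t in range(500):                     # cap episode length
            s2, r, done = step(s, a)
            a2 = policy(s2)
            # On-policy SARSA update using the action actually taken next.
            Q[s, a] += alpha * (r + gamma * Q[s2, a2] * (not done) - Q[s, a])
            s, a, total = s2, a2, total + r
            if done:
                break
        returns.append(total)
    # One possible performance function: mean return over the last episodes.
    return float(np.mean(returns[-50:]))
```

    A single call such as run_sarsa(0.3, 0.1) is one expensive black-box evaluation; the framework in the abstract would query a Gaussian process surrogate to decide which (alpha, epsilon) to try next.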