20,137 research outputs found

    SDRL: Interpretable and Data-efficient Deep Reinforcement Learning Leveraging Symbolic Planning

    Full text link
    Deep reinforcement learning (DRL) has gained great success by learning directly from high-dimensional sensory inputs, yet is notorious for the lack of interpretability. Interpretability of the subtasks is critical in hierarchical decision-making as it increases the transparency of black-box-style DRL approach and helps the RL practitioners to understand the high-level behavior of the system better. In this paper, we introduce symbolic planning into DRL and propose a framework of Symbolic Deep Reinforcement Learning (SDRL) that can handle both high-dimensional sensory inputs and symbolic planning. The task-level interpretability is enabled by relating symbolic actions to options.This framework features a planner -- controller -- meta-controller architecture, which takes charge of subtask scheduling, data-driven subtask learning, and subtask evaluation, respectively. The three components cross-fertilize each other and eventually converge to an optimal symbolic plan along with the learned subtasks, bringing together the advantages of long-term planning capability with symbolic knowledge and end-to-end reinforcement learning directly from a high-dimensional sensory input. Experimental results validate the interpretability of subtasks, along with improved data efficiency compared with state-of-the-art approaches

    On the Benefits of Inoculation, an Example in Train Scheduling

    Get PDF
    The local reconstruction of a railway schedule following a small perturbation of the traffic, seeking minimization of the total accumulated delay, is a very difficult and tightly constrained combinatorial problem. Notoriously enough, the railway company's public image degrades proportionally to the amount of daily delays, and the same goes for its profit! This paper describes an inoculation procedure which greatly enhances an evolutionary algorithm for train re-scheduling. The procedure consists in building the initial population around a pre-computed solution based on problem-related information available beforehand. The optimization is performed by adapting times of departure and arrival, as well as allocation of tracks, for each train at each station. This is achieved by a permutation-based evolutionary algorithm that relies on a semi-greedy heuristic scheduler to gradually reconstruct the schedule by inserting trains one after another. Experimental results are presented on various instances of a large real-world case involving around 500 trains and more than 1 million constraints. In terms of competition with commercial math ematical programming tool ILOG CPLEX, it appears that within a large class of instances, excluding trivial instances as well as too difficult ones, and with very few exceptions, a clever initialization turns an encouraging failure into a clear-cut success auguring of substantial financial savings
    • …
    corecore