SDRL: Interpretable and Data-efficient Deep Reinforcement Learning Leveraging Symbolic Planning
Deep reinforcement learning (DRL) has gained great success by learning
directly from high-dimensional sensory inputs, yet is notorious for the lack of
interpretability. Interpretability of the subtasks is critical in hierarchical
decision-making as it increases the transparency of black-box-style DRL
approaches and helps RL practitioners to understand the high-level behavior
of the system better. In this paper, we introduce symbolic planning into DRL
and propose a framework of Symbolic Deep Reinforcement Learning (SDRL) that can
handle both high-dimensional sensory inputs and symbolic planning. The
task-level interpretability is enabled by relating symbolic actions to
options. This framework features a planner -- controller -- meta-controller
architecture, which takes charge of subtask scheduling, data-driven subtask
learning, and subtask evaluation, respectively. The three components
cross-fertilize each other and eventually converge to an optimal symbolic plan
along with the learned subtasks, bringing together the advantages of long-term
planning capability with symbolic knowledge and end-to-end reinforcement
learning directly from a high-dimensional sensory input. Experimental results
validate the interpretability of subtasks, along with improved data efficiency
compared with state-of-the-art approaches.
On the Benefits of Inoculation, an Example in Train Scheduling
The local reconstruction of a railway schedule following a small perturbation
of the traffic, seeking minimization of the total accumulated delay, is a very
difficult and tightly constrained combinatorial problem. Notoriously enough,
the railway company's public image degrades proportionally to the amount of
daily delays, and the same goes for its profit! This paper describes an
inoculation procedure which greatly enhances an evolutionary algorithm for
train re-scheduling. The procedure consists in building the initial population
around a pre-computed solution based on problem-related information available
beforehand. The optimization is performed by adapting times of departure and
arrival, as well as allocation of tracks, for each train at each station. This
is achieved by a permutation-based evolutionary algorithm that relies on a
semi-greedy heuristic scheduler to gradually reconstruct the schedule by
inserting trains one after another. Experimental results are presented on
various instances of a large real-world case involving around 500 trains and
more than 1 million constraints. In competition with the commercial
mathematical programming tool ILOG CPLEX, it appears that within a large class of
instances, excluding trivial instances as well as too difficult ones, and with
very few exceptions, a clever initialization turns an encouraging failure into
a clear-cut success, auguring substantial financial savings.
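
The inoculation idea described in this abstract, seeding a permutation-based evolutionary algorithm around a pre-computed solution, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the paper's implementation: the function names, the swap-based perturbation, and the toy single-track greedy decoder (which stands in for the semi-greedy scheduler and ignores track allocation and real constraints).

```python
import random


def inoculated_population(seed_order, pop_size, n_swaps, rng):
    """Build permutations clustered around a pre-computed seed ordering.

    Hypothetical helper: each individual is the seed with a few random swaps.
    """
    population = [list(seed_order)]  # keep the seed itself unchanged
    while len(population) < pop_size:
        individual = list(seed_order)
        for _ in range(n_swaps):
            i, j = rng.sample(range(len(individual)), 2)
            individual[i], individual[j] = individual[j], individual[i]
        population.append(individual)
    return population


def greedy_total_delay(order, planned, dwell):
    """Toy decoder: insert trains one after another on a single track.

    Each train starts at its planned time or when the track frees up,
    whichever is later; the return value is the total accumulated delay.
    """
    track_free_at = 0
    total = 0
    for train in order:
        start = max(planned[train], track_free_at)
        total += start - planned[train]
        track_free_at = start + dwell
    return total


# Assumed toy instance: planned departure times and a fixed track occupation.
planned = {"A": 0, "B": 1, "C": 10}
seed = ["A", "B", "C"]  # e.g. the unperturbed timetable order
population = inoculated_population(seed, pop_size=5, n_swaps=1, rng=random.Random(0))
best = min(population, key=lambda order: greedy_total_delay(order, planned, dwell=3))
```

Because the seed is kept in the population, the best individual can never decode to a worse schedule than the pre-computed solution, which is the intuition behind starting the search from a good point instead of from random permutations.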