SDRL: Interpretable and Data-efficient Deep Reinforcement Learning Leveraging Symbolic Planning
Deep reinforcement learning (DRL) has achieved great success by learning
directly from high-dimensional sensory inputs, yet is notorious for its lack of
interpretability. Interpretability of the subtasks is critical in hierarchical
decision-making as it increases the transparency of black-box-style DRL
approaches and helps RL practitioners to understand the high-level behavior
of the system better. In this paper, we introduce symbolic planning into DRL
and propose a framework of Symbolic Deep Reinforcement Learning (SDRL) that can
handle both high-dimensional sensory inputs and symbolic planning. The
task-level interpretability is enabled by relating symbolic actions to
options. This framework features a planner -- controller -- meta-controller
architecture, which takes charge of subtask scheduling, data-driven subtask
learning, and subtask evaluation, respectively. The three components
cross-fertilize each other and eventually converge to an optimal symbolic plan
along with the learned subtasks, bringing together the advantages of long-term
planning capability with symbolic knowledge and end-to-end reinforcement
learning directly from a high-dimensional sensory input. Experimental results
validate the interpretability of subtasks, along with improved data efficiency
compared with state-of-the-art approaches.
Global Continuous Optimization with Error Bound and Fast Convergence
This paper considers global optimization with a black-box unknown objective
function that can be non-convex and non-differentiable. Such a difficult
optimization problem arises in many real-world applications, such as parameter
tuning in machine learning, engineering design problems, and planning with a
complex physics simulator. This paper proposes a new global optimization
algorithm, called Locally Oriented Global Optimization (LOGO), which aims for
both fast convergence in practice and a finite-time error bound in theory. The
advantage and usage of the new algorithm are illustrated via theoretical
analysis and an experiment conducted with 11 benchmark test functions. Further,
we modify the LOGO algorithm to specifically solve a planning problem via
policy search with continuous state/action spaces and a long time horizon while
maintaining its finite-time error bound. We apply the proposed planning method
to accident management of a nuclear power plant. The result of the application
study demonstrates the practical utility of our method.
Flexible provisioning of Web service workflows
Web services promise to revolutionise the way computational resources and business processes are offered and invoked in open, distributed systems, such as the Internet. These services are described using machine-readable meta-data, which enables consumer applications to automatically discover and provision suitable services for their workflows at run-time. However, current approaches have typically assumed service descriptions are accurate and deterministic, and so have neglected to account for the fact that services in these open systems are inherently unreliable and uncertain. Specifically, network failures, software bugs and competition for services may regularly lead to execution delays or even service failures. To address this problem, the process of provisioning services needs to be performed in a more flexible manner than has so far been considered, in order to proactively deal with failures and to recover workflows that have partially failed. To this end, we devise and present a heuristic strategy that varies the provisioning of services according to their predicted performance. Using simulation, we then benchmark our algorithm and show that it leads to a 700% improvement in average utility, while successfully completing up to eight times as many workflows as approaches that do not consider service failures.
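The abstract does not spell out the heuristic, so the following is only a minimal illustrative sketch of one way provisioning could vary with predicted performance. It assumes, purely for illustration, that a task can be provisioned redundantly across providers that fail independently with the same predicted probability; the function name `replicas_needed` and its parameters are hypothetical and are not taken from the paper.

```python
def replicas_needed(failure_prob: float, target_success: float) -> int:
    """Smallest number of redundant providers k such that at least one
    succeeds with probability >= target_success.

    Assumption (not from the abstract): providers fail independently,
    each with the same predicted probability `failure_prob`, so the
    chance that all k fail is failure_prob ** k.
    """
    if not 0.0 <= failure_prob < 1.0:
        raise ValueError("failure_prob must be in [0, 1)")
    if not 0.0 < target_success < 1.0:
        raise ValueError("target_success must be in (0, 1)")

    k = 1
    # Add providers until the predicted success probability is high enough.
    while 1.0 - failure_prob ** k < target_success:
        k += 1
    return k


if __name__ == "__main__":
    # Less reliable services are provisioned with more redundancy.
    for p in (0.0, 0.2, 0.5):
        print(p, replicas_needed(p, 0.9))
```

Under these assumptions, a service predicted to be perfectly reliable is provisioned once, while less reliable services receive more parallel providers.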
Optimizing Coordinated Vehicle Platooning: An Analytical Approach Based on Stochastic Dynamic Programming
Platooning connected and autonomous vehicles (CAVs) can improve traffic and
fuel efficiency. However, scalable platooning operations require junction-level
coordination, which has not been well studied. In this paper, we study the
coordination of vehicle platooning at highway junctions. We consider a setting
where CAVs randomly arrive at a highway junction according to a general renewal
process. When a CAV approaches the junction, a system operator determines
whether the CAV will merge into the platoon ahead according to the positions
and speeds of the CAV and the platoon. We formulate a Markov decision process
to minimize the discounted cumulative travel cost, i.e., fuel consumption plus
travel delay, over an infinite time horizon. We show that the optimal policy is
threshold-based: the CAV will merge with the platoon if and only if the
difference between the CAV's and the platoon's predicted times of arrival at
the junction is less than a constant threshold. We also propose two
ready-to-implement algorithms to derive the optimal policy. A comparison with
the classical value iteration algorithm indicates that our approach, which
explicitly incorporates the characteristics of the optimal policy, is
significantly more computationally efficient. Importantly, we show that the optimal policy
under Poisson arrivals can be obtained by solving a system of integral
equations. We also validate our results in simulation with Real-time Strategy
(RTS) using real traffic data. The simulation results indicate that the
proposed method yields better performance than the conventional
method.
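The threshold structure of the optimal policy described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the `Vehicle` type, the constant-speed arrival-time prediction, the use of an absolute gap, and the threshold value are all assumptions made for the example. The abstract states only that the merge decision depends on whether the difference in predicted arrival times is below a constant threshold.

```python
from dataclasses import dataclass


@dataclass
class Vehicle:
    """Hypothetical state of a CAV or a platoon approaching the junction."""
    distance_to_junction_m: float  # remaining distance to the junction
    speed_mps: float               # current speed, assumed to stay constant


def predicted_arrival_s(v: Vehicle) -> float:
    """Predicted time of arrival at the junction (constant-speed assumption)."""
    if v.speed_mps <= 0:
        raise ValueError("speed must be positive")
    return v.distance_to_junction_m / v.speed_mps


def should_merge(cav: Vehicle, platoon: Vehicle, threshold_s: float) -> bool:
    """Threshold rule: merge if and only if the gap between the predicted
    arrival times is less than the threshold. The absolute value is an
    assumption; the abstract speaks only of "the difference"."""
    gap = abs(predicted_arrival_s(cav) - predicted_arrival_s(platoon))
    return gap < threshold_s


if __name__ == "__main__":
    cav = Vehicle(distance_to_junction_m=1000.0, speed_mps=25.0)
    platoon = Vehicle(distance_to_junction_m=900.0, speed_mps=25.0)
    print(should_merge(cav, platoon, threshold_s=5.0))
    print(should_merge(cav, platoon, threshold_s=3.0))
```

In the paper the threshold is derived from the Markov decision process; here it is simply passed in as a parameter.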