
    SDRL: Interpretable and Data-efficient Deep Reinforcement Learning Leveraging Symbolic Planning

    Deep reinforcement learning (DRL) has achieved great success by learning directly from high-dimensional sensory inputs, yet it is notorious for its lack of interpretability. Interpretability of the subtasks is critical in hierarchical decision-making because it increases the transparency of black-box-style DRL approaches and helps RL practitioners better understand the high-level behavior of the system. In this paper, we introduce symbolic planning into DRL and propose a framework, Symbolic Deep Reinforcement Learning (SDRL), that can handle both high-dimensional sensory inputs and symbolic planning. Task-level interpretability is enabled by relating symbolic actions to options. The framework features a planner, controller, and meta-controller architecture, whose components take charge of subtask scheduling, data-driven subtask learning, and subtask evaluation, respectively. The three components cross-fertilize each other and eventually converge to an optimal symbolic plan along with the learned subtasks, combining the long-term planning capability of symbolic knowledge with end-to-end reinforcement learning directly from high-dimensional sensory input. Experimental results validate the interpretability of the subtasks, along with improved data efficiency compared with state-of-the-art approaches.

    Global Continuous Optimization with Error Bound and Fast Convergence

    This paper considers global optimization with a black-box unknown objective function that can be non-convex and non-differentiable. Such difficult optimization problems arise in many real-world applications, such as parameter tuning in machine learning, engineering design, and planning with a complex physics simulator. This paper proposes a new global optimization algorithm, called Locally Oriented Global Optimization (LOGO), that aims for both fast convergence in practice and a finite-time error bound in theory. The advantages and usage of the new algorithm are illustrated via theoretical analysis and an experiment conducted with 11 benchmark test functions. Further, we modify the LOGO algorithm to specifically solve a planning problem via policy search with a continuous state/action space and a long time horizon while maintaining its finite-time error bound. We apply the proposed planning method to accident management of a nuclear power plant. The results of the application study demonstrate the practical utility of our method.

    Flexible provisioning of Web service workflows

    Web services promise to revolutionise the way computational resources and business processes are offered and invoked in open, distributed systems, such as the Internet. These services are described using machine-readable meta-data, which enables consumer applications to automatically discover and provision suitable services for their workflows at run-time. However, current approaches have typically assumed that service descriptions are accurate and deterministic, and so have neglected the fact that services in these open systems are inherently unreliable and uncertain. Specifically, network failures, software bugs and competition for services may regularly lead to execution delays or even service failures. To address this problem, services need to be provisioned in a more flexible manner than has so far been considered, in order to proactively deal with failures and to recover workflows that have partially failed. To this end, we devise and present a heuristic strategy that varies the provisioning of services according to their predicted performance. Using simulation, we then benchmark our algorithm and show that it leads to a 700% improvement in average utility, while successfully completing up to eight times as many workflows as approaches that do not consider service failures.

    Optimizing Coordinated Vehicle Platooning: An Analytical Approach Based on Stochastic Dynamic Programming

    Platooning connected and autonomous vehicles (CAVs) can improve traffic and fuel efficiency. However, scalable platooning operations require junction-level coordination, which has not been well studied. In this paper, we study the coordination of vehicle platooning at highway junctions. We consider a setting where CAVs randomly arrive at a highway junction according to a general renewal process. When a CAV approaches the junction, a system operator determines whether the CAV will merge into the platoon ahead according to the positions and speeds of the CAV and the platoon. We formulate a Markov decision process to minimize the discounted cumulative travel cost, i.e., fuel consumption plus travel delay, over an infinite time horizon. We show that the optimal policy is threshold-based: the CAV will merge with the platoon if and only if the difference between the CAV's and the platoon's predicted times of arrival at the junction is less than a constant threshold. We also propose two ready-to-implement algorithms to derive the optimal policy. Comparison with the classical value iteration algorithm indicates that our approach, which explicitly incorporates the characteristics of the optimal policy, is significantly more efficient computationally. Importantly, we show that the optimal policy under Poisson arrivals can be obtained by solving a system of integral equations. We also validate our results in simulation with Real-time Strategy (RTS) using real traffic data. The simulation results indicate that the proposed method yields better performance compared with the conventional method.
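The threshold rule stated in this abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm for deriving the threshold; it only shows how a given threshold would be applied. The names `Vehicle`, `predicted_arrival_s`, and `should_merge` are hypothetical, the constant-speed arrival estimate is a simplifying assumption, and the sketch assumes the CAV reaches the junction after the platoon so that the difference in arrival times is non-negative.

```python
from dataclasses import dataclass


@dataclass
class Vehicle:
    """Hypothetical state of a CAV or a platoon approaching the junction."""
    distance_to_junction_m: float
    speed_mps: float

    def predicted_arrival_s(self) -> float:
        # Assumption: constant speed until the junction.
        return self.distance_to_junction_m / self.speed_mps


def should_merge(cav: Vehicle, platoon: Vehicle, threshold_s: float) -> bool:
    """Apply a threshold policy: merge iff the arrival-time difference
    between the CAV and the platoon ahead is below threshold_s."""
    gap_s = cav.predicted_arrival_s() - platoon.predicted_arrival_s()
    return gap_s < threshold_s


# Illustrative values only (not taken from the paper).
cav = Vehicle(distance_to_junction_m=600.0, speed_mps=30.0)      # arrives in 20 s
platoon = Vehicle(distance_to_junction_m=300.0, speed_mps=30.0)  # arrives in 10 s
merge_with_loose_threshold = should_merge(cav, platoon, threshold_s=15.0)
merge_with_tight_threshold = should_merge(cav, platoon, threshold_s=5.0)
```

With these illustrative numbers the arrival-time difference is 10 s, so the CAV merges under the 15 s threshold and travels alone under the 5 s threshold. The threshold value itself would come from solving the Markov decision process, which this sketch does not attempt.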