19 research outputs found

    Towards a Reformulation Based Approach for Efficient Numeric Planning: Numeric Outer Entanglements

    Restricting the search space has been shown to be an effective approach for improving the performance of automated planning systems. A planner-independent technique for pruning the search space is domain and problem reformulation. Recently, outer entanglements, which are relations between planning operators and initial or goal predicates, have been introduced as a reformulation technique for eliminating potentially undesirable instances of planning operators and thus restricting the search space. Reformulation techniques, however, have mainly been applied in classical planning, although many real-world planning applications require dealing with numerical information. In this paper, we investigate the usefulness of reformulation approaches in planning with numeric fluents. In particular, we propose an extension of the notion of outer entanglements for handling numeric fluents. An empirical evaluation involving 150 instances from 5 domains shows promising results.
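
    The abstract does not include code; as a loose illustration of the idea behind outer entanglements (not the paper's implementation), the Python sketch below prunes grounded actions under a hypothetical "entanglement by init": an action instance survives only if its precondition atoms of the entangled predicate are facts of the initial state. All names are illustrative assumptions.

        # Hypothetical sketch: pruning grounded actions by an "entanglement
        # by init" between an operator and an initial-state predicate.
        from typing import FrozenSet, List, NamedTuple, Tuple

        Atom = Tuple[str, ...]  # e.g. ("at", "truck1", "depot")

        class Action(NamedTuple):
            name: str
            preconditions: FrozenSet[Atom]

        def prune_by_init_entanglement(actions: List[Action],
                                       init: FrozenSet[Atom],
                                       predicate: str) -> List[Action]:
            """Keep an action instance only if all its precondition atoms of
            the entangled predicate are initial-state facts; other instances
            would need the atom achieved mid-plan, which the entanglement
            relation rules out."""
            kept = []
            for a in actions:
                relevant = {p for p in a.preconditions if p[0] == predicate}
                if relevant <= init:
                    kept.append(a)
            return kept

        # Example: only the drive action starting at truck1's initial location survives.
        init = frozenset({("at", "truck1", "depot")})
        acts = [Action("drive-depot-a", frozenset({("at", "truck1", "depot")})),
                Action("drive-b-a", frozenset({("at", "truck1", "b")}))]
        print([a.name for a in prune_by_init_entanglement(acts, init, "at")])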

    Answer Set Programming for Non-Stationary Markov Decision Processes

    Non-stationary domains, where unforeseen changes happen, present a challenge for agents seeking an optimal policy for a sequential decision-making problem. This work investigates a solution to this problem that combines Markov Decision Processes (MDPs) and Reinforcement Learning (RL) with Answer Set Programming (ASP) in a method we call ASP(RL). In this method, Answer Set Programming is used to find the possible trajectories of an MDP, from which Reinforcement Learning is applied to learn the optimal policy of the problem. Results show that ASP(RL) is capable of efficiently finding the optimal solution of an MDP representing non-stationary domains.
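
    As a rough sketch of how the two components might interact (not the paper's actual system), the snippet below assumes an ASP solver has already produced the set of valid transitions of the MDP, and runs plain tabular Q-learning restricted to those transitions. The toy MDP, rewards, and names are hypothetical.

        import random
        from collections import defaultdict

        # Hypothetical output of the ASP step: the transitions the answer sets
        # deem possible, as (state, action) -> next_state, plus a reward table.
        valid = {("s0", "a"): "s1", ("s0", "b"): "s0",
                 ("s1", "a"): "s2", ("s1", "b"): "s0"}
        reward = {("s1", "a"): 1.0}   # all other transitions pay 0
        GOAL = "s2"

        def actions_in(s):
            # Only actions the ASP model allows in state s.
            return [a for (st, a) in valid if st == s]

        Q = defaultdict(float)
        alpha, gamma, eps = 0.1, 0.95, 0.1
        for _ in range(500):                       # Q-learning episodes
            s = "s0"
            while s != GOAL:
                acts = actions_in(s)
                a = (random.choice(acts) if random.random() < eps
                     else max(acts, key=lambda x: Q[(s, x)]))
                s2, r = valid[(s, a)], reward.get((s, a), 0.0)
                nxt = max((Q[(s2, b)] for b in actions_in(s2)), default=0.0)
                Q[(s, a)] += alpha * (r + gamma * nxt - Q[(s, a)])
                s = s2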

    Learning Classical Planning Strategies with Policy Gradient

    A common paradigm in classical planning is heuristic forward search. Forward search planners often rely on simple best-first search, which remains fixed throughout the search process. In this paper, we introduce a novel search framework capable of alternating between several forward search approaches while solving a particular planning problem. Selection of the approach is performed using a trainable stochastic policy, mapping the state of the search to a probability distribution over the approaches. This enables using policy gradient to learn search strategies tailored to specific distributions of planning problems and a selected performance metric, e.g. the IPC score. We instantiate the framework by constructing a policy space consisting of five search approaches and a two-dimensional representation of the planner's state. Then, we train the system on randomly generated problems from five IPC domains using three different performance metrics. Our experimental results show that the learner is able to discover domain-specific search strategies, improving the planner's performance relative to the baselines of plain best-first search and a uniform policy.
    Comment: Accepted for ICAPS 201
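
    Below is a toy sketch of the core mechanism, a REINFORCE-style update of a softmax policy over five search approaches conditioned on two search-state features; the episode dynamics and the performance metric are random placeholders standing in for an actual planner run, not the paper's system.

        import numpy as np

        rng = np.random.default_rng(0)
        N_APPROACHES, STATE_DIM = 5, 2               # five approaches, 2-D search state
        theta = np.zeros((N_APPROACHES, STATE_DIM))  # linear policy parameters

        def policy(x):
            """Softmax distribution over approaches given search-state features x."""
            z = theta @ x
            p = np.exp(z - z.max())
            return p / p.sum()

        def run_episode():
            """Placeholder for solving one planning problem: at each decision
            point, sample an approach and observe new features; finally return
            a score standing in for the chosen metric (e.g. IPC score)."""
            traj, x = [], rng.random(STATE_DIM)
            for _ in range(20):                      # 20 decision points
                p = policy(x)
                a = rng.choice(N_APPROACHES, p=p)
                traj.append((x, a, p))
                x = rng.random(STATE_DIM)            # next search-state features
            return traj, rng.random()                # placeholder performance metric

        lr = 0.05
        for _ in range(1000):                        # REINFORCE updates
            traj, R = run_episode()
            for x, a, p in traj:
                grad = -np.outer(p, x)               # d log pi(a|x) / d theta
                grad[a] += x
                theta += lr * R * grad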

    Dynamic Controllability Made Simple

    Simple Temporal Networks with Uncertainty (STNUs) are a well-studied model for representing temporal constraints, where some intervals (contingent links) have an unknown but bounded duration, discovered only during execution. An STNU is dynamically controllable (DC) if there exists a strategy for executing its time-points that satisfies all the constraints, regardless of the actual durations of contingent links revealed during execution. In this work, we present a new system of constraint propagation rules for STNUs which is sound and complete for DC checking. Our system comprises just three rules which, unlike those proposed in all previous works, generate only unconditioned constraints. In particular, after applying our sound rules, the network remains an STNU in all respects. Moreover, our completeness proof is short and non-algorithmic, based on the explicit construction of a valid execution strategy. This is a substantial simplification of the theory underlying all the polynomial-time algorithms for DC checking. Our analysis also shows: (1) the existence of late execution strategies for STNUs, (2) the equivalence of several variants of the notion of DC, and (3) the existence of a fast algorithm for real-time execution of STNUs, which runs in O(KN) total time in a network with K contingent links and N time-points, considerably improving the previous O(N^3)-time bound.
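
    The paper's three propagation rules are not reproduced here; as background for the distance-graph reasoning that DC checking builds on, the sketch below checks consistency of an ordinary STN (no contingent links) by Bellman-Ford negative-cycle detection. The encoding and example are standard, not taken from the paper.

        def stn_consistent(n, edges):
            """Bellman-Ford negative-cycle check on an STN distance graph.
            n: number of time-points; edges: list of (u, v, w) meaning
            time[v] - time[u] <= w. The STN is consistent iff the distance
            graph has no negative cycle."""
            dist = [0.0] * n                 # virtual source connected to all nodes
            for _ in range(n - 1):
                changed = False
                for u, v, w in edges:
                    if dist[u] + w < dist[v]:
                        dist[v] = dist[u] + w
                        changed = True
                if not changed:
                    break
            return all(dist[u] + w >= dist[v] for u, v, w in edges)

        # Example: time-points A(=0) and B(=1) with 2 <= B - A <= 5.
        print(stn_consistent(2, [(0, 1, 5), (1, 0, -2)]))  # True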

    A multi-objective approach for PH-graphs with applications to stochastic shortest paths

    Stochastic shortest path problems (SSPPs) have many applications in practice and have been the subject of ongoing research for many years. This paper considers a variant of SSPPs where the times or costs to pass an edge in a graph are possibly correlated random variables. There are two general goals one can aim for: the minimization of the expected cost to reach the destination, or the maximization of the probability of reaching the destination within a given budget. Often one is interested in policies that strike a compromise between different goals, which results in multi-objective problems. In this paper, an algorithm is developed to compute the convex hull of Pareto-optimal policies that consider expected costs and probabilities of staying within given budgets. The approach uses the recently published class of PH-graphs, which allow one to map SSPPs, even with generally distributed and correlated edge costs, onto Markov decision processes (MDPs) and to apply the available techniques for MDPs to compute optimal policies.
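
    PH-graphs are not reconstructed here; the sketch below illustrates only the standard weighted-sum route to the convex hull of Pareto-optimal MDP policies that the abstract alludes to: sweep scalarization weights over the two objectives, solve each scalarized MDP by value iteration, and keep the distinct optimal policies. The tiny random MDP is a placeholder, not a PH-graph translation.

        import numpy as np

        # Hypothetical 3-state, 2-action MDP with two reward objectives
        # (stand-ins for negated expected cost and budget-hit probability).
        n_s, n_a, gamma = 3, 2, 0.9
        rng = np.random.default_rng(1)
        P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))  # P[s, a] -> dist over s'
        r = rng.random((n_s, n_a, 2))                     # r[s, a, objective]

        def greedy_policy(w):
            """Value iteration on the w-scalarized MDP; returns a greedy policy."""
            rw = r @ w                                    # (n_s, n_a) scalarized reward
            V = np.zeros(n_s)
            for _ in range(500):
                V = (rw + gamma * P @ V).max(axis=1)
            return (rw + gamma * P @ V).argmax(axis=1)

        def mo_value(pi):
            """Exact two-objective value of stationary policy pi from state 0."""
            Ppi, rpi = P[np.arange(n_s), pi], r[np.arange(n_s), pi]
            return np.linalg.solve(np.eye(n_s) - gamma * Ppi, rpi)[0]

        # Sweeping weights yields the policies on the convex hull of the
        # Pareto front in this scalarizable, discounted setting.
        policies = {tuple(greedy_policy(np.array([w, 1 - w])))
                    for w in np.linspace(0, 1, 21)}
        for pi in sorted(policies):
            print(pi, mo_value(np.array(pi)))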

    Autonomous Driving: A Multi-Objective Deep Reinforcement Learning Approach

    Autonomous driving is a challenging domain that entails multiple aspects: a vehicle should be able to drive to its destination as fast as possible while avoiding collisions, obeying traffic rules, and ensuring the comfort of passengers. It is representative of complex reinforcement learning tasks humans encounter in real life. The aim of this thesis is to explore the effectiveness of multi-objective reinforcement learning for such tasks, as exemplified by autonomous driving. In particular, it shows that: 1. Multi-objective reinforcement learning is effective at overcoming some of the difficulties faced by scalar-reward reinforcement learning, and a multi-objective DQN agent based on a variant of thresholded lexicographic Q-learning is successfully trained to drive on multi-lane roads and intersections, yielding and changing lanes according to traffic rules. 2. The data efficiency of (multi-objective) reinforcement learning can be significantly improved by exploiting the factored structure of a task. Specifically, factored Q-functions learned on the factored state space can be used as features for the original Q-function to speed up learning. 3. The inclusion of history-dependent policies enables an intuitive exact algorithm for multi-objective reinforcement learning with a thresholded lexicographic order.
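
    Below is a minimal sketch of the action-selection rule in thresholded lexicographic Q-learning, assuming per-objective Q-values are already learned: each higher-priority objective filters the surviving actions by a threshold (capped at the best available value so the set never empties), and the lowest-priority objective breaks the remaining ties. The objectives, thresholds, and numbers are hypothetical, not the thesis's trained agent.

        import numpy as np

        def tlq_select(q_values, thresholds):
            """Thresholded lexicographic action selection: q_values is a list
            of per-objective Q arrays over actions, ordered by decreasing
            priority; thresholds gives the minimum acceptable Q for each
            objective except the last, which breaks ties by maximization."""
            candidates = np.arange(len(q_values[0]))
            for q_all, tau in zip(q_values[:-1], thresholds):
                q = q_all[candidates]
                candidates = candidates[q >= min(tau, q.max())]  # never empty
            return candidates[int(np.argmax(q_values[-1][candidates]))]

        # Example: 4 candidate maneuvers scored on safety, rule compliance, progress.
        q_safety   = np.array([0.9, 0.2, 0.8, 0.7])
        q_rules    = np.array([0.5, 0.9, 0.6, 0.9])
        q_progress = np.array([0.1, 0.9, 0.8, 0.3])
        print(tlq_select([q_safety, q_rules, q_progress], thresholds=[0.6, 0.5]))  # -> 2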