11 research outputs found

Learning STRIPS Action Models with Classical Planning

Author: Aineto Diego
Jiménez Sergio
Onaindia Eva
Publication date: 15/06/2018

This paper presents a novel approach for learning STRIPS action models from examples that compiles this inductive learning task into a classical planning task. Interestingly, the compilation approach is flexible to different amounts of available input knowledge; the learning examples can range from a set of plans (with their corresponding initial and final states) to just a pair of initial and final states (no intermediate action or state is given). Moreover, the compilation accepts partially specified action models and it can be used to validate whether the observation of a plan execution follows a given STRIPS action model, even if this model is not fully specified. Comment: 8+1 pages, 4 figures, 6 tables

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications
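
The abstract of this record mentions validating whether an observed plan execution follows a given STRIPS action model. As a rough illustration of what that check means, here is a minimal sketch of plain STRIPS semantics (preconditions, add effects, delete effects). It is not the paper's compilation to classical planning; the `Action` class, the `apply` and `validate` helpers, and the Blocksworld-style fact names are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A ground STRIPS action (hypothetical helper type for this sketch)."""
    name: str
    pre: frozenset      # facts that must hold before the action
    add: frozenset      # facts made true by the action
    delete: frozenset   # facts made false by the action


def apply(state, action):
    """Return the successor state, or None if a precondition is missing."""
    if not action.pre <= state:
        return None
    return (state - action.delete) | action.add


def validate(initial, plan, goal):
    """Check that the plan is executable from `initial` and reaches `goal`."""
    state = frozenset(initial)
    for action in plan:
        state = apply(state, action)
        if state is None:
            return False
    return frozenset(goal) <= state


# Illustrative facts and action, assumed for the example only.
initial = {"clear(a)", "ontable(a)", "handempty"}
pickup_a = Action(
    name="pickup(a)",
    pre=frozenset({"clear(a)", "ontable(a)", "handempty"}),
    add=frozenset({"holding(a)"}),
    delete=frozenset({"clear(a)", "ontable(a)", "handempty"}),
)

ok = validate(initial, [pickup_a], {"holding(a)"})
repeated = validate(initial, [pickup_a, pickup_a], {"holding(a)"})
```

In this sketch the single pickup is accepted, while repeating it is rejected because the second pickup's preconditions were deleted by the first.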

Planning Technologies for Interactive Storytelling

Author: A Jhala
AI Coles
B Bonet
D Chapman
D Dennett
David Pizzi
E Amir
Gabriela Tully
HH Zhuo
J Bates
J Hoffmann
JD Bolter
Julie Porteous
LM Barros
M Fox
M Mateas
M. Cavazza
Mark O. Riedl
ML Ryan
MO Riedl
P Gervás
P Haslum
R. Michael Young
RE Korf
Reid Swanson
Richard Paul
S Chatman
S Rimmon-Kenan
T Trabasso
Y Chen
Publication venue: Springer Fachmedien Wiesbaden GmbH
Publication date: 03/08/2016

Since AI planning was first proposed for the task of narrative generation in interactive storytelling (IS), it has emerged as the dominant approach in this field. This chapter traces the use of planning technologies in this area, considers the core issues involved in the application of planning technologies in IS, and identifies some of the remaining challenges.

Crossref

Teesside University's Research Repository

RMIT Research Repository

STRIPS Action Discovery

Author: Alenyà Guillem
Segovia-Aguas Javier
Suárez-Hernández Alejandro
Torras Carme
Publication date: 01/01/2020

Specifying high-level knowledge bases for planning becomes a hard task in realistic environments. This knowledge is usually handcrafted and is hard to keep updated, even for system experts. Recent approaches have shown the success of classical planning at synthesizing action models even when all intermediate states are missing. These approaches can synthesize action schemas in Planning Domain Definition Language (PDDL) from a set of execution traces, each consisting of at least an initial and a final state. In this paper, we propose a new algorithm for the unsupervised synthesis of STRIPS action models with a classical planner when action signatures are unknown. In addition, we contribute a compilation to classical planning that mitigates the problem of learning static predicates in the action model preconditions and exploits the capabilities of SAT planners with parallel encodings to compute action schemas and validate all instances. Our system is flexible in that it supports the inclusion of partial input information that may speed up the search. We show through several experiments how learned action models generalize over unseen planning instances. Comment: Presented at the Genplan 2020 workshop, held at the AAAI 2020 conference (https://sites.google.com/view/genplan20) (2021/03/05: included missing acknowledgments)

arXiv.org e-Print Archive

UPCommons. Portal del coneixement obert de la UPC

Digital.CSIC

Classical Planning in Deep Latent Space

Author: Asai Masataro
Fukunaga Alex
Kajino Hiroshi
Muise Christian
Publication date: 30/06/2021

Current domain-independent, classical planners require symbolic models of the problem domain and instance as input, resulting in a knowledge acquisition bottleneck. Meanwhile, although deep learning has achieved significant success in many fields, the knowledge is encoded in a subsymbolic representation which is incompatible with symbolic systems such as planners. We propose Latplan, an unsupervised architecture combining deep learning and classical planning. Given only an unlabeled set of image pairs showing a subset of transitions allowed in the environment (training inputs), Latplan learns a complete propositional PDDL action model of the environment. Later, when a pair of images representing the initial and the goal states (planning inputs) is given, Latplan finds a plan to the goal state in a symbolic latent space and returns a visualized plan execution. We evaluate Latplan using image-based versions of 6 planning domains: 8-Puzzle, 15-Puzzle, Blocksworld, Sokoban, and two variations of LightsOut. Comment: Under review at Journal of Artificial Intelligence Research (JAIR)

arXiv.org e-Print Archive

Approximate Dynamic Programming with Parallel Stochastic Planning Operators

Author: Child C. H. T.

This thesis presents an approximate dynamic programming (ADP) technique for environment modelling agents. The agent learns a set of parallel stochastic planning operators (P-SPOs) by evaluating changes in its environment in response to actions, using an association rule mining approach. An approximate policy is then derived by iteratively improving state value aggregation estimates attached to the operators using the P-SPOs as a model in a Dyna-Q-like architecture. Reinforcement learning and dynamic programming are powerful techniques for automated agent decision making in stochastic environments. Dynamic programming is effective when there is a known environment model, while reinforcement learning is effective when a model is not available. The techniques derive a policy: a mapping from each environment state to an action which optimizes the long-term reward the agent receives. The standard methods become less effective as the state space for the environment increases because they require values to be associated with each state, the storage and processing of which is exponential in the number of state variables. Resolving this “curse of dimensionality” is an important topic of research amongst all communities working on this problem. Two key methods are to: (i) derive an estimate of the value (approximate dynamic programming) using function approximation or state aggregation; or (ii) build a model of the environment from experience. This thesis presents a method of combining these approaches by exploiting structure in the state transition and value functions captured in a set of planning operators which are learnt through experience in the environment. Standard planning operators define the deterministic changes that occur in an environment in response to an action.
This work presents Parallel Stochastic Planning Operators (P-SPOs), a novel form of planning operator providing a structured model of the state transition function in environments which are both non-deterministic and for which changes can occur outside the influence of actions. Next, an automated method for extracting P-SPOs from observations in an environment is explored using an adaptation of association rule mining. Finally, methods of relating the state transition structure encapsulated in the P-SPOs to state values, using the operators to store state value aggregation estimates, are evaluated. The framework described provides a method by which approximate dynamic programming can be applied by designers of AI agents and AI planning systems for which they have minimal prior knowledge. The framework and P-SPO-based implementations are tested against standard techniques in two benchmark stochastic environments: a “slippery gripper” block painting robot; and a “predator-prey” agent environment. Experimental results show that an agent using a P-SPO-based approach is able to learn an accurate model of its environment if successor state variables exhibit conditional independence, and an approximate model in the non-independent case. Results also demonstrate that the agent’s ability to generalise to previously unseen states using the model allows it to form an improved policy over an agent employing a standard Dyna-Q-based technique. Finally, an approximate policy stored in state aggregation estimates attached to operators is shown to be optimal in experiments for which the P-SPO set contains sufficient information for effective aggregations to be formed.

City Research Online

Approximate Dynamic Programming with Parallel Stochastic Planning Operators

Author: Child C. H. T.
Publication venue: City University London
Publication date: 01/01/2011

This report presents an approximate dynamic programming (ADP) technique for environment modelling agents. The agent learns a set of parallel stochastic planning operators (P-SPOs) by evaluating changes in its environment in response to actions, using an association rule mining approach. An approximate policy is then derived by iteratively improving state value aggregation estimates attached to the operators using the P-SPOs as a model in a Dyna-Q-like architecture. Reinforcement learning and dynamic programming are powerful techniques for automated agent decision making in stochastic environments. Dynamic programming is effective when there is a known environment model, while reinforcement learning is effective when a model is not available. The techniques derive a policy: a mapping from each environment state to an action which optimizes the long-term reward the agent receives. The standard methods become less effective as the state space for the environment increases because they require values to be associated with each state, the storage and processing of which is exponential in the number of state variables. Resolving this “curse of dimensionality” is an important topic of research amongst all communities working on this problem. Two key methods are to: (i) derive an estimate of the value (approximate dynamic programming) using function approximation or state aggregation; or (ii) build a model of the environment from experience. This report presents a method of combining these approaches by exploiting structure in the state transition and value functions captured in a set of planning operators which are learnt through experience in the environment. Standard planning operators define the deterministic changes that occur in an environment in response to an action.
This work presents Parallel Stochastic Planning Operators (P-SPOs), a novel form of planning operator providing a structured model of the state transition function in environments which are both non-deterministic and for which changes can occur outside the influence of actions. Next, an automated method for extracting P-SPOs from observations in an environment is explored using an adaptation of association rule mining. Finally, methods of relating the state transition structure encapsulated in the P-SPOs to state values, using the operators to store state value aggregation estimates, are evaluated. The framework described provides a method by which approximate dynamic programming can be applied by designers of AI agents and AI planning systems for which they have minimal prior knowledge. The framework and P-SPO-based implementations are tested against standard techniques in two benchmark stochastic environments: a “slippery gripper” block painting robot; and a “predator-prey” agent environment. Experimental results show that an agent using a P-SPO-based approach is able to learn an accurate model of its environment if successor state variables exhibit conditional independence, and an approximate model in the non-independent case. Results also demonstrate that the agent’s ability to generalise to previously unseen states using the model allows it to form an improved policy over an agent employing a standard Dyna-Q-based technique. Finally, an approximate policy stored in state aggregation estimates attached to operators is shown to be optimal in experiments for which the P-SPO set contains sufficient information for effective aggregations to be formed.

CiteSeerX

City Research Online
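
Both records above compare against a standard Dyna-Q technique. For orientation, the following is a minimal sketch of tabular Dyna-Q: Q-learning on real experience, plus extra planning updates replayed from a learned model. It does not implement P-SPOs or state aggregation; the model here is a plain lookup table, and the five-state chain world, the `step` and `dyna_q` functions, and all parameter values are assumptions made only for this example.

```python
import random

N_STATES = 5          # hypothetical chain world: states 0..4, state 4 is terminal
ACTIONS = (0, 1)      # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic toy environment: reward 1.0 on reaching the last state."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def dyna_q(episodes=50, planning_steps=10, alpha=0.5, gamma=0.9,
           epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    model = {}  # learned model: (state, action) -> (next_state, reward, done)

    def backup(s, a, s2, r, done):
        # One-step Q-learning update towards the bootstrapped target.
        target = r + (0.0 if done else gamma * max(q[(s2, b)] for b in ACTIONS))
        q[(s, a)] += alpha * (target - q[(s, a)])

    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Epsilon-greedy action selection with random tie-breaking.
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)
            else:
                best = max(q[(s, b)] for b in ACTIONS)
                a = rng.choice([b for b in ACTIONS if q[(s, b)] == best])
            s2, r, done = step(s, a)
            backup(s, a, s2, r, done)        # learn from real experience
            model[(s, a)] = (s2, r, done)    # update the learned model
            for _ in range(planning_steps):  # planning: replay simulated experience
                (ps, pa), (ps2, pr, pdone) = rng.choice(list(model.items()))
                backup(ps, pa, ps2, pr, pdone)
            s = s2
    return q


q_values = dyna_q()
greedy_policy = {s: max(ACTIONS, key=lambda a: q_values[(s, a)])
                 for s in range(N_STATES - 1)}
```

The planning loop is what distinguishes Dyna-Q from plain Q-learning: each real step is followed by several updates drawn from the model, which is the role the abstracts assign to the learned operators.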

Generalised Domain Model Acquisition from Action Traces

Author: Cresswell Stephen
Gregory Peter
Publication date: 22/03/2011

One approach to the problem of formulating domain models for planning is to learn the models from example action sequences. The LOCM system demonstrated the feasibility of learning domain models from example action sequences only, with no observation of states before, during or after the plans. LOCM uses an object-centred representation, in which each object is represented by a single parameterised state machine. This makes it powerful for learning domains which fit within that representation, but there are some well-known domains which do not. This paper introduces LOCM2, a novel algorithm in which the domain representation of LOCM is generalised to allow multiple parameterised state machines to represent a single object. This extends the coverage of domains for which an adequate domain model can be learned. The LOCM2 algorithm is described and evaluated by testing domain learning from example plans from published results of past International Planning Competitions.

Association for the Advancement of Artificial Intelligence: AAAI Publications

University of Huddersfield Repository
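
The object-centred view described in this abstract can be illustrated with a small sketch. Treating each (action name, argument position) pair as a transition of the object in that position, the code below collects, per object, which transitions follow one another in an action trace. This is only a simplified illustration of the starting point of an object-centred analysis, not the LOCM or LOCM2 algorithm; the function name `object_transition_pairs` and the Blocksworld-style trace are hypothetical.

```python
def object_transition_pairs(plan):
    """Collect consecutive transition pairs per object from an action trace.

    `plan` is a list of (action_name, [object, ...]) steps. A transition is
    the pair (action_name, argument_position). For each object, every two
    consecutive transitions it takes part in are recorded as a pair.
    """
    last_transition = {}  # object -> most recent transition it appeared in
    pairs = set()
    for name, args in plan:
        for position, obj in enumerate(args):
            transition = (name, position)
            if obj in last_transition:
                pairs.add((last_transition[obj], transition))
            last_transition[obj] = transition
    return pairs


# Illustrative trace, assumed for the example only.
trace = [
    ("pickup", ["a"]),
    ("stack", ["a", "b"]),
    ("unstack", ["a", "b"]),
    ("putdown", ["a"]),
]
pairs = object_transition_pairs(trace)
```

Each recorded pair says that the state an object is left in by the first transition is a state from which the second transition can start, which is the kind of constraint a per-object state machine has to satisfy.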