7 research outputs found

    STRIPS Action Discovery

    Get PDF
    The problem of specifying high-level knowledge bases for planning becomes a hard task in realistic environments. This knowledge is usually handcrafted and is hard to keep updated, even for system experts. Recent approaches have shown the success of classical planning at synthesizing action models even when all intermediate states are missing. These approaches can synthesize action schemas in Planning Domain Definition Language (PDDL) from a set of execution traces each consisting, at least, of an initial and final state. In this paper, we propose a new algorithm to unsupervisedly synthesize STRIPS action models with a classical planner when action signatures are unknown. In addition, we contribute with a compilation to classical planning that mitigates the problem of learning static predicates in the action model preconditions, exploits the capabilities of SAT planners with parallel encodings to compute action schemas and validate all instances. Our system is flexible in that it supports the inclusion of partial input information that may speed up the search. We show through several experiments how learned action models generalize over unseen planning instances.Comment: Presented to Genplan 2020 workshop, held in the AAAI 2020 conference (https://sites.google.com/view/genplan20) (2021/03/05: included missing acknowledgments

    Proactive learning of cognitive exercises with a social robot

    Get PDF
    Extended Abstract presentat al workshop de Prediction and Anticipation Reasoning in Human Robot Interaction (https://www.iri.upc.edu/workshops/pred-ant-hri/cfp.html), a la conferència ICRA 2022. Aquest article descriu un treball en progrés que es preveu afegir a la tesi. A dia d'aquesta descripció (15/07/2022), encara no s'ha publicat un enllaç al document presentat al workshop.We introduce INtuitive PROgramming 2 (INPRO2), an improvement over our previous INPRO framework for learning board exercises via demonstrations. INPRO2 makes use of our Online Action Recognition through Unification (OARU) algorithm, which maintains and extends as needed a library of STRIPS action schemata that represent the dynamics, rules and goal of the exercise. OARU operates on a sequence of states shown by the user. Each state transition is either used to learn a new action, or is recognized as an instance of one action currently present in the library, possibly refining it. We have extended OARU to support negative examples (i.e. invalid moves that show forbidden state transitions) in order to increase the complexity of the exercises that can be learned. This new OARU's feature is exploited through another crucial element of INPRO2: its ability to proactively ask for the legality of certain moves to the user in critical situations, and fix overly permissive actions. We show an example of a typical INPRO2 learning session. We also outline a plan for a user study that will serve to assess the proactive behavior of the robot.Preprin

    Online action recognition

    Get PDF
    Recognition in planning seeks to find agent intentions, goals or activities given a set of observations and a knowledge library (e.g. goal states, plans or domain theories). In this work we introduce the problem of Online Action Recognition. It consists in recognizing, in an open world, the planning action that best explains a partially observable state transition from a knowledge library of first-order STRIPS actions, which is initially empty. We frame this as an optimization problem, and propose two algorithms to address it: Action Unification (AU) and Online Action Recognition through Unification (OARU). The former builds on logic unification and generalizes two input actions using weighted partial MaxSAT. The latter looks for an action within the library that explains an observed transition. If there is such action, it generalizes it making use of AU, building in this way an AU hierarchy. Otherwise, OARU inserts a Trivial Grounded Action (TGA) in the library that explains just that transition. We report results on benchmarks from the International Planning Competition and PDDLGym, where OARU recognizes actions accurately with respect to expert knowledge, and shows real-time performance.Peer ReviewedPostprint (author's final draft

    Learning Parameterized Task Structure for Generalization to Unseen Entities

    Full text link
    Real world tasks are hierarchical and compositional. Tasks can be composed of multiple subtasks (or sub-goals) that are dependent on each other. These subtasks are defined in terms of entities (e.g., "apple", "pear") that can be recombined to form new subtasks (e.g., "pickup apple", and "pickup pear"). To solve these tasks efficiently, an agent must infer subtask dependencies (e.g. an agent must execute "pickup apple" before "place apple in pot"), and generalize the inferred dependencies to new subtasks (e.g. "place apple in pot" is similar to "place apple in pan"). Moreover, an agent may also need to solve unseen tasks, which can involve unseen entities. To this end, we formulate parameterized subtask graph inference (PSGI), a method for modeling subtask dependencies using first-order logic with subtask entities. To facilitate this, we learn entity attributes in a zero-shot manner, which are used as quantifiers (e.g. "is_pickable(X)") for the parameterized subtask graph. We show this approach accurately learns the latent structure on hierarchical and compositional tasks more efficiently than prior work, and show PSGI can generalize by modelling structure on subtasks unseen during adaptation.Comment: Published in AAAI 202

    Multi-task Hierarchical Reinforcement Learning for Compositional Tasks

    Full text link
    This thesis presents the algorithms for solve multiple compositional tasks with high sample efficiency and strong generalization ability. Central to this work is the subtask graph which models the structure in compositional tasks into a graph form. We formulate the compositional tasks as a multi-task and meta-RL problems using the subtask graph and discuss different approaches to tackle the problem. Specifically, we present four contributions, where the common idea is to exploit the inductive bias in the hierarchical task structure for efficien learning and strong generalization. The first part of the thesis formally introduces the subtask graph execution problem: a modeling of the compositional task as an multi-task RL problem where the agent is given a task description input in a graph form as an additional input. We present the hierarchical architecture where high-level policy determines the subtask to execute and low-level policy executes the given subtask. The high-level policy learns the modular neural network that can be dynamically assmbled according to the input task description to choose the optimal sequence of subtasks to maximize the reward. We demonstrate that the proposed method can achieve a strong zero-shot task generalization ability, and also improve the search efficiency of existing planning method when combined together. The second part studies the more general setting where the task structure is not available to agent such that the task should be inferred from the agent's own experience; ie, few-shot reinforcement learning setting. Specifically, we combine the meta-reinforcemenet learning with an inductive logic programming (ILP) method to explicitly infer the latent task structure in terms of subtask graph from agent's trajectory. Our empirical study shows that the underlying task structure can be accurately inferred from a small amount of environment interaction without any explicit supervision on complex 3D environments with high-dimensional state and actions space. The third contribution extends thesecond contribution by transfer-learning the prior over the task structure from training tasks to the unseen testing task to achieve a faster adaptation. Although the meta-policy learned the general exploration strategy over the distribution of tasks, the task structure was independently inferred from scratch for each task in the previous part. We overcome such limitation by modeling the prior of the tasks from the subtask graph inferred via ILP, and transfer-learning the learned prior when performing the inference of novel test tasks. To achieve this, we propose a novel prior sampling and posterior update method to incorporate the knowledge learned from the seen task that is most relevant to the current task. The last part investigates more indirect form of inductive bias that is implemented as a constraint on the trajectory rolled out by the policy in MDP. We present a theoretical result proving that the proposed constraint preserves the optimality while reducing the policy search space. Empirically, the proposed method improves the sample effciency of the policy gradient method on a wide range of challenging sparse-reward tasks. Overall, this work formulates the hierarchical structure in the compositional tasks and provides the evidences that such structure exists in many important problems. In addition, we present diverse principled approaches to exploit the inductive bias on the hierarchical structure in MDP in different problem settings and assumptions, and demonstrate the usefulness of such inductive bias when tackling compositional tasks.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/169951/1/srsohn_1.pd

    STRIPS action discovery

    Get PDF
    The problem of specifying high-level knowledge bases for planning becomes a hard task in realistic environments. This knowledge is usually handcrafted and is hard to keep updated, even for system experts. Recent approaches have shown the success of classical planning at synthesizing action models even when all intermediate states are missing. These approaches can synthesize action schemas in Planning Domain Definition Language (PDDL) from a set of execution traces, each consisting, at least, of an initial and final state. In this paper, we propose a new algorithm to unsupervisedly synthesize STRIPS action models with a classical planner when action signatures are unknown. In addition, we contribute with a compilation to classical planning that mitigates the problem of learning static predicates in the action model preconditions, exploits the capabilities of SAT planners with parallel encodings to compute action schemas and validate all instances. Our system is flexible in that it supports the inclusion of partial input information that may speed up the search. We show, through several experiments, how learnt action models generalize over unseen planning instances.Peer ReviewedPostprint (author's final draft

    STRIPS action discovery

    No full text
    Trabajo presentado en el AAAI 2020 workshop on Generalization in Planning (GenPlan), celebrado en Nueva York (Estados Unidos), el 7 de febrero de 2020The problem of specifying high-level knowledge bases for planning becomes a hard task in realistic environments. This knowledge is usually handcrafted and is hard to keep updated, even for system experts. Recent approaches have shown the success of classical planning at synthesizing action models even when all intermediate states are missing. These approaches can synthesize action schemas in Planning Domain Definition Language (PDDL) from a set of execution traces each consisting, at least, of an initial and final state. In this paper, we propose a new algorithm to unsupervisedly synthesize STRIPS action models with a classical planner when action signatures are unknown. In addition, we contribute with a compilation to classical planning that mitigates the problem of learning static predicates in the action model preconditions, exploits the capabilities of SAT planners with parallel encodings to compute action schemas and validate all instances. Our system is flexible in that it supports the inclusion of partial input information that may speed up the search. We show through several experiments how learnt action models generalize over unseen planning instances
    corecore