
    Learning Symbolic Models of Stochastic Domains

    In this article, we work towards the goal of developing agents that can learn to act in complex worlds. We develop a probabilistic, relational planning rule representation that compactly models noisy, nondeterministic action effects, and show how such rules can be effectively learned. Through experiments in simple planning domains and a 3D simulated blocks world with realistic physics, we demonstrate that this learning algorithm allows agents to effectively model world dynamics.
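    To make the representation concrete, below is a minimal sketch of such a probabilistic relational rule, assuming a simplified set-of-literals state encoding. The class names, predicates, and probabilities are illustrative assumptions, not the authors' implementation. Each rule lists mutually exclusive outcomes whose probabilities sum to at most one, with the remaining mass reserved for a catch-all "noise" outcome covering unmodeled effects.

    ```python
    # Hypothetical sketch of a noisy relational planning rule (not the paper's code).
    import random
    from dataclasses import dataclass, field

    @dataclass
    class Outcome:
        probability: float         # chance this outcome fires when the rule applies
        add_effects: frozenset     # literals made true, e.g. {("holding", "?x")}
        delete_effects: frozenset  # literals made false

    @dataclass
    class RelationalRule:
        action: tuple              # e.g. ("pickup", "?x")
        preconditions: frozenset   # literals that must hold for the rule to apply
        outcomes: list = field(default_factory=list)  # probabilities sum to <= 1;
        # remaining probability mass is the "noise" outcome (unmodeled effects).

        def _ground(self, literals, binding):
            """Substitute variables (?x, ?y, ...) with objects from the binding."""
            return {tuple(binding.get(t, t) for t in lit) for lit in literals}

        def applicable(self, state, binding):
            return self._ground(self.preconditions, binding) <= state

        def sample_next_state(self, state, binding):
            r, acc = random.random(), 0.0
            for out in self.outcomes:
                acc += out.probability
                if r < acc:
                    nxt = state - self._ground(out.delete_effects, binding)
                    return nxt | self._ground(out.add_effects, binding)
            return state  # noise outcome, simplified here to "no change"

    # Example: a noisy pickup that succeeds 80% of the time, drops the block 15%,
    # and otherwise does something unmodeled (the 5% noise outcome).
    rule = RelationalRule(
        action=("pickup", "?x"),
        preconditions=frozenset({("clear", "?x"), ("handempty",)}),
        outcomes=[
            Outcome(0.80, frozenset({("holding", "?x")}),
                          frozenset({("clear", "?x"), ("handempty",)})),
            Outcome(0.15, frozenset({("ontable", "?x")}), frozenset()),
        ],
    )
    state = {("clear", "a"), ("handempty",), ("ontable", "a")}
    if rule.applicable(state, {"?x": "a"}):
        print(rule.sample_next_state(state, {"?x": "a"}))
    ```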

    Learning relational dynamics of stochastic domains for planning

    Probabilistic planners are very flexible tools that can provide good solutions for difficult tasks. However, they rely on a model of the domain, which may be costly to either hand-code or automatically learn for complex tasks. We propose a new learning approach that (a) requires only a set of state transitions to learn the model; (b) can cope with uncertainty in the effects; (c) uses a relational representation to generalize over different objects; and (d) learns, in addition to action effects, exogenous effects that are not related to any action, e.g., moving objects, endogenous growth, and natural development. The proposed learning approach combines a multi-valued variant of inductive logic programming for the generation of candidate models with an optimization method that selects the best set of planning operators to model a problem. Finally, experimental validation shows improvements over previous work.
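    As a rough sketch of the selection step described above, the snippet below scores small candidate operator sets by log-likelihood on observed transitions minus a complexity penalty, and keeps the best-scoring subset. The objective, the operator encoding ("pred" and "size"), and the toy transitions are assumptions for illustration; the paper's candidate generator (the multi-valued ILP variant) is not reproduced here.

    ```python
    # Hypothetical operator-set selection: likelihood minus complexity penalty.
    import math
    from itertools import combinations

    def score(op_set, transitions, alpha=0.5):
        """Log-likelihood under the best-explaining operator per transition,
        minus a penalty on the total complexity of the chosen operators."""
        ll = 0.0
        for tr in transitions:
            p = max((op["pred"](tr) for op in op_set), default=1e-9)
            ll += math.log(max(p, 1e-9))
        return ll - alpha * sum(op["size"] for op in op_set)

    def select_best(candidates, transitions, max_ops=3):
        """Exhaustive search over small operator subsets (toy-scale only)."""
        best, best_score = None, -math.inf
        for k in range(1, max_ops + 1):
            for subset in combinations(candidates, k):
                s = score(subset, transitions)
                if s > best_score:
                    best, best_score = [op["name"] for op in subset], s
        return best, best_score

    # Toy candidates: "pred" maps an observed transition to a probability and
    # "size" stands in for rule complexity. Both are invented for illustration.
    candidates = [
        {"name": "push-effect", "size": 3,
         "pred": lambda tr: 0.9 if tr == "moved" else 0.1},
        {"name": "exogenous-growth", "size": 2,
         "pred": lambda tr: 0.8 if tr == "grew" else 0.2},
    ]
    print(select_best(candidates, ["moved", "grew", "moved"]))
    ```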

    Adapting robot task planning to user preferences: an assistive shoe dressing example

    Healthcare robots will be the next big advance in humans' domestic welfare, with robots able to assist elderly people and users with disabilities. However, each user has his or her own preferences, needs, and abilities. Robotic assistants will therefore need to adapt to them, behaving accordingly. Towards this goal, we propose a method to adapt a robot's behavior to user preferences using symbolic task planning. A user model is built from the user's answers to simple questions with a fuzzy inference system, and is then integrated into the planning domain. We describe an adaptation method based on both user satisfaction and the execution outcome, which determine the penalizations applied to the planner's rules. We demonstrate the adaptation method in a simple shoe-fitting scenario, with experiments performed in a simulated user environment. The results show quick behavior adaptation, even when the user's behavior changes, as well as robustness to wrong inference of the initial user model. Finally, some insights from a real-world shoe-fitting setup are also provided. (The final publication is available at link.springer.com.)
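    As an illustration of how answers to simple questions could be fuzzified into a user model that penalizes planner rules, here is a minimal sketch. The membership functions, question names, and penalty formula are invented assumptions, not the paper's fuzzy inference system.

    ```python
    # Hypothetical fuzzy user model feeding rule penalties into a planner.
    def tri(x, a, b, c):
        """Triangular membership: rises from a, peaks at b, falls to zero at c."""
        if x <= a or x >= c:
            return 0.0
        return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

    def infer_assertiveness(mobility_answer, pain_answer):
        """Fuzzy rule: low mobility OR high pain -> prefer gentle behavior.
        Answers are assumed to lie on a 0..10 scale."""
        low_mobility = tri(mobility_answer, -1, 0, 5)
        high_pain = tri(pain_answer, 5, 10, 11)
        gentle = max(low_mobility, high_pain)  # fuzzy OR of the antecedents
        return 1.0 - gentle                    # 1.0 = robot may act assertively

    def penalized_cost(base_cost, rule_forcefulness, assertiveness):
        """Inflate the cost of forceful planner rules for cautious user models."""
        return base_cost * (1.0 + rule_forcefulness * (1.0 - assertiveness))

    a = infer_assertiveness(mobility_answer=2, pain_answer=8)
    print(penalized_cost(base_cost=1.0, rule_forcefulness=0.7, assertiveness=a))
    ```

    Execution outcomes could update the same penalties online, e.g. by lowering the inferred assertiveness after a failed or uncomfortable fitting attempt, which is one way to realize the adaptation to changing user behavior mentioned above.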

    Planning surface cleaning tasks by learning uncertain drag actions outcomes

    A method to perform cleaning tasks is presented, where a robot manipulator autonomously grasps a textile and uses different dragging actions to clean a surface. Actions are imprecise, and probabilistic planning is used to select the best sequence of actions. The characterization of such actions is complex because the initial autonomous grasp of the textile introduces differences in the initial conditions that change the efficacy of the robot's cleaning actions. We demonstrate that the action outcome probabilities can be learned very fast while the task is being executed, so as to progressively improve robot performance. The learner adds only a little overhead to the system compared to the improvements obtained. Experiments with a real robot show that the most effective plan varies depending on the initial grasp, and that plans become better after only a few learning iterations.
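    One common way to learn outcome probabilities quickly during execution is to keep Beta-style success/failure counts per (grasp condition, action) pair and update them after every drag. The sketch below follows that idea under invented condition and action names; it is an assumption about the mechanics, not the authors' exact learner.

    ```python
    # Hypothetical in-execution outcome learner with Beta (Laplace) counts.
    from collections import defaultdict

    class OutcomeLearner:
        def __init__(self, prior=1.0):
            # A Laplace-style prior keeps early estimates away from 0 and 1.
            self.success = defaultdict(lambda: prior)
            self.failure = defaultdict(lambda: prior)

        def p_success(self, grasp, action):
            k = (grasp, action)
            return self.success[k] / (self.success[k] + self.failure[k])

        def update(self, grasp, action, succeeded):
            k = (grasp, action)
            if succeeded:
                self.success[k] += 1
            else:
                self.failure[k] += 1

    learner = OutcomeLearner()
    # After each drag, the planner's model sharpens for the observed grasp:
    for outcome in [True, True, False, True]:
        learner.update("corner-grasp", "straight-drag", outcome)
    print(learner.p_success("corner-grasp", "straight-drag"))  # -> 0.666...
    ```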

    V-MIN: Efficient reinforcement learning through demonstrations and relaxed reward demands

    Paper presented at the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI-15), held in Austin, Texas (US), January 25-30, 2015. Reinforcement learning (RL) is a common paradigm for learning tasks in robotics. However, a lot of exploration is usually required, making RL too slow for high-level tasks. We present V-MIN, an algorithm that integrates teacher demonstrations with RL to learn complex tasks faster. The algorithm combines active demonstration requests and autonomous exploration to find policies yielding rewards higher than a given threshold Vmin. This threshold sets the degree of quality with which the robot is expected to complete the task, allowing the user either to opt for very good policies that require many learning experiences, or to be more permissive with sub-optimal policies that are easier to learn. The threshold can also be increased online to force the system to improve its policies until the desired behavior is obtained. Furthermore, the algorithm generalizes previously learned knowledge, adapting well to changes. The performance of V-MIN has been validated through experimentation, including domains from the International Planning Competition. Our approach achieves the desired behavior where previous algorithms failed. This work is supported by CSIC project MANIPlus 201350E102 and by the Spanish Ministry of Science and Innovation under project PAU+ DPI2011-27510. D. Martínez is also supported by the Spanish Ministry of Education, Culture and Sport via an FPU doctoral grant (FPU12-04173).
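    The control logic can be pictured as a loop that exploits once the estimated policy value clears Vmin, explores otherwise, and requests a teacher demonstration when exploration stalls. The toy schedule below illustrates only that decision structure; every number, the stall heuristic, and the value updates are invented for illustration, not the paper's algorithm.

    ```python
    # Toy illustration of the V-MIN decision structure (all dynamics invented).
    import random

    def vmin_loop(v_min, steps=30, seed=0):
        """Yields (step, chosen mode, current value estimate)."""
        rng = random.Random(seed)
        v_hat, stale = 0.0, 0          # value estimate; stalled-exploration count
        for t in range(steps):
            if v_hat >= v_min:
                mode = "exploit"       # good enough for the requested v_min
            elif stale >= 5:
                mode, stale = "request_demo", 0
                v_hat += 0.3           # demonstrations give large model updates
            else:
                mode = "explore"
                gain = max(0.0, rng.gauss(0.02, 0.05))  # exploring rarely pays off
                stale = 0 if gain > 0.01 else stale + 1
                v_hat += gain
            yield t, mode, round(v_hat, 3)

    for step in vmin_loop(v_min=0.9):
        print(step)
    ```

    Raising v_min during the run would push the loop back into the explore and demonstration branches until the higher value is reached, mirroring the online threshold increase described above.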