
    Competitive function approximation for reinforcement learning

    The application of reinforcement learning to problems with continuous domains requires representing the value function by means of function approximation. We identify two aspects of reinforcement learning that make the function approximation process hard: non-stationarity of the target function and biased sampling. Non-stationarity is the result of the bootstrapping nature of dynamic programming, where the value function is estimated using its current approximation. Biased sampling occurs when some regions of the state space are visited too often, causing repeated updates with similar values that fade out the occasional updates of infrequently sampled regions. We propose a competitive approach to function approximation in which many different local approximators are available at a given input and the one expected to give the best approximation is selected by means of a relevance function. The local nature of the approximators allows their fast adaptation to non-stationary changes and mitigates the biased sampling problem. The coexistence of multiple approximators updated and tried in parallel permits obtaining a good estimation much faster than would be possible with a single approximator. Experiments on different benchmark problems show that the competitive strategy provides faster and more stable learning than non-competitive approaches.
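    A minimal sketch of the competitive selection idea, assuming each local approximator exposes a prediction and a relevance score; all names are illustrative, not the paper's implementation:

        # Hypothetical names: each approximator pairs a local value estimate
        # with a relevance function scoring its expected quality at input x.
        class LocalApproximator:
            def __init__(self, predict, relevance):
                self.predict = predict      # x -> estimated value
                self.relevance = relevance  # x -> expected approximation quality

        def competitive_value(approximators, x):
            # Query all approximators in parallel and trust the one whose
            # relevance is highest at this particular input.
            best = max(approximators, key=lambda a: a.relevance(x))
            return best.predict(x)

        # Two local linear models, each most relevant near its own center.
        a1 = LocalApproximator(lambda x: 2 * x, lambda x: -abs(x - 1.0))
        a2 = LocalApproximator(lambda x: x + 3, lambda x: -abs(x - 5.0))
        print(competitive_value([a1, a2], 4.8))  # 7.8, from a2 (closer to 5)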

    Reinforcement learning for robot control using probability density estimations

    Presented at ICINCO 2010, held in Funchal (Portugal), June 15-18. The successful application of Reinforcement Learning (RL) techniques to robot control is limited by the fact that, in most robotic tasks, the state and action spaces are continuous, multidimensional, and, in essence, too large for conventional RL algorithms to work. The well-known curse of dimensionality makes a tabular representation of the value function infeasible, even though this is the classical approach that provides convergence guarantees. When a function approximation technique is used to generalize among similar states, the convergence of the algorithm is compromised, since updates unavoidably affect an extended region of the domain; that is, some situations are modified in a way that has not actually been experienced, and the update may degrade the approximation. We propose an RL algorithm that uses a probability density estimation in the joint space of states, actions, and Q-values as a means of function approximation. This allows us to devise an updating approach that, by taking into account the local sampling density, avoids an excessive modification of the approximation far from the observed sample. This work was supported by the project 'CONSOLIDER-INGENIO 2010 Multimodal interaction in pattern recognition and computer vision' (V-00069). This research was partially supported by Consolider Ingenio 2010, project CSD2007-00018.
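    A rough sketch of a density-aware update on a one-dimensional grid of Q-values; the paper's joint density estimation is replaced here by a simple Gaussian proximity kernel and per-point sample counts, so all details are illustrative:

        import numpy as np

        def kernel_q_update(centers, q, counts, x, target, bandwidth=0.25):
            # Each grid point moves toward the observed target in proportion
            # to its closeness to the sample x, with a step size that shrinks
            # as the local sample count grows: points far from real
            # experience are barely modified.
            w = np.exp(-0.5 * ((centers - x) / bandwidth) ** 2)
            counts += w                                # local sampling density
            q += (w / np.maximum(counts, 1e-12)) * (target - q)
            return q, counts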

    Learning weakly correlated cause-effects for gardening with a cognitive system

    We propose a cognitive system that combines artificial intelligence techniques for planning and learning to execute tasks involving delayed and variable correlations between the actions executed and their expected effects. The system is applied to the task of controlling the growth of plants, where the evolution of the plant attributes strongly depends on different events taking place in the temporally distant past history of the plant. The main problem to tackle is how to efficiently detect these past events. This is very challenging, since the inclusion of time could make the dimensionality of the search space extremely large, and the collected training instances may provide only very limited information about the relevant combinations of events. To address this problem, we propose a learning method that progressively identifies those events that are more likely to produce a sequence of changes under a plant treatment. Since the number of experiences is very limited compared to the size of the event space, we use a probabilistic estimate that takes into account the lack of experience to prevent biased estimations. Planning operators are generated from the most accurately predicted sequences of changes. Planning and learning are integrated in a decision-making framework that operates without task interruptions by allowing a human gardener to instruct the treatments when the knowledge acquired so far is not enough to make a decision. This research was supported by the European Community Seventh Framework Programme FP7/2007–2013 – Challenge 2 – Cognitive Systems, Interaction, Robotics – under Grant agreement No 247947 – GARNICS.
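    One common way to realize such a cautious estimate is an m-estimate that pulls the raw frequency toward a prior when experience is scarce; the constants below are illustrative, not the paper's values:

        def cautious_probability(successes, trials, prior=0.5, m=2.0):
            # With few trials the estimate stays near the prior, so a single
            # lucky observation cannot yield a probability of 1.
            return (successes + m * prior) / (trials + m)

        print(cautious_probability(1, 1))   # ~0.67 rather than 1.0
        print(cautious_probability(9, 10))  # ~0.83, approaching the raw 0.9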

    Online EM with weight-based forgetting

    In the online version of the EM algorithm introduced by Sato and Ishii (2000), a time-dependent discount factor is introduced to forget the effect of old estimates obtained with an earlier, inaccurate estimator. In their approach, forgetting is applied uniformly to the estimators of each mixture component, depending exclusively on time and irrespective of the weight attributed to each unit for the observed sample. This causes excessive forgetting in the less frequently sampled regions. To address this problem, we propose a modification of the algorithm that involves a weight-dependent forgetting, different for each mixture component, in which old observations are forgotten according to the actual weight of the new samples used to replace older values. A comparison of the time-dependent versus the weight-dependent approach shows that the latter improves the accuracy of the approximation and exhibits much greater stability.
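    A minimal sketch of a weight-dependent step for the component means of a mixture model, assuming the responsibilities have already been computed in the E-step; variable names are illustrative:

        import numpy as np

        def weight_based_em_step(means, acc_w, resp, x):
            # means: (J, D) component means; resp: (J,) responsibilities
            # for the new sample x: (D,). acc_w[j] accumulates the
            # responsibility mass component j has received so far, so its
            # effective step size is resp[j] / acc_w[j]: components that
            # rarely win samples barely forget their old statistics.
            # (A time-dependent scheme would use one global rate instead.)
            acc_w += resp
            eta = resp / np.maximum(acc_w, 1e-12)   # per-component rate
            means += eta[:, None] * (x - means)
            return means, acc_w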

    A competitive strategy for function approximation in Q-learning

    In this work we propose an approach for generalization in continuous-domain Reinforcement Learning that, instead of using a single function approximator, tries many different function approximators in parallel, each one defined in a different region of the domain. Associated with each approximator is a relevance function that locally quantifies the quality of its approximation, so that, at each input point, the approximator with the highest relevance can be selected. The relevance function is defined using parametric estimations of the variance of the q-values and the density of samples in the input space, which quantify the accuracy of and the confidence in the approximation, respectively. These parametric estimations are obtained from a probability density distribution represented as a Gaussian Mixture Model embedded in the input-output space of each approximator. In our experiments, the proposed approach required fewer experiences for learning and produced more stable convergence profiles than a single function approximator.
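    As a rough illustration, such a relevance score could favor high sampling density (confidence) and low q-value variance (accuracy), for instance as their ratio; here scikit-learn's GaussianMixture stands in for the paper's model and var_of_q is a placeholder callable:

        import numpy as np
        from sklearn.mixture import GaussianMixture

        def relevance(gmm: GaussianMixture, var_of_q, x):
            # score_samples returns the log-density; exponentiate to get the
            # sampling density at x, then penalize a high q-value variance.
            density = np.exp(gmm.score_samples(x.reshape(1, -1)))[0]
            return density / (var_of_q(x) + 1e-12)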

    Stochastic approximations of average values using proportions of samples

    IRI Technical Report. In this work we explain how the stochastic approximation of the average of a random variable is carried out when the observations used in the updates consist of proportions of samples rather than complete samples.
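    A minimal sketch of one way to read this, assuming each observation carries a fractional sample count in (0, 1]; with a proportion of 1 the update reduces to the standard incremental mean:

        def update_average(mean, count, value, proportion):
            # Treat the observation as `proportion` of a full sample.
            count += proportion
            mean += (proportion / count) * (value - mean)
            return mean, count

        m, n = 0.0, 0.0
        for v, p in [(10.0, 1.0), (20.0, 0.5), (20.0, 0.5)]:
            m, n = update_average(m, n, v, p)
        print(m)  # 15.0: two half-samples of 20 together count as one sample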

    Long-Horizon Task Planning and Execution with Functional Object-Oriented Networks

    Following work on joint object-action representation, functional object-oriented networks (FOON) were introduced as a knowledge representation for robots. A FOON contains symbolic (high-level) concepts useful to a robot's understanding of tasks and its environment for object-level planning. Prior to this work, little had been done to show how plans acquired from FOON can be executed by a robot, as the concepts in a FOON are too abstract for immediate execution. We propose a hierarchical task planning approach that translates a FOON graph into a PDDL-based representation of domain knowledge for task planning and execution. As a result of this process, a task plan can be acquired that a robot can execute from start to end, leveraging the use of action contexts and skills in the form of dynamic movement primitives (DMPs). We demonstrate the entire pipeline from planning to execution using CoppeliaSim and show how learned action contexts can be extended to never-before-seen scenarios. Comment: preliminary draft, 8 pages, IEEE conference format.
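    As a toy illustration of the translation step (not the paper's actual schema), the input and output object states of one FOON functional unit can be mapped to the preconditions and effects of a PDDL action:

        def functional_unit_to_pddl(action, preconditions, effects):
            # Input object states become preconditions, output states effects.
            pre = " ".join(f"({p})" for p in preconditions)
            eff = " ".join(f"({e})" for e in effects)
            return (f"(:action {action}\n"
                    f"  :precondition (and {pre})\n"
                    f"  :effect (and {eff}))")

        print(functional_unit_to_pddl(
            "pour",
            ["in-hand cup", "contains cup water"],
            ["contains bowl water", "not (contains cup water)"]))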

    Efficient interactive decision-making framework for robotic applications

    This manuscript version is made available under the CC-BY-NC-ND 4.0 license (http://creativecommons.org/licenses/by-nc-nd/4.0/). The inclusion of robots, such as service robots, in our society is imminent. Robots are now capable of reliably manipulating objects in our daily lives, but only when combined with artificial intelligence (AI) techniques for planning and decision-making, which allow a machine to determine how a task can be completed successfully. To perform decision making, AI planning methods use a set of planning operators to encode the state changes in the environment produced by a robotic action. Given a specific goal, the planner then searches for the best sequence of planning operators, i.e., the best plan that leads through the state space to satisfy the goal. In principle, planning operators can be hand-coded, but this is impractical for applications that involve many possible state transitions. An alternative is to learn them automatically from experience, which is most efficient when there is a human teacher. In this study, we propose a simple and efficient decision-making framework for this purpose. The robot executes its plan in a step-wise manner, and any planning impasse produced by missing operators is resolved online by asking a human teacher for the next action to execute. Based on the observed state transitions, this approach rapidly generates the missing operators by evaluating the relevance of several cause-effect alternatives in parallel using a probability estimate, which compensates for the high uncertainty inherent in learning from a small number of samples. We evaluated the validity of our approach in simulated and real environments, where it was benchmarked against previous methods. Humans learn in the same incremental manner, so we consider that our approach may be a better alternative to existing learning paradigms, which require offline learning, a significant amount of previous knowledge, or a large number of samples.
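    The step-wise execution loop with online impasse resolution might look as follows; every callable is a placeholder for a component the paper describes, so this is a sketch of the control flow, not the actual implementation:

        def run_task(plan_step, execute, ask_teacher, learn_operator,
                     goal_reached, state):
            while not goal_reached(state):
                action = plan_step(state)   # None signals a planning impasse
                if action is None:
                    action = ask_teacher(state)   # human supplies the action
                    next_state = execute(action, state)
                    # Generate the missing operator from this one transition.
                    learn_operator(state, action, next_state)
                else:
                    next_state = execute(action, state)
                state = next_state
            return state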

    A general strategy for interactive decision-making in robotic platforms

    This work presents an integrated strategy for planning and learning suitable for executing tasks with robotic platforms without any previous task specification. The approach rapidly learns planning operators from a few action experiences using a competitive strategy in which many alternative cause-effect explanations are evaluated in parallel, and the most successful ones are used to generate the operators. The system operates without task interruption by integrating into the planning-learning loop a human teacher who supports the planner in making decisions. All the mechanisms are integrated and synchronized on the robot using a general decision-making framework.