72 research outputs found

    Survey: Robot Programming by Demonstration

    Robot PbD started about 30 years ago and has grown substantially during the past decade. The rationale for moving from purely preprogrammed robots to very flexible user-based interfaces for training the robot to perform a task is three-fold. First and foremost, PbD, also referred to as imitation learning, is a powerful mechanism for reducing the complexity of search spaces for learning. When observing either good or bad examples, one can reduce the search for a possible solution either by starting the search from the observed good solution (local optima) or, conversely, by eliminating from the search space what is known to be a bad solution. Imitation learning is, thus, a powerful tool for enhancing and accelerating learning in both animals and artifacts. Second, imitation learning offers an implicit means of training a machine, such that explicit and tedious programming of a task by a human user can be minimized or eliminated (Figure \ref{fig:what-how}). Imitation learning is thus a "natural" means of interacting with a machine that would be accessible to lay people. And third, studying and modeling the coupling of perception and action, which is at the core of imitation learning, helps us to understand the mechanisms by which the self-organization of perception and action could arise during development. The reciprocal interaction of perception and action could explain how competence in motor control can be grounded in the rich structure of perceptual variables and, vice versa, how the processes of perception can develop as a means to create successful actions. The promises of PbD were thus manifold. On the one hand, one hoped that it would make learning faster, in contrast to tedious reinforcement learning methods or trial-and-error learning. On the other hand, one expected that the methods, being user-friendly, would enhance the application of robots in everyday human environments.
Recent progress in the field, which we review in this chapter, shows that the field has made a leap forward over the past decade toward these goals and that these promises may be fulfilled very soon.

    Classical Many-particle Clusters in Two Dimensions

    We report on a study of a classical, finite system of confined particles in two dimensions with a two-body repulsive interaction. We first develop a simple analytical method to obtain equilibrium configurations and energies for few particles. When the confinement is harmonic, we prove that the first transition from a single shell occurs when the number of particles changes from five to six. The shell structure in the case of an arbitrary number of particles is shown to be independent of the strength of the interaction but dependent only on its functional form. It is also independent of the magnetic field strength when included. We further study the effect of the functional form of the confinement potential on the shell structure. Finally we report some interesting results when a three-body interaction is included, albeit in a particular model. Comment: Minor corrections, a few references added. To appear in J. Phys.: Condensed Matter.
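    The five-to-six transition stated above can be checked numerically in one simple special case. The following is a minimal sketch and not the authors' analytical method: it assumes a Coulomb-like 1/r repulsion, harmonic confinement, and dimensionless units in which the energy is the sum of r_i^2 over particles plus the sum of 1/|r_i - r_j| over pairs. It compares only two candidate configurations, a regular ring and a regular ring with one particle at the centre. All function names are illustrative.

```python
import math


def ring_sum(n: int) -> float:
    """Sum of 1/sin(pi*k/n) for k = 1..n-1 (geometry factor of a regular n-gon)."""
    return sum(1.0 / math.sin(math.pi * k / n) for k in range(1, n))


def ring_energy(n: int) -> float:
    """Minimum energy of n particles on one regular ring.

    E(R) = n*R**2 + n*S/(4*R) with S = ring_sum(n); setting dE/dR = 0
    gives R**3 = S/8 and therefore E = 3*n*R**2.
    """
    radius = (ring_sum(n) / 8.0) ** (1.0 / 3.0)
    return 3.0 * n * radius**2


def centered_energy(n: int) -> float:
    """Minimum energy of n-1 particles on a ring plus one at the centre.

    E(R) = m*R**2 + m*(S/4 + 1)/R with m = n-1 and S = ring_sum(m);
    the extra m/R term is the repulsion from the central particle.
    """
    m = n - 1
    a = ring_sum(m) / 4.0 + 1.0
    radius = (a / 2.0) ** (1.0 / 3.0)
    return 3.0 * m * radius**2


# Compare the two candidate configurations for a few particle numbers.
for n in (4, 5, 6, 7):
    print(n, round(ring_energy(n), 4), round(centered_energy(n), 4))
```

    Under these assumptions the single ring has the lower energy for five particles, while the ring with a central particle has the lower energy for six, which is consistent with the transition described in the abstract.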

    Newton-Hooke spacetimes, Hpp-waves and the cosmological constant

    We show explicitly how the Newton-Hooke groups act as symmetries of the equations of motion of non-relativistic cosmological models with a cosmological constant. We give the action on the associated non-relativistic spacetimes and show how these may be obtained from a null reduction of 5-dimensional homogeneous pp-wave Lorentzian spacetimes. This allows us to realize the Newton-Hooke groups and their Bargmann type central extensions as subgroups of the isometry groups of the pp-wave spacetimes. The extended Schrödinger type conformal group is identified and its action on the equations of motion given. The non-relativistic conformal symmetries also have applications to time-dependent harmonic oscillators. Finally we comment on a possible application to Gao's generalization of the matrix model. Comment: 21 pages.

    APRIL: Active Preference-learning based Reinforcement Learning

    This paper focuses on reinforcement learning (RL) with limited prior knowledge. In the domain of swarm robotics, for instance, the expert can hardly design a reward function or demonstrate the target behavior, which rules out both standard RL and inverse reinforcement learning. Even with limited expertise, the human expert is often still able to express preferences and rank the agent's demonstrations. Earlier work has presented an iterative preference-based RL framework: expert preferences are exploited to learn an approximate policy return, thus enabling the agent to perform direct policy search. Iteratively, the agent selects a new candidate policy and demonstrates it; the expert ranks the new demonstration relative to the previous best one; the expert's ranking feedback enables the agent to refine the approximate policy return, and the process is iterated. In this paper, preference-based reinforcement learning is combined with active ranking in order to decrease the number of ranking queries to the expert needed to yield a satisfactory policy. Experiments on the mountain car and the cancer treatment testbeds show that a couple of dozen rankings suffice to learn a competent policy.

    Deep active learning for autonomous navigation.

    Imitation learning refers to an agent's ability to mimic a desired behavior by learning from observations. A major challenge facing learning from demonstrations is to represent the demonstrations in a manner that is adequate for learning and efficient for real-time decisions. Creating feature representations is especially challenging when they are extracted from high-dimensional visual data. In this paper, we present a method for imitation learning from raw visual data. The proposed method is applied to a popular imitation learning domain that is relevant to a variety of real-life applications, namely navigation. To create a training set, a teacher uses an optimal policy to perform a navigation task, and the actions taken are recorded along with visual footage from the first-person perspective. Features are automatically extracted and used to learn a policy that mimics the teacher via a deep convolutional neural network. A trained agent can then predict an action to perform based on the scene it finds itself in. This method is generic, and the network is trained without knowledge of the task, targets or environment in which it is acting. Another common challenge in imitation learning is generalizing a policy to situations not seen in the training data. To address this challenge, the learned policy is subsequently improved by employing active learning. While the agent is executing a task, it can query the teacher for the correct action to take in situations where it has low confidence. The active samples are added to the training set and used to update the initial policy. The proposed approach is demonstrated on four different tasks in a 3D simulated environment. The experiments show that an agent can effectively perform imitation learning from raw visual data for navigation tasks and that active learning can significantly improve the initial policy using a small number of samples.
The simulated test bed facilitates reproduction of these results and comparison with other approaches.
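    The active-learning step described above, in which the agent queries the teacher when its confidence is low, can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the abstract does not say how confidence is measured, so the sketch uses the largest softmax probability, and the threshold value, the function names and the hard-coded action scores (standing in for the output of the convolutional network) are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw action scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def act_or_query(
    logits: Sequence[float],
    ask_teacher: Callable[[], int],
    threshold: float = 0.6,  # hypothetical confidence cut-off
) -> Tuple[int, bool]:
    """Return (action, queried).

    If the policy's top action probability is below `threshold`, the
    teacher is asked for the correct action; otherwise the policy's own
    prediction is used.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return ask_teacher(), True
    return best, False


# Samples labelled by the teacher are collected for later retraining.
# The scores below are made-up stand-ins for network outputs.
new_samples = []
for observation, logits in [("scene_a", [4.0, 0.5, 0.1]), ("scene_b", [1.0, 0.9, 1.1])]:
    action, queried = act_or_query(logits, ask_teacher=lambda: 0)
    if queried:
        new_samples.append((observation, action))
```

    In this toy run only the second scene, where the three action scores are nearly equal, triggers a query, so only that observation is added to the set of new training samples.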

    Embodied Gesture Processing: Motor-Based Integration of Perception and Action in Social Artificial Agents

    A close coupling of perception and action processes is assumed to play an important role in basic capabilities of social interaction, such as guiding attention and observation of others’ behavior, coordinating the form and functions of behavior, or grounding the understanding of others’ behavior in one’s own experiences. In an attempt to endow artificial embodied agents with similar abilities, we present a probabilistic model for the integration of perception and generation of hand-arm gestures via a hierarchy of shared motor representations, allowing for combined bottom-up and top-down processing. Results from human-agent interactions are reported, demonstrating the model’s performance in learning, observation, imitation, and generation of gestures.

    Measuring Generalization of Visuomotor Perturbations in Wrist Movements Using Mobile Phones

    Recent studies in motor control have shown that visuomotor rotations for reaching have narrow generalization functions: what we learn during movements in one direction only affects subsequent movements in nearby directions. Here we wanted to measure the generalization functions for wrist movement. To do so, we had seven subjects perform an experiment while holding a mobile phone in their dominant hand. The mobile phone's built-in acceleration sensor provided a convenient way to measure wrist movements and to run the behavioral protocol. Subjects moved a cursor on the screen by tilting the phone. Movements on the screen toward the training target were rotated, and we then measured how learning of the rotation in the training direction affected subsequent movements in other directions. We find that generalization is local and similar to generalization patterns of visuomotor rotation for reaching.

    Review of the techniques used in motor‐cognitive human‐robot skill transfer

    Conventional robot programming methods severely limit the reusability of skills during development. Engineers programme a robot in a targeted manner for the realisation of predefined skills. The low reusability of general‐purpose robot skills is mainly reflected in their inability to cope with novel and complex scenarios. Skill transfer aims to transfer human skills to general‐purpose manipulators or mobile robots to replicate human‐like behaviours. Skill transfer methods that are commonly used at present, such as learning from demonstration (LfD) or imitation learning, endow the robot with the expert's low‐level motor and high‐level decision‐making ability, so that skills can be reproduced and generalised according to perceived context. The improvement of robot cognition usually relates to an improvement in the autonomous high‐level decision‐making ability. Based on the idea of establishing a generic or specialised robot skill library, robots are expected to autonomously reason about the needs for using skills and plan compound movements according to sensory input. In recent years, many studies in this area have demonstrated the effectiveness of this approach. Herein, a detailed review is provided on the transferring techniques of skills, applications, advancements, and limitations, especially in LfD. Future research directions are also suggested.

    Evidence for Composite Cost Functions in Arm Movement Planning: An Inverse Optimal Control Approach

    An important issue in motor control is understanding the basic principles underlying the accomplishment of natural movements. According to optimal control theory, the problem can be stated in these terms: what cost function do we optimize to coordinate the many more degrees of freedom than necessary to fulfill a specific motor goal? This question has not received a final answer yet, since what is optimized partly depends on the requirements of the task. Many cost functions were proposed in the past, and most of them were found to be in agreement with experimental data. Therefore, the actual principles on which the brain relies to achieve a certain motor behavior are still unclear. Existing results might suggest that movements are not the result of the minimization of a single cost function but rather of composite cost functions. In order to clarify this last point, we consider an innovative experimental paradigm characterized by arm reaching with target redundancy. Within this framework, we make use of an inverse optimal control technique to automatically infer the (combination of) optimality criteria that best fit the experimental data. Results show that the subjects exhibited a consistent behavior during each experimental condition, even though the target point was not prescribed in advance. Inverse and direct optimal control together reveal that the average arm trajectories were best replicated when optimizing the combination of two cost functions, namely a mix between the absolute work of torques and the integrated squared joint acceleration. Our results thus support the cost combination hypothesis and demonstrate that the recorded movements were closely linked to the combination of two complementary functions related to mechanical energy expenditure and joint-level smoothness.
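    To make the cost combination hypothesis concrete, a composite cost of the kind described above can be written as a weighted sum of the two criteria. The notation below is an assumption made for illustration and is not taken from the paper: $\tau_i$ and $\theta_i$ denote the torque and angle of joint $i$, $T$ is the movement duration, and $\alpha \ge 0$ is a weighting coefficient of the sort that inverse optimal control would estimate from recorded trajectories.

```latex
C(\alpha) \;=\;
\underbrace{\int_0^T \sum_i \bigl|\tau_i(t)\,\dot{\theta}_i(t)\bigr|\,dt}_{\text{absolute work of torques}}
\;+\; \alpha\,
\underbrace{\int_0^T \sum_i \ddot{\theta}_i(t)^2\,dt}_{\text{integrated squared joint acceleration}}
```

    In this form, $\alpha = 0$ recovers a pure energy-like criterion, and increasing $\alpha$ puts more weight on joint-level smoothness.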