14 research outputs found

    Distance-based kernels for dynamical movement primitives

    Get PDF
    In the Anchoring Problem, actions and objects must be anchored to symbols, and movement primitives such as DMPs are a good option for describing actions. In the bottom-up approach to anchoring, actions are recognized by applying learning techniques such as clustering. Although most work on movement recognition with DMPs focuses on the learned weights, we propose to use the shape-attractor function as the feature vector. Since several DMP formulations exist, we analyze the two best-known ones to check whether using the shape attractor instead of the weights is feasible for both. In addition, we propose to use distance-based kernels, such as RBF and TrE, to classify DMPs into a set of predefined actions. Our experiments on an existing dataset, using 1-NN and SVM techniques, confirm that the shape-attractor function is the better choice for movement recognition with DMPs. Peer reviewed. Postprint (author's final draft).
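    As a rough illustration of the classification pipeline described above (not the authors' code), the sketch below recovers the shape-attractor (forcing) function of a one-dimensional Ijspeert-style DMP from a demonstrated trajectory, resamples it to a fixed-length feature vector, and feeds it to the 1-NN and RBF-kernel SVM classifiers mentioned in the abstract. The trajectories, labels, and the helper name shape_attractor are placeholders, and the TrE kernel is not shown.

```python
# Illustrative sketch (not the paper's code): classify movements by the DMP
# shape-attractor (forcing) function instead of the learned basis weights.
import numpy as np
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

def shape_attractor(y, dt=0.01, alpha=25.0, beta=6.25, tau=1.0, n_samples=50):
    """Recover the forcing term f = tau^2*ydd - alpha*(beta*(g - y) - tau*yd)
    of an Ijspeert-style DMP from a 1-D trajectory, resampled to fixed length."""
    yd = np.gradient(y, dt)
    ydd = np.gradient(yd, dt)
    g = y[-1]
    f = tau**2 * ydd - alpha * (beta * (g - y) - tau * yd)
    # Resample to a fixed-length feature vector so all movements are comparable.
    t_old = np.linspace(0.0, 1.0, len(f))
    t_new = np.linspace(0.0, 1.0, n_samples)
    return np.interp(t_new, t_old, f)

# demos: list of (trajectory, label) pairs -- random placeholder data here.
rng = np.random.default_rng(0)
demos = [(np.cumsum(rng.normal(size=200)), rng.integers(0, 3)) for _ in range(60)]
X = np.stack([shape_attractor(traj) for traj, _ in demos])
y = np.array([label for _, label in demos])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
for clf in (KNeighborsClassifier(n_neighbors=1), SVC(kernel="rbf", gamma="scale")):
    print(type(clf).__name__, clf.fit(X_tr, y_tr).score(X_te, y_te))
```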

    Model Mediated Teleoperation with a Hand-Arm Exoskeleton in Long Time Delays Using Reinforcement Learning

    Get PDF
    Telerobotic systems must adapt to new environmental conditions and deal with high uncertainty caused by long time delays. As one of the best alternatives to human-level intelligence, Reinforcement Learning (RL) may offer a solution to cope with these issues. This paper proposes to integrate RL with the Model Mediated Teleoperation (MMT) concept. The teleoperator interacts with a simulated virtual environment, which provides instant feedback: whereas feedback from the real environment is delayed, feedback from the model is instantaneous, leading to high transparency. The MMT is realized in combination with an intelligent system with two layers. The first layer utilizes Dynamic Movement Primitives (DMPs), which account for certain changes in the avatar environment; the second layer addresses the problems caused by uncertainty in the model using RL methods. Augmented reality is also provided to fuse the avatar device and the virtual environment models for the teleoperator. Implemented on DLR's Exodex Adam hand-arm haptic exoskeleton, the results show that the RL methods are able to find different solutions when the object position is changed after the demonstration. The results also show that DMPs are effective at adapting to new conditions where no uncertainty is involved.
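    To make the role of the DMP layer concrete, here is a minimal one-dimensional sketch of goal adaptation under assumed gain and basis settings: a forcing term is learned from a single demonstration and the motion is replayed toward a shifted goal, as when the object position changes after the demonstration. It is an illustration only, not the DLR implementation, and the RL layer is omitted.

```python
# Minimal 1-D DMP sketch (illustration only, not the paper's implementation):
# learn a forcing term from one demonstration, then replay it toward a shifted
# goal, e.g. when the object position changes after the demonstration.
import numpy as np

class DMP1D:
    def __init__(self, n_basis=20, alpha=25.0, beta=6.25, alpha_x=3.0):
        self.a, self.b, self.ax = alpha, beta, alpha_x
        self.c = np.exp(-alpha_x * np.linspace(0, 1, n_basis))  # basis centres in phase x
        self.h = n_basis ** 1.5 / self.c                         # basis widths (heuristic)
        self.w = np.zeros(n_basis)

    def _psi(self, x):
        return np.exp(-self.h * (x - self.c) ** 2)

    def fit(self, y, dt):
        self.y0, self.g, self.T = y[0], y[-1], len(y) * dt
        yd = np.gradient(y, dt)
        ydd = np.gradient(yd, dt)
        x = np.exp(-self.ax * np.arange(len(y)) * dt / self.T)   # canonical phase
        f_target = self.T ** 2 * ydd - self.a * (self.b * (self.g - y) - self.T * yd)
        s = x * (self.g - self.y0)
        for i in range(len(self.w)):                             # locally weighted regression
            psi_i = np.exp(-self.h[i] * (x - self.c[i]) ** 2)
            self.w[i] = (s * psi_i * f_target).sum() / ((s * psi_i * s).sum() + 1e-10)
        return self

    def rollout(self, goal, dt):
        y, yd, x, out = self.y0, 0.0, 1.0, []
        for _ in range(int(self.T / dt)):
            psi = self._psi(x)
            f = (psi @ self.w) / (psi.sum() + 1e-10) * x * (goal - self.y0)
            ydd = (self.a * (self.b * (goal - y) - self.T * yd) + f) / self.T ** 2
            yd += ydd * dt
            y += yd * dt
            x += -self.ax * x / self.T * dt
            out.append(y)
        return np.array(out)

dt = 0.01
demo = np.sin(np.linspace(0.0, np.pi / 2.0, 200))  # demonstrated reach toward 1.0
dmp = DMP1D().fit(demo, dt)
adapted = dmp.rollout(goal=1.3, dt=dt)             # object moved: converge to 1.3 instead
print(round(float(adapted[-1]), 3))                # ends near the new goal
```

    With the same learned weights, rollouts toward different goals keep the demonstrated movement shape while converging to the new target, which is what lets the first layer absorb moderate changes in the avatar environment without relearning.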

    Unifying Skill-Based Programming and Programming by Demonstration through Ontologies

    Get PDF
    Smart manufacturing requires easily reconfigurable robotic systems to increase flexibility in the presence of market uncertainties by reducing set-up times for new tasks. One enabler of fast reconfigurability is intuitive robot programming methods. On the one hand, offline skill-based programming (OSP) allows new tasks to be defined by sequencing pre-defined, parameterizable building blocks termed skills in a graphical user interface. On the other hand, programming by demonstration (PbD) is a well-known technique that uses kinesthetic teaching for intuitive robot programming. This work presents an approach to automatically recognize skills from a human demonstration and parameterize them using the recorded data. The approach further unifies both programming modes, OSP and PbD, with the help of an ontological knowledge base and empowers the end user to choose the preferred mode for each phase of the task. In the experiments, we evaluate two scenarios with different sequences of programming modes selected by the user to define a task. In each scenario, skills are recognized by a data-driven classifier and automatically parameterized from the recorded data. The fully defined tasks consist of both manually added and automatically recognized skills and are executed in the context of a realistic industrial assembly environment.
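    The recognition-and-parameterization step might look schematically like the sketch below: a data-driven classifier labels a recorded demonstration segment as one of the pre-defined skills, and the matching parameterizable skill is instantiated from the recorded data. The skill names, features, classifier choice, and the Skill structure are illustrative assumptions, not the paper's ontology or interfaces.

```python
# Schematic sketch (names and features are illustrative, not the paper's ontology):
# a recorded demonstration segment is classified into a pre-defined skill type and
# the matching parameterizable skill is instantiated from the recorded data.
from dataclasses import dataclass, field
import numpy as np
from sklearn.ensemble import RandomForestClassifier

@dataclass
class Skill:                      # a parameterizable building block, as in OSP
    name: str
    parameters: dict = field(default_factory=dict)

SKILL_TYPES = ["move_to", "pick", "place"]

def segment_features(positions, gripper):
    """Simple hand-crafted features of one demonstration segment."""
    return [positions[-1][0] - positions[0][0],   # net x displacement
            positions[-1][2] - positions[0][2],   # net z displacement
            float(gripper[-1] - gripper[0])]      # gripper closed/opened

def parameterize(skill_name, positions, gripper):
    """Fill the recognized skill's parameters from the recorded data."""
    if skill_name == "move_to":
        return Skill("move_to", {"target_pose": positions[-1].tolist()})
    return Skill(skill_name, {"grasp_pose": positions[-1].tolist(),
                              "gripper_state": float(gripper[-1])})

# Train the data-driven skill classifier on labelled demonstration segments
# (random placeholder data here; in practice they come from kinesthetic teaching).
rng = np.random.default_rng(1)
X_train = rng.normal(size=(90, 3))
y_train = rng.choice(SKILL_TYPES, size=90)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Recognize and parameterize a new segment, then append it to the task sequence.
positions = rng.normal(size=(50, 3)); gripper = np.linspace(1.0, 0.0, 50)
recognized = clf.predict([segment_features(positions, gripper)])[0]
task_sequence = [parameterize(recognized, positions, gripper)]
print(task_sequence)
```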

    Learning multiple strategies to perform a valve turning with underwater currents using an I-AUV

    Full text link

    Incremental motor skill learning and generalization from human dynamic reactions based on dynamic movement primitives and fuzzy logic system

    Get PDF
    Different from previous work on learning a single skill from human demonstrations, an incremental motor skill learning, generalization and control method based on dynamic movement primitives (DMP) and the broad learning system (BLS) is proposed for extracting both ordinary skills and instant reactive skills from demonstrations; the latter are usually generated to avoid a sudden danger (e.g., touching a hot cup). The method consists of three steps. First, ordinary skills are learned from demonstrations of normal cases using DMP. Then, the incremental learning idea of BLS is combined with DMP to achieve multi-stylistic reactive skill learning, so that the forcing function of the ordinary skills is extended into multiple stylistic functions by adding enhancement terms and updating the weights of the radial basis function (RBF) kernels. Finally, electromyography (EMG) signals are collected from human muscles and processed to obtain stiffness factors. Using a fuzzy logic system (FLS), the two kinds of learned skills are integrated and generalized to new cases, so that not only the start, end and scaling factors but also the environmental conditions, robot reactive strategies and impedance control factors are generalized, leading to various reactions. To verify the effectiveness of the proposed method, an obstacle avoidance experiment is carried out in which robots approach destinations flexibly in various situations with barriers.
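    A minimal sketch of the incremental extension idea, under assumed data and kernel settings: the ordinary skill's forcing function is approximated with RBF kernels, and a reactive style is obtained by appending enhancement basis terms and re-estimating the weights. Here the update is a simple ridge refit rather than the pseudoinverse-based incremental update of BLS, and the EMG, stiffness, and fuzzy-logic components are omitted; all names are illustrative.

```python
# Minimal sketch of the incremental idea (illustrative, not the authors' code):
# an ordinary skill's forcing function is approximated with RBF kernels; a reactive
# style is then learned by appending enhancement basis terms and re-estimating the
# weights with ridge-regularized least squares, instead of relearning from scratch.
import numpy as np

def rbf_features(x, centers, width=60.0):
    return np.exp(-width * (x[:, None] - centers[None, :]) ** 2)

def ridge_fit(Phi, f_target, lam=1e-6):
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ f_target)

x = np.linspace(0.0, 1.0, 200)                      # canonical phase samples
f_ordinary = np.sin(2 * np.pi * x)                  # placeholder ordinary forcing profile
f_reactive = f_ordinary + 1.5 * np.exp(-200 * (x - 0.6) ** 2)   # sudden avoidance bump

# 1) Ordinary skill: fit RBF weights to the demonstrated forcing function.
centers = np.linspace(0.0, 1.0, 15)
Phi = rbf_features(x, centers)
w = ridge_fit(Phi, f_ordinary)

# 2) Reactive style: append enhancement kernels around the reaction and update the
#    weights of the enlarged basis, reusing the ordinary-skill structure.
extra_centers = np.linspace(0.45, 0.75, 6)
Phi_aug = np.hstack([Phi, rbf_features(x, extra_centers)])
w_aug = ridge_fit(Phi_aug, f_reactive)

print("ordinary fit error :", np.abs(Phi @ w - f_ordinary).max())
print("reactive fit error :", np.abs(Phi_aug @ w_aug - f_reactive).max())
```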

    Learning and Generalization of Complex Robot Missions Using Motion Primitives

    Get PDF
    Thesis (Ph.D.) -- Graduate School of Seoul National University, College of Engineering, Department of Aerospace Engineering, August 2020. Advisor: Kim Hyunjin. Learning from demonstrations (LfD) is a promising approach that enables robots to perform a specific movement. As robotic manipulators take over a variety of tasks, LfD algorithms are widely used and studied for specifying robot configurations for the various types of movements. This dissertation presents an approach based on parametric dynamic movement primitives (PDMP), a motion representation algorithm and one of the relevant LfD techniques. Unlike existing motion representation algorithms, this work not only represents a prescribed motion but also computes new behavior through the generalization of multiple demonstrations in the actual environment. The generalization process uses Gaussian process regression (GPR) to represent the nonlinear relationship between the PDMP parameters that determine the motion and the corresponding environmental variables. The proposed algorithm serves as a powerful optimal, real-time motion planner among existing planning algorithms when optimal demonstrations are provided as the dataset. The safety of motion is also considered; here, safety refers to keeping the system away from configurations that are unsafe. A safety criterion on the PDMP internal parameters is computed to check safety. This criterion reflects the new behavior computed through the generalization process as well as the individual motion safety of the demonstration set. Demonstrations causing unsafe movement are identified and removed, and the removed demonstrations are replaced by demonstrations proven safe under this criterion. This work also presents an extension that reduces the number of demonstrations required by the PDMP framework. The extension is effective when a single mission consists of multiple sub-tasks and would otherwise require numerous demonstrations for generalization. The whole trajectories of the provided demonstrations are segmented into multiple sub-tasks representing unit motions, and multiple PDMPs are formed independently for the correlated segments. A phase-decision process determines online which sub-task and associated PDMPs are executed, allowing multiple PDMPs to be autonomously configured within an integrated framework. GPR formulations are applied to obtain the execution time and regional goal configuration for each sub-task. Finally, the proposed approach and its extension are validated in actual experiments with mobile manipulators. The first two scenarios, on cooperative aerial transportation, demonstrate the strengths of the proposed technique in terms of quick computation, generation of efficient movement, and safety assurance. The last scenario deals with two mobile manipulators using ground vehicles and shows the effectiveness of the proposed extension in executing complex missions.
๋ณธ ๋…ผ๋ฌธ์€ LfD ๊ธฐ๋ฒ• ์ค‘ ๋ชจ์…˜ ํ”„๋ฆฌ๋จธํ‹ฐ๋ธŒ ๊ธฐ๋ฐ˜์˜ ๋™์ž‘ ์žฌ์ƒ์„ฑ ์•Œ๊ณ ๋ฆฌ์ฆ˜์ธ Parametric dynamic movement primitives(PDMP)์— ๊ธฐ์ดˆํ•œ ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์ œ์‹œํ•˜๋ฉฐ, ์ด๋ฅผ ํ†ตํ•ด ๋‹ค์–‘ํ•œ ์ž„๋ฌด๋ฅผ ์ˆ˜ํ–‰ํ•˜๋Š” ๋ชจ๋ฐ”์ผ ์กฐ์ž‘๊ธฐ์˜ ๊ถค์ ์„ ์ƒ์„ฑํ•œ๋‹ค. ๊ธฐ์กด์˜ ๋™์ž‘ ์žฌ์ƒ์„ฑ ์•Œ๊ณ ๋ฆฌ์ฆ˜๊ณผ ๋‹ฌ๋ฆฌ, ์ด ์—ฐ๊ตฌ๋Š” ์ œ๊ณต๋œ ์‹œ์—ฐ์—์„œ ํ‘œํ˜„๋œ ๋™์ž‘์„ ๋‹จ์ˆœํžˆ ์žฌ์ƒ์„ฑํ•˜๋Š” ๊ฒƒ์— ๊ทธ์น˜์ง€ ์•Š๊ณ , ์ƒˆ๋กœ์šด ํ™˜๊ฒฝ์— ๋งž๊ฒŒ ์ผ๋ฐ˜ํ™” ํ•˜๋Š” ๊ณผ์ •์„ ํฌํ•จํ•œ๋‹ค. ์ด ๋…ผ๋ฌธ์—์„œ ์ œ์‹œํ•˜๋Š” ์ผ๋ฐ˜ํ™” ๊ณผ์ •์€ PDMPs์˜ ๋‚ด๋ถ€ ํŒŒ๋ผ๋ฏธํ„ฐ ๊ฐ’์ธ ์Šคํƒ€์ผ ํŒŒ๋ผ๋ฏธํ„ฐ์™€ ํ™˜๊ฒฝ ๋ณ€์ˆ˜ ์‚ฌ์ด์˜ ๋น„์„ ํ˜• ๊ด€๊ณ„๋ฅผ ๊ฐ€์šฐ์Šค ํšŒ๊ท€ ๊ธฐ๋ฒ• (Gaussian process regression, GPR)์„ ์ด์šฉํ•˜์—ฌ ์ˆ˜์‹์ ์œผ๋กœ ํ‘œํ˜„ํ•œ๋‹ค. ์ œ์•ˆ๋œ ๊ธฐ๋ฒ•์€ ๋˜ํ•œ ์ตœ์  ์‹œ์—ฐ๋ฅผ ํ•™์Šตํ•˜๋Š” ๋ฐฉ์‹์„ ํ†ตํ•ด ๊ฐ•๋ ฅํ•œ ์ตœ์  ์‹ค์‹œ๊ฐ„ ๊ฒฝ๋กœ ๊ณ„ํš ๊ธฐ๋ฒ•์œผ๋กœ๋„ ์‘์šฉ๋  ์ˆ˜ ์žˆ๋‹ค. ๋ณธ ๋…ผ๋ฌธ์—์„œ๋Š” ๋˜ํ•œ ๋กœ๋ด‡์˜ ๊ตฌ๋™ ์•ˆ์ „์„ฑ๋„ ๊ณ ๋ คํ•œ๋‹ค. ๊ธฐ์กด ์—ฐ๊ตฌ๋“ค์—์„œ ๋‹ค๋ฃจ์–ด์ง„ ์‹œ์—ฐ ๊ด€๋ฆฌ ๊ธฐ์ˆ ์ด ๋กœ๋ด‡์˜ ๊ตฌ๋™ ํšจ์œจ์„ฑ์„ ๊ฐœ์„ ํ•˜๋Š” ๋ฐฉํ–ฅ์œผ๋กœ ์ œ์‹œ๋œ ๊ฒƒ๊ณผ ๋‹ฌ๋ฆฌ, ์ด ์—ฐ๊ตฌ๋Š” ๊ฐ•ํ•œ ๊ตฌ์†์กฐ๊ฑด์œผ๋กœ ๋กœ๋ด‡์˜ ๊ตฌ๋™ ์•ˆ์ „์„ฑ์„ ํ™•๋ณดํ•˜๋Š” ์‹œ์—ฐ ๊ด€๋ฆฌ ๊ธฐ์ˆ ์„ ํ†ตํ•ด ์•ˆ์ •์„ฑ์„ ๊ณ ๋ คํ•˜๋Š” ์ƒˆ๋กœ์šด ๋ฐฉ์‹์„ ์ œ์‹œํ•œ๋‹ค. ์ œ์•ˆ๋œ ๋ฐฉ์‹์€ ์Šคํƒ€์ผ ํŒŒ๋ผ๋ฏธํ„ฐ ๊ฐ’ ์ƒ์—์„œ ์•ˆ์ „์„ฑ ๊ธฐ์ค€์„ ๊ณ„์‚ฐํ•˜๋ฉฐ, ์ด ์•ˆ์ „ ๊ธฐ์ค€์„ ํ†ตํ•ด ์‹œ์—ฐ์„ ์ œ๊ฑฐํ•˜๋Š” ์ผ๋ จ์˜ ์ž‘์—…์„ ์ˆ˜ํ–‰ํ•œ๋‹ค. ๋˜ํ•œ, ์ œ๊ฑฐ๋œ ์‹œ์œ„๋ฅผ ์•ˆ์ „ ๊ธฐ์ค€์— ๋”ฐ๋ผ ์ž…์ฆ๋œ ์‹œ์œ„๋กœ ๋Œ€์ฒดํ•˜์—ฌ ์ผ๋ฐ˜ํ™” ์„ฑ๋Šฅ์„ ์ €ํ•˜์‹œํ‚ค์ง€ ์•Š๋„๋ก ์‹œ์œ„๋ฅผ ๊ด€๋ฆฌํ•œ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด ๋‹ค์ˆ˜์˜ ์‹œ์—ฐ ๊ฐ๊ฐ ๊ฐœ๋ณ„ ๋™์ž‘ ์•ˆ์ „์„ฑ ๋ฟ ์•„๋‹ˆ๋ผ ์˜จ๋ผ์ธ ๋™์ž‘์˜ ์•ˆ์ „์„ฑ๊นŒ์ง€ ๊ณ ๋ คํ•  ์ˆ˜ ์žˆ์œผ๋ฉฐ, ์‹ค์‹œ๊ฐ„ ๋กœ๋ด‡ ์กฐ์ž‘๊ธฐ ์šด์šฉ์‹œ ์•ˆ์ „์„ฑ์ด ํ™•๋ณด๋  ์ˆ˜ ์žˆ๋‹ค. ์ œ์•ˆ๋œ ์•ˆ์ •์„ฑ์„ ๊ณ ๋ คํ•œ ์‹œ์—ฐ ๊ด€๋ฆฌ ๊ธฐ์ˆ ์€ ๋˜ํ•œ ํ™˜๊ฒฝ์˜ ์ •์  ์„ค์ •์ด ๋ณ€๊ฒฝ๋˜์–ด ๋ชจ๋“  ์‹œ์—ฐ์„ ๊ต์ฒดํ•ด์•ผ ํ•  ์ˆ˜ ์žˆ๋Š” ์ƒํ™ฉ์—์„œ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋Š” ์‹œ์—ฐ๋“ค์„ ํŒ๋ณ„ํ•˜๊ณ , ํšจ์œจ์ ์œผ๋กœ ์žฌ์‚ฌ์šฉํ•˜๋Š” ๋ฐ ์‘์šฉํ•  ์ˆ˜ ์žˆ๋‹ค. ๋˜ํ•œ ๋ณธ ๋…ผ๋ฌธ์€ ๋ณต์žกํ•œ ์ž„๋ฌด์—์„œ ์ ์šฉ๋  ์ˆ˜ ์žˆ๋Š” PDMPs์˜ ํ™•์žฅ ๊ธฐ๋ฒ•์ธ seg-PDMPs๋ฅผ ์ œ์‹œํ•œ๋‹ค. ์ด ์ ‘๊ทผ๋ฐฉ์‹์€ ๋ณต์žกํ•œ ์ž„๋ฌด๊ฐ€ ์ผ๋ฐ˜์ ์œผ๋กœ ๋ณต์ˆ˜๊ฐœ์˜ ๊ฐ„๋‹จํ•œ ํ•˜์œ„ ์ž‘์—…์œผ๋กœ ๊ตฌ์„ฑ๋œ๋‹ค๊ณ  ๊ฐ€์ •ํ•œ๋‹ค. ๊ธฐ์กด PDMPs์™€ ๋‹ฌ๋ฆฌ seg-PDMPs๋Š” ์ „์ฒด ๊ถค์ ์„ ํ•˜์œ„ ์ž‘์—…์„ ๋‚˜ํƒ€๋‚ด๋Š” ์—ฌ๋Ÿฌ ๊ฐœ์˜ ๋‹จ์œ„ ๋™์ž‘์œผ๋กœ ๋ถ„ํ• ํ•˜๊ณ , ๊ฐ ๋‹จ์œ„๋™์ž‘์— ๋Œ€ํ•ด ์—ฌ๋Ÿฌ๊ฐœ์˜ PDMPs๋ฅผ ๊ตฌ์„ฑํ•œ๋‹ค. ๊ฐ ๋‹จ์œ„ ๋™์ž‘ ๋ณ„๋กœ ์ƒ์„ฑ๋œ PDMPs๋Š” ํ†ตํ•ฉ๋œ ํ”„๋ ˆ์ž„์›Œํฌ๋‚ด์—์„œ ๋‹จ๊ณ„ ๊ฒฐ์ • ํ”„๋กœ์„ธ์Šค๋ฅผ ํ†ตํ•ด ์ž๋™์ ์œผ๋กœ ํ˜ธ์ถœ๋œ๋‹ค. ๊ฐ ๋‹จ๊ณ„ ๋ณ„๋กœ ๋‹จ์œ„ ๋™์ž‘์„ ์ˆ˜ํ–‰ํ•˜๊ธฐ ์œ„ํ•œ ์‹œ๊ฐ„ ๋ฐ ํ•˜์œ„ ๋ชฉํ‘œ์ ์€ ๊ฐ€์šฐ์Šค ๊ณต์ • ํšŒ๊ท€(GPR)๋ฅผ ์ด์šฉํ•œ ํ™˜๊ฒฝ๋ณ€์ˆ˜์™€์˜์˜ ๊ด€๊ณ„์‹์„ ํ†ตํ•ด ์–ป๋Š”๋‹ค. ๊ฒฐ๊ณผ์ ์œผ๋กœ, ์ด ์—ฐ๊ตฌ๋Š” ์ „์ฒด์ ์œผ๋กœ ์š”๊ตฌ๋˜๋Š” ์‹œ์—ฐ์˜ ์ˆ˜๋ฅผ ํšจ๊ณผ์ ์œผ๋กœ ์ค„์ผ ๋ฟ ์•„๋‹ˆ๋ผ, ๊ฐ ๋‹จ์œ„๋™์ž‘์˜ ํ‘œํ˜„ ์„ฑ๋Šฅ์„ ๊ฐœ์„ ํ•œ๋‹ค. ์ œ์•ˆ๋œ ์•Œ๊ณ ๋ฆฌ์ฆ˜์€ ํ˜‘๋™ ๋ชจ๋ฐ”์ผ ๋กœ๋ด‡ ์กฐ์ž‘๊ธฐ ์‹คํ—˜์„ ํ†ตํ•˜์—ฌ ๊ฒ€์ฆ๋œ๋‹ค. ์„ธ ๊ฐ€์ง€์˜ ์‹œ๋‚˜๋ฆฌ์˜ค๊ฐ€ ๋ณธ ๋…ผ๋ฌธ์—์„œ ๋‹ค๋ฃจ์–ด์ง€๋ฉฐ, ํ•ญ๊ณต ์šด์†ก๊ณผ ๊ด€๋ จ๋œ ์ฒซ ๋‘ ๊ฐ€์ง€ ์‹œ๋‚˜๋ฆฌ์˜ค๋Š” PDMPs ๊ธฐ๋ฒ•์ด ๋กœ๋ด‡ ์กฐ์ž‘๊ธฐ์—์„œ ๋น ๋ฅธ ์ ์‘์„ฑ, ์ž„๋ฌด ํšจ์œจ์„ฑ๊ณผ ์•ˆ์ „์„ฑ ๋ชจ๋‘ ๋งŒ์กฑํ•˜๋Š” ๊ฒƒ์„ ์ž…์ฆํ•œ๋‹ค. 
The last scenario, an experiment with two robot manipulators on ground vehicles, verifies that the extended method, seg-PDMPs, effectively generates generalized motions for complex missions in changing environments.
Contents:
1 Introduction: 1.1 Motivations; 1.2 Literature Survey (1.2.1 Conventional Motion Planning in Mobile Manipulations, 1.2.2 Motion Representation Algorithms, 1.2.3 Safety-guaranteed Motion Representation Algorithms); 1.3 Research Objectives and Contributions (1.3.1 Motion Generalization in Motion Representation Algorithm, 1.3.2 Motion Generalization with Safety Guarantee, 1.3.3 Motion Generalization for Complex Missions); 1.4 Thesis Organization
2 Background: 2.1 DMPs; 2.2 Mobile Manipulation Systems (2.2.1 Single Mobile Manipulation, 2.2.2 Cooperative Mobile Manipulations); 2.3 Experimental Setup (2.3.1 Test-beds for Aerial Manipulators, 2.3.2 Test-beds for Robot Manipulators with Ground Vehicles)
3 Motion Generalization in Motion Representation Algorithm: 3.1 Parametric Dynamic Movement Primitives; 3.2 Generalization Process in PDMPs (3.2.1 Environmental Parameters, 3.2.2 Mapping Function); 3.3 Simulation Results (3.3.1 Two-dimensional Hurdling Motion, 3.3.2 Cooperative Aerial Transportation)
4 Motion Generalization with Safety Guarantee: 4.1 Safety Criterion in Style Parameter; 4.2 Demonstration Management; 4.3 Simulation Validation (4.3.1 Two-dimensional Hurdling Motion, 4.3.2 Cooperative Aerial Transportation)
5 Motion Generalization for Complex Missions: 5.1 Overall Structure of Seg-PDMPs; 5.2 Motion Segments; 5.3 Phase-decision Process; 5.4 Seg-PDMPs for Single Phase; 5.5 Simulation Results (5.5.1 Initial/terminal Offsets, 5.5.2 Style Generalization, 5.5.3 Recombination)
6 Experimental Validation and Results: 6.1 Cooperative Aerial Transportation; 6.2 Cooperative Mobile Hang-dry Mission (6.2.1 Demonstrations, 6.2.2 Simulation Validation, 6.2.3 Experimental Results)
7 Conclusions
Abstract (in Korean)
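    The GPR-based generalization step described in the abstract above can be sketched as follows, assuming the style parameters have already been extracted from each demonstration: a Gaussian process maps environmental variables to PDMP style parameters, and a new environment receives an interpolated style vector, with an uncertainty that a safety criterion could exploit. The numbers and variable names are placeholders, not the dissertation's data.

```python
# Minimal sketch of the generalization step (illustrative, not the dissertation's code):
# Gaussian process regression maps environmental variables (e.g. an obstacle height)
# to PDMP style parameters extracted from the demonstrations; new environments get
# interpolated style parameters, from which a PDMP would then generate the motion.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Placeholder training data: one row per demonstration.
env_train = np.array([[0.2], [0.4], [0.6], [0.8]])   # environmental variable(s)
style_train = np.array([[0.1, 1.0],                   # style parameters per demo
                        [0.3, 1.4],                   # (e.g. a 2-D style vector)
                        [0.7, 1.9],
                        [1.2, 2.6]])

gpr = GaussianProcessRegressor(kernel=RBF(length_scale=0.3) + WhiteKernel(1e-4),
                               normalize_y=True).fit(env_train, style_train)

# A new environment (e.g. a previously unseen obstacle height) is mapped to a
# style-parameter estimate, with an uncertainty that could feed a safety check.
env_new = np.array([[0.5]])
style_mean, style_std = gpr.predict(env_new, return_std=True)
print("style parameters:", style_mean.ravel(), "+/-", style_std.ravel())
```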

    Bootstrapping of parameterized skills through hybrid optimization in task and policy spaces

    Get PDF
    Queißer J, Steil JJ. Bootstrapping of parameterized skills through hybrid optimization in task and policy spaces. Frontiers in Robotics and AI. 2018;5:49. Modern robotic applications place high demands on the adaptation of actions with respect to variance in a given task. Reinforcement learning is able to optimize for these changing conditions, but relearning from scratch is hardly feasible due to the high number of required rollouts. We propose a parameterized skill that generalizes to new actions for changing task parameters and is encoded as a meta-learner providing parameters for task-specific dynamic motion primitives. Our work shows that utilizing parameterized skills to initialize the optimization process leads to more effective incremental task learning. In addition, we introduce a hybrid optimization method that combines a fast coarse optimization on a manifold of policy parameters with a fine-grained parameter search in the unrestricted space of actions. The proposed algorithm reduces the number of rollouts required for adaptation to new task conditions. Applications in illustrative toy scenarios, to a 10-DOF planar arm, and to a humanoid-robot point-reaching task validate the approach.
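    The bootstrapping idea can be sketched as follows, with illustrative names and a toy cost function in place of real rollouts: a regressor (the parameterized skill) maps task parameters to previously optimized policy parameters, and its prediction initializes a local optimizer for a new task. The paper's hybrid coarse search on a policy manifold plus fine-grained search is reduced here to a single Nelder-Mead refinement.

```python
# Minimal sketch (illustrative, not the authors' implementation): a parameterized
# skill is a regressor from task parameters to policy parameters; its prediction
# initializes a local optimizer for a new task, so far fewer rollouts are needed
# than when optimizing from scratch.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from scipy.optimize import minimize

def rollout_cost(policy_params, task_param):
    """Stand-in for executing the policy on the robot and scoring the outcome."""
    optimum = np.array([np.sin(task_param), np.cos(task_param), task_param])
    return float(np.sum((policy_params - optimum) ** 2))

# Memory of previously optimized tasks: task parameter -> optimized policy parameters.
tasks = np.linspace(0.0, 2.0, 8)[:, None]
policies = np.stack([minimize(rollout_cost, np.zeros(3), args=(t,),
                              method="Nelder-Mead").x for t in tasks.ravel()])

# The parameterized skill generalizes the memory to unseen task parameters.
skill = KernelRidge(kernel="rbf", gamma=2.0, alpha=1e-3).fit(tasks, policies)

new_task = 1.37
init = skill.predict([[new_task]])[0]                 # bootstrap from the skill
refined = minimize(rollout_cost, init, args=(new_task,), method="Nelder-Mead",
                   options={"maxfev": 50})            # few rollouts suffice near the optimum
print("initial cost:", rollout_cost(init, new_task), "refined cost:", refined.fun)
```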