From virtual demonstration to real-world manipulation using LSTM and MDN
Robots assisting the disabled or elderly must perform complex manipulation
tasks and must adapt to the home environment and preferences of their user.
Learning from demonstration is a promising approach that would allow a
non-technical user to teach the robot different tasks. However, collecting
demonstrations in the home environment of a disabled user is time consuming,
disruptive to the comfort of the user, and presents safety challenges. It would
be desirable to perform the demonstrations in a virtual environment. In this
paper we describe a solution to the challenging problem of behavior transfer
from virtual demonstration to a physical robot. The virtual demonstrations are
used to train a deep neural network based controller, which uses a Long
Short-Term Memory (LSTM) recurrent neural network to generate trajectories. The
training process uses a Mixture Density Network (MDN) to calculate an error
signal suitable for the multimodal nature of demonstrations. The controller
learned in the virtual environment is transferred to a physical robot (a
Rethink Robotics Baxter). An off-the-shelf vision component is used to
substitute for geometric knowledge available in the simulation and an inverse
kinematics module is used to allow the Baxter to enact the trajectory. Our
experimental studies validate the three contributions of the paper: (1) the
controller learned from virtual demonstrations can be used to successfully
perform the manipulation tasks on a physical robot, (2) the LSTM+MDN
architecture outperforms alternatives such as feedforward networks and
mean-squared-error-based training signals, and (3) including imperfect
demonstrations in the training set allows the controller to learn how to
correct its manipulation mistakes.
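The role of the MDN in the abstract above is to score trajectories under a multimodal distribution rather than a single mean. A minimal sketch of that idea, in plain Python with illustrative names (this is not the paper's implementation, and the 1-D Gaussian mixture is an assumption for clarity):

```python
import math

def mdn_nll(pi, mu, sigma, y):
    """Negative log-likelihood of target y under a 1-D Gaussian mixture.

    pi:    mixing coefficients (must sum to 1)
    mu:    component means
    sigma: component standard deviations
    """
    likelihood = 0.0
    for p, m, s in zip(pi, mu, sigma):
        likelihood += p * math.exp(-0.5 * ((y - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
    return -math.log(likelihood)

# Two demonstration modes at -1 and +1: a target at either mode scores far
# better than the midpoint 0, which is where a mean-squared-error fit would
# place its single prediction.
print(mdn_nll([0.5, 0.5], [-1.0, 1.0], [0.1, 0.1], 1.0))  # low NLL, near a mode
print(mdn_nll([0.5, 0.5], [-1.0, 1.0], [0.1, 0.1], 0.0))  # high NLL, between modes
```

This contrast is why an MDN-based loss suits multimodal demonstrations better than a mean-squared-error signal, which averages the modes.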
Data-Driven Grasp Synthesis - A Survey
We review the work on data-driven grasp synthesis and the methodologies for
sampling and ranking candidate grasps. We divide the approaches into three
groups based on whether they synthesize grasps for known, familiar or unknown
objects. This structure allows us to identify common object representations and
perceptual processes that facilitate the employed data-driven grasp synthesis
technique. In the case of known objects, we concentrate on the approaches that
are based on object recognition and pose estimation. In the case of familiar
objects, the techniques use some form of a similarity matching to a set of
previously encountered objects. Finally for the approaches dealing with unknown
objects, the core part is the extraction of specific features that are
indicative of good grasps. Our survey provides an overview of the different
methodologies and discusses open problems in the area of robot grasping. We
also draw a parallel to the classical approaches that rely on analytic
formulations.

Comment: 20 pages, 30 figures, submitted to IEEE Transactions on Robotics
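The sampling-and-ranking pipeline the survey covers can be caricatured in a few lines. The scoring rule below is a hypothetical stand-in for a learned grasp-quality model; the candidate representation is likewise an illustrative assumption:

```python
def rank_grasps(candidates, score):
    """Return candidate grasps sorted best-first by a quality score."""
    return sorted(candidates, key=score, reverse=True)

# Toy candidates as (approach_angle_deg, finger_gap_m) pairs; in a real
# data-driven system the score would come from a model trained on grasp data.
candidates = [(90, 0.08), (45, 0.05), (10, 0.02)]
quality = lambda g: -abs(g[0] - 45) - 100 * abs(g[1] - 0.05)
print(rank_grasps(candidates, quality)[0])  # the top-ranked candidate
```

The split the survey draws between known, familiar, and unknown objects amounts to where `quality` gets its knowledge: recalled object models, similarity to past objects, or features computed from the scene alone.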
Exposing Piaget's scheme: Empirical evidence for the ontogenesis of coordination in learning a mathematical concept
The combination of two methodological resources, natural-user interfaces (NUI) and multimodal learning analytics (MMLA), is creating opportunities for educational researchers to empirically evaluate seminal models for the hypothetical emergence of concepts from situated sensorimotor activity. 76 participants (9-14 years old) solved tablet-based non-symbolic manipulation tasks designed to foster grounded meanings for the mathematical concept of proportional equivalence. Data gathered in task-based semi-structured clinical interviews included action logging, eye-gaze tracking, and videography. Successful task performance coincided with the spontaneous appearance of stable dynamical gaze-path patterns, soon followed by multimodal articulation of strategy. Significantly, gaze patterns included uncued non-salient screen locations. We present cumulative results to argue that these 'attentional anchors' mediated participants' problem solving. We interpret the findings as enabling us to revisit, support, refine, and elaborate on central claims of Piaget's theory of genetic epistemology, in particular his insistence on the role of situated motor-action coordination in the process of reflective abstraction.
Interactively Picking Real-World Objects with Unconstrained Spoken Language Instructions
Comprehension of spoken natural language is an essential component for robots
to communicate with humans effectively. However, handling unconstrained spoken
instructions is challenging due to (1) complex structures including a wide
variety of expressions used in spoken language and (2) inherent ambiguity in
interpretation of human instructions. In this paper, we propose the first
comprehensive system that can handle unconstrained spoken language and is able
to effectively resolve ambiguity in spoken instructions. Specifically, we
integrate deep-learning-based object detection together with natural language
processing technologies to handle unconstrained spoken instructions, and
propose a method for robots to resolve instruction ambiguity through dialogue.
Through our experiments on both a simulated environment as well as a physical
industrial robot arm, we demonstrate the ability of our system to understand
natural instructions from human operators effectively, and how higher success
rates of the object picking task can be achieved through an interactive
clarification process.

Comment: 9 pages. International Conference on Robotics and Automation (ICRA)
2018. Accompanying videos are available at the following links:
https://youtu.be/_Uyv1XIUqhk (the system submitted to ICRA-2018) and
http://youtu.be/DGJazkyw0Ws (with improvements after ICRA-2018 submission).
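The interactive clarification loop this abstract describes can be sketched as follows. The detection format, the attribute used to disambiguate, and the question template are illustrative assumptions, not the authors' system:

```python
def pick_target(instruction, detections, ask):
    """Resolve a spoken instruction against object-detector output.

    detections: list of dicts from an (assumed) object detector
    ask:        callback that poses one clarifying question to the user
                and returns their answer as a string
    """
    matches = [d for d in detections if d["label"] in instruction]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        return None
    # Ambiguity: ask one clarifying question over a distinguishing attribute.
    reply = ask("Which one: " + " or ".join(d["color"] for d in matches) + "?")
    for d in matches:
        if d["color"] in reply:
            return d
    return None

dets = [{"label": "cup", "color": "red"}, {"label": "cup", "color": "blue"}]
print(pick_target("pick up the cup", dets, ask=lambda q: "the blue one"))
# resolves the ambiguity to the blue cup
```

A real system would add speech recognition, open-vocabulary parsing, and repeated dialogue turns, but the control flow is the same: detect, match, and ask only when the match is ambiguous.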
Effective Natural Language Interfaces for Data Visualization Tools
How many Covid cases and deaths are there in my hometown? How much money was invested into renewable energy projects across states in the last 5 years? How large was the biggest investment in solar energy projects in the previous year? These and similar questions are of interest to users and can often be answered by data visualization tools (e.g., COVID-19 dashboards) provided by governmental organizations or other institutions. However, users in organizations or in private life who have limited expertise with data visualization tools (hereafter referred to as end users) are also interested in these topics but do not necessarily know how to use such tools effectively to answer these questions. Previous research highlights this challenge, providing evidence that while business analysts and other experts can use data visualization tools effectively, end users with limited expertise are still impeded in their interactions.
One approach to tackle this problem is natural language interfaces (NLIs), which give end users a more intuitive way of interacting with data visualization tools. End users could then interact with a tool both through its graphical user interface (GUI) elements and by simply typing or speaking a natural language (NL) input. While NLIs for data visualization tools are regarded as a promising approach to improving this interaction, two design challenges remain. First, existing NLIs still target users who are familiar with the technology, such as business analysts; they lack a design that addresses the specific characteristics of end users and would enable them to use data visualization tools effectively. Second, developers of these NLIs cannot foresee all NL inputs and tasks that end users will want to perform, so errors still occur. End users therefore need to be enabled to continuously improve and personalize the NLI themselves by addressing these errors. However, only limited work focuses on enabling end users to teach NLIs for data visualization tools how to correctly respond to new NL inputs.
This thesis addresses these design challenges and provides insights into the related research questions. Furthermore, this thesis contributes prescriptive knowledge on how to design effective NLIs for data visualization tools. Specifically, this thesis provides insights into how data visualization tools can be extended through NLIs to improve their effective use by end users and how to enable end users to effectively teach NLIs how to respond to new NL inputs.
Furthermore, this thesis provides high-level guidance that developers and providers of data visualization tools can utilize as a blueprint for developing data visualization tools with NLIs for end users, and it outlines future research opportunities for supporting end users in effectively using data visualization tools.
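The style of interaction the thesis targets can be caricatured with a keyword-to-aggregate mapping over tabular data. This toy sketch only illustrates the NL-input idea; the function, keywords, and fallback message are invented for the example and do not come from the thesis:

```python
def answer_query(question, table):
    """Tiny NL front end: map question keywords to an aggregate over a column.

    table: dict mapping column names to lists of numbers
    """
    q = question.lower()
    col = next((c for c in table if c in q), None)
    if col is None:
        # The kind of error end users would need to teach the NLI to handle.
        return "Sorry, I don't know that field."
    if "largest" in q or "biggest" in q:
        return max(table[col])
    if "total" in q or "how much" in q or "how many" in q:
        return sum(table[col])
    return table[col]

data = {"cases": [120, 45, 300], "investment": [1.5, 2.0, 0.7]}
print(answer_query("How many cases are there?", data))            # 465
print(answer_query("How large was the biggest investment?", data))  # 2.0
```

The fallback branch is where the thesis's second challenge bites: no fixed keyword list anticipates every end-user phrasing, which motivates letting end users teach the NLI new mappings themselves.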