From virtual demonstration to real-world manipulation using LSTM and MDN
Robots assisting the disabled or elderly must perform complex manipulation
tasks and must adapt to the home environment and preferences of their user.
Learning from demonstration is a promising approach that would allow a
non-technical user to teach the robot different tasks. However, collecting
demonstrations in the home environment of a disabled user is time consuming,
disruptive to the comfort of the user, and presents safety challenges. It would
be desirable to perform the demonstrations in a virtual environment. In this
paper we describe a solution to the challenging problem of behavior transfer
from virtual demonstration to a physical robot. The virtual demonstrations are
used to train a deep neural network based controller, which uses a Long
Short-Term Memory (LSTM) recurrent neural network to generate trajectories. The
training process uses a Mixture Density Network (MDN) to calculate an error
signal suitable for the multimodal nature of demonstrations. The controller
learned in the virtual environment is transferred to a physical robot (a
Rethink Robotics Baxter). An off-the-shelf vision component is used to
substitute for geometric knowledge available in the simulation and an inverse
kinematics module is used to allow the Baxter to enact the trajectory. Our
experimental studies validate the three contributions of the paper: (1) the
controller learned from virtual demonstrations can be used to successfully
perform the manipulation tasks on a physical robot, (2) the LSTM+MDN
architecture outperforms alternatives such as feedforward networks and
mean-squared-error-based training signals, and (3) including imperfect
demonstrations in the training set allows the controller to learn how to
correct its manipulation mistakes.
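The role of the MDN in the abstract above is to score trajectories under a multimodal distribution rather than a single mean. A minimal sketch of that idea, in plain Python with illustrative names (this is not the paper's implementation, and the 1-D Gaussian mixture is an assumption for clarity):

```python
import math

def mdn_nll(pi, mu, sigma, y):
    """Negative log-likelihood of target y under a 1-D Gaussian mixture.

    pi:    mixing coefficients (must sum to 1)
    mu:    component means
    sigma: component standard deviations
    """
    likelihood = 0.0
    for p, m, s in zip(pi, mu, sigma):
        likelihood += p * math.exp(-0.5 * ((y - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
    return -math.log(likelihood)

# Two demonstration modes at -1 and +1: a target at either mode scores far
# better than the midpoint 0, which is where a mean-squared-error fit would
# place its single prediction.
print(mdn_nll([0.5, 0.5], [-1.0, 1.0], [0.1, 0.1], 1.0))  # low NLL, near a mode
print(mdn_nll([0.5, 0.5], [-1.0, 1.0], [0.1, 0.1], 0.0))  # high NLL, between modes
```

This contrast is why an MDN-based loss suits multimodal demonstrations better than a mean-squared-error signal, which averages the modes.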
Data-Driven Grasp Synthesis - A Survey
We review the work on data-driven grasp synthesis and the methodologies for
sampling and ranking candidate grasps. We divide the approaches into three
groups based on whether they synthesize grasps for known, familiar or unknown
objects. This structure allows us to identify common object representations and
perceptual processes that facilitate the employed data-driven grasp synthesis
technique. In the case of known objects, we concentrate on the approaches that
are based on object recognition and pose estimation. In the case of familiar
objects, the techniques use some form of a similarity matching to a set of
previously encountered objects. Finally for the approaches dealing with unknown
objects, the core part is the extraction of specific features that are
indicative of good grasps. Our survey provides an overview of the different
methodologies and discusses open problems in the area of robot grasping. We
also draw a parallel to the classical approaches that rely on analytic
formulations.

Comment: 20 pages, 30 figures, submitted to IEEE Transactions on Robotics
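The sampling-and-ranking pipeline the survey covers can be caricatured in a few lines. The scoring rule below is a hypothetical stand-in for a learned grasp-quality model; the candidate representation is likewise an illustrative assumption:

```python
def rank_grasps(candidates, score):
    """Return candidate grasps sorted best-first by a quality score."""
    return sorted(candidates, key=score, reverse=True)

# Toy candidates as (approach_angle_deg, finger_gap_m) pairs; in a real
# data-driven system the score would come from a model trained on grasp data.
candidates = [(90, 0.08), (45, 0.05), (10, 0.02)]
quality = lambda g: -abs(g[0] - 45) - 100 * abs(g[1] - 0.05)
print(rank_grasps(candidates, quality)[0])  # the top-ranked candidate
```

The split the survey draws between known, familiar, and unknown objects amounts to where `quality` gets its knowledge: recalled object models, similarity to past objects, or features computed from the scene alone.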
Exposing Piaget's scheme: Empirical evidence for the ontogenesis of coordination in learning a mathematical concept
The combination of two methodological resources, natural-user interfaces (NUI) and multimodal learning analytics (MMLA), is creating opportunities for educational researchers to empirically evaluate seminal models for the hypothetical emergence of concepts from situated sensorimotor activity. 76 participants (9-14 years old) solved tablet-based non-symbolic manipulation tasks designed to foster grounded meanings for the mathematical concept of proportional equivalence. Data gathered in task-based semi-structured clinical interviews included action logging, eye-gaze tracking, and videography. Successful task performance coincided with the spontaneous appearance of stable dynamical gaze-path patterns, soon followed by multimodal articulation of strategy. Significantly, gaze patterns included uncued non-salient screen locations. We present cumulative results to argue that these 'attentional anchors' mediated participants' problem solving. We interpret the findings as enabling us to revisit, support, refine, and elaborate on central claims of Piaget's theory of genetic epistemology, in particular his insistence on the role of situated motor-action coordination in the process of reflective abstraction.
Interactively Picking Real-World Objects with Unconstrained Spoken Language Instructions
Comprehension of spoken natural language is an essential component for robots
to communicate with humans effectively. However, handling unconstrained spoken
instructions is challenging due to (1) complex structures including a wide
variety of expressions used in spoken language and (2) inherent ambiguity in
interpretation of human instructions. In this paper, we propose the first
comprehensive system that can handle unconstrained spoken language and is able
to effectively resolve ambiguity in spoken instructions. Specifically, we
integrate deep-learning-based object detection together with natural language
processing technologies to handle unconstrained spoken instructions, and
propose a method for robots to resolve instruction ambiguity through dialogue.
Through our experiments on both a simulated environment as well as a physical
industrial robot arm, we demonstrate the ability of our system to understand
natural instructions from human operators effectively, and how higher success
rates of the object picking task can be achieved through an interactive
clarification process.

Comment: 9 pages. International Conference on Robotics and Automation (ICRA)
2018. Accompanying videos are available at the following links:
https://youtu.be/_Uyv1XIUqhk (the system submitted to ICRA-2018) and
http://youtu.be/DGJazkyw0Ws (with improvements after ICRA-2018 submission).
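The interactive clarification loop this abstract describes can be sketched as follows. The detection format, the attribute used to disambiguate, and the question template are illustrative assumptions, not the authors' system:

```python
def pick_target(instruction, detections, ask):
    """Resolve a spoken instruction against object-detector output.

    detections: list of dicts from an (assumed) object detector
    ask:        callback that poses one clarifying question to the user
                and returns their answer as a string
    """
    matches = [d for d in detections if d["label"] in instruction]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        return None
    # Ambiguity: ask one clarifying question over a distinguishing attribute.
    reply = ask("Which one: " + " or ".join(d["color"] for d in matches) + "?")
    for d in matches:
        if d["color"] in reply:
            return d
    return None

dets = [{"label": "cup", "color": "red"}, {"label": "cup", "color": "blue"}]
print(pick_target("pick up the cup", dets, ask=lambda q: "the blue one"))
# resolves the ambiguity to the blue cup
```

A real system would add speech recognition, open-vocabulary parsing, and repeated dialogue turns, but the control flow is the same: detect, match, and ask only when the match is ambiguous.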
Effective Natural Language Interfaces for Data Visualization Tools
How many Covid cases and deaths are there in my hometown? How much money was invested into renewable energy projects across states in the last 5 years? How large was the biggest investment in solar energy projects in the previous year? These and similar questions are of interest to users and can often be answered by data visualization tools (e.g., COVID-19 dashboards) provided by governmental organizations or other institutions. However, users in organizations or in private life who have limited expertise with data visualization tools (hereafter referred to as end users) are also interested in these topics but do not necessarily know how to use such tools effectively to answer these questions. Previous research highlights this challenge, providing evidence that while business analysts and other experts can use data visualization tools effectively, end users with limited expertise are still impeded in their interactions.
One approach to tackle this problem is natural language interfaces (NLIs), which give end users a more intuitive way of interacting with data visualization tools. End users could then interact with a tool both through its graphical user interface (GUI) elements and by simply typing or speaking a natural language (NL) input. While NLIs for data visualization tools are regarded as a promising approach to improving this interaction, two design challenges remain. First, existing NLIs still target users who are familiar with the technology, such as business analysts; they lack a design that addresses the specific characteristics of end users and would enable them to use data visualization tools effectively. Second, developers of these NLIs cannot foresee all NL inputs and tasks that end users will want to perform, so errors still occur. End users therefore need to be enabled to continuously improve and personalize the NLI themselves by addressing these errors. However, only limited work focuses on enabling end users to teach NLIs for data visualization tools how to correctly respond to new NL inputs.
This thesis addresses these design challenges and provides insights into the related research questions. Furthermore, this thesis contributes prescriptive knowledge on how to design effective NLIs for data visualization tools. Specifically, this thesis provides insights into how data visualization tools can be extended through NLIs to improve their effective use by end users and how to enable end users to effectively teach NLIs how to respond to new NL inputs.
Furthermore, this thesis provides high-level guidance that developers and providers of data visualization tools can utilize as a blueprint for developing data visualization tools with NLIs for end users, and it outlines future research opportunities for supporting end users in effectively using data visualization tools.
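The style of interaction the thesis targets can be caricatured with a keyword-to-aggregate mapping over tabular data. This toy sketch only illustrates the NL-input idea; the function, keywords, and fallback message are invented for the example and do not come from the thesis:

```python
def answer_query(question, table):
    """Tiny NL front end: map question keywords to an aggregate over a column.

    table: dict mapping column names to lists of numbers
    """
    q = question.lower()
    col = next((c for c in table if c in q), None)
    if col is None:
        # The kind of error end users would need to teach the NLI to handle.
        return "Sorry, I don't know that field."
    if "largest" in q or "biggest" in q:
        return max(table[col])
    if "total" in q or "how much" in q or "how many" in q:
        return sum(table[col])
    return table[col]

data = {"cases": [120, 45, 300], "investment": [1.5, 2.0, 0.7]}
print(answer_query("How many cases are there?", data))            # 465
print(answer_query("How large was the biggest investment?", data))  # 2.0
```

The fallback branch is where the thesis's second challenge bites: no fixed keyword list anticipates every end-user phrasing, which motivates letting end users teach the NLI new mappings themselves.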