
    A Comparison of Visualisation Methods for Disambiguating Verbal Requests in Human-Robot Interaction

    Picking up objects requested by a human user is a common task in human-robot interaction. When multiple objects match the user's verbal description, the robot needs to clarify which object the user is referring to before executing the action. Previous research has focused on perceiving the user's multimodal behaviour to complement verbal commands, or on minimising the number of follow-up questions to reduce task time. In this paper, we propose a system for reference disambiguation based on visualisation and compare three methods to disambiguate natural language instructions. In a controlled experiment with a YuMi robot, we investigated real-time augmentations of the workspace in three conditions -- mixed reality, augmented reality, and a monitor as the baseline -- using objective measures such as time and accuracy, and subjective measures like engagement, immersion, and display interference. Significant differences were found in accuracy and engagement between the conditions, but no differences were found in task time. Despite the higher error rates in the mixed reality condition, participants found that modality more engaging than the other two, but overall preferred the augmented reality condition over the monitor and mixed reality conditions.
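
    A minimal sketch of the candidate-filtering step such a system needs, under assumed names (WorkspaceObject and disambiguate are illustrative, not the paper's code): objects detected in the workspace are filtered against the parsed verbal description, and if more than one candidate survives, the candidates are highlighted visually instead of triggering a follow-up question.

    ```python
    # Hypothetical sketch of reference disambiguation by visualisation.
    # All names are illustrative; the paper's actual system is not public.
    from dataclasses import dataclass

    @dataclass
    class WorkspaceObject:
        name: str        # e.g. "cup"
        colour: str      # e.g. "red"
        position: tuple  # (x, y, z) in metres

    def disambiguate(objects, noun, colour=None):
        """Return all detected objects matching the verbal description."""
        return [o for o in objects if o.name == noun
                and (colour is None or o.colour == colour)]

    scene = [WorkspaceObject("cup", "red", (0.2, 0.1, 0.0)),
             WorkspaceObject("cup", "blue", (0.4, 0.1, 0.0)),
             WorkspaceObject("box", "red", (0.6, 0.3, 0.0))]

    matches = disambiguate(scene, "cup")
    if len(matches) == 1:
        print("pick up", matches[0])
    else:
        # Ambiguous: highlight the candidates in AR/MR or on the monitor
        # rather than asking a follow-up question.
        print("highlight", [(m.colour, m.position) for m in matches])
    ```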

    Multimodal One-Shot Learning of Speech and Images

    Imagine a robot is shown new concepts visually together with spoken tags, e.g. "milk", "eggs", "butter". After seeing one paired audio-visual example per class, it is shown a new set of unseen instances of these objects and asked to pick the "milk". Without receiving any hard labels, could it learn to match the new continuous speech input to the correct visual instance? Although unimodal one-shot learning has been studied, where one labelled example in a single modality is given per class, this example motivates multimodal one-shot learning. Our main contribution is to formally define this task and to propose several baseline and advanced models. We use a dataset of paired spoken and visual digits to specifically investigate recent advances in Siamese convolutional neural networks. Our best Siamese model achieves twice the accuracy of a nearest neighbour model using pixel-distance over images and dynamic time warping over speech in 11-way cross-modal matching. (Comment: 5 pages, 1 figure, 3 tables; accepted to ICASSP 2019.)
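
    As a hedged illustration of the nearest-neighbour baseline mentioned above (the way the two distances are chained is our reading, and dtw, match, and the random stand-in data are not the authors' code): a spoken query is matched to the support set by dynamic time warping over acoustic feature frames, and the paired support image then selects the closest test image by pixel distance.

    ```python
    # Illustrative cross-modal nearest-neighbour matching: DTW over speech
    # features, pixel distance over images. Names and shapes are assumptions.
    import numpy as np

    def dtw(a, b):
        """DTW alignment cost between two sequences of feature frames."""
        n, m = len(a), len(b)
        D = np.full((n + 1, m + 1), np.inf)
        D[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                cost = np.linalg.norm(a[i - 1] - b[j - 1])
                D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
        return D[n, m]

    def match(query_speech, support_speech, support_images, test_images):
        """Return the index of the test image matching the spoken query."""
        # 1. Nearest support speech example under DTW.
        k = int(np.argmin([dtw(query_speech, s) for s in support_speech]))
        # 2. Nearest test image to that class's support image (pixel distance).
        ref = support_images[k].ravel()
        return int(np.argmin([np.linalg.norm(t.ravel() - ref)
                              for t in test_images]))

    # Random stand-ins for MFCC sequences and images, just to run the code.
    rng = np.random.default_rng(0)
    speech = [rng.normal(size=(20, 13)) for _ in range(3)]  # 3 support classes
    images = [rng.normal(size=(28, 28)) for _ in range(3)]
    tests = [rng.normal(size=(28, 28)) for _ in range(5)]
    print(match(speech[1], speech, images, tests))
    ```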

    Composable Deep Reinforcement Learning for Robotic Manipulation

    Model-free deep reinforcement learning has been shown to exhibit good performance in domains ranging from video games to simulated robotic manipulation and locomotion. However, model-free methods are known to perform poorly when the interaction time with the environment is limited, as is the case for most real-world robotic tasks. In this paper, we study how maximum entropy policies trained using soft Q-learning can be applied to real-world robotic manipulation. The application of this method to real-world manipulation is facilitated by two important features of soft Q-learning. First, soft Q-learning can learn multimodal exploration strategies by learning policies represented by expressive energy-based models. Second, we show that policies learned with soft Q-learning can be composed to create new policies, and that the optimality of the resulting policy can be bounded in terms of the divergence between the composed policies. This compositionality provides an especially valuable tool for real-world manipulation, where constructing new policies by composing existing skills can provide a large gain in efficiency over training from scratch. Our experimental evaluation demonstrates that soft Q-learning is substantially more sample efficient than prior model-free deep reinforcement learning methods, and that composition is effective for both simulated and real-world tasks. (Comment: videos at https://sites.google.com/view/composing-real-world-policies)
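
    A minimal sketch of the composition idea, assuming a discrete action set and known Q-values (the paper works with continuous actions and energy-based samplers, so this only illustrates the principle): a maximum entropy policy is a Boltzmann distribution over soft Q-values, and two skills are composed by averaging their Q-functions, with the resulting policy near-optimal for the averaged reward up to an error that grows with the divergence between the composed policies.

    ```python
    # Toy illustration of maximum-entropy policy composition with discrete
    # actions. Q-values are made up; this is not the authors' implementation.
    import numpy as np

    def soft_policy(q_values, alpha=1.0):
        """Boltzmann policy pi(a) proportional to exp(Q(a)/alpha)."""
        z = q_values / alpha
        z -= z.max()              # numerical stability
        p = np.exp(z)
        return p / p.sum()

    q_reach = np.array([1.0, 0.2, -0.5])  # skill 1: reach the target
    q_avoid = np.array([-1.0, 0.8, 0.6])  # skill 2: avoid the obstacle

    # Composed skill: average the soft Q-functions of the two skills.
    q_composed = 0.5 * (q_reach + q_avoid)
    print(soft_policy(q_composed))
    ```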

    Collaborating with a Mobile Robot: An Augmented Reality Multimodal Interface

    Invited paper. We have created an infrastructure that allows a human to collaborate in a natural manner with a robotic system. In this paper we describe our system and its implementation with a mobile robot. In our prototype the human communicates with the mobile robot using natural speech and gestures, for example, by selecting a point in 3D space and saying “go here” or “go behind that”. The robot responds using speech so that the human is able to understand its intentions and beliefs. Augmented Reality (AR) technology is used to facilitate natural use of gestures and to provide a common 3D spatial reference for both the robot and the human, thus providing a means for grounding communication and maintaining spatial awareness. This paper first discusses related work, then gives a brief overview of AR and its capabilities, outlines the architectural design we have developed, and concludes with a case study.
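
    A speculative sketch of the multimodal fusion step such an interface performs (all names, offsets, and frames here are hypothetical, not the described system): the deictic gesture supplies a 3D point in the shared AR reference frame, and the utterance supplies the action to perform at that point.

    ```python
    # Hypothetical fusion of a speech command with a gestured 3D point.
    from dataclasses import dataclass

    @dataclass
    class Goal:
        action: str
        point: tuple  # (x, y, z) in the shared AR reference frame

    def fuse(utterance, pointed_at):
        """Resolve 'go here' / 'go behind that' against the gestured point."""
        if "go here" in utterance:
            return Goal("goto", pointed_at)
        if "go behind" in utterance:
            x, y, z = pointed_at
            # Assumed: 0.5 m behind the object along an assumed axis.
            return Goal("goto", (x, y + 0.5, z))
        raise ValueError("unrecognised command: " + utterance)

    print(fuse("go here", (1.0, 2.0, 0.0)))
    ```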

    Speech & Multimodal Resources: the Herme Database of Spontaneous Multimodal Human-Robot Dialogues

    This paper presents methodologies and tools for language resource (LR) construction. It describes a database of interactive speech collected over a three-month period at the Science Gallery in Dublin, where visitors could take part in a conversation with a robot. The system collected samples of informal, chatty dialogue – normally difficult to capture under laboratory conditions for human-human dialogue, and particularly so for human-machine interaction. The conversations were based on a script followed by the robot, consisting largely of social chat with some task-based elements. The interactions were audio-visually recorded using several cameras together with microphones. As part of the conversation, participants were asked to sign a consent form giving permission to use their data for human-machine interaction research. The multimodal corpus will be made available to interested researchers, and the technology developed during the three-month exhibition is being extended for use in education and assisted-living applications.

    Including universal design in a summer camp workshop on robotics

    In this paper we describe a summer camp short course intended for high-school students with excellent qualifications. The course is aimed at students who are considering a technical degree and, for the first time, includes a section on universal design. The Department of Mathematics and Computer Science at Universitat de Barcelona will host a workshop on robotics next summer within the context of Campus Científicos de Verano, organised by the Fundación Española para la Ciencia y la Tecnología. High-school students from around Spain will be selected to attend the workshop based on their qualifications and motivation. The first activity in the summer camp will be building Lego Mindstorms robots. These robots contain several sensors and actuators that can be programmed to perform different tasks. One of the robots will be programmed to track a line, and another two will be programmed to carry out a Sumo fight on their own. Students will learn how to use sensors and actuators and how to code programming algorithms; a sketch of the line-tracking task appears below. For the second activity, the students will develop a mobile app with the MIT App Inventor 2 software [1] in order to control the robots. In this activity students will learn how to program apps in a simple way, completing their understanding of programming. Taking into account European Higher Education Area requirements for accessibility in technical degrees, this workshop will introduce an innovation: the third activity will consist of adapting the app and robots for multimodal access (including redundant sound and sight warnings) and readjusting the app’s buttons for users with motor and visual disabilities (e.g. making the buttons bigger and giving them non-repeating behaviour). Students attending the summer camp will be introduced to the needs and skills of different user profiles of people with disabilities. After this theoretical introduction, they will experience motor and visual disabilities through simulations inspired by the Inclusive Design Toolkit resource [2]. Finally, they will modify the app based on the IEEE RWEP Accessible Apps by Ayanna Howard [3] so as to maximise the accessibility possibilities of App Inventor. Complementary resources, such as the RWEP prosthetic hands projects, other toolkits, and a bibliography, will be made available to students showing interest in this area. This will serve as a first experience for the students; there are no plans to include technical aids such as GRID2 or similar [4] due to budget restrictions. No students with disabilities are registered for this year’s edition, so the course does not seek accessibility for participants as authors. We will consider working on accessibility for participants in following editions of this workshop, building on past experiences in reaching this goal [5] [6] [7]. The main focus of the workshop is to encourage the creative learning of a robotics summer camp [8], [9] with the inclusion of universal design as an essential requirement in the design and development of computer applications or systems. With this initiative we want to increase awareness of accessibility requirements among future technical students.
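
    The line-tracking activity could, for instance, use a simple bang-bang controller like the following sketch, written against a hypothetical robot API (read_light and drive are placeholders; Lego Mindstorms robots would use their own blocks or firmware API).

    ```python
    # Illustrative bang-bang line follower; read_light and drive are
    # hypothetical callables standing in for the robot's real API.
    THRESHOLD = 50   # reflected-light reading separating line from floor
    BASE_SPEED = 30  # percent motor power

    def follow_line_step(read_light, drive):
        """One control step: steer toward the edge of the dark line."""
        if read_light() < THRESHOLD:   # on the dark line: curve left
            drive(left=BASE_SPEED - 10, right=BASE_SPEED + 10)
        else:                          # off the line: curve right
            drive(left=BASE_SPEED + 10, right=BASE_SPEED - 10)
    ```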

    Young children in an education context : apps, cultural agency and expanding communicative repertoires

    This chapter examines video-recorded interactions of children’s engagement with touchscreens in an early education setting. The extracts are taken from an ethnographic research study that explored children’s expanding repertoires for meaning-making as these emerged throughout their first year of school. The episodes presented in this chapter draw on observations of children’s spontaneous interactions with and around two iPad apps. The findings reveal how children’s engagement with iPads has the potential simultaneously to confer cultural agency on children and to further expand their repertoires for meaning-making. The discussion provides nuanced interpretations of how touchscreens might contribute positively to young children’s early learning and play experiences.