1,594 research outputs found

    Text2Action: Generative Adversarial Synthesis from Language to Action

    Full text link
    In this paper, we propose a generative model which learns the relationship between language and human action in order to generate a human action sequence given a sentence describing human behavior. The proposed generative model is a generative adversarial network (GAN), which is based on the sequence to sequence (SEQ2SEQ) model. Using the proposed generative network, we can synthesize various actions for a robot or a virtual agent using a text encoder recurrent neural network (RNN) and an action decoder RNN. The proposed generative network is trained from 29,770 pairs of actions and sentence annotations extracted from MSR-Video-to-Text (MSR-VTT), a large-scale video dataset. We demonstrate that the network can generate human-like actions which can be transferred to a Baxter robot, such that the robot performs an action based on a provided sentence. Results show that the proposed generative network correctly models the relationship between language and action and can generate a diverse set of actions from the same sentence.Comment: 8 pages, 10 figure

    Cognitive Reasoning for Compliant Robot Manipulation

    Get PDF
    Physically compliant contact is a major element for many tasks in everyday environments. A universal service robot that is utilized to collect leaves in a park, polish a workpiece, or clean solar panels requires the cognition and manipulation capabilities to facilitate such compliant interaction. Evolution equipped humans with advanced mental abilities to envision physical contact situations and their resulting outcome, dexterous motor skills to perform the actions accordingly, as well as a sense of quality to rate the outcome of the task. In order to achieve human-like performance, a robot must provide the necessary methods to represent, plan, execute, and interpret compliant manipulation tasks. This dissertation covers those four steps of reasoning in the concept of intelligent physical compliance. The contributions advance the capabilities of service robots by combining artificial intelligence reasoning methods and control strategies for compliant manipulation. A classification of manipulation tasks is conducted to identify the central research questions of the addressed topic. Novel representations are derived to describe the properties of physical interaction. Special attention is given to wiping tasks which are predominant in everyday environments. It is investigated how symbolic task descriptions can be translated into meaningful robot commands. A particle distribution model is used to plan goal-oriented wiping actions and predict the quality according to the anticipated result. The planned tool motions are converted into the joint space of the humanoid robot Rollin' Justin to perform the tasks in the real world. In order to execute the motions in a physically compliant fashion, a hierarchical whole-body impedance controller is integrated into the framework. The controller is automatically parameterized with respect to the requirements of the particular task. Haptic feedback is utilized to infer contact and interpret the performance semantically. Finally, the robot is able to compensate for possible disturbances as it plans additional recovery motions while effectively closing the cognitive control loop. Among others, the developed concept is applied in an actual space robotics mission, in which an astronaut aboard the International Space Station (ISS) commands Rollin' Justin to maintain a Martian solar panel farm in a mock-up environment. This application demonstrates the far-reaching impact of the proposed approach and the associated opportunities that emerge with the availability of cognition-enabled service robots

    Artificial general intelligence: Proceedings of the Second Conference on Artificial General Intelligence, AGI 2009, Arlington, Virginia, USA, March 6-9, 2009

    Get PDF
    Artificial General Intelligence (AGI) research focuses on the original and ultimate goal of AI – to create broad human-like and transhuman intelligence, by exploring all available paths, including theoretical and experimental computer science, cognitive science, neuroscience, and innovative interdisciplinary methodologies. Due to the difficulty of this task, for the last few decades the majority of AI researchers have focused on what has been called narrow AI – the production of AI systems displaying intelligence regarding specific, highly constrained tasks. In recent years, however, more and more researchers have recognized the necessity – and feasibility – of returning to the original goals of the field. Increasingly, there is a call for a transition back to confronting the more difficult issues of human level intelligence and more broadly artificial general intelligence

    Directional adposition use in English, Swedish and Finnish

    Get PDF
    Directional adpositions such as to the left of describe where a Figure is in relation to a Ground. English and Swedish directional adpositions refer to the location of a Figure in relation to a Ground, whether both are static or in motion. In contrast, the Finnish directional adpositions edellĂ€ (in front of) and jĂ€ljessĂ€ (behind) solely describe the location of a moving Figure in relation to a moving Ground (Nikanne, 2003). When using directional adpositions, a frame of reference must be assumed for interpreting the meaning of directional adpositions. For example, the meaning of to the left of in English can be based on a relative (speaker or listener based) reference frame or an intrinsic (object based) reference frame (Levinson, 1996). When a Figure and a Ground are both in motion, it is possible for a Figure to be described as being behind or in front of the Ground, even if neither have intrinsic features. As shown by Walker (in preparation), there are good reasons to assume that in the latter case a motion based reference frame is involved. This means that if Finnish speakers would use edellĂ€ (in front of) and jĂ€ljessĂ€ (behind) more frequently in situations where both the Figure and Ground are in motion, a difference in reference frame use between Finnish on one hand and English and Swedish on the other could be expected. We asked native English, Swedish and Finnish speakers’ to select adpositions from a language specific list to describe the location of a Figure relative to a Ground when both were shown to be moving on a computer screen. We were interested in any differences between Finnish, English and Swedish speakers. All languages showed a predominant use of directional spatial adpositions referring to the lexical concepts TO THE LEFT OF, TO THE RIGHT OF, ABOVE and BELOW. There were no differences between the languages in directional adpositions use or reference frame use, including reference frame use based on motion. We conclude that despite differences in the grammars of the languages involved, and potential differences in reference frame system use, the three languages investigated encode Figure location in relation to Ground location in a similar way when both are in motion. Levinson, S. C. (1996). Frames of reference and Molyneux’s question: Crosslingiuistic evidence. In P. Bloom, M.A. Peterson, L. Nadel & M.F. Garrett (Eds.) Language and Space (pp.109-170). Massachusetts: MIT Press. Nikanne, U. (2003). How Finnish postpositions see the axis system. In E. van der Zee & J. Slack (Eds.), Representing direction in language and space. Oxford, UK: Oxford University Press. Walker, C. (in preparation). Motion encoding in language, the use of spatial locatives in a motion context. Unpublished doctoral dissertation, University of Lincoln, Lincoln. United Kingdo

    Embodied Songs: Insights Into the Nature of Cross-Modal Meaning-Making Within Sign Language Informed, Embodied Interpretations of Vocal Music

    Get PDF
    Embodied song practices involve the transformation of songs from the acoustic modality into an embodied-visual form, to increase meaningful access for d/Deaf audiences. This goes beyond the translation of lyrics, by combining poetic sign language with other bodily movements to embody the para-linguistic expressive and musical features that enhance the message of a song. To date, the limited research into this phenomenon has focussed on linguistic features and interactions with rhythm. The relationship between bodily actions and music has not been probed beyond an assumed implication of conformance. However, as the primary objective is to communicate equivalent meanings, the ways that the acoustic and embodied-visual signals relate to each other should reveal something about underlying conceptual agreement. This paper draws together a range of pertinent theories from within a grounded cognition framework including semiotics, analogy mapping and cross-modal correspondences. These theories are applied to embodiment strategies used by prominent d/Deaf and hearing Dutch practitioners, to unpack the relationship between acoustic songs, their embodied representations, and their broader conceptual and affective meanings. This leads to the proposition that meaning primarily arises through shared patterns of internal relations across a range of amodal and cross-modal features with an emphasis on dynamic qualities. These analogous patterns can inform metaphorical interpretations and trigger shared emotional responses. This exploratory survey offers insights into the nature of cross-modal and embodied meaning-making, as a jumping-off point for further research

    Example Based Caricature Synthesis

    Get PDF
    The likeness of a caricature to the original face image is an essential and often overlooked part of caricature production. In this paper we present an example based caricature synthesis technique, consisting of shape exaggeration, relationship exaggeration, and optimization for likeness. Rather than relying on a large training set of caricature face pairs, our shape exaggeration step is based on only one or a small number of examples of facial features. The relationship exaggeration step introduces two definitions which facilitate global facial feature synthesis. The first is the T-Shape rule, which describes the relative relationship between the facial elements in an intuitive manner. The second is the so called proportions, which characterizes the facial features in a proportion form. Finally we introduce a similarity metric as the likeness metric based on the Modified Hausdorff Distance (MHD) which allows us to optimize the configuration of facial elements, maximizing likeness while satisfying a number of constraints. The effectiveness of our algorithm is demonstrated with experimental results

    Systematic literature review of hand gestures used in human computer interaction interfaces

    Get PDF
    Gestures, widely accepted as a humans' natural mode of interaction with their surroundings, have been considered for use in human-computer based interfaces since the early 1980s. They have been explored and implemented, with a range of success and maturity levels, in a variety of fields, facilitated by a multitude of technologies. Underpinning gesture theory however focuses on gestures performed simultaneously with speech, and majority of gesture based interfaces are supported by other modes of interaction. This article reports the results of a systematic review undertaken to identify characteristics of touchless/in-air hand gestures used in interaction interfaces. 148 articles were reviewed reporting on gesture-based interaction interfaces, identified through searching engineering and science databases (Engineering Village, Pro Quest, Science Direct, Scopus and Web of Science). The goal of the review was to map the field of gesture-based interfaces, investigate the patterns in gesture use, and identify common combinations of gestures for different combinations of applications and technologies. From the review, the community seems disparate with little evidence of building upon prior work and a fundamental framework of gesture-based interaction is not evident. However, the findings can help inform future developments and provide valuable information about the benefits and drawbacks of different approaches. It was further found that the nature and appropriateness of gestures used was not a primary factor in gesture elicitation when designing gesture based systems, and that ease of technology implementation often took precedence

    Scaling Up with Radically Embodied Cognition

    Get PDF
    Radically embodied cognitive science (REC) is typically concerned with basic cognition such as perception and action. However, complex cognition or higher-order cognition is difficult to explain for REC, as these theories eschew traditional representational explanations. This leaves REC with a scaling-up problem. In this dissertation I will explore options for REC to fix its scaling-up problem. I am specifically interested in autonoetic cognition, which is the ability to remember and imagine objects and events in the way they would be experienced if they were immediately present to be perceived. I contend that a simulationist account provides many of th necessary conceptual tools for understanding autonoetic cognition from a REC perspective. Furthermore, simulationist accounts are generally useful, as they are suggestive of a way to understand the observed neural activity and can be used to make empirical predictions. I will examine different simulationist theories in order to determine whether or not they can cohere with REC and help solve the scaling-up problem. Eventually I will argue that the REC commitment to reject representations makes the scaling-up problem insurmountable at this time
    • 

    corecore