25,594 research outputs found

    Analyzing Input and Output Representations for Speech-Driven Gesture Generation

    Full text link
    This paper presents a novel framework for automatic speech-driven gesture generation, applicable to human-agent interaction including both virtual agents and robots. Specifically, we extend recent deep-learning-based, data-driven methods for speech-driven gesture generation by incorporating representation learning. Our model takes speech as input and produces gestures as output, in the form of a sequence of 3D coordinates. Our approach consists of two steps. First, we learn a lower-dimensional representation of human motion using a denoising autoencoder neural network, consisting of a motion encoder MotionE and a motion decoder MotionD. The learned representation preserves the most important aspects of the human pose variation while removing less relevant variation. Second, we train a novel encoder network SpeechE to map from speech to a corresponding motion representation with reduced dimensionality. At test time, the speech encoder and the motion decoder networks are combined: SpeechE predicts motion representations based on a given speech signal and MotionD then decodes these representations to produce motion sequences. We evaluate different representation sizes in order to find the most effective dimensionality for the representation. We also evaluate the effects of using different speech features as input to the model. We find that mel-frequency cepstral coefficients (MFCCs), alone or combined with prosodic features, perform the best. The results of a subsequent user study confirm the benefits of the representation learning.Comment: Accepted at IVA '19. Shorter version published at AAMAS '19. The code is available at https://github.com/GestureGeneration/Speech_driven_gesture_generation_with_autoencode

    Symbol Emergence in Robotics: A Survey

    Full text link
    Humans can learn the use of language through physical interaction with their environment and semiotic communication with other people. It is very important to obtain a computational understanding of how humans can form a symbol system and obtain semiotic skills through their autonomous mental development. Recently, many studies have been conducted on the construction of robotic systems and machine-learning methods that can learn the use of language through embodied multimodal interaction with their environment and other systems. Understanding human social interactions and developing a robot that can smoothly communicate with human users in the long term, requires an understanding of the dynamics of symbol systems and is crucially important. The embodied cognition and social interaction of participants gradually change a symbol system in a constructive manner. In this paper, we introduce a field of research called symbol emergence in robotics (SER). SER is a constructive approach towards an emergent symbol system. The emergent symbol system is socially self-organized through both semiotic communications and physical interactions with autonomous cognitive developmental agents, i.e., humans and developmental robots. Specifically, we describe some state-of-art research topics concerning SER, e.g., multimodal categorization, word discovery, and a double articulation analysis, that enable a robot to obtain words and their embodied meanings from raw sensory--motor information, including visual information, haptic information, auditory information, and acoustic speech signals, in a totally unsupervised manner. Finally, we suggest future directions of research in SER.Comment: submitted to Advanced Robotic

    From Parallel Sequence Representations to Calligraphic Control: A Conspiracy of Neural Circuits

    Full text link
    Calligraphic writing presents a rich set of challenges to the human movement control system. These challenges include: initial learning, and recall from memory, of prescribed stroke sequences; critical timing of stroke onsets and durations; fine control of grip and contact forces; and letter-form invariance under voluntary size scaling, which entails fine control of stroke direction and amplitude during recruitment and derecruitment of musculoskeletal degrees of freedom. Experimental and computational studies in behavioral neuroscience have made rapid progress toward explaining the learning, planning and contTOl exercised in tasks that share features with calligraphic writing and drawing. This article summarizes computational neuroscience models and related neurobiological data that reveal critical operations spanning from parallel sequence representations to fine force control. Part one addresses stroke sequencing. It treats competitive queuing (CQ) models of sequence representation, performance, learning, and recall. Part two addresses letter size scaling and motor equivalence. It treats cursive handwriting models together with models in which sensory-motor tmnsformations are performed by circuits that learn inverse differential kinematic mappings. Part three addresses fine-grained control of timing and transient forces, by treating circuit models that learn to solve inverse dynamics problems.National Institutes of Health (R01 DC02852

    Computational and Robotic Models of Early Language Development: A Review

    Get PDF
    We review computational and robotics models of early language learning and development. We first explain why and how these models are used to understand better how children learn language. We argue that they provide concrete theories of language learning as a complex dynamic system, complementing traditional methods in psychology and linguistics. We review different modeling formalisms, grounded in techniques from machine learning and artificial intelligence such as Bayesian and neural network approaches. We then discuss their role in understanding several key mechanisms of language development: cross-situational statistical learning, embodiment, situated social interaction, intrinsically motivated learning, and cultural evolution. We conclude by discussing future challenges for research, including modeling of large-scale empirical data about language acquisition in real-world environments. Keywords: Early language learning, Computational and robotic models, machine learning, development, embodiment, social interaction, intrinsic motivation, self-organization, dynamical systems, complexity.Comment: to appear in International Handbook on Language Development, ed. J. Horst and J. von Koss Torkildsen, Routledg

    Integration of Action and Language Knowledge: A Roadmap for Developmental Robotics

    Get PDF
    “This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder." “Copyright IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.”This position paper proposes that the study of embodied cognitive agents, such as humanoid robots, can advance our understanding of the cognitive development of complex sensorimotor, linguistic, and social learning skills. This in turn will benefit the design of cognitive robots capable of learning to handle and manipulate objects and tools autonomously, to cooperate and communicate with other robots and humans, and to adapt their abilities to changing internal, environmental, and social conditions. Four key areas of research challenges are discussed, specifically for the issues related to the understanding of: 1) how agents learn and represent compositional actions; 2) how agents learn and represent compositional lexica; 3) the dynamics of social interaction and learning; and 4) how compositional action and language representations are integrated to bootstrap the cognitive system. The review of specific issues and progress in these areas is then translated into a practical roadmap based on a series of milestones. These milestones provide a possible set of cognitive robotics goals and test scenarios, thus acting as a research roadmap for future work on cognitive developmental robotics.Peer reviewe

    Objective dysphonia quantification in vocal fold paralysis: comparing nonlinear with classical measures

    Get PDF
    Clinical acoustic voice recording analysis is usually performed using classical perturbation measures including jitter, shimmer and noise-to-harmonic ratios. However, restrictive mathematical limitations of these measures prevent analysis for severely dysphonic voices. Previous studies of alternative nonlinear random measures addressed wide varieties of vocal pathologies. Here, we analyze a single vocal pathology cohort, testing the performance of these alternative measures alongside classical measures.

We present voice analysis pre- and post-operatively in unilateral vocal fold paralysis (UVFP) patients and healthy controls, patients undergoing standard medialisation thyroplasty surgery, using jitter, shimmer and noise-to-harmonic ratio (NHR), and nonlinear recurrence period density entropy (RPDE), detrended fluctuation analysis (DFA) and correlation dimension. Systematizing the preparative editing of the recordings, we found that the novel measures were more stable and hence reliable, than the classical measures, on healthy controls.

RPDE and jitter are sensitive to improvements pre- to post-operation. Shimmer, NHR and DFA showed no significant change (p > 0.05). All measures detect statistically significant and clinically important differences between controls and patients, both treated and untreated (p < 0.001, AUC > 0.7). Pre- to post-operation, GRBAS ratings show statistically significant and clinically important improvement in overall dysphonia grade (G) (AUC = 0.946, p < 0.001).

Re-calculating AUCs from other study data, we compare these results in terms of clinical importance. We conclude that, when preparative editing is systematized, nonlinear random measures may be useful UVFP treatment effectiveness monitoring tools, and there may be applications for other forms of dysphonia.
&#xa
    • …
    corecore