
    Parametric Hidden Markov Models for Recognition and Synthesis of Movements

    In humanoid robotics, the recognition and synthesis of parametric movements plays an important role in human-robot interaction. A parametric movement is a movement of a particular semantic type, for example, similar pointing movements performed at different table-top positions. To understand the full meaning of a human movement, both the recognition of its type and of its parameterization are important; only together do they convey the whole meaning. Conversely, for mimicry, the synthesis of movements for the motor control of a robot needs to be parameterized, e.g., by the relative position at which a grasping action is performed. In both cases, synthesis and recognition, only parametric approaches are practical, as it is not feasible to store or acquire all possible trajectories. In this paper, we use hidden Markov models (HMMs) extended in an exemplar-based parametric way (PHMMs) to represent parametric movements. As HMMs are generative, they are well suited for synthesis as well as for recognition. Synthesis and recognition are carried out by interpolating exemplar movements to generalize over the parameterization of a movement class. In the evaluation of the approach, we concentrate on a systematic validation for two parametric movements, grasping and pointing. Even though the movements are very similar in appearance, our approach is able to distinguish the two movement types reasonably well. In further experiments, we show the applicability to online recognition based on very noisy 3D tracking data. The use of a parametric representation of movements is demonstrated in a robot demo, where a robot removes objects from a table as demonstrated by an advisor. The synthesis for motor control is performed for arbitrary table-top positions.
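    A minimal sketch of the exemplar-interpolation idea, under simplifying assumptions: each exemplar HMM is reduced here to its per-state observation means, recorded at a known movement parameter (e.g., a normalized table-top position), and a model for an unseen parameter is obtained by linearly blending the two nearest exemplars. The names, the linear blending, and the distance-based recognition score are illustrative stand-ins for the paper's full PHMM likelihood (Python/NumPy).

        import numpy as np

        # Assumed setup: three exemplar HMMs, each reduced to per-state Gaussian means,
        # recorded at known parameter values (e.g., normalized table-top x-positions).
        exemplar_params = np.array([0.0, 0.5, 1.0])          # parameter of each exemplar
        exemplar_means = np.random.randn(3, 10, 3)           # (exemplars, HMM states, obs dims)

        def interpolate_state_means(target_param):
            """Linearly blend the state means of the two nearest exemplars."""
            hi = np.clip(np.searchsorted(exemplar_params, target_param), 1, len(exemplar_params) - 1)
            lo = hi - 1
            w = (target_param - exemplar_params[lo]) / (exemplar_params[hi] - exemplar_params[lo])
            return (1.0 - w) * exemplar_means[lo] + w * exemplar_means[hi]

        # Synthesis: state means for an unseen parameter, e.g. a new table-top position.
        synth_means = interpolate_state_means(0.7)

        # Recognition: score an observed 3D trajectory against candidate parameters,
        # using a nearest-state distance as a stand-in for the HMM likelihood.
        def score(trajectory, target_param):
            means = interpolate_state_means(target_param)
            dists = np.linalg.norm(trajectory[:, None, :] - means[None, :, :], axis=-1)
            return -dists.min(axis=1).sum()

        observed = np.random.randn(50, 3)                     # a (noisy) tracked trajectory
        best_param = max(np.linspace(0.0, 1.0, 21), key=lambda p: score(observed, p))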

    Bayesian fusion of hidden Markov models for understanding bimanual movements

    Understanding hand and body gestures is part of a wide spectrum of current research in computer vision and human-computer interaction. One part of this is the recognition of movements in which the two hands move simultaneously to perform an action or convey a meaning. We present a Bayesian network for fusing hidden Markov models in order to recognize a bimanual movement. A bimanual movement is tracked and segmented by a tracking algorithm. Hidden Markov models are assigned to the segments in order to learn and recognize the partial movement within each segment. A Bayesian network then fuses the HMMs so that the movement of the two hands is perceived as a single entity.
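    A minimal sketch of the fusion step under a naive-Bayes simplification (the paper uses a full Bayesian network over hand segments): each bimanual gesture class has one HMM per hand, and the per-hand log-likelihoods are summed with a class prior to obtain a posterior over gestures. The class count and scores below are hypothetical.

        import numpy as np

        def fuse_hmm_scores(left_loglik, right_loglik, log_prior):
            """Combine per-hand HMM log-likelihoods into a posterior over gesture classes."""
            joint = left_loglik + right_loglik + log_prior    # hands treated as conditionally independent
            joint -= joint.max()                              # numerical stability before exponentiation
            post = np.exp(joint)
            return post / post.sum()

        # Hypothetical log-likelihoods for 4 bimanual gesture classes.
        left = np.array([-120.3, -98.7, -110.2, -140.9])      # left-hand segment HMMs
        right = np.array([-115.8, -101.2, -95.4, -133.0])     # right-hand segment HMMs
        prior = np.log(np.full(4, 0.25))                      # uniform class prior

        posterior = fuse_hmm_scores(left, right, prior)
        print("recognized gesture class:", int(posterior.argmax()))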

    Speech Synthesis Based on Hidden Markov Models


    Parametric Human Movements: Learning, Synthesis, Recognition, and Tracking


    Parallel Reference Speaker Weighting for Kinematic-Independent Acoustic-to-Articulatory Inversion

    Acoustic-to-articulatory inversion, the estimation of articulatory kinematics from an acoustic waveform, is a challenging but important problem. Accurate estimation of articulatory movements has the potential for significant impact on our understanding of speech production, on our capacity to assess and treat pathologies in a clinical setting, and on speech technologies such as computer-aided pronunciation assessment and audio-video synthesis. However, because of the complex and speaker-specific relationship between articulation and acoustics, existing approaches to inversion do not generalize well across speakers. As acquiring speaker-specific kinematic data for training is not feasible in many practical applications, this remains an important and open problem. This paper proposes a novel approach to acoustic-to-articulatory inversion, Parallel Reference Speaker Weighting (PRSW), which requires no kinematic data for the target speaker and only a small amount of acoustic adaptation data. PRSW hypothesizes that acoustic and kinematic similarities are correlated and uses speaker-adapted articulatory models built from acoustically derived weights. The system was assessed on a 20-speaker data set of synchronous acoustic and Electromagnetic Articulography (EMA) kinematic data. Results demonstrate that by restricting the reference group to a subset of speakers with strong individual speaker-dependent inversion performance, the PRSW method is able to attain kinematic-independent acoustic-to-articulatory inversion performance nearly matching that of the speaker-dependent model, with an average correlation of 0.62 versus 0.63. This indicates that, given a sufficiently complete and appropriately selected reference speaker set for adaptation, it is possible to create effective articulatory models without kinematic training data.
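    A minimal sketch of the PRSW idea under strong simplifications: each reference speaker is summarized here by a single acoustic vector and a single articulatory (EMA) vector, weights are computed from acoustic similarity to the target speaker's adaptation data, and the same weights blend the reference articulatory models into a target model. The real method estimates weights over full acoustic and articulatory HMM parameter sets; all names, dimensions, and the softmax weighting below are illustrative assumptions.

        import numpy as np

        def acoustic_weights(target_acoustic, ref_acoustic, temperature=1.0):
            """Softmax-style weights from (negative) acoustic distance to each reference speaker."""
            d = np.linalg.norm(ref_acoustic - target_acoustic, axis=1)
            w = np.exp(-d / temperature)
            return w / w.sum()

        def adapted_articulatory_model(weights, ref_articulatory):
            """Weighted combination of the reference speakers' articulatory models."""
            return weights @ ref_articulatory

        rng = np.random.default_rng(0)
        ref_acoustic = rng.normal(size=(20, 13))        # 20 reference speakers, MFCC-like summaries
        ref_articulatory = rng.normal(size=(20, 12))    # matching EMA sensor summaries
        target_acoustic = rng.normal(size=13)           # from a small acoustic adaptation set

        w = acoustic_weights(target_acoustic, ref_acoustic)
        target_articulatory = adapted_articulatory_model(w, ref_articulatory)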

    Articulatory Control of HMM-based Parametric Speech Synthesis using Feature-Space-Switched Multiple Regression
