218 research outputs found

    Talking Robot and the Autonomous Acquisition of Vocalization and Singing Skill


    A Novel Speech to Mouth Articulation System for Realistic Humanoid Robots

    A significant ongoing issue in realistic humanoid robotics (RHRs) is inaccurate speech to mouth synchronisation. Even the most advanced robotic systems cannot authentically emulate the natural movements of the human jaw, lips and tongue during verbal communication. These visual and functional irregularities have the potential to propagate the Uncanny Valley Effect (UVE) and reduce speech understanding in human-robot interaction (HRI). This paper outlines the development and testing of a novel Computer Aided Design (CAD) robotic mouth prototype with buccinator actuators for emulating the fluidic movements of the human mouth. The robotic mouth system incorporates a custom Machine Learning (ML) application that measures the acoustic qualities of speech synthesis (SS) and translates this data into servomotor triangulation for triggering jaw, lip and tongue positions. The objective of this study is to improve current robotic mouth design and provide engineers with a framework for increasing the authenticity, accuracy and communication capabilities of RHRs for HRI. The primary contributions of this study are the engineering of a robotic mouth prototype and the programming of a speech processing application that achieved a 79.4% syllable accuracy, 86.7% lip synchronisation accuracy and 0.1s speech to mouth articulation differential
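    The abstract above describes translating acoustic measurements of synthesized speech into servomotor positions for the jaw, lips and tongue. As a rough illustration only, not the authors' ML application, the sketch below maps the short-time energy of a synthesized-speech WAV file to a hypothetical jaw-servo angle; the frame length, servo range, gain and file name are assumed values.

    import wave
    import numpy as np

    FRAME_MS = 50                        # analysis window; assumed value
    JAW_CLOSED, JAW_OPEN = 10.0, 70.0    # hypothetical servo angles in degrees
    GAIN = 4.0                           # crude amplitude-to-opening gain, tuned by ear

    def jaw_angles_from_wav(path):
        # Read 16-bit mono PCM speech-synthesis output.
        with wave.open(path, "rb") as wf:
            rate = wf.getframerate()
            raw = wf.readframes(wf.getnframes())
        samples = np.frombuffer(raw, dtype=np.int16).astype(np.float32) / 32768.0

        hop = int(rate * FRAME_MS / 1000)
        schedule = []
        for start in range(0, len(samples) - hop, hop):
            # Short-time RMS energy as a crude proxy for mouth opening.
            rms = float(np.sqrt(np.mean(samples[start:start + hop] ** 2)))
            opening = min(rms * GAIN, 1.0)
            angle = JAW_CLOSED + opening * (JAW_OPEN - JAW_CLOSED)
            schedule.append((start / rate, angle))
        return schedule   # (time in seconds, jaw servo angle in degrees)

    if __name__ == "__main__":
        for t, angle in jaw_angles_from_wav("speech.wav"):   # hypothetical file
            print(f"{t:6.2f} s  jaw -> {angle:5.1f} deg")

    A real system of the kind described would also have to infer lip and tongue positions and keep the command latency within the reported 0.1 s articulation differential.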

    A system for recognizing human emotions based on speech analysis and facial feature extraction: applications to Human-Robot Interaction

    With the advance of Artificial Intelligence, humanoid robots have started to interact with ordinary people, building on a growing understanding of psychological processes. Accumulating evidence in Human-Robot Interaction (HRI) suggests that researchers are focusing on emotional communication between human and robot to create social perception, cognition, desired interaction and sensation. Furthermore, robots need to perceive human emotion and optimize their behavior to help and interact with human beings in various environments. The most natural way to recognize basic emotions is to extract sets of features from human speech, facial expression and body gesture. A system for recognizing emotions based on speech analysis and facial feature extraction can have interesting applications in Human-Robot Interaction. Thus, the Human-Robot Interaction ontology explains how knowledge from these fundamental sciences is applied in the contexts of physics (sound analysis), mathematics (face detection and perception), philosophical theory (behavior) and robotic science. In this project, we carry out a study to recognize basic emotions (sadness, surprise, happiness, anger, fear and disgust), and we propose a methodology and a software program for classifying emotions based on speech analysis and facial feature extraction. The speech analysis phase investigates the appropriateness of using acoustic (pitch value, pitch peak, pitch range, intensity and formant) and phonetic (speech rate) properties of emotive speech with the freeware program PRAAT, and consists of generating and analyzing a graph of the speech signals. The proposed architecture investigates the appropriateness of analyzing emotive speech with minimal use of signal processing algorithms. Thirty participants in the experiment had to repeat five sentences in English (with durations typically between 0.40 s and 2.5 s) in order to extract data on pitch (value, range and peak) and rising-falling intonation. Pitch alignments (peak, value and range) were evaluated and the results were compared with intensity and speech rate. The facial feature extraction phase uses a mathematical formulation (Bézier curves) and geometric analysis of the facial image, based on measurements of a set of Action Units (AUs), to classify the emotion. The proposed technique consists of three steps: (i) detecting the facial region within the image, (ii) extracting and classifying the facial features, and (iii) recognizing the emotion. The new data are then merged with reference data in order to recognize the basic emotion. Finally, we combine the two proposed algorithms (speech analysis and facial expression) to design a hybrid technique for emotion recognition. This technique has been implemented in a software program that can be employed in Human-Robot Interaction. The efficiency of the methodology was evaluated by experimental tests on 30 individuals (15 female and 15 male, 20 to 48 years old) from different ethnic groups, namely: (i) ten European adults, (ii) ten Asian (Middle Eastern) adults and (iii) ten American adults. The proposed technique ultimately made it possible to recognize the basic emotion in most of the cases
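    The speech analysis phase above relies on prosodic measurements (pitch value, peak and range, intensity, speech rate) obtained with PRAAT. The sketch below is only a minimal stand-in for that step, assuming mono floating-point samples and a naive autocorrelation pitch estimate in NumPy rather than PRAAT's algorithms; the frame length and F0 search band are assumptions.

    import numpy as np

    def frame_features(samples, rate, frame_ms=40):
        # Per-frame intensity (dB) and a naive autocorrelation pitch estimate (Hz).
        hop = int(rate * frame_ms / 1000)
        feats = []
        for start in range(0, len(samples) - hop, hop):
            frame = samples[start:start + hop]
            intensity = 10.0 * np.log10(np.mean(frame ** 2) + 1e-12)
            ac = np.correlate(frame, frame, mode="full")[hop - 1:]
            lo, hi = int(rate / 400), int(rate / 75)   # search 75-400 Hz, a typical speech F0 band
            lag = lo + int(np.argmax(ac[lo:hi]))
            feats.append((intensity, rate / lag))
        return feats

    def pitch_summary(feats):
        # Pitch value (mean), peak and range: the quantities compared across emotions above.
        f0 = np.array([f for _, f in feats])
        return {"value": float(f0.mean()),
                "peak": float(f0.max()),
                "range": float(f0.max() - f0.min())}

    Per-frame intensity and F0 of this kind are what a rising-falling intonation comparison would be computed from; real prosodic analysis additionally needs voicing detection and smoothing.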

    Design And Development Of A Social Robotic Head - Dorothy

    Master's thesis (Master of Engineering)

    Multi-Sensory Emotion Recognition with Speech and Facial Expression

    Emotion plays an important role in human beings’ daily lives. Understanding emotions and recognizing how to react to others’ feelings are fundamental to engaging in successful social interactions. Currently, emotion recognition is not only significant in daily life but also a hot topic in academic research, as new techniques such as emotion recognition from speech context show how emotions are related to the content we are uttering. The demand for and importance of emotion recognition have greatly increased in many applications in recent years, such as video games, human-computer interaction, cognitive computing, and affective computing. Emotion can be recognized from many sources, including text, speech, hand and body gesture, as well as facial expression. Presently, most emotion recognition methods use only one of these sources. Human emotion changes every second, and using a single source for emotion recognition may not reflect the emotion correctly. This research is motivated by the desire to understand and evaluate human emotion in multiple ways, such as speech and facial expressions. In this dissertation, multi-sensory emotion recognition is exploited. The proposed framework can recognize emotion from speech, from facial expression, or from both. There are three important parts in the design of the system: the facial emotion recognizer, the speech emotion recognizer, and the information fusion. The information fusion part uses the results from the speech emotion recognition and the facial emotion recognition; a novel weighted method then integrates the results, and a final decision on the emotion is given after the fusion. The experiments show that with the weighted fusion method, accuracy improves by an average of 3.66% compared to fusion without weighting. The improvement in recognition rate reaches 18.27% and 5.66% compared to speech emotion recognition and facial expression recognition, respectively. By improving emotion recognition accuracy, the proposed multi-sensory emotion recognition system can help improve the naturalness of human-computer interaction
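    The fusion step described above combines the outputs of the speech and facial recognizers with a weighted method before the final decision. Below is a minimal sketch of decision-level weighted fusion, assuming each recognizer emits a probability vector over the six basic emotions; the weights, labels and example scores are placeholders, not the dissertation's learned weighting.

    import numpy as np

    EMOTIONS = ["sadness", "surprise", "happiness", "anger", "fear", "disgust"]

    def weighted_fusion(p_speech, p_face, w_speech=0.4, w_face=0.6):
        # Decision-level fusion: weight each recognizer's probability vector,
        # sum, renormalise, and take the arg-max emotion.
        p_speech = np.asarray(p_speech, dtype=float)
        p_face = np.asarray(p_face, dtype=float)
        fused = w_speech * p_speech + w_face * p_face
        fused /= fused.sum()
        return EMOTIONS[int(np.argmax(fused))], fused

    # Example: the face recognizer is confident about 'happiness', speech is ambivalent.
    label, scores = weighted_fusion(
        [0.20, 0.15, 0.30, 0.15, 0.10, 0.10],
        [0.05, 0.05, 0.70, 0.10, 0.05, 0.05])
    print(label)   # -> happiness

    Weighting the more reliable modality higher is what allows the fused decision to outperform either recognizer alone, as the reported 18.27% and 5.66% improvements suggest.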

    Can a Robot Smile? Wittgenstein on Facial Expression

    Some researchers in social robotics aim to build ‘face robots’—machines that interact with human beings (or other robots) by means of facial expression and gesture. They aim, in part, to use these robots to test hypotheses concerning human social and psychological development (and disorders such as autism) in controlled, repeatable experiments. A robot may be said to ‘grin’ and ‘frown’, or to have ‘a smile on its face’. This is not to claim merely that the robot has a certain physical configuration or behaviour; nor is it to say merely that the robot’s ‘facial’ display is, like an emoticon or photograph, a representation of a smile or frown. Although researchers may refrain from claiming that their machines have emotions, they attribute expressive behaviours to them literally and without qualification. Wittgenstein said, however, ‘A smiling mouth smiles only in a human face’. Smiling is a complex conventional gesture. A facial display is a smile only if it has a certain meaning—the meaning that distinguishes a smile from a human grimace or facial tic, and from a chimpanzee’s bared-teeth display. In this paper I explore the implications of Wittgenstein’s remarks on expression for the claim that face robots can smile or frown