1,099 research outputs found

    The Perception of Emotion from Acoustic Cues in Natural Speech

    Knowledge of human perception of emotional speech is imperative for the development of emotion in speech recognition systems and emotional speech synthesis. Owing to the fact that there is a growing trend towards research on spontaneous, real-life data, the aim of the present thesis is to examine human perception of emotion in naturalistic speech. Although there are many available emotional speech corpora, most contain simulated expressions. Therefore, there remains a compelling need to obtain naturalistic speech corpora that are appropriate and freely available for research. In that regard, our initial aim was to acquire suitable naturalistic material and examine its emotional content based on listener perceptions. A web-based listening tool was developed to accumulate ratings based on large-scale listening groups. The emotional content present in the speech material was demonstrated by performing perception tests on conveyed levels of Activation and Evaluation. As a result, labels were determined that signified the emotional content, and thus contribute to the construction of a naturalistic emotional speech corpus. In line with the literature, the ratings obtained from the perception tests suggested that Evaluation (or hedonic valence) is not identified as reliably as Activation is. Emotional valence can be conveyed through both semantic and prosodic information, for which the meaning of one may serve to facilitate, modify, or conflict with the meaning of the other—particularly with naturalistic speech. The subsequent experiments aimed to investigate this concept by comparing ratings from perception tests of non-verbal speech with verbal speech. The method used to render non-verbal speech was low-pass filtering, and for this, suitable filtering conditions were determined by carrying out preliminary perception tests. The results suggested that nonverbal naturalistic speech provides sufficiently discernible levels of Activation and Evaluation. 
The perception of Activation and Evaluation appears to be affected by low-pass filtering, but the effect is relatively small. Moreover, the results suggest a similar trend in agreement levels between verbal and non-verbal speech. It remains difficult to determine unique acoustical patterns for the hedonic valence of emotion, which may be due to inadequate labels or an incorrect selection of acoustic parameters. This study has implications for the labelling of emotional speech data and the determination of salient acoustic correlates of emotion.

    Handbook of Digital Face Manipulation and Detection

    This open access book provides the first comprehensive collection of studies dealing with the hot topic of digital face manipulation such as DeepFakes, Face Morphing, or Reenactment. It combines the research fields of biometrics and media forensics, including contributions from academia and industry. Appealing to a broad readership, the introductory chapters give a comprehensive overview of the topic for readers wishing to gain a brief overview of the state of the art. Subsequent chapters, which delve deeper into various research challenges, are oriented towards advanced readers. Moreover, the book provides a good starting point for young researchers as well as a reference guide pointing to further literature. Hence, the primary readership is academic institutions and industry currently involved in digital face manipulation and detection. The book could easily be used as a recommended text for courses in image processing, machine learning, media forensics, biometrics, and the general security area.

    A cross-cultural investigation of the vocal correlates of emotion

    PhD Thesis. Universal and culture-specific properties of the vocal communication of human emotion are investigated in this balanced study focussing on encoding and decoding of Happy, Sad, Angry, Fearful and Calm by English and Japanese participants (eight female encoders for each culture, and eight female and eight male decoders for each culture). Previous methodologies and findings are compared. This investigation is novel in the design of symmetrical procedures to facilitate cross-cultural comparison of results of decoding tests and acoustic analysis; a simulation/self-induction method was used in which participants from both cultures produced, as far as possible, the same pseudo-utterances. All emotions were distinguished beyond chance irrespective of culture, except for Japanese participants’ decoding of English Fearful, which was decoded at a level borderline with chance. Angry and Sad were well-recognised, both in-group and cross-culturally, and Happy was identified well in-group. Confusions between emotions tended to follow dimensional lines of arousal or valence. Acoustic analysis found significant distinctions between all emotions for each culture, except between the two low arousal emotions Sad and Calm. Evidence of ‘In-Group Advantage’ was found for English decoding of Happy, Fearful and Calm and for Japanese decoding of Happy; there is support for previous evidence of East/West cultural differences in display rules. A novel concept is suggested for the finding that Japanese decoders identified Happy, Sad and Angry more reliably from English than from Japanese expressions. Whilst duration, fundamental frequency and intensity all contributed to distinctions between emotions for English, only measures of fundamental frequency were found to significantly distinguish emotions in Japanese. Acoustic cues tended to be less salient in Japanese than in English when compared to expected cues for high and low arousal emotions. In addition, new evidence was found of cross-cultural influence of vowel quality upon emotion recognition.

    Emotion recognition: recognition of emotions through voice

    Integrated master's dissertation in Informatics Engineering. As the years go by, interaction between humans and machines continues to gain importance, for personal and commercial use alike. At a time when technology is reaching many parts of our lives, it is important to keep striving for healthy progress and to help not only improve but also maintain the benefits that everyone gets from it. This relationship can be approached from many angles, but here the focus is on the mind. Emotions are still a mystery. The concept itself raises serious questions because of its complex nature. To date, scientists still struggle to understand it, so it is crucial to pave the right path for technology to aid in this area. There is some consensus on a few indicators that provide important insights into mental state, such as the words used, facial expressions, and voice. This work focuses on voice and, based on the field of Automatic Speech Emotion Recognition, proposes a full pipeline with a wide scope: sound capture and signal processing software, learning and classification through algorithms belonging to the Semi-Supervised Learning paradigm, and visualization techniques for interpreting results. For the classification of samples, a semi-supervised approach with neural networks is an important setting for alleviating the dependency on human labelling of emotions, a task that has proven to be challenging and, in many cases, highly subjective, not to mention expensive. The intention is to rely more on empirical results than on theoretical concepts, owing to the complexity of the concept of human emotion and its inherent uncertainty, but never to disregard prior knowledge on the matter.

    Example Based Caricature Synthesis

    The likeness of a caricature to the original face image is an essential and often overlooked part of caricature production. In this paper we present an example-based caricature synthesis technique, consisting of shape exaggeration, relationship exaggeration, and optimization for likeness. Rather than relying on a large training set of caricature-face pairs, our shape exaggeration step is based on only one or a small number of examples of facial features. The relationship exaggeration step introduces two definitions which facilitate global facial feature synthesis. The first is the T-Shape rule, which describes the relative relationship between the facial elements in an intuitive manner. The second is the so-called proportions, which characterize the facial features in proportional form. Finally, we introduce a likeness metric based on the Modified Hausdorff Distance (MHD), which allows us to optimize the configuration of facial elements, maximizing likeness while satisfying a number of constraints. The effectiveness of our algorithm is demonstrated with experimental results.
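
    The abstract above names the Modified Hausdorff Distance (MHD) as the basis of its likeness metric but does not say how it is applied. As a minimal sketch only, the following computes the standard MHD between two 2D point sets: the larger of the two directed mean nearest-neighbour distances. The function names and the representation of facial elements as lists of (x, y) points are assumptions made for illustration, not the paper's implementation.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # assumed representation: one (x, y) landmark


def directed_mean_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Average, over points in a, of the distance to the nearest point in b."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)


def modified_hausdorff(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Standard MHD: the larger of the two directed mean distances."""
    if not a or not b:
        raise ValueError("both point sets must be non-empty")
    return max(directed_mean_distance(a, b), directed_mean_distance(b, a))


# Hypothetical landmark sets standing in for a face and its caricature.
face = [(0.0, 0.0), (1.0, 0.0)]
caricature = [(0.0, 0.0), (1.0, 1.0)]
distance = modified_hausdorff(face, caricature)
```

    A smaller value means the two point sets lie closer together, so under this sketch maximizing likeness would correspond to minimizing the distance; how the paper combines it with its constraints is not stated in the abstract.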

    Ultra-high-speed imaging of bubbles interacting with cells and tissue

    Ultrasound contrast microbubbles are exploited in molecular imaging, where bubbles are directed to target cells and where their high scattering cross section to ultrasound allows for the detection of pathologies at a molecular level. In therapeutic applications, vibrating bubbles close to cells may alter the permeability of cell membranes, and these systems are therefore highly interesting for drug and gene delivery applications using ultrasound. In a more extreme regime, bubbles are driven through shock waves to sonoporate or kill cells through intense stresses or jets following inertial bubble collapse. Here, we elucidate some of the underlying mechanisms using the 25-Mfps camera Brandaris128, resolving the bubble dynamics and its interactions with cells. We quantify acoustic microstreaming around oscillating bubbles close to rigid walls and evaluate the shear stresses on nonadherent cells. In a study on the fluid dynamical interaction of cavitation bubbles with adherent cells, we find that the nonspherical collapse of bubbles is responsible for cell detachment. We also visualize the dynamics of vibrating microbubbles in contact with endothelial cells, followed by fluorescent imaging of the transport of propidium iodide, used as a membrane integrity probe, into these cells, showing a direct correlation between cell deformation and cell membrane permeability.