
4,360 research outputs found

Acta Cybernetica : Volume 19. Number 4.

Publication date: 01/01/2010

Controllable Generation of Artificial Speaker Embeddings through Discovery of Principal Directions

Authors: Lux Florian, Meyer Sarina, Tilli Pascal, Vu Ngoc Thang
Publication date: 26/10/2023

Customizing voice and speaking style in a speech synthesis system with intuitive and fine-grained controls is challenging, given that little data with appropriate labels is available. Furthermore, editing an existing human's voice also comes with ethical concerns. In this paper, we propose a method to generate artificial speaker embeddings that cannot be linked to a real human while offering intuitive and fine-grained control over the voice and speaking style of the embeddings, without requiring any labels for speaker or style. The artificial and controllable embeddings can be fed to a speech synthesis system, conditioned on embeddings of real humans during training, without sacrificing privacy during inference.

Comment: Published at ISCA Interspeech 2023, https://www.isca-speech.org/archive/interspeech_2023/lux23_interspeech.htm

arXiv.org e-Print Archive

A multimodal analysis of the sequential organization of verbal and nonverbal interaction

Author: Abuczki Ágnes
Publication date: 01/12/2017

University of Debrecen Electronic Archive

Intelligent Advanced User Interfaces for Monitoring Mental Health Wellbeing

Authors: Callejas Z., Cordasco G., Esposito A., Fuchs M., Hemmje M. L., Maldonato M. N.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2021

It has become pressing to develop objective and automatic measurements, integrated in intelligent diagnostic tools, for detecting and monitoring depressive states and for increasing the precision of diagnoses and clinical decision-making. The challenge is to exploit behavioral and physiological biomarkers and to develop Artificial Intelligence (AI) models able to extract information from a complex combination of signals considered key symptoms. The proposed AI models should help clinicians rapidly formulate accurate diagnoses and suggest personalized intervention plans, ranging from coaching activities (for example, exploiting serious games) and support networks (via chats or social networks) to alerts to caregivers, doctors, and care control centers, reducing the considerable burden on national health care institutions in terms of the medical and social costs associated with depression care.

Archivio della ricerca - Università degli studi di Napoli Federico II

Archivio Istituzionale della Ricerca - Università degli Studi della Campania "Luigi Vanvitelli"

Acta Cybernetica : Volume 16. Number 4.

Publication date: 01/01/2004

University of Szeged

Intuitive Multimodal Interaction with Communication Robot Fritz

Authors: Dominik Joho, Felix Faber, Maren Bennewitz, Sven Behnke
Publication venue: IntechOpen
Publication date: 01/06/2007

IntechOpen

CiteSeerX

Time- and value-continuous explainable affect estimation in-the-wild

Author: Pandit Vedhas
Publication date: 27/06/2022

Today, the relevance of Affective Computing, i.e., of making computers recognise and simulate human emotions, cannot be overstated. All technology giants (from manufacturers of laptops to mobile phones to smart speakers) are in fierce competition to make their devices understand not only what is being said but also how it is being said, in order to recognise the user’s emotions. The goals have evolved from predicting the basic emotions (e.g., happy, sad) to predicting more nuanced affective states (e.g., relaxed, bored) in real time. The databases used in such research have evolved too, from featuring acted behaviours to featuring spontaneous behaviours. A more powerful shift has emerged lately, called in-the-wild affect recognition, i.e., taking the research out of the laboratory and into the uncontrolled real world. This thesis discusses, for the very first time, affect recognition for two unique in-the-wild audiovisual databases, GRAS2 and SEWA. GRAS2 is the only database to date with time- and value-continuous affect annotations for Labov effect-free affective behaviours, i.e., behaviours recorded without the participants’ awareness of being recorded (awareness that is otherwise known to affect the naturalness of one’s affective behaviour). SEWA features participants from six different cultural backgrounds conversing over a video-calling platform. Thus, SEWA features in-the-wild recordings further corrupted by unpredictable artifacts, such as network-induced delays, frame freezing and echoes. The two databases present a unique opportunity to study time- and value-continuous affect estimation that is truly in-the-wild. A novel ‘Evaluator Weighted Estimation’ formulation is proposed to generate a gold standard sequence from several annotations.
An illustration is presented demonstrating that the moving bag-of-words (BoW) representation better preserves the temporal context of the features while remaining more robust against outliers than other statistical summaries, e.g., the moving average. A novel, data-independent randomised codebook is proposed for the BoW representation; it is especially useful for cross-corpus model generalisation testing when the feature spaces of the databases differ drastically. Various deep learning models and support vector regressors are used to predict affect dimensions time- and value-continuously. The better generalisability of the models trained on GRAS2, despite the smaller training size, makes a strong case for the collection and use of Labov effect-free data. A further foundational contribution is the discovery of the missing many-to-many mapping between the mean square error (MSE) and the concordance correlation coefficient (CCC), i.e., between two of the most popular utility functions to date. The newly invented cost function |MSE_{XY}/σ_{XY}| has been evaluated in experiments aimed at demystifying the inner workings of a well-performing, simple, low-cost neural network that effectively utilises the BoW text features. Also proposed herein is the shallowest-possible convolutional neural network (CNN) that uses the facial action unit (FAU) features. The CNN exploits sequential context but, unlike RNNs, also inherently allows data- and process-parallelism. Interestingly, for the most part, these white-box AI models have been shown to utilise the provided features in a manner consistent with the human perception of emotion expression.
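The link between MSE and CCC mentioned in this abstract can be illustrated with the standard textbook definitions. The sketch below is not taken from the thesis; it assumes population (1/N) statistics, and the function names and sample values are hypothetical. Under those assumptions, CCC = 2σ_{XY} / (σ_X² + σ_Y² + (μ_X − μ_Y)²) and MSE = σ_X² + σ_Y² − 2σ_{XY} + (μ_X − μ_Y)², so CCC = 2σ_{XY} / (MSE + 2σ_{XY}). The same MSE can therefore correspond to different CCC values depending on the covariance, and the ratio MSE/σ_{XY} appears naturally in the relation. Whether this matches the derivation in the thesis is an assumption.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def covariance(x, y):
    """Population (1/N) covariance; covariance(x, x) is the population variance."""
    mx, my = mean(x), mean(y)
    return mean([(a - mx) * (b - my) for a, b in zip(x, y)])


def mse(x, y):
    """Mean square error between two equally long sequences."""
    return mean([(a - b) ** 2 for a, b in zip(x, y)])


def ccc(x, y):
    """Concordance correlation coefficient from its textbook definition."""
    mean_gap = (mean(x) - mean(y)) ** 2
    return 2 * covariance(x, y) / (covariance(x, x) + covariance(y, y) + mean_gap)


def ccc_from_mse(x, y):
    """CCC rewritten in terms of MSE and covariance: 2*s_xy / (MSE + 2*s_xy)."""
    s_xy = covariance(x, y)
    return 2 * s_xy / (mse(x, y) + 2 * s_xy)


# Hypothetical gold-standard and predicted sequences, for illustration only.
gold = [0.1, 0.4, 0.35, 0.8, 0.6]
pred = [0.2, 0.3, 0.5, 0.7, 0.65]

# Both routes should agree up to floating-point rounding.
print(ccc(gold, pred), ccc_from_mse(gold, pred))
```

Because the identity depends on both MSE and σ_{XY}, minimising MSE alone does not fix the CCC, which is one way to read the "many-to-many" wording in the abstract.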

OPUS Augsburg

Speech Research conference : Hungarian Research Centre for Linguistics Budapest, 23-24. February 2023 = Beszédkutatás konferencia : Nyelvtudományi Kutatóközpont Budapest, 2023. február 23-24.

Publication venue: Nyelvtudományi Kutatóközpont
Publication date: 01/01/2023

Repository of the Academy's Library

COST 2102 International Training School, Dresden, Germany, February 21-26, 2011, Revised Selected Papers

Authors: Esposito A., Esposito A. M., Hoffmann R., Müller V. C., Vinciarelli A.
Publication venue: Springer Berlin Heidelberg

Earth-prints Repository

Music Information Retrieval: An Inspirational Guide to Transfer from Related Disciplines

Authors: Hanjalic Alan, Kurth Frank, Liem Cynthia C.S., Weninger Felix
Publication venue: Dagstuhl Follow-Ups. Multimodal Music Processing
Publication date: 01/01/2012

The emerging field of Music Information Retrieval (MIR) has been influenced by neighboring domains in signal processing and machine learning, including automatic speech recognition, image processing and text information retrieval. In this contribution, we start with concrete examples of methodology transfer between speech and music processing, organized around the building blocks of pattern recognition: preprocessing, feature extraction, and classification/decoding. We then take a higher-level viewpoint when describing sources of mutual inspiration derived from text and image information retrieval. We conclude that dealing with the peculiarities of music in MIR research has contributed to advancing the state of the art in other fields, and that many future challenges in MIR are strikingly similar to those that other research areas have been facing.

Dagstuhl Research Online Publication Server