15 research outputs found

    Towards Multimodal Prediction of Spontaneous Humour: A Novel Dataset and First Results

    Humour is a substantial element of human affect and cognition. Its automatic understanding can facilitate more naturalistic human-device interaction and the humanisation of artificial intelligence. Current methods of humour detection are based solely on staged data, making them inadequate for 'real-world' applications. We address this deficiency by introducing the novel Passau Spontaneous Football Coach Humour (Passau-SFCH) dataset, comprising about 11 hours of recordings. The Passau-SFCH dataset is annotated for the presence of humour and its dimensions (sentiment and direction) as proposed in Martin's Humor Style Questionnaire. We conduct a series of experiments employing pretrained Transformers, convolutional neural networks, and expert-designed features. The performance of each modality (text, audio, video) for spontaneous humour recognition is analysed, and their complementarity is investigated. Our findings suggest that, for the automatic analysis of humour and its sentiment, facial expressions are most promising, while humour direction is best modelled via text-based features. The results reveal considerable differences among subjects, highlighting the individuality of humour usage and style. Further, we observe that decision-level fusion yields the best recognition result. Finally, we make our code publicly available at https://www.github.com/EIHW/passau-sfch. The Passau-SFCH dataset is available upon request.
    Comment: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible (Major Revision).
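    The decision-level fusion reported as best-performing can be sketched as a weighted average of per-modality humour probabilities. The function name and weights below are illustrative assumptions, not the paper's implementation:

```python
# Hypothetical sketch of decision-level (late) fusion: each modality's
# classifier outputs a humour probability per segment, and the final
# decision combines them. The weights here are assumptions for
# illustration; the abstract suggests video (facial expressions) is the
# strongest single modality, so it gets the largest weight.

def late_fusion(text_p, audio_p, video_p, weights=(0.2, 0.3, 0.5)):
    """Fuse per-modality humour probabilities by weighted averaging."""
    wt, wa, wv = weights
    return (wt * text_p + wa * audio_p + wv * video_p) / sum(weights)

fused = late_fusion(0.4, 0.5, 0.9)
is_humorous = fused >= 0.5  # final binary humour decision
```

    A practical advantage of late fusion over feature-level fusion is that each modality's model can be trained and tuned independently.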

    Zero-shot personalization of speech foundation models for depressed mood monitoring

    The monitoring of depressed mood plays an important role as a diagnostic tool in psychotherapy. An automated analysis of speech can provide a non-invasive measurement of a patient's affective state. While speech has been shown to be a useful biomarker for depression, existing approaches mostly build population-level models that aim to predict each individual's diagnosis as a (mostly) static property. Because of inter-individual differences in symptomatology and mood regulation behaviors, these approaches are ill-suited to detect smaller temporal variations in depressed mood. We address this issue by introducing a zero-shot personalization of large speech foundation models. Compared with other personalization strategies, our work does not require labeled speech samples for enrollment. Instead, the approach makes use of adapters conditioned on subject-specific metadata. On a longitudinal dataset, we show that the method improves performance compared with a set of suitable baselines. Finally, applying our personalization strategy improves individual-level fairness.
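    The metadata-conditioned adapter idea could be sketched, under loose assumptions, as a FiLM-style scale-and-shift applied to a frozen foundation-model embedding. All names, shapes, and the choice of conditioning mechanism below are hypothetical, not the paper's architecture:

```python
# Minimal sketch of zero-shot personalisation via a metadata-conditioned
# adapter. A frozen foundation model yields an embedding; numerically
# encoded subject metadata predicts per-dimension gains (gamma) and
# offsets (beta), so no labelled enrollment speech is required.
# All names and shapes are illustrative assumptions.

def metadata_adapter(embedding, metadata, w_scale, w_shift):
    """Return gamma(metadata) * embedding + beta(metadata), element-wise."""
    gamma = [sum(w * m for w, m in zip(row, metadata)) for row in w_scale]
    beta = [sum(w * m for w, m in zip(row, metadata)) for row in w_shift]
    return [g * h + b for g, h, b in zip(gamma, embedding, beta)]
```

    Because the adapter parameters are predicted from metadata rather than fitted per subject, a new patient can be "personalised" at inference time without any labelled enrollment data.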

    Ecology & computer audition: applications of audio technology to monitor organisms and environment

    Among the 17 Sustainable Development Goals (SDGs) proposed within the 2030 Agenda and adopted by all the United Nations member states, the 13th SDG is a call for action to combat climate change. Moreover, SDGs 14 and 15 call for the protection and conservation of life below water and life on land, respectively. In this work, we provide a literature-founded overview of application areas in which computer audition – a powerful technology combining audio signal processing and machine intelligence, so far hardly considered in this context – is employed to monitor our ecosystem, with the potential to identify ecologically critical processes or states. We distinguish between applications related to organisms, such as species richness analysis and plant health monitoring, and applications related to the environment, such as melting ice monitoring or wildfire detection. This work positions computer audition in relation to alternative approaches by discussing methodological strengths and limitations, as well as ethical aspects. We conclude with an urgent call to action to the research community for a greater involvement of audio intelligence methodology in future ecosystem monitoring approaches.

    Audio self-supervised learning: a survey

    Inspired by humans' cognitive ability to generalise knowledge and skills, Self-Supervised Learning (SSL) aims to discover general representations from large-scale data without requiring human annotations, which are expensive and time-consuming to obtain. Its success in the fields of computer vision and natural language processing has prompted its recent adoption in audio and speech processing. Comprehensive reviews summarising the knowledge in audio SSL are currently missing. To fill this gap, in the present work, we provide an overview of the SSL methods used for audio and speech processing applications. We also summarise the empirical works that exploit the audio modality in multi-modal SSL frameworks, as well as the existing benchmarks suitable for evaluating the power of SSL in the computer audition domain. Finally, we discuss open problems and point out future directions for the development of audio SSL.

    The MuSe 2022 Multimodal Sentiment Analysis Challenge: Humor, Emotional Reactions, and Stress

    The Multimodal Sentiment Analysis Challenge (MuSe) 2022 is dedicated to multimodal sentiment and emotion recognition. For this year's challenge, we feature three datasets: (i) the Passau Spontaneous Football Coach Humor (Passau-SFCH) dataset, which contains audio-visual recordings of German football coaches labelled for the presence of humour; (ii) the Hume-Reaction dataset, in which reactions of individuals to emotional stimuli have been annotated with respect to seven emotional expression intensities; and (iii) the Ulm-Trier Social Stress Test (Ulm-TSST) dataset, comprising audio-visual data labelled with continuous emotion values (arousal and valence) of people in stressful dispositions. Using the introduced datasets, MuSe 2022 addresses three contemporary affective computing problems: in the Humor Detection Sub-Challenge (MuSe-Humor), spontaneous humour has to be recognised; in the Emotional Reactions Sub-Challenge (MuSe-Reaction), seven fine-grained `in-the-wild' emotions have to be predicted; and in the Emotional Stress Sub-Challenge (MuSe-Stress), a continuous prediction of stressed emotion values is featured. The challenge is designed to attract different research communities, encouraging a fusion of their disciplines. Mainly, MuSe 2022 targets the communities of audio-visual emotion recognition, health informatics, and symbolic sentiment analysis. This baseline paper describes the datasets as well as the feature sets extracted from them. A recurrent neural network with LSTM cells is used to set competitive baseline results on the test partitions of each sub-challenge. We report an Area Under the Curve (AUC) of .8480 for MuSe-Humor, a mean (across the seven classes) Pearson's Correlation Coefficient of .2801 for MuSe-Reaction, and Concordance Correlation Coefficient (CCC) values of .4931 and .4761 for valence and arousal, respectively, in MuSe-Stress.
    Comment: Preliminary baseline paper for the 3rd Multimodal Sentiment Analysis Challenge (MuSe) 2022, a full-day workshop at ACM Multimedia 2022.
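    The Concordance Correlation Coefficient used to score MuSe-Stress is Lin's standard concordance measure, which penalises both decorrelation and mean/scale bias. A minimal sketch (not the official challenge evaluation code):

```python
# Concordance Correlation Coefficient (Lin's CCC), as used for
# continuous arousal/valence evaluation in MuSe-Stress.
from statistics import mean

def ccc(y_true, y_pred):
    """CCC = 2*cov / (var_true + var_pred + (mu_true - mu_pred)^2)."""
    mu_t, mu_p = mean(y_true), mean(y_pred)
    var_t = mean((t - mu_t) ** 2 for t in y_true)
    var_p = mean((p - mu_p) ** 2 for p in y_pred)
    cov = mean((t - mu_t) * (p - mu_p) for t, p in zip(y_true, y_pred))
    return 2 * cov / (var_t + var_p + (mu_t - mu_p) ** 2)
```

    Unlike plain Pearson correlation, CCC drops below 1 for predictions that are perfectly correlated with the target but offset or rescaled, which is why it is favoured for continuous emotion traces.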

    Personalised depression forecasting using mobile sensor data and ecological momentary assessment

    Introduction: Digital health interventions are an effective way to treat depression, but it is still largely unclear how patients' individual symptoms evolve dynamically during such treatments. Data-driven forecasts of depressive symptoms would make it possible to greatly improve the personalisation of treatments. In current forecasting approaches, models are often trained on an entire population, resulting in a general model that works overall but does not translate well to each individual in clinically heterogeneous, real-world populations. Model fairness across patient subgroups is also frequently overlooked. Personalised models tailored to the individual patient may therefore be promising.
    Methods: We investigate different personalisation strategies using transfer learning, subgroup models, and subject-dependent standardisation on a newly collected, longitudinal dataset of depression patients undergoing treatment with a digital intervention (N=65 patients recruited). Both passive mobile sensor data and ecological momentary assessments were available for modelling. We evaluated the models' ability to predict symptoms of depression (Patient Health Questionnaire-2; PHQ-2) at the end of each day, and to forecast symptoms of the next day.
    Results: In our experiments, we achieve a best mean absolute error (MAE) of 0.801 (a 25% improvement) for predicting PHQ-2 values at the end of the day with subject-dependent standardisation, compared with a non-personalised baseline (MAE=1.062). For one-day-ahead forecasting, we improve on the baseline of 1.539 by 12%, to an MAE of 1.349, using a transfer learning approach with shared common layers. In addition, personalisation leads to fairer models at group level.
    Discussion: Our results suggest that personalisation using subject-dependent standardisation and transfer learning can improve predictions and forecasts, respectively, of depressive symptoms in participants of a digital depression intervention. We discuss technical and clinical limitations of this approach, avenues for future investigations, and how personalised machine learning architectures may be implemented to improve existing digital interventions for depression.
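    The best-performing strategy, subject-dependent standardisation, can be sketched as z-scoring each subject's features with that subject's own statistics, removing stable inter-individual offsets before modelling. The function and data layout below are illustrative assumptions:

```python
# Sketch of subject-dependent standardisation: each subject's feature
# values are z-scored using that subject's own mean and standard
# deviation, so the model sees within-person deviations rather than
# absolute levels that differ between individuals.
from statistics import mean, pstdev

def standardise_per_subject(samples):
    """`samples`: dict mapping subject id -> list of feature values.
    Returns the same mapping with each subject's values z-scored."""
    out = {}
    for subject, values in samples.items():
        mu = mean(values)
        sigma = pstdev(values) or 1.0  # guard against zero variance
        out[subject] = [(v - mu) / sigma for v in values]
    return out
```

    This is a simple form of personalisation: it requires no extra model parameters, only enough unlabelled history per subject to estimate a stable mean and variance.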

    Toward Detecting and Addressing Corner Cases in Deep Learning Based Medical Image Segmentation

    Translating machine learning research into clinical practice has several challenges. In this paper, we identify some critical issues in translating research to clinical practice in the context of medical image segmentation and propose strategies to systematically address these challenges. Specifically, we focus on cases where the model yields erroneous segmentation, which we define as corner cases. One of the standard metrics used for reporting the performance of medical image segmentation algorithms is the average Dice score across all patients. We have discovered that this aggregate reporting has the inherent drawback that corner cases, where the algorithm or model has erroneous performance or very low metrics, go unnoticed. Due to this reporting, models that report superior performance could end up producing completely erroneous, or even anatomically impossible, results in a few challenging cases, albeit without being noticed. We have demonstrated how corner cases go unnoticed using the Magnetic Resonance (MR) cardiac image segmentation task of the Automated Cardiac Diagnosis Challenge (ACDC). To counter this drawback, we propose a framework that helps to identify and report corner cases. Further, we propose a novel balanced checkpointing scheme capable of finding a solution that has superior performance even on these corner cases. Our proposed scheme leads to an improvement of 44.6% for LV, 46.1% for RV, and 38.1% for the myocardium on our identified corner case in the ACDC segmentation challenge. Further, we establish the generalisability of our proposed framework by also demonstrating its applicability in the context of chest X-ray lung segmentation. This framework has broader applications across multiple deep learning tasks, even beyond medical image segmentation.
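    The reporting drawback described above is easy to demonstrate: an average Dice score can look strong while individual patients fail badly. A minimal sketch (the threshold and function names are assumptions, not the paper's framework):

```python
# Sketch of per-patient Dice reporting versus aggregate reporting.
# Averaging across patients can mask "corner cases" where a model
# fails almost completely; flagging patients below a threshold
# surfaces them. The 0.5 threshold is an illustrative assumption.

def dice(pred, target):
    """Dice coefficient between two binary masks (flat lists of 0/1)."""
    inter = sum(p * t for p, t in zip(pred, target))
    return 2 * inter / (sum(pred) + sum(target))

def find_corner_cases(per_patient_dice, threshold=0.5):
    """Return the aggregate mean Dice and the patients it hides."""
    avg = sum(per_patient_dice.values()) / len(per_patient_dice)
    corner = {p: d for p, d in per_patient_dice.items() if d < threshold}
    return avg, corner
```

    For example, scores of {0.9, 0.95, 0.1} average to a respectable 0.65 while one patient's segmentation is essentially unusable, which is exactly the failure mode aggregate reporting hides.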

    Computational charisma—A brick by brick blueprint for building charismatic artificial intelligence

    Charisma is considered to be one's ability to attract and potentially influence others. Clearly, there is considerable interest, from an artificial intelligence (AI) perspective, in providing machines with such a skill. Beyond this, a plethora of use cases opens up for the computational measurement of human charisma, such as tutoring humans in the acquisition of charisma, mediating human-to-human conversation, or identifying charismatic individuals in big social data. While charisma is a subject of research in its own right, a number of models exist that base it on various "pillars," that is, dimensions, often following the idea that charisma is given if someone could and would help others. Examples of such pillars therefore include influence (could help) and affability (would help) in scientific studies, or power (could help), presence, and warmth (both would help) as a popular concept. Modelling high levels in these dimensions, i.e., high influence and high affability, or high power, presence, and warmth, for charismatic AI of the future, e.g., for humanoid robots or virtual agents, seems accomplishable. Beyond generation, automatic measurement also appears quite feasible given the recent advances in the related fields of Affective Computing and Social Signal Processing. Here, we therefore present a brick by brick blueprint for building machines that can appear charismatic, but also analyse the charisma of others. We first approach the topic very broadly and discuss how the foundation of charisma is defined from a psychological perspective. Throughout the manuscript, the building blocks (bricks) then become more specific and provide concrete groundwork for capturing charisma through AI. Following the introduction of the concept of charisma, we switch to charisma in spoken language as an exemplary modality that is essential for human-human and human-computer conversations. The computational perspective then deals with the recognition and generation of charismatic behavior by AI. This includes an overview of the state of play in the field and the aforementioned blueprint. We then list exemplary use cases of computational charismatic skills. The building blocks of application domains and ethics conclude the article.

    The MuSe 2023 Multimodal Sentiment Analysis Challenge: Mimicked Emotions, Cross-Cultural Humour, and Personalisation

    MuSe 2023 is a set of shared tasks addressing three different contemporary multimodal affect and sentiment analysis problems: In the Mimicked Emotions Sub-Challenge (MuSe-Mimic), participants predict three continuous emotion targets. This sub-challenge utilises the Hume-Vidmimic dataset, comprising user-generated videos. For the Cross-Cultural Humour Detection Sub-Challenge (MuSe-Humour), an extension of the Passau Spontaneous Football Coach Humour (Passau-SFCH) dataset is provided. Participants predict the presence of spontaneous humour in a cross-cultural setting. The Personalisation Sub-Challenge (MuSe-Personalisation) is based on the Ulm-Trier Social Stress Test (Ulm-TSST) dataset, featuring recordings of subjects in a stressed situation. Here, arousal and valence signals are to be predicted, while parts of the test labels are made available in order to facilitate personalisation. MuSe 2023 seeks to bring together a broad audience from different research communities such as audio-visual emotion recognition, natural language processing, signal processing, and health informatics. In this baseline paper, we introduce the datasets, sub-challenges, and provided feature sets. As a competitive baseline system, a Gated Recurrent Unit (GRU)-Recurrent Neural Network (RNN) is employed. On the respective sub-challenges' test datasets, it achieves a mean (across three continuous intensity targets) Pearson's Correlation Coefficient of .4727 for MuSe-Mimic, an Area Under the Curve (AUC) value of .8310 for MuSe-Humour, and Concordance Correlation Coefficient (CCC) values of .7482 for arousal and .7827 for valence in the MuSe-Personalisation sub-challenge.
    Comment: Baseline paper for the 4th Multimodal Sentiment Analysis Challenge (MuSe) 2023, a workshop at ACM Multimedia 2023.
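    The MuSe-Mimic measure, a mean Pearson correlation across the three continuous intensity targets, can be computed as below. This is a sketch of the standard formula, not the official evaluation script:

```python
# Mean Pearson's Correlation Coefficient across several continuous
# targets, as used to score MuSe-Mimic (three intensity targets).
from statistics import mean

def pearson(x, y):
    """Standard Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x)
           * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def mean_pearson(targets_true, targets_pred):
    """Average the per-target correlations (one pair per target)."""
    return mean(pearson(t, p) for t, p in zip(targets_true, targets_pred))
```

    Averaging per-target correlations means a model must do reasonably well on all three emotion targets; excelling on one cannot fully compensate for failing on another.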

    HEAR4Health: a blueprint for making computer audition a staple of modern healthcare

    Recent years have seen a rapid increase in digital medicine research in an attempt to transform traditional healthcare systems into their modern, intelligent, and versatile equivalents that are adequately equipped to tackle contemporary challenges. This has led to a wave of applications that utilise AI technologies, first and foremost in the field of medical imaging, but also in the use of wearables and other intelligent sensors. In comparison, computer audition can be seen to be lagging behind, at least in terms of commercial interest. Yet, audition has long been a staple assistant for medical practitioners, with the stethoscope being the quintessential sign of doctors around the world. Transforming this traditional technology with the use of AI entails a set of unique challenges. We categorise the advances needed in four key pillars: Hear, corresponding to the cornerstone technologies needed to analyse auditory signals in real-life conditions; Earlier, for the advances needed in computational and data efficiency; Attentively, for accounting for individual differences and handling the longitudinal nature of medical data; and, finally, Responsibly, for ensuring compliance with the ethical standards accorded to the field of medicine.