Continuous Emotion Prediction from Speech: Modelling Ambiguity in Emotion
There is growing interest in emotion research in modelling perceived emotion labelled as intensities along affect dimensions such as arousal and valence. These labels are typically obtained from multiple annotators, each with an individual perception of emotional speech. Consequently, emotion prediction models that incorporate variation in individual perceptions as ambiguity in the emotional state would be more realistic. This thesis develops the modelling framework necessary to achieve continuous prediction of ambiguous emotional states from speech. Besides emotion labels, feature space distribution and encoding are an integral part of the prediction system. The first part of this thesis examines the limitations of current low-level feature distributions and their minimalistic statistical descriptions. Specifically, front-end paralinguistic acoustic features reflect speech production mechanisms, whereas discriminatively learnt features, although they have frequently outperformed acoustic features in emotion prediction tasks, provide no insight into their physical significance. One of the contributions of this thesis is the development of a framework that can modify the acoustic feature representation based on emotion label information. Another investigation in this thesis indicates that emotion perception is language-dependent, which in turn helped develop a framework for cross-language emotion prediction. Furthermore, this investigation supported the hypothesis that emotion perception is highly individualistic and is better modelled as a distribution than as a point estimate, so as to encode information about the ambiguity in the perceived emotion. Following this observation, the thesis proposes measures to quantify the appropriateness of distribution types in modelling ambiguity in dimensional emotion labels, which are then employed to compare well-known bounded parametric distributions.
These analyses led to the conclusion that the beta distribution was the most appropriate parametric model of ambiguity in emotion labels. Finally, the thesis focuses on developing a deep learning framework for continuous emotion prediction as a temporal series of beta distributions, examining various parameterizations of the beta distributions as well as loss functions. Furthermore, the distribution over the parameter space is examined, and priors from kernel density estimation are employed to shape the posteriors over the parameter space, which significantly improved valence ambiguity predictions. The proposed frameworks and methods have been extensively evaluated on multiple state-of-the-art databases, and the results demonstrate both the viability of predicting ambiguous emotion states and the validity of the proposed systems.
Active inference: building a new bridge between control theory and embodied cognitive science
The application of Bayesian techniques to the study and computational modelling of biological systems is one of the most remarkable advances in the natural and cognitive sciences over the last 50 years. More recently, it has been proposed that Bayesian frameworks are not only useful for building descriptive models of biological functions, but that living systems themselves can be seen as Bayesian (inference) machines. On this view, the statistical tools more traditionally used to account for data in biology, neuroscience and psychology are now used to model the mechanisms underlying functions and properties of living systems as if the systems themselves were the ones “calculating” those probabilities following Bayesian inference schemes. The free energy principle (FEP) is a framework proposed in light of this paradigm shift, advocating the minimisation of variational free energy, a proxy for sensory surprisal, as a general computational principle for biological systems. More intuitively and under some simplifying assumptions, the minimisation of variational free energy reduces, for an agent, to the minimisation of prediction errors on sensory input. Initially proposed as a candidate unifying theory of brain functioning, the FEP was later extended to encompass hypotheses on the origins of life, and is nowadays discussed in the cognitive science community for its possible implications for theories of the mind. In particular, one of the most popular process theories derived from the FEP, active inference, describes a biologically plausible algorithmic implementation of this principle with several repercussions on our understanding of cognition. In this thesis, I will focus on the role of this process theory for action and perception. In active inference, the two of them are combined in a closed sensorimotor loop as co-dependent processes of minimisation of a single loss function, variational free energy, with respect to different sets of variables.
Building on this, I will suggest that some of the core ideas of active inference are best seen in terms of enactive, embodied, extended and embedded (4E) theories, in contrast to the majority of the literature emphasising its apparent connections to more traditional, computational accounts of the mind. In particular, I will develop this argument by focusing on some proposals central to 4E approaches: (a) the non-brain-centric nature of cognitive processes, (b) the lack of explicit representations of the world, (c) the coupling of agent-environment systems and (d) the necessity of real-time feedback signals from the environment. Under the FEP formulation, I will present a series of case studies with two main objectives in mind: 1) to conceptually analyse and reframe these 4E ideas in the context of active inference, arguing for the advantages of their formalisation in a more general probabilistic (Bayesian) framework, and 2) to present new mathematical models and agent-based implementations of some of the conceptual connections between Bayesian inference frameworks and 4E proposals, largely missing in the literature.
Perception and Navigation in Autonomous Systems in the Era of Learning: A Survey
Autonomous systems possess the features of inferring their own state,
understanding their surroundings, and performing autonomous navigation. With
the applications of learning systems, like deep learning and reinforcement
learning, the visual-based self-state estimation, environment perception and
navigation capabilities of autonomous systems have been efficiently addressed,
and many new learning-based algorithms have surfaced with respect to autonomous
visual perception and navigation. In this review, we focus on the applications
of learning-based monocular approaches in ego-motion perception, environment
perception and navigation in autonomous systems, which is different from
previous reviews that discussed traditional methods. First, we delineate the
shortcomings of existing classical visual simultaneous localization and mapping
(vSLAM) solutions, which demonstrate the necessity to integrate deep learning
techniques. Second, we review the visual-based environmental perception and
understanding methods based on deep learning, including deep learning-based
monocular depth estimation, monocular ego-motion prediction, image enhancement,
object detection, semantic segmentation, and their combinations with
traditional vSLAM frameworks. Then, we focus on the visual navigation based on
learning systems, mainly including reinforcement learning and deep
reinforcement learning. Finally, we examine several challenges and promising
directions discussed and concluded in related research of learning systems in
the era of computer science and robotics.
Comment: This paper has been accepted by IEEE TNNL
Deep Neural Networks and Data for Automated Driving
This open access book brings together the latest developments from industry and research on automated driving and artificial intelligence. Environment perception for highly automated driving heavily employs deep neural networks, facing many challenges. How much data do we need for training and testing? How to use synthetic data to save labeling costs for training? How do we increase robustness and decrease memory usage? For inevitably poor conditions: How do we know that the network is uncertain about its decisions? Can we understand a bit more about what actually happens inside neural networks? This leads to a very practical problem particularly for DNNs employed in automated driving: What are useful validation techniques and how about safety? This book unites the views from both academia and industry, where computer vision and machine learning meet environment perception for highly automated driving. Naturally, aspects of data, robustness, uncertainty quantification, and, last but not least, safety are at the core of it. This book is unique: In its first part, an extended survey of all the relevant aspects is provided. The second part contains the detailed technical elaboration of the various questions mentioned above
Speaker Diarization
The thesis focuses on the topic of speaker diarization, a speech processing task that is commonly characterized as the question "Who speaks when?". It also addresses the related task of overlapping speech detection, which is very relevant for diarization.
The theoretical part of the thesis provides an overview of existing diarization approaches, both offline and online, and discusses some of the problematic areas which were identified in early stages of the author's research. The thesis also includes an extensive comparison of existing diarization systems, with focus on their reported performance. One chapter is also dedicated to the topic of overlapping speech and the methods of its detection.
The experimental part of the thesis then presents the work done on speaker diarization, which focused mostly on a GMM-based online diarization system and an i-vector-based system with both offline and online variants. The final section also details a newly proposed approach for detecting overlapping speech using a convolutional neural network.
Memory Models for Incremental Learning Architectures
Losing V. Memory Models for Incremental Learning Architectures. Bielefeld: Universität Bielefeld; 2019.
Technological advancement constantly leads to an exponential growth of generated data in basically every domain, drastically increasing the burden of data storage and maintenance. Most of the data is instantaneously extracted and available in the form of endless streams that contain the most current information. Machine learning methods constitute one fundamental way of processing such data automatically, as they generate models that capture the processes behind the data. They are omnipresent in our everyday life, as their applications include personalized advertising, recommendations, fraud detection, surveillance, credit ratings, high-speed trading and smart-home devices. Batch learning, denoting the offline construction of a static model based on large datasets, is the predominant scheme. However, it is increasingly unfit to deal with the accumulating masses of data in the given time and, in particular, its static nature cannot handle changing patterns. In contrast, incremental learning constitutes one attractive alternative that is a very natural fit for the current demands. Its dynamic adaptation allows continuous processing of data streams, without the necessity to store all data from the past, and results in always up-to-date models that are even able to perform in non-stationary environments. In this thesis, we will tackle crucial research questions in the domain of incremental learning by contributing new algorithms or significantly extending existing ones. We consider stationary and non-stationary environments and present multiple real-world applications that showcase the merits of the methods as well as their versatility. The main contributions are the following:
One novel approach that addresses the question of how to extend a model for prototype-based algorithms based on cost minimization.
We propose local split-time prediction for incremental decision trees to mitigate the trade-off between adaptation speed versus model complexity and run time.
An extensive survey of the strengths and weaknesses of state-of-the-art methods that provides guidance for choosing a suitable algorithm for a given task.
One new approach to extract valuable information about the type of change in a dataset.
We contribute a biologically inspired architecture, able to handle different types of drift using dedicated memories that are kept consistent.
Application of the novel methods within three diverse real-world tasks, highlighting their robustness and versatility.
Investigation of personalized online models in the context of two real-world applications
Multisensory and sensorimotor origins of the sense of self
Cognitive neuroscience has increasingly focused on studying the subject, i.e. the self, of conscious experience. In order to be the subject of an experience, we generally experience owning a physical body, being located within that body, and being able to distinguish the body and its actions from others. These pre-reflective experiences are based on brain mechanisms of multisensory and sensorimotor integration. In this thesis I investigated how our sense of self, in particular the senses of body ownership and of agency, depend on multimodal bodily signals. I achieved this by using approaches developed by cognitive neuroscience to study how the sense of self relates to the processing of bodily signals: creating bodily illusions with multisensory conflicts through the use of virtual reality and robotics. The first part of this thesis describes the investigation of the sense of body ownership in healthy subjects and in spinal cord injury patients, achieved by inducing conflicts between tactile information and visual feedback. The research presented in the second part of the thesis is centered on the experience of self-touch. There, I have first investigated how the manipulation of reference frames influences the perception of the illusion of self-touch, and second, how active self-touch influences the sense of body ownership. Lastly, in the third part of the thesis, I investigated how experimentally induced multisensory and sensorimotor conflicts perturb the sense of self in healthy subjects and induce experiences similar to certain symptoms observed in neurological and psychiatric disorders. I show that particular conflicts between bodily signals not only affect body perception and sense of agency for motor actions but also propagate to higher levels and influence even the sense of agency for mental representations in healthy subjects. 
Finally, I discuss my results and their relation to existing knowledge on bodily self-consciousness and position them in a broader picture of our current understanding of the self
Haptics: Science, Technology, Applications
This open access book constitutes the proceedings of the 12th International Conference on Human Haptic Sensing and Touch Enabled Computer Applications, EuroHaptics 2020, held in Leiden, The Netherlands, in September 2020. The 60 papers presented in this volume were carefully reviewed and selected from 111 submissions. They were organized in topical sections on haptic science, haptic technology, and haptic applications. This year's focus is on accessibility.
Haptics: Science, Technology, Applications
This open access book constitutes the proceedings of the 13th International Conference on Human Haptic Sensing and Touch Enabled Computer Applications, EuroHaptics 2022, held in Hamburg, Germany, in May 2022. The 36 regular papers included in this book were carefully reviewed and selected from 129 submissions. They were organized in topical sections as follows: haptic science; haptic technology; and haptic applications