224 research outputs found
Chapter From the Lab to the Real World: Affect Recognition Using Multiple Cues and Modalities
The interdisciplinary concept of the dissipative soliton is unfolded in connection with ultrafast fibre lasers. Different mode-locking techniques, as well as experimental realizations of dissipative-soliton fibre lasers, are surveyed briefly with an emphasis on their energy scalability. Basic topics of dissipative-soliton theory are elucidated in connection with the concepts of energy scalability and stability. It is shown that the parametric space of the dissipative soliton has reduced dimension and a comparatively simple structure, which simplifies the analysis and optimization of ultrafast fibre lasers. The main destabilization scenarios are described, and the limits of energy scalability are connected with the impact of optical turbulence and stimulated Raman scattering. The fast and slow dynamics of vector dissipative solitons are exposed.
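Since the abstract only names the theory, here is a hedged sketch of the distributed master model that "dissipative-soliton theory" usually refers to in the fibre-laser context: the cubic-quintic complex Ginzburg-Landau equation, written in one common normalization (this particular form is an assumption, not necessarily the model used in the surveyed work).

```latex
% Cubic-quintic complex Ginzburg-Landau equation (a common dissipative-soliton master model).
% \psi(z,t): slowly varying field envelope; z: propagation distance; t: retarded time.
% Left-hand side: conservative (NLSE-like) terms; right-hand side: gain/loss terms.
% D: group-velocity dispersion, \nu: quintic refractive correction, \delta: net linear gain,
% \beta: spectral filtering, \epsilon: cubic (saturable-absorber) gain, \mu: quintic gain saturation.
\[
  i\,\psi_z + \frac{D}{2}\,\psi_{tt} + |\psi|^{2}\psi + \nu\,|\psi|^{4}\psi
  \;=\; i\,\delta\,\psi + i\,\beta\,\psi_{tt} + i\,\epsilon\,|\psi|^{2}\psi + i\,\mu\,|\psi|^{4}\psi
\]
```

In this picture a stable pulse exists only where gain and loss balance alongside dispersion and nonlinearity, which is consistent with the abstract's point that the effective parametric space collapses to a few dimensions.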
Toward an affect-sensitive multimodal human-computer interaction
The ability to recognize affective states of a person... This paper argues that next-generation human-computer interaction (HCI) designs need to include the essence of emotional intelligence -- the ability to recognize a user's affective states -- in order to become more human-like, more effective, and more efficient. Affective arousal modulates all nonverbal communicative cues (facial expressions, body movements, and vocal and physiological reactions). In face-to-face interaction, humans detect and interpret those interactive signals of their communicator with little or no effort. Yet the design and development of an automated system that accomplishes these tasks is rather difficult. This paper surveys past work on solving these problems by computer and provides a set of recommendations for developing the first part of an intelligent multimodal HCI -- an automatic, personalized analyzer of a user's nonverbal affective feedback.
Machine Understanding of Human Behavior
A widely accepted prediction is that computing will move to the background, weaving itself into the fabric of our everyday living spaces and projecting the human user into the foreground. If this prediction is to come true, then next-generation computing, which we will call human computing, should be about anticipatory user interfaces that are human-centered, built for humans and based on human models. They should transcend the traditional keyboard and mouse to include natural, human-like interactive functions, including understanding and emulating certain human behaviors such as affective and social signaling. This article discusses a number of components of human behavior, how they might be integrated into computers, and how far we are from realizing the front end of human computing, that is, from enabling computers to understand human behavior.
Audio-Visual Speaker Verification via Joint Cross-Attention
Speaker verification has been widely explored using speech signals and has shown significant improvement with deep models. Recently, there has been a surge in exploring faces and voices together, as they can offer more complementary and comprehensive information than relying on the speech modality alone. Though current methods in the literature on the fusion of faces and voices have shown improvement over the individual face or voice modalities, the potential of audio-visual fusion is not fully explored for speaker verification. Most of the existing methods based on audio-visual fusion rely either on score-level fusion or on simple feature concatenation. In this work, we have explored cross-modal joint attention to fully leverage the inter-modal complementary information and the intra-modal information for speaker verification. Specifically, we estimate the cross-attention weights based on the correlation between the joint feature representation and the individual feature representations in order to effectively capture both intra-modal and inter-modal relationships among the faces and voices. We have shown that efficiently leveraging the intra- and inter-modal relationships significantly improves the performance of audio-visual fusion for speaker verification. The performance of the proposed approach has been evaluated on the Voxceleb1 dataset. Results show that the proposed approach can significantly outperform state-of-the-art methods of audio-visual fusion for speaker verification.
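As a rough illustration of the fusion mechanism described above (attention weights derived from the correlation between a joint audio-visual representation and each individual modality), here is a minimal PyTorch sketch. The layer sizes, residual connections, and mean-pooling over time are assumptions made for the sake of a runnable example, not the paper's exact architecture.

```python
# Hedged sketch of joint cross-attention fusion of face and voice embeddings.
# Shapes, projections, and the final pooling are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class JointCrossAttentionFusion(nn.Module):
    def __init__(self, dim_a: int = 512, dim_v: int = 512, dim_j: int = 512):
        super().__init__()
        # Project the concatenated (joint) representation to a shared space.
        self.proj_joint = nn.Linear(dim_a + dim_v, dim_j)
        # Learnable correlation maps between the joint space and each modality.
        self.corr_a = nn.Linear(dim_j, dim_a, bias=False)
        self.corr_v = nn.Linear(dim_j, dim_v, bias=False)

    def forward(self, audio: torch.Tensor, visual: torch.Tensor) -> torch.Tensor:
        # audio:  (batch, seq, dim_a) frame-level voice embeddings
        # visual: (batch, seq, dim_v) frame-level face embeddings
        joint = self.proj_joint(torch.cat([audio, visual], dim=-1))  # (B, S, dim_j)

        # Correlation of each modality with the joint representation.
        corr_a = torch.matmul(audio, self.corr_a(joint).transpose(1, 2))   # (B, S, S)
        corr_v = torch.matmul(visual, self.corr_v(joint).transpose(1, 2))  # (B, S, S)

        # Cross-attention weights from the scaled correlations.
        attn_a = F.softmax(corr_a / audio.shape[-1] ** 0.5, dim=-1)
        attn_v = F.softmax(corr_v / visual.shape[-1] ** 0.5, dim=-1)

        # Re-weight each modality using attention derived from the joint features,
        # keep a residual connection, then pool over time and concatenate.
        audio_att = audio + torch.matmul(attn_a, audio)
        visual_att = visual + torch.matmul(attn_v, visual)
        fused = torch.cat([audio_att.mean(dim=1), visual_att.mean(dim=1)], dim=-1)
        return fused  # (B, dim_a + dim_v) joint embedding for verification scoring
```

In this sketch the joint representation acts as a reference against which each modality attends, so frames where face and voice agree receive more weight than either stream would assign on its own.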
Review of constraints on vision-based gesture recognition for human–computer interaction
The ability of computers to recognise hand gestures visually is essential for progress in human-computer interaction. Gesture recognition has applications ranging from sign language to medical assistance to virtual reality. However, gesture recognition is extremely challenging, not only because of its diverse contexts, multiple interpretations, and spatio-temporal variations, but also because of the complex non-rigid properties of the hand. This study surveys major constraints on vision-based gesture recognition occurring in detection and pre-processing, representation and feature extraction, and recognition. Current challenges are explored in detail.
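For orientation, the sketch below lays out the three stages the review organises its constraints around: detection and pre-processing, representation and feature extraction, and recognition. Skin-colour thresholding, Hu-moment features, and a k-NN classifier are deliberately simple placeholders chosen as assumptions for a self-contained example, not methods endorsed by the survey.

```python
# Hedged three-stage gesture-recognition pipeline sketch (OpenCV 4.x + scikit-learn).
from typing import List, Optional

import cv2
import numpy as np
from sklearn.neighbors import KNeighborsClassifier


def detect_hand(frame_bgr: np.ndarray) -> Optional[np.ndarray]:
    """Detection / pre-processing: crude skin-colour segmentation in HSV."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (0, 30, 60), (20, 150, 255))  # assumed skin-colour range
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    return max(contours, key=cv2.contourArea)  # largest blob taken as the hand


def extract_features(contour: np.ndarray) -> np.ndarray:
    """Representation / feature extraction: 7 Hu moments (scale/rotation tolerant)."""
    hu = cv2.HuMoments(cv2.moments(contour)).flatten()
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-12)  # log scale for numeric stability


def train_recognizer(samples: List[np.ndarray], labels: List[str]) -> KNeighborsClassifier:
    """Recognition: a nearest-neighbour classifier over the extracted features."""
    clf = KNeighborsClassifier(n_neighbors=3)
    clf.fit(np.stack(samples), labels)
    return clf
```

Each stage in this skeleton corresponds to one of the constraint categories the review discusses, which is where illumination, occlusion, viewpoint, and hand non-rigidity typically break such simple choices.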