6,794 research outputs found
Automatic Prediction Of Small Group Performance In Information Sharing Tasks
In this paper, we describe a novel approach, based on Markov jump processes,
to model small group conversational dynamics and to predict small group
performance. More precisely, we estimate conversational events such as turn
taking, backchannels, turn-transitions at the micro-level (1 minute windows)
and then we bridge the micro-level behavior and the macro-level performance. We
tested our approach with a cooperative task, the Information Sharing task, and
we verified the relevance of micro- level interaction dynamics in determining a
good group performance (e.g. higher speaking turns rate and more balanced
participation among group members).Comment: Presented at Collective Intelligence conference, 2012
(arXiv:1204.2991
The application of manifold based visual speech units for visual speech recognition
This dissertation presents a new learning-based representation that is referred to as a Visual
Speech Unit for visual speech recognition (VSR). The automated recognition of human speech using only features from the visual domain has become a significant research topic that plays an essential role in the development of many multimedia systems such as audio visual speech recognition(AVSR), mobile phone applications, human-computer interaction (HCI) and sign language recognition. The inclusion of the lip visual information is opportune since it can improve the overall accuracy of audio or hand recognition algorithms especially when such systems are operated in environments characterized by a high level of acoustic noise.
The main contribution of the work presented in this thesis is located in the development of a new learning-based representation that is referred to as Visual Speech
Unit for Visual Speech Recognition (VSR). The main components of the developed Visual Speech Recognition system are applied to: (a) segment the mouth region of
interest, (b) extract the visual features from the real time input video image and (c) to identify the visual speech units. The major difficulty associated with the VSR systems resides in the identification of the smallest elements contained in the image sequences that represent the lip movements in the visual domain.
The Visual Speech Unit concept as proposed represents an extension of the standard viseme model that is currently applied for VSR. The VSU model augments the standard viseme approach by including in this new representation not only the data associated with the articulation of the visemes but also the transitory information between consecutive
visemes. A large section of this thesis has been dedicated to analysis the performance of the new visual speech unit model when compared with that attained for standard (MPEG-
4) viseme models. Two experimental results indicate that:
1. The developed VSR system achieved 80-90% correct recognition when the system has been applied to the identification of 60 classes of VSUs, while the
recognition rate for the standard set of MPEG-4 visemes was only 62-72%.
2. 15 words are identified when VSU and viseme are employed as the visual speech element. The accuracy rate for word recognition based on VSUs is 7%-12% higher than the accuracy rate based on visemes
Recommended from our members
Real-time decoding of question-and-answer speech dialogue using human cortical activity.
Natural communication often occurs in dialogue, differentially engaging auditory and sensorimotor brain regions during listening and speaking. However, previous attempts to decode speech directly from the human brain typically consider listening or speaking tasks in isolation. Here, human participants listened to questions and responded aloud with answers while we used high-density electrocorticography (ECoG) recordings to detect when they heard or said an utterance and to then decode the utterance's identity. Because certain answers were only plausible responses to certain questions, we could dynamically update the prior probabilities of each answer using the decoded question likelihoods as context. We decode produced and perceived utterances with accuracy rates as high as 61% and 76%, respectively (chance is 7% and 20%). Contextual integration of decoded question likelihoods significantly improves answer decoding. These results demonstrate real-time decoding of speech in an interactive, conversational setting, which has important implications for patients who are unable to communicate
Speech Recognition
Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation for speech signals and the methods for speech-features extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes
- âŚ