Search CORE

6,794 research outputs found

Automatic Prediction Of Small Group Performance In Information Sharing Tasks

Author: Dong Wen
Lepri Bruno
Pentland Alex
Publication venue
Publication date: 01/01/2012
Field of study

In this paper, we describe a novel approach, based on Markov jump processes, to model small group conversational dynamics and to predict small group performance. More precisely, we estimate conversational events such as turn taking, backchannels, turn-transitions at the micro-level (1 minute windows) and then we bridge the micro-level behavior and the macro-level performance. We tested our approach with a cooperative task, the Information Sharing task, and we verified the relevance of micro- level interaction dynamics in determining a good group performance (e.g. higher speaking turns rate and more balanced participation among group members).Comment: Presented at Collective Intelligence conference, 2012 (arXiv:1204.2991

arXiv.org e-Print Archive

DSpace@MIT

Archivio della ricerca - Fondazione Bruno Kessler

The application of manifold based visual speech units for visual speech recognition

Author: Yu Dahai
Publication venue: Dublin City University. School of Computing
Publication date: 01/11/2008
Field of study

This dissertation presents a new learning-based representation that is referred to as a Visual Speech Unit for visual speech recognition (VSR). The automated recognition of human speech using only features from the visual domain has become a significant research topic that plays an essential role in the development of many multimedia systems such as audio visual speech recognition(AVSR), mobile phone applications, human-computer interaction (HCI) and sign language recognition. The inclusion of the lip visual information is opportune since it can improve the overall accuracy of audio or hand recognition algorithms especially when such systems are operated in environments characterized by a high level of acoustic noise. The main contribution of the work presented in this thesis is located in the development of a new learning-based representation that is referred to as Visual Speech Unit for Visual Speech Recognition (VSR). The main components of the developed Visual Speech Recognition system are applied to: (a) segment the mouth region of interest, (b) extract the visual features from the real time input video image and (c) to identify the visual speech units. The major difficulty associated with the VSR systems resides in the identification of the smallest elements contained in the image sequences that represent the lip movements in the visual domain. The Visual Speech Unit concept as proposed represents an extension of the standard viseme model that is currently applied for VSR. The VSU model augments the standard viseme approach by including in this new representation not only the data associated with the articulation of the visemes but also the transitory information between consecutive visemes. A large section of this thesis has been dedicated to analysis the performance of the new visual speech unit model when compared with that attained for standard (MPEG- 4) viseme models. Two experimental results indicate that: 1. The developed VSR system achieved 80-90% correct recognition when the system has been applied to the identification of 60 classes of VSUs, while the recognition rate for the standard set of MPEG-4 visemes was only 62-72%. 2. 15 words are identified when VSU and viseme are employed as the visual speech element. The accuracy rate for word recognition based on VSUs is 7%-12% higher than the accuracy rate based on visemes

Irish Universities

DCU Online Research Access Service

Recommended from our members

Neural network-based movie dialogue detection

Author: Benetos E.
Kotropoulos C.
Kotti M.
Publication venue
Publication date: 01/01/2007
Field of study

City Research Online

Spiral - Imperial College Digital Repository

Recommended from our members

Real-time decoding of question-and-answer speech dialogue using human cortical activity.

Author: Chang Edward F
Leonard Matthew K
Makin Joseph G
Moses David A
Publication venue: eScholarship, University of California
Publication date: 01/07/2019
Field of study

Natural communication often occurs in dialogue, differentially engaging auditory and sensorimotor brain regions during listening and speaking. However, previous attempts to decode speech directly from the human brain typically consider listening or speaking tasks in isolation. Here, human participants listened to questions and responded aloud with answers while we used high-density electrocorticography (ECoG) recordings to detect when they heard or said an utterance and to then decode the utterance's identity. Because certain answers were only plausible responses to certain questions, we could dynamically update the prior probabilities of each answer using the decoded question likelihoods as context. We decode produced and perceived utterances with accuracy rates as high as 61% and 76%, respectively (chance is 7% and 20%). Contextual integration of decoded question likelihoods significantly improves answer decoding. These results demonstrate real-time decoding of speech in an interactive, conversational setting, which has important implications for patients who are unable to communicate

eScholarship - University of California

Improving fusion of surveillance images in sensor networks using independent component analysis

Author: Bull DR
Canagarajah CN
Cvejic N
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/08/2007
Field of study

Explore Bristol Research

Single-channel source separation using non-negative matrix factorization

Author: Schmidt Mikkel Nørgaard
Publication venue: Technical University of Denmark, DTU Informatics, Building 321
Publication date: 01/01/2009
Field of study

Online Research Database In Technology

Speech Recognition

Author
Publication venue: 'IntechOpen'
Publication date: 20/04/2021
Field of study

Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation for speech signals and the methods for speech-features extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems and in other speech processing applications that are able to operate in real-world environments, like mobile communication services and smart homes

Directory of Open Access Books (DOAB)