2,951 research outputs found
Using multiple visual tandem streams in audio-visual speech recognition
The method called the "tandem approach" in speech recognition has been shown to increase performance by using classifier posterior probabilities as observations in a hidden Markov model. We study the effect of using visual tandem features in audio-visual speech recognition with a novel setup that uses multiple classifiers to obtain multiple visual tandem features. We adopt the multi-stream hidden Markov model approach, in which visual tandem features from two different classifiers are treated as additional streams in the model. Our experiments show that using multiple visual tandem features improves recognition accuracy in various noise conditions. In addition, to handle asynchrony between audio and visual observations, we employ coupled hidden Markov models and obtain improved performance compared to the synchronous model.
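As a rough illustration of the tandem idea described above (not the paper's actual pipeline), the sketch below builds observation vectors by appending per-frame classifier posteriors, from two hypothetical visual classifiers, to acoustic features; all dimensions and the random "logits" are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(logits):
    # Convert raw classifier scores into posterior probabilities per frame.
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Toy data: T frames of acoustic features and logits from two visual classifiers.
T, D, C = 5, 13, 4                       # frames, acoustic dim, number of classes
acoustic = rng.normal(size=(T, D))
visual_logits_a = rng.normal(size=(T, C))
visual_logits_b = rng.normal(size=(T, C))

# Tandem features: each classifier's posteriors become an extra observation stream,
# concatenated with the acoustic features before HMM training/decoding.
stream_a = softmax(visual_logits_a)
stream_b = softmax(visual_logits_b)
observations = np.concatenate([acoustic, stream_a, stream_b], axis=1)

print(observations.shape)  # (5, 21): 13 acoustic + 4 + 4 posterior dims
```

In a multi-stream HMM the streams would typically keep separate emission models with stream weights rather than being flatly concatenated; the concatenation here is only to show what the combined observation looks like.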
Spoken content retrieval: A survey of techniques and technologies
Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR, encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR.
Examining the Difference Between Asynchronous and Synchronous Training
For my project, I chose to do a thesis so that it would better help me in the future in case I wanted to pursue a PhD. My thesis so far has been to develop software that helps POD sites train their volunteers more effectively in case of an emergency. We have already collected some data for our research from a test POD site that was constructed. We recorded the amount of time it took each volunteer to get an individual actor through the line, depending on whether they learned from a teacher or from my software. The data helped demonstrate how beneficial teaching via software could be: no information was missing, and there was a greater retention rate. Currently I work at Lowes as a customer service administrator, mainly so I can interact with customers every day and better understand how to communicate the information my software would provide. The general area my research has taken so far is emergency preparedness, and I would like to continue in this direction until other opportunities arise.
uC: Ubiquitous Collaboration Platform for Multimodal Team Interaction Support
A human-centered computing platform that improves teamwork and transforms the "human-computer interaction experience" for distributed teams is presented. This Ubiquitous Collaboration, or uC ("you see"), platform's objective is to transform distributed teamwork (i.e., work occurring when teams of workers and learners are geographically dispersed and often interacting at different times). It achieves this goal through a multimodal team interaction interface realized through a reconfigurable open architecture. The approach taken is to integrate: (1) an intuitive speech- and video-centric multi-modal interface to augment more conventional methods (e.g., mouse, stylus and touch), (2) an open and reconfigurable architecture supporting information gathering, and (3) a machine intelligent approach to analysis and management of heterogeneous live and stored sensor data to support collaboration. The system will transform how teams of people interact with computers by drawing on both the virtual and physical environment.
Multi-Scale Attention for Audio Question Answering
Audio question answering (AQA), a widely used proxy task for exploring scene understanding, has gained increasing attention. AQA is challenging because it requires comprehensive temporal reasoning over events at different scales in an audio scene. However, existing methods mostly extend structures from the visual question answering task to audio in a simple pattern and may not perform well when perceiving a fine-grained audio scene. To this end, we present a Multi-scale Window Attention Fusion Model (MWAFM) consisting of an asynchronous hybrid attention module and a multi-scale window attention module. The former is designed to aggregate unimodal and cross-modal temporal contexts, while the latter captures sound events of varying lengths and their temporal dependencies for a more comprehensive understanding. Extensive experiments demonstrate that the proposed MWAFM can effectively exploit temporal information to facilitate AQA in fine-grained scenes.
Code: https://github.com/GeWu-Lab/MWAFM
Comment: Accepted by InterSpeech 202
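To make the multi-scale window idea concrete, here is a minimal sketch (not the MWAFM implementation) of self-attention restricted to non-overlapping windows of several sizes, with the per-scale outputs fused by simple averaging in place of a learned fusion; the window sizes and feature dimensions are illustrative assumptions.

```python
import numpy as np

def window_attention(x, win):
    # Self-attention computed only within non-overlapping windows of size `win`,
    # so each frame attends to temporally nearby frames at that scale.
    T, d = x.shape
    out = np.zeros_like(x)
    for s in range(0, T, win):
        w = x[s:s + win]
        scores = w @ w.T / np.sqrt(d)
        attn = np.exp(scores - scores.max(axis=1, keepdims=True))
        attn /= attn.sum(axis=1, keepdims=True)
        out[s:s + win] = attn @ w
    return out

def multi_scale_attention(x, wins=(2, 4, 8)):
    # Fuse outputs from several window sizes by averaging; a learned fusion
    # (as in the paper) would replace this mean.
    return np.mean([window_attention(x, w) for w in wins], axis=0)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16))   # 8 audio frames, 16-dim features
y = multi_scale_attention(x)
print(y.shape)  # (8, 16)
```

Small windows emphasize short sound events, while large windows capture longer dependencies; combining the scales is what lets the model reason over events of varying length.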