Search CORE

2,132 research outputs found

Toward movement-invariant automatic lip-reading and speech recognition

Authors: Buesching Dieter, Duchnowski Paul, Hunke Martin, Meier Uwe, Waibel Alexander
Publication date: 02/08/2007

Audio-Visual Automatic Speech Recognition Towards Education for Disabilities

Authors: Debnath Saswati, González-Crespo Rubén, Namasudra Suyel, Roy Pinki
Publication venue: Springer Science and Business Media LLC
Publication date: 13/04/2023

Education is a fundamental right that enriches everyone’s life. However, physically challenged people are often excluded from the general and advanced education systems. An Audio-Visual Automatic Speech Recognition (AV-ASR) based system can improve the education of physically challenged people by providing hands-free computing: they can communicate with the learning system through AV-ASR. However, tracking the lips correctly is challenging for the visual modality. This paper therefore uses appearance-based visual features together with a co-occurrence statistical measure for visual speech recognition: Local Binary Pattern-Three Orthogonal Planes (LBP-TOP) and the Grey-Level Co-occurrence Matrix (GLCM) are proposed for extracting visual speech information. The experimental results show that the proposed system achieves 76.60% accuracy for visual speech recognition and 96.00% accuracy for audio speech recognition.

Re-UNIR
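The abstract above names its two visual feature extractors without detail. As a rough illustration only, the pure-Python sketch below computes a normalized GLCM with three common statistics and a simplified LBP-TOP histogram (radius 1, 8 neighbours per plane, border voxels skipped). The function names, the default displacement, the choice of statistics and the neighbour layout are assumptions for illustration, not the authors' configuration, and nothing here reflects how the paper combined or classified these features.

```python
from itertools import product

# Hypothetical neighbour layout: 8 neighbours at radius 1, clockwise from the
# top-left, as (row, col) offsets. In the XT and YT planes the "row" offset
# is applied along time.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def glcm(image, dx=1, dy=0, levels=8):
    """Normalized grey-level co-occurrence matrix for displacement (dx, dy).

    image: 2-D list of ints in [0, levels). Entry [i][j] is the fraction of
    pixel pairs (p, p + (dx, dy)) whose grey levels are i and j.
    """
    h, w = len(image), len(image[0])
    counts = [[0] * levels for _ in range(levels)]
    total = 0
    for y, x in product(range(h), range(w)):
        y2, x2 = y + dy, x + dx
        if 0 <= y2 < h and 0 <= x2 < w:
            counts[image[y][x]][image[y2][x2]] += 1
            total += 1
    if total == 0:
        return [[0.0] * levels for _ in range(levels)]
    return [[c / total for c in row] for row in counts]


def glcm_features(p):
    """Contrast, angular second moment and homogeneity of a normalized GLCM."""
    contrast = asm = homogeneity = 0.0
    for i, row in enumerate(p):
        for j, v in enumerate(row):
            contrast += (i - j) ** 2 * v
            asm += v * v
            homogeneity += v / (1 + (i - j) ** 2)
    return {"contrast": contrast, "asm": asm, "homogeneity": homogeneity}


def lbp_code(center, neighbours):
    """8-bit LBP code: bit k is set when neighbour k is >= the centre value."""
    return sum(1 << k for k, n in enumerate(neighbours) if n >= center)


def lbp_top(volume):
    """Concatenated 256-bin LBP histograms on the XY, XT and YT planes.

    volume: list of T frames, each an H x W list of ints (e.g. a mouth region).
    Uses radius 1 in space and time; border voxels are skipped.
    """
    t_len, h, w = len(volume), len(volume[0]), len(volume[0][0])
    hists = [[0] * 256 for _ in range(3)]
    for t in range(1, t_len - 1):
        for y in range(1, h - 1):
            for x in range(1, w - 1):
                c = volume[t][y][x]
                xy = [volume[t][y + a][x + b] for a, b in OFFSETS]
                xt = [volume[t + a][y][x + b] for a, b in OFFSETS]
                yt = [volume[t + a][y + b][x] for a, b in OFFSETS]
                for plane, nbrs in enumerate((xy, xt, yt)):
                    hists[plane][lbp_code(c, nbrs)] += 1
    return hists[0] + hists[1] + hists[2]
```

For example, `glcm([[0, 0, 1], [0, 1, 1]], dx=1, dy=0, levels=2)` counts horizontal neighbour pairs in a tiny two-level image. In practice the quantization levels, the set of displacements and any histogram normalization before concatenating LBP-TOP and GLCM features are open design choices that this sketch does not settle.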

Taking the bite out of automated naming of characters in TV video

Authors: Everingham M., Sivic J., Zisserman A.
Publication venue: Elsevier BV
Publication date: 01/01/2009

We investigate the problem of automatically labelling appearances of characters in TV or film material with their names. This is tremendously challenging due to the huge variation in imaged appearance of each character and the weakness and ambiguity of available annotation. However, we demonstrate that high precision can be achieved by combining multiple sources of information, both visual and textual. The principal novelties that we introduce are: (i) automatic generation of time-stamped character annotation by aligning subtitles and transcripts; (ii) strengthening the supervisory information by identifying when characters are speaking. In addition, we incorporate complementary cues of face matching and clothing matching to propose common annotations for face tracks, and consider choices of classifier that can potentially correct errors made in the automatic extraction of training data from the weak textual annotation. Results are presented on episodes of the TV series “Buffy the Vampire Slayer”.

Oxford University Research Archive

White Rose Research Online

Automatic Visual Speech Recognition

Authors: Alin Chiţu, Léon J.M. Rothkrantz
Publication venue: IntechOpen
Publication date: 03/03/2012

Intelligent Systems; Electrical Engineering, Mathematics and Computer Science

IntechOpen

Crossref

TU Delft Repository

We can hear you with Wi-Fi!

Authors: NI Lionel M., WANG Guanhua, WU Kaishun, ZHOU Zimu, ZOU Yongpan
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2016

Institutional Knowledge at Singapore Management University