Search CORE

161 research outputs found

Automatic Speechreading with Application to Speaker Verification

Author: Broun Charles C.
Clements Mark A.
Mersereau Russell M.
Zhang Xiaozheng
Publication venue: DigitalCommons@CalPoly
Publication date: 13/05/2002

Speech not only conveys linguistic information but also characterizes the talker's identity, and can therefore be used in personal authentication. While most speech information is contained in the acoustic channel, lip movement during speech production also provides useful information. In this paper we investigate the effectiveness of visual speech features in a speaker verification task. We first present the visual front-end of the automatic speechreading system. We then develop a recognition engine to train and recognize sequences of visual parameters. The experimental results based on the XM2VTS database [1] demonstrate that visual information is highly effective in reducing both false acceptance and false rejection rates in speaker verification tasks.

DigitalCommons@CalPoly

Automatic Speechreading with Applications to Human-Computer Interfaces

Publication venue: Springer

Springer - Publisher Connector

Audio-visual speech processing system for Polish applicable to human-computer interaction

Author: Jadczyk Tomasz
Publication venue: AGHU University of Science and Technology Press
Publication date: 19/02/2018

This paper describes an audio-visual speech recognition system for the Polish language and a set of performance tests under various acoustic conditions. We first present the overall structure of AVASR systems with three main areas: audio feature extraction, visual feature extraction and, subsequently, audio-visual speech integration. We present MFCC features for the audio stream with a standard HMM modeling technique, then we describe appearance-based and shape-based visual features. Subsequently we present two feature integration techniques: feature concatenation and model fusion. We also discuss the results of a set of experiments conducted to select the best system setup for Polish under noisy audio conditions. The experiments simulate human-computer interaction in a computer-control scenario with voice commands in difficult audio environments. With an Active Appearance Model (AAM) and a multistream Hidden Markov Model (HMM) we can improve system accuracy by reducing the Word Error Rate by more than 30% compared with audio-only speech recognition when the Signal-to-Noise Ratio goes down to 0 dB.
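The Word Error Rate comparison mentioned in this abstract can be illustrated with a minimal sketch. It assumes the usual definition of WER (word-level edit distance divided by the number of reference words) and reads "reducing by more than 30%" as a relative reduction; the function names `word_error_rate` and `relative_wer_reduction` and the sample command are hypothetical and do not come from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the processed reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Fraction by which the improved system lowers the baseline WER."""
    return (baseline_wer - improved_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical voice command with one dropped word.
    print(word_error_rate("turn the volume up", "turn volume up"))
```

Under this reading, an audio-only WER of 0.40 falling to 0.28 with audio-visual input would count as a 30% relative reduction.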

Computer Science Journal (AGH University of Science and Technology, Krakow)

Adaptive Multimodal Fusion by Uncertainty Compensation With Application to Audiovisual Speech Recognition

Author: Athanassios Katsamanis
George Papandreou
Petros Maragos
Vassilis Pitsikalis
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)

Crossref

A study of lip movements during spontaneous dialog and its application to voice activity detection

Author: Girin Laurent
Jutten Christian
Rivet Bertrand
Savariaux Christophe
Schwartz Jean-Luc
Sodoyer David
Publication venue: Acoustical Society of America (ASA)
Publication date: 01/01/2009

This paper presents a quantitative and comprehensive study of the lip movements of a given speaker in different speech/nonspeech contexts, with a particular focus on silences (i.e., when no sound is produced by the speaker). The aim is to characterize the relationship between "lip activity" and "speech activity" and then to use visual speech information as a voice activity detector (VAD). To this aim, an original audiovisual corpus was recorded with two speakers involved in a face-to-face spontaneous dialog, although being in separate rooms. Each speaker communicated with the other using a microphone, a camera, a screen, and headphones. This system was used to capture separate audio stimuli for each speaker and to synchronously monitor the speaker's lip movements. A comprehensive analysis was carried out on the lip shapes and lip movements in either silence or nonsilence (i.e., speech + nonspeech audible events). A single visual parameter, defined to characterize the lip movements, was shown to be efficient for the detection of silence sections. This results in a visual VAD that can be used in any kind of environment noise, including intricate and highly nonstationary noises, e.g., multiple and/or moving noise sources or competing speech signals.

Crossref

Hal - Université Grenoble Alpes

Multimodal Fusion of Polynomial Classifiers for Automatic Person Recognition

Author: Broun Charles C.
Zhang Xiaozheng
Publication venue: DigitalCommons@CalPoly
Publication date: 17/04/2001

With the prevalence of the information age, privacy and personalization are at the forefront of today's society. As such, biometrics are viewed as essential components of current and evolving technological systems. Consumers demand unobtrusive and noninvasive approaches. In our previous work, we have demonstrated a speaker verification system that meets these criteria. However, there are additional constraints for fielded systems. The required recognition transactions are often performed in adverse environments and across diverse populations, necessitating robust solutions. There are two significant problem areas in current-generation speaker verification systems. The first is the difficulty in acquiring clean audio signals (in all environments) without encumbering the user with a head-mounted close-talking microphone. Second, unimodal biometric systems do not work with a significant percentage of the population. To combat these issues, multimodal techniques are being investigated to improve system robustness to environmental conditions, as well as improve overall accuracy across the population. We propose a multimodal approach that builds on our current state-of-the-art speaker verification technology. In order to maintain the transparent nature of the speech interface, we focus on optical sensing technology to provide the additional modality, giving us an audio-visual person recognition system. For the audio domain, we use our existing speaker verification system. For the visual domain, we focus on lip motion. This is chosen, rather than static face or iris recognition, because it provides dynamic information about the individual. In addition, the lip dynamics can aid speech recognition to provide liveness testing. The visual processing method makes use of both color and edge information, combined within a Markov random field (MRF) framework, to localize the lips. Geometric features are extracted and input to a polynomial classifier for the person recognition process.
A late integration approach, based on a probabilistic model, is employed to combine the two modalities. The system is tested on the XM2VTS database combined with AWGN (in the audio domain) over a range of signal-to-noise ratios.
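The abstract does not spell out its probabilistic model, so the following is only a sketch of one common form of late integration, stated as an assumption: each modality produces its own match score, and the scores are combined by a weighted sum before a single accept/reject decision. The names `fuse_scores`, `verify`, `audio_weight` and `threshold`, and all numeric values, are hypothetical.

```python
def fuse_scores(audio_score: float, visual_score: float, audio_weight: float) -> float:
    """Weighted sum of per-modality scores (assumed to lie on a common scale)."""
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be between 0 and 1")
    return audio_weight * audio_score + (1.0 - audio_weight) * visual_score


def verify(audio_score: float, visual_score: float,
           audio_weight: float, threshold: float) -> bool:
    """Accept the identity claim when the fused score reaches the threshold."""
    return fuse_scores(audio_score, visual_score, audio_weight) >= threshold


if __name__ == "__main__":
    # Assumption: in noisy audio the audio stream is trusted less, so its weight is lowered.
    print(verify(audio_score=0.2, visual_score=0.8, audio_weight=0.25, threshold=0.5))
```

Lowering `audio_weight` as the signal-to-noise ratio drops is one way such a scheme could lean on the visual stream when the audio is degraded.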

Crossref

DigitalCommons@CalPoly

A novel lip geometry approach for audio-visual speech recognition

Author: Zamri Ibrahim
Publication date: 01/01/2014

By identifying lip movements and characterizing their associations with speech sounds, the performance of speech recognition systems can be improved, particularly when operating in noisy environments. Various methods have been studied by research groups around the world in recent years to incorporate lip movements into speech recognition; however, exactly how best to incorporate the additional visual information is still not known. This study aims to extend the knowledge of relationships between visual and speech information, specifically using lip geometry information because of its robustness to head rotation and the smaller number of features required to represent movement. A new method has been developed to extract lip geometry information, to perform classification and to integrate visual and speech modalities. This thesis makes several contributions. First, this work presents a new method to extract lip geometry features using the combination of a skin colour filter, a border following algorithm and a convex hull approach. The proposed method was found to improve lip shape extraction performance compared to existing approaches. Lip geometry features including height, width, ratio, area, perimeter and various combinations of these features were evaluated to determine which performs best when representing speech in the visual domain. Second, a novel template matching technique able to adapt to dynamic differences in the way words are uttered by speakers has been developed, which determines the best fit of an unseen feature signal to those stored in a database template. Third, following an evaluation of integration strategies, a novel method has been developed based on an alternative decision fusion strategy, in which the outcome from the visual and speech modalities is chosen by measuring the quality of audio based on kurtosis and skewness analysis and driven by white noise confusion.
Finally, the performance of the new methods introduced in this work is evaluated using the CUAVE and LUNA-V data corpora under a range of different signal-to-noise ratio conditions using the NOISEX-92 dataset.
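The geometry features named in this abstract (height, width, ratio, area, perimeter) can be sketched from a convex hull as follows. This is an illustration under stated assumptions, not the thesis's method: the lip contour points are taken as already extracted (the skin colour filter and border following steps are not reproduced), width and height are read as the hull's bounding-box extents, and "ratio" is assumed to mean height divided by width. The function names are hypothetical.

```python
from math import hypot


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # Positive when o -> a -> b makes a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def lip_geometry_features(contour):
    """Height, width, ratio, area and perimeter of the contour's convex hull."""
    hull = convex_hull(contour)
    xs = [x for x, _ in hull]
    ys = [y for _, y in hull]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    n = len(hull)
    edges = [(hull[i], hull[(i + 1) % n]) for i in range(n)]
    # Shoelace formula for polygon area.
    area = abs(sum(a[0] * b[1] - b[0] * a[1] for a, b in edges)) / 2
    perimeter = sum(hypot(b[0] - a[0], b[1] - a[1]) for a, b in edges)
    return {
        "height": height,
        "width": width,
        "ratio": height / width if width else 0.0,  # assumed definition
        "area": area,
        "perimeter": perimeter,
    }


if __name__ == "__main__":
    # Hypothetical contour in pixel coordinates; the interior point is discarded by the hull.
    print(lip_geometry_features([(0, 0), (4, 0), (4, 2), (0, 2), (2, 1)]))
```

Computing the features on the hull rather than the raw contour means small concavities or noisy boundary pixels do not change the measurements.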

Loughborough University Institutional Repository

UMP Institutional Repository

A Hierarchical Segmentation Algorithm for Face Analysis. Application to Lipreading

Author: Liévin Marc
Luthon Franck
Publication venue: HAL CCSD
Publication date: 30/07/2000

A hierarchical algorithm for face analysis is presented in this paper. A color video sequence of a speaker's face is acquired under natural lighting conditions and without any particular make-up. The application aims at providing geometrical features of the face for scalable video transmission when no specific model of the speaker's face is assumed. First, a logarithmic hue transform is performed from RGB to HI (hue, intensity) color space. Next, a Markov random field model regularizes motion and hue information within a spatiotemporal neighborhood. The hierarchical segmentation labels the different areas of the face. Results are shown on the lower part of the face and compared with a standard color segmentation algorithm (fuzzy c-means). A speaker's lip shape with inner and outer borders is extracted from the final labeling and used to initialize an active contours stage.

Hal - Université Grenoble Alpes

Sensory Communication

Author: Annaswamy Anuradha M.
Aviles Walter A.
Bandy James H.
Beauregard G. Lee
Braida Louis D.
Brantley Merry A.
Bratakos Maroula S.
Brock David L.
Brughera Andrew R.
Brungart Douglas S.
Carmel Erika N.
Chen Frederick W.
Chen Jyh-Shing
Clarkson Brian
Crouch John
Dandekar Kiran B.
De Suvranu
Delhorne Lorraine A.
Denesvich Gail
Desloge Joseph G.
Duchnowski Paul
Durlach Nathaniel I.
Eddington Donald K.
Foley Jeffrey J.
Foxlin Eric M.
Frisbie Joseph A.
Goldman Susan L.
Graaf Isaac
Grant Kenneth W.
Greenberg Julie E.
Gulati Rogeve J.
Gupta Rakesh
Hall Dorrie
Hall Seth M.
Held Richard M.
Ho Chih-Hao
Hou Alexandra I.
Howe Robert D.
Jandura Louise
Johnson Owen D.
Jones Gabrielle
Jones Lynette A.
Karason Steingrimur P.
Krause Jean C.
LaMotte Robert H.
Lathan Corrie
Leabman Michael A.
Lee Jeng-Feng
Lemay Danielle G.
Lin Gregory G.
Lippman Rebecca F.
Masaki Kinuku
Morgenbesser Hugh B.
Nadeau Philip M.
O'Connell Michael P.
Ogora T. H.
Park John
Payton Karen L.
Pfautz Jonathan D.
Pioch Nicholas
Plant Geoffrey L.
Power Matthew H.
Rabinowitz William M.
Raju Balasundara I.
Rangaswamy Sudeep
Rankovic Christine M.
Reed Charlotte M.
Richardson Christopher R.
Roby Frederick L.
Rodkin John J.
Sachtler Wendelin L.
Salisbury J. Kenneth
Santos Jonathan R.
Schloerb David W.
Sekiyama Kaoru
Sexton Matthew G.
Shinn-Cunningham Barbara G.
Shnidman Nathan R.
Srikantiah Ranjini
Srinivasan Mandayam A.
Sroka Jason
Takeuchi Anne H.
Tan Hong Z.
Tassa Coral D.
Taylor Francis G.
Voss Kimberly J.
Weber Lukasz A.
Wiegand Thomas E. v.
Wies Evan F.
Yoon John
Zeltzer David
Zurek Patrick M.
Publication venue: Research Laboratory of Electronics (RLE) at the Massachusetts Institute of Technology (MIT)

Contains table of contents for Section 2, an introduction and reports on fifteen research projects.
National Institutes of Health Grant RO1 DC00117
National Institutes of Health Grant RO1 DC02032
National Institutes of Health Contract P01-DC00361
National Institutes of Health Contract N01-DC22402
National Institutes of Health/National Institute on Deafness and Other Communication Disorders Grant 2 R01 DC00126
National Institutes of Health Grant 2 R01 DC00270
National Institutes of Health Contract N01 DC-5-2107
National Institutes of Health Grant 2 R01 DC00100
U.S. Navy - Office of Naval Research/Naval Air Warfare Center Contract N61339-94-C-0087
U.S. Navy - Office of Naval Research/Naval Air Warfare Center Contract N61339-95-K-0014
U.S. Navy - Office of Naval Research/Naval Air Warfare Center Grant N00014-93-1-1399
U.S. Navy - Office of Naval Research/Naval Air Warfare Center Grant N00014-94-1-1079
U.S. Navy - Office of Naval Research Subcontract 40167
U.S. Navy - Office of Naval Research Grant N00014-92-J-1814
National Institutes of Health Grant R01-NS33778
U.S. Navy - Office of Naval Research Grant N00014-88-K-0604
National Aeronautics and Space Administration Grant NCC 2-771
U.S. Air Force - Office of Scientific Research Grant F49620-94-1-0236
U.S. Air Force - Office of Scientific Research Agreement with Brandeis Universit

DSpace@MIT

Sensory Communication

Author: Alvarez Daniel A.
Braida Louis D.
Chen Jyh-Shing
Coffman Bridget L.
Dandekar Kiran B.
Delhorne Lorraine A.
Desloge Joseph G.
Duchnowski Paul
Durlach Nathaniel I.
Eddington Donald K.
Frisbie Joseph A.
Fuchs Eric M.
Goldish Andrew C.
Goldman Susan L.
Greenberg Julie E.
Gulati Rogeve J.
Held Richard M.
Jandura Louise
Keagy Michael T.
Lum David S.
Martin Gregory R.
Mueller Jason
Nadeau Philip P.
Nadelski Mark T.
Power Matthew H.
Rabinowitz William M.
Rankovic Christine M.
Reed Charlotte M.
Salisbury J. Kenneth
Shao Yun
Shinn-Cunningham Barbara G.
Srinivasan Mandayam A.
Stadler Robert W.
Tan Hong Z.
Uchanski Rosalie M.
Wei Min
Wozniak Jennifer A.
Zue Victor W.
Zurek Patrick M.
Publication venue: Research Laboratory of Electronics (RLE) at the Massachusetts Institute of Technology (MIT)

Contains table of contents on Section 2, an introduction, reports on eleven research projects and a list of publications.
National Institutes of Health Grant 5 R01 DC00117
National Institutes of Health Grant 5 R01 DC00270
National Institutes of Health Contract 2 P01 DC00361
National Institutes of Health Grant 5 R01 DC00100
National Institutes of Health Contract 7 R29 DC00428
National Institutes of Health Grant 2 R01 DC00126
U.S. Air Force - Office of Scientific Research Grant AFOSR 90-0200
U.S. Navy - Office of Naval Research Grant N00014-90-J-1935
National Institutes of Health Grant 5 R29 DC00625
U.S. Navy - Office of Naval Research Grant N00014-91-J-1454
U.S. Navy - Office of Naval Research Grant N00014-92-J-181

DSpace@MIT