
    Bimodal Fusion in Audio-Visual Speech Recognition

    Extending automatic speech recognition (ASR) to the visual modality has been shown to greatly increase recognition accuracy and improve system robustness over purely acoustic systems, especially in acoustically hostile environments. An important aspect of designing such systems is how to incorporate the visual component into the acoustic speech recognizer to achieve optimal performance. In this paper, we investigate methods of integrating the audio and visual modalities within HMM-based classification models. We examine existing integration schemes and propose the use of a coupled hidden Markov model (CHMM) to exploit audio-visual interaction. Our experimental results demonstrate that the CHMM consistently outperforms other integration models over a large range of acoustic noise levels and suggest that it better captures the temporal correlations between the two streams of information.
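
    For context, one common integration scheme of the kind the paper compares the CHMM against is state-synchronous multi-stream fusion, in which per-stream observation log-likelihoods are combined with an exponent weight reflecting the relative reliability of each modality. The sketch below (Python) is illustrative only: the Gaussian state parameters, feature dimensions, and weight value are assumptions, not the paper's configuration.

        import numpy as np
        from scipy.stats import multivariate_normal

        def multistream_log_likelihood(x_audio, x_video, state, weight_audio=0.7):
            """Combine per-stream state log-likelihoods with exponent weights.

            `state` is assumed to hold Gaussian parameters for each stream; the
            weight trades off reliance on audio vs. video (e.g. a lower audio
            weight under heavy acoustic noise).
            """
            ll_a = multivariate_normal.logpdf(x_audio, state["mu_a"], state["cov_a"])
            ll_v = multivariate_normal.logpdf(x_video, state["mu_v"], state["cov_v"])
            return weight_audio * ll_a + (1.0 - weight_audio) * ll_v

        # Toy usage: a single state with 2-D audio and 2-D video features.
        state = {"mu_a": np.zeros(2), "cov_a": np.eye(2),
                 "mu_v": np.ones(2), "cov_v": 0.5 * np.eye(2)}
        print(multistream_log_likelihood(np.array([0.1, -0.2]),
                                         np.array([0.9, 1.1]), state))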

    Digital Signal Processing

    Contains reports on three research projects. U.S. Navy Office of Naval Research (Contract N00014-67-A-0204-0064). National Science Foundation (Grant GK-31353).

    Automatic Speechreading with Application to Speaker Verification

    Speech not only conveys linguistic information, but also characterizes the talker's identity, and can therefore be used in personal authentication. While most of the speech information is contained in the acoustic channel, the lip movement during speech production also provides useful information. In this paper we investigate the effectiveness of visual speech features in a speaker verification task. We first present the visual front-end of the automatic speechreading system. We then develop a recognition engine to train and recognize sequences of visual parameters. The experimental results based on the XM2VTS database [1] demonstrate that visual information is highly effective in reducing both false acceptance and false rejection rates in speaker verification tasks.
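
    Since performance is reported in terms of false acceptance and false rejection rates, a short sketch of how those two error rates are computed from verification scores may help; the score values and threshold below are invented for illustration and are not drawn from the XM2VTS experiments.

        import numpy as np

        def far_frr(genuine_scores, impostor_scores, threshold):
            """False acceptance / false rejection rates at a score threshold.

            Scores are assumed to be similarity scores (higher = more likely
            the claimed speaker); the inputs here are illustrative only.
            """
            genuine = np.asarray(genuine_scores)
            impostor = np.asarray(impostor_scores)
            frr = np.mean(genuine < threshold)    # true claimants wrongly rejected
            far = np.mean(impostor >= threshold)  # impostors wrongly accepted
            return far, frr

        # Toy usage with made-up score distributions.
        print(far_frr([0.9, 0.8, 0.7, 0.95], [0.2, 0.4, 0.6, 0.1], threshold=0.65))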

    Lip Feature Extraction Towards an Automatic Speechreading System

    The use of color information can significantly improve the efficiency and robustness of lip feature extraction over purely grayscale-based methods. Edge information provides another useful tool for characterizing lip boundaries. In this paper we present a method that integrates both types of information to address the problem of lip feature extraction for the purpose of speechreading. We first examine various color models and identify hue as an effective descriptor for characterizing the lips, owing to its invariance to luminance and to human skin color, and to its discriminative properties. We use prominent red hue as an indicator to locate the position of the lips. Based on the identified lip area, we further refine the interior and exterior lip boundaries using both color and spatial edge information, combined within a Markov random field (MRF) framework. Experimental results are presented to show the effectiveness of this method.
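
    A rough sketch of the hue-based localization step is given below, written with OpenCV (Python); the hue/saturation thresholds, morphology settings, and OpenCV 4 return conventions are assumptions made for illustration, and the MRF-based boundary refinement is not reproduced.

        import cv2
        import numpy as np

        def locate_lip_region(bgr_image):
            """Rough lip localization by prominent red hue (illustrative thresholds).

            Returns a bounding box (x, y, w, h) around the largest red-hue blob,
            or None if nothing is found.
            """
            hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
            h, s, v = cv2.split(hsv)
            # Red hue wraps around 0 in OpenCV's 0-179 hue range.
            red_mask = ((h < 10) | (h > 170)) & (s > 60) & (v > 40)
            mask = red_mask.astype(np.uint8) * 255
            # Remove speckle before picking the dominant component.
            mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
            contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                           cv2.CHAIN_APPROX_SIMPLE)
            if not contours:
                return None
            largest = max(contours, key=cv2.contourArea)
            return cv2.boundingRect(largest)

        # Toy usage: a gray frame with a synthetic red patch where the lips would be.
        frame = np.full((120, 160, 3), 128, np.uint8)
        frame[70:90, 60:100] = (0, 0, 220)  # BGR red block
        print(locate_lip_region(frame))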

    Digital reconstruction of multidimensional signals from their projections.

    Massachusetts Institute of Technology. Dept. of Electrical Engineering. Thesis. 1973. Sc.D. Microfiche copy also available in Barker Engineering Library. Vita. Bibliography: leaves 181-184.

    Novel methods for video signal analysis and compression

    Issued as final report.

    Multiple Global Affine Motion Models Used in Video Coding

    In low bit rate scenarios, a hybrid video coder (e.g. AVC/H.264) tends to allocate a greater portion of bits to motion vectors while saving bits on residual errors. Motivated by this fact, a coding scheme is proposed that combines non-normative global motion models with conventional local motion vectors, describing the motion of a frame by affine motion parameter sets obtained from motion segmentation of the luminance channel. The motion segmentation task is capable of adapting the number of motion objects to the video content. Six-parameter affine model sets are derived by linear regression from the scalable block-based motion fields estimated by the existing MPEG encoder. When the number of motion objects exceeds a certain threshold, the global affine models are disabled. Otherwise, the four scaling factors of the affine models are compressed by a vector quantizer designed with a unique cache memory for efficient searching and coding. The affine motion information is written in the slice header as a syntax element. The global motion information is used to compensate those macroblocks whose Lagrange cost is minimized by the AFFINE mode. The rate-distortion cost is computed by a modified Lagrange equation that takes into consideration the perceptual discrimination of human vision in different areas. Besides increasing coding efficiency, the global affine model exhibits two features that improve the compressed video quality: i) when the number of slices per frame is more than one, the global affine motion model can enhance the error resilience of the video stream, because the affine motion parameters are duplicated in the headers of the different slices of the same frame; ii) the global motion model predicts a frame by warping the whole reference frame, which helps to decrease blocking artifacts in the compensated frame. Ph.D. Committee Chair: Jackson, Joel; Committee Member: Anderson, David; Committee Member: Fritz, Hermann; Committee Member: Mersereau, Russell; Committee Member: Yezzi, Anthony.
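
    The regression step that fits a six-parameter affine model to a block-based motion field can be sketched as an ordinary least-squares problem, as below (Python). The block centers and motion vectors are assumed to come from the encoder's block-matching stage; motion segmentation, vector quantization of the scaling factors, and the rate-distortion mode decision are not shown.

        import numpy as np

        def fit_affine_from_block_motion(centers, motion_vectors):
            """Least-squares six-parameter affine model from block motion vectors.

            `centers` is an (N, 2) array of block-center coordinates (x, y) and
            `motion_vectors` is an (N, 2) array of the corresponding (dx, dy)
            displacements. The model maps (x, y) to (a1*x + a2*y + a3,
            a4*x + a5*y + a6); we regress on the displaced positions.
            """
            centers = np.asarray(centers, dtype=float)
            targets = centers + np.asarray(motion_vectors, dtype=float)
            design = np.hstack([centers, np.ones((centers.shape[0], 1))])  # [x, y, 1]
            # Two independent least-squares problems, one per coordinate.
            params_x, *_ = np.linalg.lstsq(design, targets[:, 0], rcond=None)
            params_y, *_ = np.linalg.lstsq(design, targets[:, 1], rcond=None)
            return np.concatenate([params_x, params_y])  # (a1, ..., a6)

        # Toy usage: a pure translation of (2, -1) should recover a3 = 2, a6 = -1.
        cent = np.array([[8, 8], [24, 8], [8, 24], [24, 24]])
        mv = np.tile([2.0, -1.0], (4, 1))
        print(fit_affine_from_block_motion(cent, mv))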

    Audio-Visual Speech Recognition by Speechreading

    Speechreading increases intelligibility in human speech perception. This suggests that conventional acoustic-based speech processing can benefit from the addition of visual information. This paper exploits speechreading for joint audio-visual speech recognition. We first present a color-based feature extraction algorithm that is able to extract salient visual speech features reliably from a frontal view of the talker in a video sequence. Then, a new fusion strategy using a coupled hidden Markov model (CHMM) is proposed to incorporate the visual modality into the acoustic subsystem. By maintaining temporal coupling across the two modalities at the feature level while allowing asynchrony between their states, a CHMM provides a better model for capturing the temporal correlations between the two streams of information. The experimental results demonstrate that the combined audio-visual system outperforms the acoustic-only recognizer over a wide range of noise levels.
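
    To make the notion of coupling concrete, the sketch below (Python) shows a coupled transition structure in which each stream's next state is conditioned on the previous states of both streams, so the two chains stay correlated while remaining free to drift out of lock-step. The state counts and randomly drawn transition probabilities are illustrative, not parameters trained in the paper.

        import numpy as np

        rng = np.random.default_rng(0)
        N_A, N_V = 3, 3  # number of audio and video states (illustrative)

        # Coupled transitions: P(next audio state | prev audio, prev video state),
        # and symmetrically for video, so each chain is influenced by the other.
        A_audio = rng.dirichlet(np.ones(N_A), size=(N_A, N_V))  # (N_A, N_V, N_A)
        A_video = rng.dirichlet(np.ones(N_V), size=(N_A, N_V))  # (N_A, N_V, N_V)

        def step(prev_a, prev_v):
            """Sample the next (audio, video) state pair given the previous pair."""
            next_a = rng.choice(N_A, p=A_audio[prev_a, prev_v])
            next_v = rng.choice(N_V, p=A_video[prev_a, prev_v])
            return next_a, next_v

        # Toy usage: generate a short coupled state sequence starting from (0, 0).
        state = (0, 0)
        sequence = [state]
        for _ in range(10):
            state = step(*state)
            sequence.append(state)
        print(sequence)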