Search CORE

353 research outputs found

Designing a Visual Front End in Audio-Visual Automatic Speech Recognition System

Author: Dong Junda
Publication venue: DigitalCommons@CalPoly
Publication date: 01/06/2015
Field of study

Audio-visual automatic speech recognition (AVASR) is a speech recognition technique integrating audio and video signals as input. Traditional audio-only speech recognition system only uses acoustic information from an audio source. However the recognition performance degrades significantly in acoustically noisy environments. It has been shown that visual information also can be used to identify speech. To improve the speech recognition performance, audio-visual automatic speech recognition has been studied. In this paper, we focus on the design of the visual front end of an AVASR system, which mainly consists of face detection and lip localization. The front end is built upon the AVICAR database that was recorded in moving vehicles. Therefore, diverse lighting conditions and poor quality of imagery are the problems we must overcome. We first propose the use of the Viola-Jones face detection algorithm that can process images rapidly with high detection accuracy. When the algorithm is applied to the AVICAR database, we reach an accuracy of 89% face detection rate. By separately detecting and integrating the detection results from all different color channels, we further improve the detection accuracy to 95%. To reliably localize the lips, three algorithms are studied and compared: the Gabor filter algorithm, the lip enhancement algorithm, and the modified Viola-Jones algorithm for lip features. Finally, to increase detection rate, a modified Viola-Jones algorithm and lip enhancement algorithms are cascaded based on the results of three lip localization methods. Overall, the front end achieves an accuracy of 90% for lip localization

DigitalCommons@CalPoly

An Efficient Boosted Classifier Tree-Based Feature Point Tracking System for Facial Expression Analysis

Author: Livingston Adam Redd
Publication venue: ODU Digital Commons
Publication date: 01/04/2012
Field of study

The study of facial movement and expression has been a prominent area of research since the early work of Charles Darwin. The Facial Action Coding System (FACS), developed by Paul Ekman, introduced the first universal method of coding and measuring facial movement. Human-Computer Interaction seeks to make human interaction with computer systems more effective, easier, safer, and more seamless. Facial expression recognition can be broken down into three distinctive subsections: Facial Feature Localization, Facial Action Recognition, and Facial Expression Classification. The first and most important stage in any facial expression analysis system is the localization of key facial features. Localization must be accurate and efficient to ensure reliable tracking and leave time for computation and comparisons to learned facial models while maintaining real-time performance. Two possible methods for localizing facial features are discussed in this dissertation. The Active Appearance Model is a statistical model describing an object\u27s parameters through the use of both shape and texture models, resulting in appearance. Statistical model-based training for object recognition takes multiple instances of the object class of interest, or positive samples, and multiple negative samples, i.e., images that do not contain objects of interest. Viola and Jones present a highly robust real-time face detection system, and a statistically boosted attentional detection cascade composed of many weak feature detectors. A basic algorithm for the elimination of unnecessary sub-frames while using Viola-Jones face detection is presented to further reduce image search time. A real-time emotion detection system is presented which is capable of identifying seven affective states (agreeing, concentrating, disagreeing, interested, thinking, unsure, and angry) from a near-infrared video stream. The Active Appearance Model is used to place 23 landmark points around key areas of the eyes, brows, and mouth. A prioritized binary decision tree then detects, based on the actions of these key points, if one of the seven emotional states occurs as frames pass. The completed system runs accurately and achieves a real-time frame rate of approximately 36 frames per second. A novel facial feature localization technique utilizing a nested cascade classifier tree is proposed. A coarse-to-fine search is performed in which the regions of interest are defined by the response of Haar-like features comprising the cascade classifiers. The individual responses of the Haar-like features are also used to activate finer-level searches. A specially cropped training set derived from the Cohn-Kanade AU-Coded database is also developed and tested. Extensions of this research include further testing to verify the novel facial feature localization technique presented for a full 26-point face model, and implementation of a real-time intensity sensitive automated Facial Action Coding System

Old Dominion University

Gender Classification from Facial Images

Author: Chen Cunjian
Publication venue: The Research Repository @ WVU
Publication date: 01/08/2011
Field of study

Gender classification based on facial images has received increased attention in the computer vision community. In this work, a comprehensive evaluation of state-of-the-art gender classification methods is carried out on publicly available databases and extended to reallife face images, where face detection and face normalization are essential for the success of the system. Next, the possibility of predicting gender from face images acquired in the near-infrared spectrum (NIR) is explored. In this regard, the following two questions are addressed: (a) Can gender be predicted from NIR face images; and (b) Can a gender predictor learned using visible (VIS) images operate successfully on NIR images and vice-versa? The experimental results suggest that NIR face images do have some discriminatory information pertaining to gender, although the degree of discrimination is noticeably lower than that of VIS images. Further, the use of an illumination normalization routine may be essential for facilitating cross-spectral gender prediction. By formulating the problem of gender classification in the framework of both visible and near-infrared images, the guidelines for performing gender classification in a real-world scenario is provided, along with the strengths and weaknesses of each methodology. Finally, the general problem of attribute classification is addressed, where features such as expression, age and ethnicity are derived from a face image

The Research Repository @ WVU (West Virginia University)