4 research outputs found
Adaptive threshold optimisation for colour-based lip segmentation in automatic lip-reading systems
A thesis submitted to the Faculty of Engineering and the Built Environment,
University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for
the degree of Doctor of Philosophy.
Johannesburg, September 2016
Having survived the ordeal of a laryngectomy, the patient must come to terms with
the resulting loss of speech. With recent advances in portable computing power,
automatic lip-reading (ALR) may become a viable approach to voice restoration. This
thesis addresses the image processing aspect of ALR, and focuses on three contributions
to colour-based lip segmentation.
The first contribution concerns the colour transform to enhance the contrast
between the lips and skin. This thesis presents the most comprehensive study to
date by measuring the overlap between lip and skin histograms for 33 different
colour transforms. The hue component of HSV obtains the lowest overlap of 6.15%,
and results show that selecting the correct transform can increase the segmentation
accuracy by up to three times.
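One way to picture the histogram-overlap measure is the minimal sketch below. It assumes the overlap is the shared area of two normalised hue histograms; the thesis may define it differently. The function names, the bin count and the sample pixel values are hypothetical.

```python
import colorsys

def hue_histogram(pixels, bins=32):
    """Normalised histogram of HSV hue for a list of (r, g, b) tuples in 0..255."""
    counts = [0] * bins
    for r, g, b in pixels:
        h, _, _ = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        # h lies in [0, 1]; clamp so that h == 1.0 falls in the last bin
        counts[min(int(h * bins), bins - 1)] += 1
    total = float(len(pixels))
    return [c / total for c in counts]

def histogram_overlap(hist_a, hist_b):
    """Shared area of two normalised histograms: 0 = disjoint, 1 = identical."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))

# Hypothetical sample pixels (assumed values, not taken from the thesis data)
lip_pixels = [(170, 60, 90), (180, 70, 100), (160, 50, 80)]
skin_pixels = [(220, 170, 140), (210, 160, 130), (200, 150, 120)]

lip_hist = hue_histogram(lip_pixels)
skin_hist = hue_histogram(skin_pixels)
overlap = histogram_overlap(lip_hist, skin_hist)
```

A lower overlap means the transform separates lip and skin pixels more cleanly, so a simple threshold on that channel misclassifies fewer pixels.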
The second contribution is the development of a new lip segmentation algorithm
that utilises the best colour transforms from the comparative study. The algorithm
is tested on 895 images and achieves percentage overlap (OL) of 92.23% and segmentation
error (SE) of 7.39%.
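The two metrics can be sketched as follows, assuming the definitions commonly used in the lip segmentation literature: OL = 2|A ∩ B| / (|A| + |B|) and SE = (OLE + ILE) / (2 TL), where OLE counts non-lip pixels labelled as lip, ILE counts lip pixels labelled as non-lip, and TL is the number of ground-truth lip pixels. Whether the thesis uses exactly these definitions is an assumption, and the function name and toy masks are hypothetical.

```python
def overlap_and_error(predicted, truth):
    """Return (OL, SE) in percent.

    predicted, truth: sets of (row, col) coordinates labelled as lip pixels.
    """
    common = len(predicted & truth)
    ol = 100.0 * 2 * common / (len(predicted) + len(truth))
    outer_error = len(predicted - truth)  # OLE: non-lip pixels labelled as lip
    inner_error = len(truth - predicted)  # ILE: lip pixels labelled as non-lip
    se = 100.0 * (outer_error + inner_error) / (2 * len(truth))
    return ol, se

# Hypothetical masks: ten ground-truth pixels, prediction shifted by one pixel
truth = {(0, c) for c in range(10)}
predicted = {(0, c) for c in range(1, 11)}
ol, se = overlap_and_error(predicted, truth)
```

Under these definitions a perfect segmentation gives OL = 100% and SE = 0%.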
The third contribution focuses on the impact of the histogram threshold on the
segmentation accuracy, and introduces a novel technique called Adaptive Threshold
Optimisation (ATO) to select a better threshold value. The first stage of ATO
incorporates ε-SVR to train the lip shape model. ATO then uses feedback of shape
information to validate and optimise the threshold. After applying ATO, the SE
decreases from 7.65% to 6.50%, corresponding to an absolute improvement of 1.15 pp
or relative improvement of 15.1%. While this thesis concerns lip segmentation in
particular, ATO is a threshold selection technique that can be used in various
segmentation applications.
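The difference between the absolute (percentage point) and relative improvement can be checked with a short calculation. This sketch uses only the rounded SE figures quoted in the abstract; the function name is hypothetical.

```python
def improvement(before, after):
    """Absolute improvement in percentage points and relative improvement in percent."""
    absolute_pp = before - after
    relative_pct = 100.0 * absolute_pp / before
    return absolute_pp, relative_pct

# SE before and after ATO, as quoted (rounded) in the abstract
abs_pp, rel_pct = improvement(7.65, 6.50)
```

With the rounded inputs this gives 1.15 pp and a relative figure of about 15.0%; the reported 15.1% was presumably computed from unrounded values.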
A motion-based approach for audio-visual automatic speech recognition
The research work presented in this thesis introduces novel approaches for both visual
region of interest extraction and visual feature extraction for use in audio-visual
automatic speech recognition. In particular, the speaker's movement that occurs
during speech is used to isolate the mouth region in video sequences and motion-based
features obtained from this region are used to provide new visual features for
audio-visual automatic speech recognition. The mouth region extraction approach
proposed in this work is shown to give superior performance compared with existing
colour-based lip segmentation methods. The new features are obtained from three
separate representations of motion in the region of interest, namely the difference in
luminance between successive images, block-matching-based motion vectors and
optical flow. The new visual features are found to improve visual-only and audio-visual
speech recognition performance when compared with the commonly-used
appearance feature-based methods.
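The simplest of the three motion representations, the luminance difference between successive images, can be illustrated with the sketch below. It accumulates absolute frame differences and takes the bounding box of the pixels that moved most as a candidate mouth region. This is a simplified assumption for illustration, not the procedure used in the thesis; the function names, threshold and toy frames are hypothetical.

```python
def accumulate_motion(frames):
    """Sum of absolute luminance differences between successive frames.

    frames: list of equally sized 2-D lists of luminance values.
    """
    rows, cols = len(frames[0]), len(frames[0][0])
    motion = [[0] * cols for _ in range(rows)]
    for prev, curr in zip(frames, frames[1:]):
        for r in range(rows):
            for c in range(cols):
                motion[r][c] += abs(curr[r][c] - prev[r][c])
    return motion

def motion_bounding_box(motion, threshold):
    """Bounding box (top, left, bottom, right) of pixels whose motion exceeds threshold."""
    active = [(r, c) for r, row in enumerate(motion)
              for c, value in enumerate(row) if value > threshold]
    if not active:
        return None  # no pixel moved enough
    rows = [r for r, _ in active]
    cols = [c for _, c in active]
    return min(rows), min(cols), max(rows), max(cols)

# Hypothetical 4x4 frames: two pixels change in the middle frame
still = [[10] * 4 for _ in range(4)]
moved = [row[:] for row in still]
moved[2][1] = 60
moved[2][2] = 80
frames = [still, moved, still]

box = motion_bounding_box(accumulate_motion(frames), threshold=20)
```

The accumulated difference map could also serve directly as a motion feature; block matching and optical flow replace the per-pixel difference with estimated displacement vectors.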
In addition, a novel approach is proposed for visual feature extraction from either the
discrete cosine transform or discrete wavelet transform representations of the mouth
region of the speaker. In this work, the image transform is explored from a new
viewpoint of data discrimination; in contrast to the more conventional data
preservation viewpoint. The main findings of this work are that audio-visual
automatic speech recognition systems using the new features extracted from the
frequency bands selected according to their discriminatory abilities generally
outperform those using features designed for data preservation.
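To make the contrast concrete: a data-preservation view typically keeps the transform coefficients that carry the most energy, whereas a discrimination view ranks coefficients by how well they separate the classes. The sketch below ranks coefficients with a Fisher-style ratio for two classes. That criterion is an assumption chosen for illustration, not necessarily the one used in the thesis, and the function names and sample vectors are hypothetical.

```python
from statistics import mean, pvariance

def fisher_scores(class_a, class_b):
    """Per-coefficient ratio of between-class separation to within-class spread.

    class_a, class_b: lists of equal-length coefficient vectors, one per sample.
    """
    scores = []
    for coeff_a, coeff_b in zip(zip(*class_a), zip(*class_b)):
        separation = (mean(coeff_a) - mean(coeff_b)) ** 2
        spread = pvariance(coeff_a) + pvariance(coeff_b)
        scores.append(separation / (spread + 1e-12))  # guard against zero spread
    return scores

def select_coefficients(class_a, class_b, k):
    """Indices of the k coefficients with the highest discrimination score."""
    scores = fisher_scores(class_a, class_b)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

# Hypothetical transform coefficients for two classes, two samples each.
# Coefficient 0 has the most energy, but coefficient 1 separates the classes best.
class_a = [[9.0, 1.0, 0.2], [11.0, 1.2, 0.1]]
class_b = [[10.0, 3.0, 0.1], [10.5, 3.2, 0.2]]

best = select_coefficients(class_a, class_b, k=1)
```

In this toy example an energy-based selection would pick coefficient 0, while the discrimination-based ranking picks coefficient 1.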
To establish the noise robustness of the new features proposed in this work, their
performance has been studied in the presence of a range of different types of noise and at
various signal-to-noise ratios. In these experiments, the audio-visual automatic speech
recognition systems based on the new approaches were found to give superior
performance both to audio-visual systems using appearance-based features and to
audio-only speech recognition systems.
A motion based approach for audio-visual automatic speech recognition
The research work presented in this thesis introduces novel approaches for both visual region of interest extraction and visual feature extraction for use in audio-visual automatic speech recognition. In particular, the speaker's movement that occurs during speech is used to isolate the mouth region in video sequences and motion-based features obtained from this region are used to provide new visual features for audio-visual automatic speech recognition. The mouth region extraction approach proposed in this work is shown to give superior performance compared with existing colour-based lip segmentation methods. The new features are obtained from three separate representations of motion in the region of interest, namely the difference in luminance between successive images, block-matching-based motion vectors and optical flow. The new visual features are found to improve visual-only and audio-visual speech recognition performance when compared with the commonly-used appearance feature-based methods. In addition, a novel approach is proposed for visual feature extraction from either the discrete cosine transform or discrete wavelet transform representations of the mouth region of the speaker. In this work, the image transform is explored from a new viewpoint of data discrimination; in contrast to the more conventional data preservation viewpoint. The main findings of this work are that audio-visual automatic speech recognition systems using the new features extracted from the frequency bands selected according to their discriminatory abilities generally outperform those using features designed for data preservation. To establish the noise robustness of the new features proposed in this work, their performance has been studied in the presence of a range of different types of noise and at various signal-to-noise ratios.
In these experiments, the audio-visual automatic speech recognition systems based on the new approaches were found to give superior performance both to audio-visual systems using appearance-based features and to audio-only speech recognition systems.
EThOS - Electronic Theses Online Service, GB, United Kingdo
Lip print based authentication in physical access control Environments
Abstract: In modern society, there is an ever-growing need to determine the identity of a person in many applications, including computer security, financial transactions, borders, and forensics. Early automated methods of authentication relied mostly on possessions and knowledge. Notably, authentication methods such as passwords and access cards are based on properties that can be lost, stolen, forgotten, or disclosed. Fortunately, biometric recognition provides an elegant solution to these shortcomings by identifying a person based on their physiological or behavioural characteristics. However, due to the diverse nature of biometric applications (e.g., from unlocking a mobile phone to crossing an international border), no biometric trait is likely to be ideal and satisfy the criteria for all applications. Therefore, it is necessary to investigate novel biometric modalities to establish the identity of individuals on occasions where techniques such as fingerprint or face recognition are unavailable. One such modality, which originates from forensic practice and has gained much attention in recent years, is the lip. This research study considers the use of computer vision methods to recognise different lip prints for the task of identification. To determine whether the research problem of the study is valid, a literature review is conducted, which helps identify the problem areas and the different computer vision methods that can be used for lip print recognition. Accordingly, the study builds on these areas and proposes lip print identification experiments with varying models, which identify individuals solely based on their lip prints, and provides guidelines for the implementation of the proposed system. Ultimately, the experiments encapsulate the broad categories of methods for achieving lip print identification.
The implemented computer vision pipelines contain different stages, including data augmentation, lip detection, pre-processing, feature extraction, feature representation and classification. Three pipelines were implemented from the proposed model: a traditional machine learning pipeline, a deep learning-based pipeline and a deep hybrid learning-based pipeline. Different metrics reported in the literature are used to assess the performance of the prototype, such as IoU, mAP, accuracy, precision, recall, F1 score, EER, ROC curve, PR curve, and accuracy and loss curves. The first pipeline of the current study is a classical pipeline which employs a facial landmark detector (One Millisecond Face Alignment algorithm) to detect the lip, SURF for feature extraction, BoVW for feature representation and an SVM or K-NN classifier. The second pipeline makes use of the facial landmark detector and a VGG16 or ResNet50 architecture. The findings reveal that ResNet50 is the best performing method for lip print identification in the current study. The third pipeline also employs the facial landmark detector, with the ResNet50 architecture for feature extraction and an SVM classifier. The experiments are validated and benchmarked to determine the extent to which they achieve lip print identification. The results of the benchmark for the prototype indicate that the study accomplishes the objective of identifying individuals based on their lip prints using computer vision methods. The results also show that deep learning architectures such as ResNet50 yield promising results.
M.Sc. (Science