
    A motion-based approach for audio-visual automatic speech recognition

    The research work presented in this thesis introduces novel approaches for both visual region-of-interest extraction and visual feature extraction for use in audio-visual automatic speech recognition. In particular, the speaker's movement that occurs during speech is used to isolate the mouth region in video sequences, and motion-based features obtained from this region are used to provide new visual features for audio-visual automatic speech recognition. The mouth region extraction approach proposed in this work is shown to give superior performance compared with existing colour-based lip segmentation methods. The new features are obtained from three separate representations of motion in the region of interest, namely the difference in luminance between successive images, block-matching-based motion vectors, and optical flow. The new visual features are found to improve visual-only and audio-visual speech recognition performance when compared with the commonly used appearance-feature-based methods. In addition, a novel approach is proposed for visual feature extraction from either the discrete cosine transform or discrete wavelet transform representations of the mouth region of the speaker. In this work, the image transform is explored from a new viewpoint of data discrimination, in contrast to the more conventional data preservation viewpoint. The main findings of this work are that audio-visual automatic speech recognition systems using the new features extracted from the frequency bands selected according to their discriminatory abilities generally outperform those using features designed for data preservation. To establish the noise robustness of the new features proposed in this work, their performance has been studied in the presence of a range of different types of noise and at various signal-to-noise ratios.
In these experiments, the audio-visual automatic speech recognition systems based on the new approaches were found to give superior performance both to audio-visual systems using appearance-based features and to audio-only speech recognition systems.
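The first of the three motion representations named above, the luminance difference between successive images, can be sketched as follows. This is a minimal illustration under assumptions not taken from the thesis (grayscale frames, a rectangular mouth ROI given as pixel bounds, and grid pooling to obtain a fixed-length vector); the function name and parameters are hypothetical.

```python
import numpy as np

def luminance_difference_feature(prev_frame, curr_frame, roi, grid=(4, 4)):
    """Pool absolute luminance differences inside a mouth region of
    interest over a coarse grid, yielding a fixed-length motion feature
    vector (a sketch of one of the three motion representations)."""
    y0, y1, x0, x1 = roi
    diff = np.abs(curr_frame[y0:y1, x0:x1].astype(float)
                  - prev_frame[y0:y1, x0:x1].astype(float))
    gh, gw = grid
    h, w = diff.shape
    # Trim so the ROI tiles exactly, then average motion energy per cell.
    diff = diff[:h - h % gh, :w - w % gw]
    cells = diff.reshape(gh, diff.shape[0] // gh, gw, diff.shape[1] // gw)
    return cells.mean(axis=(1, 3)).ravel()
```

Two identical frames produce an all-zero vector, so the feature responds only to movement in the mouth region, which is the property the thesis exploits.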

    Unimodal Multi-Feature Fusion and one-dimensional Hidden Markov Models for Low-Resolution Face Recognition

    The objective of low-resolution face recognition is to identify faces from small or poor-quality images with varying pose, illumination, expression, etc. In this work, we propose a robust low-resolution face recognition technique based on one-dimensional Hidden Markov Models. Features of each facial image are extracted in three steps: first, both Gabor filters and the Histogram of Oriented Gradients (HOG) descriptor are computed. Second, the dimensionality of these features is reduced using the Linear Discriminant Analysis (LDA) method in order to remove redundant information. Finally, the reduced features are combined using the Canonical Correlation Analysis (CCA) method. Unlike existing techniques using HMMs, in which authors consider each state to represent one facial region (eyes, nose, mouth, etc.), the proposed system employs 1D-HMMs without any prior knowledge about the localization of regions of interest in the facial image. Performance of the proposed method is measured using the AR database.
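The final fusion step above, combining the two LDA-reduced feature views with CCA, can be sketched with a standard whitening-plus-SVD formulation. This is a generic CCA sketch, not the thesis implementation; the function name, the `reg` regularizer, and the choice to concatenate the projected views are all assumptions.

```python
import numpy as np

def cca_fuse(X, Y, k, reg=1e-6):
    """Fuse two feature views (e.g. Gabor-derived and HOG-derived matrices,
    one sample per row) by projecting each onto its top-k maximally
    correlated directions and concatenating the projections."""
    Xc, Yc = X - X.mean(0), Y - Y.mean(0)
    n = len(X)
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    # Whiten each view via its Cholesky factor; the SVD of the whitened
    # cross-covariance then gives the canonical directions and correlations.
    Lx, Ly = np.linalg.cholesky(Cxx), np.linalg.cholesky(Cyy)
    M = np.linalg.solve(Lx, Cxy) @ np.linalg.inv(Ly).T
    U, corrs, Vt = np.linalg.svd(M)
    A = np.linalg.solve(Lx.T, U[:, :k])   # canonical directions for view X
    B = np.linalg.solve(Ly.T, Vt[:k].T)   # canonical directions for view Y
    return np.hstack([Xc @ A, Yc @ B]), corrs[:k]
```

When the two views carry redundant information, the leading canonical correlations are close to one, which is exactly the redundancy CCA exploits to produce a compact fused representation for the 1D-HMM stage.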

    A Framework for Human Motion Strategy Identification and Analysis

    The human body has many biomechanical degrees of freedom, and thus multiple movement strategies can be employed to execute any given task. Automated identification and classification of these movement strategies have potential applications in various fields including sports performance research, rehabilitation, and injury prevention. For example, in the field of rehabilitation, the choice of movement strategy can impact joint loading patterns and risk of injury. The problem of identifying movement strategies is related to the problem of classifying variations in the observed motions. When differences between two movement trajectories performing the same task are large, they are considered to be different movement strategies. Conversely, when the differences between observed movements are small, they are considered to be variations of the same movement strategy. In the simplest scenario, a movement strategy can represent a cluster of similar movement trajectories, but in more complicated scenarios differences in movements could also lie on a continuum. The goal of this thesis is to develop a computational framework to automatically recognize different movement strategies for performing a task and to identify what makes each strategy different. The proposed framework utilizes Gaussian Process Dynamical Models (GPDM) to convert human motion trajectories from their original high-dimensional representation to a trajectory in a lower-dimensional space (i.e., the latent space). The dimensionality of the latent space is determined by iteratively increasing the dimensionality until the reduction in reconstruction error between iterations becomes small. Then, the lower-dimensional trajectories are clustered using a Hidden Markov Model (HMM) clustering algorithm to identify movement strategies in an unsupervised manner. Next, we introduce an HMM-based technique for detecting differences in signals between two HMM models.
This technique is used to compare latent-space variables between the low-dimensional trajectory models as well as differences in degrees of freedom (DoF) between the corresponding high-dimensional (original) trajectory models. Then, by correlating latent-variable and DoF differences, movement synergies are discovered. To validate the proposed framework, it was tested on three different datasets: a synthetic dataset, a real labeled motion capture dataset, and an unlabeled motion capture dataset. The proposed framework achieved higher classification accuracy than competing algorithms (Joint Component Vector and Kinematic Synergies) when labels were known a priori. Additionally, the proposed algorithm was able to discover strategies that were not known a priori and to characterize how the strategies differed.
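The latent-dimensionality rule stated above, growing the latent space until the reduction in reconstruction error between iterations becomes small, can be illustrated with linear PCA as a stand-in for the GPDM (an assumption: the thesis uses GPDM, and the function name and `tol` threshold are hypothetical). With PCA, adding the d-th component reduces the squared reconstruction error by exactly the d-th singular value squared, so the stopping rule reduces to a simple loop.

```python
import numpy as np

def choose_latent_dim(X, tol=0.05):
    """Select a latent dimensionality for trajectory data X (one time
    sample per row): increase the dimension until the marginal reduction
    in reconstruction error falls below a fraction tol of total variance.
    PCA stand-in sketch for the GPDM used in the thesis."""
    Xc = X - X.mean(0)
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, descending
    var = s ** 2
    total = var.sum()
    for d in range(len(var)):
        # Adding latent dimension d+1 cuts the squared reconstruction
        # error by var[d]; stop once that gain becomes small.
        if var[d] < tol * total:
            return max(d, 1)
    return len(var)
```

On data whose variance is concentrated in two directions, the loop stops at two, mirroring how the framework keeps only as many latent dimensions as meaningfully reduce reconstruction error.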

    Text-independent speaker recognition

    This research presents a new text-independent speaker recognition system with multivariate tools such as Principal Component Analysis (PCA) and Independent Component Analysis (ICA) embedded into the recognition system after the feature extraction step. The proposed approach evaluates the performance of such a recognition system when trained and used in clean and noisy environments. Additive white Gaussian noise and convolutive noise are added. Experiments were carried out to investigate the robustness of PCA and ICA using the designed approach. The application of ICA improved the performance of the speaker recognition model when compared to PCA. Experimental results show that the use of ICA enabled extraction of higher-order statistics, thereby capturing speaker-dependent statistical cues in a text-independent recognition system. The results show that ICA has better de-correlation and dimension reduction properties than PCA. To simulate a multi-environment system, we trained our model such that every time a new speech signal was read, it was contaminated with different types of noise and stored in the database. Results also show that ICA outperforms PCA under adverse environments. This is verified by computing recognition accuracy rates obtained when the designed system was tested for different train and test SNR conditions with additive white Gaussian noise and test delay conditions with echo effects.