    Recognition of Isolated Words using Zernike and MFCC features for Audio Visual Speech Recognition

    Automatic Speech Recognition (ASR) by machine is an attractive research topic in the signal processing domain and has attracted many researchers to this area. In recent years, there have been many advances in automatic speech-reading systems that include both audio and visual speech features to recognize words under noisy conditions. The objective of an audio-visual speech recognition system is to improve recognition accuracy. In this paper we computed visual features using Zernike moments and audio features using Mel Frequency Cepstral Coefficients (MFCC) on the vVISWa (Visual Vocabulary of Independent Standard Words) dataset, which contains an isolated set of city names spoken by 10 speakers. The visual features were normalized and the dimension of the feature set was reduced by Principal Component Analysis (PCA) in order to recognize the isolated word utterances in the PCA space. Recognition of isolated words based on visual-only and audio-only features achieves 63.88% and 100% accuracy, respectively.
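The PCA dimensionality-reduction step described in this abstract can be sketched with NumPy. This is a minimal illustration, not the paper's implementation; the feature matrix and dimensions below are placeholders, not vVISWa data.

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project feature vectors onto their top principal components.

    features: (n_samples, n_features) array; n_components: target dimension.
    """
    centered = features - features.mean(axis=0)
    # SVD of the centered data gives the principal directions as rows of vt.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Hypothetical 100-dimensional visual feature vectors for 20 utterances.
rng = np.random.default_rng(0)
feats = rng.normal(size=(20, 100))
reduced = pca_reduce(feats, 5)
```

The reduced vectors are zero-mean and ordered by explained variance, which is what makes subsequent matching in the "PCA space" well-behaved.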

    Improving Performance of Speaker Identification System Using Complementary Information Fusion

    Feature extraction plays an important role as a front-end processing block in the speaker identification (SI) process. Most SI systems use features such as Mel-Frequency Cepstral Coefficients (MFCC), Perceptual Linear Prediction (PLP), or Linear Predictive Cepstral Coefficients (LPCC) to represent the speech signal. These features are derived by short-term processing of the speech signal and try to capture vocal tract information, ignoring the contribution of the vocal cords. Vocal cord cues are equally important in the SI context, as information such as pitch frequency and the phase of the residual signal can convey important speaker-specific attributes and is complementary to the information contained in spectral feature sets. In this paper we propose a novel feature set extracted from the residual signal of LP modeling. Higher-order statistical moments are used to capture the nonlinear structure of the residual signal. To exploit this complementarity, the vocal cord-based decision score is fused with the vocal tract-based score. Experimental results on two public databases show that the fused system outperforms single spectral features. Comment: 6 pages, 3 figures.
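The residual-plus-moments idea can be sketched as follows: fit an LP model, form the prediction residual, and take higher-order moments of that residual as vocal-cord-related features. This is a simplified sketch under assumed settings (autocorrelation-method LP, synthetic frame), not the paper's exact feature set.

```python
import numpy as np

def lp_residual(signal, order=10):
    """LP residual e[n] = s[n] - sum_k a_k * s[n-k] (autocorrelation method)."""
    n = len(signal)
    r = np.array([signal[:n - k] @ signal[k:] for k in range(order + 1)])
    # Solve the Toeplitz normal equations R a = r[1:] for the predictor.
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    pred = np.zeros(n)
    for k in range(1, order + 1):
        pred[k:] += a[k - 1] * signal[:-k]
    return signal - pred

def higher_order_moments(x):
    """Skewness and kurtosis of the residual, used here as vocal-cord cues."""
    z = (x - x.mean()) / x.std()
    return float(np.mean(z ** 3)), float(np.mean(z ** 4))

# Hypothetical voiced frame: a decaying resonance plus low-level noise.
rng = np.random.default_rng(1)
t = np.arange(400)
frame = np.sin(0.3 * t) * np.exp(-t / 200) + 0.05 * rng.normal(size=400)
residual = lp_residual(frame, order=10)
skew, kurt = higher_order_moments(residual)
```

Because the LP filter captures the vocal-tract resonances, most of the predictable energy is removed and the residual carries mainly source (excitation) information.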

    i Vector used in Speaker Identification by Dimension Compactness

    Automatic speaker identification extracts features that identify the components of the acoustic signal while discarding everything else, such as background noise, emotion, and hesitation. The acoustic signal generated by a human is filtered by the shape of the vocal tract, including the tongue and teeth; this shape determines the signal that comes out in real time and manifests itself as the envelope of the short-time power spectrum. The ASR system therefore needs an efficient way of extracting features from the acoustic signal that effectively model the shape of the individual vocal tract. To identify an acoustic signal within a large collection of signals (a corpus), the total variability space built from the GMM mean supervector must be made compact. This work presents an efficient way to implement dimension compactness of the total variability space and uses cosine distance scoring to produce a fast output score for short utterances. Comment: 6 pages, 7 figures.
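Cosine distance scoring, the final step described above, compares two low-dimensional vectors from the total variability space by the cosine of the angle between them. A minimal sketch, with random placeholder vectors standing in for real i-vectors:

```python
import numpy as np

def cosine_score(w_test, w_target):
    """Cosine similarity between two i-vectors (placeholder vectors here)."""
    return float(w_test @ w_target /
                 (np.linalg.norm(w_test) * np.linalg.norm(w_target)))

# Hypothetical 400-dimensional i-vectors.
rng = np.random.default_rng(2)
target = rng.normal(size=400)
same_speaker = target + 0.1 * rng.normal(size=400)   # slight channel variation
other_speaker = rng.normal(size=400)                  # unrelated speaker
```

Scoring is a single dot product per trial, which is why cosine scoring gives fast decisions for short utterances once the compact vectors are extracted.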

    Novel Cascaded Gaussian Mixture Model-Deep Neural Network Classifier for Speaker Identification in Emotional Talking Environments

    This research presents an effective approach to enhancing text-independent speaker identification performance in emotional talking environments, based on a novel classifier: the cascaded Gaussian Mixture Model-Deep Neural Network (GMM-DNN). We propose, implement, and evaluate this cascaded classifier for speaker identification in emotional talking environments. The results show that the cascaded GMM-DNN classifier improves speaker identification performance across various emotions on two distinct speech databases: the Emirati speech database (an Arabic United Arab Emirates dataset) and the Speech Under Simulated and Actual Stress (SUSAS) English dataset. The proposed classifier outperforms classical classifiers such as the Multilayer Perceptron (MLP) and Support Vector Machine (SVM) on each dataset. Speaker identification performance attained with the cascaded GMM-DNN is similar to that acquired from subjective assessment by human listeners. Comment: 15 pages.
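The GMM front end of such a cascade scores each frame against per-speaker mixture models; those scores are then passed to a DNN for the final decision. The sketch below shows only the GMM scoring stage with hand-set parameters (all means, variances, and weights are illustrative, not from the paper):

```python
import numpy as np

def gmm_component_loglik(x, means, variances, weights):
    """Per-component log-likelihoods of a diagonal-covariance GMM."""
    d = x.shape[-1]
    log_norm = -0.5 * (d * np.log(2 * np.pi) + np.sum(np.log(variances), axis=1))
    quad = -0.5 * np.sum((x - means) ** 2 / variances, axis=1)
    return np.log(weights) + log_norm + quad

# Two hypothetical speaker GMMs (2 components, 3-dim features each).
means_a = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
means_b = np.array([[5.0, 5.0, 5.0], [6.0, 6.0, 6.0]])
var = np.ones((2, 3))
w = np.array([0.5, 0.5])

x = np.array([0.1, -0.2, 0.0])   # frame lying near speaker A's model
# Total log-likelihood per speaker via log-sum-exp over components; in the
# cascade, vectors of such scores would be the DNN classifier's input.
score_a = float(np.logaddexp.reduce(gmm_component_loglik(x, means_a, var, w)))
score_b = float(np.logaddexp.reduce(gmm_component_loglik(x, means_b, var, w)))
```

Feeding these likelihood scores (rather than raw frames) to the DNN is what makes the two stages complementary: the GMM summarizes the acoustic match, the DNN learns emotion-robust decision boundaries on top of it.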

    A text-independent speaker verification model: A comparative analysis

    The most pressing challenge in the field of voice biometrics is selecting the most efficient technique of speaker recognition. Every individual's voice is peculiar; factors such as physical differences in the vocal organs, accent, and pronunciation contribute to the problem's complexity. In this paper, we explore the various methods available for each block in the speaker recognition process, with the objective of identifying the techniques that give the most precise results, studied on text-independent corpora. We use MFCC (Mel-frequency cepstral coefficients), LPCC (linear predictive cepstral coefficients) and PLP (perceptual linear prediction) for feature extraction, PCA (Principal Component Analysis) and t-SNE for dimensionality reduction, and SVM (Support Vector Machine), feed-forward, nearest neighbor and decision tree algorithms for the classification block, and comparatively analyze each block to determine the best technique. Comment: presented at and accepted by the 2017 International Conference on Intelligent Computing and Control (I2C2).
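Of the classification blocks compared above, nearest neighbor is the simplest to make concrete: classify a test feature vector by the label of its closest enrolled vector. A minimal 1-NN sketch with toy 2-D "feature vectors" standing in for real MFCC/LPCC/PLP features:

```python
import numpy as np

def nearest_neighbor_predict(train_x, train_y, test_x):
    """1-NN speaker prediction: label of the closest enrolled vector."""
    # Pairwise Euclidean distances: (n_test, n_train).
    dists = np.linalg.norm(test_x[:, None, :] - train_x[None, :, :], axis=2)
    return train_y[np.argmin(dists, axis=1)]

# Hypothetical enrolled features for two speakers (labels 0 and 1).
train_x = np.array([[0.0, 0.0], [0.1, 0.1], [5.0, 5.0], [5.1, 4.9]])
train_y = np.array([0, 0, 1, 1])
test_x = np.array([[0.05, 0.0], [5.05, 5.0]])
pred = nearest_neighbor_predict(train_x, train_y, test_x)
```

The other classifiers in the comparison (SVM, feed-forward network, decision tree) slot into the same position in the pipeline: features in, speaker label out.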

    Intelligent System for Speaker Identification using Lip features with PCA and ICA

    Biometric authentication techniques are more consistent and efficient than conventional authentication techniques and can be used in monitoring, transaction authentication, information retrieval, access control, forensics, etc. In this paper, we present a detailed comparative analysis between Principal Component Analysis (PCA) and Independent Component Analysis (ICA), used for feature extraction, on the basis of different Artificial Neural Networks (ANN) such as Back Propagation (BP), Radial Basis Function (RBF) and Learning Vector Quantization (LVQ). We use the "TULIPS1 database (Movellan, 1995)", a small audiovisual database of 12 subjects saying the first 4 digits in English. Six geometric lip features that extract the identity-relevant information are considered: the height and width of the outer corners of the mouth, the height and width of the inner corners of the mouth, the height of the upper lip, and the height of the lower lip. After comprehensive analysis and evaluation, a maximum of 91.07% speaker recognition accuracy is achieved using PCA with RBF, and 87.36% using ICA with RBF. Comment: https://sites.google.com/site/journalofcomputing
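The six geometric lip features amount to distances between mouth landmarks. A sketch of their computation, assuming landmark coordinates are already available (the landmark names and toy coordinates below are placeholders, and the exact landmark definitions may differ from the paper's):

```python
import numpy as np

def lip_features(lm):
    """Six geometric lip distances from a dict of (x, y) landmark points."""
    def dist(p, q):
        return float(np.hypot(p[0] - q[0], p[1] - q[1]))
    return [
        dist(lm["outer_left"], lm["outer_right"]),            # outer mouth width
        dist(lm["outer_top"], lm["outer_bottom"]),            # outer mouth height
        dist(lm["inner_left"], lm["inner_right"]),            # inner mouth width
        dist(lm["inner_top"], lm["inner_bottom"]),            # inner mouth height
        dist(lm["upper_lip_top"], lm["upper_lip_bottom"]),    # upper lip height
        dist(lm["lower_lip_top"], lm["lower_lip_bottom"]),    # lower lip height
    ]

# Illustrative landmark positions (pixels) for one video frame.
pts = {
    "outer_left": (0, 5), "outer_right": (10, 5),
    "outer_top": (5, 8), "outer_bottom": (5, 2),
    "inner_left": (2, 5), "inner_right": (8, 5),
    "inner_top": (5, 7), "inner_bottom": (5, 3),
    "upper_lip_top": (5, 8), "upper_lip_bottom": (5, 7),
    "lower_lip_top": (5, 3), "lower_lip_bottom": (5, 2),
}
feats = lip_features(pts)
```

These six numbers per frame are then the input that PCA or ICA compresses before the neural network classifier.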

    Group Component Analysis for Multiblock Data: Common and Individual Feature Extraction

    Very often, the data we encounter in practice is a collection of matrices rather than a single matrix. Such multi-block data are naturally linked and hence often share some common features while also having their own individual features, due to the background in which they are measured and collected. In this study we propose a new scheme of common and individual feature analysis (CIFA) that processes multi-block data in a linked way, aiming to discover and separate their common and individual features. Depending on whether the number of common features is given, two efficient algorithms are proposed to extract the common basis shared by all data. Feature extraction is then performed on the common and individual spaces separately, incorporating techniques such as dimensionality reduction and blind source separation. We also discuss how CIFA can significantly improve the performance of classification and clustering tasks by exploiting the common and individual features of samples. Our experimental results show some encouraging properties of the proposed methods in comparison to state-of-the-art methods on synthetic and real data. Comment: 13 pages, 11 figures.
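One simple way to see what "a common basis shared by all data" means: keep each block's dominant directions and look for directions that recur across blocks. The sketch below is a deliberate simplification of CIFA (truncated per-block SVD plus an SVD of the stacked bases), not the paper's algorithms:

```python
import numpy as np

def common_basis(blocks, rank, n_common):
    """Crude common-feature extraction across linked data blocks.

    Each block is (samples x features). We keep its top-`rank` principal
    directions; directions shared by all blocks accumulate weight in the
    SVD of the stacked bases and surface as the leading right singular
    vectors.
    """
    bases = []
    for b in blocks:
        _, _, vt = np.linalg.svd(b - b.mean(axis=0), full_matrices=False)
        bases.append(vt[:rank])            # dominant directions of this block
    _, _, vt = np.linalg.svd(np.vstack(bases), full_matrices=False)
    return vt[:n_common]                   # candidate common basis

# Synthetic multi-block data: both blocks share one latent direction.
rng = np.random.default_rng(3)
shared = rng.normal(size=(1, 6))
block1 = rng.normal(size=(30, 1)) @ shared + 0.01 * rng.normal(size=(30, 6))
block2 = rng.normal(size=(30, 1)) @ shared + 0.01 * rng.normal(size=(30, 6))
common = common_basis([block1, block2], rank=2, n_common=1)
```

Once the common subspace is found, each block's individual features live in the residual after projecting the common part out, which is where the separate dimensionality-reduction or source-separation step operates.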

    Feature Selection and Feature Extraction in Pattern Analysis: A Literature Review

    Pattern analysis often requires a pre-processing stage for extracting or selecting features in order to help the classification, prediction, or clustering stage discriminate or represent the data in a better way. This is required because the raw data are complex and difficult to process without extracting or selecting appropriate features beforehand. This paper reviews the theory and motivation of common methods of feature selection and extraction and introduces some of their applications. Some numerical implementations are also shown for these methods. Finally, the methods of feature selection and those of feature extraction are compared. Comment: 14 pages, 1 figure, 2 tables, survey (literature review) paper.
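The distinction the review draws can be shown in a few lines: feature *selection* keeps a subset of the original columns, while feature *extraction* (e.g. the PCA used elsewhere on this page) builds new combined features. A minimal variance-filter selection sketch (one of the simplest filter methods, chosen for illustration):

```python
import numpy as np

def select_k_highest_variance(X, k):
    """Filter-type feature selection: indices of the k highest-variance columns."""
    idx = np.argsort(X.var(axis=0))[::-1][:k]
    return np.sort(idx)

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 5))
X[:, 2] *= 10.0   # make feature 2 far more variable than the rest
selected = select_k_highest_variance(X, 1)
```

Selected features remain interpretable in the original measurement space, which is the usual argument for selection over extraction when interpretability matters.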

    Voice Activity Detection: Merging Source and Filter-based Information

    Voice Activity Detection (VAD) refers to the problem of distinguishing speech segments from background noise. Numerous approaches have been proposed for this purpose: some are based on features derived from the power spectral density, while others exploit the periodicity of the signal. The goal of this paper is to investigate the joint use of source- and filter-based features. Interestingly, a mutual information-based assessment shows superior discrimination power for the source-related features, especially the proposed ones. The features are then fed to an artificial neural network-based classifier trained on a multi-condition database. Two strategies are proposed to merge source and filter information: feature fusion and decision fusion. Our experiments indicate an absolute reduction of 3% in the equal error rate when using decision fusion. The final proposed system is compared to four state-of-the-art methods on 150 minutes of data recorded in real environments. Thanks to the robustness of its source-related features, its multi-condition training and its efficient information fusion, the proposed system yields a substantial increase in accuracy over the best state-of-the-art VAD across all conditions (24% absolute on average).
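A toy version of the source/filter split and of decision fusion: one detector looks at frame energy (a filter-side cue), another at autocorrelation periodicity (a source-side cue), and the fused decision combines their votes. The cues and thresholds are illustrative placeholders, far simpler than the paper's features and neural classifier:

```python
import numpy as np

def periodicity_score(frame, min_lag=20):
    """Source-related cue: peak normalized autocorrelation beyond min_lag."""
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    return float(np.max(ac[min_lag:]) / (ac[0] + 1e-12))

def energy_score(frame):
    """Filter-related cue: mean squared amplitude of the frame."""
    return float(np.mean(frame ** 2))

def fused_decision(frame, periodicity_thr=0.5, energy_thr=0.01):
    """Decision fusion sketch: each detector votes; speech if either fires."""
    votes = [periodicity_score(frame) > periodicity_thr,
             energy_score(frame) > energy_thr]
    return any(votes)

t = np.arange(800)
voiced = np.sin(2 * np.pi * t / 80)        # periodic, speech-like frame
rng = np.random.default_rng(5)
noise = 0.01 * rng.normal(size=800)        # low-level background noise
```

Fusing at the decision level, as here, lets each detector keep its own operating point, which is one reason the paper finds decision fusion preferable to concatenating the features.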

    Speech Recognition by Machine, A Review

    This paper presents a brief survey of Automatic Speech Recognition (ASR) and discusses the major themes and advances made in the past 60 years of research, so as to provide a technological perspective on, and an appreciation of, the fundamental progress that has been accomplished in this important area of speech communication. After years of research and development, the accuracy of automatic speech recognition remains one of the important research challenges (e.g., variations of context, speakers, and environment). The design of a speech recognition system requires careful attention to the following issues: definition of the various types of speech classes, speech representation, feature extraction techniques, speech classifiers, databases, and performance evaluation. The problems existing in ASR, and the various techniques proposed by researchers to solve them, are presented in chronological order. The authors therefore hope that this work is a contribution to the area of speech recognition. The objective of this review is to summarize and compare some of the well-known methods used in the various stages of a speech recognition system and to identify research topics and applications at the forefront of this exciting and challenging field. Comment: 25 pages, IEEE format, International Journal of Computer Science and Information Security, IJCSIS, December 2009, ISSN 1947-5500, http://sites.google.com/site/ijcsis