Recognition of Isolated Words using Zernike and MFCC features for Audio Visual Speech Recognition
Automatic Speech Recognition (ASR) by machine is an attractive research topic
in the signal processing domain and has drawn many researchers to contribute to
this area. In recent years, there have been many advances in automatic
speech-reading systems that include both audio and visual speech features to
recognize words under noisy conditions. The objective of an audio-visual speech
recognition system is to improve recognition accuracy. In this paper we compute
visual features using Zernike moments and audio features using Mel Frequency
Cepstral Coefficients (MFCC) on the vVISWa (Visual Vocabulary of Independent
Standard Words) dataset, which contains a collection of isolated city names
uttered by 10 speakers. The visual features were normalized and the dimension
of the feature set was reduced by Principal Component Analysis (PCA) in order
to recognize the isolated word utterances in PCA space. Recognition of isolated
words based on visual-only and audio-only features yields accuracies of 63.88%
and 100%, respectively.
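As a rough illustration of the PCA step described above, the sketch below fits PCA on a matrix of feature vectors with NumPy and projects them into a lower-dimensional space. The 64-dimensional random vectors and the choice of 8 components are assumptions for illustration only, not the paper's actual Zernike/MFCC dimensions or settings.

```python
import numpy as np

def pca_fit(X, n_components):
    """Fit PCA on the rows of X (one feature vector per utterance)."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Principal directions via SVD of the centered data matrix.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return mean, Vt[:n_components]

def pca_transform(X, mean, components):
    """Project feature vectors onto the retained principal components."""
    return (X - mean) @ components.T

rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 64))  # stand-in for 40 utterance feature vectors
mean, comps = pca_fit(X_train, n_components=8)
Z = pca_transform(X_train, mean, comps)
print(Z.shape)  # (40, 8)
```

Recognition in PCA space would then compare a test utterance's projection against the training projections, e.g. by nearest neighbor.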
Improving Performance of Speaker Identification System Using Complementary Information Fusion
Feature extraction plays an important role as a front-end processing block in
the speaker identification (SI) process. Most SI systems use features such as
Mel-Frequency Cepstral Coefficients (MFCC), Perceptual Linear Prediction (PLP),
or Linear Predictive Cepstral Coefficients (LPCC) to represent the speech
signal. These are derived by short-term processing of the speech signal and aim
to capture vocal tract information, ignoring the contribution of the vocal
cords. Vocal cord cues are equally important in the SI context, as information
such as the pitch frequency and the phase of the residual signal can convey
important speaker-specific attributes that are complementary to the information
contained in spectral feature sets. In this paper we propose a novel feature
set extracted from the residual signal of LP modeling. Higher-order statistical
moments are used to capture the nonlinear relationships in the residual signal.
To exploit this complementarity, the vocal-cord-based decision score is fused
with the vocal-tract-based score. Experimental results on two public databases
show that the fused system outperforms single spectral feature sets.
Comment: 6 pages, 3 figures
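The residual-feature idea can be sketched as follows: estimate LP coefficients, form the residual by subtracting the linear prediction, and compute higher-order moments (here skewness and kurtosis) of the residual. This is a minimal NumPy sketch under assumed settings (autocorrelation-method LPC, order 10, synthetic input), not the paper's exact feature definition.

```python
import numpy as np

def lpc(signal, order):
    """LP coefficients via the autocorrelation method (solve R a = r[1..p])."""
    r = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])

def lp_residual(signal, order=10):
    """Residual = signal minus its linear prediction from past samples."""
    a = lpc(signal, order)
    pred = np.convolve(signal, np.concatenate(([0.0], a)))[:len(signal)]
    return signal - pred

def residual_moments(residual):
    """Higher-order moments (skewness, kurtosis) of the normalized residual."""
    r = (residual - residual.mean()) / (residual.std() + 1e-12)
    return np.array([np.mean(r ** 3), np.mean(r ** 4)])

rng = np.random.default_rng(1)
x = rng.normal(size=400)            # stand-in for one speech frame
res = lp_residual(x)
feats = residual_moments(res)
print(feats.shape)  # (2,)
```

Decision fusion would then combine these residual-based scores with spectral (vocal tract) scores, e.g. as a weighted sum of the two classifier outputs.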
i Vector used in Speaker Identification by Dimension Compactness
The automatic speaker identification procedure extracts features that help to
identify the components of the acoustic signal while discarding irrelevant
information such as background noise, emotion, and hesitation. The acoustic
signal generated by a human is filtered by the shape of the vocal tract,
including the tongue and teeth. The shape of the vocal tract determines what
signal comes out in real time, and it manifests itself as the envelope of the
short-time power spectrum. ASR therefore needs an efficient way of extracting
features from the acoustic signal that effectively characterize the shape of
the individual vocal tract. To identify an acoustic signal within a large
collection of acoustic signals, i.e. a corpus, dimension compactness of the
total variability space is needed, obtained via the GMM mean supervector. This
work presents an efficient way to implement dimension compactness in the total
variability space and uses cosine distance scoring to produce a fast output
score for short utterances.
Comment: 6 pages, 7 figures
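Cosine distance scoring itself is simple: once each utterance is reduced to a compact i-vector, the score between a test vector and an enrolled speaker's vector is their cosine similarity. The vectors below are toy values for illustration, not real i-vectors.

```python
import numpy as np

def cosine_score(w_test, w_target):
    """Cosine similarity between two i-vectors; higher means more similar."""
    return float(w_test @ w_target /
                 (np.linalg.norm(w_test) * np.linalg.norm(w_target)))

w_claimed = np.array([0.3, -1.2, 0.7, 0.1])   # enrolled speaker i-vector (toy)
w_test = np.array([0.28, -1.1, 0.75, 0.05])   # test utterance i-vector (toy)
print(cosine_score(w_test, w_claimed))        # close to 1.0 for a match
```

Because the score is a single dot product after normalization, it is fast enough for large corpora, which is the practical payoff of the dimension compactness discussed above.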
Novel Cascaded Gaussian Mixture Model-Deep Neural Network Classifier for Speaker Identification in Emotional Talking Environments
This research presents an effective approach to enhance text-independent
speaker identification performance in emotional talking environments, based on
a novel classifier: the cascaded Gaussian Mixture Model-Deep Neural Network
(GMM-DNN). Our current work focuses on proposing, implementing, and evaluating
this new approach to speaker identification in emotional talking environments.
The results indicate that the cascaded GMM-DNN classifier improves speaker
identification performance across various emotions on two distinct speech
databases: the Emirati speech database (an Arabic United Arab Emirates dataset)
and the Speech Under Simulated and Actual Stress (SUSAS) English dataset. The
proposed classifier outperforms classical classifiers such as the Multilayer
Perceptron (MLP) and the Support Vector Machine (SVM) on each dataset. Speaker
identification performance attained with the cascaded GMM-DNN is similar to
that acquired from subjective assessment by human listeners.
Comment: 15 pages
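The first stage of such a cascade scores each frame sequence against per-speaker GMMs; those scores (or shortlists derived from them) then feed the DNN stage. The sketch below shows only the GMM scoring stage with a diagonal-covariance model on toy data; the mixture weights, means, and test frames are illustrative assumptions, and the DNN stage is omitted.

```python
import numpy as np

def diag_gmm_loglik(X, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM."""
    # X: (n_frames, dim); weights: (n_mix,); means/variances: (n_mix, dim)
    diff = X[:, None, :] - means[None, :, :]              # (n_frames, n_mix, dim)
    exps = -0.5 * np.sum(diff ** 2 / variances, axis=2)
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)
    log_comp = np.log(weights) + log_norm + exps          # (n_frames, n_mix)
    m = log_comp.max(axis=1, keepdims=True)               # log-sum-exp trick
    return float(np.mean(m[:, 0] + np.log(np.sum(np.exp(log_comp - m), axis=1))))

# One toy speaker model: 2 mixtures in 2-D.
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [2.0, 2.0]])
variances = np.ones((2, 2))
near = np.zeros((5, 2))       # frames near the model's means
far = 10.0 * np.ones((5, 2))  # frames far from the model
print(diag_gmm_loglik(near, weights, means, variances) >
      diag_gmm_loglik(far, weights, means, variances))  # True
```

In the cascade, one such score is produced per enrolled speaker, and the DNN refines the decision among the top-scoring candidates.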
A text-independent speaker verification model: A comparative analysis
The most pressing challenge in the field of voice biometrics is selecting the
most efficient technique for speaker recognition. Every individual's voice is
distinctive; factors such as physical differences in the vocal organs, accent,
and pronunciation contribute to the problem's complexity. In this paper, we
explore the various methods available for each block of the speaker
recognition process, with the objective of identifying the best techniques for
obtaining precise results. We study the results on text-independent corpora.
We use MFCC (Mel-frequency cepstral coefficients), LPCC (linear predictive
cepstral coefficients), and PLP (perceptual linear prediction) algorithms for
feature extraction, PCA (Principal Component Analysis) and t-SNE for
dimensionality reduction, and SVM (Support Vector Machine), feed-forward
neural network, nearest neighbor, and decision tree algorithms for the
classification block of the speaker recognition system, and comparatively
analyze each block to determine the best technique.
Comment: presented and accepted at the 2017 International Conference on
Intelligent Computing and Control (I2C2)
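A block-wise comparison like the one described reduces, for the classification block, to running each candidate classifier on the same features and comparing accuracies. As a minimal stand-in, the sketch below evaluates a nearest-neighbor classifier on synthetic two-class data; the data and the single classifier are illustrative assumptions, not the paper's corpora or full method set.

```python
import numpy as np

def nearest_neighbor_predict(X_train, y_train, X_test):
    """1-NN: assign each test point the label of its closest training point."""
    d = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    return y_train[np.argmin(d, axis=1)]

rng = np.random.default_rng(2)
X0 = rng.normal(loc=0.0, size=(30, 4))   # speaker-class 0 feature vectors
X1 = rng.normal(loc=3.0, size=(30, 4))   # speaker-class 1 feature vectors
X = np.vstack([X0, X1])
y = np.array([0] * 30 + [1] * 30)
pred = nearest_neighbor_predict(X, y, X)
acc = float(np.mean(pred == y))
print(acc)  # 1.0 here: 1-NN on its own training set always matches each point to itself
```

In a real comparison, each classifier would be scored on held-out test utterances rather than the training data.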
Intelligent System for Speaker Identification using Lip features with PCA and ICA
Biometric authentication techniques are more consistent and efficient than
conventional authentication techniques and can be used in monitoring,
transaction authentication, information retrieval, access control, forensics,
etc. In this paper, we present a detailed comparative analysis between
Principal Component Analysis (PCA) and Independent Component Analysis (ICA),
which are used for feature extraction, on the basis of different Artificial
Neural Networks (ANN): Back Propagation (BP), Radial Basis Function (RBF), and
Learning Vector Quantization (LVQ). We have chosen the "TULIPS1 database
(Movellan, 1995)", a small audiovisual database of 12 subjects saying the
first 4 digits in English, to evaluate the above methods. Six geometric lip
features that extract identity-relevant information are considered for this
work: the height of the outer corners of the mouth, the width of the outer
corners of the mouth, the height of the inner corners of the mouth, the width
of the inner corners of the mouth, the height of the upper lip, and the height
of the lower lip. After comprehensive analysis and evaluation, a maximum
speaker recognition accuracy of 91.07% is achieved using PCA and RBF, and
87.36% using ICA and RBF.
Comment: https://sites.google.com/site/journalofcomputing
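Given 2-D lip landmark coordinates, the six geometric features listed above are simple coordinate differences. The landmark naming scheme and sample points below are hypothetical, introduced only to make the computation concrete; they are not the TULIPS1 annotation format.

```python
import numpy as np

def lip_features(pts):
    """Six geometric lip features from 2-D (x, y) landmark points.

    `pts` maps hypothetical landmark names to coordinates (an assumed
    scheme for illustration).
    """
    outer_h = abs(pts["outer_top"][1] - pts["outer_bottom"][1])
    outer_w = abs(pts["outer_right"][0] - pts["outer_left"][0])
    inner_h = abs(pts["inner_top"][1] - pts["inner_bottom"][1])
    inner_w = abs(pts["inner_right"][0] - pts["inner_left"][0])
    upper_lip_h = abs(pts["outer_top"][1] - pts["inner_top"][1])
    lower_lip_h = abs(pts["inner_bottom"][1] - pts["outer_bottom"][1])
    return np.array([outer_h, outer_w, inner_h, inner_w,
                     upper_lip_h, lower_lip_h])

pts = {"outer_left": (0, 5), "outer_right": (10, 5),
       "outer_top": (5, 10), "outer_bottom": (5, 0),
       "inner_left": (2, 5), "inner_right": (8, 5),
       "inner_top": (5, 8), "inner_bottom": (5, 2)}
print(lip_features(pts))  # heights/widths: [10 10  6  6  2  2]
```

These six numbers per frame would then be fed to PCA or ICA before the ANN classifier.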
Group Component Analysis for Multiblock Data: Common and Individual Feature Extraction
Very often, the data we encounter in practice is a collection of matrices
rather than a single matrix. Such multi-block data are naturally linked and
hence often share some common features, while at the same time having their
own individual features due to the background in which they are measured and
collected. In this study we propose a new scheme of common and individual
feature analysis (CIFA) that processes multi-block data in a linked way,
aiming at discovering and separating their common and individual features.
Depending on whether the number of common features is given, two efficient
algorithms are proposed to extract the common basis shared by all data.
Feature extraction is then performed on the common and individual spaces
separately, incorporating techniques such as dimensionality reduction and
blind source separation. We also discuss how the proposed CIFA can
significantly improve the performance of classification and clustering tasks
by exploiting the common and individual features of samples, respectively. Our
experimental results show some encouraging properties of the proposed methods
in comparison to state-of-the-art methods on synthetic and real data.
Comment: 13 pages, 11 figures
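The common/individual split can be sketched as follows: estimate a shared column basis from the concatenation of all blocks, then split each block into its projection onto that basis (common part) and the residual (individual part). This is a simplified SVD-based stand-in for the paper's algorithms, on synthetic blocks that are purely common by construction.

```python
import numpy as np

def common_basis(blocks, n_common):
    """Estimate a column basis shared by all blocks (simplified CIFA)."""
    Y = np.hstack(blocks)                  # concatenate samples of all blocks
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    return U[:, :n_common]                 # orthonormal common directions

def split_common_individual(block, A):
    """Split a block into its common-subspace part and the residual."""
    common = A @ (A.T @ block)             # projection onto span(A)
    individual = block - common
    return common, individual

rng = np.random.default_rng(3)
A_true = rng.normal(size=(20, 2))          # basis shared across blocks
blocks = [A_true @ rng.normal(size=(2, 15)) for _ in range(3)]
A = common_basis(blocks, n_common=2)
c, ind = split_common_individual(blocks[0], A)
print(np.linalg.norm(ind))  # ~0: these synthetic blocks have no individual part
```

Downstream feature extraction (dimensionality reduction, blind source separation) would then operate on the common and individual parts separately.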
Feature Selection and Feature Extraction in Pattern Analysis: A Literature Review
Pattern analysis often requires a pre-processing stage for extracting or
selecting features in order to help the classification, prediction, or
clustering stage discriminate or represent the data in a better way. The reason
for this requirement is that the raw data are complex and difficult to process
without extracting or selecting appropriate features beforehand. This paper
reviews the theory and motivation of different common methods of feature
selection and extraction and introduces some of their applications. Some
numerical implementations are also shown for these methods. Finally, the
methods of feature selection and extraction are compared.
Comment: 14 pages, 1 figure, 2 tables, survey (literature review) paper
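The distinction the review draws can be shown in a few lines: feature selection keeps a subset of the original columns, while feature extraction builds new features as combinations of all columns. Below, a simple variance-based selector is contrasted with PCA-based extraction; both methods and the synthetic data are generic illustrations, not specific methods from the review.

```python
import numpy as np

def variance_selection(X, k):
    """Feature selection: keep the k original features with highest variance."""
    idx = np.argsort(X.var(axis=0))[::-1][:k]
    return X[:, np.sort(idx)]              # original columns, just fewer of them

def pca_extraction(X, k):
    """Feature extraction: build k new features from all original ones."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T                   # linear combinations of all columns

rng = np.random.default_rng(4)
X = rng.normal(size=(50, 10)) * np.arange(1, 11)  # later features vary more
print(variance_selection(X, 3).shape, pca_extraction(X, 3).shape)
```

Both reduce 10 features to 3, but only selection preserves interpretability of the individual features; extraction can capture variance spread across many columns.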
Voice Activity Detection: Merging Source and Filter-based Information
Voice Activity Detection (VAD) refers to the problem of distinguishing speech
segments from background noise. Numerous approaches have been proposed for this
purpose. Some are based on features derived from the power spectral density,
others exploit the periodicity of the signal. The goal of this paper is to
investigate the joint use of source and filter-based features. Interestingly, a
mutual information-based assessment shows superior discrimination power for the
source-related features, especially the proposed ones. The features are further
the input of an artificial neural network-based classifier trained on a
multi-condition database. Two strategies are proposed to merge source and
filter information: feature fusion and decision fusion. Our experiments
indicate an absolute reduction of 3% in the equal error rate when using
decision fusion. The final proposed system is compared to four
state-of-the-art methods on 150 minutes of data recorded in real environments.
Thanks to the robustness of its source-related features, its multi-condition
training, and its efficient information fusion, the proposed system yields a
substantial increase in accuracy over the best state-of-the-art VAD across all
conditions (24% absolute on average).
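The source/filter split can be made concrete with one cue from each side: frame log-energy as a filter-side (spectral) cue and the autocorrelation peak as a source-side (periodicity) cue, combined by decision fusion. This is a toy sketch with assumed thresholds and synthetic signals, not the paper's neural-network system.

```python
import numpy as np

def frame_energy(frame):
    """Filter-side cue: log energy of the frame."""
    return float(np.log(np.sum(frame ** 2) + 1e-12))

def periodicity(frame):
    """Source-side cue: peak of the normalized autocorrelation (lag > 0)."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    return float(r[1:].max() / (r[0] + 1e-12))

def vad_decision_fusion(frame, energy_thr, period_thr):
    """Decision fusion: label the frame as speech only if both cues agree."""
    return frame_energy(frame) > energy_thr and periodicity(frame) > period_thr

t = np.arange(400) / 8000.0
voiced = np.sin(2 * np.pi * 120 * t)                    # periodic, high energy
noise = 0.01 * np.random.default_rng(5).normal(size=400)  # aperiodic, low level
print(vad_decision_fusion(voiced, -5.0, 0.5),
      vad_decision_fusion(noise, -5.0, 0.5))
```

Feature fusion would instead concatenate both cues into one feature vector and let a single classifier draw the boundary, which is the alternative strategy the paper compares against.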
Speech Recognition by Machine, A Review
This paper presents a brief survey of Automatic Speech Recognition and
discusses the major themes and advances made in the past 60 years of research,
so as to provide a technological perspective on, and an appreciation of, the
fundamental progress that has been accomplished in this important area of
speech communication. After years of research and development, the accuracy of
automatic speech recognition remains one of the important research challenges
(e.g., variations of context, speakers, and environment). The design of a
speech recognition system requires careful attention to the following issues:
definition of the various types of speech classes, speech representation,
feature extraction techniques, speech classifiers, databases, and performance
evaluation. The problems that exist in ASR, and the various techniques
constructed by various researchers to solve them, are presented in
chronological order. The authors hope that this work will be a contribution to
the area of speech recognition. The objective of this review paper is to
summarize and compare some of the well-known methods used in the various
stages of a speech recognition system and to identify research topics and
applications which are at the forefront of this exciting and challenging
field.
Comment: 25 pages, IEEE format, International Journal of Computer Science and
Information Security, IJCSIS, December 2009, ISSN 1947 5500,
http://sites.google.com/site/ijcsis