    High level speaker specific features modeling in automatic speaker recognition system

    Spoken words convey several levels of information. At the primary level, speech conveys the words or message being spoken, but at the secondary level, it also reveals information about the speaker. This work is based on high-level speaker-specific features and statistical speaker modeling techniques that express the characteristic sound of the human voice. Hidden Markov model (HMM), Gaussian mixture model (GMM), and Linear Discriminant Analysis (LDA) models are used to build computationally inexpensive Automatic Speaker Recognition (ASR) systems that can recognize speakers regardless of what is said. The performance of the ASR system is evaluated on speech ranging from clean to a wide range of quality conditions using the standard TIMIT speech corpus. The ASR accuracies of the HMM-, GMM-, and LDA-based modeling techniques are 98.8%, 99.1%, and 98.6%, with Equal Error Rates (EER) of 4.5%, 4.4%, and 4.55%, respectively. The EER improvement of the GMM-based ASR system compared with HMM and LDA is 4.25% and 8.51%, respectively.
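
    The text-independent GMM approach summarized above fits one mixture model to each enrolled speaker's feature vectors and assigns an unknown utterance to the speaker whose model yields the highest likelihood. Below is a minimal sketch of that idea, assuming MFCC features via librosa and scikit-learn's GaussianMixture; the component count, sampling rate, and file layout are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal text-independent GMM speaker identification sketch.
# Assumptions (not from the paper): MFCC features via librosa,
# 16 diagonal-covariance components, one WAV list per speaker.
import librosa
import numpy as np
from sklearn.mixture import GaussianMixture

def mfcc_features(path, n_mfcc=13):
    """Return an (n_frames, n_mfcc) matrix of MFCCs for one utterance."""
    y, sr = librosa.load(path, sr=16000)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def train_speaker_models(train_sets):
    """train_sets: {speaker_id: [wav_path, ...]} -> {speaker_id: fitted GMM}."""
    models = {}
    for speaker, paths in train_sets.items():
        feats = np.vstack([mfcc_features(p) for p in paths])
        gmm = GaussianMixture(n_components=16, covariance_type='diag', max_iter=200)
        models[speaker] = gmm.fit(feats)
    return models

def identify(models, wav_path):
    """Pick the speaker whose GMM gives the highest average log-likelihood."""
    feats = mfcc_features(wav_path)
    return max(models, key=lambda s: models[s].score(feats))
```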

    Audio-Visual Automatic Speech Recognition Towards Education for Disabilities

    Education is a fundamental right that enriches everyone's life. However, physically challenged people are often excluded from general and advanced education systems. An Audio-Visual Automatic Speech Recognition (AV-ASR) based system can improve the education of physically challenged people by providing hands-free computing: they can communicate with the learning system through AV-ASR. However, tracing the lips correctly for the visual modality is challenging. This paper therefore addresses appearance-based visual features together with a co-occurrence statistical measure for visual speech recognition. Local Binary Pattern-Three Orthogonal Planes (LBP-TOP) and the Grey-Level Co-occurrence Matrix (GLCM) are proposed for capturing visual speech information. Experimental results show that the proposed system achieves 76.60% accuracy for visual speech recognition and 96.00% accuracy for audio speech recognition.
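
    To make the two named descriptors concrete, here is a simplified sketch of each: LBP-TOP histograms taken from the three orthogonal planes of a lip-region video volume, and GLCM texture statistics over a single frame. This is a simplification (full LBP-TOP aggregates over all planes of each orientation, not just the central ones), and the parameters and scikit-image usage are assumptions, not the paper's implementation.

```python
# Sketch of the two visual descriptors named above: LBP-TOP over a lip-region
# video volume and GLCM statistics over one frame. Illustrative parameters;
# assumes scikit-image >= 0.19 (gray* spelling) and uint8 inputs.
import numpy as np
from skimage.feature import local_binary_pattern, graycomatrix, graycoprops

def lbp_top_histogram(volume, P=8, R=1):
    """volume: (T, H, W) uint8 grayscale lip clip.
    Concatenates uniform-LBP histograms from the central XY, XT and YT planes
    (a simplification of full LBP-TOP, which averages over all planes)."""
    t, h, w = volume.shape
    planes = [volume[t // 2],          # XY: one spatial frame
              volume[:, h // 2, :],    # XT: one row's trajectory over time
              volume[:, :, w // 2]]    # YT: one column's trajectory over time
    n_bins = P + 2                     # number of 'uniform' LBP codes
    hists = []
    for plane in planes:
        codes = local_binary_pattern(plane, P, R, method='uniform')
        hist, _ = np.histogram(codes, bins=n_bins, range=(0, n_bins), density=True)
        hists.append(hist)
    return np.concatenate(hists)

def glcm_features(frame):
    """frame: (H, W) uint8 image -> co-occurrence statistics at distance 1."""
    glcm = graycomatrix(frame, distances=[1], angles=[0, np.pi / 2],
                        levels=256, symmetric=True, normed=True)
    return np.hstack([graycoprops(glcm, prop).ravel()
                      for prop in ('contrast', 'homogeneity', 'energy', 'correlation')])
```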

    Acoustic Approaches to Gender and Accent Identification

    There has been considerable research on the problems of speaker and language recognition from samples of speech. A less researched problem is that of accent recognition. Although this is a similar problem to language identification, different accents of a language exhibit more fine-grained differences between classes than languages. This presents a tougher problem for traditional classification techniques. In this thesis, we propose and evaluate a number of techniques for gender and accent classification. These techniques are novel modifications and extensions to state-of-the-art algorithms, and they result in enhanced performance on gender and accent recognition. The first part of the thesis focuses on the problem of gender identification, and presents a technique that gives improved performance in situations where training and test conditions are mismatched. The bulk of this thesis is concerned with the application of the i-Vector technique to accent identification, which is the most successful approach to acoustic classification to have emerged in recent years. We show that it is possible to achieve high accuracy accent identification without reliance on transcriptions and without utilising phoneme recognition algorithms. The thesis describes various stages in the development of i-Vector based accent classification that improve on the standard approaches usually applied for speaker or language identification, which are insufficient. We demonstrate that very good accent identification performance is possible with acoustic methods by considering different i-Vector projections, frontend parameters, i-Vector configuration parameters, and an optimised fusion of the resulting i-Vector classifiers we can obtain from the same data. We claim to have achieved the best accent identification performance on the test corpus for acoustic methods, with up to 90% identification rate. This performance is even better than previously reported acoustic-phonotactic based systems on the same corpus, and is very close to the performance obtained via transcription based accent identification. Finally, we demonstrate that the utilisation of our techniques for speech recognition purposes leads to considerably lower word error rates.
    Keywords: Accent Identification, Gender Identification, Speaker Identification, Gaussian Mixture Model, Support Vector Machine, i-Vector, Factor Analysis, Feature Extraction, British English, Prosody, Speech Recognition
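
    Full i-Vector extraction (UBM training plus a total variability matrix) is beyond a short example, but the classification back-end the thesis builds on can be sketched: project extracted i-vectors with LDA, length-normalise, and score test vectors by cosine similarity against per-accent means. This is a generic i-Vector back-end under stated assumptions (i-vectors already extracted; scikit-learn's LDA), not the thesis's specific configuration or fusion.

```python
# Back-end sketch for i-Vector accent classification: LDA projection followed
# by cosine scoring against per-accent mean vectors. Assumes i-vectors have
# already been extracted; UBM / total-variability training is out of scope.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def train_backend(ivectors, labels):
    """ivectors: (N, D) array; labels: length-N accent ids."""
    labels = np.asarray(labels)
    lda = LinearDiscriminantAnalysis()
    projected = lda.fit_transform(ivectors, labels)
    # Length-normalise, then average per accent to obtain class models.
    projected /= np.linalg.norm(projected, axis=1, keepdims=True)
    means = {c: projected[labels == c].mean(axis=0) for c in np.unique(labels)}
    return lda, means

def classify(lda, means, ivector):
    """Return the accent whose mean has the highest cosine similarity."""
    v = lda.transform(ivector.reshape(1, -1)).ravel()
    v /= np.linalg.norm(v)
    scores = {c: float(np.dot(v, m / np.linalg.norm(m))) for c, m in means.items()}
    return max(scores, key=scores.get)
```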

    An Analysis of Facial Expression Recognition Techniques

    In the present era of technology, we need applications that are easy to use and user-friendly, so that even people with specific disabilities can use them easily. Facial Expression Recognition plays a vital role, and poses notable challenges, in the computer vision and pattern recognition communities, and it receives much attention due to its potential applications in areas such as human-machine interaction, surveillance, robotics, driver safety, non-verbal communication, entertainment, health care, and psychology research. Facial Expression Recognition is also of major importance within face recognition for the understanding and analysis of significant image applications. Many algorithms have been implemented under both static (uniform background, identical poses, similar illumination) and dynamic (position variation, partial occlusion, orientation, varying lighting) conditions. In general, facial expression recognition consists of three main steps: first face detection, then feature extraction, and finally classification. In this survey paper, we discuss different types of facial expression recognition techniques, the various methods they use, and their performance measures.
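
    The three-step pipeline named in the abstract (detection, feature extraction, classification) can be sketched end to end. The concrete choices below, a Haar-cascade detector from OpenCV, a uniform-LBP histogram feature, and a linear SVM, are illustrative examples of the surveyed families, not any single paper's method.

```python
# Minimal three-stage facial expression recognition pipeline, as outlined
# above: (1) face detection, (2) feature extraction, (3) classification.
# Detector/feature/classifier choices are illustrative assumptions.
import cv2
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import LinearSVC

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

def face_feature(gray_image, size=(96, 96), P=8, R=1):
    """Detect the largest face and return a uniform-LBP histogram, or None."""
    faces = detector.detectMultiScale(gray_image, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])   # largest detection
    face = cv2.resize(gray_image[y:y + h, x:x + w], size)
    codes = local_binary_pattern(face, P, R, method='uniform')
    hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
    return hist

def train_classifier(gray_images, expression_labels):
    """Fit a linear SVM on LBP histograms of the detected faces."""
    pairs = [(face_feature(img), lab)
             for img, lab in zip(gray_images, expression_labels)]
    X = [f for f, _ in pairs if f is not None]
    y = [lab for f, lab in pairs if f is not None]
    return LinearSVC().fit(X, y)
```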

    Statistical Methods for Signal Processing with Application to Automatic Accent Recognition

    The problem of classifying people based on the phonetic features of their accents is posed. This thesis aims to construct an automatic accent recognition machine that can accomplish this classification task with decent accuracy. The machine consists of two crucial steps, feature extraction and pattern recognition. In the thesis, we review and explore multiple techniques for both steps in great detail. Specifically, for feature extraction we explore the techniques of principal component analysis and cepstral analysis, and for pattern recognition we explore discriminant functions, the support vector machine, and k-nearest neighbors. Since signal data usually exhibit the High Dimension Low Sample Size property, reducing the dimensionality is crucial in the automatic accent recognition task. Two studies are conducted in which speech signals are collected and a binary classification of American English accent versus non-American English accent is performed. In the first study, a total of 330 noise-free speech signals with an average dimensionality of 44050 are classified into the two categories. In the time domain, the dimensionality is reduced to 250 using principal component analysis. Although the in-sample prediction shows an optimistic accuracy of over 90%, the out-of-sample prediction accuracy estimated with cross-validation is as low as 60%. Alternatively, a feature extraction technique in the frequency domain, cepstral analysis, is implemented instead of principal component analysis, by which a special type of feature called mel-frequency cepstral coefficients is extracted and the dimensionality is reduced to values between 12 and 39. The out-of-sample prediction accuracy can then be as high as around 95%. Although cepstral analysis demonstrates itself to be a powerful tool for accent recognition, a second study further shows that it may quickly fail when there is an evident amount of noise in the signal: prediction performance drops to 80% or lower, depending on the amplitude of the noise and the length of the signals.
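
    The better-performing pipeline described above, cepstral features plus a cross-validated classifier, is straightforward to sketch. The abstract does not say which of the explored classifiers achieved the ~95% figure, so the k-nearest-neighbors choice below is one of the listed options picked for illustration; librosa and scikit-learn are assumed, and 13 coefficients is one value in the 12-39 range mentioned.

```python
# Sketch of the cepstral pipeline from the study above: reduce each signal to
# mel-frequency cepstral coefficients, then estimate out-of-sample accuracy
# for the binary American / non-American accent task via cross-validation.
import librosa
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def mfcc_vector(path, n_mfcc=13):
    """Collapse a variable-length signal to a fixed n_mfcc-dim feature
    by averaging the cepstral coefficients over all frames."""
    y, sr = librosa.load(path, sr=None)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)

def accent_cv_accuracy(wav_paths, labels, k=5):
    """labels: 1 for American English accent, 0 otherwise.
    Returns mean out-of-sample accuracy from 10-fold cross-validation."""
    X = np.array([mfcc_vector(p) for p in wav_paths])
    y = np.array(labels)
    return cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=10).mean()
```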