
A REVIEW ON VOICE ACTIVITY DETECTION AND MEL-FREQUENCY CEPSTRAL COEFFICIENTS FOR SPEAKER RECOGNITION (TREND ANALYSIS)

Author: MAHALAKSHMI P.
Publication venue: Innovare Academic Sciences Pvt Ltd
Publication date: 01/12/2016

Objective: The objective of this review article is to give a complete review of the various techniques used for speech recognition over two decades.
Methods: Voice Activity Detection (VAD) and Speech Activity Detection (SAD) techniques, which are used to distinguish voiced from unvoiced signals, are discussed, along with the Mel-Frequency Cepstral Coefficient (MFCC) technique, which extracts specific features.
Results: The review shows that research on MFCC has been dominant in signal processing compared with VAD and other existing techniques.
Conclusion: Speaker recognition techniques used previously and those used in current research are compared, and the review of more than two decades of literature gives a clear idea of which technique is better.
Keywords: Cepstral analysis, Mel-frequency cepstral coefficients, signal processing, speaker recognition, voice activity detection

Innovare Academic Sciences: E-Journals
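The review above treats VAD as the step that separates speech from non-speech, and voiced from unvoiced, portions of a signal. A common textbook approach (not necessarily the one the review singles out) thresholds the short-term energy and zero-crossing rate of each frame. The sketch below assumes 256-sample frames with a 128-sample hop and fixed, illustrative thresholds; the function names are hypothetical, and practical detectors adapt the thresholds to the noise floor.

```python
import math

def frame_signal(x, frame_len, hop):
    """Split a list of samples into overlapping frames; a trailing partial frame is dropped."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]

def short_term_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def classify_frames(x, frame_len=256, hop=128, energy_thresh=1e-3, zcr_thresh=0.25):
    """Label each frame 'silence', 'voiced' or 'unvoiced'.

    The default thresholds are illustrative assumptions for signals scaled to [-1, 1].
    """
    labels = []
    for frame in frame_signal(x, frame_len, hop):
        if short_term_energy(frame) < energy_thresh:
            labels.append("silence")       # too little energy to be speech
        elif zero_crossing_rate(frame) < zcr_thresh:
            labels.append("voiced")        # energetic and slowly oscillating (periodic)
        else:
            labels.append("unvoiced")      # energetic but noise-like (many sign changes)
    return labels
```

For example, on an 8 kHz test signal made of digital silence, then a 200 Hz sine, then an alternating-sign sequence, the frames are labelled "silence", then "voiced", then "unvoiced".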

Analysis and Implementation of Speech Recognition System using ARM7 Processor

Author: Vijayan Nishiya
Publication venue: The International Institute for Science, Technology and Education (IISTE)
Publication date: 31/12/2014

This paper presents the implementation and analysis of a speech recognition system. Speech recognition is the process of automatically recognizing a certain word spoken by a particular speaker, based on the individual information included in speech waves. The paper presents one technique for extracting a feature set from a speech signal for use in speech recognition systems, together with an analysis study. A wide range of possibilities exists for parametrically representing the speech signal for the speaker recognition task, such as Linear Prediction Coding (LPC), Mel-Frequency Cepstrum Coefficients (MFCC) and others. Studies and experiments show that MFCC provides better results than LPC. Here, vector quantization (VQ) is used to increase speech recognition accuracy. Experiments show that accuracy improves as the number of MFCC coefficients increases, and that codebook size also affects accuracy. The MFCC and VQ algorithms for speech recognition have been implemented in MATLAB 7.7 (R2008b) on the Windows 7 platform. The control circuitry has been implemented in Keil µVision3; the supporting hardware setup is still being implemented. Keywords: Speech Recognition; MFCC; Vector Quantization; LP

International Institute for Science, Technology and Education (IISTE): E-Journals
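For comparison with MFCC, the LPC representation mentioned in this abstract is commonly computed with the autocorrelation method and the Levinson-Durbin recursion. A minimal sketch, not the paper's MATLAB code, assuming a Hamming-windowed frame, prediction order 10 and the sign convention A(z) = 1 + a1 z^-1 + ... + ap z^-p; the function names are illustrative.

```python
import math

def autocorrelation(frame, max_lag):
    """r[k] = sum_n x[n] x[n+k] for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns (a, err) with a[0] = 1.

    x[n] is predicted as -(a[1] x[n-1] + ... + a[p] x[n-p]); err is the final
    prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient for stage i
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= (1.0 - k * k)                # error shrinks at every stage
    return a, err

def lpc(frame, order=10):
    """LPC coefficients of one frame via Hamming window + autocorrelation method."""
    n = len(frame)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    windowed = [s * w for s, w in zip(frame, window)]
    return levinson_durbin(autocorrelation(windowed, order), order)
```

As a check, the autocorrelation sequence 1, 0.9, 0.81 of a first-order process gives a ≈ [1, -0.9, 0] and a residual energy of about 0.19.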

Study of Speaker Recognition Systems

Author: Panda Ashish Kumar
Sahoo Amit Kumar
Publication date: 12/05/2011

Speaker recognition is the computing task of validating a user's claimed identity using characteristics extracted from their voice. It is one of the most useful and popular biometric recognition techniques, especially in areas where security is a major concern. It can be used for authentication, surveillance, forensic speaker recognition and a number of related activities. Speaker recognition can be classified into identification and verification. Speaker identification is the process of determining which registered speaker produced a given utterance. Speaker verification, on the other hand, is the process of accepting or rejecting the identity claim of a speaker. Speaker recognition consists of two modules: feature extraction and feature matching. Feature extraction is the process of extracting a small amount of data from the voice signal that can later be used to represent each speaker. Feature matching identifies the unknown speaker by comparing the features extracted from his or her voice input with those of a set of known speakers. Our proposed work consists of truncating a recorded voice signal, framing it, passing it through a window function, calculating the short-term FFT, extracting its features and matching them against a stored template. Cepstral coefficient calculation and Mel-Frequency Cepstral Coefficients (MFCC) are used for feature extraction. The VQLBG (Vector Quantization via Linde-Buzo-Gray), DTW (Dynamic Time Warping) and GMM (Gaussian Mixture Modelling) algorithms are used for template generation and feature matching.

ethesis@nitr
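The processing chain described in this abstract (framing, windowing, short-term Fourier transform, mel-scale feature extraction) can be sketched in plain Python. This is a generic MFCC front end under stated assumptions, not the thesis implementation: 8 kHz audio, 256-sample frames with 50% overlap, a Hamming window, 20 triangular filters on the mel scale m = 2595 log10(1 + f/700), natural-log filter energies and a DCT-II keeping 12 coefficients (c0 included). A naive DFT stands in for the FFT to keep the example dependency-free; all function names are illustrative.

```python
import cmath
import math

def hz_to_mel(f):
    # Common (HTK-style) mel formula; other variants exist.
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """|X[k]|^2 for k = 0..N/2, using a naive O(N^2) DFT in place of an FFT."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
            for k in range(n // 2 + 1)]

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters whose edges are equally spaced on the mel scale (0 Hz to fs/2)."""
    top = hz_to_mel(fs / 2.0)
    edges_mel = [i * top / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / fs)) for m in edges_mel]
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        weights = [0.0] * (n_fft // 2 + 1)
        for k in range(left, centre):            # rising slope
            weights[k] = (k - left) / (centre - left)
        for k in range(centre, right):           # falling slope, peak 1 at centre
            weights[k] = (right - k) / (right - centre)
        bank.append(weights)
    return bank

def dct_ii(x, n_out):
    """Unnormalised DCT-II, keeping the first n_out coefficients."""
    n = len(x)
    return [sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
            for k in range(n_out)]

def mfcc(x, fs=8000, frame_len=256, hop=128, n_filters=20, n_ceps=12):
    """Return one list of n_ceps coefficients (c0..c[n_ceps-1]) per frame."""
    bank = mel_filterbank(n_filters, frame_len, fs)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]                      # Hamming window
    feats = []
    for start in range(0, len(x) - frame_len + 1, hop):       # framing
        frame = [s * w for s, w in zip(x[start:start + frame_len], window)]
        spec = power_spectrum(frame)
        energies = [sum(w * p for w, p in zip(filt, spec)) for filt in bank]
        log_e = [math.log(e + 1e-10) for e in energies]        # small floor avoids log(0)
        feats.append(dct_ii(log_e, n_ceps))
    return feats
```

Many systems drop c0 or replace it with the frame log-energy, and append delta coefficients; neither step is shown here.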

Real time speaker recognition using MFCC and VQ

Author: G Arun Rajsekhar
Publication date: 01/01/2008

Speaker recognition is the process of automatically recognizing who is speaking on the basis of the individual information included in speech waves. It is one of the most useful biometric recognition techniques in a world where insecurity is a major threat. Many organizations, such as banks, institutions and industries, currently use this technology to provide greater security for their vast databases.
Speaker recognition mainly involves two modules: feature extraction and feature matching. Feature extraction is the process that extracts a small amount of data from the speaker's voice signal that can later be used to represent that speaker. Feature matching is the actual procedure of identifying the unknown speaker by comparing the features extracted from his or her voice input with those already stored in the speech database.
In feature extraction, we compute the Mel-Frequency Cepstrum Coefficients, which are based on the known variation of the human ear's critical bandwidths with frequency; these are vector quantized using the LBG algorithm, resulting in a speaker-specific codebook. In feature matching, we compute the VQ distortion between the input utterance of an unknown speaker and the codebooks stored in the database. Based on this VQ distortion, we decide whether to accept or reject the unknown speaker's identity. The system I implemented is 80% accurate in recognizing the correct speaker.
In the second phase, we implement real-time speaker recognition using MFCC and VQ on a TMS320C6713 DSP board. We analyze the workload and identify the most time-consuming operations.

ethesis@nitr
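The training and matching steps in this abstract (an LBG codebook per speaker, then acceptance or rejection based on VQ distortion) can be sketched as follows. Assumptions: feature vectors are plain lists of floats (for instance MFCC frames), the codebook size is a power of two, codewords are split by scaling with (1 ± eps), distortion is the mean squared Euclidean distance, and the acceptance `threshold` is a hypothetical tuning parameter. Function names are illustrative.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def lbg(vectors, size, eps=0.01, tol=1e-4, max_iter=50):
    """Linde-Buzo-Gray: grow a codebook by splitting, refine each stage with k-means."""
    codebook = [centroid(vectors)]
    while len(codebook) < size:
        # Split every codeword into two slightly perturbed copies.
        codebook = [[c * (1 + s) for c in cw] for cw in codebook for s in (eps, -eps)]
        prev = float("inf")
        for _ in range(max_iter):
            cells = [[] for _ in codebook]
            total = 0.0
            for v in vectors:                       # nearest-codeword assignment
                d, idx = min((sq_dist(v, cw), i) for i, cw in enumerate(codebook))
                cells[idx].append(v)
                total += d
            # Move codewords to their cell centroids; keep the old one if a cell is empty.
            codebook = [centroid(cell) if cell else cw for cell, cw in zip(cells, codebook)]
            if prev - total <= tol * total:         # relative improvement small: stop
                break
            prev = total
    return codebook

def vq_distortion(vectors, codebook):
    """Average squared distance from each vector to its nearest codeword."""
    return sum(min(sq_dist(v, cw) for cw in codebook) for v in vectors) / len(vectors)

def identify(vectors, codebooks, threshold):
    """Best-matching speaker name, or None if even the best distortion exceeds threshold."""
    best = min(codebooks, key=lambda name: vq_distortion(vectors, codebooks[name]))
    return best if vq_distortion(vectors, codebooks[best]) <= threshold else None
```

With two toy speakers whose training vectors form separate clusters, test vectors near the first speaker's clusters are identified as that speaker, while a vector far from every codeword is rejected.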

Text-Independent, Open-Set Speaker Recognition

Author: Pellissier Stephen V.
Publication venue: AFIT Scholar
Publication date: 01/03/1996

Speaker recognition, like other biometric personal identification techniques, depends on a person's intrinsic characteristics. A realistically viable system must be capable of dealing with the open-set task. This effort attacks the open-set task, identifies the best features to use, and proposes a fuzzy classifier followed by hypothesis testing as a model for text-independent, open-set speaker recognition. Using the TIMIT corpus and Rome Laboratory's GREENFLAG tactical communications corpus, this thesis demonstrates that the proposed system succeeded in open-set speaker recognition. Considering that extremely short utterances were used to train the system (compared with other closed-set speaker identification work), the system attained reasonable open-set classification error rates, as low as 23% for TIMIT and 26% for GREENFLAG. Feature analysis identified the filtered linear prediction cepstral coefficients, with or without the normalized log energy or pitch appended, as a robust feature set (among the 17 feature sets considered), well suited to clean speech and to speech degraded by tactical communications channels.

AFIT Scholar (Air Force Institute of Technology)
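The thesis found linear prediction cepstral coefficients (LPCC) to be a robust feature set. LPCCs are usually derived from LPC coefficients with a standard recursion, sketched below. Assumptions: A(z) = 1 + a1 z^-1 + ... + ap z^-p, so the input is a = [1, a1, ..., ap]; the gain term c0 is omitted; and the thesis's filtering of the coefficients is not reproduced. The function name is illustrative.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c1..c_n_ceps of the all-pole model 1/A(z), with a = [1, a1, ..., ap].

    c_n = -a_n - sum_{k=1}^{n-1} (k/n) c_k a_{n-k}      for 1 <= n <= p
    c_n =      - sum_{k=n-p}^{n-1} (k/n) c_k a_{n-k}    for n > p
    """
    p = len(a) - 1
    c = [0.0] * (n_ceps + 1)            # c[0] (gain term) is left unused
    for n in range(1, n_ceps + 1):
        acc = -a[n] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc -= (k / n) * c[k] * a[n - k]
        c[n] = acc
    return c[1:]
```

For the one-pole model a = [1, -0.9], the recursion returns approximately 0.9, 0.405 and 0.243, matching the closed form c_n = 0.9^n / n.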

Survey of Features Extraction and Classification Techniques for Speaker Identification

Author: Al-Sultan Ali Yakoob
Kadhum Sahar Adil
Muslim Ahmed Badri
Publication venue: University of Babylon - Physical Education and Sports Sciences
Publication date: 15/12/2020

Speech processing is becoming more common every day as a way of providing a high level of security, and speech is commonly used for authentication. Speaker recognition is the method by which a speaker can be checked and recognized; it is distinct from speech recognition. Speaker recognition is commonly used in various sectors, hospitals, laboratories, etc. Its benefits are that it is safer, easier to implement and more user-friendly. Speaker identification is one of the most commonly used techniques in areas where safety is crucial.
This article presents an overview of various methods that can be used in speaker recognition systems: feature extraction techniques such as Linear Predictive Coding (LPC), Linear Predictive Cepstral Coefficients (LPCC), Unique Mapped Real Transform (UMRT), Real Cepstral Coefficients (RCC) and Mel-Frequency Cepstrum (MFCC), in addition to various classification techniques such as the Gaussian Mixture Model (GMM), Dynamic Time Warping (DTW), Support Vector Machine (SVM), Neural Network (NN) and Vector Quantization (VQ). The primary purpose is to explain the common speaker recognition methods. The results show that MFCC is chosen for its high efficiency and low complexity, and that GMM is helpful for classification, requiring less memory and less planning, and giving efficient test results.

Journals of University of Babylon
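Of the classifiers surveyed, GMM is the one the authors highlight. GMM-based identification typically scores the test frames against each enrolled speaker's model and picks the model with the highest average log-likelihood. The sketch assumes diagonal covariances and models that are already trained (for example by EM, which is not shown); the function names and any model parameters used with it are illustrative.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of x under a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))

def logsumexp(vals):
    """Numerically stable log(sum(exp(v)))."""
    top = max(vals)
    return top + math.log(sum(math.exp(v - top) for v in vals))

def gmm_log_likelihood(frames, weights, means, variances):
    """Average per-frame log-likelihood of feature frames under one GMM."""
    total = 0.0
    for x in frames:
        comps = [math.log(w) + log_gaussian_diag(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total += logsumexp(comps)
    return total / len(frames)

def identify_speaker(frames, models):
    """models maps a speaker name to (weights, means, variances) of a trained GMM."""
    return max(models, key=lambda name: gmm_log_likelihood(frames, *models[name]))
```

Working in the log domain with `logsumexp` avoids the underflow that multiplying many small per-frame densities would cause.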

Perceptual Grounding of Cepstral Distance Measures for Speech Processing Applications (Percepcijska utemeljenost kepstranih mjera udaljenosti za primjene u obradi govora)

Author: Antonio Vasilijević
Davor Petrinović
Publication venue: KoREMA - Croatian Society for Communications, Computing, Electronics, Measurement and Control
Publication date: 01/01/2011

Currently, one of the most widely used distance measures in speech and speaker recognition is the Euclidean distance between mel-frequency cepstral coefficient (MFCC) vectors. MFCCs are based on a filter bank algorithm whose filters are equally spaced on the perceptually motivated mel frequency scale. The value of the mel cepstral vector, and hence the properties of the corresponding cepstral distance, are determined by several parameters of the mel cepstral analysis. The aim of this work is to examine how well the MFCC measure agrees with human perception for different values of the analysis parameters. Analysis of the mel filter bank parameters shows that a filter bank with 24 bands, a bandwidth of 220 mels and a band overlap coefficient greater than or equal to one gives the optimal spectral distortion (SD) distance measures, i.e. those that agree best with perception. For this mel filter bank, the difference between vowels can be perceived when the full-length mel cepstral SD RMS measure exceeds 0.4-0.5 dB. We also show that using a truncated mel cepstral vector (12 coefficients) is justified for speech recognition but may be questionable for speaker recognition. Finally, we analysed the impact of aliasing in the cepstral domain on cepstral distortion measures. SD distances calculated from the aperiodic and the periodic mel cepstrum were highly correlated, leading to the conclusion that the impact of aliasing is generally minor. The rare exceptions in which aliasing is present were also analysed.

HRČAK - Portal of Croatian Scientific and Professional Journals

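Two quantities from this abstract are easy to make concrete: the Euclidean distance between cepstral vectors (optionally truncated, e.g. to 12 coefficients) and the RMS log-spectral distortion in dB. For a real, symmetric cepstrum of the log magnitude spectrum, ln|S(w)| = c0 + 2 sum_{n>=1} c_n cos(wn), Parseval's theorem gives SD_RMS = (20 / ln 10) sqrt((c0 - c0')^2 + 2 sum_{n>=1} (c_n - c_n')^2). The sketch assumes that convention; mel cepstra obtained by a DCT of log filter-bank energies only approximate it, and the paper's exact SD definition is not reproduced. Function names are illustrative.

```python
import math

DB_PER_NEPER = 20.0 / math.log(10.0)   # converts a natural-log magnitude difference to dB

def cepstral_distance(c1, c2, n_coeffs=None):
    """Euclidean distance between cepstral vectors, optionally truncated to n_coeffs.

    c[0] is included if present; many MFCC front ends drop it.
    """
    if n_coeffs is not None:
        c1, c2 = c1[:n_coeffs], c2[:n_coeffs]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))

def rms_log_spectral_distortion_db(c1, c2):
    """RMS log-spectral distance in dB from real cepstra c[0..N-1].

    Assumes ln|S(w)| = c0 + 2 * sum_{n>=1} c_n cos(w n); Parseval then gives
    mean over w of (ln|S| - ln|S'|)^2 = d0^2 + 2 * sum_{n>=1} d_n^2.
    """
    d0 = c1[0] - c2[0]
    rest = sum((a - b) ** 2 for a, b in zip(c1[1:], c2[1:]))
    return DB_PER_NEPER * math.sqrt(d0 ** 2 + 2.0 * rest)
```

The closed form can be checked by sampling the two log spectra on a uniform frequency grid and computing the RMS difference directly; for a finite cepstrum the two agree to rounding error.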

A Review of Audio Features and Statistical Models Exploited for Voice Pattern Design

Author: Duong Hien-Thanh
Duong Ngoc Q. K.
Publication date: 24/02/2015

Audio fingerprinting, also known as audio hashing, is well known as a powerful technique for audio identification and synchronization. It basically involves two major steps: fingerprint (voice pattern) design and matching search. While the first step concerns the derivation of a robust and compact audio signature, the second step usually requires knowledge of the database and of quick-search algorithms. Although this technique offers a wide range of real-world applications, to the best of the authors' knowledge the most recent comprehensive survey of existing algorithms appeared more than eight years ago. Thus, in this paper, we present a more up-to-date review and, to emphasize the audio signal processing aspect, we focus our state-of-the-art survey on the fingerprint design step, for which various audio features and their tractable statistical models are discussed.
Comment: http://www.iaria.org/conferences2015/PATTERNS15.html ; Seventh International Conferences on Pervasive Patterns and Applications (PATTERNS 2015), Mar 2015, Nice, France

arXiv.org e-Print Archive
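As one concrete example of a fingerprint design, a widely cited scheme by Haitsma and Kalker derives one bit per band per frame from the sign of the energy difference between adjacent bands, compared with the same difference in the previous frame, and matches fingerprints by bit error rate. The sketch assumes the per-frame band energies are already computed (the original scheme uses 33 bands, giving 32-bit sub-fingerprints); it illustrates the idea and is not a method this review prescribes. Function names are illustrative.

```python
def fingerprint_bits(band_energies):
    """band_energies[n][m]: energy of frame n in band m (M bands).

    Returns one (M-1)-bit integer per frame n >= 1, where bit m is 1 when
    (E[n][m] - E[n][m+1]) - (E[n-1][m] - E[n-1][m+1]) > 0.
    """
    words = []
    for n in range(1, len(band_energies)):
        cur, prev = band_energies[n], band_energies[n - 1]
        word = 0
        for m in range(len(cur) - 1):
            delta = (cur[m] - cur[m + 1]) - (prev[m] - prev[m + 1])
            word = (word << 1) | (1 if delta > 0 else 0)   # most significant bit = band 0
        words.append(word)
    return words

def bit_error_rate(fp_a, fp_b, bits_per_word):
    """Fraction of differing bits between two equal-length fingerprint blocks."""
    diff = sum(bin(a ^ b).count("1") for a, b in zip(fp_a, fp_b))
    return diff / (len(fp_a) * bits_per_word)
```

Because each bit depends only on the sign of an energy difference, the fingerprint is unchanged by a uniform gain applied to all band energies, which is one source of its robustness; a match is declared when the bit error rate falls below a chosen threshold.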

Continuous Kannada speech segmentation and speech recognition based on threshold using MFCC and VQ

Author: Gowda Vanajakshi Puttaswamy
Murugavelu Mathivanan
Thangamuthu Senthil Kumaran
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/12/2019

Continuous speech segmentation and recognition play an important role in natural language processing. Continuous context-based Kannada speech segmentation depends on the context, grammar and semantic rules of the Kannada language. Extracting significant features of the Kannada speech signal for a recognition system is of great interest to researchers. The proposed method is divided into two parts. In the first part, context-based segmentation of the continuous Kannada speech signal is carried out by computing the average short-term energy and the spectral centroid of the speech signal within a specified window. The resulting segments are meaningful across different scenarios, with low segmentation error. The second part performs speech recognition by extracting a small number of mel-frequency cepstral coefficients and using a small number of vector quantization codebooks. Recognition is based entirely on a threshold value; setting this threshold is challenging, but a simple method is used to achieve a better recognition rate. The experimental results show more efficient and effective segmentation, with a higher recognition rate than existing methods, for any continuous context-based Kannada speech signal with different accents from male and female speakers, while using minimal feature dimensions for the training data.

Crossref

ZENODO

Institute of Advanced Engineering and Science
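The first stage of the Kannada method uses the short-term energy and spectral centroid within a window to find segments. A minimal sketch of that idea, assuming 25 ms frames (200 samples at 8 kHz) with a 100-sample hop, marking a frame as speech when its energy exceeds 10% of the loudest frame's energy and its spectral centroid exceeds 100 Hz, and merging consecutive speech frames into segments. These rules and thresholds are assumptions for illustration; the paper's context-based segmentation rules are not reproduced, and the function names are hypothetical.

```python
import cmath
import math

def spectral_centroid(frame, fs):
    """Magnitude-weighted mean frequency (Hz) of one frame, via a naive DFT."""
    n = len(frame)
    mags = [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]
    total = sum(mags)
    if total == 0:
        return 0.0
    return sum(k * fs / n * m for k, m in enumerate(mags)) / total

def segment(x, fs, frame_len=200, hop=100, energy_ratio=0.1, centroid_min=100.0):
    """Return (start, end) sample ranges of runs of frames judged to be speech.

    x must contain at least one full frame. Energy is checked first, so the
    (slow) centroid is computed only for sufficiently loud frames.
    """
    frames = [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]
    energies = [sum(s * s for s in f) / frame_len for f in frames]
    e_thresh = energy_ratio * max(energies)
    flags = [e > e_thresh and spectral_centroid(f, fs) > centroid_min
             for f, e in zip(frames, energies)]
    segments, start = [], None
    for i, is_speech in enumerate(flags + [False]):   # sentinel closes a trailing run
        if is_speech and start is None:
            start = i
        elif not is_speech and start is not None:
            segments.append((start * hop, (i - 1) * hop + frame_len))
            start = None
    return segments
```

On a synthetic signal of two 500 Hz bursts separated by silence, this returns two segments, each slightly wider than its burst because frames that only partly overlap a burst are still loud enough to count as speech.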