    Multilingual Speaker Identification using analysis of Pitch and Formant frequencies

    In the modern digital, automated world, speaker identification systems play an important role in fast-growing internet-based communications. In India, many people are bilingual or multilingual, which motivates the design of systems that can identify multilingual speakers. The present paper explores the idea of identifying a multilingual speaker using basic features. For this, speech signals in three Indian languages, i.e. Hindi, Marathi and Rajasthani, were recorded, and the basic features, pitch and the first three formant frequencies, were calculated using the PRAAT software. The observation presented is that the pitch and the first three formant frequencies F1, F2 and F3 of a speaker increase when the speaker changes language from Rajasthani to Hindi to Marathi. The percentage deviations in pitch and formant frequencies for Rajasthani and Marathi relative to Hindi are positive and negative, respectively, for the utterance "p". A similar analysis has been performed for other utterances, including "k". This observation will help in building systems for identifying speakers in multilingual environments.
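The abstract above measures pitch with PRAAT. As a hedged illustration only (the paper itself uses PRAAT, not this code), a minimal autocorrelation-based pitch estimate for a single voiced frame can be sketched in plain numpy; the function name and parameter choices here are the author's assumptions, not from the paper:

```python
import numpy as np

def estimate_pitch(frame, sr, fmin=75.0, fmax=400.0):
    """Estimate the fundamental frequency of one voiced frame via
    autocorrelation. A simplified stand-in for PRAAT's pitch tracker."""
    frame = frame - np.mean(frame)
    # Autocorrelation at non-negative lags.
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(sr / fmax)   # smallest lag = highest admissible pitch
    lag_max = int(sr / fmin)   # largest lag = lowest admissible pitch
    lag = lag_min + np.argmax(corr[lag_min:lag_max])
    return sr / lag

# Synthetic voiced frame at 150 Hz (fundamental plus one harmonic).
sr = 16000
t = np.arange(0, 0.04, 1 / sr)
frame = np.sin(2 * np.pi * 150 * t) + 0.3 * np.sin(2 * np.pi * 300 * t)
f0 = estimate_pitch(frame, sr)   # close to 150 Hz
```

Formant estimation (F1-F3) is more involved (typically LPC-based, as in PRAAT's Burg method) and is not sketched here.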

    Cepstrum Coefficient Features Analysis for Multilingual Speaker Identification System

    Cepstrum coefficient feature analysis plays a crucial role in the overall performance of a multilingual speaker identification system. The objective of this research work is to investigate the results obtained when Mel-Frequency Cepstral Coefficients (MFCC) and Gammatone Frequency Cepstral Coefficients (GFCC) are combined as feature components for the front-end processing of a multilingual speaker identification system. The combined MFCC and GFCC feature components are suggested to improve the reliability of such a system. In recent studies, GFCC features have shown very good robustness against noise and acoustic change. The main idea is to integrate MFCC and GFCC features to improve overall system performance. The experiment was carried out on a recently collected multilingual speech database to analyse the GFCC and MFCC features. The database consists of speech recorded from 100 speakers, male and female, with samples collected in three languages: Hindi, Marathi and Rajasthani. The extracted features of the speech signals in the different languages are observed. The results provide an empirical comparison of the combined MFCC-GFCC features and their individual counterparts. Average language-independent multilingual speaker identification rates of 84.66% (using MFCC), 93.22% (using GFCC) and 94.77% (using combined features) have been achieved.
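The abstract does not specify how the MFCC and GFCC streams are fused; a common and simple approach, sketched below as an assumption rather than the paper's confirmed method, is frame-level concatenation of the two coefficient matrices into one wider feature vector per frame. The arrays here are random placeholders standing in for real front-end outputs:

```python
import numpy as np

# Hypothetical per-frame features: rows are frames, columns are coefficients.
# In practice mfcc comes from a mel filterbank front end and gfcc from a
# gammatone filterbank; random placeholders of realistic shape are used here.
rng = np.random.default_rng(0)
mfcc = rng.standard_normal((200, 13))   # 200 frames x 13 MFCCs
gfcc = rng.standard_normal((200, 13))   # 200 frames x 13 GFCCs

# Frame-level fusion: concatenate along the coefficient axis so each frame
# is described by a single 26-dimensional vector fed to the classifier.
combined = np.hstack([mfcc, gfcc])
```

Any classifier back end (VQ, GMM, etc.) then operates on the 26-dimensional vectors exactly as it would on a single feature stream.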

    Automatic Identity Recognition Using Speech Biometric

    Biometric technology refers to the automatic identification of a person using physical or behavioral traits associated with him/her. This technology is an excellent candidate for developing intelligent systems such as speaker identification, facial recognition, and signature verification. Biometric technology can be used to design and develop automatic identity recognition systems, which are in high demand and can be used in banking systems, employee identification, immigration, e-commerce, etc. The first phase of this research emphasizes the development of an automatic identity recognizer using speech biometric technology based on Artificial Intelligence (AI) techniques provided in MATLAB. In phase one, speech data was collected from 20 participants (10 male and 10 female) in order to develop the recognizer. The speech data includes utterances recorded for the English-language digits (0 to 9), where each participant recorded each digit 3 times, resulting in a total of 600 utterances across all participants. In phase two, speech data was collected from 100 participants (50 male and 50 female). This speech data is divided into text-independent and text-dependent sets: each participant selected his/her full name and recorded it 30 times, which makes up the text-independent data. The text-dependent data, on the other hand, is a short Arabic-language story containing 16 sentences, where every sentence was recorded by every participant 5 times. As a result, this new corpus contains 3000 (30 utterances * 100 speakers) sound files representing the text-independent data (full names) and 8000 (16 sentences * 5 utterances * 100 speakers) sound files representing the text-dependent data (the short story).
    For phase one of developing the automatic identity recognizer, the 600 utterances underwent feature extraction and feature classification. The speech-based automatic identity recognition system uses the dominant feature extraction technique, the Mel-Frequency Cepstral Coefficient (MFCC). For the feature classification phase, the system is based on the Vector Quantization (VQ) algorithm. Based on our experimental results, the highest accuracy achieved is 76%. The experimental results show acceptable performance, which can be improved further in phase two using a larger speech data set and better-performing classification techniques such as the Hidden Markov Model (HMM).
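The MFCC + VQ pipeline named above classically works by training one codebook of centroids per enrolled speaker and identifying a test utterance by minimum average quantization distortion. The sketch below, a minimal numpy version with toy Gaussian clusters standing in for real MFCC frames (the paper's MATLAB implementation and parameters are not given), illustrates that decision rule:

```python
import numpy as np

def train_codebook(features, k=8, iters=20, seed=0):
    """Train a VQ codebook (k centroids) on one speaker's feature
    vectors with a basic Lloyd's/k-means loop."""
    rng = np.random.default_rng(seed)
    codebook = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        # Assign each frame to its nearest centroid, then re-estimate.
        d = np.linalg.norm(features[:, None] - codebook[None], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                codebook[j] = features[labels == j].mean(axis=0)
    return codebook

def distortion(features, codebook):
    """Average distance from each frame to its nearest codeword."""
    d = np.linalg.norm(features[:, None] - codebook[None], axis=2)
    return d.min(axis=1).mean()

# Toy demo: two "speakers" as separated Gaussian clusters in a
# 13-dimensional feature space (stand-ins for MFCC vectors).
rng = np.random.default_rng(1)
spk_a = rng.standard_normal((300, 13)) + 2.0
spk_b = rng.standard_normal((300, 13)) - 2.0
books = {"A": train_codebook(spk_a), "B": train_codebook(spk_b)}

# Identify an unseen utterance by minimum average distortion.
test_utt = rng.standard_normal((50, 13)) + 2.0   # frames from speaker A
best = min(books, key=lambda s: distortion(test_utt, books[s]))
```

The HMM alternative mentioned for phase two would replace this per-speaker codebook with a per-speaker temporal model, scoring utterances by likelihood instead of distortion.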