243 research outputs found

    Acoustic Echo and Noise Cancellation System for Hand-Free Telecommunication using Variable Step Size Algorithms

    In this paper, an acoustic echo cancellation system with double-talk detection is implemented in Matlab for a hands-free telecommunication system. An adaptive noise canceller with blind source separation (ANC-BSS) is proposed to remove both background noise and the far-end speaker echo signal in the presence of double-talk. In the absence of double-talk, the far-end speaker echo signal is cancelled by an adaptive echo canceller. Both the adaptive noise canceller and the adaptive echo canceller are implemented using the LMS, NLMS, VSLMS and VSNLMS algorithms. The normalized cross-correlation method is used for double-talk detection. VSNLMS outperforms all the other algorithms both with and without double-talk: in the absence of double-talk it increases ERLE and decreases misalignment, and in the presence of double-talk it improves the SNR of the near-end speaker signal.
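
The adaptive echo canceller described in this abstract can be illustrated with a minimal sketch of a fixed-step NLMS filter in plain Python. This is not the paper's Matlab implementation and not its variable-step-size (VSNLMS) variant. The three-tap echo path, the white-noise far-end signal, and the parameter names `num_taps`, `mu` and `eps` are assumptions chosen for illustration, and no near-end speech or noise is simulated.

```python
import math
import random


def nlms_echo_canceller(far_end, mic, num_taps=8, mu=0.5, eps=1e-8):
    """Fixed-step NLMS: returns (residual error signal, final filter weights)."""
    w = [0.0] * num_taps    # adaptive estimate of the echo path
    buf = [0.0] * num_taps  # most recent far-end samples, newest first
    errors = []
    for x, d in zip(far_end, mic):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))  # estimated echo
        e = d - y                                   # residual after cancellation
        norm = eps + sum(b * b for b in buf)        # input power normalization
        step = mu * e / norm
        w = [wi + step * bi for wi, bi in zip(w, buf)]
        errors.append(e)
    return errors, w


def erle_db(mic, errors):
    """Echo return loss enhancement: microphone power over residual power, in dB."""
    num = sum(d * d for d in mic)
    den = sum(e * e for e in errors) + 1e-30  # guard against division by zero
    return 10.0 * math.log10(num / den)


# Synthetic single-talk scenario (assumed): the microphone hears only the echo.
rng = random.Random(0)
echo_path = [0.6, -0.3, 0.1]  # hypothetical room impulse response
far_end = [rng.gauss(0.0, 1.0) for _ in range(4000)]
mic = [
    sum(h * far_end[i - k] for k, h in enumerate(echo_path) if i - k >= 0)
    for i in range(len(far_end))
]

errors, weights = nlms_echo_canceller(far_end, mic)
print("estimated path:", [round(v, 3) for v in weights[:3]])
print("ERLE over last 500 samples (dB):", round(erle_db(mic[-500:], errors[-500:]), 1))
```

Dividing the update by the input power is what separates NLMS from plain LMS. A variable-step-size variant would additionally adapt `mu` over time, which this sketch does not attempt.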

    Speaker Identification System for Hindi And Marathi Languages using Wavelet and Support Vector Machine

    In this paper, a speaker identification system for the Hindi and Marathi languages is developed using speech processing. A database is created of words common to Hindi and Marathi, which share a script but differ in pronunciation. Feature extraction is performed using Wavelet Packet Decomposition (WPD), and classification is performed using a Support Vector Machine (SVM). Compared with conventional feature extraction techniques, the wavelet transform is well suited to processing speech signals, which are non-stationary, because of its efficient time-frequency localization and multi-resolution characteristics. SVM is also well suited to the speaker identification task. With this hybrid WPD-SVM architecture, a recognition accuracy of 99.77% is obtained, whereas the real-time recognition accuracy under identical conditions is 84.66%. In noisy conditions, a recognition accuracy of 60% is obtained. DOI: 10.17762/ijritcc2321-8169.16049

    Wavelet Based Feature Extraction for The Indonesian CV Syllables Sound

    This paper proposes combining the Wavelet Transform (WT) and Euclidean Distance (ED) to estimate the expected value of the possible feature vectors of Indonesian syllables. The research aims to find the most effective and efficient way to extract features from each syllable sound for use in speech recognition systems. The proposed approach, which builds on a previous study, consists of three main phases. In the first phase, the speech signal is segmented and normalized. In the second phase, the signal is transformed into the frequency domain using the WT. In the third phase, the ED algorithm is used to estimate the expected feature vector. The results give a list of features for each syllable that can be used in future research, along with recommendations on the most effective and efficient WT for syllable sound recognition.
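
The three phases in this abstract (normalize, wavelet transform, Euclidean distance) can be sketched in plain Python as follows. The abstract does not say which wavelet is used or how the ED step is applied, so the Haar wavelet, the sub-band energy features, and the nearest-template matching are all assumptions made for illustration. The two sine tones are synthetic stand-ins for syllable recordings.

```python
import math


def normalize(signal):
    """Phase 1 (simplified): scale the segment to unit peak amplitude."""
    peak = max(abs(v) for v in signal) or 1.0
    return [v / peak for v in signal]


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal) - 1, 2)]
    return approx, detail


def wavelet_features(signal, levels=3):
    """Phase 2: detail-band energy per level, then the final approximation energy."""
    approx = normalize(signal)
    feats = []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        feats.append(sum(d * d for d in detail))
    feats.append(sum(a * a for a in approx))
    return feats


def euclidean(a, b):
    """Phase 3: Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_template(features, templates):
    """Return the label of the template closest to the given features."""
    return min(templates, key=lambda name: euclidean(features, templates[name]))


def tone(cycles, n=64, amplitude=1.0, phase=0.0):
    """Synthetic test signal (assumed data, not real speech)."""
    return [amplitude * math.sin(2 * math.pi * cycles * i / n + phase) for i in range(n)]


templates = {
    "low": wavelet_features(tone(2)),
    "high": wavelet_features(tone(24)),
}
query = wavelet_features(tone(2, amplitude=0.8, phase=0.3))
label = nearest_template(query, templates)
print(label)
```

Sub-band energies are only one possible feature choice. The paper compares several wavelet transforms, which could be substituted for `haar_dwt` here.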

    A Novel Approach for Multilingual Speech Recognition with Back Propagation Artificial Neural Network

    “Speech Recognition” of audio signals is important for telecommunication, language identification and speaker verification. Robust Speech Recognition can be applied to the automation of houses, offices and telecommunication services. In this paper, Speech Recognition and Language Identification are performed for Bengali, Chhattisgarhi, English and Hindi speech signals. The Bengali, Chhattisgarhi, English and Hindi speech signals are “Ekhone Tumi Jao”, “Ae Bar Teha Ja”, “Now This Time You Go” and “Ab Is Bar tum Jao” respectively. The method is applied in two phases: in the first phase, Speech Recognition and Language Identification are performed with a Back Propagation Artificial Neural Network (BPANN), and in the second phase they are performed with a combination of the Particle Swarm Optimization (PSO) feature selection technique and BPANN. Mel Frequency Cepstral Coefficients (MFCC) and Linear Predictive Coding (LPC), the most widely used feature extraction methods, are used for feature extraction. BPANN is a feed-forward neural network that propagates the error signal backwards to modify the weights; the error signal is generated when the actual output value differs from the target output value. System accuracy and performance are measured on the basis of “Recognition Rate” and the amount of error. Multilingual Speech Recognition and Language Identification with the PSO feature selection technique gives a better Recognition Rate than the same system without PSO feature selection.

    COMPARISON OF FIVE CLASSIFIERS FOR CLASSIFICATION OF SYLLABLES SOUND USING TIME-FREQUENCY FEATURES

    In a speech recognition and classification system, choosing a suitable and reliable classifier is essential to obtaining an optimal classification result. This paper presents Indonesian syllable sound classification using a C4.5 decision tree, a Naive Bayes classifier, a Sequential Minimal Optimization (SMO) algorithm, a Random Forest decision tree, and a Multi-Layer Perceptron (MLP) to classify twelve classes of syllables. The research applies five different feature sets: the combination of Discrete Wavelet Transform (DWT) and statistical features, denoted WS; the Renyi Entropy (RE) features; the combination of Autoregressive Power Spectral Density (AR-PSD) and statistical features, denoted PSDS; the combination of PSDS and RE features selected by Correlation-Based Feature Selection (CFS), denoted RPSDS; and the combination of DWT, RE, and AR-PSD, denoted WRPSDS. The results show that the MLP classifier has the highest performance when it is combined with WRPSDS.

    Speaker gender recognition system

    Automatic gender recognition through speech is one of the fundamental mechanisms in human-machine interaction. Typical application areas of this technology range from gender-targeted advertising to gender-specific IoT (Internet of Things) applications. It can also be used to narrow down the scope of investigations in crime scenarios. There are many possible methods of recognizing the gender of a speaker. In machine learning applications, the first step is to acquire the natural human voice and convert it into a machine-understandable signal. Useful voice features can then be extracted and labelled with gender information so that they can be used to train the machine. After that, new input voice can be captured and processed, and the machine is able to extract the features by pattern modelling. In this thesis, a real-time speaker gender recognition system was designed within the Matlab environment. The system can automatically identify the gender of a speaker by voice. The implementation uses voice processing and feature extraction techniques to handle input speech coming from a microphone or a recorded speech file. The response features are extracted and classified, and a machine learning classification method (Naïve Bayes Classifier) is then used to distinguish the gender features. Finally, the recognition result with gender information is displayed. The speaker gender recognition system was evaluated in an experiment with 40 participants (half male and half female) in a quite small room. The experiment recorded 400 speech samples from speakers from 16 countries in 17 languages. These 400 speech samples were tested by the gender recognition system, which performed well, with only 29 recognition errors (92.75% accuracy). By comparison, most previous speaker gender recognition systems achieved no more than 90% accuracy, and only one achieved 100% accuracy, with a very limited number of testers. We can then conclude that the performance of the speaker gender recognition system designed in this thesis is reliable.
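
The accuracy reported in this abstract follows directly from the sample and error counts, as the short check below shows. The function name `accuracy_percent` is an illustrative choice and does not come from the thesis.

```python
def accuracy_percent(total_samples, errors):
    """Share of correctly recognized samples, as a percentage."""
    return 100.0 * (total_samples - errors) / total_samples


# 400 speech samples with 29 recognition errors, as reported in the abstract.
reported = accuracy_percent(400, 29)
print(reported)  # 92.75
```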