
    Voice Command Recognition Based on Hybrid Deep Learning with a Power-Law-Based Feature Extraction Method

    The problem of noise robustness remains a challenge for speech recognition systems, even with the advances of deep learning. The presence of noise can cause a mismatch between training, which is performed in clean conditions, and noisy testing conditions. Moreover, the deep learning models widely used in speech recognition typically involve a single model with limited learning capacity, and the logarithm-based features that have become standard in many speech recognition systems remain poorly robust to noise. In this study, the use of hybrid deep learning together with power-law-based feature extraction is proposed. The power law provides better compression in low-energy regions, so the features are less sensitive when the speech signal is distorted by noise. These features are fed to a model that combines two deep learning algorithms in parallel, hereinafter referred to as hybrid deep learning. The experiments use the Speech Commands dataset provided by TensorFlow, mixed with various noises. The results show that applying hybrid deep learning with power-law features achieves accuracies of 84.82% to 89.16% when classifying noisy speech.
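
    A minimal sketch of the two ideas above, assuming a librosa mel front end and TensorFlow/Keras: the power-law exponent of 1/15 (borrowed from the PNCC literature) and the choice of CNN and GRU branches are illustrative assumptions, since the abstract does not specify which two algorithms run in parallel.

```python
# Power-law vs. log compression of mel features, plus a parallel two-branch
# model (sketch; exponent and branch types are assumptions, not the paper's).
import numpy as np
import librosa
import tensorflow as tf

def mel_features(path, exponent=1.0 / 15.0, n_mels=40):
    y, sr = librosa.load(path, sr=16000)        # Speech Commands is 16 kHz
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=400, hop_length=160, n_mels=n_mels)
    log_feat = np.log(mel + 1e-6)               # standard log compression
    pow_feat = mel ** exponent                  # power-law: gentler in
                                                # low-energy (noise-prone) regions
    return log_feat.T, pow_feat.T               # (time frames, mel bins)

def hybrid_model(input_shape=(98, 40), n_classes=10):
    # Two deep learning branches in parallel, merged before the classifier.
    inp = tf.keras.Input(shape=input_shape)     # (time frames, mel bins)
    cnn = tf.keras.layers.Reshape((*input_shape, 1))(inp)
    cnn = tf.keras.layers.Conv2D(16, 3, activation="relu")(cnn)
    cnn = tf.keras.layers.GlobalAveragePooling2D()(cnn)
    rnn = tf.keras.layers.GRU(64)(inp)
    merged = tf.keras.layers.Concatenate()([cnn, rnn])
    out = tf.keras.layers.Dense(n_classes, activation="softmax")(merged)
    return tf.keras.Model(inp, out)
```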

    Comparison of the Effectiveness of Machine Learning Classifiers in the Context of Voice Biometrics

    The purpose of this work was to compare seven popular classifiers from the Python-based scikit-learn library in terms of the performance of a voice biometrics system. Mel-frequency cepstral coefficients (MFCCs) were used to compute feature vectors from the voice of the person undergoing verification. The classifiers involved in this study are: k-NN (k-nearest neighbors), MLP (multilayer perceptron), SVM (support vector machine), DTC (decision tree classifier), GNB (Gaussian naive Bayes), ABC (AdaBoost classifier), and RFC (random forest classifier). As data, we used voice samples from 40 individuals with an average duration of 9 minutes per person. The performance criteria for the classifiers were dictated by the needs of voice biometrics systems; accordingly, fraud attempts were simulated during authentication. The most effective classifier for voice recognition was k-NN, which, with zero incorrectly admitted persons, provided 3-85% better verification accuracy than the other classifiers.
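
    A minimal sketch of the verification pipeline described above, assuming librosa for MFCC extraction and per-utterance mean pooling; both choices are illustrative, as the abstract does not state the exact feature pipeline.

```python
# MFCC feature vectors fed to a scikit-learn k-NN classifier (sketch).
import numpy as np
import librosa
from sklearn.neighbors import KNeighborsClassifier

def utterance_mfcc(path, n_mfcc=13):
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)                 # one fixed-length vector per utterance

def train_verifier(paths, labels, n_neighbors=5):
    X = np.array([utterance_mfcc(p) for p in paths])
    return KNeighborsClassifier(n_neighbors=n_neighbors).fit(X, labels)
```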

    Identification of Age Voiceprint Using Machine Learning Algorithms

    The voice is considered a biometric trait since we can extract information from the speech signal that allows us to identify the person speaking in a specific recording. Fingerprints, iris, DNA, or speech can be used in biometric systems, with speech being the most intuitive, basic, and easy-to-capture characteristic. Speech-based services are widely used in the banking and mobile sectors, although these services do not employ voice recognition to identify consumers; as a result, the possibility of using these services under a fake name is always present. To reduce the possibility of fraudulent identification, voice-based recognition systems must be designed. In this research, mel-frequency cepstral coefficient (MFCC) features were extracted from the gathered voice samples to train five different machine learning algorithms, namely decision tree, random forest (RF), support vector machine (SVM), k-nearest neighbors (k-NN), and multi-layer perceptron (MLP). Accuracy, precision, recall, specificity, and F1 score were used as classification performance metrics to compare these algorithms. According to the findings of the study, the MLP approach achieved a high classification accuracy of 91%, while RF performed better on the other metrics. These findings demonstrate how such classification algorithms can support voice-based biometric systems.
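
    The comparison itself maps directly onto scikit-learn; the sketch below uses default hyperparameters (an assumption, since the study's settings are not given here) and prints per-class precision, recall, and F1. Specificity would additionally require the confusion matrix.

```python
# Comparing the five classifiers named above on MFCC feature vectors (sketch).
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

def compare_classifiers(X, y):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    models = {
        "DT": DecisionTreeClassifier(),
        "RF": RandomForestClassifier(),
        "SVM": SVC(),
        "k-NN": KNeighborsClassifier(),
        "MLP": MLPClassifier(max_iter=500),
    }
    for name, model in models.items():
        y_pred = model.fit(X_tr, y_tr).predict(X_te)
        print(name)
        print(classification_report(y_te, y_pred))   # accuracy, precision, recall, F1
```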

    Preprocessing signal for Speech Emotion Recognition

    In this paper, we introduce and study signal preprocessing for speech emotion recognition. The aim of our work is to obtain a clean signal, created by sampling the signal from the speaker. Discrimination between speech and music waves was achieved. A good signal is obtained through preprocessing and is then used for feature extraction. The files used in this paper are wave files of male, female, and music recordings, with a sample rate of 48,000 Hz, a bit resolution of 16 bits, and a mono channel. The Berlin dataset and the RAVDESS dataset are used in this work.
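
    A minimal sketch of a typical preprocessing chain for such files, assuming peak normalization, pre-emphasis, and silence trimming; the paper's exact steps are not reproduced here.

```python
# Preprocessing sketch for 48 kHz, 16-bit mono wave files.
import numpy as np
import librosa

def preprocess(path, preemph=0.97, top_db=30):
    y, sr = librosa.load(path, sr=48000, mono=True)
    y = y / (np.max(np.abs(y)) + 1e-9)               # peak normalization
    y = np.append(y[0], y[1:] - preemph * y[:-1])    # pre-emphasis filter
    y, _ = librosa.effects.trim(y, top_db=top_db)    # strip leading/trailing silence
    return y, sr
```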

    Diagnosing Localized and Distributed Bearing Faults by Bearing Noise Signal Using Machine Learning and Kurtogram

    Bearings are a common component and crucial to most rotating machinery. Their failures account for more than half of all machine failures, each with the potential to cause extreme damage, injury, and downtime. Fault detection through condition monitoring is therefore of significant importance. Since standard condition monitoring techniques such as vibration signature analysis have a high initial cost and a long payback period, condition monitoring via audio signal processing is proposed for both localized faults and distributed/generalized roughness faults in rolling bearings. It is not appropriate to analyze bearing faults using the fast Fourier transform (FFT) of the raw bearing noise signal, since localized faults are amplitude modulated (AM) and mixed with background noise. Localized faults are therefore processed using the kurtogram technique to find the appropriate filtering band, because bearings with localized faults produce impulsive signals.
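
    A minimal sketch of this idea: scan a grid of candidate bands, keep the band whose filtered signal has the highest kurtosis (the quantity the kurtogram maximizes), then take the envelope spectrum of that band. The fixed-bandwidth grid is an illustrative stand-in for a full fast-kurtogram.

```python
# Kurtogram-style band selection plus envelope analysis (sketch).
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert
from scipy.stats import kurtosis

def best_band_envelope(x, fs, bandwidth=1000.0):
    best_xb, best_k = x, -np.inf
    for f_lo in np.arange(bandwidth, fs / 2 - bandwidth, bandwidth):
        sos = butter(4, [f_lo, f_lo + bandwidth], btype="bandpass",
                     fs=fs, output="sos")
        xb = sosfiltfilt(sos, x)
        k = kurtosis(xb)                       # impulsiveness of this band
        if k > best_k:
            best_xb, best_k = xb, k
    env = np.abs(hilbert(best_xb))             # amplitude envelope (AM demodulation)
    spectrum = np.abs(np.fft.rfft(env - env.mean()))
    freqs = np.fft.rfftfreq(len(env), d=1.0 / fs)
    return freqs, spectrum                     # peaks near fault characteristic freqs
```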

    Arabic digits speech recognition and speaker identification in a noisy environment using a hybrid model of VQ and GMM

    This paper presents automatic speaker identification and speech recognition for Arabic digits in a noisy environment. The proposed system is able to identify a speaker after his voice has been saved in the database and noise has been added. Mel-frequency cepstral coefficients (MFCC) are the feature extraction approach used in building a program on the Matlab platform, and vector quantization (VQ) is used to generate the codebooks. Gaussian mixture modelling (GMM) algorithms are used to generate templates for feature matching. We propose a system based on the MFCC-GMM and MFCC-VQ approaches on the one hand, and on the hybrid MFCC-VQ-GMM approach on the other, for speaker modeling. White Gaussian noise is added to the clean speech at several signal-to-noise ratio (SNR) levels to test the system in a noisy environment. The proposed system achieves good recognition rates.
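
    Two pieces of this setup translate into short sketches: mixing white Gaussian noise into clean speech at a target SNR, and fitting a per-speaker GMM on MFCC frames. The component count and MFCC settings are illustrative assumptions.

```python
# Noise mixing at a target SNR, and a per-speaker GMM on MFCC frames (sketch).
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def add_noise(clean, snr_db):
    noise = np.random.randn(len(clean))
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Scale noise so that 10*log10(p_clean / p_scaled_noise) == snr_db.
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10.0)))
    return clean + scale * noise

def speaker_gmm(y, sr, n_components=16):
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).T   # frames x coefficients
    return GaussianMixture(n_components=n_components).fit(mfcc)
```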

    A new system to detect coronavirus social distancing violations

    In this paper, a novel solution to avoid new infections is presented. Instead of tracing users' locations, the presence of individuals is detected by analysing their voices, and people's faces are detected by the camera. To do this, two different Android applications were implemented. The first one uses the camera to detect people's faces whenever the user answers or makes a phone call; the Firebase platform is used to detect the faces captured by the camera, determine their size, and estimate their distance from the phone. The second application uses voice biometrics to differentiate the user's voice from unknown speakers and creates a neural network model based on 5 samples of the user's voice. This feature is activated only while the user is surfing the Internet or using other applications, to prevent undesired contacts. Currently, patient tracking is performed by geolocation or over a Bluetooth connection. Although face detection and voice recognition are existing methods, this paper aims to integrate both in a single device. Our application cannot violate privacy, since it does not save the data used to carry out the detection and does not associate this data with individuals.
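
    The distance estimate from a detected face size reduces to the pinhole-camera relation; the sketch below uses an assumed focal length and average face width, not values from the paper.

```python
# Pinhole-camera distance estimate from detected face width (sketch).
def estimate_distance_cm(face_width_px, focal_length_px=1000.0,
                         real_face_width_cm=14.0):
    # width_px = focal_px * real_width / distance  =>  solve for distance.
    return focal_length_px * real_face_width_cm / face_width_px

# e.g. a face detected 200 px wide is roughly 70 cm from the camera
print(estimate_distance_cm(200.0))   # 70.0
```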