    Multi-speaker recognition with recurrent neural networks

    Speech recognition is a popular research topic that analyzes human speech. In addition to understanding the spoken message, it is useful to know who is speaking. This thesis studies speaker recognition and presents a machine-learning-based system for identifying the speakers in audio streams. Our implementation is based on Mel-frequency cepstral coefficients (MFCC) and recurrent neural networks. The system is developed and evaluated on the AMI Meeting Corpus, which contains annotated meeting recordings, typically with four participants each. Our system processes the audio of the recordings in 20-millisecond slices and produces a list of the active speakers at each time step. We measure the performance of the system using various metrics. The results indicate that the system can identify the speakers with decent accuracy. The best classifier model we examined is a single-layer long short-term memory (LSTM) network with a layer size of 256. More complex networks do not appear to improve the classification results, and they take longer to train. We also suggest alternative classification methods for future research.
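    The pipeline in the abstract (20 ms slices, MFCC features, a single-layer LSTM of size 256, a list of active speakers per time step) can be sketched roughly as below. This is a minimal NumPy sketch under stated assumptions, not the thesis implementation: the 16 kHz sample rate, non-overlapping frames, 26 mel filters, 13 coefficients, four-speaker output, the 0.5 threshold, the per-speaker sigmoid (multi-label) output layer and all function names are hypothetical choices, and the LSTM weights are random rather than trained on AMI.

```python
import numpy as np

# All constants below are assumptions for illustration, not values from the thesis.
SAMPLE_RATE = 16000   # assumed 16 kHz mono audio
FRAME_MS = 20         # 20 ms slices, as described in the abstract
N_MELS = 26           # assumed mel filterbank size
N_MFCC = 13           # assumed number of cepstral coefficients kept


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def frame_signal(signal, sr=SAMPLE_RATE, frame_ms=FRAME_MS):
    """Split a 1-D signal into non-overlapping frames of frame_ms milliseconds."""
    frame_len = sr * frame_ms // 1000
    n_frames = len(signal) // frame_len
    return signal[: n_frames * frame_len].reshape(n_frames, frame_len)


def mel_filterbank(n_mels, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale, shape (n_mels, n_fft//2+1)."""
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):            # rising edge
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):           # falling edge
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb


def dct_ii(x, n_out):
    """Orthonormal DCT-II along the last axis, keeping the first n_out coefficients."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n)) * np.sqrt(2.0 / n)
    basis[0] /= np.sqrt(2.0)
    return x @ basis.T


def mfcc(signal, sr=SAMPLE_RATE, n_fft=512):
    """MFCC matrix with one row per 20 ms frame: shape (n_frames, N_MFCC)."""
    frames = frame_signal(signal, sr) * np.hamming(sr * FRAME_MS // 1000)
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    mel_energy = power @ mel_filterbank(N_MELS, n_fft, sr).T
    return dct_ii(np.log(mel_energy + 1e-10), N_MFCC)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(n_in, hidden=256, n_speakers=4, seed=0):
    """Random (untrained) weights for a 1-layer LSTM plus a per-speaker output layer."""
    rng = np.random.default_rng(seed)
    s = 1.0 / np.sqrt(hidden)
    return (rng.uniform(-s, s, (4 * hidden, n_in)),    # input weights W
            rng.uniform(-s, s, (4 * hidden, hidden)),  # recurrent weights U
            np.zeros(4 * hidden),                      # gate biases b
            rng.uniform(-s, s, (n_speakers, hidden)),  # output weights
            np.zeros(n_speakers))                      # output biases


def lstm_speaker_scores(features, params):
    """Run the LSTM over frames; return per-frame, per-speaker probabilities."""
    W, U, b, W_out, b_out = params
    h = np.zeros(U.shape[1])
    c = np.zeros(U.shape[1])
    scores = []
    for x in features:
        gates = W @ x + U @ h + b                 # stacked input/forget/cell/output gates
        i, f, g, o = np.split(gates, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        scores.append(sigmoid(W_out @ h + b_out))  # independent sigmoid per speaker
    return np.array(scores)


def active_speakers(scores, threshold=0.5):
    """List of speaker indices whose probability exceeds the threshold, per frame."""
    return [np.flatnonzero(row > threshold).tolist() for row in scores]


if __name__ == "__main__":
    audio = np.random.default_rng(1).standard_normal(SAMPLE_RATE)  # 1 s of stand-in noise
    feats = mfcc(audio)                                            # 50 frames x 13 coefficients
    scores = lstm_speaker_scores(feats, init_params(N_MFCC))
    print(feats.shape, scores.shape, active_speakers(scores)[:3])
```

    Using an independent sigmoid per speaker (rather than a softmax) lets several speakers be active in the same slice, which matches the idea of a list of active speakers per time step; with untrained weights the printed lists are meaningless and only show the data flow.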

    Pitch Estimation and Voicing Classification Using Reconstructed Spectrum from MFCC
