2 research outputs found

Monen puhujan tunnistaminen takaisinkytkeytyvillä neuroverkoilla (Multi-speaker identification with recurrent neural networks)

Author: Niskanen Lauri
Publication date: 05/12/2018

Speech recognition is a popular research topic that analyzes human speech. In addition to understanding the spoken message, it is useful to know who is speaking. This thesis studies speaker recognition and presents a machine-learning-based system for identifying speakers in audio streams. Our implementation is based on Mel-frequency cepstral coefficients (MFCC) and recurrent neural networks. The system is developed and evaluated on the AMI Meeting Corpus, which contains annotated meeting recordings, typically with four participants each. The system processes the audio files of the recordings in 20-millisecond slices and produces a list of the active speakers at each time step. We measure its performance using several metrics. The results indicate that the system identifies speakers with decent accuracy. The best classifier we examined is a single-layer long short-term memory (LSTM) network with a layer size of 256. More complex networks do not appear to improve the classification results, and they take longer to train. We also suggest alternative classification methods for future research.
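The abstract does not say how the meeting annotations are aligned to the 20 ms slices or how the network's outputs become a list of active speakers. One plausible framing, sketched here under stated assumptions, treats each slice as a multi-label classification target. Assumptions: a 16 kHz mono signal, non-overlapping frames, a speaker counted as active when their annotated segment covers the frame's midpoint, and per-speaker sigmoid scores thresholded at 0.5. The helpers `split_into_frames`, `frame_labels` and `active_speakers` are hypothetical and are not taken from the thesis.

```python
from typing import List, Tuple

# Assumed values for illustration; the thesis only states the 20 ms slice length.
SAMPLE_RATE = 16_000
FRAME_MS = 20
FRAME_LEN = SAMPLE_RATE * FRAME_MS // 1000  # 320 samples per 20 ms slice


def split_into_frames(samples: List[float], frame_len: int = FRAME_LEN) -> List[List[float]]:
    """Cut a mono signal into consecutive, non-overlapping frames (a trailing partial frame is dropped)."""
    n_frames = len(samples) // frame_len
    return [samples[i * frame_len:(i + 1) * frame_len] for i in range(n_frames)]


def frame_labels(
    segments: List[Tuple[str, float, float]],
    speakers: List[str],
    n_frames: int,
    frame_ms: int = FRAME_MS,
) -> List[List[int]]:
    """Build a multi-hot target per frame: 1 if a speaker's (start_s, end_s) segment covers the frame midpoint."""
    index = {spk: i for i, spk in enumerate(speakers)}
    labels = [[0] * len(speakers) for _ in range(n_frames)]
    for spk, start_s, end_s in segments:
        for t in range(n_frames):
            mid = (t + 0.5) * frame_ms / 1000.0  # frame midpoint in seconds
            if start_s <= mid < end_s:
                labels[t][index[spk]] = 1
    return labels


def active_speakers(
    scores: List[List[float]], speakers: List[str], threshold: float = 0.5
) -> List[List[str]]:
    """Turn per-frame, per-speaker scores (e.g. sigmoid outputs) into a list of active speakers per time step."""
    return [[spk for spk, p in zip(speakers, row) if p >= threshold] for row in scores]
```

For example, with speakers `["A", "B", "C", "D"]` and overlapping segments `("A", 0.0, 0.04)` and `("B", 0.02, 0.08)`, the second 20 ms frame gets the target `[1, 1, 0, 0]`, so overlapping speech is represented naturally. Independent per-speaker scores (rather than a single softmax over speakers) are what allow more than one speaker, or none, to be active in the same slice.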

Trepo - Institutional Repository of Tampere University
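The thesis above represents each 20 ms slice with Mel-frequency cepstral coefficients. As a rough illustration of what that feature step involves, here is a minimal, dependency-free MFCC computation for a single frame: window, power spectrum, triangular mel filterbank, log, then a DCT-II. The parameter values (16 kHz sample rate, 512-point transform, 26 mel filters, 13 coefficients, Hamming window) are common defaults assumed for the example, not settings reported in the thesis. The naive DFT is used only for readability; a practical implementation would use an FFT.

```python
import math
from typing import List


def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame: List[float], n_fft: int) -> List[float]:
    """Hamming-window the frame, zero-pad to n_fft, return |DFT|^2 / n_fft for bins 0..n_fft/2."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n_fft // 2 + 1):
        # Naive DFT over the non-zero samples; the zero padding contributes nothing.
        re = sum(x * math.cos(2 * math.pi * k * i / n_fft) for i, x in enumerate(windowed))
        im = -sum(x * math.sin(2 * math.pi * k * i / n_fft) for i, x in enumerate(windowed))
        spectrum.append((re * re + im * im) / n_fft)
    return spectrum


def mel_filterbank(n_filters: int, n_fft: int, sample_rate: int) -> List[List[float]]:
    """Triangular filters whose edges are evenly spaced on the mel scale between 0 Hz and Nyquist."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2)
    mels = [low + (high - low) * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mels]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        weights = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):  # rising edge
            weights[k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge
            weights[k] = (right - k) / (right - center)
        bank.append(weights)
    return bank


def mfcc(frame: List[float], sample_rate: int = 16_000, n_fft: int = 512,
         n_filters: int = 26, n_coeffs: int = 13) -> List[float]:
    """Compute n_coeffs cepstral coefficients for one frame of audio samples."""
    power = power_spectrum(frame, n_fft)
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    # Log filterbank energies; the small constant avoids log(0) on silent frames.
    log_energies = [math.log(sum(w * p for w, p in zip(weights, power)) + 1e-10) for weights in bank]
    # Unnormalized DCT-II decorrelates the log energies; keep the first n_coeffs.
    return [
        sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters) for j, e in enumerate(log_energies))
        for c in range(n_coeffs)
    ]
```

Applied to every 320-sample slice, this yields one 13-dimensional vector per 20 ms time step, i.e. the kind of sequence a recurrent classifier such as the LSTM described in the abstract would consume.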

Pitch Estimation and Voicing Classification Using Reconstructed Spectrum from MFCC

Authors: HuiBin QIN, JianFeng WU, LingYan FAN, YongZhu HUA
Publication venue: Institute of Electronics, Information and Communications Engineers (IEICE)
Publication date: 01/01/2018

Crossref