2 research outputs found
Spoken Persian digits recognition using deep learning
Classification of isolated digits is a fundamental challenge for many speech classification systems. Previous works on spoken digits have been limited to the numbers 0 to 9. In this paper, we propose two deep learning-based models for spoken digit recognition in the range of 0 to 599. The first model is a Convolutional Neural Network (CNN) model that uses the Mel spectrogram obtained from the audio data. The second model uses the recent advances in deep sequential models, especially the Transformer model followed by a Long Short-Term Memory (LSTM) Network and a classifier. Moreover, we also collected a dataset, including audio data by a contribution of 145 people, covering the numerical range from 0 to 599. The experimental results on the collected dataset indicate a validation accuracy of 98.03%
Amharic spoken digits recognition using convolutional neural network
Authors would like to acknowledge and thanks to the participants in the collection of voice samples.Peer reviewe