
5 research outputs found

Can non-linear readout nodes enhance the performance of reservoir-based speech recognizers?

Authors: Martens Jean-Pierre; Triefenbach Fabian
Publication venue: IEEE Computer Society
Publication date: 01/01/2011

Crossref

Ghent University Academic Bibliography

A transfer learning framework for predicting the emotional content of generalized sound events

Author: S. Ntalampiras
Publication venue: Acoustical Society of America (ASA)
Publication date: 01/03/2017

Predicting the emotions evoked by generalized sound events is a relatively recent research domain which still needs attention. In this work, a framework aiming to reveal potential similarities existing during the perception of emotions evoked by sound events and songs is presented. To this end, the following are proposed: (a) the usage of temporal modulation features, (b) a transfer learning module based on an echo state network, and (c) a k-medoids clustering algorithm predicting valence and arousal measurements associated with generalized sound events. The effectiveness of the proposed solution is demonstrated after a thoroughly designed experimental phase employing both sound and music data. The results demonstrate the importance of transfer learning in the specific field and encourage further research on approaches which manage the problem in a synergistic way.

AIR Universita degli studi di Milano

Design of reservoir computing systems for the recognition of noise corrupted speech and handwriting

Author: Jalalvand Azarakhsh
Publication venue: Ghent University. Faculty of Engineering and Architecture
Publication date: 01/01/2015

Ghent University Academic Bibliography

Connected digit recognition by means of Reservoir Computing

Authors: Jalalvand Azarakhsh; Martens Jean-Pierre; Triefenbach Fabian; Verstraeten David
Publication venue: International Speech Communication Association (ISCA)
Publication date: 01/01/2011

Most automatic speech recognition systems employ Hidden Markov Models with Gaussian mixture emission distributions to model the acoustics. There have been several attempts, however, to challenge this approach, e.g. by introducing a neural network (NN) as an alternative acoustic model. Although the performance of these so-called hybrid systems is actually quite good, their training is often problematic and time consuming. By using a reservoir – this is a recurrent NN with only the output weights being trainable – we can overcome this disadvantage and yet obtain good accuracy. In this paper, we propose the first reservoir-based connected digit recognition system, and we demonstrate good performance on the Aurora-2 testbed. Since reservoir computing (RC) is a new technology, we anticipate that our present system is still sub-optimal, and further improvements are possible.

Ghent University Academic Bibliography

Recommended from our members

Deep neural network acoustic models for multi-dialect Arabic speech recognition

Author: Hmad NF
Publication date: 01/07/2015

Speech is a desirable communication method between humans and computers. The major concerns of automatic speech recognition (ASR) are determining a set of classification features and finding a suitable recognition model for these features. Hidden Markov Models (HMMs) have been demonstrated to be powerful models for representing time-varying signals. Artificial Neural Networks (ANNs) have also been widely used for representing time-varying quasi-stationary signals. Arabic is one of the oldest living languages and one of the oldest Semitic languages in the world; it is also the fifth most widely used language and the mother tongue of roughly 200 million people. Arabic speech recognition has been a fertile area of research over the previous two decades, as attested by the various papers that have been published on this subject. This thesis investigates phoneme and acoustic models based on Deep Neural Networks (DNNs) and Deep Echo State Networks for multi-dialect Arabic speech recognition. Moreover, the TIMIT corpus, with its wide variety of American dialects, is also used to evaluate the proposed models. The availability of speech data that is time-aligned and labelled at the phonemic level is a fundamental requirement for building speech recognition systems. A developed Arabic phoneme database (APD) was manually timed and phonetically labelled. This dataset was constructed from the King Abdul-Aziz Arabic Phonetics Database (KAPD) for the Saudi Arabian dialect and the Centre for Spoken Language Understanding (CSLU2002) database for different Arabic dialects. This dataset covers 8148 Arabic phonemes. In addition, a corpus of 120 speakers (13 hours of Arabic speech) for training and 24 speakers (2.4 hours) for testing, randomly selected from the Levantine Arabic dialect database, was revised, and transcription errors were manually corrected. The selected dataset is labelled automatically using the HTK Hidden Markov Model toolkit.
The TIMIT corpus is also used for the phone recognition and acoustic modelling tasks. We used 462 speakers (3.14 hours) for training and 24 speakers (0.81 hours) for testing. For Automatic Speech Recognition (ASR), a Deep Neural Network (DNN) is used to evaluate its adoption in developing a framewise phoneme recognition and an acoustic modelling system for Arabic speech recognition. Restricted Boltzmann Machine (RBM) DNN models have not been explored for any Arabic corpora previously. This allows us to claim priority for adopting this RBM DNN model for the Levantine Arabic acoustic models. A post-processing enhancement was also applied to the DNN acoustic model outputs in order to improve the recognition accuracy and to obtain the accuracy at the phoneme level instead of the frame level. This post-processing significantly improved the recognition performance. An Echo State Network (ESN) is developed and evaluated for Arabic phoneme recognition with different learning algorithms. This investigated the use of the conventional ESN trained with supervised and forced learning algorithms. A novel combined supervised/forced supervised learning algorithm (unsupervised adaptation) was developed and tested on the proposed optimised Arabic phoneme recognition datasets. This new model is evaluated on the Levantine dataset and empirically compared with the results obtained from the baseline Deep Neural Networks (DNNs). A significant improvement in recognition performance was achieved when the ESN model was implemented, compared to the baseline RBM DNN model's result. The results show that the ESN model has a better ability to recognize phoneme sequences than the DNN model for a small-vocabulary dataset.
Adopting the ESN model for acoustic modelling is seen to be more valid than adopting the DNN model, as ESNs are recurrent models and are expected to support sequence models better than RBM DNN models, even with a contextual input window. The TIMIT corpus is also used to investigate deep learning for framewise phoneme classification and acoustic modelling using Deep Neural Networks (DNNs) and Echo State Networks (ESNs), allowing a direct and valid comparison between the systems investigated in this thesis and published work on framewise phoneme recognition that used the TIMIT corpus. Our main finding on this corpus is that ESN networks outperform time-windowed RBM DNN ones. However, our ESN-based system shows 10% lower performance than other systems recently reported in the literature that used the same corpus. This is due to hardware availability and to not applying speaker and noise adaptation, which could improve the results in this thesis, as our aim is to investigate the proposed models for speech recognition and to make a direct comparison between these models.

Nottingham Trent Institutional Repository (IRep)