27 research outputs found

    Fast vocabulary acquisition in an NMF-based self-learning vocal user interface

    Abstract: In command-and-control applications, a vocal user interface (VUI) is useful for hands-free control of various devices, especially for people with a physical disability. The spoken utterances are usually restricted to a predefined list of phrases or to a restricted grammar, and the acoustic models work well for normal speech. While some state-of-the-art methods allow for user adaptation of the predefined acoustic models and lexicons, we pursue a fully adaptive VUI by learning both vocabulary and acoustics directly from interaction examples. A learning curve usually has a steep rise in the beginning and an asymptotic ceiling at the end. To limit tutoring time and to guarantee good performance in the long run, the word learning rate of the VUI should be fast and the learning curve should level off at a high accuracy. To address these performance indicators, we propose a multi-level VUI architecture and investigate the effectiveness of alternative processing schemes. In the low-level layer, we explore the use of MIDA features (Mutual Information Discrimination Analysis) against conventional MFCC features. In the mid-level layer, we enhance the acoustic representation by means of phone posteriorgrams and clustering procedures. In the high-level layer, we use the NMF (Non-negative Matrix Factorization) procedure, which has been demonstrated to be an effective approach for word learning. We evaluate and discuss the performance and the feasibility of our approach in a realistic experimental setting of the VUI-user learning context.
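
    The abstract only names NMF as the high-level word-learning step. As a rough illustration of that general idea (not the paper's exact algorithm), the sketch below assumes each utterance is summarized by a non-negative acoustic co-occurrence vector and, during training, a binary grounding vector marking the vocabulary items it contains; all function and variable names are illustrative.

```python
# Minimal sketch of NMF-based word learning, assuming non-negative
# acoustic co-occurrence vectors per utterance plus binary grounding labels.
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=None):
    """Euclidean multiplicative-update NMF: V (features x utterances) ~ W @ H."""
    rng = np.random.default_rng(seed)
    n_feat, n_utt = V.shape
    W = rng.random((n_feat, rank)) + eps
    H = rng.random((rank, n_utt)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def train_word_models(V_ac, V_gr, rank, **kw):
    # Stack acoustic features and grounding labels so NMF learns joint parts:
    # each column of W couples an acoustic pattern with the words it signals.
    V = np.vstack([V_ac, V_gr])
    W, _ = nmf(V, rank, **kw)
    return W[:V_ac.shape[0]], W[V_ac.shape[0]:]   # W_ac, W_gr

def decode(W_ac, W_gr, v_ac, n_iter=200, eps=1e-9, seed=None):
    # Factorize a new utterance against the fixed acoustic dictionary and
    # read off word activations through the grounding part of the dictionary.
    rng = np.random.default_rng(seed)
    h = rng.random((W_ac.shape[1], 1)) + eps
    v = v_ac.reshape(-1, 1)
    for _ in range(n_iter):
        h *= (W_ac.T @ v) / (W_ac.T @ W_ac @ h + eps)
    return (W_gr @ h).ravel()   # word activation scores
```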

    Active-set Newton algorithm for overcomplete non-negative representations of audio

    Accepted version; peer reviewed.

    Investigating NMF Speech Enhancement for Neural Network based Acoustic Models

    In the light of the improvements made in recent years with neural network-based acoustic models, it is an interesting question whether these models are also suited for noise-robust recognition. This has not yet been fully explored, although first experiments suggest that they are. Furthermore, preprocessing techniques that improve robustness should be re-evaluated with these new models. In this work, we present experimental results to address these questions. Acoustic models based on Gaussian mixture models (GMMs), deep neural networks (DNNs), and long short-term memory (LSTM) recurrent neural networks (which have an improved ability to exploit context) are evaluated for their robustness after clean or multi-condition training. In addition, the influence of non-negative matrix factorization (NMF) for speech enhancement is investigated. Experiments are performed with the Aurora-4 database and the results show that DNNs perform slightly better than LSTMs and, as expected, both beat GMMs. Furthermore, speech enhancement is capable of improving the DNN result. Index Terms: robust speech recognition, long short-term memory, speech enhancement
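
    The abstract does not spell out which NMF enhancement variant is used. As a rough illustration of the general idea only, the sketch below assumes pre-trained non-negative speech and noise spectral dictionaries and builds a Wiener-style soft mask from the factorization of the noisy magnitude spectrogram; the names are illustrative.

```python
# Minimal sketch of NMF-based speech enhancement as an ASR front end,
# assuming W_speech and W_noise are pre-trained spectral dictionaries
# (columns = magnitude-spectrum atoms) and Y is the noisy magnitude
# spectrogram with shape (freq_bins, frames).
import numpy as np

def enhance(Y, W_speech, W_noise, n_iter=200, eps=1e-9, seed=None):
    rng = np.random.default_rng(seed)
    W = np.hstack([W_speech, W_noise])            # fixed joint dictionary
    H = rng.random((W.shape[1], Y.shape[1])) + eps
    for _ in range(n_iter):                       # multiplicative updates, H only
        H *= (W.T @ Y) / (W.T @ W @ H + eps)
    k = W_speech.shape[1]
    S_hat = W_speech @ H[:k]                      # speech estimate
    N_hat = W_noise @ H[k:]                       # noise estimate
    mask = S_hat / (S_hat + N_hat + eps)          # Wiener-style soft mask
    return mask * Y                               # enhanced magnitude spectrogram
```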

    CNN Architectures for Large-Scale Audio Classification

    Full text link
    Convolutional Neural Networks (CNNs) have proven very effective in image classification and show promise for audio. We use various CNN architectures to classify the soundtracks of a dataset of 70M training videos (5.24 million hours) with 30,871 video-level labels. We examine fully connected Deep Neural Networks (DNNs), AlexNet [1], VGG [2], Inception [3], and ResNet [4]. We investigate varying the size of both the training set and the label vocabulary, finding that analogs of the CNNs used in image classification do well on our audio classification task, and that larger training and label sets help up to a point. A model using embeddings from these classifiers does much better than raw features on the Audio Set [5] Acoustic Event Detection (AED) classification task. Comment: Accepted for publication at ICASSP 2017. Changes: added definitions of mAP, AUC, and d-prime; updated mAP/AUC/d-prime numbers for Audio Set based on changes in the latest Audio Set revision; changed wording to fit the 4-page limit with the new additions.
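
    The paper evaluates analogs of image-classification architectures (AlexNet, VGG, Inception, ResNet). The sketch below is only a generic illustration of the overall setup, a small VGG-style convolutional stack over log-mel spectrogram patches with a multi-label sigmoid output; the layer sizes are illustrative and the label count is a placeholder, not the configurations benchmarked in the paper.

```python
# Generic illustration: small VGG-style CNN over log-mel spectrogram patches
# with multi-label (sigmoid) outputs; layer sizes and label count are
# placeholders, not the architectures evaluated in the paper.
import torch
import torch.nn as nn

class AudioCNN(nn.Module):
    def __init__(self, n_labels=527):
        super().__init__()
        def block(c_in, c_out):
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
                nn.MaxPool2d(2))
        self.features = nn.Sequential(block(1, 32), block(32, 64), block(64, 128))
        self.head = nn.Linear(128, n_labels)

    def forward(self, x):                 # x: (batch, 1, frames, mel_bands)
        h = self.features(x)
        h = h.mean(dim=(2, 3))            # global average pooling -> embedding
        return self.head(h)               # logits; train with BCEWithLogitsLoss

model = AudioCNN()
logits = model(torch.randn(8, 1, 96, 64))  # 8 patches of 96 frames x 64 mel bands
```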

    Exemplar-based joint channel and noise compensation

    In this paper two models for channel estimation in exemplar-based noise-robust speech recognition are proposed. Building on a compositional model that models noisy speech as a combination of noise and speech atoms, the first model iteratively estimates a filter to best compensate the mismatch with the observed noisy speech. The second model estimates separate filters for the noise and speech atoms. We show that both models enable noise-robust ASR even if the channel characteristics of the noisy speech do not match those of the exemplars in the dictionary. Moreover, the second model, which is able to estimate separate filters for speech and noise, is shown to be robust even in the presence of bandwidth-limited sources. Index Terms — Speech recognition, source separation, matrix factorization, noise robustness, channel compensation
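
    The abstract describes the first (single-filter) model only at a high level. The sketch below is a speculative, simplified illustration of that idea under a Euclidean cost: exemplar activations and a per-frequency channel gain are re-estimated alternately. The dictionary A (stacked speech and noise exemplars) and all names are assumptions for illustration, not the paper's algorithm.

```python
# Simplified illustration of joint channel and exemplar-activation estimation:
# Y (freq_bins x frames) is modelled as a per-frequency channel gain applied
# to an exemplar-based reconstruction A @ X, and X and the gain are updated
# alternately. This is an assumption-laden sketch, not the paper's method.
import numpy as np

def channel_compensated_nmf(Y, A, n_iter=100, eps=1e-9, seed=None):
    rng = np.random.default_rng(seed)
    F, T = Y.shape
    X = rng.random((A.shape[1], T)) + eps          # exemplar activations
    c = np.ones((F, 1))                            # per-frequency channel gain
    for _ in range(n_iter):
        Ac = c * A                                 # channel-filtered dictionary
        X *= (Ac.T @ Y) / (Ac.T @ (Ac @ X) + eps)  # multiplicative activation update
        R = A @ X                                  # unfiltered reconstruction
        num = np.sum(Y * R, axis=1, keepdims=True)
        den = np.sum(R * R, axis=1, keepdims=True) + eps
        c = np.maximum(num / den, eps)             # least-squares gain per frequency bin
    return X, c.ravel()
```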