26 research outputs found

A Neural Network Approach for Mixing Language Models

Author: Klakow Dietrich
Oualil Youssef
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 23/08/2017

The performance of Neural Network (NN)-based language models is steadily improving due to the emergence of new architectures, which are able to learn different natural language characteristics. This paper presents a novel framework, which shows that a significant improvement can be achieved by combining different existing heterogeneous models in a single architecture. This is done through 1) a feature layer, which separately learns different NN-based models, and 2) a mixture layer, which merges the resulting model features. In doing so, this architecture benefits from the learning capabilities of each model with no noticeable increase in the number of model parameters or the training time. Extensive experiments conducted on the Penn Treebank (PTB) and the Large Text Compression Benchmark (LTCB) corpus showed a significant reduction of the perplexity when compared to state-of-the-art feedforward as well as recurrent neural network architectures. Comment: Published at IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2017. arXiv admin note: text overlap with arXiv:1703.0806
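The abstract reports its results as perplexity. As a reminder of what that number measures, here is a minimal sketch of the standard definition (the exponential of the average negative log-probability per token). The function name `perplexity` and the toy probabilities are illustrative assumptions, not taken from the paper.

```python
import math


def perplexity(token_probs):
    """Perplexity of a sequence, given the model probability of each token.

    Standard definition: exp of the average negative log-probability.
    `token_probs` is a hypothetical input: one probability per token.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)
```

A model that assigns probability 0.25 to every token has perplexity 4, the same as a uniform guess over four words; lower values mean the model is less "surprised" by the text.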

arXiv.org e-Print Archive

Crossref

A Batch Noise Contrastive Estimation Approach for Training Large Vocabulary Language Models

Author: Klakow Dietrich
Oualil Youssef
Publication date: 22/08/2017

Training large vocabulary Neural Network Language Models (NNLMs) is a difficult task due to the explicit requirement of output layer normalization, which typically involves the evaluation of the full softmax function over the complete vocabulary. This paper proposes a Batch Noise Contrastive Estimation (B-NCE) approach to alleviate this problem. This is achieved by reducing the vocabulary, at each time step, to the target words in the batch and then replacing the softmax by the noise contrastive estimation approach, where these words play the role of targets and noise samples at the same time. In doing so, the proposed approach can be fully formulated and implemented using optimal dense matrix operations. Applying B-NCE to train different NNLMs on the Large Text Compression Benchmark (LTCB) and the One Billion Word Benchmark (OBWB) shows a significant reduction of the training time with no noticeable degradation of the models' performance. This paper also presents a new baseline comparative study of different standard NNLMs on the large OBWB on a single Titan-X GPU. Comment: Accepted for publication at INTERSPEECH'1
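The idea of letting the batch targets double as noise samples can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's formulation: it uses the textbook NCE objective, assumes all targets in the batch are distinct, and uses plain loops where the paper describes dense matrix operations. The names `batch_nce_loss`, `hidden`, `out_emb`, and `noise_prob` are hypothetical.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def batch_nce_loss(hidden, out_emb, targets, noise_prob):
    """Average NCE loss where each example's noise words are the other targets.

    hidden:     one hidden vector per batch position (assumed input)
    out_emb:    output embedding per word id (assumed input)
    targets:    target word id per batch position, assumed distinct
    noise_prob: noise-distribution probability per word id (assumed input)
    """
    batch = len(targets)
    if batch < 2 or len(set(targets)) != batch:
        raise ValueError("sketch assumes at least two distinct targets")
    k = batch - 1  # each example sees the other batch targets as noise
    total = 0.0
    for i in range(batch):
        for j in range(batch):
            word = targets[j]
            # Unnormalized score of `word` in context i (dot product).
            score = sum(h * e for h, e in zip(hidden[i], out_emb[word]))
            logit = score - math.log(k * noise_prob[word])
            # Diagonal entries are true targets, off-diagonal ones are noise.
            total -= log_sigmoid(logit) if i == j else log_sigmoid(-logit)
    return total / batch
```

Only a batch-by-batch block of scores is ever computed, so the cost no longer depends on the vocabulary size; in a tensor library the double loop would collapse into a single matrix product between the hidden states and the batch's target embeddings.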

arXiv.org e-Print Archive

Crossref

Sequential estimation techniques and application to multiple speaker tracking and language modeling

Author: Oualil Youssef
Publication venue: Saarländische Universitäts- und Landesbibliothek
Publication date: 01/01/2017

For many real-world applications, the considered data is given as a time sequence that becomes available in an orderly fashion, where the order incorporates important information about the entities of interest. The work presented in this thesis deals with two such cases by introducing new sequential estimation solutions. More precisely, we introduce: I. A sequential Bayesian estimation framework to solve the multiple speaker localization, detection and tracking problem. This framework is a complete pipeline that includes 1) new observation estimators, which extract a fixed number of potential locations per time frame; 2) new unsupervised Bayesian detectors, which classify these estimates into noise/speaker classes; and 3) new Bayesian filters, which use the speaker class estimates to track multiple speakers. This framework was developed to tackle the low overlap detection rate of multiple speakers and to reduce the number of constraints generally imposed in standard solutions. II. A sequential neural estimation framework for language modeling, which overcomes some of the shortcomings of standard approaches by merging different models in a hybrid architecture. That is, we introduce two solutions that tightly merge particular models and then show how a generalization can be achieved through a new mixture model. To speed up the training of large vocabulary language models, we introduce a new extension of the noise contrastive estimation approach to batch training.

Universaar

Acronym

Sequential Recurrent Neural Networks for Language Modeling

Author: Greenberg Clayton
Klakow Dietrich
Oualil Youssef
Singh Mittul
Publication date: 23/03/2017

Feedforward Neural Network (FNN)-based language models estimate the probability of the next word based on the history of the last N words, whereas Recurrent Neural Networks (RNN) perform the same task based only on the last word and some context information that cycles in the network. This paper presents a novel approach, which bridges the gap between these two categories of networks. In particular, we propose an architecture which takes advantage of the explicit, sequential enumeration of the word history in the FNN structure while enhancing each word representation at the projection layer through recurrent context information that evolves in the network. The context integration is performed using an additional word-dependent weight matrix that is also learned during training. Extensive experiments conducted on the Penn Treebank (PTB) and the Large Text Compression Benchmark (LTCB) corpus showed a significant reduction of the perplexity when compared to state-of-the-art feedforward as well as recurrent neural network architectures. Comment: published (INTERSPEECH 2016), 5 pages, 3 figures, 4 tables

arXiv.org e-Print Archive

Crossref

A Multiple Hypothesis Gaussian Mixture Filter for Acoustic Source Localization and Tracking

Author: Faubel Friedrich
Klakow Dietrich
Oualil Youssef
Publication date: 19/12/2013

In this work, we address the problem of tracking an acoustic source based on measured time differences of arrival (TDOA). The classical solution to this problem consists of using a detector, which estimates the TDOA for each microphone pair, and then applying a tracking algorithm, which integrates the “measured” TDOAs in time. Such a two-stage approach presumes 1) that TDOAs can be estimated reliably; and 2) that the errors in detection behave in a well-defined fashion. The presence of noise and reverberation, however, causes larger errors in the TDOA estimates and thereby ultimately lowers the tracking performance. We propose to counteract this effect by considering a multiple hypothesis filter, which propagates the TDOA estimation uncertainty to the tracking stage. This is achieved by considering multiple TDOA estimates and then integrating the resulting TDOA observations in the framework of a Gaussian mixture filter. Experimental results show that the proposed filter has a significantly lower angular error than a multiple hypothesis particle filter.
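For readers unfamiliar with the TDOA quantity the abstract builds on, here is a small sketch of its usual geometric definition for one microphone pair: the difference in source-to-microphone distances divided by the speed of sound. The function name `tdoa`, the sign convention, and the value 343 m/s are assumptions for illustration; the abstract does not specify the paper's measurement model.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def tdoa(source, mic_a, mic_b, c=SPEED_OF_SOUND):
    """Time difference of arrival (seconds) between two microphones.

    Positive when the sound reaches mic_b before mic_a (assumed convention).
    Positions are coordinate tuples in metres.
    """
    return (math.dist(source, mic_a) - math.dist(source, mic_b)) / c
```

A source equidistant from both microphones gives a TDOA of zero; noise and reverberation perturb the estimated value away from this ideal, which is the uncertainty the proposed filter carries into the tracking stage.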

Infoscience - École polytechnique fédérale de Lausanne

CiteSeerX

The Metalogue Debate Trainee Corpus: Data Collection and Annotation

Author: Albert Pierre
Alexandersson Jan
Campbell Nick
Haider Fasih
Klakow Dietrich
Koryzis Dimitris
Linz Nicklas
Luz Saturnino
Malchanau Andrei
Oualil Youssef
Petukhova Volha
Spiliotopoulos Dimitris
Publication date: 07/05/2018

Edinburgh Research Explorer

Quantifying the Benefits of Speech Recognition for an Air Traffic Management Application

Author: Helmke Hartmut
Oualil Youssef
Schulder Marc
Publication date: 01/01/2017

The project AcListant® (Active Listening Assistant), which uses automatic speech recognition to recognize the commands in air traffic controller-to-pilot communication, has achieved command recognition rates above 95%. These high rates were obtained with Assistance-Based Speech Recognition (ABSR). An Arrival Manager (AMAN) cannot exactly predict the next actions of a controller, but it knows which commands are plausible in the current situation and which are not. Therefore, the AMAN generates a set of possible commands every 20 seconds, which serves as context information for the speech recognizer. Different validation trials were performed with controllers from Düsseldorf, Frankfurt, Munich, Prague and Vienna in DLR’s air traffic simulator in Braunschweig from 2014 to 2015. Decision makers of air navigation service providers (ANSPs) are primarily interested not in high recognition rates or low error rates, but in reducing costs and effort. Therefore, the validation trials performed at the end of 2015 aimed at quantifying the benefits of using speech recognition with respect to both efficiency and controller workload. The paper describes the experiments performed to show that, with ABSR support, controller workload for radar label maintenance could be reduced by a factor of three and that ABSR enables fuel savings of 50 to 65 liters per flight.

Institute of Transport Research:Publications

Application-Agnostic Language Modeling for On-Device ASR

Author: Nußbaum-Thom Markus
Oualil Youssef
Verwimp Lyan
Publication date: 16/05/2023

On-device automatic speech recognition systems face several challenges compared to server-based systems. They have to meet stricter constraints in terms of speed, disk size and memory while maintaining the same accuracy. Often they have to serve several applications with different distributions at once, such as communicating with a virtual assistant and speech-to-text. The simplest solution to serve multiple applications is to build application-specific (language) models, but this leads to an increase in memory. Therefore, we explore different data- and architecture-driven language modeling approaches to build a single application-agnostic model. We propose two novel feed-forward architectures that find an optimal trade-off between different on-device constraints. In comparison to the application-specific solution, one of our novel approaches reduces the disk size by half, while maintaining the speed and accuracy of the original model. Comment: accepted for ACL 2023 industry track

arXiv.org e-Print Archive