Use of a Confusion Network to Detect and Correct Errors in an On-Line Handwritten Sentence Recognition System
In this paper we investigate the integration of a confusion network into an on-line handwritten sentence recognition system. The word posterior probabilities from the confusion network are used as confidence scores to detect potential errors in the sentence output by Maximum A Posteriori decoding on a word graph. Dedicated classifiers (here, SVMs) are then trained to correct these errors, combining the word posterior probabilities with other knowledge sources. A rejection phase is also introduced in the detection process. Experiments on handwritten sentences show a 28.5% relative reduction of the word error rate.
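As a minimal sketch of the detection step, assuming a confusion network represented as a list of word-posterior slots (the slot structure, the example words, and the threshold are illustrative, not the paper's actual values):

```python
# Hypothetical sketch: using confusion-network word posteriors as confidence
# scores to flag likely errors in the decoded sentence. Slot contents and the
# 0.7 threshold are illustrative assumptions.

def flag_errors(confusion_network, threshold=0.7):
    """Each slot maps candidate words to posterior probabilities.
    Return, per slot, the best word, its posterior, and an error flag
    raised when the posterior falls below the confidence threshold."""
    decoded = []
    for slot in confusion_network:
        best_word, posterior = max(slot.items(), key=lambda kv: kv[1])
        decoded.append((best_word, posterior, posterior < threshold))
    return decoded

cn = [
    {"the": 0.95, "a": 0.05},
    {"cat": 0.55, "cart": 0.40, "cut": 0.05},  # low-confidence slot
    {"sat": 0.90, "sad": 0.10},
]
for word, p, is_suspect in flag_errors(cn):
    print(word, round(p, 2), "SUSPECT" if is_suspect else "ok")
```

Slots flagged as suspect would then be handed to the dedicated correction classifiers.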
Handling out-of-vocabulary words and recognition errors based on word linguistic context for handwritten sentence recognition
In this paper we investigate the use of linguistic information given by language models to deal with word recognition errors in handwritten sentences. We focus especially on errors due to out-of-vocabulary (OOV) words. First, word posterior probabilities are computed and used to detect error hypotheses in the output sentences. An SVM classifier allows these errors to be categorized according to defined types. Then, a post-processing step is performed using a language model based on Part-of-Speech (POS) tags, which is combined with the previously used n-gram model. Thus, error hypotheses can be further recognized and POS tags can be assigned to the OOV words. Experiments on on-line handwritten sentences show that the proposed approach allows a significant reduction of the word error rate.
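The combination of the n-gram model with the POS-tag model could be sketched as a log-linear interpolation; the interpolation weight and all probability values below are assumptions for illustration, not the paper's actual models:

```python
import math

# Illustrative sketch (assumed weights and scores, not the paper's models):
# rescoring error-hypothesis candidates with a log-linear combination of a
# word n-gram probability and a POS-tag language-model probability.
def combined_score(ngram_prob, pos_prob, lam=0.6):
    return lam * math.log(ngram_prob) + (1 - lam) * math.log(pos_prob)

def best_candidate(candidates):
    """candidates: list of (word, ngram_prob, pos_prob) tuples."""
    return max(candidates, key=lambda c: combined_score(c[1], c[2]))[0]

print(best_candidate([("there", 0.20, 0.70), ("their", 0.30, 0.60)]))
```

Here the POS score can still rank an OOV hypothesis whose word n-gram probability is unreliable.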
Using confusion networks for on-line handwritten sentence recognition
In this paper, we investigate the integration of a confusion-network representation of sentence hypotheses into an on-line handwritten sentence recognition system. Word posterior probabilities obtained from the confusion network are used as confidence scores to detect potential errors in the sentence produced by Maximum A Posteriori decoding on a word graph. Dedicated classifiers (here, SVMs) are then trained to correct these errors by combining the word posterior probabilities with other knowledge sources. A rejection phase is also introduced in the detection process. Experiments on a set of 320 handwritten sentences show a 31.3% relative reduction of the word error rate when words are extracted manually, and a 60% relative reduction when they are extracted automatically.
Spoken content retrieval: A survey of techniques and technologies
Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR, encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR.
Spoken command recognition for robotics
In this thesis, I investigate spoken command recognition technology for robotics. While high robustness is expected, the distant and noisy conditions in which the system has to operate make the task very challenging. Unlike commercial systems, which all rely on a "wake-up" word to initiate the interaction, the pipeline proposed here directly detects and recognizes commands from the continuous audio stream. To keep the task manageable despite low-resource conditions, I propose to focus on a limited set of commands, thus trading off flexibility of the system against robustness.
Domain and speaker adaptation strategies based on a multi-task regularization paradigm are explored first. More precisely, two methods are proposed, both relying on a tied loss function that penalizes the distance between the outputs of several networks. The first method considers each speaker or domain as a task. A canonical task-independent network is jointly trained with task-dependent models, allowing both types of networks to improve by learning from one another. While a 3.2% improvement in the frame error rate (FER) of the task-independent network is obtained, this only partially carries over to the phone error rate (PER), with a 1.5% improvement. Similarly, a second method explores the parallel training of the canonical network with a privileged model that has access to i-vectors. This method proved less effective, with only a 1.2% improvement in FER.
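The tied-loss idea above can be sketched as follows; the quadratic distance penalty and the weight mu are assumptions for illustration, not the exact loss used in the thesis:

```python
# Minimal sketch of a tied loss (assumed form): the total loss adds a penalty
# on the distance between the canonical network's output and each
# task-dependent network's output, encouraging both to stay close so that
# they improve by learning from one another.
def tied_loss(task_losses, canonical_out, task_outs, mu=0.1):
    """task_losses: per-task training losses.
    canonical_out: output vector of the task-independent network.
    task_outs: output vectors of the task-dependent networks."""
    distance = sum(
        sum((c - t) ** 2 for c, t in zip(canonical_out, out))
        for out in task_outs
    )
    return sum(task_losses) + mu * distance
```

In an actual training loop, this scalar would be backpropagated through all networks jointly.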
To make the developed technology more accessible, I also investigated the use of a sequence-to-sequence (S2S) architecture for command classification. An attention-based encoder-decoder model reduced the classification error by 40% relative to a strong convolutional neural network (CNN)-hidden Markov model (HMM) baseline, showing the relevance of S2S architectures in this context. To improve the flexibility of the trained system, I also explored few-shot learning strategies, which allow the set of commands to be extended with minimal data requirements. Retraining a model on the combination of original and new commands, I achieved 40.5% accuracy on the new commands with only 10 examples each; this score rises to 81.5% with a larger set of 100 examples per new command. An alternative strategy based on model adaptation achieved even better scores, 68.8% and 88.4% accuracy with 10 and 100 examples respectively, while being faster to train. This high performance comes at the expense of the original categories, however, on which accuracy deteriorated. These results are nevertheless very promising, as the methods make it possible to extend an existing S2S model with minimal resources.
Finally, a full spoken command recognition system (named iCubrec) has been developed for the iCub platform. The pipeline relies on a voice activity detection (VAD) system to offer a fully hands-free experience. By forwarding only regions that are likely to contain commands, the VAD module also greatly reduces the computational cost of the pipeline. Command candidates are then passed to the deep neural network (DNN)-HMM command recognition system for transcription. The VoCub dataset has been specifically gathered to train a DNN-based acoustic model for our task. Through multi-condition training with the CHiME4 dataset, an accuracy of 94.5% is reached on the VoCub test set. A filler model, complemented by a rejection mechanism based on a confidence score, is finally added to the system to reject non-command speech in a live demonstration of the system.
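As a rough illustration of the final rejection mechanism (the log-likelihood-ratio form, the margin value, and the example scores are assumptions, not the system's actual scoring):

```python
# Illustrative sketch: a filler model competes with the command models, and a
# hypothesis is accepted only when the command score beats the filler score
# by a confidence margin; otherwise the segment is rejected as non-command
# speech. The margin and the scores below are made-up values.
def accept_command(command_logprob, filler_logprob, margin=2.0):
    return (command_logprob - filler_logprob) >= margin

print(accept_command(-50.0, -60.0))  # clear command: True
print(accept_command(-55.0, -54.0))  # filler wins: False
```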
The effects of child language development on the performance of automatic speech recognition
In comparison to adults' speech, children's ASR appears to be more challenging and yields inferior results. It has been suggested that to address this issue, linguistic understanding of children's speech development needs to be employed to provide either a solution or an explanation. The present work aims to explore the influence of phonological effects associated with language acquisition (PEALA) in children's ASR and to investigate whether they can be detected in systematic patterns of ASR phone confusion errors or evidenced in systematic patterns of acoustic feature structure. Findings from speech development research are used as the framework upon which a set of predictable error patterns is defined and guides the analysis of the experimental results reported. Several ASR experiments are conducted involving both children's and adults' speech. ASR phone confusion matrices are extracted and analysed according to a statistical significance test proposed for the purposes of this work. A mathematical model is introduced to interpret the emerging results. Additionally, bottleneck features and i-vectors, representing the acoustic features in one of the systems developed, are extracted and visualised using linear discriminant analysis (LDA). A qualitative analysis is conducted with reference to patterns that can be predicted through PEALA.
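A small sketch of the confusion-matrix extraction step (assuming reference/hypothesis phone alignments are already computed; the phone pairs below are invented examples, not data from this work):

```python
from collections import Counter

# Sketch: build a phone confusion matrix from aligned (reference, hypothesis)
# phone pairs, the structure one would scan for systematic error patterns.
def confusion_matrix(aligned_pairs):
    counts = Counter(aligned_pairs)
    phones = sorted({p for pair in counts for p in pair})
    return {r: {h: counts.get((r, h), 0) for h in phones} for r in phones}

# Invented alignment: /r/ -> /w/ twice, a pattern typical of gliding.
pairs = [("t", "t"), ("t", "d"), ("s", "s"), ("r", "w"), ("r", "w")]
cm = confusion_matrix(pairs)
print(cm["r"]["w"])  # 2
```

Each row of `cm` gives, for one reference phone, how often it was recognized as each hypothesis phone.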
Bayesian Multi-Model Frameworks - Properly Addressing Conceptual Uncertainty in Applied Modelling
We use models to understand or predict a system. Often, there are multiple plausible but competing model concepts, so modelling is associated with conceptual uncertainty, i.e., the question of how to properly handle such model alternatives. For mathematical models, it is possible to quantify their plausibility based on data and to rate them accordingly. Bayesian probability calculus offers several formal multi-model frameworks to rate the models in a finite set and to quantify their conceptual uncertainty as model weights. These frameworks are Bayesian model selection and averaging (BMS/BMA), Pseudo-BMS/BMA, and Bayesian Stacking.
The goal of this dissertation is to facilitate proper use of these Bayesian multi-model frameworks. They follow different principles in model rating, which is why the derived model weights have to be interpreted differently, too. These principles always concern the model setting, i.e., how the models in the set relate to one another and to the true model of the system that generated the observed data. This relation is formalized in model scores that are used for model weighting within each framework. The scores represent framework-specific compromises between the ability of a model to fit the data and the model complexity required to do so.
Hence, the scores are first investigated systematically with regard to their respective treatment of model complexity and are organized in a newly developed classification scheme. This shows that BMS/BMA always aims to identify the true model in the set, that Pseudo-BMS/BMA searches for the model with the largest predictive power even though none of the models is the true one, and that, under that condition, Bayesian Stacking seeks reliability in prediction by combining the predictive distributions of multiple models.
An application example with numerical models illustrates these behaviours and demonstrates which misinterpretations of model weights threaten if a framework is applied to a model setting it does not suit. Regarding applied modelling, first, a new setting is proposed that allows a ``quasi-true'' model in a set to be identified. Second, Bayesian Bootstrapping is employed to take into account that the rating of predictive capability is based on only limited data.
To ensure that the Bayesian multi-model frameworks are employed properly and in a goal-oriented manner, a guideline is set up. Given a clearly defined modelling goal and the allocation of the available models to the respective setting, it leads to the suitable multi-model framework. Besides the three investigated frameworks, this guideline contains an additional one that allows a (quasi-)true model to be identified if it is composed of a linear combination of the model alternatives in the set.
The gained insights enable a broad range of users in science and practice to properly employ Bayesian multi-model frameworks in order to quantify and handle conceptual uncertainty. In this way, maximum reliability in system understanding and prediction with multiple models can be achieved. Further, the insights pave the way for systematic model development and improvement.
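The BMS/BMA model weights follow from Bayes' rule over a finite model set. As a small illustration of the standard normalization (the formula is standard; the log marginal likelihoods below are made-up values, not results from the dissertation):

```python
import math

# Standard Bayesian model-averaging weights: w_k is proportional to the
# model evidence p(D|M_k) times the model prior p(M_k). The evidence values
# passed in below are illustrative assumptions.
def bma_weights(log_evidences, priors=None):
    n = len(log_evidences)
    priors = priors or [1.0 / n] * n  # uniform model prior by default
    unnorm = [math.exp(le) * p for le, p in zip(log_evidences, priors)]
    z = sum(unnorm)
    return [u / z for u in unnorm]

w = bma_weights([-10.0, -12.0, -15.0])
print([round(x, 3) for x in w])
```

The weights sum to one and concentrate on the model with the largest evidence, which is why, under BMS/BMA, they are read as posterior probabilities of each model being the true one in the set.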
Ticker: An Adaptive Single-Switch Text Entry Method for Visually Impaired Users.
Ticker is a probabilistic stereophonic single-switch text entry method for visually impaired users with motor disabilities who rely on single-switch scanning systems to communicate. Such scanning systems are sensitive to a variety of noise sources, which are inevitably introduced in practical use. Ticker uses a novel interaction model based on stereophonic sound, coupled with statistical models, for robust inference of the user's intended text in the presence of noise. As a consequence of its design, Ticker is resilient to noise and is therefore a practical solution for single-switch scanning systems. Ticker's performance is validated using a combination of simulations and empirical user studies. The work was funded by the Aegis EU project and the Gatsby Charitable Foundation.
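In the spirit of Ticker's statistical inference (the exact models are not described here; all probability values below are assumptions for illustration), a Bayesian update over candidate letters given a noisy switch-click observation might look like:

```python
# Illustrative sketch of noise-robust intent inference: maintain a posterior
# over candidate letters and update it with each noisy click observation via
# Bayes' rule, so occasional false or missed clicks do not derail the
# inferred text. All probabilities are made-up values.
def update_posterior(prior, likelihoods):
    unnorm = {c: prior[c] * likelihoods[c] for c in prior}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}

prior = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}
# A click observed while "b" was announced supports "b" strongly but keeps
# some mass on the alternatives to absorb noisy switch presses.
post = update_posterior(prior, {"a": 0.1, "b": 0.8, "c": 0.1})
print(max(post, key=post.get))  # b
```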