6,647 research outputs found

    Holistic Vocabulary Independent Spoken Term Detection

    Within this thesis, we aim at designing a loosely coupled holistic system for Spoken Term Detection (STD) on heterogeneous German broadcast data in selected application scenarios. Starting from STD on the 1-best output of a word-based speech recognizer, we study the performance of several subword units for vocabulary-independent STD on a linguistically and acoustically challenging German corpus. We explore the typical error sources in subword STD and find that they differ from the error sources in word-based speech search. We select, extend and combine a set of state-of-the-art methods for error compensation in STD in order to explicitly merge the corresponding STD error spaces through anchor-based approximate lattice retrieval. Novel methods for STD result verification are proposed in order to increase retrieval precision by exploiting external knowledge at search time. Error-compensating methods for STD typically suffer from high response times on large-scale databases, and we propose scalable approaches suitable for large corpora. The highest STD accuracy is obtained by combining anchor-based approximate retrieval from both syllable lattice ASR and syllabified word ASR into a hybrid STD system, and pruning the result list using external knowledge with hybrid contextual and anti-query verification.

    This thesis describes a loosely coupled, holistic system for spoken term detection on heterogeneous German speech data in different application scenarios. Starting from word-based search on the transcript of a state-of-the-art word recognizer, it first investigates different subword units for vocabulary-independent spoken term detection on German data. On this basis, the typical error sources in subword-based search are analyzed. These error sources differ from those of classical search in the word transcript and must be addressed explicitly. The different error sources are compensated explicitly by a novel hybrid approach to efficient anchor-based approximate lattice retrieval. In addition, novel methods for verifying search results are presented that incorporate external knowledge available at search time. All methods are evaluated on an extensive set of German television data, with a focus on selected, representative application scenarios. Because error-compensation methods in spoken term detection research typically lead to high response times when searching large archives, scenarios with very large amounts of data receive particular attention. The highest search performance for medium-sized archives is achieved by approximate, anchor-based search on a hybrid index of syllable lattices and syllabified word recognition, with the search results pruned by hybrid verification.

    Energy-based Self-attentive Learning of Abstractive Communities for Spoken Language Understanding

    Abstractive community detection is an important spoken language understanding task, whose goal is to group utterances in a conversation according to whether they can be jointly summarized by a common abstractive sentence. This paper provides a novel approach to this task. We first introduce a neural contextual utterance encoder featuring three types of self-attention mechanisms. We then train it using the siamese and triplet energy-based meta-architectures. Experiments on the AMI corpus show that our system outperforms multiple state-of-the-art energy-based and non-energy-based baselines. Code and data are publicly available.

    Acquiring and Maintaining Knowledge by Natural Multimodal Dialog

    System-independent ASR error detection and classification using Recurrent Neural Network

    This paper addresses errors in continuous Automatic Speech Recognition (ASR) in two stages: error detection and error type classification. Unlike the majority of research in this field, we propose to handle recognition errors independently of the ASR decoder. We first establish an effective set of generic features derived exclusively from the recognizer output to compensate for the absence of ASR decoder information. We then apply variant Recurrent Neural Network (V-RNN) based models to error detection and error type classification. Such models use label dependency to learn additional information for classifying the recognized words. Experiments on the Multi-Genre Broadcast Media corpus show that the proposed generic feature setup achieves competitive performance compared to state-of-the-art systems in both tasks. Furthermore, a V-RNN trained on the proposed feature set appears to be an effective classifier for ASR error detection, with an accuracy of 85.43%.

    A Review of Accent-Based Automatic Speech Recognition Models for E-Learning Environment

    The adoption of electronic learning (e-learning) as a method of disseminating knowledge in the global educational system is growing at a rapid rate, and has shifted knowledge acquisition from conventional classrooms and tutors to distributed e-learning techniques that give more convenient and flexible access to various learning resources. However, notwithstanding the adaptive advantages of learner-centric e-learning content, the distributed e-learning environment has unconsciously adopted a few international languages as the languages of communication among participants, despite the various accents (mother-language influence) among these participants. The need to adjust to and accommodate these accents has led to the introduction of accent-based automatic speech recognition (ASR) into e-learning to resolve the effects of accent differences. This paper reviews over 50 research papers to determine the progress made between 2001 and 2021 in the design and implementation of accent-based automatic speech recognition models for e-learning. The analysis shows that 50% of the models reviewed adopted the English language, 46.50% adopted the major Chinese and Indian languages, and 3.50% adopted the Swedish language as the mode of communication. The review therefore finds that the majority of ASR models are centred on European, American and Asian accents, while unconsciously excluding the accent peculiarities associated with the less technologically resourced continents.

    Proceedings of the ACM SIGIR Workshop "Searching Spontaneous Conversational Speech"
