Search CORE

136 research outputs found

Fast and Accurate OOV Decoder on High-Level Features

Author: Khokhlov Yuri
Medennikov Ivan
Romanenko Alexei
Tomashenko Natalia
Publication venue
Publication date: 19/07/2017
Field of study

This work proposes a novel approach to out-of-vocabulary (OOV) keyword search (KWS) task. The proposed approach is based on using high-level features from an automatic speech recognition (ASR) system, so called phoneme posterior based (PPB) features, for decoding. These features are obtained by calculating time-dependent phoneme posterior probabilities from word lattices, followed by their smoothing. For the PPB features we developed a special novel very fast, simple and efficient OOV decoder. Experimental results are presented on the Georgian language from the IARPA Babel Program, which was the test language in the OpenKWS 2016 evaluation campaign. The results show that in terms of maximum term weighted value (MTWV) metric and computational speed, for single ASR systems, the proposed approach significantly outperforms the state-of-the-art approach based on using in-vocabulary proxies for OOV keywords in the indexed database. The comparison of the two OOV KWS approaches on the fusion results of the nine different ASR systems demonstrates that the proposed OOV decoder outperforms the proxy-based approach in terms of MTWV metric given the comparable processing speed. Other important advantages of the OOV decoder include extremely low memory consumption and simplicity of its implementation and parameter optimization.Comment: Interspeech 2017, August 2017, Stockholm, Sweden. 201

arXiv.org e-Print Archive

Crossref

Spoken content retrieval: A survey of techniques and technologies

Author: Ani Nenkova
C A. Nenkova
K. Mckeown
Kathleen Mckeown
Publication venue: 'Now Publishers'
Publication date: 01/01/2012
Field of study

Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR

CiteSeerX

Crossref

Irish Universities

DCU Online Research Access Service

Searching Spontaneous Conversational Speech:Proceedings of ACM SIGIR Workshop (SSCS2008)

Author: Kraaij W.
Larson M.
Publication venue: Centre for Telematics and Information Technology (CTIT)
Publication date: 24/07/2008
Field of study

University of Twente Research Information

Holistic Vocabulary Independent Spoken Term Detection

Author: Schneider Daniel
Publication venue: Universitäts- und Landesbibliothek Bonn
Publication date
Field of study

Within this thesis, we aim at designing a loosely coupled holistic system for Spoken Term Detection (STD) on heterogeneous German broadcast data in selected application scenarios. Starting from STD on the 1-best output of a word-based speech recognizer, we study the performance of several subword units for vocabulary independent STD on a linguistically and acoustically challenging German corpus. We explore the typical error sources in subword STD, and find that they differ from the error sources in word-based speech search. We select, extend and combine a set of state-of-the-art methods for error compensation in STD in order to explicitly merge the corresponding STD error spaces through anchor-based approximate lattice retrieval. Novel methods for STD result verification are proposed in order to increase retrieval precision by exploiting external knowledge at search time. Error-compensating methods for STD typically suffer from high response times on large scale databases, and we propose scalable approaches suitable for large corpora. Highest STD accuracy is obtained by combining anchor-based approximate retrieval from both syllable lattice ASR and syllabified word ASR into a hybrid STD system, and pruning the result list using external knowledge with hybrid contextual and anti-query verification.Die vorliegende Arbeit beschreibt ein lose gekoppeltes, ganzheitliches System zur Sprachsuche auf heterogenenen deutschen Sprachdaten in unterschiedlichen Anwendungsszenarien. Ausgehend von einer wortbasierten Sprachsuche auf dem Transkript eines aktuellen Wort-Erkenners werden zunächst unterschiedliche Subwort-Einheiten für die vokabularunabhängige Sprachsuche auf deutschen Daten untersucht. Auf dieser Basis werden die typischen Fehlerquellen in der Subwort-basierten Sprachsuche analysiert. Diese Fehlerquellen unterscheiden sich vom Fall der klassichen Suche im Worttranskript und müssen explizit adressiert werden. Die explizite Kompensation der unterschiedlichen Fehlerquellen erfolgt durch einen neuartigen hybriden Ansatz zur effizienten Ankerbasierten unscharfen Wortgraph-Suche. Darüber hinaus werden neuartige Methoden zur Verifikation von Suchergebnissen vorgestellt, die zur Suchzeit verfügbares externes Wissen einbeziehen. Alle vorgestellten Verfahren werden auf einem umfangreichen Satz von deutschen Fernsehdaten mit Fokus auf ausgewählte, repräsentative Einsatzszenarien evaluiert. Da Methoden zur Fehlerkompensation in der Sprachsuchforschung typischerweise zu hohen Laufzeiten bei der Suche in großen Archiven führen, werden insbesondere auch Szenarien mit sehr großen Datenmengen betrachtet. Die höchste Suchleistung für Archive mittlerer Größe wird durch eine unscharfe und Anker-basierte Suche auf einem hybriden Index aus Silben-Wortgraphen und silbifizierter Wort-Erkennung erreicht, bei der die Suchergebnisse mit hybrider Verifikation bereinigt werden

bonndoc – Der Publikationsserver der Universität Bonn

End-to-End Open Vocabulary Keyword Search With Multilingual Neural Representations

Author: Cernocky Jan
Saraclar Murat
Yusuf Bolaji
Publication venue
Publication date: 15/08/2023
Field of study

Conventional keyword search systems operate on automatic speech recognition (ASR) outputs, which causes them to have a complex indexing and search pipeline. This has led to interest in ASR-free approaches to simplify the search procedure. We recently proposed a neural ASR-free keyword search model which achieves competitive performance while maintaining an efficient and simplified pipeline, where queries and documents are encoded with a pair of recurrent neural network encoders and the encodings are combined with a dot-product. In this article, we extend this work with multilingual pretraining and detailed analysis of the model. Our experiments show that the proposed multilingual training significantly improves the model performance and that despite not matching a strong ASR-based conventional keyword search system for short queries and queries comprising in-vocabulary words, the proposed model outperforms the ASR-based system for long queries and queries that do not appear in the training data.Comment: Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing (TASLP), 202

arXiv.org e-Print Archive

Multilayer Network of Language: a Unified Framework for Structural Analysis of Linguistic Subsystems

Author: Margan Domagoj
Martinčić-Ipšić Sanda
Meštrović Ana
Publication venue
Publication date: 30/07/2015
Field of study

Recently, the focus of complex networks research has shifted from the analysis of isolated properties of a system toward a more realistic modeling of multiple phenomena - multilayer networks. Motivated by the prosperity of multilayer approach in social, transport or trade systems, we propose the introduction of multilayer networks for language. The multilayer network of language is a unified framework for modeling linguistic subsystems and their structural properties enabling the exploration of their mutual interactions. Various aspects of natural language systems can be represented as complex networks, whose vertices depict linguistic units, while links model their relations. The multilayer network of language is defined by three aspects: the network construction principle, the linguistic subsystem and the language of interest. More precisely, we construct a word-level (syntax, co-occurrence and its shuffled counterpart) and a subword level (syllables and graphemes) network layers, from five variations of original text (in the modeled language). The obtained results suggest that there are substantial differences between the networks structures of different language subsystems, which are hidden during the exploration of an isolated layer. The word-level layers share structural properties regardless of the language (e.g. Croatian or English), while the syllabic subword level expresses more language dependent structural properties. The preserved weighted overlap quantifies the similarity of word-level layers in weighted and directed networks. Moreover, the analysis of motifs reveals a close topological structure of the syntactic and syllabic layers for both languages. The findings corroborate that the multilayer network framework is a powerful, consistent and systematic approach to model several linguistic subsystems simultaneously and hence to provide a more unified view on language

arXiv.org e-Print Archive

Elsevier - Publisher Connector

Spoken term detection ALBAYZIN 2014 evaluation: overview, systems, results, and discussion

Author: A Abad
A Cardenal-Lopez
A Cardenal-López
A Jansen
A Jansen
A Martin
A Moreno
A Moreno
A Moreno-Sandoval
A Stolcke
Alejandro Coucheiro-Limeres
AM Azmi
Antonio Cardenal
Antonio Miguel
B Logan
B Logan
B Ma
B Taras
B Zhang
C Ni
C Parada
Carmen Garcia-Mateo
CJ Chen
D Can
D Karakos
D Povey
D Vergyri
D Vergyri
Doroteo T. Toledano
F Metze
F Metze
GJF Jones
H Joho
H Joho
H Su
H-Y Lee
H-Y Lee
HVD Heuvel
I Szöke
I Szöke
I-F Chen
I-F Chen
J Chiu
J Chiu
J Chiu
J Garofolo
J Li
J Mamou
J Mamou
J Pinto
J Tejedor
J Tejedor
J Trmal
J van Hout
Javier Tejedor
JG Fiscus
Julia Olcoz
Julian David Echeverry-Correa
K Iwata
K Thambiratmann
KM Knill
KM Knill
L Docío-Fernández
L Mangu
Laura Docio-Fernandez
LJ Rodríguez-Fuentes
M Bisani
M Cai
M Ma
M Saraclar
M Wollmer
M Zelenák
MJF Gales
MS Seigel
N Rajput
NF Chen
NF Chen
P Yu
Paula Lopez-Otero
R Justo
S Nakagawa
SP Rath
T Ng
T Ohno
T Sakai
V Mitra
V-B Le
X Anguera
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015
Field of study

The electronic version of this article is the complete one and can be found online at: http://dx.doi.org/10.1186/s13636-015-0063-8Spoken term detection (STD) aims at retrieving data from a speech repository given a textual representation of the search term. Nowadays, it is receiving much interest due to the large volume of multimedia information. STD differs from automatic speech recognition (ASR) in that ASR is interested in all the terms/words that appear in the speech data, whereas STD focuses on a selected list of search terms that must be detected within the speech data. This paper presents the systems submitted to the STD ALBAYZIN 2014 evaluation, held as a part of the ALBAYZIN 2014 evaluation campaign within the context of the IberSPEECH 2014 conference. This is the first STD evaluation that deals with Spanish language. The evaluation consists of retrieving the speech files that contain the search terms, indicating their start and end times within the appropriate speech file, along with a score value that reflects the confidence given to the detection of the search term. The evaluation is conducted on a Spanish spontaneous speech database, which comprises a set of talks from workshops and amounts to about 7 h of speech. We present the database, the evaluation metrics, the systems submitted to the evaluation, the results, and a detailed discussion. Four different research groups took part in the evaluation. Evaluation results show reasonable performance for moderate out-of-vocabulary term rate. This paper compares the systems submitted to the evaluation and makes a deep analysis based on some search term properties (term length, in-vocabulary/out-of-vocabulary terms, single-word/multi-word terms, and in-language/foreign terms).This work has been partly supported by project CMC-V2 (TEC2012-37585-C02-01) from the Spanish Ministry of Economy and Competitiveness. This research was also funded by the European Regional Development Fund, the Galician Regional Government (GRC2014/024, “Consolidation of Research Units: AtlantTIC Project” CN2012/160)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Springer - Publisher Connector

Repositorio Universidad de Zaragoza

Biblos-e Archivo

Identifying cognates in English-Dutch and French-Dutch by means of orthographic information and cross-lingual word embeddings

Author: Labat Sofie
Lefever Els
Singh Pranaydeep
Publication venue: European Language Resources Association (ELRA)
Publication date: 01/01/2020
Field of study

Ghent University Academic Bibliography