Search CORE

1,567 research outputs found

Spoken content retrieval: A survey of techniques and technologies

Author: Ani Nenkova
C A. Nenkova
K. Mckeown
Kathleen Mckeown
Publication venue: 'Now Publishers'
Publication date: 01/01/2012
Field of study

Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR

CiteSeerX

Crossref

Irish Universities

DCU Online Research Access Service

Proceedings of the ACM SIGIR Workshop ''Searching Spontaneous Conversational Speech''

Author: Raaijmakers Stephan
Publication venue: Centre for Telematics and Information Technology (CTIT)
Publication date: 27/07/2007
Field of study

University of Twente Research Information

Spoken term detection ALBAYZIN 2014 evaluation: overview, systems, results, and discussion

Author: A Abad
A Cardenal-Lopez
A Cardenal-López
A Jansen
A Jansen
A Martin
A Moreno
A Moreno
A Moreno-Sandoval
A Stolcke
Alejandro Coucheiro-Limeres
AM Azmi
Antonio Cardenal
Antonio Miguel
B Logan
B Logan
B Ma
B Taras
B Zhang
C Ni
C Parada
Carmen Garcia-Mateo
CJ Chen
D Can
D Karakos
D Povey
D Vergyri
D Vergyri
Doroteo T. Toledano
F Metze
F Metze
GJF Jones
H Joho
H Joho
H Su
H-Y Lee
H-Y Lee
HVD Heuvel
I Szöke
I Szöke
I-F Chen
I-F Chen
J Chiu
J Chiu
J Chiu
J Garofolo
J Li
J Mamou
J Mamou
J Pinto
J Tejedor
J Tejedor
J Trmal
J van Hout
Javier Tejedor
JG Fiscus
Julia Olcoz
Julian David Echeverry-Correa
K Iwata
K Thambiratmann
KM Knill
KM Knill
L Docío-Fernández
L Mangu
Laura Docio-Fernandez
LJ Rodríguez-Fuentes
M Bisani
M Cai
M Ma
M Saraclar
M Wollmer
M Zelenák
MJF Gales
MS Seigel
N Rajput
NF Chen
NF Chen
P Yu
Paula Lopez-Otero
R Justo
S Nakagawa
SP Rath
T Ng
T Ohno
T Sakai
V Mitra
V-B Le
X Anguera
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2015
Field of study

The electronic version of this article is the complete one and can be found online at: http://dx.doi.org/10.1186/s13636-015-0063-8Spoken term detection (STD) aims at retrieving data from a speech repository given a textual representation of the search term. Nowadays, it is receiving much interest due to the large volume of multimedia information. STD differs from automatic speech recognition (ASR) in that ASR is interested in all the terms/words that appear in the speech data, whereas STD focuses on a selected list of search terms that must be detected within the speech data. This paper presents the systems submitted to the STD ALBAYZIN 2014 evaluation, held as a part of the ALBAYZIN 2014 evaluation campaign within the context of the IberSPEECH 2014 conference. This is the first STD evaluation that deals with Spanish language. The evaluation consists of retrieving the speech files that contain the search terms, indicating their start and end times within the appropriate speech file, along with a score value that reflects the confidence given to the detection of the search term. The evaluation is conducted on a Spanish spontaneous speech database, which comprises a set of talks from workshops and amounts to about 7 h of speech. We present the database, the evaluation metrics, the systems submitted to the evaluation, the results, and a detailed discussion. Four different research groups took part in the evaluation. Evaluation results show reasonable performance for moderate out-of-vocabulary term rate. This paper compares the systems submitted to the evaluation and makes a deep analysis based on some search term properties (term length, in-vocabulary/out-of-vocabulary terms, single-word/multi-word terms, and in-language/foreign terms).This work has been partly supported by project CMC-V2 (TEC2012-37585-C02-01) from the Spanish Ministry of Economy and Competitiveness. This research was also funded by the European Regional Development Fund, the Galician Regional Government (GRC2014/024, “Consolidation of Research Units: AtlantTIC Project” CN2012/160)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Springer - Publisher Connector

Repositorio Universidad de Zaragoza

Biblos-e Archivo

Multimedia Retrieval

Author
Publication venue: Springer
Publication date: 01/01/2007
Field of study

University of Twente Research Information

Using Zero-Resource Spoken Term Discovery for Ranked Retrieval

Author: Aren Jansen
Douglas W. Oard
Jerome White
Jiaul Paik
Rashmi Sankepally
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2015
Field of study

Research on ranked retrieval of spoken con-tent has assumed the existence of some auto-mated (word or phonetic) transcription. Re-cently, however, methods have been demon-strated for matching spoken terms to spoken content without the need for language-tuned transcription. This paper describes the first application of such techniques to ranked re-trieval, evaluated using a newly created test collection. Both the queries and the collection to be searched are based on Gujarati produced naturally by native speakers; relevance assess-ment was performed by other native speak-ers of Gujarati. Ranked retrieval is based on fast acoustic matching that identifies a deeply nested set of matching speech regions, cou-pled with ways of combining evidence from those matching regions. Results indicate that the resulting ranked lists may be useful for some practical similarity-based ranking tasks.

CiteSeerX

Crossref

Adaptation and Augmentation: Towards Better Rescoring Strategies for Automatic Speech Recognition and Spoken Term Detection

Author: Ma Min
Publication venue: CUNY Academic Works
Publication date: 01/05/2018
Field of study

Selecting the best prediction from a set of candidates is an essential problem for many spoken language processing tasks, including automatic speech recognition (ASR) and spoken keyword spotting (KWS). Generally, the selection is determined by a confidence score assigned to each candidate. Calibrating these confidence scores (i.e., rescoring them) could make better selections and improve the system performance. This dissertation focuses on using tailored language models to rescore ASR hypotheses as well as keyword search results for ASR-based KWS. This dissertation introduces three kinds of rescoring techniques: (1) Freezing most model parameters while fine-tuning the output layer in order to adapt neural network language models (NNLMs) from the written domain to the spoken domain. Experiments on a large-scale Italian corpus show a 30.2% relative reduction in perplexity at the word-cluster level and a 2.3% relative reduction in WER in a state-of-the-art Italian ASR system. (2) Incorporating source application information associated with speech queries. By exploring a range of adaptation model architectures, we achieve a 21.3% relative reduction in perplexity compared to a fine-tuned baseline. Initial experiments using a state-of-the-art Italian ASR system show a 3.0% relative reduction in WER on top of an unadapted 5-gram LM. In addition, human evaluations show significant improvements by using the source application information. (3) Marrying machine learning algorithms (classification and ranking) with a variety of signals to rescore keyword search results in the context of KWS for low-resource languages. These systems, built for the IARPA BABEL Program, enhance search performance in terms of maximum term-weighted value (MTWV) across six different low-resource languages: Vietnamese, Tagalog, Pashto, Turkish, Zulu and Tamil

City University of New York

Searching Spontaneous Conversational Speech:Proceedings of ACM SIGIR Workshop (SSCS2008)

Author: Kraaij W.
Larson M.
Publication venue: Centre for Telematics and Information Technology (CTIT)
Publication date: 24/07/2008
Field of study

University of Twente Research Information

CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

Author: Boujemaa Nozha
Compañó Ramón
Dosch Christoph
Geurts Joost
Karlgren Jussi
King Paul
Kompatsiaris Yiannis
Köhler Joachim
Le Moine Jean-Yves
Ortgies Robert
Point Jean-Charles
Rotenberg Boris
Rudström Åsa
Sebe Nicu
Publication venue: Chorus Project Consortium
Publication date: 01/01/2007
Field of study

Based on the information provided by European projects and national initiatives related to multimedia search as well as domains experts that participated in the CHORUS Think-thanks and workshops, this document reports on the state of the art related to multimedia content search from, a technical, and socio-economic perspective. The technical perspective includes an up to date view on content based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark inititiatives to measure the performance of multimedia search engines. From a socio-economic perspective we inventorize the impact and legal consequences of these technical advances and point out future directions of research

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive