1,567 research outputs found

    Spoken content retrieval: A survey of techniques and technologies

    Get PDF
    Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR

    Proceedings of the ACM SIGIR Workshop ''Searching Spontaneous Conversational Speech''

    Get PDF

    Spoken term detection ALBAYZIN 2014 evaluation: overview, systems, results, and discussion

    Get PDF
    The electronic version of this article is the complete one and can be found online at: http://dx.doi.org/10.1186/s13636-015-0063-8Spoken term detection (STD) aims at retrieving data from a speech repository given a textual representation of the search term. Nowadays, it is receiving much interest due to the large volume of multimedia information. STD differs from automatic speech recognition (ASR) in that ASR is interested in all the terms/words that appear in the speech data, whereas STD focuses on a selected list of search terms that must be detected within the speech data. This paper presents the systems submitted to the STD ALBAYZIN 2014 evaluation, held as a part of the ALBAYZIN 2014 evaluation campaign within the context of the IberSPEECH 2014 conference. This is the first STD evaluation that deals with Spanish language. The evaluation consists of retrieving the speech files that contain the search terms, indicating their start and end times within the appropriate speech file, along with a score value that reflects the confidence given to the detection of the search term. The evaluation is conducted on a Spanish spontaneous speech database, which comprises a set of talks from workshops and amounts to about 7 h of speech. We present the database, the evaluation metrics, the systems submitted to the evaluation, the results, and a detailed discussion. Four different research groups took part in the evaluation. Evaluation results show reasonable performance for moderate out-of-vocabulary term rate. This paper compares the systems submitted to the evaluation and makes a deep analysis based on some search term properties (term length, in-vocabulary/out-of-vocabulary terms, single-word/multi-word terms, and in-language/foreign terms).This work has been partly supported by project CMC-V2 (TEC2012-37585-C02-01) from the Spanish Ministry of Economy and Competitiveness. This research was also funded by the European Regional Development Fund, the Galician Regional Government (GRC2014/024, “Consolidation of Research Units: AtlantTIC Project” CN2012/160)

    Multimedia Retrieval

    Get PDF

    Using Zero-Resource Spoken Term Discovery for Ranked Retrieval

    Full text link
    Research on ranked retrieval of spoken con-tent has assumed the existence of some auto-mated (word or phonetic) transcription. Re-cently, however, methods have been demon-strated for matching spoken terms to spoken content without the need for language-tuned transcription. This paper describes the first application of such techniques to ranked re-trieval, evaluated using a newly created test collection. Both the queries and the collection to be searched are based on Gujarati produced naturally by native speakers; relevance assess-ment was performed by other native speak-ers of Gujarati. Ranked retrieval is based on fast acoustic matching that identifies a deeply nested set of matching speech regions, cou-pled with ways of combining evidence from those matching regions. Results indicate that the resulting ranked lists may be useful for some practical similarity-based ranking tasks.

    Adaptation and Augmentation: Towards Better Rescoring Strategies for Automatic Speech Recognition and Spoken Term Detection

    Full text link
    Selecting the best prediction from a set of candidates is an essential problem for many spoken language processing tasks, including automatic speech recognition (ASR) and spoken keyword spotting (KWS). Generally, the selection is determined by a confidence score assigned to each candidate. Calibrating these confidence scores (i.e., rescoring them) could make better selections and improve the system performance. This dissertation focuses on using tailored language models to rescore ASR hypotheses as well as keyword search results for ASR-based KWS. This dissertation introduces three kinds of rescoring techniques: (1) Freezing most model parameters while fine-tuning the output layer in order to adapt neural network language models (NNLMs) from the written domain to the spoken domain. Experiments on a large-scale Italian corpus show a 30.2% relative reduction in perplexity at the word-cluster level and a 2.3% relative reduction in WER in a state-of-the-art Italian ASR system. (2) Incorporating source application information associated with speech queries. By exploring a range of adaptation model architectures, we achieve a 21.3% relative reduction in perplexity compared to a fine-tuned baseline. Initial experiments using a state-of-the-art Italian ASR system show a 3.0% relative reduction in WER on top of an unadapted 5-gram LM. In addition, human evaluations show significant improvements by using the source application information. (3) Marrying machine learning algorithms (classification and ranking) with a variety of signals to rescore keyword search results in the context of KWS for low-resource languages. These systems, built for the IARPA BABEL Program, enhance search performance in terms of maximum term-weighted value (MTWV) across six different low-resource languages: Vietnamese, Tagalog, Pashto, Turkish, Zulu and Tamil

    Searching Spontaneous Conversational Speech:Proceedings of ACM SIGIR Workshop (SSCS2008)

    Get PDF

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Get PDF
    Based on the information provided by European projects and national initiatives related to multimedia search as well as domains experts that participated in the CHORUS Think-thanks and workshops, this document reports on the state of the art related to multimedia content search from, a technical, and socio-economic perspective. The technical perspective includes an up to date view on content based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark inititiatives to measure the performance of multimedia search engines. From a socio-economic perspective we inventorize the impact and legal consequences of these technical advances and point out future directions of research
    • 

    corecore