    DCU at the NTCIR-12 SpokenQuery&Doc-2 task

    We describe DCU's participation in the NTCIR-12 SpokenQuery&Doc (SQD-2) task. In the context of the slide-group retrieval sub-task, we experiment with a passage retrieval method that re-scores each passage according to the relevance score of the document from which the passage is taken. This is done by linearly interpolating the passage and document relevance scores, which are calculated independently using the Okapi BM25 model of probabilistic retrieval. In conjunction with this, we assess the benefits of using pseudo-relevance feedback to expand the textual representation of the spoken queries with terms found in the top-ranked documents and passages, and experiment with a general multidimensional optimisation method to jointly tune the BM25 and query expansion parameters with queries and relevance data from the NTCIR-11 SQD-1 task. Retrieval experiments performed over the SQD-1 and SQD-2 queries confirm previous findings that integrating document information when ranking passages can lead to improved passage retrieval effectiveness. Furthermore, the results indicate that no significant gains in retrieval effectiveness can be obtained by using query expansion in combination with our retrieval models over these two query sets.
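The re-scoring idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `bm25_scores` and `rescore_passages`, the interpolation weight `lam`, the default `k1` and `b` values, and the non-negative IDF variant are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, units, k1=1.2, b=0.75):
    """Score each tokenised unit (a document or a passage) against the query.

    Uses Okapi BM25 with a non-negative IDF variant (an assumption here).
    """
    n = len(units)
    avg_len = sum(len(u) for u in units) / n
    # Document frequency: number of units containing each term.
    df = Counter(term for u in units for term in set(u))
    scores = []
    for u in units:
        tf = Counter(u)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(u) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def rescore_passages(query, docs, lam=0.5):
    """Rank passages by lam * passage_score + (1 - lam) * document_score.

    docs is a list of documents; each document is a list of passages;
    each passage is a list of tokens. Returns (doc_index, passage_index,
    score) tuples sorted by descending score.
    """
    # Documents and passages are scored independently, as in the abstract.
    doc_tokens = [[t for p in d for t in p] for d in docs]
    doc_scores = bm25_scores(query, doc_tokens)
    flat = [(i, j, p) for i, d in enumerate(docs) for j, p in enumerate(d)]
    passage_scores = bm25_scores(query, [p for _, _, p in flat])
    ranked = [
        (i, j, lam * ps + (1 - lam) * doc_scores[i])
        for (i, j, _), ps in zip(flat, passage_scores)
    ]
    return sorted(ranked, key=lambda r: r[2], reverse=True)
```

With `lam=1.0` the ranking uses passage scores only, and with `lam=0.0` every passage inherits its document's score; the abstract describes tuning such parameters on SQD-1 data rather than fixing them by hand.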

    Spoken query processing for interactive information retrieval

    It has long been recognised that interactivity improves the effectiveness of information retrieval systems. Speech is the most natural and interactive medium of communication, and recent progress in speech recognition is making it possible to build systems that interact with the user via speech. However, given the typical length of queries submitted to information retrieval systems, it is easy to imagine that word recognition errors in spoken queries would severely damage the system's effectiveness. The experimental work reported in this paper shows that the use of classical information retrieval techniques for spoken query processing is robust to considerably high levels of word recognition errors, in particular for long queries. Moreover, in the case of short queries, both standard relevance feedback and pseudo-relevance feedback can be effectively employed to improve the effectiveness of spoken query processing.

    Vocal Access to a Newspaper Archive: Design Issues and Preliminary Investigation

    This paper presents the design and the current prototype implementation of an interactive vocal Information Retrieval system that can be used to access articles of a large newspaper archive using a telephone. The results of a preliminary investigation into the feasibility of such a system are also presented.

    Language Modeling for Multi-Domain Speech-Driven Text Retrieval

    We report experimental results associated with speech-driven text retrieval, which facilitates retrieving information in multiple domains with spoken queries. Since users speak content related to a target collection, we build the language models used for speech recognition from the target collection, so as to improve both recognition and retrieval accuracy. Experiments using existing test collections combined with dictated queries showed the effectiveness of our method.

    Users' perception of relevance of spoken documents

    We present the results of a study of users' perception of relevance of documents. The aim is to study experimentally how users' perception varies depending on the form in which retrieved documents are presented. Documents retrieved in response to a query are presented to users in a variety of ways, from full text to a machine-spoken, query-biased, automatically generated summary, and the difference in users' perception of relevance is studied. The experimental results suggest that the effectiveness of advanced multimedia information retrieval applications may be affected by the low level of users' perception of relevance of retrieved documents.

    Search of spoken documents retrieves well recognized transcripts

    This paper presents a series of analyses and experiments on spoken document retrieval systems: search engines that retrieve transcripts produced by speech recognizers. Results show that transcripts that match queries well tend to be recognized more accurately than transcripts that match a query less well. This result was described in past literature; however, no study or explanation of the effect has been provided until now. This paper provides such an analysis, showing a relationship between word error rate and query length. The paper expands on past research by increasing the number of recognition systems that are tested, as well as showing the effect in an operational speech retrieval system. Potential future lines of enquiry are also described.
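For readers unfamiliar with the metric mentioned in this abstract, word error rate is conventionally the word-level edit distance between a reference transcript and the recogniser output, divided by the reference length. The sketch below shows that standard definition only; the function name `word_error_rate` and the whitespace tokenisation are assumptions, and nothing here reproduces the paper's analysis.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words.

    Assumes whitespace tokenisation and a non-empty reference.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(
                prev[j] + 1,             # deletion of a reference word
                cur[j - 1] + 1,          # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = cur
    return prev[-1] / len(ref)
```

Because the denominator is the reference length, the value can exceed 1.0 when the hypothesis contains many inserted words.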

    Spoken content retrieval: A survey of techniques and technologies

    Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR, encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR.

    Subword-based Indexing for a Minimal False Positive Rate
