4,454 research outputs found

    Topic based language models for ad hoc information retrieval

    Get PDF
    We propose a topic based approach lo language modelling for ad-hoc Information Retrieval (IR). Many smoothed estimators used for the multinomial query model in IR rely upon the estimated background collection probabilities. In this paper, we propose a topic based language modelling approach, that uses a more informative prior based on the topical content of a document. In our experiments, the proposed model provides comparable IR performance to the standard models, but when combined in a two stage language model, it outperforms all other estimated models

    KERT: Automatic Extraction and Ranking of Topical Keyphrases from Content-Representative Document Titles

    Full text link
    We introduce KERT (Keyphrase Extraction and Ranking by Topic), a framework for topical keyphrase generation and ranking. By shifting from the unigram-centric traditional methods of unsupervised keyphrase extraction to a phrase-centric approach, we are able to directly compare and rank phrases of different lengths. We construct a topical keyphrase ranking function which implements the four criteria that represent high quality topical keyphrases (coverage, purity, phraseness, and completeness). The effectiveness of our approach is demonstrated on two collections of content-representative titles in the domains of Computer Science and Physics.Comment: 9 page

    Spoken content retrieval: A survey of techniques and technologies

    Get PDF
    Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR
    corecore