23,233 research outputs found
Embedding Web-based Statistical Translation Models in Cross-Language Information Retrieval
Although more and more language pairs are covered by machine translation
services, there are still many pairs that lack translation resources.
Cross-language information retrieval (CLIR) is an application which needs
translation functionality of a relatively low level of sophistication since
current models for information retrieval (IR) are still based on a
bag-of-words. The Web provides a vast resource for the automatic construction
of parallel corpora which can be used to train statistical translation models
automatically. The resulting translation models can be embedded in several ways
in a retrieval model. In this paper, we will investigate the problem of
automatically mining parallel texts from the Web and different ways of
integrating the translation models within the retrieval process. Our
experiments on standard test collections for CLIR show that the Web-based
translation models can surpass commercial MT systems in CLIR tasks. These
results open the perspective of constructing a fully automatic query
translation device for CLIR at a very low cost.Comment: 37 page
Glasgow University at TRECVID 2006
In the first part of this paper we describe our experiments in the automatic and interactive search tasks of TRECVID 2006. We submitted five fully automatic runs, including a text baseline, two runs based on visual features, and two runs that combine textual and visual features in a graph model. For the interactive search, we have implemented a new video search interface with relevance feedback facilities, based on both textual and visual features.
The second part is concerned with our approach to the high-level feature extraction task, based on textual information extracted from speech recogniser and machine translation outputs. They were aligned with shots and associated with high-level feature references. A list of significant words was created for each feature, and it was in turn utilised for identification of a feature during the evaluation
Sub-word indexing and blind relevance feedback for English, Bengali, Hindi, and Marathi IR
The Forum for Information Retrieval Evaluation (FIRE) provides document collections, topics, and relevance assessments for information retrieval (IR) experiments on Indian languages. Several research questions are explored in this paper: 1. how to create create a simple, languageindependent corpus-based stemmer, 2. how to identify sub-words and which types of sub-words are suitable as indexing units, and 3. how to apply blind relevance feedback on sub-words and how feedback term selection is affected by the type of the indexing unit. More than 140 IR experiments are conducted using the BM25 retrieval model on the topic titles and descriptions (TD) for the FIRE 2008 English, Bengali, Hindi, and Marathi document collections. The major findings are: The corpus-based stemming approach is effective as a knowledge-light
term conation step and useful in case of few language-specific resources. For English, the corpusbased
stemmer performs nearly as well as the Porter stemmer and significantly better than the baseline of indexing words when combined with query expansion. In combination with blind relevance feedback, it also performs significantly better than the baseline for Bengali and Marathi IR.
Sub-words such as consonant-vowel sequences and word prefixes can yield similar or better performance in comparison to word indexing. There is no best performing method for all languages. For English, indexing using the Porter stemmer performs best, for Bengali and Marathi, overlapping 3-grams obtain the best result, and for Hindi, 4-prefixes yield the highest MAP. However, in combination with blind relevance feedback using 10 documents and 20 terms, 6-prefixes for English and 4-prefixes for Bengali, Hindi, and Marathi IR yield the highest MAP. Sub-word identification is a general case of decompounding. It results in one or more index terms for a single word form and increases the number of index terms but decreases their average length. The corresponding retrieval experiments show that relevance feedback on sub-words benefits from
selecting a larger number of index terms in comparison with retrieval on word forms. Similarly, selecting the number of relevance feedback terms depending on the ratio of word vocabulary size to sub-word vocabulary size almost always slightly increases information retrieval effectiveness
compared to using a fixed number of terms for different languages
User experiments with the Eurovision cross-language image retrieval system
In this paper we present Eurovision, a text-based system for cross-language (CL) image retrieval.
The system is evaluated by multilingual users for two search tasks with the system configured in
English and five other languages. To our knowledge this is the first published set of user
experiments for CL image retrieval. We show that: (1) it is possible to create a usable multilingual
search engine using little knowledge of any language other than English, (2) categorizing images
assists the user's search, and (3) there are differences in the way users search between the proposed
search tasks. Based on the two search tasks and user feedback, we describe important aspects of
any CL image retrieval system
Spoken content retrieval: A survey of techniques and technologies
Speech media, that is, digital audio and video containing spoken content, has blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires the combination of audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units, but has subsequently shifted focus to more informal spoken content produced spontaneously, outside of the studio and in conversational settings. This survey provides an overview of the field of SCR encompassing component technologies, the relationship of SCR to text IR and automatic speech recognition and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who are seeking deeper insight on how these fields are integrated to support research and development, thus addressing the core challenges of SCR
Overview of the 2005 cross-language image retrieval track (ImageCLEF)
The purpose of this paper is to outline efforts from the 2005 CLEF crosslanguage image retrieval campaign (ImageCLEF). The aim of this CLEF track is to explore
the use of both text and content-based retrieval methods for cross-language image retrieval. Four tasks were offered in the ImageCLEF track: a ad-hoc retrieval from an historic photographic collection, ad-hoc retrieval from a medical collection, an automatic image annotation task, and a user-centered (interactive) evaluation task that is explained in the iCLEF summary. 24 research groups from a variety of backgrounds and nationalities (14 countries) participated in ImageCLEF. In this paper we describe the ImageCLEF tasks, submissions from participating groups and summarise the main fndings
Evaluating the implicit feedback models for adaptive video retrieval
Interactive video retrieval systems are becoming popular. On the one hand, these systems try to reduce the effect of the semantic gap, an issue currently being addressed by the multimedia retrieval community. On the other hand, such systems enhance the quality of information seeking for the user by supporting query formulation and reformulation. Interactive systems are very popular in the textual retrieval domain. However, they are relatively unexplored in the case of multimedia retrieval. The main problem in the development of interactive retrieval systems is the evaluation cost.The traditional evaluation methodology, as used in the information retrieval domain, is not applicable. An alternative is to use a user-centred evaluation methodology. However, such schemes are expensive in terms of effort, cost and are not scalable. This problem gets exacerbated by the use of implicit indicators, which are useful and increasingly used in predicting user intentions. In this paper, we explore the effectiveness of a number of interfaces and feedback mechanisms and compare their relative performance using a simulated evaluation methodology. The results show the relatively better performance of a search interface with the combination of explicit and implicit features
Document expansion for image retrieval
Successful information retrieval requires e�ective matching
between the user's search request and the contents of relevant
documents. Often the request entered by a user may
not use the same topic relevant terms as the authors' of the
documents. One potential approach to address problems
of query-document term mismatch is document expansion
to include additional topically relevant indexing terms in a
document which may encourage its retrieval when relevant
to queries which do not match its original contents well. We
propose and evaluate a new document expansion method
using external resources. While results of previous research
have been inconclusive in determining the impact of document
expansion on retrieval e�ectiveness, our method is
shown to work e�ectively for text-based image retrieval of
short image annotation documents. Our approach uses the
Okapi query expansion algorithm as a method for document
expansion. We further show improved performance can be
achieved by using a \document reduction" approach to include
only the signi�cant terms in a document in the expansion
process. Our experiments on the WikipediaMM task at
ImageCLEF 2008 show an increase of 16.5% in mean average
precision (MAP) compared to a variation of Okapi BM25 retrieval
model. To compare document expansion with query
expansion, we also test query expansion from an external resource
which leads an improvement by 9.84% in MAP over
our baseline. Our conclusion is that the document expansion
with document reduction and in combination with query expansion
produces the overall best retrieval results for shortlength
document retrieval. For this image retrieval task, we
also concluded that query expansion from external resource
does not outperform the document expansion method
Dublin City University at CLEF 2005: Experiments with the ImageCLEF St Andrew’s collection
The aim of the Dublin City University participation in the CLEF 2005 ImageCLEF St Andrew’s Collection task was to explore an alternative approach to exploiting text annotation and content-based retrieval in a novel combined way for pseudo relevance feedback (PRF). This method combines evidence from retrieved lists generated using
text and content-based retrieval to determine which documents will be assumed relevant for the PRF process. Unfortunately the results show that while standard textbased
PRF improves upon a no feedback text baseline, at present our new approach to combining evidence from text and content-based retrieval does not give further improve
improvement
- …