1,120 research outputs found
The State-of-the-arts in Focused Search
The continuous influx of various text data on the Web requires search engines to improve their retrieval abilities for more specific information. The need for relevant results to a user’s topic of interest has gone beyond search for domain or type specific documents to more focused result (e.g. document fragments or answers to a query). The introduction of XML provides a format standard for data representation, storage, and exchange. It helps focused search to be carried out at different granularities of a structured document with XML markups. This report aims at reviewing the state-of-the-arts in focused search, particularly techniques for topic-specific document retrieval, passage retrieval, XML retrieval, and entity ranking. It is concluded with highlight of open problems
Evaluation campaigns and TRECVid
The TREC Video Retrieval Evaluation (TRECVid) is an
international benchmarking activity to encourage research
in video information retrieval by providing a large test collection, uniform scoring procedures, and a forum for organizations interested in comparing their results. TRECVid completed its fifth annual cycle at the end of 2005 and in 2006 TRECVid will involve almost 70 research organizations, universities and other consortia. Throughout its existence, TRECVid has benchmarked both interactive and automatic/manual searching for shots from within a video
corpus, automatic detection of a variety of semantic and
low-level video features, shot boundary detection and the
detection of story boundaries in broadcast TV news. This
paper will give an introduction to information retrieval (IR) evaluation from both a user and a system perspective, highlighting that system evaluation is by far the most prevalent type of evaluation carried out. We also include a summary of TRECVid as an example of a system evaluation benchmarking campaign and this allows us to discuss whether
such campaigns are a good thing or a bad thing. There are
arguments for and against these campaigns and we present
some of them in the paper concluding that on balance they
have had a very positive impact on research progress
PFTijah: text search in an XML database system
This paper introduces the PFTijah system, a text search system that is integrated with an XML/XQuery database management system. We present examples of its use, we explain some of the system internals, and discuss plans for future work. PFTijah is part of the open source release of MonetDB/XQuery
The Wikipedia Image Retrieval Task
The wikipedia image retrieval task at ImageCLEF provides a testbed for the system-oriented evaluation of visual information retrieval from a collection of Wikipedia images. The aim is to investigate the effectiveness of retrieval approaches that exploit textual and visual evidence in the context of a large and heterogeneous collection of images that are searched for by users with diverse information needs. This chapter presents an overview of the available test collections, summarises the retrieval approaches employed by the groups that participated in the task during the 2008 and 2009 ImageCLEF campaigns, provides an analysis of the main evaluation results, identifies best practices for effective retrieval, and discusses open issues
Seven years of INEX interactive retrieval experiments – lessons and challenges
This paper summarizes a major effort in interactive search investigation,
the INEX i-track, a collective effort run over a seven-year period. We present
the experimental conditions, report some of the findings of the participating
groups, and examine the challenges posed by this kind of collective experimental
effort
Overview of the wikipediaMM task at ImageCLEF 2008
The wikipediaMM task provides a testbed for the system-oriented evaluation of ad-hoc retrieval from a large collection of Wikipedia images. It became a part of the ImageCLEF evaluation campaign in 2008 with the aim of investigating the use of visual and textual sources in combination for improving the retrieval performance. This paper presents an overview of the task¿s resources, topics, assessments, participants' approaches, and main results
Recommended from our members
XML-structured documents: Retrievable units and inheritance
We consider the retrieval of XML-structured documents, and of passages from such documents, defined as elements of the XML structure. These are considered from the point of view of passage retrieval, as a form of document retrieval. A retrievable unit (an element chosen as defining suitable passages for retrieval) is a textual document in its own right, but may inherit information from the other parts of the same document. Again, this inheritance is defined in terms of the XML structure. All retrievable units are mapped onto a common field structure, and the ranking function is a standard document retrieval function with a suitable field weighting. A small experiment to demonstrate the idea, using INEX data, is described
Sound ranking algorithms for XML search
Ranking algorithms for XML should reflect the actual combined content and structure constraints of queries, while at the same time producing equal rankings for queries that are semantically equal. Ranking algorithms that produce different rankings for queries that are semantically equal are easily detected by tests on large databases: We call such algorithms not sound. We report the behavior of different approaches to ranking content-and-structure queries on pairs of queries for which we expect equal ranking results from the query semantics. We show that most of these approaches are not sound. Of the remaining approaches, only 3 adhere to the W3C XQuery Full-Text standard
A survey on tree matching and XML retrieval
International audienceWith the increasing number of available XML documents, numerous approaches for retrieval have been proposed in the literature. They usually use the tree representation of documents and queries to process them, whether in an implicit or explicit way. Although retrieving XML documents can be considered as a tree matching problem between the query tree and the document trees, only a few approaches take advantage of the algorithms and methods proposed by the graph theory. In this paper, we aim at studying the theoretical approaches proposed in the literature for tree matching and at seeing how these approaches have been adapted to XML querying and retrieval, from both an exact and an approximate matching perspective. This study will allow us to highlight theoretical aspects of graph theory that have not been yet explored in XML retrieval
Entity Ranking in Wikipedia
The traditional entity extraction problem lies in the ability of extracting
named entities from plain text using natural language processing techniques and
intensive training from large document collections. Examples of named entities
include organisations, people, locations, or dates. There are many research
activities involving named entities; we are interested in entity ranking in the
field of information retrieval. In this paper, we describe our approach to
identifying and ranking entities from the INEX Wikipedia document collection.
Wikipedia offers a number of interesting features for entity identification and
ranking that we first introduce. We then describe the principles and the
architecture of our entity ranking system, and introduce our methodology for
evaluation. Our preliminary results show that the use of categories and the
link structure of Wikipedia, together with entity examples, can significantly
improve retrieval effectiveness.Comment: to appea
- …