274,984 research outputs found

Enhancing Content-And-Structure Information Retrieval using a Native XML Database

Author: Pehcevski Jovan
Thom James A.
Vercoustre Anne-Marie
Publication date: 01/01/2004

Three approaches to content-and-structure XML retrieval are analysed in this paper: first, using Zettair, a full-text information retrieval system; second, using eXist, a native XML database; and third, using a hybrid XML retrieval system that uses eXist to produce the final answers from likely relevant articles retrieved by Zettair. INEX 2003 content-and-structure topics can be classified into two categories: the first retrieving full articles as final answers, and the second retrieving more specific elements within articles as final answers. We show that for both topic categories our initial hybrid system improves the retrieval effectiveness of a native XML database. For ranking the final answer elements, we propose and evaluate a novel retrieval model that utilises the structural relationships between the answer elements of a native XML database and retrieves Coherent Retrieval Elements. The final results of our experiments show that when the XML retrieval task focusses on highly relevant elements our hybrid XML retrieval system with the Coherent Retrieval Elements module is 1.8 times more effective than Zettair and 3 times more effective than eXist, and yields an effective content-and-structure XML retrieval.

arXiv.org e-Print Archive

CiteSeerX

INRIA a CCSD electronic archive server

Hal-Diderot

Building a domain-specific document collection for evaluating metadata effects on information retrieval

Author: Jones Gareth J.F.
Leveling Johannes
Magdy Walid
Min Jinming
Publication venue: European Language Resources Association
Publication date: 01/05/2010

This paper describes the development of a structured document collection containing user-generated text and numerical metadata for exploring the exploitation of metadata in information retrieval (IR). The collection consists of more than 61,000 documents extracted from YouTube video pages on basketball in general and the NBA (National Basketball Association) in particular, together with a set of 40 topics and their relevance judgements. In addition, a collection of nearly 250,000 user profiles related to the NBA collection is available. Several baseline IR experiments report the effect of using video-associated metadata on retrieval effectiveness. The results surprisingly show that searching the video titles only performs significantly better than searching additional metadata text fields of the videos such as the tags or the description.

CiteSeerX

Irish Universities

DCU Online Research Access Service

Using Text Segmentation to Enhance the Cluster Hypothesis

Author: B. Levrat
F. Saubion
S. Lamprier
T. Amghar
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2008

An alternative way to tackle Information Retrieval, called Passage Retrieval, considers text fragments independently rather than assessing the global relevance of documents. In such a context, a document is not penalized when its relevant information is surrounded by text that deviates from the topic of interest. In this paper, we study the impact of considering these text fragments on a document clustering process. The use of clustering in the field of Information Retrieval is mainly supported by the cluster hypothesis, which states that relevant documents tend to be more similar to each other than to non-relevant documents, and hence a clustering process is likely to gather them. Previous experiments have shown that clustering the first documents retrieved in response to a user’s query allows Information Retrieval systems to improve their effectiveness. In the clustering process used in these studies, documents have been considered globally. Nevertheless, the assumption that a document can refer to more than one topic or concept may also have an impact on the document clustering process. Considering passages of the retrieved documents separately may allow the creation of clusters that are more representative of the addressed topics. Different approaches have been assessed, and the results show that using text fragments in the clustering process can be beneficial.

Okina

A citation-based review of study on image retrieval

Author: Cai Xin
Wang Yanyan
Zhao Yuehua
Publication venue: iSchools
Publication date: 01/01/2018

Driven by the development of information retrieval technologies, image retrieval has been studied for several decades. This study aims to reveal the current status and future directions of image retrieval by reviewing previous related studies. Citation-based analysis was applied to 2243 articles retrieved from the Web of Science database. The time series plots of the citation relationships between the retrieved articles reveal a fundamental research article that laid the foundation for the image retrieval field. Co-citation analysis shows that the existing studies form two clusters. Each cluster represents one of the two major areas in the field of image retrieval: text-based image retrieval and content-based image retrieval. The visualization map shows that research on content-based image retrieval has received more attention than research on text-based image retrieval. Relevance feedback was identified as a promising research direction for future study.
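As an illustration of the co-citation idea mentioned in this abstract, here is a minimal Python sketch. It is not the authors' method or tooling: the function name `cocitation_counts` and the toy reference lists are hypothetical. Two works count as co-cited once for every citing article whose reference list contains both.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each pair of works is cited together.

    reference_lists: one list of cited-work identifiers per citing article.
    Returns a Counter keyed by alphabetically ordered pairs.
    """
    counts = Counter()
    for refs in reference_lists:
        # De-duplicate within one article, then count every unordered pair once.
        for a, b in combinations(sorted(set(refs)), 2):
            counts[(a, b)] += 1
    return counts


# Toy data (hypothetical): three citing articles and the works they cite.
toy = [["A", "B", "C"], ["A", "B"], ["B", "C"]]
pairs = cocitation_counts(toy)
```

Pairs with high counts can then be fed to any clustering step; the choice of clustering method is left open here because the abstract does not name one.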

Illinois Digital Environment for Access to Learning and Scholarship Repository

A comparative study of probabilistic and language models for information retrieval

Author: Bennett G
Scholer F
Uitdenbogerd A
Publication venue: CRPIT (Australia)
Publication date: 01/01/2008

Language models for information retrieval have received much attention in recent years, with many claims being made about their performance. However, previous studies evaluating the language modelling approach for information retrieval used different query sets and heterogeneous collections, which makes reported results difficult to compare. This research is a broad-based study that evaluates language models against a variety of search tasks: topic finding, named-page finding and topic distillation. The standard Text REtrieval Conference (TREC) methodology is used to compare language models to the probabilistic Okapi BM25 system. Using consistent parameter choices, we compare results of different language models on three different search tasks, multiple query sets and three different text collections. For ad hoc retrieval, the Dirichlet smoothing method was found to be significantly better than Okapi BM25, but for named-page finding Okapi BM25 was more effective than the language modelling methods. Optimal smoothing parameters for each method were found to be dependent on the collection and the query set. For longer queries, the language modelling approaches required more aggressive smoothing, but they were found to be more effective than with shorter queries. The choice of smoothing method was also found to have a significant effect on the performance of language models for information retrieval.
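To make the two ranking families compared in this abstract concrete, here is a minimal Python sketch of an Okapi BM25 score and a Dirichlet-smoothed query-likelihood score. It is an illustration under stated assumptions, not the implementation evaluated in the paper: the function names, the toy documents, the parameter values (`k1=1.2`, `b=0.75`, `mu=2000`) and the particular IDF variant are all assumptions chosen for the example.

```python
import math
from collections import Counter


def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of one tokenised document for a tokenised query.

    k1 and b are commonly used defaults, assumed here for illustration.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    tf = Counter(doc)
    score = 0.0
    for term in query:
        df = sum(1 for d in docs if term in d)  # document frequency
        if df == 0 or tf[term] == 0:
            continue
        # Non-negative IDF variant (an assumption; other variants exist).
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
        score += idf * tf[term] * (k1 + 1) / norm
    return score


def dirichlet_score(query, doc, docs, mu=2000.0):
    """Query log-likelihood under a Dirichlet-smoothed unigram model.

    mu is the smoothing parameter; larger values smooth more aggressively.
    """
    collection = Counter(t for d in docs for t in d)
    coll_len = sum(collection.values())
    tf = Counter(doc)
    score = 0.0
    for term in query:
        p_coll = collection[term] / coll_len
        if p_coll == 0:
            continue  # skip terms that never occur in the collection
        score += math.log((tf[term] + mu * p_coll) / (len(doc) + mu))
    return score


# Toy collection (hypothetical), already tokenised.
docs = [
    ["okapi", "bm25", "ranking"],
    ["language", "model", "smoothing"],
    ["language", "model", "ranking", "ranking"],
]
query = ["ranking"]
```

Both functions rank the first toy document above the second for the query `["ranking"]`, since only the first contains the term. The `mu` argument is the knob behind the abstract's remark that longer queries needed more aggressive smoothing.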

RMIT Research Repository

Elaboration over a Discourse Facilitates Retrieval in Sentence Processing

Author: Hofmeister Philip
Kutas Marta
Troyer Melissa
Publication venue: eScholarship, University of California
Publication date: 01/01/2016

Language comprehension requires access to stored knowledge and the ability to combine knowledge in new, meaningful ways. Previous work has shown that processing linguistically more complex expressions ('Texas cattle rancher' vs. 'rancher') leads to slow-downs in reading during initial processing, possibly reflecting effort in combining information. Conversely, when this information must subsequently be retrieved (as in filler-gap constructions), processing is facilitated for more complex expressions, possibly because more semantic cues are available during retrieval. To follow up on this hypothesis, we tested whether information distributed across a short discourse can similarly provide effective cues for retrieval. Participants read texts introducing two referents (e.g., two senators), one of whom was described in greater detail than the other (e.g., 'The Democrat had voted for one of the senators, and the Republican had voted for the other, a man from Ohio who was running for president'). The final sentence (e.g., 'The senator who the {Republican/Democrat} had voted for…') contained a relative clause picking out either the Many-Cue referent (with 'Republican') or the One-Cue referent (with 'Democrat'). We predicted facilitated retrieval (faster reading times) for the Many-Cue condition at the verb region ('had voted for'), where readers could understand that 'The senator' is the object of the verb. As predicted, this pattern was observed at the retrieval region and continued throughout the rest of the sentence. Participants also completed the Author/Magazine Recognition Tests (ART/MRT; Stanovich and West, 1989), providing a proxy for world knowledge. Since higher ART/MRT scores may index (a) greater experience accessing relevant knowledge and/or (b) richer/more highly structured representations in semantic memory, we predicted that these scores would be positively associated with effects of elaboration on retrieval. We did not observe the predicted interaction between ART/MRT scores and Cue condition at the retrieval region, though ART/MRT interacted with Cue condition in other locations in the sentence. In sum, we found that providing more elaborative information over the course of a text can facilitate retrieval for referents, consistent with a framework in which referential elaboration over a discourse, and not just local linguistic information, directly impacts information retrieval during sentence processing.

Directory of Open Access Journals

Frontiers - Publisher Connector

PubMed Central

eScholarship - University of California