Search CORE

4 research outputs found

Top subset retrieval on large collections using sorted indices

Authors: Paul Ferguson, Cathal Gurrin, Alan F. Smeaton, Peter Wilkins
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2005

In this poster we describe alternative inverted index structures that reduce the time required to process queries, increase query throughput, and still return high-quality results to the end user. We give results on the TREC Terabyte dataset showing the improvements these indices give in both effectiveness and efficiency.

CiteSeerX

Crossref

Irish Universities

DCU Online Research Access Service

Dublin City University at the TREC 2006 terabyte track

Authors: Paul Ferguson, Alan F. Smeaton, Peter Wilkins
Publication venue: 'University of Aden - Faculty of Economics and Administration'
Publication date: 01/11/2006

For the 2006 Terabyte track in TREC, Dublin City University’s participation focussed on the ad hoc search task. As in the previous two years [7, 4], our Terabyte track experiments concentrated on evaluating a sorted inverted index, which orders the postings within each posting list so that only a limited number of postings need to be processed from each list, while minimising the loss of effectiveness in terms of query precision. This is done using the Físréal search system, developed at Dublin City University [4, 8].
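A minimal Python sketch of the sorted-index idea, under stated assumptions: postings are ordered by descending within-document term frequency, and a document's score is a plain sum of term frequencies. Both choices, and the names `build_tf_sorted_index` and `top_subset_search`, are hypothetical stand-ins; the abstract does not say which sort key or ranking function Físréal uses. The `max_postings` parameter caps how many postings are read from each list, and the function also reports how many postings it examined.

```python
import heapq
from collections import Counter, defaultdict


def build_tf_sorted_index(docs):
    """Build an inverted index (term -> [(doc_id, tf), ...]) whose posting
    lists are sorted by descending term frequency, ties broken by doc_id.
    The tf sort key is an assumption made for illustration."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term].append((doc_id, tf))
    for postings in index.values():
        postings.sort(key=lambda p: (-p[1], p[0]))
    return dict(index)


def top_subset_search(index, query, max_postings, k):
    """Score documents using only the first `max_postings` entries of each
    query term's posting list. Returns (top-k (doc_id, score) pairs,
    number of postings examined). Scoring is a hypothetical tf sum."""
    scores = defaultdict(float)
    examined = 0
    for term in query.lower().split():
        for doc_id, tf in index.get(term, [])[:max_postings]:
            scores[doc_id] += tf
            examined += 1
    # Highest score first; equal scores ordered by smaller doc_id.
    top = heapq.nlargest(k, scores.items(), key=lambda item: (item[1], -item[0]))
    return top, examined


# Toy collection: doc_id -> text (hypothetical data).
docs = {
    1: "terabyte track sorted index",
    2: "sorted sorted index index index",
    3: "query precision terabyte terabyte",
    4: "index",
}
index = build_tf_sorted_index(docs)
full, full_cost = top_subset_search(index, "sorted index", max_postings=10, k=2)
part, part_cost = top_subset_search(index, "sorted index", max_postings=2, k=2)
print(full, full_cost)  # full scan of both lists
print(part, part_cost)  # at most 2 postings per list
```

In this toy collection, reading 4 of the 5 postings returns the same top two documents as the full scan; on a real collection, the savings and any loss of precision depend on the sort key and the cutoff chosen.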

Irish Universities

DCU Online Research Access Service

Dublin City University at the TREC 2005 terabyte track

Authors: Paul Ferguson, Cathal Gurrin, Alan F. Smeaton, Peter Wilkins
Publication venue: 'University of Aden - Faculty of Economics and Administration'
Publication date: 01/11/2005

For the 2005 Terabyte track in TREC, Dublin City University participated in all three tasks: Adhoc, Efficiency and Named Page Finding. Our runs in all tasks were primarily focussed on applying "Top Subset Retrieval" to the Terabyte track. This approach uses different types of sorted inverted indices so that fewer documents are processed, reducing query times while minimising the loss of effectiveness in terms of query precision. We also compare a distributed version of our Físréal search system [1][2] against the same system deployed on a single machine.
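The abstract does not describe how the distributed Físréal system partitions its data. As an illustration only, the sketch below assumes document partitioning: each shard holds a subset of the documents and computes its own local top-k, and a broker merges the local lists into a global top-k. The shard assignment (doc id modulo shard count), the exhaustive per-shard scan and the tf-sum scoring are all hypothetical simplifications, as are the function names.

```python
import heapq
from collections import Counter, defaultdict


def rank_key(item):
    # Highest score first; equal scores ordered by smaller doc_id.
    doc_id, score = item
    return (score, -doc_id)


def search_shard(shard_docs, query_terms, k):
    """Score every document in one shard (hypothetical tf-sum scoring,
    exhaustive scan for brevity) and return the shard's local top-k."""
    scores = defaultdict(float)
    for doc_id, text in shard_docs.items():
        tfs = Counter(text.lower().split())
        for term in query_terms:
            if tfs[term]:
                scores[doc_id] += tfs[term]
    return heapq.nlargest(k, scores.items(), key=rank_key)


def distributed_search(docs, query, k, num_shards):
    """Document-partitioned search: split docs across shards, collect each
    shard's local top-k, and merge them at a broker into a global top-k."""
    # Hypothetical partitioning rule: doc_id modulo the number of shards.
    shards = [{} for _ in range(num_shards)]
    for doc_id, text in docs.items():
        shards[doc_id % num_shards][doc_id] = text
    terms = query.lower().split()
    # In a real deployment each shard would run on its own machine;
    # here they are simply evaluated one after another.
    local_results = [search_shard(shard, terms, k) for shard in shards]
    merged = [hit for hits in local_results for hit in hits]
    return heapq.nlargest(k, merged, key=rank_key)


docs = {
    1: "terabyte track sorted index",
    2: "sorted sorted index index index",
    3: "query precision terabyte terabyte",
    4: "index",
    5: "terabyte index",
}
single = distributed_search(docs, "terabyte index", k=3, num_shards=1)
multi = distributed_search(docs, "terabyte index", k=3, num_shards=2)
print(single == multi, multi)
```

Merging the local top-k lists gives the exact global top-k here because each document's score depends only on its own text; with collection-wide statistics such as idf, the shards would need to share global term statistics for the distributed results to match a single-machine run.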

Irish Universities

DCU Online Research Access Service

Index ordering by query-independent measures

Authors: Paul Ferguson, Alan F. Smeaton
Publication venue: 'Elsevier BV'
Publication date: 01/05/2012

Conventional approaches to information retrieval search through all applicable entries in an inverted file for a particular collection in order to find the documents with the highest scores. For particularly large collections this may be extremely time-consuming. One solution is to search only a limited portion of the collection at query time, speeding up retrieval while limiting the loss in retrieval effectiveness (in terms of accuracy of results). We achieve this by first identifying the most “important” documents within the collection and then sorting the documents within each inverted file list in order of this “importance”. This limits the amount of information searched at query time by eliminating documents of lesser importance, which makes the search more efficient while limiting the loss in retrieval accuracy. Our experiments, carried out on the TREC Terabyte collection, report significant savings in the number of postings examined, without significant loss of effectiveness, using several measures of importance both in isolation and in combination. Our results point to several ways in which the computational cost of searching large document collections can be significantly reduced.
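A minimal sketch of query-independent index ordering, assuming each document already has a static importance score. The `importance` values below are hypothetical; the paper evaluates several measures of importance, which are not reproduced here. Every posting list is sorted by that score, so all lists share one global document order, and the hypothetical `search_truncated` reads only the leading `fraction` of each list. Scoring is again a simple tf sum, chosen for illustration.

```python
import heapq
import math
from collections import Counter, defaultdict


def build_importance_sorted_index(docs, importance):
    """Inverted index (term -> [(doc_id, tf), ...]) whose posting lists are
    ordered by a query-independent importance score, highest first, so every
    list follows the same global document order."""
    index = defaultdict(list)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term].append((doc_id, tf))
    for postings in index.values():
        postings.sort(key=lambda p: (-importance[p[0]], p[0]))
    return dict(index)


def search_truncated(index, query, fraction, k):
    """Read only the leading `fraction` of each query term's posting list
    (at least one posting). Scoring is a hypothetical tf sum.
    Returns (top-k (doc_id, score) pairs, postings examined)."""
    scores = defaultdict(float)
    examined = 0
    for term in query.lower().split():
        postings = index.get(term, [])
        limit = max(1, math.ceil(fraction * len(postings)))
        for doc_id, tf in postings[:limit]:
            scores[doc_id] += tf
            examined += 1
    top = heapq.nlargest(k, scores.items(), key=lambda item: (item[1], -item[0]))
    return top, examined


docs = {1: "terabyte index", 2: "index index", 3: "terabyte track", 4: "sorted index"}
# Hypothetical static importance scores (e.g. a normalised link-based measure).
importance = {1: 0.9, 2: 0.2, 3: 0.7, 4: 0.5}
index = build_importance_sorted_index(docs, importance)
full, full_cost = search_truncated(index, "terabyte index", fraction=1.0, k=2)
truncated, truncated_cost = search_truncated(index, "terabyte index", fraction=0.5, k=2)
print(full, full_cost)
print(truncated, truncated_cost)
```

In this toy example, reading half of each list cuts the postings examined from 5 to 3, but document 2, which has the lowest importance score, drops out of the top two: this is the kind of effectiveness loss the approach aims to keep small.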

Crossref

Irish Universities

DCU Online Research Access Service