7 research outputs found

PLIERS at VLC2

Author: MacFarlane A.
McCann J. A.
Robertson S. E.
Publication date: 01/01/1999

This paper describes experiments done on the VLC2 collection at TREC-7. The methods used for indexing text are described together with the results; these cover the official collection BASE1, plus some larger unofficial collections named BASE2 and BASE4. Search times on these collections are described and discussed, with particular emphasis on scaleup for both weighted term search and passage retrieval. The various configurations used in the experiments are also described.

City Research Online

PLIERS AT TREC8

Author: MacFarlane A.
McCann J. A.
Robertson S. E.
Publication date: 01/01/2000

The use of the PLIERS text retrieval system in the TREC8 experiments is described. The tracks entered are Ad-Hoc, Filtering (Batch and Routing), and the Web Track (Large only). We describe both retrieval efficiency and effectiveness results for all of these tracks. We also describe some preliminary experiments with BM_25 tuning constant variation.
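As a rough illustration of what varying the BM25 tuning constants means, the sketch below computes one term's weight using the commonly published form of BM25. The abstract does not say which variant or which values PLIERS used, so the formula choice, the function name `bm25_term_weight`, and all the numbers are assumptions for illustration only.

```python
import math

def bm25_term_weight(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Contribution of one query term to one document's score (common BM25 form)."""
    # Robertson/Sparck Jones inverse document frequency.
    idf = math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalisation: b=0 ignores document length, b=1 normalises fully.
    norm = k1 * ((1 - b) + b * doc_len / avg_doc_len)
    # k1 controls how quickly repeated occurrences of the term saturate.
    return idf * tf * (k1 + 1) / (tf + norm)

# Vary the two tuning constants for one hypothetical term/document pair.
for k1 in (0.5, 1.2, 2.0):
    for b in (0.0, 0.75, 1.0):
        w = bm25_term_weight(tf=3, doc_len=150, avg_doc_len=100,
                             n_docs=1000, doc_freq=10, k1=k1, b=b)
        print(f"k1={k1} b={b} weight={w:.3f}")
```

In this form, `k1` governs term-frequency saturation and `b` governs how strongly long documents are penalised, which is why the two constants are usually tuned together.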

City Research Online

Parallel methods for the generation of partitioned inverted files

Author: MacFarlane A.
McCann J. A.
Robertson S. E.
Publication venue: Emerald
Publication date: 01/10/2005

Purpose – The generation of inverted indexes is one of the most computationally intensive activities for information retrieval systems: indexing large multi-gigabyte text databases can take many hours or even days to complete. We examine the generation of partitioned inverted files in order to speed up the process of indexing. Two types of index partitions are investigated: TermId and DocId. Design/methodology/approach – We use standard parallel computing measures, such as speedup and efficiency, to examine the computing results and also the space costs of our trial indexing experiments. Findings – The results from runs on both partitioning methods are compared and contrasted, concluding that DocId is the more efficient method. Practical implications – The DocId partitioning method would in most circumstances be used for distributing inverted file data in a parallel computer, particularly if indexing speed is the primary consideration. Originality/value – The paper is of value to database administrators who manage large-scale text collections, and who need to use parallel computing to implement their text retrieval services.
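The speedup and efficiency measures mentioned in the abstract have standard definitions in parallel computing: speedup is the one-processor time divided by the p-processor time, and efficiency is speedup divided by p. A minimal sketch follows; the timings are invented for illustration and are not results from the paper.

```python
def speedup(t_one, t_p):
    """Elapsed time on one processor divided by elapsed time on p processors."""
    return t_one / t_p

def efficiency(t_one, t_p, p):
    """Speedup per processor; 1.0 would mean perfectly linear speedup."""
    return speedup(t_one, t_p) / p

# Hypothetical indexing times in minutes, keyed by processor count.
timings = {1: 100.0, 2: 55.0, 4: 30.0, 8: 18.0}

for p, t in timings.items():
    s = speedup(timings[1], t)
    e = efficiency(timings[1], t, p)
    print(f"p={p} speedup={s:.2f} efficiency={e:.2f}")
```

Efficiency below 1.0 reflects overheads such as communication and load imbalance, which is where partitioning methods can differ.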

City Research Online

Crossref

Distributed Inverted Files and Performance: A Study of Parallelism and Data Distribution Methods in IR

Author: Macfarlane A.

The study investigates the performance of parallel information retrieval (IR) algorithms on different data distribution methods for inverted files, to identify which is best for the requirements of specific IR tasks. We define a data distribution method as a way of distributing inverted file data to local disks on a parallel machine. A data distribution method may be on-the-fly (with one copy of the index held), replication (all nodes have all of the index) or partitioning (data for the index is split amongst nodes). Partitioning of inverted file data can be done in many ways but we consider only two: by term (TermId) and by document (DocId). TermId partitioning distributes unique word data to a single partition, while DocId partitioning distributes unique document data to a single partition. We consider the issue of improving the performance of standard IR algorithms on these data distribution methods by looking at sequential job service rather than concurrent job service; e.g. we consider sequential query service, not concurrent query service. This methodology rules out some distribution methods for some of the tasks studied. We consider the following main tasks of IR: indexing, search, passage retrieval, inverted file update and query optimisation for routing/filtering. We produce a synthetic performance model for each of these tasks for the purposes of comparison. We have two subsidiary aims. The first is to demonstrate the portability of our implemented data structures and algorithms on different parallel machines. The second is to study the possibility of increased retrieval effectiveness by examining a larger section of the search space for both passage retrieval and routing/filtering. We also consider the implications of concurrency in updates on inverted files.
Our theoretical and empirical results show that in most cases the DocId partitioning method is the best data distribution method, apart from routing/filtering, where replication was found to be superior.
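The difference between partitioning by term (TermId) and by document (DocId) can be sketched on a toy inverted file. This is an illustration only, not the thesis's implementation: the helper names, the round-robin and modulo assignment rules, and the sample documents are all assumptions.

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to a sorted list of (doc_id, term_frequency) postings."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return {term: sorted(p.items()) for term, p in index.items()}

def partition_by_term(index, n):
    """TermId: all postings for a given term go to exactly one partition."""
    parts = [dict() for _ in range(n)]
    # Assumed assignment rule: round-robin over the sorted vocabulary.
    for rank, term in enumerate(sorted(index)):
        parts[rank % n][term] = index[term]
    return parts

def partition_by_doc(index, n):
    """DocId: all postings for a given document go to exactly one partition."""
    parts = [defaultdict(list) for _ in range(n)]
    # Assumed assignment rule: document id modulo the partition count.
    for term, postings in index.items():
        for doc_id, tf in postings:
            parts[doc_id % n][term].append((doc_id, tf))
    return [dict(p) for p in parts]

# Hypothetical four-document collection.
docs = {
    0: "parallel inverted file",
    1: "inverted file update",
    2: "parallel search",
    3: "passage search",
}
index = build_index(docs)
term_parts = partition_by_term(index, 2)
doc_parts = partition_by_doc(index, 2)

for i in range(2):
    print(f"TermId partition {i}: {sorted(term_parts[i])}")
    print(f"DocId partition {i}: {sorted(doc_parts[i])}")
```

Under TermId a term's postings list lives whole on one node, so a query touches only the nodes holding its terms. Under DocId a term such as "inverted" may appear on several nodes, each holding the postings for its own documents.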

City Research Online

Pliers at VLC2

Author: A. Macfarlane
J.A. McCann
S.E. Robertson

This paper describes experiments done on the VLC2 collection at TREC-7. The methods used for indexing text are described together with the results; these cover the official collection BASE1, plus some larger unofficial collections named BASE2 and BASE4. Search times on these collections are described and discussed, with particular emphasis on scaleup for both weighted term search and passage retrieval. The various configurations used in the experiments are also described.

1. INTRODUCTION

This paper is a description of results gained using the PLIERS system on baselines of the VLC2 collection at TREC-7. The research is part of an ongoing effort to study the effects of different partitioning methods for inverted files in parallel IR systems. In particular we wish to find out which partitioning method yields the best indexing and search results with respect to elapsed time as seen by the user. We present the official results for BASE1 only of VLC2, but include some further results from experiments...

CiteSeerX

Pliers At Vlc2

Author: Macfarlane Robertson

CiteSeerX