Comprehensive characterization of an open source document search engine

Author: Antoniou Georgia
Hadjilambrou Zacharias
Kleanthous Marios
Portero Antoni
Sazeides Yiannakis
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2019

This work performs a thorough characterization and analysis of the open source Lucene search library. The article describes in detail the architecture, functionality, and micro-architectural behavior of the search engine, and investigates prominent online document search research issues. In particular, we study how intra-server index partitioning affects the response time and throughput, explore the potential use of low power servers for document search, and examine the sources of performance degradation and the causes of tail latencies. Some of our main conclusions are the following: (a) intra-server index partitioning can reduce tail latencies but with diminishing benefits as incoming query traffic increases, (b) low power servers given enough partitioning can provide the same average and tail response times as conventional high performance servers, (c) index search is a CPU-intensive cache-friendly application, and (d) C-states are the main culprits for performance degradation in document search.
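As a rough illustration of the intra-server index partitioning mentioned in this abstract, the sketch below searches several in-memory index partitions concurrently and merges their partial top-k lists. It is not Lucene's API: the partition layout (term mapped to a list of `(doc_id, term_frequency)` pairs), the additive scoring, and the names `search_partition` and `search` are all assumptions made for this example.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def search_partition(partition, query_terms, k):
    """Score the documents of one partition and return its local top-k.

    `partition` is a hypothetical in-memory index: term -> [(doc_id, tf), ...].
    The score is simply the summed term frequency (an assumption, not Lucene's
    ranking function).
    """
    scores = {}
    for term in query_terms:
        for doc_id, tf in partition.get(term, []):
            scores[doc_id] = scores.get(doc_id, 0) + tf
    # Tuples are (score, doc_id) so that nlargest orders by score first.
    return heapq.nlargest(k, ((s, d) for d, s in scores.items()))


def search(partitions, query_terms, k):
    """Query every partition in its own worker, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partials = pool.map(
            lambda p: search_partition(p, query_terms, k), partitions
        )
        merged = heapq.nlargest(k, (hit for part in partials for hit in part))
    return [(doc_id, score) for score, doc_id in merged]


if __name__ == "__main__":
    partitions = [
        {"search": [(0, 2), (2, 1)], "engine": [(0, 1)]},
        {"search": [(1, 5)], "engine": [(3, 4), (1, 1)]},
    ]
    print(search(partitions, ["search", "engine"], k=2))
```

Each document lives in exactly one partition, so the merge step only has to pick the best k hits among the partial lists. Python threads are used here only to show the structure; they do not reproduce the latency behavior measured in the paper.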

DSpace at VSB Technical University of Ostrava

Parallel methods for the generation of partitioned inverted files

Author: MacFarlane A.
McCann J. A.
Robertson S. E.
Publication venue: 'Emerald'
Publication date: 01/10/2005

Purpose – The generation of inverted indexes is one of the most computationally intensive activities for information retrieval systems: indexing large multi‐gigabyte text databases can take many hours or even days to complete. We examine the generation of partitioned inverted files in order to speed up the process of indexing. Two types of index partitions are investigated: TermId and DocId. Design/methodology/approach – We use standard measures used in parallel computing such as speedup and efficiency to examine the computing results and also the space costs of our trial indexing experiments. Findings – The results from runs on both partitioning methods are compared and contrasted, concluding that DocId is the more efficient method. Practical implications – The practical implications are that the DocId partitioning method would in most circumstances be used for distributing inverted file data in a parallel computer, particularly if indexing speed is the primary consideration. Originality/value – The paper is of value to database administrators who manage large‐scale text collections, and who need to use parallel computing to implement their text retrieval services.
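The two partitioning schemes and the two measures named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the round-robin assignment, the toy whitespace tokenizer, and all function names are assumptions. Speedup and efficiency follow their standard parallel-computing definitions (serial time over parallel time, and speedup per processor).

```python
from collections import defaultdict


def speedup(t_serial, t_parallel):
    """Standard speedup: serial running time divided by parallel running time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_procs):
    """Standard efficiency: speedup divided by the number of processors."""
    return speedup(t_serial, t_parallel) / n_procs


def build_index(docs):
    """Build a toy inverted file: term -> sorted list of document ids."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for term in sorted(set(docs[doc_id].split())):
            index[term].append(doc_id)
    return dict(index)


def partition_by_doc_id(docs, n_parts):
    """DocId partitioning: each partition indexes a disjoint subset of documents.

    Round-robin assignment by document id is an assumption of this sketch.
    """
    parts = [{} for _ in range(n_parts)]
    for doc_id, text in docs.items():
        parts[doc_id % n_parts][doc_id] = text
    return [build_index(p) for p in parts]


def partition_by_term_id(docs, n_parts):
    """TermId partitioning: each partition holds complete postings for a subset of terms.

    Term ids are assigned here by sorted term order, again an assumption.
    """
    full = build_index(docs)
    parts = [{} for _ in range(n_parts)]
    for term_id, term in enumerate(sorted(full)):
        parts[term_id % n_parts][term] = full[term]
    return parts


if __name__ == "__main__":
    docs = {0: "a b", 1: "b c", 2: "a c", 3: "c d"}
    print(partition_by_doc_id(docs, 2))
    print(partition_by_term_id(docs, 2))
    print(speedup(120.0, 40.0), efficiency(120.0, 40.0, 4))
```

With DocId partitioning a term's postings are spread over all partitions, whereas with TermId partitioning each term's full posting list sits in a single partition.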

City Research Online

Crossref

Parallel computing in information retrieval - An updated review

The progress of parallel computing in Information Retrieval (IR) is reviewed. In particular we stress the importance of the motivation in using parallel computing for Text Retrieval. We analyse parallel IR systems using a classification due to Rasmussen [1] and describe some parallel IR systems. We give a description of the retrieval models used in parallel Information Processing. We describe areas of research which we believe are needed.

City Research Online

Crossref

PLIERS: A Parallel Information Retrieval System Using MPI

Author: MacFarlane A.
McCann J. A.
Robertson S. E.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/1999

The use of MPI in implementing algorithms for Parallel Information Retrieval Systems is outlined. We include descriptions of methods for Indexing, Search and Update of Inverted Indexes as well as a method for Information Filtering. In Indexing we describe both local build and distributed build methods. Our description of Document Search covers Term Weighting, Boolean, Proximity and Passage Retrieval Operations. Document Update issues are centred on how partitioning methods are supported. We describe the implementation of term selection algorithms for Information Filtering and finally outline work in progress.

City Research Online

Crossref

PLIERS at VLC2

Author: MacFarlane A.
McCann J. A.
Robertson S. E.
Publication date: 01/01/1999

This paper describes experiments done on the VLC2 collection at TREC-7. Methods used for indexing text are described together with the results: this includes the official collections BASE1, plus some larger unofficial collections named BASE2 and BASE4. Search times on these collections are described and discussed with a particular emphasis on scaleup, for both weighted term search and passage retrieval. The various configurations for experiments are described.

City Research Online

Vertical framing of superimposed signature files using partial evaluation of queries

Author: Can F.
Kocberber A. S.
Publication venue: 'Elsevier BV'
Publication date: 01/05/1997

A new signature file method, Multi-Frame Signature File (MFSF), is introduced by extending the bit-sliced signature file method. In MFSF a signature file is divided into variable sized vertical frames with different on-bit densities to optimize the response time using a partial query evaluation methodology. In query evaluation the on-bits of the lower on-bit density frames are used first. As the number of query terms increases, the number of query signature on-bits in the lower on-bit density frames increases and the query stopping condition is reached in fewer evaluation steps. Therefore, in MFSF, the query evaluation time decreases for increasing numbers of query terms. Under the sequentiality assumption of disk blocks, in a PC environment with 30 ms average disk seek time, MFSF provides a projected worst-case response time of 3.54 seconds for a database size of one million records in a uniform distribution multi-term query environment with 1-5 terms per query. Due to partial evaluation, this desired response time is guaranteed for queries with several terms. The comparison of MFSF with the inverted file approach shows that MFSF provides promising research opportunities. (C) 1997 Elsevier Science Ltd.
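For readers unfamiliar with the bit-sliced signature files that MFSF extends, here is a minimal sketch of the baseline idea only: superimposed signatures stored as vertical bit slices, with evaluation stopping once no candidate record remains. It does not implement MFSF's variable-sized frames or its on-bit density ordering. The signature width, the number of bits per term, the SHA-256-based hashing, and every function name are assumptions chosen for illustration.

```python
import hashlib

SIG_BITS = 64        # signature width (assumed for this sketch)
BITS_PER_TERM = 3    # on-bits set by each term (assumed for this sketch)


def term_positions(term):
    """Map a term to BITS_PER_TERM distinct bit positions via salted hashing."""
    positions = []
    salt = 0
    while len(positions) < BITS_PER_TERM:
        digest = hashlib.sha256(f"{salt}:{term}".encode("utf-8")).digest()
        pos = int.from_bytes(digest[:4], "big") % SIG_BITS
        if pos not in positions:
            positions.append(pos)
        salt += 1
    return positions


def build_bit_slices(records):
    """Store signatures column-wise: slices[b] has bit r set iff record r has bit b on."""
    slices = [0] * SIG_BITS
    for rec_no, terms in enumerate(records):
        for term in terms:
            for pos in term_positions(term):
                slices[pos] |= 1 << rec_no
    return slices


def candidates(slices, n_records, query_terms):
    """AND together the slices selected by the query signature's on-bits.

    Evaluation stops early once no record can still match. The result may
    contain false drops, so a real system verifies candidates afterwards.
    """
    result = (1 << n_records) - 1
    for term in query_terms:
        for pos in term_positions(term):
            result &= slices[pos]
            if result == 0:
                return []
    return [r for r in range(n_records) if (result >> r) & 1]


if __name__ == "__main__":
    records = [{"lucene", "index"}, {"signature", "file"}, {"index", "file"}]
    slices = build_bit_slices(records)
    print(candidates(slices, len(records), ["index", "file"]))
```

Only the slices touched by the query's on-bits are read, which is why the number of on-bits evaluated drives the response time in this family of methods.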

Bilkent University Institutional Repository

Peer to Peer Information Retrieval: An Overview

Author: Hiemstra Djoerd
Tigelaar Almer S.
Trieschnigg Dolf
Publication venue: ACM
Publication date: 01/01/2012

Peer-to-peer technology is widely used for file sharing. In the past decade a number of prototype peer-to-peer information retrieval systems have been developed. Unfortunately, none of these have seen widespread real-world adoption and thus, in contrast with file sharing, information retrieval is still dominated by centralised solutions. In this paper we provide an overview of the key challenges for peer-to-peer information retrieval and the work done so far. We want to stimulate and inspire further research to overcome these challenges. This will open the door to the development and large-scale deployment of real-world peer-to-peer information retrieval systems that rival existing centralised client-server solutions in terms of scalability, performance, user satisfaction and freedom.

Radboud Repository

University of Twente Research Information