100,744 research outputs found

Challenging Ubiquitous Inverted Files

Author: Vries A.P. de
Publication venue: European Research Consortium for Informatics and Mathematics (ERCIM)
Publication date: 01/01/2000

Stand-alone ranking systems based on highly optimized inverted file structures are generally considered ‘the’ solution for building search engines. Observing various developments in software and hardware, we argue, however, that IR research faces a complex engineering problem in the quest for more flexible yet efficient retrieval systems. We propose to base the development of retrieval systems on ‘the database approach’: mapping high-level declarative specifications of the retrieval process into efficient query plans. We present the Mirror DBMS as a prototype implementation of a retrieval system based on this approach.

CWI's Institutional Repository

University of Twente Research Information

TopSig: Topology Preserving Document Signatures

Author: De Vries Christopher M.
Geva Shlomo
Publication date: 01/01/2011

Performance comparisons between File Signatures and Inverted Files for text retrieval have previously shown several significant shortcomings of file signatures relative to inverted files. The inverted file approach underpins most state-of-the-art search engine algorithms, such as Language and Probabilistic models. It has been widely accepted that traditional file signatures are inferior alternatives to inverted files. This paper describes TopSig, a new approach to the construction of file signatures. Many advances in semantic hashing and dimensionality reduction have been made in recent times, but these have not so far been linked to general-purpose, signature-file-based search engines. This paper introduces a different signature file approach that builds upon and extends these recent advances. We are able to demonstrate significant improvements in the performance of signature-file-based indexing and retrieval, performance that is comparable to that of state-of-the-art inverted-file-based systems, including Language models and BM25. These findings suggest that file signatures offer a viable alternative to inverted files in suitable settings and, from the theoretical perspective, position the file signatures model in the class of Vector Space retrieval models. Comment: 12 pages, 8 figures, CIKM 201

arXiv.org e-Print Archive

CiteSeerX

Crossref

Queensland University of Technology ePrints Archive

Parallel methods for the update of partitioned inverted files

Author: A. MacFarlane
David Bawden
J.A. McCann
S.E. Robertson
Publication venue: 'Emerald'
Publication date: 12/07/2007

Purpose – An issue that tends to be ignored in information retrieval is the updating of inverted files. This is largely because inverted files were devised to provide fast query service, and much work has been done with the emphasis strongly on queries. In this paper we study the effect of using parallel methods for the update of inverted files in order to reduce costs, by looking at two types of partitioning for inverted files: document identifier and term identifier. Design/methodology/approach – Raw update service and update with query service are studied with these partitioning schemes using an incremental update strategy. We use standard measures used in parallel computing, such as speedup, to examine the computing results and also the costs of reorganising indexes while servicing transactions. Findings – Empirical results show that for both transaction processing and index reorganisation the document identifier method is superior. However, there is evidence that the term identifier partitioning method could be useful in a concurrent transaction processing context. Practical implications – There is an increasing need to service updates, which is now becoming a requirement of inverted files (for dynamic collections such as the Web), showing that the requirements for inverted file maintenance need to shift from those of the past. Originality/value – The paper is of value to database administrators who manage large-scale and dynamic text collections, and who need to use parallel computing to implement their text retrieval services.

City Research Online

Crossref

On Concurrency Control for Inverted Files

Author: MacFarlane A.
McCann J. A.
Robertson S. E.
Publication date: 01/01/1995

Few if any Information Retrieval (IR) systems have had to deal with Concurrency Control (CC) on inverted files. In order to examine the issues involved in CC on inverted files, the effects of various operations (e.g. Boolean) on the effectiveness of the IR system are examined using the example of interleaved transactions. Solutions to the problems identified are examined by discussing the three main CC mechanisms: Locking, Optimistic CC and Timestamp Ordering. The effects of delays and document availability are examined. The problem of stored sets is identified. The need for further work in the area is identified.

City Research Online

Using Inverted Files to Compress Text

Author: Strahil Ristov
Publication venue: 'University of Zagreb - University Computing Centre'
Publication date: 01/01/2002

This is the first report on a new approach to text compression. It consists of representing the text file with a compressed inverted file index in conjunction with a very compact lexicon, where the lexicon includes every word in the text. The index is compressed using standard index compression techniques, and the lexicon is compressed by an original dictionary compression method that gives better compression results than existing procedures. The compression procedure is complex, but decompression time is linear in the file size, although it requires two passes and hence cannot be performed online. First experiments show that this method, when refined, can be competitive for larger texts that only need to be decompressed in real time.

Crossref

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia
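
The idea in the abstract above, storing a text as an inverted index plus a lexicon and rebuilding it from those, can be illustrated with a minimal Python sketch. This is not the paper's method: the function names are hypothetical, words are taken to be whitespace-separated tokens, the lexicon is simply the set of dictionary keys, and plain gap (delta) encoding stands in for the "standard index compression techniques" the abstract mentions. The paper's dictionary compression method is not reproduced.

```python
from collections import defaultdict


def build_positional_index(text):
    """Map each distinct word to the ascending list of positions where it occurs."""
    index = defaultdict(list)
    for pos, word in enumerate(text.split()):
        index[word].append(pos)
    return dict(index)


def to_gaps(positions):
    """Keep the first position, then the differences between neighbours."""
    return [positions[0]] + [b - a for a, b in zip(positions, positions[1:])]


def from_gaps(gaps):
    """Invert to_gaps by accumulating the stored differences."""
    positions, total = [], 0
    for gap in gaps:
        total += gap
        positions.append(total)
    return positions


def compress(text):
    """Represent the text as word -> gap-encoded positions (assumed layout)."""
    return {word: to_gaps(pos) for word, pos in build_positional_index(text).items()}


def decompress(encoded):
    """Rebuild the word sequence from the index alone."""
    # First sweep: the total number of postings is the text length in words.
    length = sum(len(gaps) for gaps in encoded.values())
    words = [None] * length
    # Second sweep: drop every word into each of its recorded positions.
    for word, gaps in encoded.items():
        for pos in from_gaps(gaps):
            words[pos] = word
    return " ".join(words)
```

Because every position is written exactly once, the rebuild takes time linear in the number of words. Runs of whitespace are normalised to single spaces, so the round trip is exact only for text that is already in that form.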

Nonequilibrium Neutrino Oscillations in the Early Universe with an Inverted Neutrino-Mass Hierarchy

Author: Stuart Samuel
V. Alan Kostelecký
Publication venue: 'Elsevier BV'
Publication date: 01/01/1996

The annihilation of electron-positron pairs around one second after the big bang distorts the Fermi-Dirac spectrum of neutrino energies. We determine the distortions assuming neutrino mixing with an inverted neutrino-mass hierarchy. Nonequilibrium thermodynamics, the Boltzmann equation, and numerical integration are used to achieve the results. The various types of neutrino behavior are established as a function of masses and mixing angles. Comment: 9 pages in LaTeX with 6 figures (10 postscript files

arXiv.org e-Print Archive

CiteSeerX

Crossref

A Study of Four Index Structures for Set-Valued Attributes of Low Cardinality

Author: Helmer Sven
Moerkotte Guido
Publication date: 01/01/1999

We review and study the performance of four different index structures for indexing set-valued attributes designed to speed up set equality, subset and superset queries. All index structures are based on traditional techniques, namely signatures and inverted files. More specifically, we consider sequential signature files, signature trees, extendible signature hashing, and a B-tree based implementation of inverted lists. The latter is refined by a compression scheme in order to keep space requirements within acceptable bounds. The performance study is based on real implementations subjected to a benchmark accounting for different set sizes, domain sizes, and data distributions (uniform and skewed).

MAnnheim DOCument Server
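
To make the three query types named in the abstract above concrete, here is a minimal Python sketch of how inverted lists over a set-valued attribute can answer them. It is an illustration under stated assumptions, not the paper's implementation: in-memory Python sets replace the B-tree based, compressed lists, all function names are hypothetical, and the direction of "subset" and "superset" is the one given in each docstring, which may differ from the paper's convention.

```python
from collections import defaultdict


def build_inverted_lists(records):
    """records: dict of record id -> set of elements. Returns element -> record ids."""
    lists = defaultdict(set)
    for rid, elements in records.items():
        for element in elements:
            lists[element].add(rid)
    return lists


def superset_query(lists, records, query):
    """Ids of records whose set contains every element of `query`."""
    if not query:
        return set(records)  # every set contains the empty set
    postings = [lists.get(element, set()) for element in query]
    return set.intersection(*postings)


def subset_query(lists, records, query):
    """Ids of records whose set is contained in `query`."""
    # Candidates share at least one element with the query ...
    candidates = set().union(*(lists.get(element, set()) for element in query))
    result = {rid for rid in candidates if records[rid] <= query}
    # ... plus records with an empty set, which no inverted list mentions.
    result |= {rid for rid, elements in records.items() if not elements}
    return result


def equality_query(lists, records, query):
    """Ids of records whose set equals `query`."""
    # A superset of the query with the same cardinality must be the query itself.
    return {rid for rid in superset_query(lists, records, query)
            if len(records[rid]) == len(query)}
```

The superset query is a pure intersection of posting lists, while the subset query needs a verification step against the stored sets. A real index would typically store set cardinalities alongside the lists rather than consult the records directly.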

Parallel methods for the generation of partitioned inverted files

Author: MacFarlane A.
McCann J. A.
Robertson S. E.
Publication venue: 'Emerald'
Publication date: 01/10/2005

Purpose – The generation of inverted indexes is one of the most computationally intensive activities for information retrieval systems: indexing large multi‐gigabyte text databases can take many hours or even days to complete. We examine the generation of partitioned inverted files in order to speed up the process of indexing. Two types of index partitions are investigated: TermId and DocId. Design/methodology/approach – We use standard measures used in parallel computing, such as speedup and efficiency, to examine the computing results and also the space costs of our trial indexing experiments. Findings – The results from runs on both partitioning methods are compared and contrasted, concluding that DocId is the more efficient method. Practical implications – The practical implications are that the DocId partitioning method would in most circumstances be used for distributing inverted file data in a parallel computer, particularly if indexing speed is the primary consideration. Originality/value – The paper is of value to database administrators who manage large‐scale text collections, and who need to use parallel computing to implement their text retrieval services.

City Research Online

Crossref
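
The two partitioning schemes and the two parallel-computing measures named in the abstract above can be sketched in a few lines of Python. This is only an illustration: the round-robin assignment of documents and terms to partitions is an assumption made here, the function names are hypothetical, and nothing below reproduces the authors' system. Speedup and efficiency use the standard definitions (serial time divided by parallel time, and speedup divided by the number of processors).

```python
from collections import defaultdict


def build_index(docs):
    """docs: dict of doc id -> text. Returns term -> ascending list of doc ids."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for term in sorted(set(docs[doc_id].split())):
            index[term].append(doc_id)
    return dict(index)


def partition_by_doc_id(docs, n):
    """DocId partitioning: each partition indexes its own share of the documents."""
    shards = [{} for _ in range(n)]
    for i, doc_id in enumerate(sorted(docs)):
        shards[i % n][doc_id] = docs[doc_id]  # round-robin is an assumption
    return [build_index(shard) for shard in shards]


def partition_by_term_id(docs, n):
    """TermId partitioning: each partition holds whole posting lists for its terms."""
    full = build_index(docs)
    partitions = [{} for _ in range(n)]
    for i, term in enumerate(sorted(full)):
        partitions[i % n][term] = full[term]  # round-robin is an assumption
    return partitions


def speedup(t_serial, t_parallel):
    """Serial running time divided by parallel running time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_processors):
    """Speedup per processor; 1.0 would be ideal linear scaling."""
    return speedup(t_serial, t_parallel) / n_processors
```

Under DocId partitioning a term's postings are spread over several partitions, so each partition can be built independently from its own documents. Under TermId partitioning each term's complete list lives in exactly one partition, which in this sketch requires the full index to be assembled before it is split.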

Magnetic inversion as a mechanism for the spectral transition of black hole binaries

Author: Igor V. Igumenshchev
Publication venue: 'IOP Publishing'
Publication date: 04/08/2009

A mechanism for the transition between low/hard, high/soft, and steep power law (SPL) spectral states in black hole X-ray binaries is proposed. The low/hard state is explained by the development of a magnetically arrested accretion disk attributable to the accumulation of a vertical magnetic field in a central bundle. This disk forms powerful jets and consists of thin spiral accretion streams of a dense optically thick plasma surrounded by hot, magnetized, optically thin corona, which emits most of the energy in hard X-rays. State transition occurs because of the quasi-periodic or random inversion of poloidal magnetic fields in the accretion flow supplied by the secondary star. The inward advection of the inverted field results in a temporal disappearance of the central bundle caused by the annihilation of the opposed fields and restoration of the optically thick disk in the innermost region. This disk represents the high/soft state. The SPL state develops at the period of intensive field annihilation and precedes the high/soft state. The continuous supply of the inverted field leads to a new low/hard state because of the formation of another magnetically arrested disk. Comment: 5 plot files are attached separately. Accepted by the ApJ.

arXiv.org e-Print Archive

Crossref