Challenging Ubiquitous Inverted Files
Stand-alone ranking systems based on highly optimized inverted file structures are generally considered "the" solution for building search engines. Observing various developments in software and hardware, we argue however that IR research faces a complex engineering problem in the quest for more flexible yet efficient retrieval systems. We propose to base the development of retrieval systems on "the database approach": mapping high-level declarative specifications of the retrieval process into efficient query plans. We present the Mirror DBMS as a prototype implementation of a retrieval system based on this approach.
TopSig: Topology Preserving Document Signatures
Performance comparisons between File Signatures and Inverted Files for text
retrieval have previously shown several significant shortcomings of file
signatures relative to inverted files. The inverted file approach underpins
most state-of-the-art search engine algorithms, such as Language and
Probabilistic models. It has been widely accepted that traditional file
signatures are inferior alternatives to inverted files. This paper describes
TopSig, a new approach to the construction of file signatures. Many advances in
semantic hashing and dimensionality reduction have been made in recent times,
but these were not so far linked to general purpose, signature file based,
search engines. This paper introduces a different signature file approach that
builds upon and extends these recent advances. We are able to demonstrate
significant improvements in the performance of signature file based indexing
and retrieval, performance that is comparable to that of state of the art
inverted file based systems, including Language models and BM25. These findings
suggest that file signatures offer a viable alternative to inverted files in
suitable settings and from the theoretical perspective it positions the file
signatures model in the class of Vector Space retrieval models.
Comment: 12 pages, 8 figures, CIKM 201
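As a rough illustration of how a fixed-width document signature can behave like a vector-space representation, the sketch below sums a pseudo-random ±1 vector per term and keeps only the sign of each component. This is a generic sign-quantized random-projection scheme written for illustration, not the TopSig implementation; the function names, the 64-bit width, and the whitespace tokenizer are all assumptions.

```python
import random


def term_vector(term, width):
    """Deterministic pseudo-random +/-1 vector for a term (seeded by the term itself)."""
    rng = random.Random(term)
    return [rng.choice((-1, 1)) for _ in range(width)]


def signature(text, width=64):
    """Sum the term vectors of a document and keep one sign bit per component."""
    totals = [0] * width
    for term in text.lower().split():
        for i, value in enumerate(term_vector(term, width)):
            totals[i] += value
    return [1 if total > 0 else 0 for total in totals]


def hamming(sig_a, sig_b):
    """Number of bit positions in which two signatures differ."""
    return sum(a != b for a, b in zip(sig_a, sig_b))


if __name__ == "__main__":
    doc = signature("inverted file index for text retrieval")
    query = signature("text retrieval index")
    print(hamming(doc, query))
```

Ranking by Hamming distance between signatures is what makes such a scheme comparable to a vector-space model: documents sharing many terms tend to agree on more sign bits.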
Parallel methods for the update of partitioned inverted files
Purpose: An issue that tends to be ignored in information retrieval is the updating of inverted files. This is largely because inverted files were devised to provide fast query service, and much work has been done with the emphasis strongly on queries. In this paper we study the effect of using parallel methods for the update of inverted files in order to reduce costs, by looking at two types of partitioning for inverted files: document identifier and term identifier.
Design/methodology/approach: Raw update service and update with query service are studied with these partitioning schemes using an incremental update strategy. We use standard measures from parallel computing, such as speedup, to examine the computing results and also the costs of reorganising indexes while servicing transactions.
Findings: Empirical results show that for both transaction processing and index reorganisation the document identifier method is superior. However, there is evidence that the term identifier partitioning method could be useful in a concurrent transaction processing context.
Practical implications: Servicing updates is increasingly a requirement for inverted files (for dynamic collections such as the Web), which indicates that the requirements of inverted file maintenance have shifted from those of the past.
Originality/value: The paper is of value to database administrators who manage large-scale and dynamic text collections, and who need to use parallel computing to implement their text retrieval services.
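The two partitioning schemes named in this abstract can be sketched in a few lines. The code below is an illustrative toy, not the authors' system: the document collection, the modulo assignment of documents, and the CRC32 assignment of terms are all assumptions chosen to keep the example deterministic.

```python
import zlib
from collections import defaultdict


def build_index(docs):
    """Map each term to the sorted list of document ids that contain it."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        for term in sorted(set(docs[doc_id].split())):
            index[term].append(doc_id)
    return dict(index)


def partition_by_doc(docs, n):
    """Document-identifier partitioning: each partition indexes a subset of documents."""
    shards = [dict() for _ in range(n)]
    for doc_id, text in docs.items():
        shards[doc_id % n][doc_id] = text
    return [build_index(shard) for shard in shards]


def partition_by_term(docs, n):
    """Term-identifier partitioning: each partition holds whole posting lists for some terms."""
    parts = [dict() for _ in range(n)]
    for term, postings in build_index(docs).items():
        parts[zlib.crc32(term.encode("utf-8")) % n][term] = postings
    return parts


if __name__ == "__main__":
    docs = {0: "inverted file update", 1: "parallel update",
            2: "inverted index", 3: "parallel index update"}
    print(partition_by_doc(docs, 2))
    print(partition_by_term(docs, 2))
```

Under document partitioning, adding a document touches only one partition, whereas under term partitioning the same document's terms may be spread across several partitions.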
On Concurrency Control for Inverted Files
Few if any Information Retrieval (IR) systems have had to deal with Concurrency Control (CC) on inverted files. In order to examine the issues involved in CC on inverted files, the effects of various operations (e.g. Boolean) on the effectiveness of the IR system are examined using the example of interleaved transactions. Solutions to the problems identified are examined by discussing the three main CC mechanisms: Locking, Optimistic CC and Timestamp Ordering. The effects of delays and document availability are examined. The problem of stored sets is identified, as is the need for further work in the area.
Using Inverted Files to Compress Text
This is the first report on a new approach to text compression. It consists of representing the text file with a compressed inverted file index in conjunction with a very compact lexicon, where the lexicon includes every word in the text. The index is compressed using standard index compression techniques, and the lexicon is compressed by an original dictionary compression method that gives better compression results than existing procedures. The compression procedure is complex, but decompression time is linear in the file size, although it requires two passes and hence cannot be performed online. First experiments show that this method, when refined, can be competitive for larger texts that only need to be decompressed in real time.
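The core idea, that a lexicon plus a positional inverted index is enough to rebuild the text, can be shown with a small sketch. It is not the method of the paper: the gap (delta) encoding of positions stands in for "standard index compression techniques", the lexicon is left uncompressed, whitespace is normalized to single spaces, and the names `encode` and `decode` are hypothetical.

```python
def encode(text):
    """Return (index, length): each word maps to its gap-encoded positions in the text."""
    words = text.split()
    positions = {}
    for pos, word in enumerate(words):
        positions.setdefault(word, []).append(pos)
    index = {
        word: [plist[0]] + [b - a for a, b in zip(plist, plist[1:])]
        for word, plist in positions.items()
    }
    return index, len(words)


def decode(index, length):
    """Rebuild the word sequence by replaying every posting list into its slots."""
    words = [None] * length
    for word, gaps in index.items():
        pos = 0
        for i, gap in enumerate(gaps):
            pos = gap if i == 0 else pos + gap
            words[pos] = word
    return " ".join(words)


if __name__ == "__main__":
    index, length = encode("to be or not to be")
    print(index)
    print(decode(index, length))
```

Decoding visits each stored position exactly once, so its cost grows linearly with the number of words, but no prefix of the text is complete until all posting lists have been replayed.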
Nonequilibrium Neutrino Oscillations in the Early Universe with an Inverted Neutrino-Mass Hierarchy
The annihilation of electron-positron pairs around one second after the big
bang distorts the Fermi-Dirac spectrum of neutrino energies. We determine the
distortions assuming neutrino mixing with an inverted neutrino-mass hierarchy.
Nonequilibrium thermodynamics, the Boltzmann equation, and numerical
integration are used to achieve the results. The various types of neutrino
behavior are established as a function of masses and mixing angles.
Comment: 9 pages in Latex with 6 figures (10 postscript files
A Study of Four Index Structures for Set-Valued Attributes of Low Cardinality
We review and study the performance of four different index structures for indexing set-valued attributes designed to speed up set equality, subset and superset queries. All index structures are based on traditional techniques, namely signatures and inverted files. More specifically, we consider sequential signature files, signature trees, extendible signature hashing, and a B-tree based implementation of inverted lists. The latter is refined by a compression scheme in order to keep space requirements within acceptable bounds. The performance study is based on real implementations subjected to a benchmark accounting for different set sizes, domain sizes, and data distributions (uniform and skewed)
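To make the inverted-list variant concrete, the sketch below answers containment queries over small sets with plain in-memory posting lists. It is a simplified assumption-laden illustration: the paper's version is B-tree based and compressed, while the names `containing` and `contained_in` and the sample data are invented here.

```python
from collections import Counter, defaultdict


def build(sets):
    """Inverted lists: each element maps to the ids of the stored sets containing it."""
    inverted = defaultdict(set)
    for set_id, elements in sets.items():
        for element in elements:
            inverted[element].add(set_id)
    return inverted


def containing(inverted, query):
    """Ids of stored sets that contain every element of a non-empty query set."""
    lists = [inverted.get(element, set()) for element in query]
    return set.intersection(*lists)


def contained_in(inverted, sizes, query):
    """Ids of non-empty stored sets whose elements all occur in the query set."""
    hits = Counter()
    for element in query:
        for set_id in inverted.get(element, ()):
            hits[set_id] += 1
    return {set_id for set_id, count in hits.items() if count == sizes[set_id]}


if __name__ == "__main__":
    sets = {1: {"a", "b"}, 2: {"a"}, 3: {"b", "c"}}
    inverted = build(sets)
    sizes = {set_id: len(s) for set_id, s in sets.items()}
    print(containing(inverted, {"a", "b"}))
    print(contained_in(inverted, sizes, {"a", "b"}))
```

A set equality query can then be answered as the intersection of the two results for the same query set.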
Parallel methods for the generation of partitioned inverted files
Purpose: The generation of inverted indexes is one of the most computationally intensive activities for information retrieval systems: indexing large multi-gigabyte text databases can take many hours or even days to complete. We examine the generation of partitioned inverted files in order to speed up the process of indexing. Two types of index partitions are investigated: TermId and DocId.
Design/methodology/approach: We use standard measures used in parallel computing such as speedup and efficiency to examine the computing results and also the space costs of our trial indexing experiments.
Findings: The results from runs on both partitioning methods are compared and contrasted, concluding that DocId is the more efficient method.
Practical implications: The practical implications are that the DocId partitioning method would in most circumstances be used for distributing inverted file data in a parallel computer, particularly if indexing speed is the primary consideration.
Originality/value: The paper is of value to database administrators who manage large-scale text collections, and who need to use parallel computing to implement their text retrieval services.
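For reference, the two parallel-computing measures named in this abstract follow the usual textbook definitions: speedup is serial time divided by parallel time, and efficiency is speedup divided by the number of processors. The sketch below applies them to made-up timings; the numbers are illustrative assumptions, not results from the paper.

```python
def speedup(t_serial, t_parallel):
    """Ratio of single-processor time to parallel time for the same job."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, processors):
    """Speedup per processor; 1.0 would mean perfectly linear scaling."""
    return speedup(t_serial, t_parallel) / processors


if __name__ == "__main__":
    # Hypothetical indexing run: 120 minutes serially, 30 minutes on 8 processors.
    print(speedup(120.0, 30.0))
    print(efficiency(120.0, 30.0, 8))
```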
Magnetic inversion as a mechanism for the spectral transition of black hole binaries
A mechanism for the transition between low/hard, high/soft, and steep power
law (SPL) spectral states in black hole X-ray binaries is proposed. The
low/hard state is explained by the development of a magnetically arrested
accretion disk attributable to the accumulation of a vertical magnetic field in
a central bundle. This disk forms powerful jets and consists of thin spiral
accretion streams of a dense optically thick plasma surrounded by hot,
magnetized, optically thin corona, which emits most of the energy in hard
X-rays. State transition occurs because of the quasi-periodic or random
inversion of poloidal magnetic fields in the accretion flow supplied by the
secondary star. The inward advection of the inverted field results in a
temporal disappearance of the central bundle caused by the annihilation of the
opposed fields and restoration of the optically thick disk in the innermost
region. This disk represents the high/soft state. The SPL state develops at the
period of intensive field annihilation and precedes the high/soft state. The
continuous supply of the inverted field leads to a new low/hard state because
of the formation of another magnetically arrested disk.
Comment: 5 plot files are attached separately. Accepted by the ApJ