2,423 research outputs found

TopSig: Topology Preserving Document Signatures

Authors: De Vries Christopher M.; Geva Shlomo
Publication date: 01/01/2011

Performance comparisons between File Signatures and Inverted Files for text retrieval have previously shown several significant shortcomings of file signatures relative to inverted files. The inverted file approach underpins most state-of-the-art search engine algorithms, such as Language and Probabilistic models. It has been widely accepted that traditional file signatures are inferior alternatives to inverted files. This paper describes TopSig, a new approach to the construction of file signatures. Many advances in semantic hashing and dimensionality reduction have been made in recent times, but these have not so far been linked to general-purpose, signature-file-based search engines. This paper introduces a different signature file approach that builds upon and extends these recent advances. We are able to demonstrate significant improvements in the performance of signature-file-based indexing and retrieval, performance that is comparable to that of state-of-the-art inverted-file-based systems, including Language models and BM25. These findings suggest that file signatures offer a viable alternative to inverted files in suitable settings, and from the theoretical perspective they position the file signatures model in the class of Vector Space retrieval models. Comment: 12 pages, 8 figures, CIKM 201

arXiv.org e-Print Archive

CiteSeerX

Crossref

Queensland University of Technology ePrints Archive

Challenging Ubiquitous Inverted Files

Author: Vries A.P. de
Publication venue: European Research Consortium for Informatics and Mathematics (ERCIM)
Publication date: 01/01/2000

Stand-alone ranking systems based on highly optimized inverted file structures are generally considered ‘the’ solution for building search engines. Observing various developments in software and hardware, we argue, however, that IR research faces a complex engineering problem in the quest for more flexible yet efficient retrieval systems. We propose to base the development of retrieval systems on ‘the database approach’: mapping high-level declarative specifications of the retrieval process into efficient query plans. We present the Mirror DBMS as a prototype implementation of a retrieval system based on this approach.

CWI's Institutional Repository

University of Twente Research Information

Pairwise similarity of TopSig document signatures

Authors: De Vries Christopher; Geva Shlomo
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2012

This paper analyses the pairwise distances of signatures produced by the TopSig retrieval model on two document collections. The distribution of the distances is compared to that of purely random signatures. It explains why TopSig is only competitive with state-of-the-art retrieval models at early precision. Only the local neighbourhood of the signatures is interpretable. We suggest this is a common property of vector space models.
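
The random-signature baseline mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the signature width, the sample size, and the function names are assumptions chosen for illustration, and Hamming distance is assumed as the distance measure. For purely random binary signatures, each pairwise distance follows a binomial distribution centred on half the signature width.

```python
import random
from itertools import combinations

N_BITS = 1024      # assumed signature width, for illustration only
N_SIGNATURES = 200  # assumed sample size, for illustration only

def random_signature(n_bits, rng):
    """Return a purely random n_bits-wide signature as a Python int."""
    return rng.getrandbits(n_bits)

def hamming(a, b):
    """Count the bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")

def pairwise_distances(signatures):
    """Hamming distance for every unordered pair of signatures."""
    return [hamming(a, b) for a, b in combinations(signatures, 2)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
sigs = [random_signature(N_BITS, rng) for _ in range(N_SIGNATURES)]
dists = pairwise_distances(sigs)
mean = sum(dists) / len(dists)
print(f"pairs={len(dists)} mean distance={mean:.1f} (expected about {N_BITS / 2})")
```

Signatures derived from real documents would be compared against this baseline: pairs whose distance falls well below half the width are the ones that carry similarity information.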

Crossref

Queensland University of Technology ePrints Archive

Compressed Bit-sliced Signature Files: An Index Structure for Large Lexicons

Authors: Can Fazli; Carterette Ben
Publication date: 01/04/1999

We use the signature file method to search for partially specified terms in large lexicons. To optimize efficiency, we use the concepts of the partially evaluated bit-sliced signature file method and memory resident data structures. Our system employs signature partitioning, compression, and term blocking. We derive equations to obtain system design parameters, and measure indexing efficiency in terms of time and space. The resulting approach provides good response time and is storage-efficient. In the experiments we use four different lexicons, and show that the signature file approach outperforms the inverted file approach in certain efficiency aspects. KEYWORDS: Lexicon search, n-grams, signature files
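
A bit-sliced signature file over n-grams, as named in this abstract, can be sketched roughly as follows. This is a simplified illustration, not the paper's system: it omits signature partitioning, compression, term blocking, and partial evaluation, and the width `WIDTH`, n-gram length `N`, the MD5-based hash, and the class name `BitSlicedIndex` are all assumptions. Each bit position is stored as one bitmap across all terms, so a query only reads the slices for the bits its own signature sets.

```python
import hashlib

WIDTH = 64  # signature width in bits (assumed for illustration)
N = 2       # n-gram length (assumed for illustration)

def ngrams(text, n=N):
    """Set of character n-grams contained in text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def bit_position(gram):
    """Hash an n-gram to one bit position in the signature."""
    digest = hashlib.md5(gram.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % WIDTH

def signature_bits(text):
    """Bit positions set in the superimposed signature of text."""
    return {bit_position(g) for g in ngrams(text)}

class BitSlicedIndex:
    def __init__(self, terms):
        self.terms = list(terms)
        # slices[j] is an int bitmap: bit i is set when term i sets bit j.
        self.slices = [0] * WIDTH
        for i, term in enumerate(self.terms):
            for j in signature_bits(term):
                self.slices[j] |= 1 << i

    def candidates(self, fragment):
        """AND the slices for the fragment's bits; may include false drops."""
        result = (1 << len(self.terms)) - 1
        for j in signature_bits(fragment):
            result &= self.slices[j]
        return [t for i, t in enumerate(self.terms) if result >> i & 1]

    def search(self, fragment):
        """Verify candidates against the lexicon to discard false drops."""
        return [t for t in self.candidates(fragment) if fragment in t]

index = BitSlicedIndex(["signature", "inverted", "lexicon", "compression"])
print(index.search("sign"))
```

Because every n-gram of a matching term sets its bit, the slice intersection never misses a true match; hash collisions can only add false drops, which the final substring check removes.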

Scholarly Commons @ MiamiOH (Miami University)

Tile-based Image Visual Codewords Extraction for Efficient Indexing and Retrieval

Authors: Olfa Nasraoui; Zhiyong Zhang
Publication venue: IntechOpen
Publication date: 01/02/2010

IntechOpen

Application of Information Retrieval Techniques to Heterogeneous Databases in the Virtual Distributed Laboratory

Author: Lykins Rodney D.
Publication venue: AFIT Scholar
Publication date: 01/03/2002

The Department of Defense (DoD) maintains thousands of Synthetic Aperture Radar (SAR), Infrared (IR), Hyper-Spectral intelligence imagery and Electro-Optical (EO) target signature data. These images are essential to evaluating and testing individual algorithm methodologies and development techniques within the Automatic Target Recognition (ATR) community. The Air Force Research Laboratory Sensors Directorate (AFRL/SN) has proposed the Virtual Distributed Laboratory (VDL) to maintain a central collection of the associated imagery metadata and a query mechanism to retrieve the desired imagery. All imagery metadata is stored in relational database format for access from agencies throughout the federal government and large civilian universities. Each set of imagery is independently maintained at each agency's location along with a local copy of the associated metadata that is periodically updated and sent to the VDL. This research focuses on applying information retrieval techniques to the multiple heterogeneous imagery metadata databases to present users the most relevant images based on user-defined search criteria. More specifically, it defines a hierarchical concept thesaurus development methodology to handle the complexities of heterogeneous databases and the application of two classic information retrieval models. The results indicate this type of thesaurus-based approach can significantly increase the precision and recall levels of retrieving relevant documents.

AFIT Scholar (Air Force Institute of Technology)

A Performance Study of Three Disk-based Structures for Indexing and Querying Frequent Itemsets

Authors: Liu Guimei; Suchitra Andre; Wong Limsoon
Publication venue: VLDB Endowment
Publication date: 01/05/2013

Proceedings of the VLDB Endowment, 6(7), 505-51

CiteSeerX

ScholarBank@NUS