Document retrieval hacks
Authors
Simon J. Puglisi
Bella Zhukova
Publication date
1 January 2021
Publisher
Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing
Abstract
Publisher Copyright: © Simon J. Puglisi and Bella Zhukova; licensed under Creative Commons License CC-BY 4.0. 19th International Symposium on Experimental Algorithms (SEA 2021).
Given a collection of strings, document listing is the problem of finding all the strings (or documents) in which a given query string (or pattern) appears. Index data structures that support efficient document listing for string collections have been the focus of intense research over the last decade, with dozens of papers describing exotic and elegant compressed data structures. The problem is now quite well understood in theory, and many of the solutions have been implemented and evaluated experimentally. A particular recent focus has been on highly repetitive document collections, which have become prevalent in many areas (version control systems and genomics, to name just two very different sources). The aim of this paper is to describe simple and efficient document listing algorithms that can be used in combination with more sophisticated techniques, or as baselines against which the performance of new document listing indexes can be measured. Our approaches are based on simple combinations of scanning and hashing, which we show to combine very well with dictionary compression to achieve small space usage. Our experiments show these methods to be often much faster, and to use less space, than the best specialized indexes for the problem.
Peer reviewed.
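To make the problem statement concrete, the following is a minimal Python sketch of document listing, first by plain scanning and then with a hash table used as a candidate filter. It is only an illustration of the general idea of combining scanning and hashing, not the algorithms from the paper: the function names, the q-gram table, and the parameter `q` are all assumptions introduced here, and no dictionary compression is attempted.

```python
from collections import defaultdict


def list_documents_scan(docs, pattern):
    """Naive baseline: scan every document and report those containing pattern."""
    return [i for i, doc in enumerate(docs) if pattern in doc]


def build_qgram_table(docs, q):
    """Hypothetical helper: map each length-q substring to the ids of documents containing it."""
    table = defaultdict(set)
    for i, doc in enumerate(docs):
        for j in range(len(doc) - q + 1):
            table[doc[j:j + q]].add(i)
    return table


def list_documents_hashed(docs, table, q, pattern):
    """Use the hash table to pick candidate documents, then verify each by scanning."""
    if len(pattern) < q:
        # The table cannot help with patterns shorter than q; fall back to a full scan.
        return list_documents_scan(docs, pattern)
    candidates = table.get(pattern[:q], set())
    return sorted(i for i in candidates if pattern in docs[i])


docs = ["abracadabra", "cadabra", "banana"]
table = build_qgram_table(docs, 3)
print(list_documents_scan(docs, "cad"))
print(list_documents_hashed(docs, table, 3, "cad"))
```

Both functions return the same document identifiers for any pattern; the hashed variant simply avoids scanning documents that cannot contain the first `q` characters of the pattern.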
Available Versions
Dagstuhl Research Online Publication Server
oai:drops-oai.dagstuhl.de:1378...
Last time updated on 11/06/2021
Helsingin yliopiston digitaalinen arkisto
oai:helda.helsinki.fi:10138/35...
Last time updated on 12/03/2023