Improved ESP-index: a practical self-index for highly repetitive texts

F. Claude; G. Navarro; J. Barbay; J.I. Munro; K. Goto; O. Delpratt; S. Maruyama; T. Gagie; T. Yamamoto

Publication date: 1 January 2014

Abstract

While several self-indexes for highly repetitive texts exist, developing a practical self-index applicable to real-world repetitive texts remains a challenge. ESP-index is a grammar-based self-index built on the notion of edit-sensitive parsing (ESP), an efficient parsing algorithm that guarantees upper bounds on the parsing discrepancies between different appearances of the same subtext in a text. Although ESP-index performs efficient top-down searches of query texts, it relies on binary searches to find the appearances of variables for a query text, which slows down query searches. We present an improved ESP-index (ESP-index-I) that leverages the idea behind succinct data structures for large alphabets. While ESP-index-I keeps the same efficiency as ESP-index for top-down searches, it avoids the binary searches by using fast rank/select operations. We experimentally test the ability of ESP-index-I to search query texts and extract subtexts from real-world repetitive texts on a large scale, and we show that ESP-index-I performs better than other possible approaches.

Comment: This is the full version of a paper accepted to the 11th International Symposium on Experimental Algorithms (SEA2014).
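
For readers unfamiliar with the rank/select operations mentioned in the abstract, the following is a minimal illustrative sketch on a plain bit vector. It is not the data structure used by ESP-index-I: the class name BitVector and the methods rank1 and select1 are hypothetical names chosen for this example, and the precomputed tables are not succinct. It only shows what the two queries return and that each is answered by a table lookup rather than a binary search.

```python
class BitVector:
    """Plain bit vector with precomputed lookup tables.

    Illustrative only: a real succinct structure stores far smaller
    auxiliary tables. All names here are assumptions for this sketch.
    """

    def __init__(self, bits):
        self.bits = list(bits)
        # prefix[i] = number of 1s in bits[0:i]
        self.prefix = [0]
        for b in self.bits:
            self.prefix.append(self.prefix[-1] + b)
        # ones[k - 1] = position of the k-th 1
        self.ones = [i for i, b in enumerate(self.bits) if b]

    def rank1(self, i):
        """Return the number of 1s in bits[0:i]."""
        return self.prefix[i]

    def select1(self, k):
        """Return the position of the k-th 1 (k is 1-based)."""
        return self.ones[k - 1]


bv = BitVector([1, 0, 1, 1, 0, 0, 1])
print(bv.rank1(4))    # number of 1s among the first four bits
print(bv.select1(4))  # position of the fourth 1
```

The two queries are inverses in the sense that rank1(select1(k) + 1) equals k for every valid k, which is the property that lets an index jump directly to the k-th occurrence of an item instead of searching for it.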

Crossref

info:doi/10.1007%2F978-3-319-0...
