35 research outputs found

    From Theory to Practice: Plug and Play with Succinct Data Structures

    Full text link
    Engineering efficient implementations of compact and succinct structures is a time-consuming and challenging task, since there is no standard library of easy-to- use, highly optimized, and composable components. One consequence is that measuring the practical impact of new theoretical proposals is a difficult task, since older base- line implementations may not rely on the same basic components, and reimplementing from scratch can be very time-consuming. In this paper we present a framework for experimentation with succinct data structures, providing a large set of configurable components, together with tests, benchmarks, and tools to analyze resource requirements. We demonstrate the functionality of the framework by recomposing succinct solutions for document retrieval.Comment: 10 pages, 4 figures, 3 table

    Online Pattern Matching for String Edit Distance with Moves

    Full text link
    Edit distance with moves (EDM) is a string-to-string distance measure that includes substring moves in addition to ordinal editing operations to turn one string to the other. Although optimizing EDM is intractable, it has many applications especially in error detections. Edit sensitive parsing (ESP) is an efficient parsing algorithm that guarantees an upper bound of parsing discrepancies between different appearances of the same substrings in a string. ESP can be used for computing an approximate EDM as the L1 distance between characteristic vectors built by node labels in parsing trees. However, ESP is not applicable to a streaming text data where a whole text is unknown in advance. We present an online ESP (OESP) that enables an online pattern matching for EDM. OESP builds a parse tree for a streaming text and computes the L1 distance between characteristic vectors in an online manner. For the space-efficient computation of EDM, OESP directly encodes the parse tree into a succinct representation by leveraging the idea behind recent results of a dynamic succinct tree. We experimentally test OESP on the ability to compute EDM in an online manner on benchmark datasets, and we show OESP's efficiency.Comment: This paper has been accepted to the 21st edition of the International Symposium on String Processing and Information Retrieval (SPIRE2014

    Efficient and Compact Representations of Some Non-canonical Prefix-Free Codes

    Get PDF
    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-46049-9_5[Abstract] For many kinds of prefix-free codes there are efficient and compact alternatives to the traditional tree-based representation. Since these put the codes into canonical form, however, they can only be used when we can choose the order in which codewords are assigned to characters. In this paper we first show how, given a probability distribution over an alphabet of σσ characters, we can store a nearly optimal alphabetic prefix-free code in o(σ)o(σ) bits such that we can encode and decode any character in constant time. We then consider a kind of code introduced recently to reduce the space usage of wavelet matrices (Claude, Navarro, and Ordóñez, Information Systems, 2015). They showed how to build an optimal prefix-free code such that the codewords’ lengths are non-decreasing when they are arranged such that their reverses are in lexicographic order. We show how to store such a code in O(σlogL+2ϵL)O(σlog⁡L+2ϵL) bits, where L is the maximum codeword length and ϵϵ is any positive constant, such that we can encode and decode any character in constant time under reasonable assumptions. Otherwise, we can always encode and decode a codeword of ℓℓ bits in time O(ℓ)O(ℓ) using O(σlogL)O(σlog⁡L) bits of space.Ministerio de Economía, Industria y Competitividad; TIN2013-47090-C3-3-PMinisterio de Economía, Industria y Competitividad; TIN2015-69951-RMinisterio de Economía, Industria y Competitividad; ITC-20151305Ministerio de Economía, Industria y Competitividad; ITC-20151247Xunta de Galicia; GRC2013/053Chile. Núcleo Milenio Información y Coordinación en Redes; ICM/FIC.P10-024FCOST. IC1302Academy of Finland; 268324Academy of Finland; 25034

    Priority queues and sorting for read-only data

    Get PDF
    Abstract. We revisit the random-access-machine model in which the input is given on a read-only random-access media, the output is to be produced to a write-only sequential-access media, and in addition there is a limited random-access workspace. The length of the input is N elements, the length of the output is limited by the computation itself, and the capacity of the workspace is O(S + w) bits, where S is a parameter specified by the user and w is the number of bits per machine word. We present a state-of-the-art priority queue-called an adjustable navigation pile-for this model. Under some reasonable assumptions, our priority queue supports minimum and insert in O(1) worst-case time and extract in O(N/S +lg S) worst-case time, where lg N ≤ S ≤ N/ lg N . We also show how to use this data structure to simplify the existing optimal O(N 2 /S + N lg S)-time sorting algorithm for this model

    Retrieval and Perfect Hashing Using Fingerprinting

    Get PDF

    Memory-Adjustable Navigation Piles with Applications to Sorting and Convex Hulls

    Get PDF
    We consider space-bounded computations on a random-access machine (RAM) where the input is given on a read-only random-access medium, the output is to be produced to a write-only sequential-access medium, and the available workspace allows random reads and writes but is of limited capacity. The length of the input is NN elements, the length of the output is limited by the computation, and the capacity of the workspace is O(S)O(S) bits for some predetermined parameter SS. We present a state-of-the-art priority queue---called an adjustable navigation pile---for this restricted RAM model. Under some reasonable assumptions, our priority queue supports minimum\mathit{minimum} and insert\mathit{insert} in O(1)O(1) worst-case time and extract\mathit{extract} in O(N/S+lgS)O(N/S + \lg{} S) worst-case time for any SlgNS \geq \lg{} N. We show how to use this data structure to sort NN elements and to compute the convex hull of NN points in the two-dimensional Euclidean space in O(N2/S+NlgS)O(N^2/S + N \lg{} S) worst-case time for any SlgNS \geq \lg{} N. Following a known lower bound for the space-time product of any branching program for finding unique elements, both our sorting and convex-hull algorithms are optimal. The adjustable navigation pile has turned out to be useful when designing other space-efficient algorithms, and we expect that it will find its way to yet other applications.Comment: 21 page
    corecore