Search CORE

9,440 research outputs found

Managing Unbounded-Length Keys in Comparison-Driven Data Structures with Applications to On-Line Indexing

Author: Amir Amihood
Franceschini Gianni
Grossi Roberto
Kopelowitz Tsvi
Lewenstein Moshe
Lewenstein Noa
Publication venue
Publication date: 03/06/2013
Field of study

This paper presents a general technique for optimally transforming any dynamic data structure that operates on atomic and indivisible keys by constant-time comparisons, into a data structure that handles unbounded-length keys whose comparison cost is not a constant. Examples of these keys are strings, multi-dimensional points, multiple-precision numbers, multi-key data (e.g.~records), XML paths, URL addresses, etc. The technique is more general than what has been done in previous work as no particular exploitation of the underlying structure of is required. The only requirement is that the insertion of a key must identify its predecessor or its successor. Using the proposed technique, online suffix tree can be constructed in worst case time

O(\log n)

per input symbol (as opposed to amortized

O(\log n)

time per symbol, achieved by previously known algorithms). To our knowledge, our algorithm is the first that achieves

O(\log n)

worst case time per input symbol. Searching for a pattern of length

m

in the resulting suffix tree takes

O(\min(m\log |\Sigma|, m + \log n) + tocc)

time, where

tocc

is the number of occurrences of the pattern. The paper also describes more applications and show how to obtain alternative methods for dealing with suffix sorting, dynamic lowest common ancestors and order maintenance

arXiv.org e-Print Archive

Archivio della Ricerca - Università di Pisa

Archivio della ricerca- Università di Roma La Sapienza

LRM-Trees: Compressed Indices, Adaptive Sorting, and Compressed Permutations

Author: Barbay Jérémy
Fischer Johannes
Publication venue
Publication date: 29/09/2010
Field of study

LRM-Trees are an elegant way to partition a sequence of values into sorted consecutive blocks, and to express the relative position of the first element of each block within a previous block. They were used to encode ordinal trees and to index integer arrays in order to support range minimum queries on them. We describe how they yield many other convenient results in a variety of areas, from data structures to algorithms: some compressed succinct indices for range minimum queries; a new adaptive sorting algorithm; and a compressed succinct data structure for permutations supporting direct and indirect application in time all the shortest as the permutation is compressible.Comment: 13 pages, 1 figur

arXiv.org e-Print Archive

CiteSeerX

Compressed Representations of Permutations, and Applications

Author: Barbay Jérémy
Navarro Gonzalo
Publication venue
Publication date: 01/01/2008
Field of study

We explore various techniques to compress a permutation

\pi

over n integers, taking advantage of ordered subsequences in

\pi

, while supporting its application

\pi

(i) and the application of its inverse

\pi^{-1}(i)

in small time. Our compression schemes yield several interesting byproducts, in many cases matching, improving or extending the best existing results on applications such as the encoding of a permutation in order to support iterated applications

\pi^k(i)

of it, of integer functions, and of inverted lists and suffix arrays

arXiv.org e-Print Archive

CiteSeerX

Dagstuhl Research Online Publication Server

A framework for space-efficient string kernels

Author: A Apostolico
A Apostolico
AJ Smola
AM İleri
B Chor
D Belazzougui
G Reinert
GE Sims
J Herold
J Qi
J Shawe-Taylor
M Crochemore
R Chikhi
S Chairungsee
Publication venue
Publication date: 23/02/2015
Field of study

String kernels are typically used to compare genome-scale sequences whose length makes alignment impractical, yet their computation is based on data structures that are either space-inefficient, or incur large slowdowns. We show that a number of exact string kernels, like the

k

-mer kernel, the substrings kernels, a number of length-weighted kernels, the minimal absent words kernel, and kernels with Markovian corrections, can all be computed in

O(nd)

time and in

o(n)

bits of space in addition to the input, using just a

\mathtt{rangeDistinct}

data structure on the Burrows-Wheeler transform of the input strings, which takes

O(d)

time per element in its output. The same bounds hold for a number of measures of compositional complexity based on multiple value of

k

, like the

k

-mer profile and the

k

-th order empirical entropy, and for calibrating the value of

k

using the data

arXiv.org e-Print Archive

Crossref

Locating regions in a sequence under density constraints

Author: Benjamin A. Burton
Boztaş S.
Greenberg R. I.
Huang X.
Lin Y.-L.
Mathias Hiron
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2013
Field of study

Several biological problems require the identification of regions in a sequence where some feature occurs within a target density range: examples including the location of GC-rich regions, identification of CpG islands, and sequence matching. Mathematically, this corresponds to searching a string of 0s and 1s for a substring whose relative proportion of 1s lies between given lower and upper bounds. We consider the algorithmic problem of locating the longest such substring, as well as other related problems (such as finding the shortest substring or a maximal set of disjoint substrings). For locating the longest such substring, we develop an algorithm that runs in O(n) time, improving upon the previous best-known O(n log n) result. For the related problems we develop O(n log log n) algorithms, again improving upon the best-known O(n log n) results. Practical testing verifies that our new algorithms enjoy significantly smaller time and memory footprints, and can process sequences that are orders of magnitude longer as a result.Comment: 17 pages, 8 figures; v2: minor revisions, additional explanations; to appear in SIAM Journal on Computin

arXiv.org e-Print Archive

CiteSeerX

Crossref

University of Queensland eSpace