296 research outputs found

    Cache-oblivious index for approximate string matching

    This paper revisits the problem of indexing a text for approximate string matching. Specifically, given a text T of length n and a positive integer k, we want to construct an index of T such that for any input pattern P, we can find all its k-error matches in T efficiently. This problem is well studied in the internal-memory setting. Here, we extend some of these recent results to external-memory solutions, which are also cache-oblivious. Our first index occupies O((n log^k n)/B) disk pages and finds all k-error matches with O((|P| + occ)/B + log^k n log log_B n) I/Os, where B denotes the number of words in a disk page. To the best of our knowledge, this index is the first external-memory data structure that does not require Ω(|P| + occ + poly(log n)) I/Os. The second index reduces the space to O((n log n)/B) disk pages, and the I/O complexity is O((|P| + occ)/B + log^(k(k+1)) n log log n). © 2011 Elsevier B.V. All rights reserved.
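
    To make the notion of a k-error match concrete, the following is a minimal sketch of the classical online dynamic-programming scan (often attributed to Sellers), not the index described in the abstract. It assumes "k errors" means edit distance at most k; the function name `k_error_match_ends` is hypothetical and chosen for illustration only.

```python
def k_error_match_ends(text: str, pattern: str, k: int) -> list[int]:
    """Return 0-based end positions j such that some substring of `text`
    ending at j is within edit distance k of `pattern`.

    Brute-force O(|text| * |pattern|) scan; an illustration of the problem
    definition only, not an index and not cache-oblivious.
    """
    m = len(pattern)
    # col[i] = min edit distance between pattern[:i] and any substring of
    # text ending at the current text position (initially: the empty text).
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(text):
        prev_diag = col[0]  # D[i-1][j-1] for i = 1
        col[0] = 0          # a match may start at any text position
        for i in range(1, m + 1):
            old = col[i]    # D[i][j-1], needed as the next diagonal value
            cost = 0 if pattern[i - 1] == c else 1
            col[i] = min(prev_diag + cost,  # match or substitution
                         old + 1,           # extra text character
                         col[i - 1] + 1)    # missing pattern character
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends


exact = k_error_match_ends("abcabc", "abc", 0)
approx = k_error_match_ends("abcdef", "bxd", 1)
```

    With k = 0 the scan degenerates to exact matching; the indexes in the abstract aim to answer the same question without scanning the whole text.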

    04091 Abstracts Collection -- Data Structures

    From 22.02. to 27.02.2004, the Dagstuhl Seminar "Data Structures" was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar are collected in this paper. The first section describes the seminar topics and goals in general.

    Anti-Persistence on Persistent Storage: History-Independent Sparse Tables and Dictionaries

    We present history-independent alternatives to a B-tree, the primary indexing data structure used in databases. A data structure is history independent (HI) if it is impossible to deduce any information by examining the bit representation of the data structure that is not already available through the API. We show how to build a history-independent cache-oblivious B-tree and a history-independent external-memory skip list. One of the main contributions is a data structure we build on the way: a history-independent packed-memory array (PMA). The PMA supports efficient range queries, one of the most important operations for answering database queries. Our HI PMA matches the asymptotic bounds of prior non-HI packed-memory arrays and sparse tables. Specifically, a PMA maintains a dynamic set of elements in sorted order in a linear-sized array. Inserts and deletes take an amortized O(log^2 N) element moves with high probability. Simple experiments with our implementation of HI PMAs corroborate our theoretical analysis. Comparisons to regular PMAs give preliminary indications that the practical cost of adding history independence is not too large. Our HI cache-oblivious B-tree bounds match those of prior non-HI cache-oblivious B-trees. Searches take O(log_B N) I/Os; inserts and deletes take O((log^2 N)/B + log_B N) amortized I/Os with high probability; and range queries returning k elements take O(log_B N + k/B) I/Os. Our HI external-memory skip list achieves optimal bounds with high probability, analogous to in-memory skip lists: O(log_B N) I/Os for point queries and amortized O(log_B N) I/Os for inserts/deletes. Range queries returning k elements run in O(log_B N + k/B) I/Os. In contrast, the best possible high-probability bound for inserting into the folklore B-skip list, which promotes elements with probability 1/B, is just Θ(log N) I/Os. This is no better than the bound one gets from running an in-memory skip list in external memory.

    Optimal Hashing in External Memory

    Hash tables are a ubiquitous class of dictionary data structures. However, standard hash table implementations do not translate well into the external-memory model, because they do not incorporate locality for insertions. Iacono and Patrascu established an update/query tradeoff curve for external-memory hash tables: a hash table that performs insertions in O(lambda/B) amortized I/Os requires Omega(log_lambda N) expected I/Os for queries, where N is the number of items that can be stored in the data structure, B is the size of a memory transfer, M is the size of memory, and lambda is a tuning parameter. They provide a complicated hashing data structure, which we call the IP hash table, that meets this curve for lambda that is Omega(log log M + log_M N). In this paper, we present a simpler external-memory hash table, the Bundle of Arrays Hash Table (BOA), that is optimal for a narrower range of lambda. The simplicity of BOAs allows them to be readily modified to achieve the following results:
    - A new external-memory data structure, the Bundle of Trees Hash Table (BOT), that matches the performance of the IP hash table while retaining some of the simplicity of the BOAs.
    - The Cache-Oblivious Bundle of Trees Hash Table (COBOT), the first cache-oblivious hash table. This data structure matches the optimality of BOTs and IP hash tables over the same range of lambda.

    Non-Overlapping Indexing - Cache Obliviously

    The non-overlapping indexing problem is defined as follows: preprocess a given text T[1,n] of length n into a data structure such that whenever a pattern P[1,p] comes as an input, we can efficiently report the largest set of non-overlapping occurrences of P in T. The best known solution is by Cohen and Porat [ISAAC, 2009]. Their index size is O(n) words and query time is optimal O(p + nocc), where nocc is the output size. We study this problem in the cache-oblivious model and present a new data structure of size O(n log n) words. It can answer queries in optimal O(p/B + log_B n + nocc/B) I/Os, where B is the block size.
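
    As an illustration of what the query must report, here is a minimal brute-force sketch that scans the text rather than using an index. It relies on the standard observation that, because all occurrences of P have the same length, greedily taking the leftmost occurrence and then skipping past it yields a largest non-overlapping set. The function name `max_nonoverlapping_occurrences` is hypothetical, and this is not the data structure from the abstract.

```python
def max_nonoverlapping_occurrences(text: str, pattern: str) -> list[int]:
    """Return 0-based start positions of a largest set of pairwise
    non-overlapping occurrences of `pattern` in `text`.

    Greedy left-to-right selection: take the leftmost occurrence, then
    resume searching immediately after its last character.
    """
    p = len(pattern)
    if p == 0:
        return []  # an empty pattern has no meaningful occurrences here
    chosen = []
    start = text.find(pattern)
    while start != -1:
        chosen.append(start)
        # Any occurrence starting before start + p would overlap the one
        # just chosen, so the search resumes at start + p.
        start = text.find(pattern, start + p)
    return chosen


runs = max_nonoverlapping_occurrences("aaaaa", "aa")
periodic = max_nonoverlapping_occurrences("abababab", "abab")
```

    In "aaaaa" the pattern "aa" occurs at four positions, but at most two of them can be chosen without overlap.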

    Compressing dictionaries of strings

    The aim of this work is to develop a data structure capable of storing a set of strings in compressed form while supporting access to, and prefix search over, any string in the set. The notion of a string will be formally defined in this work, but it is enough to think of a string as a stream of characters or variable-length data. We will prove that the data structure devised in our work can search for prefixes of the stored strings very efficiently, hence giving a performant solution to one of the most discussed problems of our age. In the discussion of our data structure, particular emphasis will be given to both space and time efficiency, and a tradeoff between the two will be sought throughout. To understand how important string-based data structures are, think about modern search engines and social networks: they must continuously store and process immense streams of data, which are mainly strings, while the output of such processing must be available within a few milliseconds so as not to try the patience of the user. Space efficiency is one of the main concerns in this kind of problem. In order to satisfy real-time latency bounds, the largest possible amount of data must be stored in the highest levels of the memory hierarchy. Moreover, data compression saves money because it reduces the amount of physical memory needed to store abstract data, and this is particularly important since storage is the main source of expenditure in modern systems.
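
    The prefix-search operation mentioned above can be sketched on a plain, uncompressed sorted list. This is only a baseline under that assumption: it shows the query semantics, not the compressed structure the work develops, and the name `prefix_search` is hypothetical.

```python
from bisect import bisect_left


def prefix_search(sorted_strings: list[str], prefix: str) -> list[str]:
    """Return every string in `sorted_strings` that starts with `prefix`.

    Assumes `sorted_strings` is sorted lexicographically. Strings sharing
    a prefix form one contiguous run, whose first element is located by
    binary search.
    """
    lo = bisect_left(sorted_strings, prefix)
    hi = lo
    # Extend the run while the prefix still matches.
    while hi < len(sorted_strings) and sorted_strings[hi].startswith(prefix):
        hi += 1
    return sorted_strings[lo:hi]


words = sorted(["dog", "care", "cat", "car", "card"])
hits = prefix_search(words, "car")
```

    Because the matching strings are adjacent in sorted order, they also share long common prefixes, which is the kind of redundancy a compressed dictionary can exploit.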