Search CORE

10 research outputs found

Pseudo-random graphs and bit probe schemes with one-sided error

Author: Romashchenko Andrei
Publication date: 01/01/2011

We study probabilistic bit-probe schemes for the membership problem. Given a set A of at most n elements from a universe of size m, we construct a data structure such that queries of the type "Is x in A?" can be answered very quickly. H. Buhrman, P. B. Miltersen, J. Radhakrishnan, and S. Venkatesh proposed a bit-probe scheme based on expanders. Their scheme needs O(n\log m) bits of space and reads only one randomly chosen bit from memory to answer a query. The answer is correct with high probability, with two-sided error. In this paper we show that for the same problem there exists a bit-probe scheme with one-sided error that needs O(n\log^2 m+\mathrm{poly}(\log m)) bits of space. The difference from the model of Buhrman, Miltersen, Radhakrishnan, and Venkatesh is that we consider a bit-probe scheme with an auxiliary word. This means that in our scheme the memory is split into two parts of different sizes: the main storage of O(n\log^2 m) bits and a short word of \log^{O(1)}m bits that is precomputed once for the stored set A and "cached". To answer a query "Is x in A?", we allow reading the whole cached word and only one bit from the main storage. For some reasonable values of the parameters, our space bound is better than what can be achieved by any scheme without cached data.

Comment: 19 pages

arXiv.org e-Print Archive
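The query model in this abstract (answer "Is x in A?" by reading a single randomly chosen bit, with error on one side only) can be illustrated with a deliberately naive scheme. The Python sketch below is not the construction from the paper: it has no cached auxiliary word and makes no attempt to reach the O(n\log^2 m) space bound. The names `neighbors` and `OneSidedBitProbe` and the parameters `num_bits` and `d` are hypothetical, and a SHA-256-based hash stands in for a pseudo-random bipartite graph.

```python
import hashlib
import random


def neighbors(x, d, num_bits):
    """Hypothetical pseudo-random bipartite graph: the d right-side neighbors of x."""
    result = []
    for j in range(d):
        digest = hashlib.sha256(f"{x}:{j}".encode()).digest()
        result.append(int.from_bytes(digest[:8], "little") % num_bits)
    return result


class OneSidedBitProbe:
    """Naive one-probe membership scheme that never rejects a member of A."""

    def __init__(self, items, num_bits, d):
        self.num_bits = num_bits
        self.d = d
        self.bits = [0] * num_bits  # main storage, one list entry per bit
        for x in items:
            # Set every neighbor of every member of A.
            for pos in neighbors(x, d, num_bits):
                self.bits[pos] = 1

    def query(self, x, rng=random):
        # Computing neighbors is free in the bit-probe model; only the
        # memory access below counts, and it reads exactly one bit.
        pos = rng.choice(neighbors(x, self.d, self.num_bits))
        return self.bits[pos] == 1

    def error_probability(self, x):
        # For x not in A: probability that query(x) wrongly answers True.
        nbrs = neighbors(x, self.d, self.num_bits)
        return sum(self.bits[p] for p in nbrs) / self.d


scheme = OneSidedBitProbe(range(20), num_bits=4096, d=8)
print(scheme.query(5))  # always True: 5 is a member
```

Members of A always get the answer "yes", because all of their neighbors are set. A non-member is wrongly accepted with probability equal to the fraction of its d neighbors that some member also set. Guaranteeing that this fraction is small for every non-member is the hard part of one-sided schemes; the abstract reports that with a cached auxiliary word it is possible in O(n\log^2 m+\mathrm{poly}(\log m)) bits.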

Succinct List Indexing in Optimal Time

Author: Holland William L.
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 33rd International Symposium on Algorithms and Computation (ISAAC 2022)
Publication date: 01/01/2022

Dagstuhl Research Online Publication Server

Dynamic Rank/Select Dictionaries with Applications to XML Indexing

Author: Gupta Ankur
Hon Wing-Kai
Shah Rahul
Vitter Jeffrey S.
Publication venue: Purdue University (bepress)
Publication date: 11/07/2006

We consider a central problem in text indexing: given a text T over an alphabet Σ, construct a compressed data structure answering the queries char(i), rank_s(i), and select_s(i) for a symbol s ∈ Σ. Many data structures consider these queries for static text T [GGV03, FM01, SG06, GMR06]. We consider the dynamic version of the problem, where we are allowed to insert and delete symbols at arbitrary positions of T. This problem is a key challenge in compressed text indexing and has direct application to dynamic XML indexing structures that answer subpath queries [FLMM05]. We build on the results of [RRR02, GMR06] and give the best known query bounds for the dynamic version of this problem, supporting arbitrary insertions and deletions of symbols in T. Specifically, with an amortized update time of O((1/ε) n^ε), we suggest how to support rank_s(i), select_s(i), and char(i) queries in O((1/ε) log log n) time, for any ε < 1. The best previous query times for this problem were O(log n log |Σ|), given by [MN06]. Our bounds are competitive with state-of-the-art static structures [GMR06]. Some applicable lower bounds for the partial sums problem [PD06] show that our update/query tradeoff is also nearly optimal. In addition, our space bound is competitive with the corresponding static structures. For the special case of bitvectors (i.e., |Σ| = 2), we also show the best tradeoffs for query/update time, improving upon the results of [MN06, HSS03, RRR02]. Finally, our focus on fast queries and slower updates is well suited to a query-intensive XML indexing environment. Using the XBW transform [FLMM05], we also present a dynamic data structure that succinctly maintains an ordered labeled tree T and supports a powerful set of queries on T.

CiteSeerX

Purdue E-Pubs
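For the bitvector special case (|Σ| = 2) mentioned in the abstract, the queries char(i), rank_s(i), and select_s(i) can be made concrete with a small static sketch. This is not the paper's dynamic structure: it supports no insertions or deletions. The class name `RankSelectBitvector`, the default block size of 64, and the conventions used (rank counts the 1s in the prefix [0, i); select is 1-based) are assumptions for illustration.

```python
class RankSelectBitvector:
    """Static bitvector with one cumulative 1-count sampled per block."""

    def __init__(self, bits, block=64):
        self.bits = [1 if b else 0 for b in bits]
        self.block = block
        # samples[k] = number of 1s in bits[0 : k * block)
        self.samples = [0]
        for start in range(0, len(self.bits), block):
            self.samples.append(self.samples[-1] + sum(self.bits[start:start + block]))

    def access(self, i):
        """char(i) for the binary alphabet."""
        return self.bits[i]

    def rank1(self, i):
        """Number of 1s in bits[0:i): one sample plus a scan inside one block."""
        k = i // self.block
        return self.samples[k] + sum(self.bits[k * self.block:i])

    def rank0(self, i):
        """Number of 0s in bits[0:i)."""
        return i - self.rank1(i)

    def select1(self, j):
        """Position of the j-th 1 (1-based), by binary search on rank1."""
        if not 1 <= j <= self.rank1(len(self.bits)):
            raise ValueError("j is out of range")
        lo, hi = 0, len(self.bits) - 1
        # Find the smallest p with rank1(p + 1) >= j.
        while lo < hi:
            mid = (lo + hi) // 2
            if self.rank1(mid + 1) >= j:
                hi = mid
            else:
                lo = mid + 1
        return lo


bv = RankSelectBitvector([1, 0, 1, 1, 0, 0, 1, 0, 1, 1], block=3)
print(bv.rank1(4), bv.select1(4))  # 3 6
```

Storing one cumulative count per block means rank only scans inside a single block. Practical implementations perform that scan with a hardware popcount on machine words and usually answer select with additional samples rather than a binary search over rank.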

Low Redundancy in Static Dictionaries with O(1) Worst Case Lookup Time

Author: A. Brodnik
A. Chi Chih Yao
D.R. Heath-Brown
F. Fich
J. P. Schmidt
M. L. Fredman
P. B. Miltersen
R. Endre Tarjan
Publication venue: Springer Science and Business Media LLC

Crossref

Librería de Estructuras de Datos Compactas en Rust (Compact Data Structures Library in Rust)

Author: Hermo González Jorge
Publication date: 01/01/2023

The exponential growth of data nowadays poses significant challenges in terms of efficient storage and processing. This undergraduate thesis focuses on the importance of compact data structures as a key solution for handling large-scale data. Unlike classical compression techniques, these structures allow operations on data without the need for complete decompression, saving time and memory space. This approach has become essential in fields such as information retrieval and bioinformatics due to the massive growth of data. The Rust programming language, known for its safety, automatic memory management, and efficiency, has become one of the preferred options for innovation and modernization in the technology industry. Given the absence of a compact data structures library in Rust that is competitive with the state of the art in other programming languages, this project will leverage the advantages of the language to develop an open-source compact data structures library. This will provide the scientific community and Rust developers with a flexible, powerful, and easy-to-use tool for their projects. Furthermore, this undergraduate thesis aims to promote reproducibility, reuse, and progress in research in the field of compact data structures. In this way, it will contribute to the expansion and adoption of Rust in research and to the development of efficient and reliable scientific software.

Undergraduate thesis (Traballo fin de grao, UDC.FIC). Computer Engineering (Enxeñaría Informática). Academic year 2022/202

Repositorio da Universidade da Coruña

Scalable succinct indexing for large text collections

Author: Petri M
Publication venue: RMIT University
Publication date: 01/01/2013

Self-indexes save space by emulating operations of traditional data structures using basic operations on bitvectors. Succinct text indexes provide the full-text search functionality traditionally provided by suffix trees and suffix arrays, while using space equivalent to the compressed representation of the text. Succinct text indexes can therefore provide full-text search over inputs much larger than is viable with traditional uncompressed suffix-based data structures. Fields such as Information Retrieval involve the processing of massive text collections. However, the in-memory space requirements of succinct text indexes during construction have hampered their adoption for large text collections. One promising approach to supporting larger data sets is to avoid constructing the full suffix array by using alternative indexing representations. This thesis focuses on several aspects related to the scalability of text indexes to larger data sets. We identify practical improvements in the core building blocks of all succinct text indexing algorithms, and subsequently improve index performance on large data sets. We evaluate our findings using several standard text collections and demonstrate: (1) the practical applications of our improved indexing techniques; and (2) that succinct text indexes are a practical alternative to inverted indexes for a variety of top-k ranked document retrieval problems.

RMIT Research Repository
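The abstract's observation that self-indexes emulate suffix-array search through basic rank operations can be made concrete with FM-index backward search, a standard technique in this area. The Python sketch below is an illustration under stated assumptions, not the thesis's implementation: the Burrows-Wheeler transform is built naively from sorted rotations, a NUL sentinel character is assumed not to occur in the text, and `rank` is a plain linear scan standing in for a succinct rank structure. The names `bwt` and `FMIndex` are hypothetical.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (quadratic; illustration only)."""
    s = text + "\0"  # sentinel: assumed absent from text and smaller than every symbol
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)


class FMIndex:
    def __init__(self, text):
        self.L = bwt(text)
        # C[c] = number of symbols (sentinel included) smaller than c.
        counts = {}
        for ch in self.L:
            counts[ch] = counts.get(ch, 0) + 1
        self.C, total = {}, 0
        for ch in sorted(counts):
            self.C[ch] = total
            total += counts[ch]

    def rank(self, c, i):
        """Occurrences of c in L[0:i); a succinct index answers this from compressed bitvectors."""
        return self.L.count(c, 0, i)

    def count(self, pattern):
        """Number of (possibly overlapping) occurrences of pattern in the text."""
        lo, hi = 0, len(self.L)  # current suffix-array interval [lo, hi)
        for c in reversed(pattern):
            if c not in self.C:
                return 0
            lo = self.C[c] + self.rank(c, lo)
            hi = self.C[c] + self.rank(c, hi)
            if lo >= hi:
                return 0
        return hi - lo


idx = FMIndex("abracadabra")
print(idx.count("abra"))  # 2
```

Each step of the search loop uses only the array C and rank queries on the transformed text, so replacing `str.count` with a rank structure over a compressed representation of L (for example, a wavelet tree) yields the space savings described in the abstract without changing the search logic.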

LIPIcs, Volume 248, ISAAC 2022, Complete Volume

Author: Bae Sang Won
Park Heejin
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 33rd International Symposium on Algorithms and Computation (ISAAC 2022)
Publication date: 01/01/2022


Dagstuhl Research Online Publication Server