10 research outputs found
Pseudo-random graphs and bit probe schemes with one-sided error
We study probabilistic bit-probe schemes for the membership problem. Given a
set A of at most n elements from a universe of size m, we organize a data
structure such that queries of the type "Is x in A?" can be answered very quickly.
H. Buhrman, P. B. Miltersen, J. Radhakrishnan, and S. Venkatesh proposed a bit-probe
scheme based on expanders. Their scheme needs space of O(n\log m) bits and
requires reading only one randomly chosen bit from the memory to answer a
query. The answer is correct with high probability, with two-sided error. In
this paper we show that for the same problem there exists a bit-probe scheme
with one-sided error that needs space of O(n\log^2 m+\poly(\log m)) bits. The
difference with the model of Buhrman, Miltersen, Radhakrishnan, and Venkatesh
is that we consider a bit-probe scheme with an auxiliary word. This means that
in our scheme the memory is split into two parts of different sizes: the main
storage of O(n\log^2 m) bits and a short word of \poly(\log m) bits that is
pre-computed once for the stored set A and "cached". To answer a query "Is x in
A?" we allow reading the whole cached word and only one bit from the main
storage. For some reasonable values of the parameters, our space bound is better
than what can be achieved by any scheme without cached data.
Comment: 19 pages
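The query model described above (read the whole cached word, then exactly one bit of main storage, with one-sided error) can be illustrated with a toy sketch. This is not the expander-based construction of the paper: it substitutes a plain single-hash bit filter, and the names `probe_position`, `build`, and `query` are hypothetical. It only shows the shape of the interface: members are always accepted, while non-members may occasionally be accepted by mistake.

```python
import hashlib

def probe_position(seed: int, x: int, size: int) -> int:
    """Map element x to one position in main storage, keyed by the cached seed."""
    digest = hashlib.blake2b(
        x.to_bytes(8, "big"), digest_size=8, key=seed.to_bytes(8, "big")
    ).digest()
    return int.from_bytes(digest, "big") % size

def build(elements, size: int, seed: int = 0):
    """Precompute the cached word (here just a seed) and the main bit storage."""
    bits = [0] * size
    for x in elements:
        bits[probe_position(seed, x, size)] = 1
    return seed, bits

def query(cached_word: int, bits, x: int) -> bool:
    """Answer "Is x in A?" by reading the cached word and a single bit."""
    return bits[probe_position(cached_word, x, len(bits))] == 1

stored = [3, 17, 42, 1000]
cached_word, main_storage = build(stored, size=64, seed=7)
answers_for_members = [query(cached_word, main_storage, x) for x in stored]
```

Every stored element sets the bit that its own query later reads, so a "no" answer is always correct; only "yes" answers can be wrong, which is the one-sided behaviour the abstract refers to.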
Dynamic Rank/Select Dictionaries with Applications to XML Indexing
We consider a central problem in text indexing: Given a text T over an alphabet \Sigma, construct a compressed data structure answering the queries char(i), rank_s(i), and select_s(i) for a symbol s \in \Sigma. Many data structures consider these queries for static text T [GGV03, FM01, SG06, GMR06]. We consider the dynamic version of the problem, where we are allowed to insert and delete symbols at arbitrary positions of T. This problem is a key challenge in compressed text indexing and has direct application to dynamic XML indexing structures that answer subpath queries [FLMM05]. We build on the results of [RRR02, GMR06] and give the best known query bounds for the dynamic version of this problem, supporting arbitrary insertions and deletions of symbols in T. Specifically, with an amortized update time of O((1/\epsilon) n^\epsilon), we suggest how to support rank_s(i), select_s(i), and char(i) queries in O((1/\epsilon) \log\log n) time, for any \epsilon < 1. The best previous query times for this problem were O(\log n \log |\Sigma|), given by [MN06]. Our bounds are competitive with state-of-the-art static structures [GMR06]. Some applicable lower bounds for the partial sums problem [PD06] show that our update/query tradeoff is also nearly optimal. In addition, our space bound is competitive with the corresponding static structures. For the special case of bitvectors (i.e., |\Sigma| = 2), we also show the best tradeoffs for query/update time, improving upon the results of [MN06, HSS03, RRR02]. Finally, our focus on fast query/slower update is well-suited for a query-intensive XML indexing environment. Using the XBW transform [FLMM05], we also present a dynamic data structure that succinctly maintains an ordered labeled tree T and supports a powerful set of queries on T.
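To make the query semantics concrete, here is a naive reference sketch of char, rank, and select on a text that supports insertions and deletions. It is not the compressed structure of the paper and takes linear time per operation; the class name and the indexing conventions (0-based positions, rank counting occurrences strictly before position i, select taking a 1-based occurrence number) are assumptions made for illustration.

```python
class NaiveDynamicSequence:
    """Uncompressed reference implementation of char/rank/select with updates."""

    def __init__(self, text=""):
        self.symbols = list(text)

    def char(self, i):
        """Return the symbol at position i."""
        return self.symbols[i]

    def rank(self, s, i):
        """Count occurrences of symbol s among the first i symbols."""
        return self.symbols[:i].count(s)

    def select(self, s, j):
        """Return the position of the j-th occurrence of s (j starts at 1)."""
        seen = 0
        for pos, c in enumerate(self.symbols):
            if c == s:
                seen += 1
                if seen == j:
                    return pos
        raise ValueError("fewer than j occurrences of s")

    def insert(self, i, s):
        """Insert symbol s so that it ends up at position i."""
        self.symbols.insert(i, s)

    def delete(self, i):
        """Delete the symbol at position i."""
        del self.symbols[i]

seq = NaiveDynamicSequence("abracadabra")
rank_before = seq.rank("a", 4)
select_before = seq.select("a", 3)
seq.insert(0, "a")
select_after = seq.select("a", 3)
```

A structure like the one in the abstract answers the same queries, but within compressed space and with the stated update/query tradeoff instead of a linear scan.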
Librería de Estructuras de Datos Compactas en Rust (Compact Data Structures Library in Rust)
The exponential growth of data nowadays poses significant challenges in terms of efficient
storage and processing. This undergraduate thesis focuses on the importance of compact
data structures as a key solution in handling large-scale data. Unlike classical compression
techniques, these structures allow for operations on data without the need for complete decompression,
saving time and memory space. This approach has become essential in fields
such as information retrieval and bioinformatics due to the massive growth of data.
The Rust programming language, known for its safety, automatic memory management
and efficiency, has become one of the preferred options at present for innovation and modernization
in the technology industry.
Because no Rust library of compact data structures is currently competitive with
the state of the art in other programming languages, this project will leverage the advantages
of the language to develop an open-source compact data structures library. This
will give the scientific community and Rust developers a flexible, powerful, and easy-to-use
tool for their projects.
Furthermore, this undergraduate thesis aims to promote reproducibility, reuse, and progress
in research in the field of compact data structures. In this way, it will contribute to the expansion
and adoption of Rust in research and to the development of efficient and reliable scientific
software.
Undergraduate thesis (UDC.FIC). Computer Engineering. Academic year 2022/2023
Scalable succinct indexing for large text collections
Self-indexes save space by emulating operations of traditional data structures using basic operations on bitvectors. Succinct text indexes provide full-text search functionality which is traditionally provided by suffix trees and suffix arrays for a given text, while using space equivalent to the compressed representation of the text. Succinct text indexes can therefore provide full-text search functionality over inputs much larger than what is viable using traditional uncompressed suffix-based data structures. Fields such as Information Retrieval involve the processing of massive text collections. However, the in-memory space requirements of succinct text indexes during construction have hampered their adoption for large text collections. One promising approach to support larger data sets is to avoid constructing the full suffix array by using alternative indexing representations. This thesis focuses on several aspects related to the scalability of text indexes to larger data sets. We identify practical improvements in the core building blocks of all succinct text indexing algorithms, and subsequently improve the index performance on large data sets. We evaluate our findings using several standard text collections and demonstrate: (1) the practical applications of our improved indexing techniques; and (2) that succinct text indexes are a practical alternative to inverted indexes for a variety of top-k ranked document retrieval problems.
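As an illustration of the "basic operations on bitvectors" mentioned above, the following sketch answers rank queries (how many 1 bits precede a position) using precomputed per-block counts plus a short in-block scan. It is a simplified teaching example, not the thesis's implementation: the class name and `block_size` parameter are hypothetical, and a Python list is used for clarity rather than for space efficiency.

```python
class RankBitVector:
    """Bitvector with cumulative block counts to speed up rank queries."""

    def __init__(self, bits, block_size=64):
        self.bits = list(bits)
        self.block_size = block_size
        # block_ranks[k] holds the number of 1 bits before block k.
        self.block_ranks = [0]
        for start in range(0, len(self.bits), block_size):
            ones = sum(self.bits[start:start + block_size])
            self.block_ranks.append(self.block_ranks[-1] + ones)

    def rank1(self, i):
        """Return the number of 1 bits in bits[0:i], for 0 <= i <= len(bits)."""
        block, offset = divmod(i, self.block_size)
        start = block * self.block_size
        return self.block_ranks[block] + sum(self.bits[start:start + offset])

bv = RankBitVector([1, 0, 1, 1, 0, 0, 1, 0, 1, 1], block_size=4)
ranks = [bv.rank1(i) for i in range(11)]
```

Each query looks up one stored count and scans at most one block, so the block size trades extra space for the counts against scan time; practical succinct libraries replace the scan with word-level popcount instructions.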
LIPIcs, Volume 248, ISAAC 2022, Complete Volume