6 research outputs found

    Two Dimensional Range Minimum Queries and Fibonacci Lattices

    No full text
    Given a matrix of size N , two dimensional range minimum queries (2D-RMQs) ask for the position of the minimum element in a rectangular range within the matrix. We study trade-offs between the query time and the additional space used by indexing data structures that support 2D-RMQs. Using a novel technique—the discrepancy properties of Fibonacci lattices—we give an indexing data structure for 2D-RMQs that uses O(N/c) bits additional space with O(clog c(log log c) ²) query time, for any parameter c , 4≤c≤N. Also, when the entries of the input matrix are from {0,1}, we show that the query time can be improved to O(clog c) with the same space usage

    Space Efficient Encodings for Bit-strings, Range queries and Related Problems

    Get PDF
    학위논문 (박사)-- 서울대학교 대학원 : 전기·컴퓨터공학부, 2016. 2. Srinivasa Rao Satti.In this thesis, we design and implement various space efficient data structures. Most of these structures use spaces close to the information-theoretic lower bound while supporting the queries efficiently. In particular, this thesis is concerned with the data structures for four problems: (i) supporting \rank{} and \select{} queries on compressed bit strings, (ii) nearest larger neighbor problem, (iii) simultaneous encodings for range and next/previous larger/smaller value queries, and (iv) range \topk{} queries on two-dimensional arrays. We first consider practical implementations of \emph{compressed} bitvectors, which support \rank{} and \select{} operations on a given bit-string, while storing the bit-string in compressed form~\cite{DBLP:conf/dcc/JoJORS14}. Our approach relies on \emph{variable-to-fixed} encodings of the bit-string, an approach that has not yet been considered systematically for practical encodings of bitvectors. We show that this approach leads to fast practical implementations with low \emph{redundancy} (i.e., the space used by the bitvector in addition to the compressed representation of the bit-string), and is a flexible and promising solution to the problem of supporting \rank{} and \select{} on moderately compressible bit-strings, such as those encountered in real-world applications. Next, we propose space-efficient data structures for the nearest larger neighbor problem~\cite{IWOCA2014,walcom-JoRS15}. Given a sequence of nn elements from a total order, and a position in the sequence, the nearest larger neighbor (\NLV{}) query returns the position of the element which is closest to the query position, and is larger than the element at the query position. The problem of finding all nearest larger neighbors has attracted interest due to its applications for parenthesis matching and in computational geometry~\cite{AsanoBK09,AsanoK13,BerkmanSV93}. We consider a data structure version of this problem, which is to preprocess a given sequence of elements to construct a data structure that can answer \NLN{} queries efficiently. For one-dimensional arrays, we give time-space tradeoffs for the problem on \textit{indexing model}. For two-dimensional arrays, we give an optimal encoding with constant query on \textit{encoding model}. We also propose space-efficient encodings which support various range queries, and previous and next smaller/larger value queries~\cite{cocoonJS15}. Given a sequence of nn elements from a total order, we obtain a 4.088n+o(n)4.088n + o(n)-bit encoding that supports all these queries where nn is the length of input array. For the case when we need to support all these queries in constant time, we give an encoding that takes 4.585n+o(n)4.585n + o(n) bits. This improves the 5.08n+o(n)5.08n+o(n)-bit encoding obtained by encoding the colored 2d2d-Min and 2d2d-Max heaps proposed by Fischer~\cite{Fischer11}. We extend the original DFUDS~\cite{BDMRRR05} encoding of the colored 2d2d-Min and 2d2d-Max heap that supports the queries in constant time. Then, we combine the extended DFUDS of 2d2d-Min heap and 2d2d-Max heap using the Min-Max encoding of Gawrychowski and Nicholson~\cite{Gawry14} with some modifications. We also obtain encodings that take lesser space and support a subset of these queries. Finally, we consider the various encodings that support range \topk{} queries on a two-dimensional array containing elements from a total order. For an m×nm \times n array, we first propose an optimal encoding for answering one-sided \topk{} queries, whose query range is restricted to [1m][1a][1 \dots m][1 \dots a], for 1an1 \le a \le n. Next, we propose an encoding for the general \topk{} queries that takes m2lg((k+1)nn)+mlgm+o(n)m^2\lg{{(k+1)n \choose n}} + m\lg{m}+o(n) bits. This generalizes the \topk{} encoding of Gawrychowski and Nicholson~\cite{Gawry14}.Chapter 1 Introduction 1 1.1 Computational model 2 1.1.1 Encoding and indexing models 2 1.2 Contribution of the thesis 3 1.3 Organization of the thesis 5 Chapter 2 Preliminaries 7 Chapter 3 Compressed bit vectors based on variable-to-fixed encodings 10 3.1 Introduction 10 3.2 Bit-vectors using V2F coding 14 3.3 V2F compression algorithms for bit-strings 16 3.3.1 Tunstall code 16 3.3.2 Enumerative codes 19 3.3.3 LZW algorithm 23 3.3.4 Empirical evaluation of the compressors 23 3.4 Practical implementation of bitvectors based on V2F compression. 26 3.4.1 Testing Methodology 29 3.4.2 Results of Empirical Evaluation 33 3.5 Future works 35 Chapter 4 Space Efficient Data Structures for Nearest Larger Neighbor 39 4.1 Introduction 39 4.2 Indexing NLV queries on 1D arrays 43 4.3 Encoding NLN queries on2D binary arrays 44 4.4 Encoding NLN queries for general 2D arrays 50 4.4.1 2D NLN in the encoding model–distinct case 50 4.4.2 2D NLN in the encoding model–general case 53 4.5 Open problems 63 Chapter 5 Simultaneous encodings for range and next/previous larger/smaller value queries 64 5.1 Introduction 64 5.2 Preliminaries 67 5.2.1 2d-Min heap 69 5.2.2 Encoding range min-max queries 72 5.3 Extended DFUDS for colored 2d-Min heap 75 5.4 Encoding colored 2d-Min and 2d-Max heaps 80 5.4.1 Combined data structure for DCMin(A) and DCMax(A) 82 5.4.2 Encoding colored 2d-Min and 2d-Max heaps using less space 88 5.5 Open problems 89 Chapter 6 Encoding Two-dimensional range Top-k queries 90 6.1 Introduction 90 6.2 Encoding one-sided range Top-k queries on 2D array 92 6.3 Encoding general range Top-k queries on 2D array 95 6.4 Open problems 99 Chapter 7 Conculsion 100 Bibliography 103 요약 112Docto

    In-Memory Storage for Labeled Tree-Structured Data

    Get PDF
    In this thesis, we design in-memory data structures for labeled and weights trees, so that various types of path queries or operations can be supported with efficient query time. We assume the word RAM model with word size w, which permits random accesses to w-bit memory cells. Our data structures are space-efficient and many of them are even succinct. These succinct data structures occupy space close to the information theoretic lower bounds of the input trees within lower order terms. First, we study the problems of supporting various path queries over weighted trees. A path counting query asks for the number of nodes on a query path whose weights lie within a query range, while a path reporting query requires to report these nodes. A path median query asks for the median weight on a path between two given nodes, and a path selection query returns the k-th smallest weight. We design succinct data structures to support path counting queries in O(lg σ/ lg lg n + 1) time, path reporting queries in O((occ + 1)(lg σ/ lg lg n + 1)) time, and path median and path selection queries in O(lg σ/ lg lg σ) time, where n is the size of the input tree, the weights of nodes are drawn from [1..σ] and occ is the size of the output. Our results not only greatly improve the best known data structures [31, 75, 65], but also match the lower bounds for path counting, median and selection queries [86, 87, 71] when σ = Ω(n/polylog(n)). Second, we study the problem of representing labeled ordinal trees succinctly. Our new representations support a much broader collection of operations than previous work. In our approach, labels of nodes are stored in a preorder label sequence, which can be compressed using any succinct representation of strings that supports access, rank and select operations. Thus, we present a framework for succinct representations of labeled ordinal trees that is able to handle large alphabets. This answers an open problem presented by Geary et al. [54], which asks for representations of labeled ordinal trees that remain space-efficient for large alphabets. We further extend our work and present the first succinct representations for dynamic labeled ordinal trees that support several label-based operations including finding the level ancestor with a given label. Third, we study the problems of supporting path minimum and semigroup path sum queries. In the path minimum problem, we preprocess a tree on n weighted nodes, such that given an arbitrary path, the node with the smallest weight along this path can be located. We design novel succinct indices for this problem under the indexing model, for which weights of nodes are read-only and can be accessed with ranks of nodes in the preorder traversal sequence of the input tree. One of our index structures supports queries in O(α(m,n)) time, and occupies O(m) bits of space in addition to the space required for the input tree, where m is an integer greater than or equal to n and α(m, n) is the inverse-Ackermann function. Following the same approach, we also develop succinct data structures for semigroup path sum queries, for which a query asks for the sum of weights along a given query path. Then, using the succinct indices for path minimum queries, we achieve three different time-space tradeoffs for path reporting queries. Finally, we study the problems of supporting various path queries in dynamic settings. We propose the first non-trivial linear-space solution that supports path reporting in O((lgn/lglgn)^2 +occlgn/lglgn)) query time, where n is the size of the input tree and occ is the output size, and the insertion and deletion of a node of an arbitrary degree in O(lg^{2+ε} n) amortized time, for any constant ε ∈ (0, 1). Obvious solutions based on directly dynamizing solutions to the static version of this problem all require Ω((lg n/ lg lg n)^2) time for each node reported. We also design data structures that support path counting and path reporting queries in O((lg n/ lg lg n)^2) time, and insertions and deletions in O((lg n/ lg lg n)^2) amortized time. This matches the best known results for dynamic two-dimensional range counting [62] and range selection [63], which can be viewed as special cases of path counting and path selection
    corecore