111 research outputs found

    Distance labeling schemes for trees

    Get PDF
    We consider distance labeling schemes for trees: given a tree with nn nodes, label the nodes with binary strings such that, given the labels of any two nodes, one can determine, by looking only at the labels, the distance in the tree between the two nodes. A lower bound by Gavoille et. al. (J. Alg. 2004) and an upper bound by Peleg (J. Graph Theory 2000) establish that labels must use Θ(log⁥2n)\Theta(\log^2 n) bits\footnote{Throughout this paper we use log⁥\log for log⁥2\log_2.}. Gavoille et. al. (ESA 2001) show that for very small approximate stretch, labels use Θ(log⁥nlog⁥log⁥n)\Theta(\log n \log \log n) bits. Several other papers investigate various variants such as, for example, small distances in trees (Alstrup et. al., SODA'03). We improve the known upper and lower bounds of exact distance labeling by showing that 14log⁥2n\frac{1}{4} \log^2 n bits are needed and that 12log⁥2n\frac{1}{2} \log^2 n bits are sufficient. We also give (1+Ï”1+\epsilon)-stretch labeling schemes using Θ(log⁥n)\Theta(\log n) bits for constant Ï”>0\epsilon>0. (1+Ï”1+\epsilon)-stretch labeling schemes with polylogarithmic label size have previously been established for doubling dimension graphs by Talwar (STOC 2004). In addition, we present matching upper and lower bounds for distance labeling for caterpillars, showing that labels must have size 2log⁥n−Θ(log⁥log⁥n)2\log n - \Theta(\log\log n). For simple paths with kk nodes and edge weights in [1,n][1,n], we show that labels must have size k−1klog⁥n+Θ(log⁥k)\frac{k-1}{k}\log n+\Theta(\log k)

    On Succinct Representations of Binary Trees

    Full text link
    We observe that a standard transformation between \emph{ordinal} trees (arbitrary rooted trees with ordered children) and binary trees leads to interesting succinct binary tree representations. There are four symmetric versions of these transformations. Via these transformations we get four succinct representations of nn-node binary trees that use 2n+n/(log⁥n)O(1)2n + n/(\log n)^{O(1)} bits and support (among other operations) navigation, inorder numbering, one of pre- or post-order numbering, subtree size and lowest common ancestor (LCA) queries. The ability to support inorder numbering is crucial for the well-known range-minimum query (RMQ) problem on an array AA of nn ordered values. While this functionality, and more, is also supported in O(1)O(1) time using 2n+o(n)2n + o(n) bits by Davoodi et al.'s (\emph{Phil. Trans. Royal Soc. A} \textbf{372} (2014)) extension of a representation by Farzan and Munro (\emph{Algorithmica} \textbf{6} (2014)), their \emph{redundancy}, or the o(n)o(n) term, is much larger, and their approach may not be suitable for practical implementations. One of these transformations is related to the Zaks' sequence (S.~Zaks, \emph{Theor. Comput. Sci.} \textbf{10} (1980)) for encoding binary trees, and we thus provide the first succinct binary tree representation based on Zaks' sequence. Another of these transformations is equivalent to Fischer and Heun's (\emph{SIAM J. Comput.} \textbf{40} (2011)) \minheap\ structure for this problem. Yet another variant allows an encoding of the Cartesian tree of AA to be constructed from AA using only O(nlog⁥n)O(\sqrt{n} \log n) bits of working space.Comment: Journal version of part of COCOON 2012 pape

    Distance Oracles for Interval Graphs via Breadth-First Rank/Select in Succinct Trees

    Get PDF
    We present the first succinct distance oracles for (unweighted) interval graphs and related classes of graphs, using a novel succinct data structure for ordinal trees that supports the mapping between preorder (i.e., depth-first) ranks and level-order (breadth-first) ranks of nodes in constant time. Our distance oracles for interval graphs also support navigation queries – testing adjacency, computing node degrees, neighborhoods, and shortest paths – all in optimal time. Our technique also yields optimal distance oracles for proper interval graphs (unit-interval graphs) and circular-arc graphs. Our tree data structure supports all operations provided by different approaches in previous work, as well as mapping to and from level-order ranks and retrieving the last (first) internal node before (after) a given node in a level-order traversal, all in constant time

    Distance Oracles for Interval Graphs via Breadth-First Rank/Select in Succinct Trees

    Get PDF
    We present the first succinct distance oracles for (unweighted) interval graphs and related classes of graphs, using a novel succinct data structure for ordinal trees that supports the mapping between preorder (i.e., depth-first) ranks and level-order (breadth-first) ranks of nodes in constant time. Our distance oracles for interval graphs also support navigation queries - testing adjacency, computing node degrees, neighborhoods, and shortest paths - all in optimal time. Our technique also yields optimal distance oracles for proper interval graphs (unit-interval graphs) and circular-arc graphs. Our tree data structure supports all operations provided by different approaches in previous work, as well as mapping to and from level-order ranks and retrieving the last (first) internal node before (after) a given node in a level-order traversal, all in constant time

    Path and Ancestor Queries over Trees with Multidimensional Weight Vectors

    Get PDF
    We consider an ordinal tree T on n nodes, with each node assigned a d-dimensional weight vector w in {1,2,...,n}^d, where d in N is a constant. We study path queries as generalizations of well-known {orthogonal range queries}, with one of the dimensions being tree topology rather than a linear order. Since in our definitions d only represents the number of dimensions of the weight vector without taking the tree topology into account, a path query in a tree with d-dimensional weight vectors generalize the corresponding (d+1)-dimensional orthogonal range query. We solve {ancestor dominance reporting} problem as a direct generalization of dominance reporting problem, in time O(lg^{d-1}{n}+k) and space of O(n lg^{d-2}n) words, where k is the size of the output, for d >= 2. We also achieve a tradeoff of O(n lg^{d-2+epsilon}{n}) words of space, with query time of O((lg^{d-1} n)/(lg lg n)^{d-2}+k), for the same problem, when d >= 3. We solve {path successor problem} in O(n lg^{d-1}{n}) words of space and time O(lg^{d-1+epsilon}{n}) for d >= 1 and an arbitrary constant epsilon > 0. We propose a solution to {path counting problem}, with O(n(lg{n}/lg lg{n})^{d-1}) words of space and O((lg{n}/lg lg{n})^{d}) query time, for d >= 1. Finally, we solve {path reporting problem} in O(n lg^{d-1+epsilon}{n}) words of space and O((lg^{d-1}{n})/(lg lg{n})^{d-2}+k) query time, for d >= 2. These results match or nearly match the best tradeoffs of the respective range queries. We are also the first to solve path successor even for d = 1

    Faster repetition-aware compressed suffix trees based on Block Trees

    Get PDF
    The suffix tree is a fundamental data structure in stringology, but its space usage, though linear, is an important problem in applications like Bioinformatics. We design and implement a new compressed suffix tree (CST) targeted to highly repetitive texts, such as large genomic collections of the same species. Our first contribution is to enhance the Block Tree, a data structure that captures the repetitiveness of its input sequence, to represent the topology of trees with large repeated subtrees. Our so-called Block-Tree Compressed Topology (BT-CT) data structure augments the Block Tree nodes with data that speeds up tree navigation. Our Block-Tree CST (BT-CST), in turn, uses the BT-CT to compress the topology of the suffix tree, and also replaces the sampling of the suffix array and its inverse with grammar-and/or Block-Tree-based representations of those arrays. Our experimental results show that BT-CTs reach navigation speeds comparable to compact tree representations that are insensitive to repetitiveness, while using 2-10 times less space on the topologies of the suffix trees of repetitive collections. Our BT-CST is slightly larger than previous repetition-aware suffix trees based on grammar-compressed topologies, but outperforms them in time, often by orders of magnitude.(c) 2021 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).Peer reviewe

    Transforming data by calculation

    Get PDF
    Thispaperaddressesthefoundationsofdata-modeltransformation.A catalog of data mappings is presented which includes abstraction and representa- tion relations and associated constraints. These are justified in an algebraic style via the pointfree-transform, a technique whereby predicates are lifted to binary relation terms (of the algebra of programming) in a two-level style encompassing both data and operations. This approach to data calculation, which also includes transformation of recursive data models into “flat” database schemes, is offered as alternative to standard database design from abstract models. The calculus is also used to establish a link between the proposed transformational style and bidi- rectional lenses developed in the context of the classical view-update problem.Fundação para a CiĂȘncia e a Tecnologia (FCT

    On Tackling Explanation Redundancy in Decision Trees

    Full text link
    Decision trees (DTs) epitomize the ideal of interpretability of machine learning (ML) models. The interpretability of decision trees motivates explainability approaches by so-called intrinsic interpretability, and it is at the core of recent proposals for applying interpretable ML models in high-risk applications. The belief in DT interpretability is justified by the fact that explanations for DT predictions are generally expected to be succinct. Indeed, in the case of DTs, explanations correspond to DT paths. Since decision trees are ideally shallow, and so paths contain far fewer features than the total number of features, explanations in DTs are expected to be succinct, and hence interpretable. This paper offers both theoretical and experimental arguments demonstrating that, as long as interpretability of decision trees equates with succinctness of explanations, then decision trees ought not be deemed interpretable. The paper introduces logically rigorous path explanations and path explanation redundancy, and proves that there exist functions for which decision trees must exhibit paths with arbitrarily large explanation redundancy. The paper also proves that only a very restricted class of functions can be represented with DTs that exhibit no explanation redundancy. In addition, the paper includes experimental results substantiating that path explanation redundancy is observed ubiquitously in decision trees, including those obtained using different tree learning algorithms, but also in a wide range of publicly available decision trees. The paper also proposes polynomial-time algorithms for eliminating path explanation redundancy, which in practice require negligible time to compute. Thus, these algorithms serve to indirectly attain irreducible, and so succinct, explanations for decision trees

    Compressing and Performing Algorithms on Massively Large Networks

    Get PDF
    Networks are represented as a set of nodes (vertices) and the arcs (links) connecting them. Such networks can model various real-world structures such as social networks (e.g., Facebook), information networks (e.g., citation networks), technological networks (e.g., the Internet), and biological networks (e.g., gene-phenotype network). Analysis of such structures is a heavily studied area with many applications. However, in this era of big data, we find ourselves with networks so massive that the space requirements inhibit network analysis. Since many of these networks have nodes and arcs on the order of billions to trillions, even basic data structures such as adjacency lists could cost petabytes to zettabytes of storage. Storing these networks in secondary memory would require I/O access (i.e., disk access) during analysis, thus drastically slowing analysis time. To perform analysis efficiently on such extensive data, we either need enough main memory for the data structures and algorithms, or we need to develop compressions that require much less space while still being able to answer queries efficiently. In this dissertation, we develop several compression techniques that succinctly represent these real-world networks while still being able to efficiently query the network (e.g., check if an arc exists between two nodes). Furthermore, since many of these networks continue to grow over time, our compression techniques also support the ability to add and remove nodes and edges directly on the compressed structure. We also provide a way to compress the data quickly without any intermediate structure, thus giving minimal memory overhead. We provide detailed analysis and prove that our compression is indeed succinct (i.e., achieves the information-theoretic lower bound). Also, we empirically show that our compression rates outperform or are equal to existing compression algorithms on many benchmark datasets. We also extend our technique to time-evolving networks. That is, we store the entire state of the network at each time frame. Studying time-evolving networks allows us to find patterns throughout the time that would not be available in regular, static network analysis. A succinct representation for time-evolving networks is arguably more important than static graphs, due to the extra dimension inflating the space requirements of basic data structures even more. Again, we manage to achieve succinctness while also providing fast encoding, minimal memory overhead during encoding, fast queries, and fast, direct modification. We also compare against several benchmarks and empirically show that we achieve compression rates better than or equal to the best performing benchmark for each dataset. Finally, we also develop both static and time-evolving algorithms that run directly on our compressed structures. Using our static graph compression combined with our differential technique, we find that we can speed up matrix-vector multiplication by reusing previously computed products. We compare our results against a similar technique using the Webgraph Framework, and we see that not only are our base query speeds faster, but we also gain a more significant speed-up from reusing products. Then, we use our time-evolving compression to solve the earliest arrival paths problem and time-evolving transitive closure. We found that not only were we the first to run such algorithms directly on compressed data, but that our technique was particularly efficient at doing so
    • 

    corecore