    Succinct Permutation Graphs

    We present a succinct, i.e., asymptotically space-optimal, data structure for permutation graphs that supports distance, adjacency, neighborhood and shortest-path queries in optimal time; a variant of our data structure also supports degree queries in time independent of the neighborhood's size at the expense of an $O(\log n/\log \log n)$-factor overhead in all running times. We show how to generalize our data structure to the class of circular permutation graphs with asymptotically no extra space, while supporting the same queries in optimal time. Furthermore, we develop a similar compact data structure for the special case of bipartite permutation graphs and conjecture that it is succinct for this class. We demonstrate how to execute algorithms directly over our succinct representations for several combinatorial problems on permutation graphs: Clique, Coloring, Independent Set, Hamiltonian Cycle, All-Pair Shortest Paths, and others. Moreover, we initiate the study of semi-local graph representations; a concept that "interpolates" between local labeling schemes and standard "centralized" data structures. We show how to turn some of our data structures into semi-local representations by storing only $O(n)$ bits of additional global information, beating the lower bound on distance labeling schemes for permutation graphs.
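For readers unfamiliar with the graph class, the sketch below illustrates what adjacency, neighborhood and degree queries mean on a permutation graph. It assumes the standard definition (vertices `i < j` are adjacent exactly when the permutation inverts them) and stores the permutation as a plain Python list; it is a naive baseline, not the succinct data structure described in the abstract, and the function names are illustrative.

```python
from typing import List


def adjacent(pi: List[int], i: int, j: int) -> bool:
    """Vertices i and j are adjacent iff the pair (i, j) is an inversion of pi."""
    if i == j:
        return False
    lo, hi = (i, j) if i < j else (j, i)
    return pi[lo] > pi[hi]


def neighbors(pi: List[int], v: int) -> List[int]:
    """Naive neighborhood query: scans all vertices, so it takes O(n) time."""
    return [u for u in range(len(pi)) if adjacent(pi, u, v)]


def degree(pi: List[int], v: int) -> int:
    """Naive degree query, derived from the neighborhood scan."""
    return len(neighbors(pi, v))


if __name__ == "__main__":
    pi = [2, 0, 3, 1]           # hypothetical example permutation on 4 vertices
    print(adjacent(pi, 0, 1))   # inversion: pi[0] > pi[1]
    print(neighbors(pi, 3))
    print(degree(pi, 1))
```

The plain list costs about $n \log n$ bits when packed tightly, and the scans above are far from the optimal query times the abstract claims; the sketch only fixes the semantics of the queries.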

    Succinct Data Structures for Families of Interval Graphs

    We consider the problem of designing succinct data structures for interval graphs with $n$ vertices while supporting degree, adjacency, neighborhood and shortest path queries in optimal time in the $\Theta(\log n)$-bit word RAM model. The degree query reports the number of incident edges to a given vertex in constant time, the adjacency query returns true if there is an edge between two vertices in constant time, the neighborhood query reports the set of all adjacent vertices in time proportional to the degree of the queried vertex, and the shortest path query returns a shortest path in time proportional to its length; thus the running times of these queries are optimal. Towards showing succinctness, we first show that at least $n\log n - 2n\log\log n - O(n)$ bits are necessary to represent any unlabeled interval graph $G$ with $n$ vertices, answering an open problem of Yang and Pippenger [Proc. Amer. Math. Soc. 2017]. This is augmented by a data structure of size $n\log n + O(n)$ bits that not only supports the aforementioned queries optimally but is also capable of executing various combinatorial algorithms (like proper coloring, maximum independent set etc.) on the input interval graph efficiently. Finally, we extend our ideas to other variants of interval graphs, for example, proper/unit interval graphs, k-proper and k-improper interval graphs, and circular-arc graphs, and design succinct/compact data structures for these graph classes as well, along with supporting queries on them efficiently.
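As a point of reference for the queries named in this abstract, here is a minimal sketch over an explicit list of closed intervals. It assumes two vertices are adjacent exactly when their intervals intersect, and it uses the textbook greedy rule (earliest right endpoint first) for maximum independent set. None of this is the paper's succinct structure; the names and the example intervals are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # closed interval (left, right), left <= right


def adjacent(iv: List[Interval], u: int, v: int) -> bool:
    """Two distinct vertices are adjacent iff their closed intervals intersect."""
    if u == v:
        return False
    (lu, ru), (lv, rv) = iv[u], iv[v]
    return lu <= rv and lv <= ru


def degree(iv: List[Interval], v: int) -> int:
    """Naive O(n) degree query; the abstract's structure answers it in O(1)."""
    return sum(adjacent(iv, u, v) for u in range(len(iv)))


def max_independent_set(iv: List[Interval]) -> List[int]:
    """Greedy by right endpoint: keep an interval if it starts after the last kept one ends."""
    chosen: List[int] = []
    last_end = None
    for v in sorted(range(len(iv)), key=lambda x: iv[x][1]):
        if last_end is None or iv[v][0] > last_end:
            chosen.append(v)
            last_end = iv[v][1]
    return chosen


if __name__ == "__main__":
    intervals = [(1, 4), (2, 6), (5, 8), (9, 10)]  # hypothetical input
    print(adjacent(intervals, 0, 1), degree(intervals, 1))
    print(max_independent_set(intervals))
```

Storing both endpoints explicitly takes roughly $2n\log n$ bits, about twice the $n\log n + O(n)$ bound quoted in the abstract, which is one way to see what the succinct encoding saves.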

    Tight and simple Web graph compression

    Analysing Web graphs has applications in determining page ranks, fighting Web spam, detecting communities and mirror sites, and more. This study is, however, hampered by the necessity of storing a major part of huge graphs in external memory, which prevents efficient random access to edge (hyperlink) lists. A number of algorithms involving compression techniques have thus been presented to represent Web graphs succinctly while still providing random access. Those techniques are usually based on differential encodings of the adjacency lists, finding repeating nodes or node regions in the successive lists, more general grammar-based transformations, or 2-dimensional representations of the binary matrix of the graph. In this paper we present two Web graph compression algorithms. The first can be seen as engineering of the Boldi and Vigna (2004) method. We extend the notion of similarity between link lists, and use a more compact encoding of residuals. The algorithm works on blocks of varying size (in the number of input lines) and sacrifices access time for a better compression ratio, achieving a more succinct graph representation than other algorithms reported in the literature. The second algorithm works on blocks of the same size, in the number of input lines, and its key mechanism is merging the block into a single ordered list. This method achieves much more attractive space-time tradeoffs. Comment: 15 page
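The "differential encodings of the adjacency lists" mentioned above can be illustrated with a generic gap-plus-varint scheme. This is a minimal sketch of the general idea only, assuming a strictly increasing list of non-negative node ids; it is not the algorithm of this paper nor the Boldi and Vigna method, and all names are illustrative.

```python
from typing import List


def to_gaps(sorted_ids: List[int]) -> List[int]:
    """Keep the first id, then store the difference between consecutive ids."""
    return [b - a for a, b in zip([0] + sorted_ids[:-1], sorted_ids)]


def varint_encode(values: List[int]) -> bytes:
    """Variable-byte code: 7 payload bits per byte, high bit set on all but the last byte."""
    out = bytearray()
    for v in values:
        while v >= 0x80:
            out.append((v & 0x7F) | 0x80)
            v >>= 7
        out.append(v)
    return bytes(out)


def varint_decode(data: bytes) -> List[int]:
    """Inverse of varint_encode."""
    values, cur, shift = [], 0, 0
    for byte in data:
        cur |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(cur)
            cur, shift = 0, 0
    return values


def compress(sorted_ids: List[int]) -> bytes:
    return varint_encode(to_gaps(sorted_ids))


def decompress(data: bytes) -> List[int]:
    ids, total = [], 0
    for gap in varint_decode(data):
        total += gap          # prefix sums turn gaps back into absolute ids
        ids.append(total)
    return ids


if __name__ == "__main__":
    links = [1000, 1001, 1003, 1050]   # hypothetical out-links of one page
    blob = compress(links)
    print(len(blob), decompress(blob) == links)
```

Because links on a page tend to point to nearby ids, most gaps fit in a single byte, which is why differential encodings are a common starting point for this kind of compression.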

    Orderly Spanning Trees with Applications

    We introduce and study the orderly spanning trees of plane graphs. This algorithmic tool generalizes canonical orderings, which exist only for triconnected plane graphs. Although not every plane graph admits an orderly spanning tree, we provide an algorithm to compute an orderly pair for any connected planar graph $G$, consisting of a plane graph $H$ of $G$, and an orderly spanning tree of $H$. We also present several applications of orderly spanning trees: (1) a new constructive proof for Schnyder's Realizer Theorem, (2) the first area-optimal 2-visibility drawing of $G$, and (3) the best known encodings of $G$ with $O(1)$-time query support. All algorithms in this paper run in linear time. Comment: 25 pages, 7 figures. A preliminary version appeared in Proceedings of the 12th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2001), Washington D.C., USA, January 7-9, 2001, pp. 506-51

    Entropy-scaling search of massive biological data

    Many datasets exhibit a well-defined structure that can be exploited to design faster search tools, but it is not always clear when such acceleration is possible. Here, we introduce a framework for similarity search based on characterizing a dataset's entropy and fractal dimension. We prove that searching scales in time with metric entropy (number of covering hyperspheres), if the fractal dimension of the dataset is low, and scales in space with the sum of metric entropy and information-theoretic entropy (randomness of the data). Using these ideas, we present accelerated versions of standard tools, with no loss in specificity and little loss in sensitivity, for use in three domains: high-throughput drug screening (Ammolite, 150x speedup), metagenomics (MICA, 3.5x speedup of DIAMOND [3,700x BLASTX]), and protein structure search (esFragBag, 10x speedup of FragBag). Our framework can be used to achieve "compressive omics," and the general theory can be readily applied to data science problems outside of biology. Comment: Including supplement: 41 pages, 6 figures, 4 tables, 1 bo