
    Efficiently Generating Geometric Inhomogeneous and Hyperbolic Random Graphs

    Hyperbolic random graphs (HRG) and geometric inhomogeneous random graphs (GIRG) are two similar generative network models that were designed to resemble complex real-world networks. In particular, they have a power-law degree distribution with controllable exponent $\beta$, and high clustering that can be controlled via the temperature $T$. We present the first implementation of an efficient GIRG generator running in expected linear time. Besides varying temperatures, it also supports underlying geometries of higher dimensions. It is capable of generating graphs with ten million edges in under a second on commodity hardware. The algorithm can be adapted to HRGs. Our resulting implementation is the fastest sequential HRG generator, despite the fact that we support non-zero temperatures. Though non-zero temperatures are crucial for many applications, most existing generators are restricted to $T = 0$. We also support parallelization, although this is not the focus of this paper. Moreover, we note that our generators draw from the correct probability distribution, i.e., they involve no approximation. Besides the generators themselves, we also provide an efficient algorithm to determine the non-trivial dependency between the average degree of the resulting graph and the input parameters of the GIRG model. This makes it possible to specify the desired expected average degree as input. Moreover, we investigate the differences between HRGs and GIRGs, shedding new light on the nature of the relation between the two models. Although HRGs represent, in a certain sense, a special case of the GIRG model, we find that a straightforward inclusion does not hold in practice. However, the difference is negligible for most use cases.

    A Note on the Practicality of Maximal Planar Subgraph Algorithms

    Given a graph $G$, the NP-hard Maximum Planar Subgraph problem (MPS) asks for a planar subgraph of $G$ with the maximum number of edges. There are several heuristic, approximative, and exact algorithms to tackle the problem, but, to the best of our knowledge, they have never been compared competitively in practice. We report on an exploratory study on the relative merits of the diverse approaches, focusing on practical runtime, solution quality, and implementation complexity. Surprisingly, a seemingly only theoretically strong approximation forms the building block of the strongest choice. Comment: Appears in the Proceedings of the 24th International Symposium on Graph Drawing and Network Visualization (GD 2016).

    Document Retrieval on Repetitive Collections

    Document retrieval aims at finding the most important documents where a pattern appears in a collection of strings. Traditional pattern-matching techniques yield brute-force document retrieval solutions, which has motivated the research on tailored indexes that offer near-optimal performance. However, an experimental study establishing which alternatives are actually better than brute force, and which perform best depending on the collection characteristics, has not been carried out. In this paper we address this shortcoming by exploring the relationship between the nature of the underlying collection and the performance of current methods. Via extensive experiments we show that established solutions are often beaten in practice by brute-force alternatives. We also design new methods that offer superior time/space trade-offs, particularly on repetitive collections. Comment: Accepted to ESA 2014. Implementation and experiments at http://www.cs.helsinki.fi/group/suds/rlcsa
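To make the brute-force baseline mentioned in this abstract concrete, here is a minimal sketch of top-k document retrieval by plain pattern matching. It is not the paper's method: the function names are hypothetical, and ranking "importance" by the number of occurrences of the pattern is an assumption made only for illustration.

```python
def count_occurrences(text, pattern):
    """Count (possibly overlapping) occurrences of pattern in text."""
    count = 0
    pos = text.find(pattern)
    while pos != -1:
        count += 1
        pos = text.find(pattern, pos + 1)
    return count


def top_k_documents(docs, pattern, k):
    """Brute-force top-k retrieval: scan every document for the pattern.

    Assumption: a document's importance is its occurrence count.
    Returns (doc_id, count) pairs, most frequent first; ties are
    broken by the smaller document id.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    hits = []
    for doc_id, text in enumerate(docs):
        occ = count_occurrences(text, pattern)
        if occ > 0:
            hits.append((doc_id, occ))
    hits.sort(key=lambda pair: (-pair[1], pair[0]))
    return hits[:k]


if __name__ == "__main__":
    collection = ["abracadabra", "banana", "cabana"]
    print(top_k_documents(collection, "ana", 2))
```

Every query scans the whole collection, which is the cost that tailored indexes aim to avoid.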

    Lightweight Lempel-Ziv Parsing

    We introduce a new approach to LZ77 factorization that uses $O(n/d)$ words of working space and $O(dn)$ time for any $d \ge 1$ (for polylogarithmic alphabet sizes). We also describe carefully engineered implementations of alternative approaches to lightweight LZ77 factorization. Extensive experiments show that the new algorithm is superior in most cases, particularly at the lowest memory levels and for highly repetitive data. As a part of the algorithm, we describe new methods for computing matching statistics which may be of independent interest. Comment: 12 pages
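For readers unfamiliar with the term, the following is a minimal sketch of what an LZ77 factorization is. It is a naive quadratic-time illustration, not the lightweight algorithm of the abstract. The factor encoding (`"lit"` and `"copy"` tuples) and the function names are hypothetical, and the sketch assumes the common variant in which a copied factor may overlap the position being encoded.

```python
def lz77_factorize(s):
    """Naive LZ77 factorization (illustrative only, quadratic time).

    Each factor is either ("lit", ch) for a character with no earlier
    occurrence, or ("copy", src, length) for the longest prefix of the
    remaining text that also starts at an earlier position src.
    """
    factors = []
    i, n = 0, len(s)
    while i < n:
        best_len, best_src = 0, -1
        for j in range(i):  # candidate earlier starting positions
            k = 0
            while i + k < n and s[j + k] == s[i + k]:
                k += 1
            if k > best_len:
                best_len, best_src = k, j
        if best_len == 0:
            factors.append(("lit", s[i]))
            i += 1
        else:
            factors.append(("copy", best_src, best_len))
            i += best_len
    return factors


def lz77_decode(factors):
    """Rebuild the text; copying one character at a time handles overlaps."""
    out = []
    for factor in factors:
        if factor[0] == "lit":
            out.append(factor[1])
        else:
            _, src, length = factor
            for k in range(length):
                out.append(out[src + k])
    return "".join(out)


if __name__ == "__main__":
    print(lz77_factorize("abababab"))
```

Highly repetitive inputs yield few factors, which is why such data is singled out in the experiments.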

    Suffix Tree of Alignment: An Efficient Index for Similar Data

    We consider an index data structure for similar strings. The generalized suffix tree can be a solution for this. The generalized suffix tree of two strings $A$ and $B$ is a compacted trie representing all suffixes in $A$ and $B$. It has $|A|+|B|$ leaves and can be constructed in $O(|A|+|B|)$ time. However, if the two strings are similar, the generalized suffix tree is not efficient because it does not exploit the similarity which is usually represented as an alignment of $A$ and $B$. In this paper we propose a space/time-efficient suffix tree of alignment which wisely exploits the similarity in an alignment. Our suffix tree for an alignment of $A$ and $B$ has $|A| + l_d + l_1$ leaves where $l_d$ is the sum of the lengths of all parts of $B$ different from $A$ and $l_1$ is the sum of the lengths of some common parts of $A$ and $B$. We did not compromise the pattern search to reduce the space. Our suffix tree can be searched for a pattern $P$ in $O(|P|+occ)$ time where $occ$ is the number of occurrences of $P$ in $A$ and $B$. We also present an efficient algorithm to construct the suffix tree of alignment. When the suffix tree is constructed from scratch, the algorithm requires $O(|A| + l_d + l_1 + l_2)$ time where $l_2$ is the sum of the lengths of other common substrings of $A$ and $B$. When the suffix tree of $A$ is already given, it requires $O(l_d + l_1 + l_2)$ time. Comment: 12 pages

    Computing Covers Using Prefix Tables

    An \emph{indeterminate string} $x = x[1..n]$ on an alphabet $\Sigma$ is a sequence of nonempty subsets of $\Sigma$; $x$ is said to be \emph{regular} if every subset is of size one. A proper substring $u$ of regular $x$ is said to be a \emph{cover} of $x$ iff for every $i \in 1..n$, an occurrence of $u$ in $x$ includes $x[i]$. The \emph{cover array} $\gamma = \gamma[1..n]$ of $x$ is an integer array such that $\gamma[i]$ is the length of the longest cover of $x[1..i]$. Fifteen years ago a complex, though nevertheless linear-time, algorithm was proposed to compute the cover array of regular $x$ based on prior computation of the border array of $x$. In this paper we first describe a linear-time algorithm to compute the cover array of regular string $x$ based on the prefix table of $x$. We then extend this result to indeterminate strings. Comment: 14 pages, 1 figure
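The definitions of cover and cover array in this abstract can be checked directly on small inputs. The sketch below is a brute-force illustration for regular strings only. It is not the linear-time prefix-table algorithm of the paper, the function names are hypothetical, and it uses 0-based Python indexing with an entry of 0 standing for "no proper cover".

```python
def is_cover(u, s):
    """Return True if u is a proper cover of s.

    Every position of s must lie inside some occurrence of u,
    and u must be shorter than s.
    """
    if not u or len(u) >= len(s):
        return False
    covered_up_to = 0  # positions [0, covered_up_to) are covered so far
    pos = s.find(u)
    while pos != -1:
        if pos > covered_up_to:  # a gap that no occurrence covers
            return False
        covered_up_to = pos + len(u)
        pos = s.find(u, pos + 1)
    return covered_up_to == len(s)


def cover_array(x):
    """Brute-force cover array: entry i-1 holds the length of the
    longest proper cover of the prefix x[:i], or 0 if none exists.

    Only prefixes are tried as candidates, because the occurrence
    covering the first position must start there.
    """
    gamma = []
    for i in range(1, len(x) + 1):
        prefix = x[:i]
        best = 0
        for length in range(i - 1, 0, -1):
            if is_cover(prefix[:length], prefix):
                best = length
                break
        gamma.append(best)
    return gamma


if __name__ == "__main__":
    print(cover_array("ababa"))
```

For example, `aba` covers `ababa` through its overlapping occurrences at positions 0 and 2.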

    Strengthened Lazy Heaps: Surpassing the Lower Bounds for Binary Heaps

    Let $n$ denote the number of elements currently in a data structure. An in-place heap is stored in the first $n$ locations of an array, uses $O(1)$ extra space, and supports the operations: minimum, insert, and extract-min. We introduce an in-place heap, for which minimum and insert take $O(1)$ worst-case time, and extract-min takes $O(\lg n)$ worst-case time and involves at most $\lg n + O(1)$ element comparisons. The achieved bounds are optimal to within additive constant terms for the number of element comparisons. In particular, these bounds for both insert and extract-min (and the time bound for insert) surpass the corresponding lower bounds known for binary heaps, though our data structure is similar. In a binary heap, when viewed as a nearly complete binary tree, every node other than the root obeys the heap property, i.e. the element at a node is not smaller than that at its parent. To surpass the lower bound for extract-min, we reinforce a stronger property at the bottom levels of the heap that the element at any right child is not smaller than that at its left sibling. To surpass the lower bound for insert, we buffer insertions and allow $O(\lg^2 n)$ nodes to violate heap order in relation to their parents.
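The two ordering properties described in this abstract can be stated as simple checks on an array-stored heap. The sketch below only verifies the properties and does not implement the strengthened lazy heap itself. The function names are hypothetical, the layout is the usual 0-based one (children of index `p` at `2p + 1` and `2p + 2`), and the `min_depth` parameter is an assumption standing in for "the bottom levels", which the abstract does not specify.

```python
def satisfies_heap_order(a):
    """Min-heap property: no element is smaller than its parent."""
    return all(a[(i - 1) // 2] <= a[i] for i in range(1, len(a)))


def satisfies_sibling_order(a, min_depth=1):
    """Stronger property: a right child is not smaller than its left sibling.

    Checked only for nodes at depth >= min_depth (root has depth 0).
    min_depth is a hypothetical stand-in for "the bottom levels".
    """
    for i in range(2, len(a), 2):  # right children sit at even indices >= 2
        depth = (i + 1).bit_length() - 1
        if depth >= min_depth and a[i] < a[i - 1]:
            return False
    return True


if __name__ == "__main__":
    heap = [1, 3, 2, 4, 5, 6, 7]
    print(satisfies_heap_order(heap))
    print(satisfies_sibling_order(heap, min_depth=2))
```

In the example array the sibling order holds on the last level but not on the level above it, so the result depends on `min_depth`.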