6,573 research outputs found

    Bounding Cache Miss Costs of Multithreaded Computations Under General Schedulers

    Full text link
    We analyze the caching overhead incurred by a class of multithreaded algorithms when scheduled by an arbitrary scheduler. We obtain bounds that match or improve upon the well-known O(Q+S(M/B))O(Q+S \cdot (M/B)) caching cost for the randomized work stealing (RWS) scheduler, where SS is the number of steals, QQ is the sequential caching cost, and MM and BB are the cache size and block (or cache line) size respectively.Comment: Extended abstract in Proceedings of ACM Symp. on Parallel Alg. and Architectures (SPAA) 2017, pp. 339-350. This revision has a few small updates including a missing citation and the replacement of some big Oh terms with precise constant

    Faster Algorithms for Computing Maximal 2-Connected Subgraphs in Sparse Directed Graphs

    Full text link
    Connectivity related concepts are of fundamental interest in graph theory. The area has received extensive attention over four decades, but many problems remain unsolved, especially for directed graphs. A directed graph is 2-edge-connected (resp., 2-vertex-connected) if the removal of any edge (resp., vertex) leaves the graph strongly connected. In this paper we present improved algorithms for computing the maximal 2-edge- and 2-vertex-connected subgraphs of a given directed graph. These problems were first studied more than 35 years ago, with O~(mn)\widetilde{O}(mn) time algorithms for graphs with m edges and n vertices being known since the late 1980s. In contrast, the same problems for undirected graphs are known to be solvable in linear time. Henzinger et al. [ICALP 2015] recently introduced O(n2)O(n^2) time algorithms for the directed case, thus improving the running times for dense graphs. Our new algorithms run in time O(m3/2)O(m^{3/2}), which further improves the running times for sparse graphs. The notion of 2-connectivity naturally generalizes to k-connectivity for k>2k>2. For constant values of k, we extend one of our algorithms to compute the maximal k-edge-connected in time O(m3/2logn)O(m^{3/2} \log{n}), improving again for sparse graphs the best known algorithm by Henzinger et al. [ICALP 2015] that runs in O(n2logn)O(n^2 \log n) time.Comment: Revised version of SODA 2017 paper including details for k-edge-connected subgraph

    Fast and Tiny Structural Self-Indexes for XML

    Full text link
    XML document markup is highly repetitive and therefore well compressible using dictionary-based methods such as DAGs or grammars. In the context of selectivity estimation, grammar-compressed trees were used before as synopsis for structural XPath queries. Here a fully-fledged index over such grammars is presented. The index allows to execute arbitrary tree algorithms with a slow-down that is comparable to the space improvement. More interestingly, certain algorithms execute much faster over the index (because no decompression occurs). E.g., for structural XPath count queries, evaluating over the index is faster than previous XPath implementations, often by two orders of magnitude. The index also allows to serialize XML results (including texts) faster than previous systems, by a factor of ca. 2-3. This is due to efficient copy handling of grammar repetitions, and because materialization is totally avoided. In order to compare with twig join implementations, we implemented a materializer which writes out pre-order numbers of result nodes, and show its competitiveness.Comment: 13 page

    Binary Decision Diagrams: from Tree Compaction to Sampling

    Full text link
    Any Boolean function corresponds with a complete full binary decision tree. This tree can in turn be represented in a maximally compact form as a direct acyclic graph where common subtrees are factored and shared, keeping only one copy of each unique subtree. This yields the celebrated and widely used structure called reduced ordered binary decision diagram (ROBDD). We propose to revisit the classical compaction process to give a new way of enumerating ROBDDs of a given size without considering fully expanded trees and the compaction step. Our method also provides an unranking procedure for the set of ROBDDs. As a by-product we get a random uniform and exhaustive sampler for ROBDDs for a given number of variables and size

    An efficient multi-core implementation of a novel HSS-structured multifrontal solver using randomized sampling

    Full text link
    We present a sparse linear system solver that is based on a multifrontal variant of Gaussian elimination, and exploits low-rank approximation of the resulting dense frontal matrices. We use hierarchically semiseparable (HSS) matrices, which have low-rank off-diagonal blocks, to approximate the frontal matrices. For HSS matrix construction, a randomized sampling algorithm is used together with interpolative decompositions. The combination of the randomized compression with a fast ULV HSS factorization leads to a solver with lower computational complexity than the standard multifrontal method for many applications, resulting in speedups up to 7 fold for problems in our test suite. The implementation targets many-core systems by using task parallelism with dynamic runtime scheduling. Numerical experiments show performance improvements over state-of-the-art sparse direct solvers. The implementation achieves high performance and good scalability on a range of modern shared memory parallel systems, including the Intel Xeon Phi (MIC). The code is part of a software package called STRUMPACK -- STRUctured Matrices PACKage, which also has a distributed memory component for dense rank-structured matrices

    New Algorithms for Position Heaps

    Full text link
    We present several results about position heaps, a relatively new alternative to suffix trees and suffix arrays. First, we show that, if we limit the maximum length of patterns to be sought, then we can also limit the height of the heap and reduce the worst-case cost of insertions and deletions. Second, we show how to build a position heap in linear time independent of the size of the alphabet. Third, we show how to augment a position heap such that it supports access to the corresponding suffix array, and vice versa. Fourth, we introduce a variant of a position heap that can be simulated efficiently by a compressed suffix array with a linear number of extra bits
    corecore