Bounding Cache Miss Costs of Multithreaded Computations Under General Schedulers
We analyze the caching overhead incurred by a class of multithreaded
algorithms when scheduled by an arbitrary scheduler. We obtain bounds that
match or improve upon the well-known $O(Q + S \cdot (M/B))$ caching cost for the
randomized work stealing (RWS) scheduler, where $S$ is the number of steals,
$Q$ is the sequential caching cost, and $M$ and $B$ are the cache size and
block (or cache line) size respectively.
Comment: Extended abstract in Proceedings of ACM Symp. on Parallel Alg. and
Architectures (SPAA) 2017, pp. 339-350. This revision has a few small updates
including a missing citation and the replacement of some big Oh terms with
precise constant
Faster Algorithms for Computing Maximal 2-Connected Subgraphs in Sparse Directed Graphs
Connectivity related concepts are of fundamental interest in graph theory.
The area has received extensive attention over four decades, but many problems
remain unsolved, especially for directed graphs. A directed graph is
2-edge-connected (resp., 2-vertex-connected) if the removal of any edge (resp.,
vertex) leaves the graph strongly connected. In this paper we present improved
algorithms for computing the maximal 2-edge- and 2-vertex-connected subgraphs
of a given directed graph. These problems were first studied more than 35 years
ago, with $\widetilde{O}(mn)$ time algorithms for graphs with m edges and n
vertices being known since the late 1980s. In contrast, the same problems for
undirected graphs are known to be solvable in linear time. Henzinger et al.
[ICALP 2015] recently introduced $O(n^2)$ time algorithms for the directed
case, thus improving the running times for dense graphs. Our new algorithms run
in time $O(m^{3/2})$, which further improves the running times for sparse
graphs.
The notion of 2-connectivity naturally generalizes to k-connectivity for
$k > 2$. For constant values of k, we extend one of our algorithms to compute the
maximal k-edge-connected subgraphs in time $O(m^{3/2} \log n)$, improving again for
sparse graphs the best known algorithm by Henzinger et al. [ICALP 2015] that
runs in $O(n^2 \log n)$ time.
Comment: Revised version of SODA 2017 paper including details for
k-edge-connected subgraph
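To make the definition in the abstract above concrete, here is a minimal brute-force sketch of the 2-edge-connectivity test: delete each edge in turn and check that the graph stays strongly connected. It is only an illustration of the definition, not the algorithm of the paper, and it runs in O(m(m + n)) time. The function names and the edge-list representation (vertices numbered 0..n-1) are assumptions made for this example.

```python
from collections import defaultdict


def is_strongly_connected(n, edges):
    """Return True if the directed graph on vertices 0..n-1 is strongly connected."""
    if n == 0:
        return True
    fwd, bwd = defaultdict(list), defaultdict(list)
    for u, v in edges:
        fwd[u].append(v)
        bwd[v].append(u)

    def reaches_all(adj):
        # Depth-first search from vertex 0 over the given adjacency lists.
        seen = {0}
        stack = [0]
        while stack:
            u = stack.pop()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        return len(seen) == n

    # Vertex 0 must reach every vertex, and every vertex must reach vertex 0.
    return reaches_all(fwd) and reaches_all(bwd)


def is_2_edge_connected(n, edges):
    """Brute force: the graph must stay strongly connected after deleting any one edge."""
    if not is_strongly_connected(n, edges):
        return False
    return all(
        is_strongly_connected(n, edges[:i] + edges[i + 1:])
        for i in range(len(edges))
    )
```

For example, a directed 3-cycle is strongly connected but not 2-edge-connected, while the same cycle with every edge also present in the reverse direction is 2-edge-connected.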
Fast and Tiny Structural Self-Indexes for XML
XML document markup is highly repetitive and therefore well compressible
using dictionary-based methods such as DAGs or grammars. In the context of
selectivity estimation, grammar-compressed trees were used before as synopses
for structural XPath queries. Here a fully-fledged index over such grammars is
presented. The index allows arbitrary tree algorithms to be executed with a
slow-down that is comparable to the space improvement. More interestingly,
certain algorithms execute much faster over the index (because no decompression
occurs). E.g., for structural XPath count queries, evaluating over the index is
faster than previous XPath implementations, often by two orders of magnitude.
The index also allows XML results (including texts) to be serialized faster than
previous systems, by a factor of ca. 2-3. This is due to efficient copy
handling of grammar repetitions, and because materialization is totally
avoided. In order to compare with twig join implementations, we implemented a
materializer which writes out pre-order numbers of result nodes, and show its
competitiveness.
Comment: 13 page
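The idea that a count query can run directly on the compressed structure can be sketched with the simpler of the two dictionary-based methods the abstract mentions, a DAG that shares identical subtrees. The sketch below is not the grammar-based index of the paper: it considers element structure only (texts and attributes are ignored), and the names `build_dag` and `count_tag` are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def build_dag(xml_text):
    """Share identical element subtrees: each distinct subtree gets one node id."""
    table = {}   # (tag, child ids) -> node id
    nodes = []   # node id -> (tag, child ids)

    def intern(elem):
        # Children are interned first, so their ids are smaller than the parent's.
        key = (elem.tag, tuple(intern(child) for child in elem))
        if key not in table:
            table[key] = len(nodes)
            nodes.append(key)
        return table[key]

    root = intern(ET.fromstring(xml_text))
    return nodes, root


def count_tag(nodes, root, tag):
    """Count elements named `tag` in the represented tree without expanding the DAG."""
    counts = []
    for node_tag, children in nodes:
        # Each shared node is evaluated once; its count is reused by every parent.
        counts.append((node_tag == tag) + sum(counts[c] for c in children))
    return counts[root]
```

On a document with three identical `book` elements, each holding a `title` and an `author`, the DAG has 4 nodes for a tree of 10 elements, and the count is computed in one pass over those 4 nodes.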
Binary Decision Diagrams: from Tree Compaction to Sampling
Any Boolean function corresponds to a complete full binary decision tree.
This tree can in turn be represented in a maximally compact form as a directed
acyclic graph where common subtrees are factored and shared, keeping only one
copy of each unique subtree. This yields the celebrated and widely used
structure called reduced ordered binary decision diagram (ROBDD). We propose to
revisit the classical compaction process to give a new way of enumerating
ROBDDs of a given size without considering fully expanded trees and the
compaction step. Our method also provides an unranking procedure for the set of
ROBDDs. As a by-product we get a random uniform and exhaustive sampler for
ROBDDs for a given number of variables and size
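The classical compaction step described in the abstract above can be sketched as follows: build the decision tree bottom-up from a truth table, share identical subtrees, and drop tests whose two branches coincide. This illustrates only the classical process, not the enumeration or sampling method the paper proposes. The truth-table input format, the function names, and the convention that ids 0 and 1 denote the terminals are assumptions made for this example.

```python
def build_robdd(truth_table):
    """Compact the complete decision tree of a Boolean function into an ROBDD.

    truth_table has length 2**k; entry i is the function value for the
    assignment whose binary digits (most significant first) give x_1..x_k.
    Returns (nodes, root): nodes maps id -> (variable, low id, high id),
    and ids 0 and 1 stand for the terminals False and True.
    """
    unique = {}  # (variable, low id, high id) -> node id
    nodes = {}

    def make(var, lo, hi):
        # Handles the subtree for the table slice [lo, hi), testing x_var first.
        if hi - lo == 1:
            return int(bool(truth_table[lo]))
        mid = (lo + hi) // 2
        low = make(var + 1, lo, mid)    # branch x_var = 0
        high = make(var + 1, mid, hi)   # branch x_var = 1
        if low == high:
            return low                  # redundant test: skip this node
        key = (var, low, high)
        if key not in unique:           # share identical subgraphs
            unique[key] = len(unique) + 2
            nodes[unique[key]] = key
        return unique[key]

    root = make(1, 0, len(truth_table))
    return nodes, root


def evaluate(nodes, root, bits):
    """Follow the diagram for the assignment bits = [x_1, ..., x_k]."""
    node = root
    while node > 1:
        var, low, high = nodes[node]
        node = high if bits[var - 1] else low
    return bool(node)
```

For the two-variable XOR, given as `[0, 1, 1, 0]`, the result has 3 internal nodes; for a function that ignores x_1, such as `[0, 1, 0, 1]`, the test on x_1 is dropped and a single node remains.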
An efficient multi-core implementation of a novel HSS-structured multifrontal solver using randomized sampling
We present a sparse linear system solver that is based on a multifrontal
variant of Gaussian elimination, and exploits low-rank approximation of the
resulting dense frontal matrices. We use hierarchically semiseparable (HSS)
matrices, which have low-rank off-diagonal blocks, to approximate the frontal
matrices. For HSS matrix construction, a randomized sampling algorithm is used
together with interpolative decompositions. The combination of the randomized
compression with a fast ULV HSS factorization leads to a solver with lower
computational complexity than the standard multifrontal method for many
applications, resulting in speedups of up to 7-fold for problems in our test
suite. The implementation targets many-core systems by using task parallelism
with dynamic runtime scheduling. Numerical experiments show performance
improvements over state-of-the-art sparse direct solvers. The implementation
achieves high performance and good scalability on a range of modern shared
memory parallel systems, including the Intel Xeon Phi (MIC). The code is part
of a software package called STRUMPACK -- STRUctured Matrices PACKage, which
also has a distributed memory component for dense rank-structured matrices
New Algorithms for Position Heaps
We present several results about position heaps, a relatively new alternative
to suffix trees and suffix arrays. First, we show that, if we limit the maximum
length of patterns to be sought, then we can also limit the height of the heap
and reduce the worst-case cost of insertions and deletions. Second, we show how
to build a position heap in linear time independent of the size of the
alphabet. Third, we show how to augment a position heap such that it supports
access to the corresponding suffix array, and vice versa. Fourth, we introduce
a variant of a position heap that can be simulated efficiently by a compressed
suffix array with a linear number of extra bits