
118 research outputs found

Parallel Wavelet Tree Construction

Author: Shun Julian
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/04/2015

We present parallel algorithms for wavelet tree construction with polylogarithmic depth, improving upon the linear depth of the recent parallel algorithms by Fuentes-Sepulveda et al. We experimentally show on a 40-core machine with two-way hyper-threading that we outperform the existing parallel algorithms by 1.3-5.6x and achieve up to 27x speedup over the sequential algorithm on a variety of real-world and artificial inputs. Our algorithms show good scalability with increasing thread count, input size, and alphabet size. We also discuss extensions to variants of the standard wavelet tree. Comment: This is a longer version of the paper that appears in the Proceedings of the IEEE Data Compression Conference, 201

arXiv.org e-Print Archive

Crossref
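For readers unfamiliar with the structure being built, here is a minimal sequential wavelet tree sketch. It is not the paper's parallel algorithm. Assumptions: symbols are integers in a half-open range `[lo, hi)`, each node splits its range at the midpoint, and each node stores prefix counts of 1-bits in a plain list instead of a succinct rank-supported bitvector (so it uses O(n log sigma) words, not bits). The class name `WaveletTree` and its methods are hypothetical. In this sketch the work at each node is a stable partition of its subsequence by one bit.

```python
from typing import List, Optional


class WaveletTree:
    """Pointer-based wavelet tree over integer symbols in [lo, hi)."""

    def __init__(self, seq: List[int], lo: int, hi: int) -> None:
        self.lo, self.hi = lo, hi
        self.ones: List[int] = [0]  # ones[i] = number of 1-bits among the first i bits
        self.left: Optional["WaveletTree"] = None
        self.right: Optional["WaveletTree"] = None
        if hi - lo <= 1:
            return  # leaf: every element that reaches it equals lo
        mid = (lo + hi) // 2
        left_seq: List[int] = []
        right_seq: List[int] = []
        for c in seq:
            bit = 1 if c >= mid else 0  # 1 = symbol belongs to the upper half
            self.ones.append(self.ones[-1] + bit)
            (right_seq if bit else left_seq).append(c)
        # Stable partition: each child sees its symbols in original order.
        self.left = WaveletTree(left_seq, lo, mid)
        self.right = WaveletTree(right_seq, mid, hi)

    def access(self, i: int) -> int:
        """Return the symbol at position i by walking from root to leaf."""
        node = self
        while node.hi - node.lo > 1:
            ones_before = node.ones[i]
            if node.ones[i + 1] > ones_before:  # bit at position i is 1
                i, node = ones_before, node.right
            else:
                i, node = i - ones_before, node.left
        return node.lo

    def rank(self, c: int, i: int) -> int:
        """Count occurrences of symbol c in positions [0, i)."""
        if not (self.lo <= c < self.hi):
            return 0
        node = self
        while node.hi - node.lo > 1:
            mid = (node.lo + node.hi) // 2
            ones_before = node.ones[i]
            if c >= mid:
                i, node = ones_before, node.right
            else:
                i, node = i - ones_before, node.left
        return i
```

For example, `WaveletTree([3, 1, 4, 1, 5, 2, 6, 5, 3, 5], 0, 7).rank(5, 10)` counts the three occurrences of 5.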

Connected Spatial Networks over Random Points and a Route-Length Statistic

Authors: Aldous David J.; Shun Julian
Publication venue: Institute of Mathematical Statistics
Publication date: 01/01/2010

We review mathematically tractable models for connected networks on random points in the plane, emphasizing the class of proximity graphs, which deserves to be better known to applied probabilists and statisticians. We introduce and motivate a particular statistic $R$ measuring shortness of routes in a network. We illustrate, via Monte Carlo in part, the trade-off between normalized network length and $R$ in a one-parameter family of proximity graphs. How close this family comes to the optimal trade-off over all possible networks remains an intriguing open question. The paper is a write-up of a talk developed by the first author during 2007-2009. Comment: Published in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org) at http://dx.doi.org/10.1214/10-STS335

arXiv.org e-Print Archive

CiteSeerX

Crossref

eScholarship - University of California

Fast Arrays: Atomic Arrays with Constant Time Initialization

Authors: Jayanti Siddhartha; Shun Julian
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 35th International Symposium on Distributed Computing (DISC 2021)
Publication date: 01/01/2021

Some algorithms require a large array but operate on only a small fraction of its indices. Examples include adjacency matrices for sparse graphs, hash tables, and van Emde Boas trees. For such algorithms, array initialization can be the most time-consuming operation. Fast arrays were invented to avoid this costly initialization. A fast array is a software implementation of an array such that the entire array can be initialized in constant time. While algorithms for sequential fast arrays have been known for a long time, to the best of our knowledge there are no previous algorithms for concurrent fast arrays. We present the first such algorithms in this paper. Our first algorithm is linearizable and wait-free, uses only linear space, and supports all operations (initialize, read, and write) in constant time. Our second algorithm enhances the first to additionally support, in constant time, all the read-modify-write operations available in hardware (such as compare-and-swap).

Dagstuhl Research Online Publication Server
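The abstract notes that sequential fast arrays have long been known. The sketch below shows the classic sequential trick, not the paper's concurrent wait-free algorithms: a write is trusted only if a slot below a counter points back at the index, so resetting the counter invalidates every earlier write at once. The class and method names are hypothetical. One caveat: in C the backing arrays could be left uninitialized, but Python lists must be allocated up front, so here only `initialize()` is constant time.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")


class FastArray(Generic[T]):
    def __init__(self, n: int, default: T) -> None:
        # In C these arrays could hold garbage; Python must allocate them,
        # so this constructor is O(n) and only initialize() below is O(1).
        self._values: List[T] = [default] * n
        self._slot: List[int] = [0] * n     # _slot[i]: claimed position of i in _written
        self._written: List[int] = [0] * n  # _written[k]: k-th index written this epoch
        self._count = 0
        self._default = default

    def _is_live(self, i: int) -> bool:
        # i holds a current value only if a slot below _count points back at i;
        # the bounds check matters because _slot[i] may be stale or garbage.
        k = self._slot[i]
        return 0 <= k < self._count and self._written[k] == i

    def initialize(self, default: T) -> None:
        """Reset every entry to `default` in O(1) by forgetting all writes."""
        self._default = default
        self._count = 0

    def read(self, i: int) -> T:
        return self._values[i] if self._is_live(i) else self._default

    def write(self, i: int, value: T) -> None:
        if not self._is_live(i):
            # First write to i in this epoch: vouch for it with a new slot.
            self._slot[i] = self._count
            self._written[self._count] = i
            self._count += 1
        self._values[i] = value
```

Because each index claims at most one slot per epoch, `_count` never exceeds `n`.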

Parallel Index-Based Structural Graph Clustering and Its Approximation

Authors: Dhulipala Laxman; Shun Julian; Tseng Tom
Publication venue: Association for Computing Machinery (ACM)
Publication date: 30/03/2021

SCAN (Structural Clustering Algorithm for Networks) is a well-studied, widely used graph clustering algorithm. For large graphs, however, sequential SCAN variants are prohibitively slow, and parallel SCAN variants do not effectively share work among queries with different SCAN parameter settings. Since users of SCAN often explore many parameter settings to find good clusterings, it is worthwhile to precompute an index that speeds up queries. This paper presents a practical and provably efficient parallel index-based SCAN algorithm based on GS*-Index, a recent sequential algorithm. Our parallel algorithm improves upon the asymptotic work of the sequential algorithm by using integer sorting. It is also highly parallel, achieving logarithmic span (parallel time) for both index construction and clustering queries. Furthermore, we apply locality-sensitive hashing (LSH) to design a novel approximate SCAN algorithm and prove guarantees for its clustering behavior. We present an experimental evaluation of our algorithms on large real-world graphs. On a 48-core machine with two-way hyper-threading, our parallel index construction achieves 50-151x speedup over the construction of GS*-Index. In fact, even on a single thread, our index construction algorithm is faster than GS*-Index. Our parallel index query implementation achieves 5-32x speedup over GS*-Index queries across a range of SCAN parameter values, and our implementation is always faster than ppSCAN, a state-of-the-art parallel SCAN algorithm. Moreover, our experiments show that applying LSH results in faster index construction while maintaining good clustering quality.

arXiv.org e-Print Archive

DSpace@MIT
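As background for the SCAN abstract, a minimal sequential sketch of the basic SCAN clustering rule follows. It does not build GS*-Index, use integer sorting, or apply LSH. It assumes the standard SCAN definitions: structural similarity over closed neighborhoods N[u] = N(u) with u added, sigma(u, v) = |N[u] & N[v]| / sqrt(|N[u]| * |N[v]|), a vertex is a core if at least `mu` closed neighbors have similarity at least `eps`, and cores linked by eps-similar edges share a cluster. The function name `scan_clusters` is hypothetical, and the input is assumed to be a symmetric adjacency dict with every vertex as a key.

```python
import math
from collections import deque
from typing import Dict, List, Set


def scan_clusters(adj: Dict[int, Set[int]], eps: float, mu: int) -> Dict[int, int]:
    """Return {vertex: cluster id}; vertices in no cluster are omitted."""
    closed = {u: set(nbrs) | {u} for u, nbrs in adj.items()}

    def similarity(u: int, v: int) -> float:
        a, b = closed[u], closed[v]
        return len(a & b) / math.sqrt(len(a) * len(b))

    # eps-neighborhood of u (includes u itself, since sigma(u, u) = 1).
    eps_nbrs: Dict[int, List[int]] = {
        u: [v for v in closed[u] if similarity(u, v) >= eps] for u in adj
    }
    cores = {u for u in adj if len(eps_nbrs[u]) >= mu}

    label: Dict[int, int] = {}
    next_id = 0
    for seed in sorted(cores):
        if seed in label:
            continue
        label[seed] = next_id
        queue = deque([seed])
        while queue:
            u = queue.popleft()  # only cores are ever enqueued
            for v in eps_nbrs[u]:
                if v not in label:
                    # Border vertices keep the first cluster that reaches them.
                    label[v] = next_id
                    if v in cores:
                        queue.append(v)
        next_id += 1
    return label
```

On two triangles joined by a single edge, `eps = 0.7` and `mu = 3` separate the triangles, while `eps = 0.5` merges them, which illustrates why users sweep parameter settings.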

ConnectIt: A Framework for Static and Incremental Parallel Graph Connectivity Algorithms

Authors: Dhulipala Laxman; Hong Changwan; Shun Julian
Publication date: 10/08/2020

Connected components is a fundamental kernel in graph applications due to its usefulness in measuring how well-connected a graph is, as well as its use as a subroutine in many other graph algorithms. The fastest existing parallel multicore algorithms for connectivity are based on some form of edge sampling and/or linking and compressing trees. However, many combinations of these design choices have been left unexplored. In this paper, we design the ConnectIt framework, which provides different sampling strategies as well as various tree linking and compression schemes. ConnectIt enables us to obtain several hundred new variants of connectivity algorithms, most of which extend to computing spanning forests. In addition to static graphs, we also extend ConnectIt to support mixes of insertions and connectivity queries in the concurrent setting. We present an experimental evaluation of ConnectIt on a 72-core machine, which we believe is the most comprehensive evaluation of parallel connectivity algorithms to date. Compared to a collection of state-of-the-art static multicore algorithms, we obtain an average speedup of 37.4x (2.36x average speedup over the fastest existing implementation for each graph). Using ConnectIt, we are able to compute connectivity on the largest publicly available graph (with over 3.5 billion vertices and 128 billion edges) in under 10 seconds using a 72-core machine, providing a 3.1x speedup over the fastest existing connectivity result for this graph in any computational setting. For our incremental algorithms, we show that they can ingest graph updates at up to several billion edges per second. Finally, to guide the user in selecting the best variants in ConnectIt for different situations, we provide a detailed analysis of the different strategies in terms of their work and locality.

arXiv.org e-Print Archive

DSpace@MIT
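To make "linking and compressing trees" concrete, here is a minimal sequential union-find sketch using one possible combination: link by index (the larger root is attached under the smaller) and path halving for compression. This is only one point in the design space the ConnectIt abstract describes; it omits sampling, concurrency, and spanning-forest output, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def connected_components(n: int, edges: Sequence[Tuple[int, int]]) -> List[int]:
    """Label each vertex 0..n-1 with the smallest vertex id in its component."""
    parent = list(range(n))

    def find(x: int) -> int:
        # Path halving: point every other node on the path at its grandparent.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            # Link by index: the larger root goes under the smaller root,
            # so every root stays the minimum vertex id in its tree.
            lo, hi = (ru, rv) if ru < rv else (rv, ru)
            parent[hi] = lo
    return [find(x) for x in range(n)]
```

Edges can also be processed incrementally: calling `find` on both endpoints answers a connectivity query between insertions.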

Parallel Five-Cycle Counting Algorithms

Authors: Huang Louisa Ruixue; Shi Jessica; Shun Julian
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 19th International Symposium on Experimental Algorithms (SEA 2021)
Publication date: 01/01/2021

Counting the frequency of subgraphs in large networks is a classic research question that reveals the underlying substructures of these networks for important applications. However, subgraph counting is a challenging problem, even for subgraph sizes as small as five, due to the combinatorial explosion in the number of possible occurrences. This paper focuses on the five-cycle, which is an important special case of five-vertex subgraph counting and one of the most difficult to count efficiently. We design two new parallel five-cycle counting algorithms and prove that they are work-efficient and achieve polylogarithmic span. Both algorithms are based on computing low out-degree orientations, which enable the efficient computation of directed two-paths and three-paths, and the algorithms differ in the ways in which they use this orientation to eliminate double-counting. We develop fast multicore implementations of the algorithms and propose a work scheduling optimization to improve their performance. Our experiments on a variety of real-world graphs using a 36-core machine with two-way hyper-threading show that our algorithms achieve 10-46x self-relative speedup, outperform our serial benchmarks by 10-32x, and outperform the previous state-of-the-art serial algorithm by up to 818x.

Dagstuhl Research Online Publication Server
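The five-cycle abstract builds on low out-degree orientations. A minimal sketch, assuming a simple undirected graph on vertices 0..n-1: orient each edge toward the endpoint with the larger (degree, id) pair. If a vertex v has out-degree d, then d is at most deg(v), and each of its d out-neighbors has degree at least deg(v), hence at least d, so 2m is at least d squared and every out-degree is at most sqrt(2m). The naive counter below is only a correctness baseline for small graphs, not either of the paper's algorithms; all function names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def build_adj(n: int, edges: Sequence[Tuple[int, int]]) -> List[Set[int]]:
    """Undirected adjacency sets; self-loops are dropped and duplicates merge."""
    adj: List[Set[int]] = [set() for _ in range(n)]
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def degree_orientation(adj: List[Set[int]]) -> List[List[int]]:
    """Orient u->w iff (deg u, u) < (deg w, w); each edge is oriented exactly once."""
    rank = [(len(nbrs), v) for v, nbrs in enumerate(adj)]
    return [[w for w in nbrs if rank[v] < rank[w]] for v, nbrs in enumerate(adj)]


def count_five_cycles_bruteforce(adj: List[Set[int]]) -> int:
    """Reference counter: enumerate simple paths v0..v4 closed by the edge v4-v0."""
    closed_walks = 0

    def extend(path: List[int], visited: Set[int]) -> None:
        nonlocal closed_walks
        if len(path) == 5:
            if path[0] in adj[path[-1]]:
                closed_walks += 1
            return
        for w in adj[path[-1]]:
            if w not in visited:
                visited.add(w)
                path.append(w)
                extend(path, visited)
                path.pop()
                visited.remove(w)

    for s in range(len(adj)):
        extend([s], {s})
    # Each 5-cycle is found once per starting vertex (5) and direction (2).
    return closed_walks // 10
```

For the complete graph on five vertices, every ordering of the vertices closes into a cycle, so the counter sees 120 closed walks and reports 12 five-cycles.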

A Parallel Batch-Dynamic Data Structure for the Closest Pair Problem

Authors: Gu Yan; Shun Julian; Wang Yiqiu; Yu Shangdi
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 37th International Symposium on Computational Geometry (SoCG 2021)
Publication date: 01/01/2021

We propose a theoretically efficient and practical parallel batch-dynamic data structure for the closest pair problem. Our solution is based on a serial dynamic closest pair data structure by Golin et al., and supports batches of insertions and deletions in parallel. For a data set of size $n$, our data structure supports a batch of insertions or deletions of size $m$ in $O(m(1+\log((n+m)/m)))$ expected work and $O(\log(n+m)\log^*(n+m))$ depth with high probability, and takes linear space. The key techniques for achieving these bounds are a new work-efficient parallel batch-dynamic binary heap, and careful management of the computation across sets of points to minimize work and depth. We provide an optimized multicore implementation of our data structure using dynamic hash tables, parallel heaps, and dynamic $k$-d trees. Our experiments on a variety of synthetic and real-world data sets show that it achieves a parallel speedup of up to 38.57x (15.10x on average) on 48 cores with hyper-threading. In addition, we implement and compare four parallel algorithms for the static closest pair problem, for which we are not aware of any existing practical implementations. On 48 cores with hyper-threading, the static algorithms achieve up to 51.45x (29.42x on average) speedup, and Rabin's algorithm performs the best on average. Comparing our dynamic algorithm to the fastest static algorithm, we find that it is advantageous to use the dynamic algorithm for batch sizes of up to 20% of the data set. As far as we know, our work is the first to experimentally evaluate parallel closest pair algorithms, in both the static and the dynamic settings.

arXiv.org e-Print Archive

Dagstuhl Research Online Publication Server
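As a concrete reference point for the static closest pair problem mentioned above, here is a minimal sequential sketch of a randomized incremental grid algorithm in the plane. It is not the paper's batch-dynamic structure and not Rabin's algorithm. The idea: keep a hash grid whose cell side equals the current closest distance delta, so any point closer than delta to a new point lies in the surrounding 3x3 block of cells; when delta shrinks, rebuild the grid. With a random insertion order, the expected running time is linear under the usual assumptions (constant-time hashing and floor). Function names are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def _cell(p: Point, size: float) -> Cell:
    # Grid cell of side `size` containing p.
    return (math.floor(p[0] / size), math.floor(p[1] / size))


def _build_grid(points: Sequence[Point], size: float) -> Dict[Cell, List[Point]]:
    grid: Dict[Cell, List[Point]] = {}
    for q in points:
        grid.setdefault(_cell(q, size), []).append(q)
    return grid


def closest_pair_distance(points: Sequence[Point], seed: int = 0) -> float:
    """Smallest pairwise Euclidean distance via randomized incremental insertion."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    pts = list(points)
    random.Random(seed).shuffle(pts)  # random order gives the expected-time bound
    delta = math.dist(pts[0], pts[1])
    if delta == 0.0:
        return 0.0
    grid = _build_grid(pts[:2], delta)
    for i in range(2, len(pts)):
        p = pts[i]
        cx, cy = _cell(p, delta)
        best = delta
        # Any earlier point closer than delta lies in the 3x3 block around p's cell.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for q in grid.get((cx + dx, cy + dy), ()):
                    best = min(best, math.dist(p, q))
        if best < delta:
            delta = best
            if delta == 0.0:
                return 0.0  # duplicate points
            grid = _build_grid(pts[: i + 1], delta)  # rebuild with the smaller cells
        else:
            grid.setdefault((cx, cy), []).append(p)
    return delta
```

A rebuild costs time linear in the number of points inserted so far, but under a random order the i-th insertion triggers one only with probability O(1/i), which keeps the expected total linear.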

Implicit Decomposition for Write-Efficient Connectivity Algorithms

Authors: Ben-David Naama; Blelloch Guy E.; Fineman Jeremy T.; Gibbons Phillip B.; Gu Yan; McGuffey Charles; Shun Julian
Publication date: 07/10/2017

The future of main memory appears to lie in the direction of new technologies that provide strong capacity-to-performance ratios but have write operations that are much more expensive than reads in terms of latency, bandwidth, and energy. Motivated by this trend, we propose sequential and parallel algorithms to solve graph connectivity problems using significantly fewer writes than conventional algorithms. Our primary algorithmic tool is the construction of an $o(n)$-sized "implicit decomposition" of a bounded-degree graph $G$ on $n$ nodes, which, combined with read-only access to $G$, enables fast answers to connectivity and biconnectivity queries on $G$. The construction breaks the linear-write "barrier", resulting in costs that are asymptotically lower than conventional algorithms while adding only a modest cost to querying time. For general non-sparse graphs on $m$ edges, we also provide the first parallel algorithms for connectivity and biconnectivity that use $o(m)$ writes and $O(m)$ operations. These algorithms provide insight into how applications can efficiently process computations on large graphs in systems with read-write asymmetry.

arXiv.org e-Print Archive

Crossref

DSpace@MIT