25,929 research outputs found
An Almost Tight RMR Lower Bound for Abortable Test-And-Set
We prove a lower bound of Omega(log n/log log n) for the remote memory reference (RMR) complexity of abortable test-and-set (leader election) in the cache-coherent (CC) and the distributed shared memory (DSM) model. This separates the complexities of abortable and non-abortable test-and-set, as the latter has constant RMR complexity [Wojciech Golab et al., 2010].
Golab, Hendler, Hadzilacos and Woelfel [Wojciech M. Golab et al., 2012] showed that compare-and-swap can be implemented from registers and test-and-set objects with constant RMR complexity. We observe that a small modification to that implementation is abortable, provided that the used test-and-set objects are atomic (or abortable). As a consequence, using existing efficient randomized wait-free implementations of test-and-set [George Giakkoupis and Philipp Woelfel, 2012], we obtain randomized abortable compare-and-swap objects with almost constant (O(log^* n)) RMR complexity
Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable
There has been significant recent interest in parallel graph processing due
to the need to quickly analyze the large graphs available today. Many graph
codes have been designed for distributed memory or external memory. However,
today even the largest publicly-available real-world graph (the Hyperlink Web
graph with over 3.5 billion vertices and 128 billion edges) can fit in the
memory of a single commodity multicore server. Nevertheless, most experimental
work in the literature report results on much smaller graphs, and the ones for
the Hyperlink graph use distributed or external memory. Therefore, it is
natural to ask whether we can efficiently solve a broad class of graph problems
on this graph in memory.
This paper shows that theoretically-efficient parallel graph algorithms can
scale to the largest publicly-available graphs using a single machine with a
terabyte of RAM, processing them in minutes. We give implementations of
theoretically-efficient parallel algorithms for 20 important graph problems. We
also present the optimizations and techniques that we used in our
implementations, which were crucial in enabling us to process these large
graphs quickly. We show that the running times of our implementations
outperform existing state-of-the-art implementations on the largest real-world
graphs. For many of the problems that we consider, this is the first time they
have been solved on graphs at this scale. We have made the implementations
developed in this work publicly-available as the Graph-Based Benchmark Suite
(GBBS).Comment: This is the full version of the paper appearing in the ACM Symposium
on Parallelism in Algorithms and Architectures (SPAA), 201
Training Big Random Forests with Little Resources
Without access to large compute clusters, building random forests on large
datasets is still a challenging problem. This is, in particular, the case if
fully-grown trees are desired. We propose a simple yet effective framework that
allows to efficiently construct ensembles of huge trees for hundreds of
millions or even billions of training instances using a cheap desktop computer
with commodity hardware. The basic idea is to consider a multi-level
construction scheme, which builds top trees for small random subsets of the
available data and which subsequently distributes all training instances to the
top trees' leaves for further processing. While being conceptually simple, the
overall efficiency crucially depends on the particular implementation of the
different phases. The practical merits of our approach are demonstrated using
dense datasets with hundreds of millions of training instances.Comment: 9 pages, 9 Figure
- …