2,191 research outputs found
RoBuSt: A Crash-Failure-Resistant Distributed Storage System
In this work we present the first distributed storage system that is provably
robust against crash failures issued by an adaptive adversary, i.e., for each
batch of requests the adversary can decide based on the entire system state
which servers will be unavailable for that batch of requests. Despite up to
crashed servers, with constant and
denoting the number of servers, our system can correctly process any batch of
lookup and write requests (with at most a polylogarithmic number of requests
issued at each non-crashed server) in at most a polylogarithmic number of
communication rounds, with at most polylogarithmic time and work at each server
and only a logarithmic storage overhead.
Our system is based on previous work by Eikel and Scheideler (SPAA 2013), who
presented IRIS, a distributed information system that is provably robust
against the same kind of crash failures. However, IRIS is only able to serve
lookup requests. Handling both lookup and write requests has turned out to
require major changes in the design of IRIS.Comment: Revised full versio
Bounding Cache Miss Costs of Multithreaded Computations Under General Schedulers
We analyze the caching overhead incurred by a class of multithreaded
algorithms when scheduled by an arbitrary scheduler. We obtain bounds that
match or improve upon the well-known caching cost for the
randomized work stealing (RWS) scheduler, where is the number of steals,
is the sequential caching cost, and and are the cache size and
block (or cache line) size respectively.Comment: Extended abstract in Proceedings of ACM Symp. on Parallel Alg. and
Architectures (SPAA) 2017, pp. 339-350. This revision has a few small updates
including a missing citation and the replacement of some big Oh terms with
precise constant
Well-Structured Futures and Cache Locality
In fork-join parallelism, a sequential program is split into a directed
acyclic graph of tasks linked by directed dependency edges, and the tasks are
executed, possibly in parallel, in an order consistent with their dependencies.
A popular and effective way to extend fork-join parallelism is to allow threads
to create futures. A thread creates a future to hold the results of a
computation, which may or may not be executed in parallel. That result is
returned when some thread touches that future, blocking if necessary until the
result is ready.
Recent research has shown that while futures can, of course, enhance
parallelism in a structured way, they can have a deleterious effect on cache
locality. In the worst case, futures can incur deviations, which implies
additional cache misses, where is the number of cache lines, is the
number of processors, is the number of touches, and is the
\emph{computation span}. Since cache locality has a large impact on software
performance on modern multicores, this result is troubling.
In this paper, however, we show that if futures are used in a simple,
disciplined way, then the situation is much better: if each future is touched
only once, either by the thread that created it, or by a thread to which the
future has been passed from the thread that created it, then parallel
executions with work stealing can incur at most additional
cache misses, a substantial improvement. This structured use of futures is
characteristic of many (but not all) parallel applications
Extending the Nested Parallel Model to the Nested Dataflow Model with Provably Efficient Schedulers
The nested parallel (a.k.a. fork-join) model is widely used for writing
parallel programs. However, the two composition constructs, i.e. ""
(parallel) and "" (serial), are insufficient in expressing "partial
dependencies" or "partial parallelism" in a program. We propose a new dataflow
composition construct "" to express partial dependencies in
algorithms in a processor- and cache-oblivious way, thus extending the Nested
Parallel (NP) model to the \emph{Nested Dataflow} (ND) model. We redesign
several divide-and-conquer algorithms ranging from dense linear algebra to
dynamic-programming in the ND model and prove that they all have optimal span
while retaining optimal cache complexity. We propose the design of runtime
schedulers that map ND programs to multicore processors with multiple levels of
possibly shared caches (i.e, Parallel Memory Hierarchies) and provide
theoretical guarantees on their ability to preserve locality and load balance.
For this, we adapt space-bounded (SB) schedulers for the ND model. We show that
our algorithms have increased "parallelizability" in the ND model, and that SB
schedulers can use the extra parallelizability to achieve asymptotically
optimal bounds on cache misses and running time on a greater number of
processors than in the NP model. The running time for the algorithms in this
paper is , where is the cache complexity of task ,
is the cost of cache miss at level- cache which is of size ,
is a constant, and is the number of processors in an
-level cache hierarchy
- β¦