Experimental analysis of space-bounded schedulers
The running time of nested parallel programs on shared memory machines depends in significant part on how well the scheduler mapping the program to the machine is optimized for the organization of caches and processors on the machine. Recent work proposed "space-bounded schedulers" for scheduling such programs on the multi-level cache hierarchies of current machines. The main benefit of this class of schedulers is that they provably preserve locality of the program at every level in the hierarchy, resulting (in theory) in fewer cache misses and better use of bandwidth than the popular work-stealing scheduler. On the other hand, compared to work-stealing, space-bounded schedulers are inferior at load balancing and may have greater scheduling overheads, raising the question as to the relative effectiveness of the two schedulers in practice. In this paper, we provide the first experimental study aimed at addressing this question. To facilitate this study, we built a flexible experimental framework with separate interfaces for programs and schedulers. This enables a head-to-head comparison of the relative strengths of schedulers in terms of running times and cache miss counts across a range of benchmarks. (The framework is validated by comparisons with the Intel® Cilk™ Plus work-stealing scheduler.) We present experimental results on a 32-core Xeon® 7560 comparing work-stealing, hierarchy-minded work-stealing, and two variants of space-bounded schedulers on both divide-and-conquer micro-benchmarks and some popular algorithmic kernels. Our results indicate that space-bounded schedulers reduce the number of L3 cache misses compared to work-stealing schedulers by 25-65% for most of the benchmarks, but incur up to 7% additional scheduler and load-imbalance overhead.
Only for memory-intensive benchmarks can the reduction in cache misses overcome the added overhead, resulting in up to a 25% improvement in running time for synthetic benchmarks and about 20% improvement for algorithmic kernels. We also quantify runtime improvements while varying the available bandwidth per core (the "bandwidth gap"), and show up to 50% improvements in the running times of kernels as this gap increases 4-fold. As part of our study, we generalize prior definitions of space-bounded schedulers to allow for more practical variants (while still preserving their guarantees), and explore implementation tradeoffs.
ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the national government of United States. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.
MeGARA: Menu-based Game Abstraction and Abstraction Refinement of Markov Automata
Markov automata combine continuous time, probabilistic transitions, and
nondeterminism in a single model. They represent an important and powerful way
to model a wide range of complex real-life systems. However, such models tend
to be large and difficult to handle, making abstraction and abstraction
refinement necessary. In this paper we present an abstraction and abstraction
refinement technique for Markov automata, based on the game-based and
menu-based abstraction of probabilistic automata. First experiments show that a
significant reduction in size is possible using abstraction.
Comment: In Proceedings QAPL 2014, arXiv:1406.156
Algorithmic Analysis of Qualitative and Quantitative Termination Problems for Affine Probabilistic Programs
In this paper, we consider termination of probabilistic programs with
real-valued variables. The questions concerned are:
1. qualitative ones that ask (i) whether the program terminates with
probability 1 (almost-sure termination) and (ii) whether the expected
termination time is finite (finite termination);
2. quantitative ones that ask (i) to approximate the expected termination
time (expectation problem) and (ii) to compute a bound B such that the
probability to terminate after B steps decreases exponentially
(concentration problem).
To solve these questions, we utilize the notion of ranking supermartingales
which is a powerful approach for proving termination of probabilistic programs.
In detail, we focus on algorithmic synthesis of linear ranking-supermartingales
over affine probabilistic programs (APP's) with both angelic and demonic
non-determinism. An important subclass of APP's is LRAPP which is defined as
the class of all APP's over which a linear ranking-supermartingale exists.
Our main contributions are as follows. Firstly, we show that the membership
problem of LRAPP (i) can be decided in polynomial time for APP's with at most
demonic non-determinism, and (ii) is NP-hard and in PSPACE for APP's with
angelic non-determinism; moreover, the NP-hardness result holds already for
APP's without probability and demonic non-determinism. Secondly, we show that
the concentration problem over LRAPP can be solved in the same complexity as
for the membership problem of LRAPP. Finally, we show that the expectation
problem over LRAPP can be solved in 2EXPTIME and is PSPACE-hard even for APP's
without probability and non-determinism (i.e., deterministic programs). Our
experimental results demonstrate the effectiveness of our approach to answer
the qualitative and quantitative questions over APP's with at most demonic
non-determinism.
Comment: 24 pages, full version to the conference paper on POPL 201
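To make the ranking-supermartingale conditions concrete, here is a minimal sketch that only checks a candidate linear function; it does not perform the algorithmic synthesis described in the abstract. The toy loop, the candidate `eta`, and all function names are assumptions chosen for illustration and are not taken from the paper.

```python
from fractions import Fraction

# Hypothetical toy program (an assumption, not from the paper):
#   while x >= 1:  x := x + 1 with probability 1/4, else x := x - 1
UPDATES = [(Fraction(1, 4), 1), (Fraction(3, 4), -1)]


def eta(x):
    """Candidate linear ranking function eta(x) = x."""
    return Fraction(x)


def expected_next(x):
    """Expected value of eta after one loop iteration started in state x."""
    return sum(p * eta(x + dx) for p, dx in UPDATES)


def is_ranking_supermartingale(states, eps):
    """Check on the given guard states that eta is non-negative and
    decreases in expectation by at least eps per iteration."""
    return all(eta(x) >= 0 and expected_next(x) <= eta(x) - eps for x in states)


# Because eta is linear, the expected decrease is the same in every state,
# so a finite sample of guard states already illustrates the condition.
print(is_ranking_supermartingale(range(1, 200), Fraction(1, 2)))  # True
print(is_ranking_supermartingale(range(1, 200), Fraction(3, 4)))  # False
```

Exact rational arithmetic keeps the comparison free of floating-point rounding. A synthesis procedure would instead search for the coefficients of `eta`, which this sketch deliberately leaves out.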
Extending the Nested Parallel Model to the Nested Dataflow Model with Provably Efficient Schedulers
The nested parallel (a.k.a. fork-join) model is widely used for writing
parallel programs. However, the two composition constructs, i.e. "∥"
(parallel) and ";" (serial), are insufficient in expressing "partial
dependencies" or "partial parallelism" in a program. We propose a new dataflow
composition construct "⇝" to express partial dependencies in
algorithms in a processor- and cache-oblivious way, thus extending the Nested
Parallel (NP) model to the Nested Dataflow (ND) model. We redesign
several divide-and-conquer algorithms ranging from dense linear algebra to
dynamic-programming in the ND model and prove that they all have optimal span
while retaining optimal cache complexity. We propose the design of runtime
schedulers that map ND programs to multicore processors with multiple levels of
possibly shared caches (i.e., Parallel Memory Hierarchies) and provide
theoretical guarantees on their ability to preserve locality and load balance.
For this, we adapt space-bounded (SB) schedulers for the ND model. We show that
our algorithms have increased "parallelizability" in the ND model, and that SB
schedulers can use the extra parallelizability to achieve asymptotically
optimal bounds on cache misses and running time on a greater number of
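processors than in the NP model. The running time for the algorithms in this
paper is $O\left(\frac{\sum_{i=0}^{h-1} Q^{*}(t;\sigma\cdot M_i)\cdot C_i}{p}\right)$, where $Q^{*}$ is the cache complexity of task $t$,
$C_i$ is the cost of a cache miss at the level-$i$ cache, which is of size $M_i$,
$\sigma\in(0,1)$ is a constant, and $p$ is the number of processors in an
$h$-level cache hierarchy.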
Transient Reward Approximation for Continuous-Time Markov Chains
We are interested in the analysis of very large continuous-time Markov chains
(CTMCs) with many distinct rates. Such models arise naturally in the context of
reliability analysis, e.g., of computer network performability analysis, of
power grids, of computer virus vulnerability, and in the study of crowd
dynamics. We use abstraction techniques together with novel algorithms for the
computation of bounds on the expected final and accumulated rewards in
continuous-time Markov decision processes (CTMDPs). These ingredients are
combined in a partly symbolic and partly explicit (symblicit) analysis
approach. In particular, we circumvent the use of multi-terminal decision
diagrams, because the latter do not work well if facing a large number of
different rates. We demonstrate the practical applicability and efficiency of
the approach on two case studies.
Comment: Accepted for publication in IEEE Transactions on Reliabilit
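For orientation, the quantity being bounded (an expected final reward at time t) can be computed directly on a small explicit CTMC with standard uniformization. The sketch below is a textbook baseline under that assumption, not the abstraction-based symblicit algorithm of the paper; the two-state chain, its rates, and the reward vector are hypothetical.

```python
import math


def transient_distribution(Q, pi0, t, terms=200):
    """Transient state distribution of a CTMC at time t via uniformization.

    Q is the generator matrix (rows sum to 0), pi0 the initial distribution.
    The Poisson series is truncated after `terms` summands.
    """
    n = len(Q)
    lam = max(-Q[i][i] for i in range(n))  # uniformization rate
    # Transition matrix of the uniformized discrete-time chain: P = I + Q / lam
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(n)]
         for i in range(n)]
    weight = math.exp(-lam * t)            # Poisson probability of 0 jumps
    vec = list(pi0)
    result = [weight * v for v in vec]
    for k in range(1, terms):
        vec = [sum(vec[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= lam * t / k              # Poisson probability of k jumps
        result = [r + weight * v for r, v in zip(result, vec)]
    return result


# Hypothetical two-state chain: 0 -> 1 at rate 2, 1 -> 0 at rate 1.
Q = [[-2.0, 2.0], [1.0, -1.0]]
rewards = [0.0, 1.0]                       # final reward earned in each state
dist = transient_distribution(Q, [1.0, 0.0], t=0.5)
expected_final_reward = sum(d * r for d, r in zip(dist, rewards))
print(expected_final_reward)
```

For this chain the result can be compared against the closed form (2/3)(1 - e^{-1.5}), which is a convenient sanity check. The explicit matrix representation is exactly what stops scaling for the very large models the abstract targets.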
Maximizing the Conditional Expected Reward for Reaching the Goal
The paper addresses the problem of computing maximal conditional expected
accumulated rewards until reaching a target state (briefly called maximal
conditional expectations) in finite-state Markov decision processes where the
condition is given as a reachability constraint. Conditional expectations of
this type can, e.g., stand for the maximal expected termination time of
probabilistic programs with non-determinism, under the condition that the
program eventually terminates, or for the worst-case expected penalty to be
paid, assuming that at least three deadlines are missed. The main results of
the paper are (i) a polynomial-time algorithm to check the finiteness of
maximal conditional expectations, (ii) PSPACE-completeness for the threshold
problem in acyclic Markov decision processes where the task is to check whether
the maximal conditional expectation exceeds a given threshold, (iii) a
pseudo-polynomial-time algorithm for the threshold problem in the general
(cyclic) case, and (iv) an exponential-time algorithm for computing the maximal
conditional expectation and an optimal scheduler.
Comment: 103 pages, extended version with appendices of a paper accepted at
TACAS 201
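As a small illustration of what a conditional expectation of this kind measures, the sketch below computes it for an acyclic Markov chain with no nondeterminism, so there is nothing to maximize. It is a simplified special case under stated assumptions, not one of the paper's algorithms; the chain, its rewards, and the function names are hypothetical.

```python
from fractions import Fraction

# Hypothetical acyclic chain: state -> list of (probability, successor, reward).
CHAIN = {
    "s0": [(Fraction(1, 2), "goal", 1),
           (Fraction(1, 4), "s1", 2),
           (Fraction(1, 4), "fail", 0)],
    "s1": [(Fraction(1, 2), "goal", 4),
           (Fraction(1, 2), "fail", 0)],
}


def reach_and_reward(state):
    """Return (probability of reaching goal, expected reward accumulated
    on goal-reaching paths, not yet normalized) from `state`."""
    if state == "goal":
        return Fraction(1), Fraction(0)
    if state == "fail":
        return Fraction(0), Fraction(0)
    prob, reward = Fraction(0), Fraction(0)
    for q, succ, r in CHAIN[state]:
        p_succ, e_succ = reach_and_reward(succ)
        prob += q * p_succ
        # The step reward r only counts on paths that later reach the goal.
        reward += q * (r * p_succ + e_succ)
    return prob, reward


def conditional_expectation(state):
    """Expected accumulated reward, conditioned on reaching the goal."""
    prob, reward = reach_and_reward(state)
    if prob == 0:
        raise ValueError("goal is unreachable, the condition has probability 0")
    return reward / prob


print(conditional_expectation("s0"))  # 2
```

Here the goal is reached with probability 5/8 and the unnormalized reward is 5/4, giving a conditional expectation of 2. With nondeterminism, the numerator and denominator both depend on the scheduler, which is why the maximization problem in the abstract is considerably harder than this ratio suggests.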
Bounded Model Checking for Probabilistic Programs
In this paper we investigate the applicability of standard model checking
approaches to verifying properties in probabilistic programming. As the
operational model for a standard probabilistic program is a potentially
infinite parametric Markov decision process, no direct adaptation of existing
techniques is possible. Therefore, we propose an on-the-fly approach where the
operational model is successively created and verified via a step-wise
execution of the program. This approach makes it possible to take key features of many
probabilistic programs into account: nondeterminism and conditioning. We
discuss the restrictions and demonstrate the scalability on several benchmarks
- …
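The step-wise idea from the bounded model checking abstract can be sketched as a depth-bounded unrolling that accumulates the probability mass of terminated runs satisfying a property, which yields a lower bound that grows with the depth. This is a minimal sketch for a purely probabilistic toy loop, assuming no nondeterminism and no conditioning; the program, the property, and all names are hypothetical and not taken from the paper.

```python
from fractions import Fraction


def step(c):
    """One loop iteration of a hypothetical program with counter c:
    with probability 1/2 terminate, otherwise increment c and continue.
    Returns (probability, new counter, terminated?) triples."""
    return [(Fraction(1, 2), c, True), (Fraction(1, 2), c + 1, False)]


def bounded_probability(init, prop, depth):
    """Lower bound on the probability of terminating in a state satisfying
    `prop`, obtained by unrolling the program for `depth` steps."""
    frontier = {init: Fraction(1)}  # running states and their probability
    mass = Fraction(0)              # mass of terminated runs satisfying prop
    for _ in range(depth):
        next_frontier = {}
        for c, p in frontier.items():
            for q, c2, terminated in step(c):
                if terminated:
                    if prop(c2):
                        mass += p * q
                else:
                    next_frontier[c2] = next_frontier.get(c2, Fraction(0)) + p * q
        frontier = next_frontier
    return mass


def at_most_two(c):
    """Property: the program terminates with counter at most 2."""
    return c <= 2


for depth in (1, 2, 3, 10):
    print(depth, bounded_probability(0, at_most_two, depth))
```

The bounds are 1/2, 3/4, 7/8, and 7/8: once the unrolling is deep enough, the lower bound stops improving because every remaining run violates the property. A threshold query such as "is the probability at least 3/4?" can therefore be answered after two steps without building the infinite model.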