11,057 research outputs found
A Graph-Based Semantics Workbench for Concurrent Asynchronous Programs
A number of novel programming languages and libraries have been proposed that
offer simpler-to-use models of concurrency than threads. It is challenging,
however, to devise execution models that successfully realise their
abstractions without forfeiting performance or introducing unintended
behaviours. This is exemplified by SCOOP---a concurrent object-oriented
message-passing language---which has seen multiple semantics proposed and
implemented over its evolution. We propose a "semantics workbench" with fully
and semi-automatic tools for SCOOP, that can be used to analyse and compare
programs with respect to different execution models. We demonstrate its use in
checking the consistency of semantics by applying it to a set of representative
programs, and highlighting a deadlock-related discrepancy between the principal
execution models of the language. Our workbench is based on a modular and
parameterisable graph transformation semantics implemented in the GROOVE tool.
We discuss how graph transformations are leveraged to atomically model
intricate language abstractions, and how the visual yet algebraic nature of the
model can be used to ascertain soundness.Comment: Accepted for publication in the proceedings of FASE 2016 (to appear
Preemptive Thread Block Scheduling with Online Structural Runtime Prediction for Concurrent GPGPU Kernels
Recent NVIDIA Graphics Processing Units (GPUs) can execute multiple kernels
concurrently. On these GPUs, the thread block scheduler (TBS) uses the FIFO
policy to schedule their thread blocks. We show that FIFO leaves performance to
chance, resulting in significant loss of performance and fairness. To improve
performance and fairness, we propose use of the preemptive Shortest Remaining
Time First (SRTF) policy instead. Although SRTF requires an estimate of runtime
of GPU kernels, we show that such an estimate of the runtime can be easily
obtained using online profiling and exploiting a simple observation on GPU
kernels' grid structure. Specifically, we propose a novel Structural Runtime
Predictor. Using a simple Staircase model of GPU kernel execution, we show that
the runtime of a kernel can be predicted by profiling only the first few thread
blocks. We evaluate an online predictor based on this model on benchmarks from
ERCBench, and find that it can estimate the actual runtime reasonably well
after the execution of only a single thread block. Next, we design a thread
block scheduler that is both concurrent kernel-aware and uses this predictor.
We implement the SRTF policy and evaluate it on two-program workloads from
ERCBench. SRTF improves STP by 1.18x and ANTT by 2.25x over FIFO. When compared
to MPMax, a state-of-the-art resource allocation policy for concurrent kernels,
SRTF improves STP by 1.16x and ANTT by 1.3x. To improve fairness, we also
propose SRTF/Adaptive which controls resource usage of concurrently executing
kernels to maximize fairness. SRTF/Adaptive improves STP by 1.12x, ANTT by
2.23x and Fairness by 2.95x compared to FIFO. Overall, our implementation of
SRTF achieves system throughput to within 12.64% of Shortest Job First (SJF, an
oracle optimal scheduling policy), bridging 49% of the gap between FIFO and
SJF.Comment: 14 pages, full pre-review version of PACT 2014 poste
Fault Localization in Multi-Threaded C Programs using Bounded Model Checking (extended version)
Software debugging is a very time-consuming process, which is even worse for
multi-threaded programs, due to the non-deterministic behavior of
thread-scheduling algorithms. However, the debugging time may be greatly
reduced, if automatic methods are used for localizing faults. In this study, a
new method for fault localization, in multi-threaded C programs, is proposed.
It transforms a multi-threaded program into a corresponding sequential one and
then uses a fault-diagnosis method suitable for this type of program, in order
to localize faults. The code transformation is implemented with rules and
context switch information from counterexamples, which are typically generated
by bounded model checkers. Experimental results show that the proposed method
is effective, in such a way that sequential fault-localization methods can be
extended to multi-threaded programs.Comment: extended version of paper published at SBESC'1
Performance Characterization of Multi-threaded Graph Processing Applications on Intel Many-Integrated-Core Architecture
Intel Xeon Phi many-integrated-core (MIC) architectures usher in a new era of
terascale integration. Among emerging killer applications, parallel graph
processing has been a critical technique to analyze connected data. In this
paper, we empirically evaluate various computing platforms including an Intel
Xeon E5 CPU, a Nvidia Geforce GTX1070 GPU and an Xeon Phi 7210 processor
codenamed Knights Landing (KNL) in the domain of parallel graph processing. We
show that the KNL gains encouraging performance when processing graphs, so that
it can become a promising solution to accelerating multi-threaded graph
applications. We further characterize the impact of KNL architectural
enhancements on the performance of a state-of-the art graph framework.We have
four key observations: 1 Different graph applications require distinctive
numbers of threads to reach the peak performance. For the same application,
various datasets need even different numbers of threads to achieve the best
performance. 2 Only a few graph applications benefit from the high bandwidth
MCDRAM, while others favor the low latency DDR4 DRAM. 3 Vector processing units
executing AVX512 SIMD instructions on KNLs are underutilized when running the
state-of-the-art graph framework. 4 The sub-NUMA cache clustering mode offering
the lowest local memory access latency hurts the performance of graph
benchmarks that are lack of NUMA awareness. At last, We suggest future works
including system auto-tuning tools and graph framework optimizations to fully
exploit the potential of KNL for parallel graph processing.Comment: published as L. Jiang, L. Chen and J. Qiu, "Performance
Characterization of Multi-threaded Graph Processing Applications on
Many-Integrated-Core Architecture," 2018 IEEE International Symposium on
Performance Analysis of Systems and Software (ISPASS), Belfast, United
Kingdom, 2018, pp. 199-20
Testing abstract behavioral specifications
We present a range of testing techniques for the Abstract Behavioral Specification (ABS) language and apply them to an industrial case study. ABS is a formal modeling language for highly variable, concurrent, component-based systems. The nature of these systems makes them susceptible to the introduction of subtle bugs that are hard to detect in the presence of steady adaptation. While static analysis techniques are available for an abstract language such as ABS, testing is still indispensable and complements analytic methods. We focus on fully automated testing techniques including black-box and glass-box test generation as well as runtime assertion checking, which are shown to be effective in an industrial setting
- …