2,314 research outputs found
Thread-spawning schemes for speculative multithreading
Speculative multithreading has been recently proposed to boost performance by means of exploiting thread-level parallelism in applications difficult to parallelize. The performance of these processors heavily depends on the partitioning policy used to split the program into threads. Previous work uses heuristics to spawn speculative threads based on easily-detectable program constructs such as loops or subroutines. In this work we propose a profile-based mechanism to divide programs into threads by searching for those parts of the code that have certain features that could benefit from potential thread-level parallelism. Our profile-based spawning scheme is evaluated on a Clustered Speculative Multithreaded Processor and results show large performance benefits. When the proposed spawning scheme is compared with traditional heuristics, we outperform them by almost 20%. When a realistic value predictor and a 8-cycle thread initialization penalty is considered, the performance difference between them is maintained. The speed-up over a single thread execution is higher than 5x for a 16-thread-unit processor and close to 2x for a 4-thread-unit processor.Peer ReviewedPostprint (published version
Compiler analysis for trace-level speculative multithreaded architectures
Trace-level speculative multithreaded processors exploit trace-level speculation by means of two threads working cooperatively. One thread, called the speculative thread, executes instructions ahead of the other by speculating on the result of several traces. The other thread executes speculated traces and verifies the speculation made by the first thread. In this paper, we propose a static program analysis for identifying candidate traces to be speculated. This approach identifies large regions of code whose live-output values may be successfully predicted. We present several heuristics to determine the best opportunities for dynamic speculation, based on compiler analysis and program profiling information. Simulation results show that the proposed trace recognition techniques achieve on average a speed-up close to 38% for a collection of SPEC2000 benchmarks.Peer ReviewedPostprint (published version
Thread-Modular Static Analysis for Relaxed Memory Models
We propose a memory-model-aware static program analysis method for accurately
analyzing the behavior of concurrent software running on processors with weak
consistency models such as x86-TSO, SPARC-PSO, and SPARC-RMO. At the center of
our method is a unified framework for deciding the feasibility of inter-thread
interferences to avoid propagating spurious data flows during static analysis
and thus boost the performance of the static analyzer. We formulate the
checking of interference feasibility as a set of Datalog rules which are both
efficiently solvable and general enough to capture a range of hardware-level
memory models. Compared to existing techniques, our method can significantly
reduce the number of bogus alarms as well as unsound proofs. We implemented the
method and evaluated it on a large set of multithreaded C programs. Our
experiments showthe method significantly outperforms state-of-the-art
techniques in terms of accuracy with only moderate run-time overhead.Comment: revised version of the ESEC/FSE 2017 pape
SmartTrack: Efficient Predictive Race Detection
Widely used data race detectors, including the state-of-the-art FastTrack
algorithm, incur performance costs that are acceptable for regular in-house
testing, but miss races detectable from the analyzed execution. Predictive
analyses detect more data races in an analyzed execution than FastTrack
detects, but at significantly higher performance cost.
This paper presents SmartTrack, an algorithm that optimizes predictive race
detection analyses, including two analyses from prior work and a new analysis
introduced in this paper. SmartTrack's algorithm incorporates two main
optimizations: (1) epoch and ownership optimizations from prior work, applied
to predictive analysis for the first time; and (2) novel conflicting critical
section optimizations introduced by this paper. Our evaluation shows that
SmartTrack achieves performance competitive with FastTrack-a qualitative
improvement in the state of the art for data race detection.Comment: Extended arXiv version of PLDI 2020 paper (adds Appendices A-E) #228
SmartTrack: Efficient Predictive Race Detectio
Dead code elimination based pointer analysis for multithreaded programs
This paper presents a new approach for optimizing multitheaded programs with
pointer constructs. The approach has applications in the area of certified code
(proof-carrying code) where a justification or a proof for the correctness of
each optimization is required. The optimization meant here is that of dead code
elimination.
Towards optimizing multithreaded programs the paper presents a new
operational semantics for parallel constructs like join-fork constructs,
parallel loops, and conditionally spawned threads. The paper also presents a
novel type system for flow-sensitive pointer analysis of multithreaded
programs. This type system is extended to obtain a new type system for
live-variables analysis of multithreaded programs. The live-variables type
system is extended to build the third novel type system, proposed in this
paper, which carries the optimization of dead code elimination. The
justification mentioned above takes the form of type derivation in our
approach.Comment: 19 page
A Graph Grammar to Transform a Dataflow Graph into a Multithread Graph and its Application in Task Scheduling
The scheduling of tasks in a parallel program is an NP-complete problem, where scheduling tasks over multiple processing units requires an effective strategy to maximize the exploitation of the parallel hardware. Several studies focus on the scheduling of parallel programs described into DAGs (Directed Acyclic Graphs). However, this representation does not describe a multithreaded program suitably. This paper shows the structure and semantics of a DCG, an abstraction which describes a multithreaded program, and proposes standards to map structures found in DAGs into segments of a DCG. A graph grammar has been developed to perform the proposed transformation and case studies using DAGs found in the literature validate the transformation process. Besides the automatic translation and precise definition of the mapping, the use of a formal language also allowed the verification of the existence and uniqueness of the out coming model.
- …