Mitosis compiler: An infrastructure for speculative threading based on pre-computation slices
Speculative parallelization can provide significant sources of additional thread-level parallelism, especially for irregular applications that are hard to parallelize by conventional approaches. In this paper, we present the Mitosis compiler, which partitions applications into speculative threads, with special emphasis on applications for which conventional parallelizing approaches fail. The management of inter-thread data dependences is crucial for the performance of the system. The Mitosis framework uses a purely software approach to predict/compute the thread's input values. This software approach is based on the use of pre-computation slices (p-slices), which are built by the Mitosis compiler and added at the beginning of the speculative thread. P-slices should compute thread input values accurately, but they need not guarantee correctness, since the underlying architecture can detect and recover from misspeculations. This allows the compiler to use aggressive/unsafe optimizations to significantly reduce their overhead. The most important optimizations included in the Mitosis compiler and presented in this paper are branch pruning, memory and register dependence speculation, and early thread squashing. Performance evaluation of Mitosis compiler/architecture shows an average speedup of 2.2
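The p-slice idea can be illustrated with a toy Python sketch. Everything in it is hypothetical: `original_region` stands for the code the parent thread runs between the spawn point and the start of the speculative thread, `p_slice` is the same live-in computation with a rarely taken branch pruned (in the spirit of the branch-pruning optimization above), and validation is modeled as a plain comparison of predicted and actual live-ins rather than the Mitosis hardware mechanism.

```python
from typing import Dict, Tuple

LiveIns = Dict[str, int]


def original_region(state: LiveIns) -> LiveIns:
    """Hypothetical code between the spawn point and the speculative thread's start."""
    x, y = state["x"], state["y"]
    if x < 0:  # rarely taken path
        y = y * 3 - 7
    return {"x": x + 1, "y": y + 2}


def p_slice(state: LiveIns) -> LiveIns:
    """Hypothetical p-slice: the same live-in computation with the rare branch pruned."""
    return {"x": state["x"] + 1, "y": state["y"] + 2}


def thread_body(live_ins: LiveIns) -> int:
    """Work done by the speculative thread once its inputs are available."""
    return live_ins["x"] * live_ins["y"]


def run_speculatively(state: LiveIns) -> Tuple[int, bool]:
    """Return (result, committed). On a misprediction, squash and re-execute."""
    predicted = p_slice(state)         # speculative thread computes its own inputs
    result = thread_body(predicted)    # ...and runs ahead with them
    actual = original_region(state)    # parent thread finishes the region
    if predicted == actual:            # validation succeeds: commit
        return result, True
    return thread_body(actual), False  # misspeculation: squash, redo with true inputs
```

With `{"x": 2, "y": 5}` the pruned slice predicts the inputs correctly and the speculative result commits; a negative `x` takes the pruned path and forces a squash. The slice is cheaper than the region it replaces, and the price is occasional recovery, which is exactly the trade-off that lets the compiler apply unsafe optimizations.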
Mitosis: A speculative multithreaded processor based on pre-computation slices
This paper presents the Mitosis framework, which is a combined hardware-software approach to speculative multithreading, even in the presence of frequent dependences among threads. Speculative multithreading increases single-threaded application performance by exploiting thread-level parallelism speculatively, that is, executing code in parallel, even when the compiler or runtime system cannot guarantee that the parallelism exists. The proposed approach is based on predicting/computing thread input values via software through a piece of code that is added at the beginning of each thread (the precomputation slice). A precomputation slice is expected to compute the correct thread input values most of the time but not necessarily always. This allows aggressive optimization techniques to be applied to the slice to make it very short. This paper focuses on the microarchitecture that supports this execution model. The primary novelty of the microarchitecture is the hardware support for the execution and validation of precomputation slices. Additionally, this paper presents new architectures for the register file and the cache memory in order to support multiple versions of each variable and allow for efficient rollback in case of misspeculation. We show that the proposed microarchitecture, together with the compiler support, achieves an average speedup of 2.2 for applications that conventional nonspeculative approaches are not able to parallelize at all.
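To make "multiple versions of each variable" and rollback concrete, here is a minimal Python sketch of a versioned store. It is a software model under stated assumptions, not the paper's register-file or cache design: the class `VersionedMemory` and its methods are hypothetical, thread IDs are assumed to increase in sequential program order, a read returns the closest version from the same or an older thread, commit merges the oldest thread into architectural state, and a squash discards a thread and every younger one.

```python
from typing import Dict


class VersionedMemory:
    """Hypothetical multi-version store; thread ids follow sequential program order."""

    def __init__(self, committed: Dict[str, int]) -> None:
        self.committed = dict(committed)               # non-speculative state
        self.versions: Dict[int, Dict[str, int]] = {}  # per-thread speculative writes

    def spawn(self, tid: int) -> None:
        self.versions[tid] = {}

    def write(self, tid: int, addr: str, value: int) -> None:
        self.versions[tid][addr] = value  # buffered, not visible to older threads

    def read(self, tid: int, addr: str) -> int:
        # Search this thread, then progressively older threads, then committed state.
        for t in sorted((t for t in self.versions if t <= tid), reverse=True):
            if addr in self.versions[t]:
                return self.versions[t][addr]
        return self.committed.get(addr, 0)

    def commit_oldest(self) -> int:
        tid = min(self.versions)
        self.committed.update(self.versions.pop(tid))  # make its writes architectural
        return tid

    def squash_from(self, tid: int) -> None:
        # Rollback: drop the misspeculated thread and all younger threads.
        for t in [t for t in self.versions if t >= tid]:
            del self.versions[t]
```

Keeping speculative writes in per-thread buffers means rollback is just discarding buffers; committed state is never touched until a thread is known to be correct.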
Boosting single-thread performance in multi-core systems through fine-grain multi-threading
Industry has shifted towards multi-core designs as we have hit the memory and power walls. However, single-thread performance remains of paramount importance, since some applications have limited thread-level parallelism (TLP), and even a small portion with limited TLP imposes significant constraints on overall performance, as explained by Amdahl's law.
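The Amdahl's law argument can be made concrete with a short calculation. The parallel fraction and core counts below are illustrative values chosen for this sketch, not figures from the paper: if 90% of execution is parallelizable, the speedup can never exceed 10x no matter how many cores are added, because the remaining serial 10% dominates.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Amdahl's law: overall speedup when only `parallel_fraction` scales with `cores`."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


for cores in (2, 4, 16, 1024):
    print(f"{cores:5d} cores -> {amdahl_speedup(0.9, cores):.2f}x")
```

The printed speedups climb from about 1.8x on 2 cores to only about 9.9x on 1024 cores. This is why accelerating the serial portion itself, which is what combining cores for a single thread targets, still matters on machines with many cores.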
In this paper we propose a novel approach for leveraging multiple cores to improve single-thread performance in a multi-core design. The proposed technique features a set of novel hardware mechanisms that support the execution of threads generated at compile time. These threads result from a fine-grain speculative decomposition of the original application, and they are executed on a modified multi-core system that includes: (1) mechanisms to support multiple versions; (2) mechanisms to detect violations among threads; (3) mechanisms to reconstruct the original sequential order; and (4) mechanisms to checkpoint the architectural state and recover from misspeculations.
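Mechanism (2), together with the squash step of (4), can be sketched in software as read-set tracking. This is a hypothetical Python model, not the hardware described in the paper: thread IDs are assumed to follow sequential order, only exposed reads (those not preceded by the same thread's own write) are recorded, and when an older thread writes an address that a younger thread has already read, the oldest such younger thread is reported so that it and all younger threads can be squashed. The check is deliberately conservative: it may flag a thread whose value actually came from an intermediate writer.

```python
from typing import Dict, Optional, Set


class ViolationDetector:
    """Hypothetical read-after-write violation check between speculative threads."""

    def __init__(self) -> None:
        self.read_sets: Dict[int, Set[str]] = {}
        self.write_sets: Dict[int, Set[str]] = {}

    def spawn(self, tid: int) -> None:
        self.read_sets[tid] = set()
        self.write_sets[tid] = set()

    def on_read(self, tid: int, addr: str) -> None:
        # Only exposed reads matter: the value came from an older thread or memory.
        if addr not in self.write_sets[tid]:
            self.read_sets[tid].add(addr)

    def on_write(self, tid: int, addr: str) -> Optional[int]:
        """Record the write; return the oldest younger thread that read a stale value."""
        self.write_sets[tid].add(addr)
        violators = [t for t, reads in self.read_sets.items() if t > tid and addr in reads]
        return min(violators) if violators else None

    def squash_from(self, tid: int) -> None:
        # Recovery: forget the state of the violating thread and every younger one.
        for t in [t for t in self.read_sets if t >= tid]:
            del self.read_sets[t]
            del self.write_sets[t]
```

In a full design the squashed threads would then restart from a checkpoint of the architectural state; that part is omitted here.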
The proposed scheme outperforms previous hardware-only schemes for combining cores to execute single-thread applications in a multi-core design by more than 10% on average on SPEC CPU2006 for all configurations. Moreover, single-thread performance improves by 41% on average when the proposed scheme is used on a Tiny Core, and by up to 2.6x for some selected applications.