826 research outputs found

    Compiler-Driven Software Speculation for Thread-Level Parallelism

    Get PDF
    Current parallelizing compilers can tackle applications exercising regular access patterns on arrays or affine indices, where data dependencies can be expressed in a linear form. Unfortunately, there are cases that independence between statements of code cannot be guaranteed and thus the compiler conservatively produces sequential code. Programs that involve extensive pointer use, irregular access patterns, and loops with unknown number of iterations are examples of such cases. This limits the extraction of parallelism in cases where dependencies are rarely or never triggered at runtime. Speculative parallelism refers to methods employed during program execution that aim to produce a valid parallel execution schedule for programs immune to static parallelization. The motivation for this article is to review recent developments in the area of compiler-driven software speculation for thread-level parallelism and how they came about. The article is divided into two parts. In the first part the fundamentals of speculative parallelization for thread-level parallelism are explained along with a design choice categorization for implementing such systems. Design choices include the ways speculative data is handled, how data dependence violations are detected and resolved, how the correct data are made visible to other threads, or how speculative threads are scheduled. The second part is structured around those design choices providing the advances and trends in the literature with reference to key developments in the area. Although the focus of the article is in software speculative parallelization, a section is dedicated for providing the interested reader with pointers and references for exploring similar topics such as hardware thread-level speculation, transactional memory, and automatic parallelization

    Quantifying the benefits of SPECint distant parallelism in simultaneous multithreading architectures

    Get PDF
    We exploit the existence of distant parallelism that future compilers could detect and characterise its performance under simultaneous multithreading architectures. By distant parallelism we mean parallelism that cannot be captured by the processor instruction window and that can produce threads suitable for parallel execution in a multithreaded processor. We show that distant parallelism can make feasible wider issue processors by providing more instructions from the distant threads, thus better exploiting the resources from the processor in the case of speeding up single integer applications. We also investigate the necessity of out-of-order processors in the presence of multiple threads of the same program. It is important to notice at this point that the benefits described are totally orthogonal to any other architectural techniques targeting a single thread.Peer ReviewedPostprint (published version

    Thread-spawning schemes for speculative multithreading

    Get PDF
    Speculative multithreading has been recently proposed to boost performance by means of exploiting thread-level parallelism in applications difficult to parallelize. The performance of these processors heavily depends on the partitioning policy used to split the program into threads. Previous work uses heuristics to spawn speculative threads based on easily-detectable program constructs such as loops or subroutines. In this work we propose a profile-based mechanism to divide programs into threads by searching for those parts of the code that have certain features that could benefit from potential thread-level parallelism. Our profile-based spawning scheme is evaluated on a Clustered Speculative Multithreaded Processor and results show large performance benefits. When the proposed spawning scheme is compared with traditional heuristics, we outperform them by almost 20%. When a realistic value predictor and a 8-cycle thread initialization penalty is considered, the performance difference between them is maintained. The speed-up over a single thread execution is higher than 5x for a 16-thread-unit processor and close to 2x for a 4-thread-unit processor.Peer ReviewedPostprint (published version

    Factoring out ordered sections to expose thread-level parallelism

    Get PDF
    With the rise of multi-core processors, researchers are taking a new look at extending the applicability auto-parallelization techniques. In this paper, we identify a dependence pattern on which autoparallelization currently fails. This dependence pattern occurs for ordered sections, i.e. code fragments in a loop that must be executed atomically and in original program order. We discuss why these ordered sections prohibit current auto-parallelizers from working and we present a technique to deal with them. We experimentally demonstrate the efficacy of the technique, yielding significant overall program speedups

    A Survey on Thread-Level Speculation Techniques

    Get PDF
    Producción CientíficaThread-Level Speculation (TLS) is a promising technique that allows the parallel execution of sequential code without relying on a prior, compile-time-dependence analysis. In this work, we introduce the technique, present a taxonomy of TLS solutions, and summarize and put into perspective the most relevant advances in this field.MICINN (Spain) and ERDF program of the European Union: HomProg-HetSys project (TIN2014-58876-P), CAPAP-H5 network (TIN2014-53522-REDT), and COST Program Action IC1305: Network for Sustainable Ultrascale Computing (NESUS)

    An integrated soft- and hard-programmable multithreaded architecture

    Get PDF

    Object oriented execution model (OOM)

    Get PDF
    This paper considers implementing the Object Oriented Programming Model directly in the hardware to serve as a base to exploit object-level parallelism, speculation and heterogeneous computing. Towards this goal, we present a new execution model called Object Oriented execution Model - OOM - that implements the OO Programming Models. All OOM hardware structures are objects and the OOM Instruction Set directly utilizes objects while hiding other complex hardware structures. OOM maintains all high-level programming language information until execution time. This enables efficient extraction of available parallelism in OO serial code at execution time with minimal compiler support. Our results show that OOM utilizes the available parallelism better than the OoO (Out-of-Order) modelPeer ReviewedPostprint (published version
    corecore