Search CORE

218 research outputs found

The Potential of Synergistic Static, Dynamic and Speculative Loop Nest Optimizations for Automatic Parallelization

Author: Baghdadi Riyadh
Bastoul Cedric
Cohen Albert
Pouchet Louis-Noel
Rauchwerger Lawrence
Publication venue
Publication date: 01/01/2010
Field of study

Research in automatic parallelization of loop-centric programs started with static analysis, then broadened its arsenal to include dynamic inspection-execution and speculative execution, the best results involving hybrid static-dynamic schemes. Beyond the detection of parallelism in a sequential program, scalable parallelization on many-core processors involves hard and interesting parallelism adaptation and mapping challenges. These challenges include tailoring data locality to the memory hierarchy, structuring independent tasks hierarchically to exploit multiple levels of parallelism, tuning the synchronization grain, balancing the execution load, decoupling the execution into thread-level pipelines, and leveraging heterogeneous hardware with specialized accelerators. The polyhedral framework allows to model, construct and apply very complex loop nest transformations addressing most of the parallelism adaptation and mapping challenges. But apart from hardware-specific, back-end oriented transformations (if-conversion, trace scheduling, value prediction), loop nest optimization has essentially ignored dynamic and speculative techniques. Research in polyhedral compilation recently reached a significant milestone towards the support of dynamic, data-dependent control flow. This opens a large avenue for blending dynamic analyses and speculative techniques with advanced loop nest optimizations. Selecting real-world examples from SPEC benchmarks and numerical kernels, we make a case for the design of synergistic static, dynamic and speculative loop transformation techniques. We also sketch the embedding of dynamic information, including speculative assumptions, in the heart of affine transformation search spaces

arXiv.org e-Print Archive

HAL-CentraleSupelec

CiteSeerX

INRIA a CCSD electronic archive server

HAL-Rennes 1

Compiler analysis for trace-level speculative multithreaded architectures

Author: González Colás Antonio María
Molina Clemente Carlos
Tubella Murgadas Jordi
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2005
Field of study

Trace-level speculative multithreaded processors exploit trace-level speculation by means of two threads working cooperatively. One thread, called the speculative thread, executes instructions ahead of the other by speculating on the result of several traces. The other thread executes speculated traces and verifies the speculation made by the first thread. In this paper, we propose a static program analysis for identifying candidate traces to be speculated. This approach identifies large regions of code whose live-output values may be successfully predicted. We present several heuristics to determine the best opportunities for dynamic speculation, based on compiler analysis and program profiling information. Simulation results show that the proposed trace recognition techniques achieve on average a speed-up close to 38% for a collection of SPEC2000 benchmarks.Peer ReviewedPostprint (published version

UPCommons. Portal del coneixement obert de la UPC

Mitosis based speculative multithreaded architectures

Author: Madriles Gimeno Carles
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2012
Field of study

In the last decade, industry made a right-hand turn and shifted towards multi-core processor designs, also known as Chip-Multi-Processors (CMPs), in order to provide further performance improvements under a reasonable power budget, design complexity, and validation cost. Over the years, several processor vendors have come out with multi-core chips in their product lines and they have become mainstream, with the number of cores increasing in each processor generation. Multi-core processors improve the performance of applications by exploiting Thread Level Parallelism (TLP) while the Instruction Level Parallelism (ILP) exploited by each individual core is limited. These architectures are very efficient when multiple threads are available for execution. However, single-thread sections of code (single-thread applications and serial sections of parallel applications) pose important constraints on the benefits achieved by parallel execution, as pointed out by Amdahl’s law. Parallel programming, even with the help of recently proposed techniques like transactional memory, has proven to be a very challenging task. On the other hand, automatically partitioning applications into threads may be a straightforward task in regular applications, but becomes much harder for irregular programs, where compilers usually fail to discover sufficient TLP. In this scenario, two main directions have been followed in the research community to take benefit of multi-core platforms: Speculative Multithreading (SpMT) and Non-Speculative Clustered architectures. The former splits a sequential application into speculative threads, while the later partitions the instructions among the cores based on data-dependences but avoid large degree of speculation. Despite the large amount of research on both these approaches, the proposed techniques so far have shown marginal performance improvements. In this thesis we propose novel schemes to speed-up sequential or lightly threaded applications in multi-core processors that effectively address the main unresolved challenges of previous approaches. In particular, we propose a SpMT architecture, called Mitosis, that leverages a powerful software value prediction technique to manage inter-thread dependences, based on pre-computation slices (p-slices). Thanks to the accuracy and low cost of this technique, Mitosis is able to effectively parallelize applications even in the presence of frequent dependences among threads. We also propose a novel architecture, called Anaphase, that combines the best of SpMT schemes and clustered architectures. Anaphase effectively exploits ILP, TLP and Memory Level Parallelism (MLP), thanks to its unique finegrain thread decomposition algorithm that adapts to the available parallelism in the application

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Tesis Doctorals en Xarxa

Secretaría de Estado de Cultura

Mitosis based speculative multithreaded architectures

Author: Madriles Gimeno Carles
Publication venue: Universitat Politècnica de Catalunya
Publication date: 23/07/2012
Field of study

UPCommons. Portal del coneixement obert de la UPC

High Performance Optimizations in Runtime Speculative Parallelization for Multicore Architectures

Author: Yiapanis Paraskevas
Publication venue
Publication date: 01/08/2014
Field of study

The University of Manchester - Institutional Repository

Recommended from our members

HELIX-RC

Author: Brooks David
Brownell Kevin
Campanoni Simone
Jones Timothy M
Kanev Svilen
Wei Gu-Yeon
Publication venue: ACM SIGARCH Computer Architecture News
Publication date: 29/07/2014
Field of study

Data dependences in sequential programs limit parallelization because extracted threads cannot run independently. Although thread-level speculation can avoid the need for precise dependence analysis, communication overheads required to synchronize actual dependences counteract the benefits of parallelization. To address these challenges, we propose a lightweight architectural enhancement co-designed with a parallelizing compiler, which together can decouple communication from thread execution. Simulations of these approaches, applied to a processor with 16 Intel Atom-like cores, show an average of 6.85x performance speedup for six SPEC CINT2000 benchmarksThis work was possible thanks to the sponsorship of the Royal Academy of Engineering, EPSRC and the National Science Foundation (award number IIS-0926148).This is the accepted manuscript. The final version is available from IEEE and ACM at http://dl.acm.org/citation.cfm?doid=2678373.2665705

Apollo (Cambridge)

Loopapalooza: Investigating Limits of Loop-Level Parallelism with a Compiler-Driven Approach

Author: Gabrielli Giacomo
Iordanou Konstantinos
Luján Mikel
Zaidi Ali
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 23/04/2021
Field of study

The University of Manchester - Institutional Repository

Janus: Statically-Driven and Profile-Guided Automatic Dynamic Binary Parallelisation

Author: Jones TM
Zhou R
Publication venue: CGO 2019 - Proceedings of the 2019 IEEE/ACM International Symposium on Code Generation and Optimization
Publication date: 01/01/2019
Field of study

We present Janus, a framework that addresses the challenge of automatic binary parallelisation. Janus uses same-ISA dynamic binary modification to optimise application binaries, controlled by static analysis with judicious use of software speculation and runtime checks that ensure the safety of the optimisations. A static binary analyser first examines a binary executable, to determine the loops that are amenable to parallelisation and the transformations required. These are encoded as a series of rewrite rules, the steps needed to convert a serial loop into parallel form. The Janus dynamic binary modifier reads both the original executable and rewrite rules and carries out the transformations on a per-basic-block level just-in-time before execution. Lifting static analysis out of the runtime enables the global and profile-guided views of the application; ambiguities from static binary analysis can in turn be addressed through a combination of dynamic runtime checks and speculation guard against data dependence violations. It allows us to parallelise even those loops containing dynamically discovered code. We demonstrate Janus by parallelising a range of optimised SPEC CPU 2006 benchmarks, achieving average speedups of 2.1× and 6.0× in the best case.Arm Ltd Engineering and Physical Sciences Research Council (EP/K026399/1), Engineering and Physical Sciences Research Council (EP/P020011/1

Crossref

Apollo (Cambridge)

Recommended from our members

Automatically accelerating non-numerical programs by architecture-compiler co-design

Author: Brooks D
Brownell K
Campanoni S
Jones TM
Kanev S
Wei GY
Publication venue: Communications of the ACM
Publication date: 01/01/2017
Field of study

Because of the high cost of communication between processors, compilers that parallelize loops automatically have been forced to skip a large class of loops that are both critical to performance and rich in latent parallelism. HELIX-RC is a compiler/microprocessor co-design that opens those loops to parallelization by decoupling communication from thread execution in conventional multicore architecures. Simulations of HELIX-RC, applied to a processor with 16 Intel Atom-like cores, show an average of 6.85× performance speedup for six SPEC CINT2000 benchmarks.</jats:p

Apollo (Cambridge)

Feedback Driven Annotation and Refactoring of Parallel Programs

Author: Larsen Per
Publication venue: Technical University of Denmark
Publication date: 01/01/2011
Field of study

Online Research Database In Technology