
    Polly's Polyhedral Scheduling in the Presence of Reductions

    The polyhedral model provides a powerful mathematical abstraction to enable effective optimization of loop nests with respect to a given optimization goal, e.g., exploiting parallelism. Unexploited reduction properties are a frequent reason for polyhedral optimizers to assume parallelism-prohibiting dependences. To our knowledge, no polyhedral loop optimizer available in any production compiler provides support for reductions. In this paper, we show that leveraging the parallelism of reductions can lead to a significant performance increase. We give a precise, dependence-based definition of reductions and discuss ways to extend polyhedral optimization to exploit the associativity and commutativity of reduction computations. We have implemented a reduction-enabled scheduling approach in the Polly polyhedral optimizer and evaluate it on the standard Polybench 3.2 benchmark suite. We were able to detect and model all 52 arithmetic reductions and achieve speedups of up to 2.21× on a quad-core machine by exploiting the multidimensional reduction in the BiCG benchmark. (Comment: Presented at the IMPACT15 workshop.)
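    As a concrete illustration of the reduction pattern this abstract targets, the sketch below shows a BiCG-style kernel in C with a hand-written OpenMP 4.5 array-section reduction. This is not Polly's generated code; it only mirrors, by hand, the kind of parallel schedule a reduction-enabled polyhedral optimizer can derive once the '+' accumulation into s is recognized as associative and commutative. Function and variable names are illustrative.

```c
#include <stddef.h>

/* BiCG-like kernel (cf. Polybench): s accumulates a reduction over the
 * i-loop, which a naive dependence test would have to serialize. */
void bicg_s(size_t n, size_t m,
            const double A[n][m], const double r[n], double s[m])
{
    for (size_t j = 0; j < m; j++)
        s[j] = 0.0;

    /* Exploiting associativity/commutativity of '+', the i-loop runs in
     * parallel with per-thread private copies of s that are combined at
     * the end (OpenMP 4.5 array-section reduction). */
    #pragma omp parallel for reduction(+: s[:m])
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < m; j++)
            s[j] += r[i] * A[i][j];
}
```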

    The Potential of Synergistic Static, Dynamic and Speculative Loop Nest Optimizations for Automatic Parallelization

    Research in automatic parallelization of loop-centric programs started with static analysis, then broadened its arsenal to include dynamic inspection-execution and speculative execution, with the best results coming from hybrid static-dynamic schemes. Beyond the detection of parallelism in a sequential program, scalable parallelization on many-core processors involves hard and interesting parallelism adaptation and mapping challenges. These challenges include tailoring data locality to the memory hierarchy, structuring independent tasks hierarchically to exploit multiple levels of parallelism, tuning the synchronization grain, balancing the execution load, decoupling the execution into thread-level pipelines, and leveraging heterogeneous hardware with specialized accelerators. The polyhedral framework makes it possible to model, construct and apply very complex loop nest transformations that address most of these parallelism adaptation and mapping challenges. Yet, apart from hardware-specific, back-end-oriented transformations (if-conversion, trace scheduling, value prediction), loop nest optimization has essentially ignored dynamic and speculative techniques. Research in polyhedral compilation recently reached a significant milestone towards the support of dynamic, data-dependent control flow. This opens a wide avenue for blending dynamic analyses and speculative techniques with advanced loop nest optimizations. Selecting real-world examples from SPEC benchmarks and numerical kernels, we make a case for the design of synergistic static, dynamic and speculative loop transformation techniques. We also sketch the embedding of dynamic information, including speculative assumptions, in the heart of affine transformation search spaces.
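    The inspection-execution scheme this abstract refers to can be pictured with a minimal C/OpenMP sketch (names and structure are illustrative, not taken from the paper): a run-time inspector checks whether an indirection array induces write conflicts, and the executor then chooses a parallel or a sequential schedule accordingly.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Inspector: returns true iff no two iterations write the same x[idx[i]]. */
static bool no_write_conflicts(const int *idx, size_t n, size_t m)
{
    bool *seen = calloc(m, sizeof *seen);
    if (!seen)
        return false;                    /* be conservative on allocation failure */
    bool ok = true;
    for (size_t i = 0; i < n && ok; i++) {
        if (seen[idx[i]])
            ok = false;                  /* two iterations target the same cell */
        seen[idx[i]] = true;
    }
    free(seen);
    return ok;
}

/* Executor: picks a parallel or sequential schedule based on the inspection. */
void scatter_update(double *x, size_t m, const int *idx,
                    const double *v, size_t n)
{
    if (no_write_conflicts(idx, n, m)) {
        #pragma omp parallel for         /* proven independent at run time */
        for (size_t i = 0; i < n; i++)
            x[idx[i]] += v[i];
    } else {
        for (size_t i = 0; i < n; i++)   /* conservative sequential fallback */
            x[idx[i]] += v[i];
    }
}
```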

    A Comparative Analysis of STM Approaches to Reduction Operations in Irregular Applications

    As a recently consolidated paradigm for optimistic concurrency on modern multicore architectures, Transactional Memory (TM) can help exploit parallelism in irregular applications when data dependence information is not available until run time. This paper presents and discusses how to leverage TM to exploit parallelism in an important class of irregular applications: those that exhibit irregular reduction patterns. In order to test and compare our techniques with other solutions, they were implemented in a software TM system called ReduxSTM, which acts as a proof of concept. ReduxSTM combines two major ideas: a sequential-equivalent ordering of transaction commits that ensures a correct result, and an extension of the underlying TM privatization mechanism that reduces unnecessary overhead due to reduction memory updates as well as unnecessary aborts and rollbacks. A comparative study of STM solutions, including ReduxSTM, and other more classical approaches to the parallelization of reduction operations is presented in terms of time, memory and overhead.
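    For context, the kind of irregular reduction the paper targets, and the classical privatization baseline it is compared against, can be sketched as follows in C/OpenMP. This is not ReduxSTM itself, whose sequential-equivalent commit ordering and extended privatization live inside the TM runtime; it only shows the conflict structure that makes such reductions hard to parallelize directly.

```c
#include <stdlib.h>
#include <string.h>

/* Irregular reduction: hist[bin_of[i]] += w[i], where bin_of is only known
 * at run time. Classical privatization: each thread accumulates into a
 * private copy and the copies are combined at the end. */
void histogram(const int *bin_of, const double *w, size_t n,
               double *hist, size_t nbins)
{
    memset(hist, 0, nbins * sizeof *hist);

    #pragma omp parallel
    {
        double *priv = calloc(nbins, sizeof *priv);  /* per-thread copy */

        #pragma omp for nowait
        for (size_t i = 0; i < n; i++)
            priv[bin_of[i]] += w[i];                 /* conflict-free: private */

        #pragma omp critical                         /* combine the copies */
        for (size_t b = 0; b < nbins; b++)
            hist[b] += priv[b];

        free(priv);
    }
}
```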

    Using shared-data localization to reduce the cost of inspector-execution in unified-parallel-C programs

    Programs written in the Unified Parallel C (UPC) language can access any location of the entire local and remote address space via read/write operations. However, UPC programs that contain fine-grained shared accesses can exhibit performance degradation. One solution is to use the inspector-executor technique to coalesce fine-grained shared accesses into larger remote access operations. A straightforward implementation of the inspector-executor transformation results in excessive instrumentation that hinders performance. This paper addresses this issue and introduces several techniques that aim at reducing the generated instrumentation code: a shared-data localization transformation based on Constant-Stride Linear Memory Descriptors (CSLMADs), the inlining of data locality checks, and the usage of an index vector to aggregate the data. Finally, the paper introduces a lightweight loop code motion transformation to privatize shared scalars that are propagated through the loop body. A performance evaluation, using up to 2048 cores of a POWER 775, explores the impact of each optimization and characterizes the overheads of UPC programs. It also shows that the presented optimizations increase the performance of UPC programs up to 1.8× over their hand-optimized UPC counterparts for applications with regular accesses, and up to 6.3× for applications with irregular accesses.
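    The index-vector aggregation idea can be pictured with a small, hypothetical C sketch: an inspector records the indices a loop will touch, one coalesced gather replaces many fine-grained remote reads, and the executor then operates purely on local data. runtime_gather() is a placeholder for a runtime bulk-transfer primitive, not an actual UPC API; all names are illustrative.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a coalesced remote-gather runtime call. */
extern void runtime_gather(double *local_dst, const size_t *indices,
                           size_t count);

void relax(double *y, const size_t *col, size_t n)
{
    /* Inspector: collect the indices the executor loop will read. */
    size_t *idx = malloc(n * sizeof *idx);
    for (size_t i = 0; i < n; i++)
        idx[i] = col[i];

    /* One coalesced transfer instead of n fine-grained remote reads. */
    double *buf = malloc(n * sizeof *buf);
    runtime_gather(buf, idx, n);

    /* Executor: the original loop body, now reading local memory. */
    for (size_t i = 0; i < n; i++)
        y[i] += buf[i];

    free(buf);
    free(idx);
}
```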

    Tiramisu: A Polyhedral Compiler for Expressing Fast and Portable Code

    This paper introduces Tiramisu, a polyhedral framework designed to generate high-performance code for multiple platforms, including multicores, GPUs, and distributed machines. Tiramisu introduces a scheduling language with novel extensions to explicitly manage the complexities that arise when targeting these systems. The framework is designed for the areas of image processing, stencils, linear algebra and deep learning. Tiramisu has two main features: it relies on a flexible representation based on the polyhedral model, and it has a rich scheduling language allowing fine-grained control of optimizations. Tiramisu uses a four-level intermediate representation that allows full separation between the algorithms, loop transformations, data layouts, and communication. This separation simplifies targeting multiple hardware architectures with the same algorithm. We evaluate Tiramisu by writing a set of image processing, deep learning, and linear algebra benchmarks and comparing them with state-of-the-art compilers and hand-tuned libraries. We show that Tiramisu matches or outperforms existing compilers and libraries on different hardware architectures, including multicore CPUs, GPUs, and distributed machines. (Comment: arXiv admin note: substantial text overlap with arXiv:1803.0041)
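    The separation between algorithm and schedule described above can be illustrated in plain C rather than Tiramisu's own scheduling API: the same matrix-multiply algorithm is shown unscheduled and after a hypothetical tile-and-parallelize schedule for a multicore target. Only the loop structure changes; the computation does not.

```c
#include <stddef.h>

/* The algorithm as written: C = A * B. */
void matmul(size_t n, const float A[n][n], const float B[n][n], float C[n][n])
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            C[i][j] = 0.0f;
            for (size_t k = 0; k < n; k++)
                C[i][j] += A[i][k] * B[k][j];
        }
}

/* The same algorithm after a tile(32, 32) + parallelize(i-tiles) schedule:
 * the kind of loop restructuring a scheduling directive would generate. */
void matmul_scheduled(size_t n, const float A[n][n], const float B[n][n],
                      float C[n][n])
{
    const size_t T = 32;
    #pragma omp parallel for collapse(2)
    for (size_t ii = 0; ii < n; ii += T)
        for (size_t jj = 0; jj < n; jj += T)
            for (size_t i = ii; i < ii + T && i < n; i++)
                for (size_t j = jj; j < jj + T && j < n; j++) {
                    C[i][j] = 0.0f;
                    for (size_t k = 0; k < n; k++)
                        C[i][j] += A[i][k] * B[k][j];
                }
}
```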

    The New Governance and the Tools of Public Action: An Introduction

    A fundamental re-thinking is currently underway throughout the world about how to cope with public problems. Stimulated by popular frustrations with the cost and effectiveness of government programs and by a new-found faith in liberal economic theories, serious questions are being raised about the capabilities, and even the motivations, of public-sector institutions. This type of questioning has spread beyond American political discourse to other parts of the world. Underlying much of this reform surge is a set of theories that portrays government agencies as tightly structured hierarchies insulated from market forces and from effective citizen pressure, and therefore free to serve the personal and institutional interests of bureaucrats instead. The heart of this revolution has been a fundamental transformation not just in the scope and scale of government action, but in its basic forms. Instead of relying exclusively on government to solve public problems, a host of other actors is being mobilized as well, sometimes at their own initiative but often in complex partnerships with the state.

    A Sharp Turn toward the Market: Economic Reform in Russia (1992–1998) and Its Consequences

    By analyzing and systematizing the literature accumulated over the past twenty years on the history of reforms, we can put in order the existing views on the processes that took place during these transformations and define a new vector in understanding the socio-economic development of Russia in the last decade of the 20th century and the first decades of the 21st century. The first step in this direction is the analysis of publications that reflect the preparation, progress and results of the contemporary economic reforms in the 1990s. The historiographic review includes monographs written both by the advocates of the shock therapy and by their opponents and critics, first of all Members of the Russian Academy of Sciences. The study of this literature allows us to reveal the spectrum of opinions on whether the shock therapy was the preferred version of transformations, on the assessment of the results of reforms by the end of the 1990s, and on the opportunities for alternative ways to make the transition from a planned to a market economy. In particular, the advocates of the «shock therapy» refer to the threat of famine and civil war to justify decisions that led to a decline in output, hyperinflation and other negative trends. Their critics point out that the lack of public support caused the market reforms to fail. While acknowledging the obvious, i.e. a significant deterioration of economic indicators, the advocates see their success in establishing the system of market institutions and, on this basis, insist there was no alternative to the implemented version of reforms. In turn, their opponents believe that alternatives to the «shock therapy» existed, and that their distinctive feature would have been the gradual cultivation, not the forced administrative introduction, of market economy institutions. This article was prepared with the support of Grant No. 16–02–00016a from the Russian Foundation for Humanities.

    The 2008 Succession: Fact or Fiction?
