6 research outputs found

    Polly's Polyhedral Scheduling in the Presence of Reductions

    Full text link
    The polyhedral model provides a powerful mathematical abstraction to enable effective optimization of loop nests with respect to a given optimization goal, e.g., exploiting parallelism. Unexploited reduction properties are a frequent reason for polyhedral optimizers to assume parallelism prohibiting dependences. To our knowledge, no polyhedral loop optimizer available in any production compiler provides support for reductions. In this paper, we show that leveraging the parallelism of reductions can lead to a significant performance increase. We give a precise, dependence based, definition of reductions and discuss ways to extend polyhedral optimization to exploit the associativity and commutativity of reduction computations. We have implemented a reduction-enabled scheduling approach in the Polly polyhedral optimizer and evaluate it on the standard Polybench 3.2 benchmark suite. We were able to detect and model all 52 arithmetic reductions and achieve speedups up to 2.21×\times on a quad core machine by exploiting the multidimensional reduction in the BiCG benchmark.Comment: Presented at the IMPACT15 worksho

    Reduction Drawing: Language Constructs and Polyhedral Compilation for Reductions on GPUs

    Get PDF
    International audienceReductions are common in scientific and data-crunching codes, and a typical source of bottlenecks on massively parallel architectures such as GPUs. Reductions are memory-bound, and achieving peak performance involves sophisticated optimizations. There exist libraries such as CUB and Thrust providing highly tuned implementations of reductions on GPUs. However, library APIs are not flexible enough to express user-defined reductions on arbitrary data types and array indexing schemes. Languages such as OpenACC provide declarative syntax to express reductions. Such approaches support a limited range of reduction operators and do not facilitate the application of complex program transformations in presence of reductions. We present language constructs that let a programmer express arbitrary reductions on user-defined data types matching the performance of tuned library implementations. We also extend a polyhedral compilation flow to process these user-defined reductions, enabling optimizations such as the fusion of multiple reductions, combining reductions with other loop transformations, and optimizing data transfers and storage in the presence of reductions. We implemented these language constructs and compilation methods in the PPCG framework and conducted experiments on multiple GPU targets. For single reductions the generated code performs on par with highly tuned libraries, and for multiple reductions it significantly outperforms both libraries and OpenACC on all platforms

    XARK: an extensible framework for automatic recognition of computational kernels

    Get PDF
    This is a post-peer-review, pre-copyedit version of an article published in ACM Transactions on Programming Languages and Systems. The final authenticated version is available online at: http://dx.doi.org/10.1145/1391956.1391959[Abstract] The recognition of program constructs that are frequently used by software developers is a powerful mechanism for optimizing and parallelizing compilers to improve the performance of the object code. The development of techniques for automatic recognition of computational kernels such as inductions, reductions and array recurrences has been an intensive research area in the scope of compiler technology during the 90's. This article presents a new compiler framework that, unlike previous techniques that focus on specific and isolated kernels, recognizes a comprehensive collection of computational kernels that appear frequently in full-scale real applications. The XARK compiler operates on top of the Gated Single Assignment (GSA) form of a high-level intermediate representation (IR) of the source code. Recognition is carried out through a demand-driven analysis of this high-level IR at two different levels. First, the dependences between the statements that compose the strongly connected components (SCCs) of the data-dependence graph of the GSA form are analyzed. As a result of this intra-SCC analysis, the computational kernels corresponding to the execution of the statements of the SCCs are recognized. Second, the dependences between statements of different SCCs are examined in order to recognize more complex kernels that result from combining simpler kernels in the same code. Overall, the XARK compiler builds a hierarchical representation of the source code as kernels and dependence relationships between those kernels. This article describes in detail the collection of computational kernels recognized by the XARK compiler. Besides, the internals of the recognition algorithms are presented. The design of the algorithms enables to extend the recognition capabilities of XARK to cope with new kernels, and provides an advanced symbolic analysis framework to run other compiler techniques on demand. Finally, extensive experiments showing the effectiveness of XARK for a collection of benchmarks from different application domains are presented. In particular, the SparsKit-II library for the manipulation of sparse matrices, the Perfect benchmarks, the SPEC CPU2000 collection and the PLTMG package for solving elliptic partial differential equations are analyzed in detail.Ministeiro de Educación y Ciencia; TIN2004-07797-C02Ministeiro de Educación y Ciencia; TIN2007-67537-C03Xunta de Galicia; PGIDIT05PXIC10504PNXunta de Galicia; PGIDIT06PXIB105228P

    RE-LANG---A Parallel-by-default Programming Language

    Get PDF
    In recent years, programming language features such as lightweight threads have gained popularity in the software development workflow. Our research takes a critical look at these recent trends, rethinking them through an academic lens. We propose a construct called "smart assignment," supported by rewriting semantics, which enables a novel parallel-by-default programming paradigm. We present a new programming language—RE-LANG—that implements this feature. Specifically, we demonstrate how the design philosophy of RE-LANG makes imperative, parallel programming more developer-friendly. We discuss the implementation of the language and showcase performance benchmarks, as well as overhead analysis, to demonstrate its efficiency.Doctor of Philosoph
    corecore