
    The Potential of Synergistic Static, Dynamic and Speculative Loop Nest Optimizations for Automatic Parallelization

    Research in automatic parallelization of loop-centric programs started with static analysis, then broadened its arsenal to include dynamic inspection-execution and speculative execution, with the best results coming from hybrid static-dynamic schemes. Beyond the detection of parallelism in a sequential program, scalable parallelization on many-core processors involves hard and interesting parallelism adaptation and mapping challenges. These challenges include tailoring data locality to the memory hierarchy, structuring independent tasks hierarchically to exploit multiple levels of parallelism, tuning the synchronization grain, balancing the execution load, decoupling the execution into thread-level pipelines, and leveraging heterogeneous hardware with specialized accelerators. The polyhedral framework makes it possible to model, construct, and apply very complex loop nest transformations that address most of these parallelism adaptation and mapping challenges. But apart from hardware-specific, back-end-oriented transformations (if-conversion, trace scheduling, value prediction), loop nest optimization has essentially ignored dynamic and speculative techniques. Research in polyhedral compilation recently reached a significant milestone towards the support of dynamic, data-dependent control flow. This opens a large avenue for blending dynamic analyses and speculative techniques with advanced loop nest optimizations. Selecting real-world examples from SPEC benchmarks and numerical kernels, we make a case for the design of synergistic static, dynamic, and speculative loop transformation techniques. We also sketch the embedding of dynamic information, including speculative assumptions, in the heart of affine transformation search spaces.
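
    As a small illustration of the gap this line of work targets, consider the following hypothetical C kernel and inspector (our example, not one from the paper). The write goes through an index array, so a purely static polyhedral analysis must conservatively serialize the loop; a dynamic inspector, or a speculative run validated after the fact, can establish at run time that all iterations are independent.

    #include <stdlib.h>

    /* Non-affine write: statically, x[idx[i]] may alias any element of x,
       so the loop carries unknown dependences. Dynamically, if idx is a
       permutation, all iterations are independent. */
    void scatter_scale(int n, const int idx[], const double a[], double x[])
    {
        for (int i = 0; i < n; i++)
            x[idx[i]] = 2.0 * a[i];
    }

    /* A simple inspector: check that idx is a permutation of 0..n-1.
       If it is, scatter_scale can run in parallel; a speculative scheme
       would instead assume this and validate it during or after the run. */
    int is_permutation(int n, const int idx[])
    {
        char *seen = calloc(n, 1);
        int ok = seen != NULL;
        for (int i = 0; ok && i < n; i++) {
            if (idx[i] < 0 || idx[i] >= n || seen[idx[i]]) ok = 0;
            else seen[idx[i]] = 1;
        }
        free(seen);
        return ok;
    }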

    Reducing the Impact of Uplift Pressures on the Base of a Concrete Dam by Configuration of Drainage Holes (Hypothetical Case Study)

    A study of the impact of uplift pressures on the base of a typical water-retaining structure is presented. The work was conducted through numerical analysis by the finite element method to evaluate the hydraulic uplift pressure distribution generated by the calculated flow. The effects of the position of either single or dual drainage holes on the uplift pressure distribution were also examined. A hypothetical case was solved for three drainage-hole configurations, and the reduction in uplift pressure was computed in each case. A comparison of the results showed that the desired reduction in hydraulic uplift pressure can be achieved by using two drainage holes at a distance of 8 m from the upstream edge of the structure floor.
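
    For context, the flow computed by such a finite element model is typically steady-state seepage governed by Laplace's equation, with the uplift force obtained by integrating the resulting pressure over the base. A sketch of the standard formulation, in our notation rather than the paper's:

    % Steady seepage under the dam (homogeneous, isotropic soil): solve for the hydraulic head h(x, y)
    \nabla^2 h = \frac{\partial^2 h}{\partial x^2} + \frac{\partial^2 h}{\partial y^2} = 0
    % Uplift pressure on the base (unit weight of water \gamma_w, base elevation y_b)
    u(x) = \gamma_w \, \bigl( h(x, y_b) - y_b \bigr)
    % Resultant uplift force per unit width over a base of length L
    U = \int_0^L u(x) \, dx

    Drainage holes act as local head boundary conditions that lower h along the base, which is why their number and position change the computed uplift distribution.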

    Tiramisu: A Polyhedral Compiler for Expressing Fast and Portable Code

    This paper introduces Tiramisu, a polyhedral framework designed to generate high performance code for multiple platforms including multicores, GPUs, and distributed machines. Tiramisu introduces a scheduling language with novel extensions to explicitly manage the complexities that arise when targeting these systems. The framework is designed for the areas of image processing, stencils, linear algebra and deep learning. Tiramisu has two main features: it relies on a flexible representation based on the polyhedral model and it has a rich scheduling language allowing fine-grained control of optimizations. Tiramisu uses a four-level intermediate representation that allows full separation between the algorithms, loop transformations, data layouts, and communication. This separation simplifies targeting multiple hardware architectures with the same algorithm. We evaluate Tiramisu by writing a set of image processing, deep learning, and linear algebra benchmarks and compare them with state-of-the-art compilers and hand-tuned libraries. We show that Tiramisu matches or outperforms existing compilers and libraries on different hardware architectures, including multicore CPUs, GPUs, and distributed machines.
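
    To illustrate the separation between algorithm and schedule that the four-level representation enables, here is a plain-C sketch of the idea (our code, not Tiramisu's actual API). The algorithm stays fixed; a schedule such as "tile i and j by 32, parallelize the outer tile loop" rewrites only the loop structure around it.

    /* Algorithm layer (fixed): C = A * B as a plain loop nest. */
    void matmul(int N, const float A[N][N], const float B[N][N], float C[N][N])
    {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                C[i][j] = 0.0f;
                for (int k = 0; k < N; k++)
                    C[i][j] += A[i][k] * B[k][j];
            }
    }

    /* The same algorithm after a schedule of the form
       tile(i, j, 32, 32); parallelize(outer tile loop);
       has been applied: only the loop structure changes. */
    void matmul_scheduled(int N, const float A[N][N], const float B[N][N], float C[N][N])
    {
        #pragma omp parallel for
        for (int ii = 0; ii < N; ii += 32)
            for (int jj = 0; jj < N; jj += 32)
                for (int i = ii; i < ii + 32 && i < N; i++)
                    for (int j = jj; j < jj + 32 && j < N; j++) {
                        C[i][j] = 0.0f;
                        for (int k = 0; k < N; k++)
                            C[i][j] += A[i][k] * B[k][j];
                    }
    }

    Both functions denote the same computation; keeping the schedule factored out of the algorithm is what lets the same algorithm target CPUs, GPUs, and distributed machines.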

    PENCIL: Towards a Platform-Neutral Compute Intermediate Language for DSLs

    We motivate the design and implementation of a platform-neutral compute intermediate language (PENCIL) for productive and performance-portable accelerator programming.

    Variational study of two-nucleon systems with lattice QCD

    The low-energy spectrum and scattering of two-nucleon systems are studied with lattice quantum chromodynamics using a variational approach. A wide range of interpolating operators are used: dibaryon operators built from products of plane-wave nucleons, hexaquark operators built from six localized quarks, and quasilocal operators inspired by two-nucleon bound-state wave functions in low-energy effective theories. Sparsening techniques are used to compute the timeslice-to-all quark propagators required to form correlation-function matrices using products of these operators. Projection of these matrices onto irreducible representations of the cubic group, including spin-orbit coupling, is detailed. Variational methods are applied to constrain the low-energy spectra of two-nucleon systems in a single finite volume with quark masses corresponding to a pion mass of 806 MeV. Results for S- and D-wave phase shifts in the isospin singlet and triplet channels are obtained under the assumption that partial-wave mixing is negligible. Tests of interpolating-operator dependence are used to investigate the reliability of the energy spectra obtained and highlight both the strengths and weaknesses of variational methods. These studies and comparisons to previous studies using the same gauge-field ensemble demonstrate that interpolating-operator dependence can lead to significant effects on the two-nucleon energy spectra obtained using both variational and nonvariational methods, including missing energy levels and other discrepancies. While this study is inconclusive regarding the presence of two-nucleon bound states at this quark mass, it provides robust upper bounds on two-nucleon energy levels that can be improved in future calculations using additional interpolating operators and is therefore a step toward reliable nuclear spectroscopy from the underlying Standard Model of particle physics.
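
    For reference, the variational approach used here is the standard generalized eigenvalue analysis of a correlation-function matrix built from the set of interpolating operators O_i (the textbook formulation, not anything specific to this paper):

    C_{ij}(t) = \langle O_i(t) \, O_j^\dagger(0) \rangle
    C(t) \, v_n(t, t_0) = \lambda_n(t, t_0) \, C(t_0) \, v_n(t, t_0)
    \lambda_n(t, t_0) \sim e^{-E_n (t - t_0)} \quad \text{for } t \gg t_0

    Under the usual conditions on t_0, the extracted E_n are variational upper bounds on the true finite-volume energy levels, which is what the "robust upper bounds" above refers to.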

    PENCIL: A Platform-Neutral Compute Intermediate Language for DSL Compilers

    Programming accelerators such as GPUs with low-level APIs and languages like OpenCL and CUDA is difficult, error-prone, and not performance-portable. Automatic parallelization and domain-specific languages (DSLs) have been proposed to hide this complexity and to regain some performance portability. In this presentation, I will introduce PENCIL (Platform-Neutral Compute Intermediate Language) and give some details about how it is compiled. PENCIL is a rigorously defined subset of GNU C99 with specific programming rules and a few extensions. Adherence to this subset and the use of these extensions enable compilers to exploit parallelism and to better optimize code when targeting accelerators. We intend PENCIL both as a portable language to facilitate accelerator programming, and as an intermediate language for DSL compilers. We validate the potential of PENCIL on a state-of-the-art polyhedral compiler, extending the applicability of the compiler to dynamic, data-dependent control flow and non-affine array accesses.
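
    As a flavor of the programming rules and extensions described, here is a hedged sketch: the sized, qualified array parameters are plain C99, while the pragma and builtin spellings are our recollection of the PENCIL documentation and should be treated as assumptions.

    /* Stand-in so the sketch compiles with an ordinary C compiler; in
       PENCIL this is a builtin (spelling assumed). */
    #define __pencil_assume(e) ((void)0)

    /* PENCIL-style C99: sized array parameters with static/const/restrict
       qualifiers give the compiler aliasing and extent guarantees. */
    void add_one(int n, int m,
                 const float in[static const restrict n][m],
                 float out[static const restrict n][m])
    {
        __pencil_assume(n > 0 && m > 0);  /* extra fact for the optimizer */
        #pragma pencil independent        /* iterations carry no dependences (assumed spelling) */
        for (int i = 0; i < n; i++)
            for (int j = 0; j < m; j++)
                out[i][j] = in[i][j] + 1.0f;
    }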

    Improving Tiling, Reducing Compilation Time, and Extending the Usability of Polyhedral Compilation

    Multi-core processors are now in widespread use in almost all areas of computing: desktops, laptops, and accelerators such as GPGPUs (General Purpose Graphics Processing Units). To harness the power of multi-core processors and complex memory hierarchies, powerful compiler optimizations, and especially loop nest transformations, are in high demand. The polyhedral optimization framework shows promising results in addressing this problem. It is an algebraic program representation together with a set of analyses, transformations, and code generation algorithms that enable a compiler to reason about advanced loop nest transformations addressing most of the parallelism and locality-enhancing challenges.

    In this thesis we address some of the limitations of the polyhedral framework. We consider three problems and propose practical solutions to each.

    The first problem concerns the ability to apply tiling to code that has false dependences (loop nest tiling is an optimization that changes the order of execution of statements in a loop nest in order to enhance data locality; false dependences are induced by the reuse of a single memory location to store multiple values during the life of the program). To preserve the validity of loop nest transformations and parallelization, data dependences need to be analyzed. Memory dependences come in two varieties: true dependences (a.k.a. flow dependences) and false dependences (a.k.a. output and anti dependences). While true dependences must be satisfied in order to preserve the correct order of computations, false dependences merely reduce the degrees of freedom for loop transformations. In particular, loop tiling is severely limited in the presence of these dependences. While array expansion, a transformation that turns scalars into arrays and arrays into higher-dimensional arrays, removes all false dependences, the overhead of this transformation on memory and its detrimental impact on register-level reuse can be catastrophic. We propose and evaluate a compilation technique to safely ignore a large number of false dependences in order to enable loop nest tiling in the polyhedral model. It is based on the precise characterization of interferences between live range intervals, and it does not incur any scalar or array expansion.

    The second problem is the long compilation time that one may experience when using polyhedral tools to optimize a program, in particular the long execution time of the Pluto affine scheduling algorithm, which is responsible for changing the schedule (order of execution) of statements in order to optimize the code (maximize parallelism and data locality). Reducing the execution time of this algorithm improves the overall compilation time. We introduce and evaluate a technique called offline statement clustering, a practical technique designed to reduce the execution time of the Pluto affine scheduling algorithm without much loss in optimization opportunities. Using this technique, the statements of the program are grouped into macro-statements, and the Pluto affine scheduling algorithm schedules the macro-statements instead of the original statements of the program. Since there are fewer macro-statements than original statements, scheduling the macro-statements is in general faster. We present the statement clustering algorithm, show how offline statement clustering integrates transparently with the workflow of a state-of-the-art polyhedral compiler, and present two heuristics for choosing how statements should be clustered together. We show experimentally that statement clustering can reduce the scheduling time by a factor of 8x (in median) without a significant loss in optimization opportunities.

    The third problem is related to two limitations. The first concerns the applicability of polyhedral compilation to programs that do not respect the classical restrictions of the polyhedral model (a program can be represented precisely in the polyhedral model only if it contains no non-affine conditionals, non-affine loop bounds, or non-affine array accesses). The second concerns the ability of the tools to generate code whose performance approaches that of hand-written code. To overcome these two limitations, we propose a programming language called PENCIL, a subset of GNU C99 with specific programming rules and a few extensions. The use of this subset and of these extensions enables compilers to better exploit parallelism and to better optimize code.
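
    To make the clustering idea concrete, a small illustrative example of our own (not taken from the thesis):

    /* Before clustering: the Pluto scheduler sees two statements, S1 and
       S2, and solves an ILP with one set of affine-schedule coefficients
       per statement. */
    void two_statements(int N, const double A[], double B[], double C[])
    {
        for (int i = 0; i < N; i++) {
            B[i] = A[i] + 1.0;   /* S1 */
            C[i] = A[i] * 2.0;   /* S2 */
        }
        /* With offline clustering: S1 and S2 have the same iteration
           domain {i | 0 <= i < N} and compatible dependences, so they are
           merged into one macro-statement M = {S1, S2}. Pluto schedules M
           alone -- roughly half the unknowns in the ILP -- and S1 and S2
           then inherit M's schedule. */
    }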

    Improved Loop Tiling based on the Removal of Spurious False Dependences

    Selected for presentation at the HiPEAC 2013 conference. To preserve the validity of loop nest transformations and parallelization, data-dependences need to be analyzed. Memory dependences come in two varieties: true dependences and false dependences. While true dependences must be satisfied in order to preserve the correct order of computations, false dependences are induced by the reuse of a single memory location to store multiple values. False dependences reduce the degrees of freedom for loop transformations. In particular, loop tiling is severely limited in the presence of these dependences. While array expansion removes all false dependences, the overhead on memory and the detrimental impact on register-level reuse can be catastrophic. We propose and evaluate a compilation technique to safely ignore a large number of false dependences in order to enable loop nest tiling in the polyhedral model. It is based on the precise characterization of interferences between live range intervals, and it does not incur any scalar or array expansion. Our algorithms have been implemented in the Pluto polyhedral compiler and evaluated on the PolyBench suite.
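
    A minimal example, of our own construction, of the kind of false dependence the technique targets:

    /* The scalar t is written and then read within every (i, j)
       iteration. Reusing the single location t induces output and anti
       (false) dependences between all iterations, which block tiling. */
    void scale_add(int N, double alpha,
                   const double A[N][N], const double C[N], double B[N][N])
    {
        double t; /* one live range per iteration, all mapped to one location */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) {
                t = A[i][j] * alpha; /* write: live range of t begins */
                B[i][j] = t + C[j];  /* read:  live range of t ends   */
            }
    }

    Each live range of t is confined to a single iteration, so live ranges from distinct iterations never interfere; the analysis can therefore discard the false dependences on t and tile the nest without expanding t into an N x N temporary.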