Search CORE

3 research outputs found

Language Support for Programming High-Performance Code

Author: Leißa Roland
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 01/01/2017
Field of study

Nowadays, the computing landscape is becoming increasingly heterogeneous and this trend is currently showing no signs of turning around. In particular, hardware becomes more and more specialized and exhibits different forms of parallelism. For performance-critical codes it is indispensable to address hardware-specific peculiarities. Because of the halting problem, however, it is unrealistic to assume that a program implemented in a general-purpose programming language can be fully automatically compiled to such specialized hardware while still delivering peak performance. One form of parallelism is single instruction, multiple data (SIMD). Part I of this thesis presents Sierra: an extension for C ++ that facilitates portable and effective SIMD programming. Part II discusses AnyDSL. This framework allows to embed a so-called domain-specific language (DSL) into a host language. On the one hand, a DSL offers the application developer a convenient interface; on the other hand, a DSL can perform domain-specific optimizations and effectively map DSL constructs to various architectures. In order to implement a DSL, one usually has to write or modify a compiler. With AnyDSL though, the DSL constructs are directly implemented in the host language while a partial evaluator removes any abstractions that are required in the implementation of the DSL.Die Rechnerlandschaft wird heutzutage immer heterogener und derzeit ist keine Trendwende in Sicht. Insbesondere wird die Hardware immer spezialisierter und weist verschiedene Formen der Parallelität auf. Für performante Programme ist es unabdingbar, hardwarespezifische Eigenheiten zu adressieren. Wegen des Halteproblems ist es allerdings unrealistisch anzunehmen, dass ein Programm, das in einer universell einsetzbaren Programmiersprache implementiert ist, vollautomatisch auf solche spezialisierte Hardware übersetzt werden kann und dabei noch Spitzenleistung erzielt. Eine Form der Parallelität ist „single instruction, multiple data (SIMD)“. Teil I dieser Arbeit stellt Sierra vor: eine Erweiterung für C++, die portable und effektive SIMD-Programmierung unterstützt. Teil II behandelt AnyDSL. Dieses Rahmenwerk ermöglicht es, eine sogenannte domänenspezifische Sprache (DSL) in eine Gastsprache einzubetten. Auf der einen Seite bietet eine DSL dem Anwendungsentwickler eine komfortable Schnittstelle; auf der anderen Seiten kann eine DSL domänenspezifische Optimierungen durchführen und DSL-Konstrukte effektiv auf verschiedene Architekturen abbilden. Um eine DSL zu implementieren, muss man gewöhnlich einen Compiler schreiben oder modifizieren. In AnyDSL werden die DSL-Konstrukte jedoch direkt in der Gastsprache implementiert und ein partieller Auswerter entfernt jegliche Abstraktionen, die in der Implementierung der DSL benötigt werden

Acronym

Distributed-memory multi-GPU block-sparse tensor contraction for electronic structure (revised version)

Author: Bosilca George
Dongarra Jack,
Harrison Robert,
Herault Thomas
Lewis Cannada,
Robert Yves
Valeev Edward,
Publication venue: HAL CCSD
Publication date: 01/10/2020
Field of study

Many domains of scientific simulation (chemistry, condensed matter physics, data science) increasingly eschew dense tensors for block-sparse tensors, sometimes with additional structure (recursive hierarchy, rank sparsity, etc.). Distributed-memory parallel computation with block-sparse tensorial data is paramount to minimize the time-tosolution (e.g., to study dynamical problems or for real-time analysis) and to accommodateproblems of realistic size that are too large to fit into the host/device memory of a single node equipped with accelerators. Unfortunately, computation with such irregular data structures is a poor match to the dominant imperative, bulk-synchronous parallel programming model. In this paper, we focus on the critical element of block-sparse tensor algebra, namely binary tensor contraction, and report on an efficient and scalable implementation using the task-focused PaRSEC runtime. High performance of the block-sparse tensor contraction on the Summit supercomputer is demonstrated for synthetic data as well as for real data involved in electronic structure simulations of unprecedented size.Les tenseurs creux par blocs (block-sparse) sont présents dans de nombreux domaines scientifiques. Ce rapport étudie la parallélisation d’un noyau de contraction essentiel pour la manipulation de tels tenseurs, qui peut se matérialiser sous forme d’un produit de matrices C ← C + AB, où les trois matrices ont une structure creuse par blocs, où les tuiles de A et B sont de tailles hétérogènes, et où B est carrée de taille n, alors que A et C sont rectangulaires de taille m × n avec m << n. Nous proposons une implémentation sur la plate-forme Summit à mémoire distribuée, où chaque nœud est équipé de plusieurs GPUs, au sein de l’environnement de tâches PaRSEC. Nous obtenons de bonnes performances pour des problèmes de taille inégalées à ce jour

INRIA a CCSD electronic archive server

Modeling for inversion in exploration geophysics

Author: Louboutin Mathias
Publication venue: Georgia Institute of Technology
Publication date: 20/05/2020
Field of study

Seismic inversion, and more generally geophysical exploration, aims at better understanding the earth's subsurface, which is one of today's most important challenges. Firstly, it contains natural resources that are critical to our technologies such as water, minerals and oil and gas. Secondly, monitoring the subsurface in the context of CO2 sequestration, earthquake detection and global seismology are of major interests with regard to safety and the environment hazards. However, the technologies to monitor the subsurface or find resources are scientifically extremely challenging. Seismic inversion can be formulated as a mathematical optimization problem that minimizes the difference between field recorded data and numerically modeled synthetic data. The process of solving this optimization problem then requires to numerically model, thousands of times, wave-propagation in large three-dimensional representations of part of the earth subsurface. The mathematical and computational complexity of this problem, therefore, calls for software design that abstracts these requirements and facilitates algorithm and software development. My thesis addresses some of the challenges that arise from these problems; mainly the computational cost and access to the right software for research and development. In the first part, I will discuss a performance metric that improves the current runtime-only benchmarks in exploration geophysics. This metric, the roofline model, first provides insight at the hardware level of the performance of a given implementation relative to the maximum achievable performance. Second, this study demonstrates that the choice of numerical discretization has a major impact on the achievable performance depending on the hardware at hand and shows that a flexible framework with respect to the discretization parameters is necessary. In the second part, I will introduce and describe Devito, a symbolic finite-difference DSL that provides a high-level interface to the definition of partial differential equations (PDE) such as the wave equation. Devito, from the symbolic definition of PDEs, then generates and compiles highly optimized C code on-the-fly to compute the solution of the PDE. The combination of the high-level abstractions and the just-in-time compiler enable research for geophysical exploration and PDE-constrainted optimization based on the paradigm of separation of concerns. This allows researchers to concentrate on their respective field of study while having access to computationally performant solvers with a flexible and easy to use interface to successfully implement complex representations of the physics. The second part of my thesis will be split into two sub-parts; first describing the symbolic application programming interface (API), before describing and benchmarking the just-in-time compiler. I will end my thesis with concluding remarks, the latest developments and a brief description of projects that were enabled by Devito.Ph.D

Scholarly Materials And Research @ Georgia Tech