5 research outputs found

    Study of the suitability of Exascale machines for algorithms implementing the Reverse Time Migration method

    As Exascale systems are expected in the 2018-2020 time frame, performance analysis and characterization of applications for new processor architectures and large-scale systems are important tasks that make it possible to anticipate the changes required to exploit future HPC systems efficiently. This thesis focuses on seismic imaging applications used for modeling complex physical phenomena, in particular the depth imaging application called Reverse Time Migration (RTM). My first contribution consists in characterizing and modeling the performance of the computational core of RTM, which is based on finite-difference time-domain (FDTD) computations. I identify and explore the major tuning parameters influencing performance and the interaction between the architecture and the application. The second contribution is an analysis of the challenges of a hybrid and heterogeneous implementation of FDTD on manycore architectures. We target Intel's first Xeon Phi coprocessor, Knights Corner. This architecture is an interesting proxy for our study since it contains some of the expected features of an Exascale system: concurrency and heterogeneity. My third contribution extends the performance analysis and modeling to the full RTM, which adds communications and I/O to the computation part. RTM is a data-intensive application and requires storing intermediate values of the computational field, resulting in expensive I/O accesses. My fourth contribution is the final measurement and model validation of my hybrid RTM implementation on a large system. This was done on Stampede, a machine at the Texas Advanced Computing Center (TACC), which allowed us to test scalability up to 64 nodes, each containing one 61-core Xeon Phi and two 8-core CPUs, for a total of nearly 5000 heterogeneous cores.

    Characterizing applications in order to prepare them for new architectures and to port them to very large systems is an important step in anticipating the necessary modifications. Since Exascale machines are expected in the 2018-2020 period, studying applications and preparing them for these machines is essential. We focus on seismic imaging applications, and in particular on Reverse Time Migration (RTM), because it is widely used by oil companies for seismic exploration. The first part of our work studied the computational core of RTM, which consists of a finite-difference time-domain (FDTD) computation. We characterized this part of the application by highlighting the architectural features of current machines that strongly impact performance, notably caches, bandwidths, and prefetching. This study led to a performance model that predicts the DRAM traffic of the FDTD kernels. The second part of the thesis focuses on the impact of heterogeneity and parallelism on FDTD and on RTM. We chose Intel's manycore architecture, Xeon Phi, and studied a "native" implementation and a heterogeneous, hybrid implementation, the "symmetric" version. Finally, we ported RTM to a heterogeneous cluster, Stampede at the Texas Advanced Computing Center (TACC), where we ran scalability tests up to 64 nodes containing Xeon Phi coprocessors and Sandy Bridge processors, corresponding to almost 5000 cores.
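    The FDTD core of RTM advances a wavefield with an explicit stencil update at every time step, and RTM must keep intermediate snapshots of that field, which is where the I/O cost described above comes from. The following is a minimal 1-D sketch of such a kernel, assuming a constant-velocity acoustic wave equation, second-order central differences in space and time, zero-pressure boundaries, and an impulse source at the grid center. These are our simplifications for illustration, not the thesis's 3-D kernel; the names `fdtd_step` and `propagate` are hypothetical.

```python
def fdtd_step(p_prev, p_curr, r2):
    """One leapfrog step of the 1-D acoustic wave equation.

    r2 is the squared Courant number (c * dt / dx) ** 2; both end points
    are held at zero (Dirichlet boundaries)."""
    n = len(p_curr)
    p_next = [0.0] * n
    for i in range(1, n - 1):
        # 3-point second-difference stencil (discrete Laplacian times dx^2)
        lap = p_curr[i - 1] - 2.0 * p_curr[i] + p_curr[i + 1]
        p_next[i] = 2.0 * p_curr[i] - p_prev[i] + r2 * lap
    return p_next


def propagate(n, steps, courant=0.5, snapshot_every=10):
    """Forward-propagate an impulse placed at the grid center.

    Returns the final field and the snapshots an RTM-style code would
    have to store (or recompute) for its later imaging phase."""
    if not 0.0 < courant <= 1.0:
        raise ValueError("explicit 1-D scheme is unstable for c*dt/dx > 1")
    r2 = courant ** 2
    p_prev = [0.0] * n
    p_curr = [0.0] * n
    p_curr[n // 2] = 1.0  # hypothetical point source
    snapshots = []
    for t in range(1, steps + 1):
        # Right-hand side uses the old fields; then shift the time levels.
        p_prev, p_curr = p_curr, fdtd_step(p_prev, p_curr, r2)
        if t % snapshot_every == 0:
            snapshots.append(list(p_curr))  # copy: one stored snapshot
    return p_curr, snapshots
```

    For example, `propagate(101, 40)` returns the field after 40 steps together with four stored snapshots. Because the 3-point stencil only reaches one neighbor per step, the disturbance can have spread at most 40 cells from the center, and every snapshot adds a full copy of the grid to the data that must be kept.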

    Preparing depth imaging applications for Exascale: challenges and impacts


    Adaptive Sampling for Performance Characterization of Application Kernels

    Characterizing performance is essential to optimize programs and architectures. The open-source Adaptive Sampling Kit (ASK) measures performance trade-offs in large design spaces. Exhaustively sampling all sets of parameters is computationally intractable. Therefore, ASK concentrates exploration in the most irregular regions of the design space through multiple adaptive sampling strategies. The paper presents the ASK architecture and a set of adaptive sampling strategies, including a new approach called Hierarchical Variance Sampling. ASK's usage is demonstrated on three performance characterization problems: memory stride accesses, a Jacobian stencil code, and an industrial seismic application using 3D stencils. ASK builds accurate performance models from a small number of measurements, considerably reducing the cost of performance exploration. For instance, the Jacobian stencil code design space, which has more than 3.1 × 10^9 parameter combinations, is accurately predicted using only 1500 combinations.
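    The principle of spending a fixed measurement budget where performance varies most can be illustrated with a simplified variance-driven sampler over a 1-D integer parameter range. This is our own sketch, assuming a cheap stand-in cost function and a "split the most variable region" rule; it is not ASK's implementation or API, nor its Hierarchical Variance Sampling algorithm, which targets multi-dimensional design spaces. The names `adaptive_sample` and `toy_cost` are hypothetical.

```python
import random
import statistics


def adaptive_sample(f, lo, hi, budget, init_bins=4, per_bin=3, seed=0):
    """Evaluate f on at most `budget` integer points of [lo, hi), refining
    the regions whose measurements vary the most (simplified sketch)."""
    rng = random.Random(seed)
    samples = {}  # measured point -> f(point)

    def unmeasured(a, b):
        return [x for x in range(a, b) if x not in samples]

    def draw(a, b, k):
        # Measure up to k new points chosen uniformly at random in [a, b).
        pool = unmeasured(a, b)
        k = min(k, len(pool), budget - len(samples))
        for x in rng.sample(pool, k):
            samples[x] = f(x)

    def score(region):
        # Variance of observed values weighted by region width;
        # 0 when the region cannot or need not be refined.
        a, b = region
        ys = [y for x, y in samples.items() if a <= x < b]
        if b - a < 2 or len(ys) < 2 or not unmeasured(a, b):
            return 0.0
        return statistics.pvariance(ys) * (b - a)

    # 1) Coarse uniform exploration over init_bins equal regions.
    edges = [lo + (hi - lo) * i // init_bins for i in range(init_bins + 1)]
    regions = list(zip(edges[:-1], edges[1:]))
    for a, b in regions:
        draw(a, b, per_bin)

    # 2) Refinement: split the most irregular region, sample both halves.
    while len(samples) < budget:
        best = max(regions, key=score)
        if score(best) == 0.0:
            # Nothing looks irregular yet: fall back to one uniform sample.
            if not unmeasured(lo, hi):
                break
            draw(lo, hi, 1)
            continue
        a, b = best
        mid = (a + b) // 2
        regions.remove(best)
        regions += [(a, mid), (mid, b)]
        draw(a, mid, 1)
        draw(mid, b, 1)
    return samples


def toy_cost(x):
    """Hypothetical stand-in for a measured kernel: flat below 450,
    irregular above."""
    return 0.0 if x < 450 else float((x * 37) % 11)
```

    With `adaptive_sample(toy_cost, 0, 600, budget=80)`, a region whose measurements are all equal scores zero and is never split, so every sample beyond the coarse pass and the uniform fallback lands in a region where the measured values actually differ.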

    Adaptive SIMD optimizations in particle-in-cell codes with fine-grain particle sorting

    Particle-In-Cell (PIC) codes are broadly applied to the kinetic simulation of plasmas, from laser–matter interaction to astrophysics. Their heavy simulation cost can be mitigated by using the Single Instruction Multiple Data (SIMD) capability, or vectorization, now available on most architectures. This article details and discusses the vectorization strategy developed in the code Smilei, which takes advantage of an efficient, systematic, cell-based sorting of the particles. The PIC operators on particles (projection, push, deposition) have been optimized to benefit from large SIMD vectors on both recent and older architectures. The efficiency of these vectorized operations increases with the number of particles per cell (PPC), typically speeding up three-dimensional simulations by a factor of 2 with 256 PPC. Although this implementation shows acceleration with as few as 8 PPC, it can be slower than the scalar version in domains containing fewer PPC, as is usually observed in vectorization attempts. This issue is overcome with an adaptive algorithm that switches locally between scalar operators (for few PPC) and vectorized operators (otherwise). The newly implemented methods are benchmarked on three different large-scale simulations of configurations frequently studied with PIC codes.
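    The two ingredients named in this abstract, cell-based particle sorting and a local scalar/vector switch, can be sketched in a few lines. The counting sort below groups particle indices by cell so that each cell's particles are contiguous, and the switch picks an operator per patch from its mean PPC. The function names, the per-patch granularity, and the threshold of 8 PPC are assumptions for illustration; the abstract does not describe Smilei's actual switching criterion.

```python
def sort_particles_by_cell(cell_of, n_cells):
    """Counting sort of particle indices by cell index.

    cell_of[p] is the cell holding particle p. Returns (order, first), where
    the particles of cell c are order[first[c]:first[c + 1]]."""
    counts = [0] * n_cells
    for c in cell_of:
        counts[c] += 1
    first = [0] * (n_cells + 1)  # prefix sums = start offset of each cell
    for c in range(n_cells):
        first[c + 1] = first[c] + counts[c]
    order = [0] * len(cell_of)
    fill = first[:-1]  # copy: next free slot per cell
    for p, c in enumerate(cell_of):
        order[fill[c]] = p
        fill[c] += 1
    return order, first


def choose_operators(first, cells_per_patch, threshold=8.0):
    """Pick 'vector' for patches whose mean PPC reaches `threshold`
    (hypothetical value), otherwise 'scalar'."""
    n_cells = len(first) - 1
    modes = []
    for start in range(0, n_cells, cells_per_patch):
        stop = min(start + cells_per_patch, n_cells)
        ppc = (first[stop] - first[start]) / (stop - start)
        modes.append("vector" if ppc >= threshold else "scalar")
    return modes
```

    For instance, with 40 particles in the first two cells and 4 in the next two, `choose_operators(first, cells_per_patch=2)` selects the vectorized path for the dense patch and the scalar path for the sparse one, mirroring the "few PPC" case the adaptive algorithm is meant to handle.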

    A Coumarin-Based Analogue of Thiacetazone as Dual Covalent Inhibitor and Potential Fluorescent Label of HadA in Mycobacterium tuberculosis

    A novel coumarin-based molecule, designed as a fluorescent surrogate of a thiacetazone-derived antitubercular agent, was quickly and easily synthesized from readily available starting materials. This small molecule, named Coum-TAC, exhibited a combination of appropriate physicochemical and biological properties, including resistance to hydrolysis and excellent antitubercular efficiency similar to that of well-known thiacetazone derivatives, as well as efficient covalent labeling of HadA, a relevant therapeutic target against Mycobacterium tuberculosis. More remarkably, Coum-TAC was successfully used as an imaging probe capable of selectively labeling Mycobacterium tuberculosis, with enrichment at the poles, thus providing for the first time relevant insights into the polar localization of HadA in mycobacteria.