5 research outputs found

Comparison of Staggered Grid Finite Difference Schemes for Ultrasound Simulation in Curving Composites

Authors: Frankforter Erik; Leckey Cara; Schneck William C.

The optimization of ultrasonic nondestructive evaluation (NDE) simulation tools for composites has the potential to reduce both individual part inspection time and overall certification time for composite parts and structures. Inspection guidance based on simulation provides increased confidence in the veracity of inspection results in addition to time reductions. This paper outlines ongoing work toward this objective using finite difference (FD) simulation techniques formulated for composite structures with realistic geometries. Two staggered-grid explicit FD schemes that show promise for this purpose are assessed: the Lebedev FD scheme and the rotated staggered grid (RSG) FD scheme. Algorithmic aspects that pose challenges for complex geometries are addressed, in particular the handling of traction-free surfaces and of the bi-material interfaces present at lamina boundaries. Code execution time is also estimated to guide feasible domain sizes relative to algorithm choice and available hardware. Three test cases are simulated: a delaminated plate, a cylinder, and a triclinic lamina. These tests demonstrate that the Lebedev FD scheme needs additional work to handle inter-laminar interfaces and traction-free boundaries in the presence of stair-stepping approximations. In contrast, the simple structure of the RSG unit cell makes it more straightforward to construct a 3D simulation technique for curved composite laminates.

NASA Technical Reports Server

Etude de l'adéquation des machines Exascale pour les algorithmes implémentant la méthode du Reverse Time Migration

Author: Farjallah Asma
Publication venue: HAL CCSD
Publication date: 16/12/2014

As Exascale systems are expected in the 2018-2020 time frame, performance analysis and characterization of applications for new processor architectures and large-scale systems are important tasks that make it possible to anticipate the changes required to exploit future HPC systems efficiently. This thesis focuses on seismic imaging applications used for modeling complex physical phenomena, in particular the depth imaging application called Reverse Time Migration (RTM). My first contribution consists in characterizing and modeling the performance of the computational core of RTM, which is based on finite-difference time-domain (FDTD) computations. I identify and explore the major tuning parameters influencing performance and the interaction between the architecture and the application. The second contribution is an analysis to identify the challenges for a hybrid and heterogeneous implementation of FDTD for manycore architectures. We target Intel's first Xeon Phi co-processor, the Knights Corner. This architecture is an interesting proxy for our study since it contains some of the expected features of an Exascale system: concurrency and heterogeneity. My third contribution is an extension of the performance analysis and modeling to the full RTM. This adds communications and I/O to the computation part. RTM is a data-intensive application and requires the storage of intermediate values of the computational field, resulting in expensive I/O accesses. My fourth contribution is the final measurement and model validation of my hybrid RTM implementation on a large system. This has been done on Stampede, a machine of the Texas Advanced Computing Center (TACC), which allows us to test the scalability up to 64 nodes, each containing one 61-core Xeon Phi and two 8-core CPUs, for a total close to 5000 heterogeneous cores.

Characterizing applications in order to prepare them for new architectures and port them to very large systems is an important step in anticipating the necessary modifications. Since Exascale machines are expected in the 2018-2020 period, studying applications and preparing them for these machines is essential. We focus on seismic imaging applications, and in particular on Reverse Time Migration (RTM), because it is widely used by oil companies for seismic exploration. The first part of our work studied the computational core of RTM, which consists of a finite-difference time-domain (FDTD) computation. We characterized this part of the application by highlighting the architectural aspects of current machines that strongly affect performance, notably caches, bandwidths, and prefetching. This study led to a performance model that predicts the DRAM traffic of FDTD. The second part of the thesis focuses on the impact of heterogeneity and parallelism on FDTD and on RTM. We chose Intel's manycore architecture, Xeon Phi, and studied a "native" implementation and a heterogeneous, hybrid implementation, the "symmetric" version. Finally, we ported RTM to a heterogeneous cluster, Stampede at the Texas Advanced Computing Center (TACC), where we ran scalability tests on up to 64 nodes containing Xeon Phi coprocessors and Sandy Bridge processors, which corresponds to almost 5000 cores.

Generalized sweeping preconditioners for domain decomposition methods applied to Helmholtz problems

Author: Dai Ruiyang
Publication venue: UCL - Université Catholique de Louvain
Publication date: 01/01/2021

The main part of this thesis explores a family of generalized sweeping preconditioners for Helmholtz problems with a non-overlapping checkerboard partition of the computational domain. The domain decomposition procedure relies on high-order transmission conditions and cross-point treatments, which cannot scale without an efficient preconditioning technique when the number of subdomains increases. With the proposed approach, existing sweeping preconditioners, such as the symmetric Gauss-Seidel and parallel double sweep preconditioners, can be applied to checkerboard partitions with different sweeping directions (e.g. horizontal and diagonal). Several directions can be combined thanks to the flexible version of GMRES, allowing for the rapid transfer of information between the different zones of the computational domain and thereby accelerating the convergence of the final iterative solution procedure. Several two-dimensional finite element results are presented to study and compare the sweeping preconditioners, and to illustrate their performance on cases of increasing complexity.

Open Repository and Bibliography - Liège

DIAL UCLouvain

Optimization of Elastodynamic Finite Integration Technique on Intel Xeon Phi Knights Landing Processors

Authors: Gregory Elizabeth D.; Leckey Cara A. C.; Schneck William C.

This work describes the development and optimization of an implementation of an isotropic elastodynamic finite integration technique (EFIT) code for parallelized computation on Intel Knights Landing (KNL) hardware. EFIT is a numerical approach resulting in standard staggered-grid finite difference equations for the elastodynamic equations of motion to simulate bulk waves in solids. The computationally efficient simulation of elastodynamic wave propagation and interactions in aerospace materials is of high interest in the fields of nondestructive evaluation (NDE) and structural health monitoring (SHM). Ultrasonic inspection uses an ultrasonic signal, generated at the surface of the material or structure by a piezoelectric transducer, to propagate sound waves into the material, where they interact with any existing defects, as well as with structural boundaries and any material inhomogeneity. Reflections from defects and boundaries are then measured by a transducer. Realistic ultrasound simulation tools can significantly aid the development and optimization of inspection techniques and can assist in the interpretation of experimental data. The optimization of an elastodynamics simulation code for the KNL Many Integrated Core processor was performed. The optimization focused on data locality and vectorization. Results show that tiling the data exploits cache behavior and allows significant utilization of the KNL hardware. The MPI implementation is scalable, enabling large problems to be simulated. The model results were validated against theoretical dispersion curves to within 2% of the group velocity, and within 0.5% of the phase velocity of the A0 mode. Aggressive use of tiling, threading, and vectorization techniques allowed for dramatically improved time to solution.
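The staggered-grid finite difference update that the abstract refers to can be illustrated with a minimal sketch. This is not the authors' EFIT code: it is a one-dimensional velocity-stress leapfrog scheme written for illustration only, and the function name `simulate`, the grid size, the Gaussian initial pulse, and the material constants are all assumptions. Velocities sit on cell centers, stresses sit on cell faces, and the two end stresses are held at zero to model traction-free boundaries.

```python
import math


def simulate(n=200, steps=300, c=1.0, rho=1.0, dx=1.0, cfl=0.5):
    """1D velocity-stress staggered-grid leapfrog sketch (illustrative only).

    v[i] is the particle velocity at cell center i.
    s[i] is the stress at the face between cells i-1 and i.
    s[0] and s[n] are never updated, so they stay zero (traction-free ends).
    """
    dt = cfl * dx / c  # time step from the CFL number (stable for cfl <= 1)
    # Hypothetical initial condition: Gaussian velocity pulse, zero stress.
    v = [math.exp(-((i - n / 2) / 5.0) ** 2) for i in range(n)]
    s = [0.0] * (n + 1)
    for _ in range(steps):
        # Stress update from the velocity gradient (interior faces only).
        for i in range(1, n):
            s[i] += dt * rho * c * c / dx * (v[i] - v[i - 1])
        # Velocity update from the stress gradient.
        for i in range(n):
            v[i] += dt / (rho * dx) * (s[i + 1] - s[i])
    return v, s


if __name__ == "__main__":
    v0, _ = simulate(steps=0)
    v1, _ = simulate(steps=300)
    # With traction-free ends the stress differences telescope to zero,
    # so total momentum should be conserved up to rounding error.
    print(abs(sum(v1) - sum(v0)))
```

In a production code the two inner loops are where tiling and vectorization would be applied; this sketch leaves them as plain loops so the stencil stays readable.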

NASA Technical Reports Server

Parallélisation de simulations interactives de champs ultrasonores pour le contrôle non destructif

Author: Lambert Jason
Publication venue: HAL CCSD
Publication date: 03/07/2015

The Non Destructive Testing field increasingly uses simulation. It is used at every step of the control process of an industrial part, from speeding up control development to helping experts understand results. During this thesis, a simulation tool dedicated to the fast computation of an ultrasonic field radiated by a phased array probe in an isotropic specimen has been developed. Its performance enables interactive usage. To benefit from the commonly available parallel architectures, a regular model (aimed at removing divergent branching) derived from the generic CIVA model has been developed. First, a reference implementation was developed to validate this model against CIVA results, and to analyze its performance behaviour before optimization. The resulting code has been optimized for three kinds of parallel architectures commonly available in workstations: general purpose processors (GPP), manycore coprocessors (Intel MIC) and graphics processing units (nVidia GPU). On the GPP and the MIC, the algorithm was reorganized and implemented to benefit from both parallelism levels, multithreading and vector instructions. On the GPU, the multiple steps of field computing have been divided into multiple successive CUDA kernels. Moreover, libraries dedicated to each architecture were used to speed up Fast Fourier Transforms: Intel MKL on GPP and MIC, and nVidia cuFFT on GPU. Performance and hardware suitability of the produced algorithms were thoroughly studied for each architecture. On multiple realistic control configurations, interactive performance was reached. Perspectives to address more complex configurations were drawn. Finally, the integration and the industrialization of this code in the commercial NDT platform CIVA is discussed.

Simulation is increasingly used in the industrial field of Non Destructive Testing. It is employed throughout the inspection process, whether to speed up its development or to understand its results. The work carried out during this thesis presents a method for the fast computation of the ultrasonic field radiated by a multi-element probe in an isotropic part, enabling interactive use of simulations. To take advantage of commonly available parallel architectures, a regular model (which limits divergent branching as much as possible) derived from the generic model in the CIVA software platform was developed. A first reference implementation made it possible to validate it against CIVA results and to analyze its performance behaviour. The code was then ported and optimized on three classes of parallel architectures available today in workstations: the general purpose processor (GPP), the manycore coprocessor (Intel MIC), and the graphics card (nVidia GPU). For the general purpose processor and the manycore coprocessor, the algorithm was reorganized and the code implemented to exploit the two available levels of parallelism, multithreading and vector instructions. On the graphics card, the successive steps of the field simulation were split into a series of CUDA kernels. Finally, computation libraries specific to these architectures, Intel MKL and nVidia cuFFT, were used to perform the Fast Fourier Transforms. The performance and suitability of the resulting codes were analyzed in detail for each architecture. In several cases, on realistic inspection configurations, performance allowing interactivity was reached. Perspectives for handling more complex configurations are outlined. Finally, the question of industrializing this type of code in the CIVA software platform is examined.