
    Parallelization of a Six Degree of Freedom Entry Vehicle Trajectory Simulation Using OpenMP and OpenACC

    The art and science of writing parallelized software, using methods such as Open Multi-Processing (OpenMP) and Open Accelerators (OpenACC), is dominated by computer scientists. Engineers and non-computer scientists looking to apply these techniques to their project applications face a steep learning curve, especially when adapting their original single-threaded software to run multi-threaded on graphics processing units (GPUs). Significant changes in mindset must occur, such as how to manage memory, how to organize instructions, and how to use if statements (also known as branching). The purpose of this work is twofold: 1) to demonstrate the applicability of the parallelized coding methodologies OpenMP and OpenACC to tasks outside of typical large-scale matrix mathematics; and 2) to discuss, from an engineer's perspective, the lessons learned from parallelizing software using these computer science techniques. This work applies OpenMP on both multi-core central processing units (CPUs) and the Intel Xeon Phi 7210, and OpenACC on GPUs. These parallelization techniques are used to simulate thousands of entry vehicle trajectories through the integration of six degree of freedom (DoF) equations of motion (EoM). The forces and moments acting on the entry vehicle, and used by the EoM, are estimated using multiple models of varying levels of complexity. Several benchmark comparisons are made on the execution of the six-DoF trajectory simulation: a single-threaded Intel Xeon E5-2670 CPU, a multi-threaded CPU using OpenMP, a multi-threaded Xeon Phi 7210 using OpenMP, and an NVIDIA Tesla K40 GPU using OpenACC. These benchmarks are run on the Pleiades Supercomputer Cluster at the National Aeronautics and Space Administration (NASA) Ames Research Center (ARC), and on a Xeon Phi 7210 node at NASA Langley Research Center (LaRC).
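
    The paper's source is not reproduced in the abstract, but the core pattern it describes, running thousands of independent trajectory integrations in parallel, can be sketched as below. All names (State, integrate_6dof, N_TRAJ) and the placeholder Euler integrator are illustrative assumptions, not the authors' code.

```c
/* Minimal sketch of the batch-parallel pattern the abstract describes:
 * thousands of independent 6-DoF trajectories, one per loop iteration. */
#include <omp.h>

#define N_TRAJ 10000

typedef struct { double pos[3], vel[3]; /* + quaternion, body rates, ... */ } State;

/* Placeholder integrator: a real one would evaluate the force/moment
 * models and integrate the full 6-DoF equations of motion. */
static void integrate_6dof(State *s, double t_end, double dt)
{
    for (double t = 0.0; t < t_end; t += dt)
        for (int k = 0; k < 3; ++k)
            s->pos[k] += s->vel[k] * dt;   /* trivial Euler step */
}

void run_batch(State traj[], double t_end, double dt)
{
    /* OpenMP on a multi-core CPU or Xeon Phi: iterations are independent,
     * so no synchronization is needed between trajectories. */
    #pragma omp parallel for schedule(dynamic)
    for (int i = 0; i < N_TRAJ; ++i)
        integrate_6dof(&traj[i], t_end, dt);
}

/* The OpenACC analogue for a GPU would use, e.g.:
 *   #pragma acc parallel loop copy(traj[0:N_TRAJ])
 * where the copy clause makes memory movement explicit, one of the
 * mindset changes the abstract highlights. */
```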

    On Modern Offloading Parallelization Methods: A Critical Analysis of OpenMP

    The very concept of offloading computationally complex routines to a graphics processing unit for general-purpose computing remains wide open to the academic community, in terms of both application and implementation, with several competing interfaces rising to prominence within the last twenty years. The OpenMP standard is among the leaders in this category, a parallelization interface that has stood the test of time. The inquiry presented herein has two goals: first, to assess the performance of common sorting algorithms parallelized with OpenMP and offloaded to NVIDIA GPU hardware, and second, to critically analyze the programmer experience of using an implementation of the OpenMP standard (again, with offloading to NVIDIA GPU hardware) to implement these algorithms. For completeness, the empirical analysis contains a comparison to the unparallelized algorithms. From these data and the impressions of the programming experience, the strengths and weaknesses of using OpenMP to parallelize and offload sorting algorithms are derived. After discussing each benchmark in depth, as well as the data derived from the parallelized implementations of each, we found that OpenMP's position as one of the forefront parallel programming standards is well justified, with few but notable pitfalls for the average programmer. In terms of its performance in parallelizing common sorting algorithms with offloading to NVIDIA GPU hardware, OpenMP failed to deliver implementations that are advantageous over their single-threaded counterparts, though this was found to be not the fault of OpenMP but rather of the inherent costs of offloading to NVIDIA GPU hardware.
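
    The abstract does not name the specific kernels studied; as a hedged sketch of the technique it evaluates, the fragment below offloads odd-even transposition sort, a representative data-parallel sort, with OpenMP target directives. The algorithm choice and function name are assumptions, not the paper's benchmark set.

```c
/* Odd-even transposition sort offloaded via OpenMP target directives.
 * Each phase compares disjoint pairs, so a phase maps cleanly onto
 * 'target teams distribute parallel for'. Requires an offloading-capable
 * compiler, e.g. clang -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda. */
void odd_even_sort_offload(int *a, int n)
{
    /* Keep the array resident on the device across all n phases. */
    #pragma omp target data map(tofrom: a[0:n])
    for (int phase = 0; phase < n; ++phase) {
        int start = phase % 2;               /* alternate odd/even pairs */
        #pragma omp target teams distribute parallel for
        for (int i = start; i < n - 1; i += 2) {
            if (a[i] > a[i + 1]) {           /* swap out-of-order pair */
                int t = a[i];
                a[i] = a[i + 1];
                a[i + 1] = t;
            }
        }
    }
}
```

    The per-phase kernel launches and host-device synchronization in this pattern are exactly the kind of offloading overhead the authors identify as the reason GPU-offloaded sorts failed to beat their single-threaded counterparts.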

    Recent EUROfusion Achievements in Support of Computationally Demanding Multiscale Fusion Physics Simulations and Integrated Modeling

    Integrated modeling (IM) of present experiments and future tokamak reactors requires the provision of computational resources and numerical tools capable of simulating multiscale spatial phenomena, as well as fast transient events and relatively slow plasma evolution, within a reasonably short computational time. We present recent progress achieved within the EUROfusion Consortium in implementing new computational resources for fusion applications in Europe based on modern supercomputer technologies (the supercomputer MARCONI-FUSION), in optimizing and speeding up the EU fusion-related first-principles codes, and in developing a basis for integrating physics codes/modules into a centrally maintained suite of IM tools. Physics phenomena that can now be reasonably modelled in various areas (core turbulence and magnetic reconnection, edge and scrape-off layer physics, radio-frequency heating and current drive, magnetohydrodynamic modelling, reflectometry simulations) following successful code optimization and parallelization are briefly described. Development activities in support of IM are summarized. They include support for (1) the local deployment of the IM infrastructure and access to experimental data at various host sites, (2) the management of releases for sophisticated IM workflows involving a large number of components, and (3) the performance optimization of complex IM workflows. This work has been carried out within the framework of the EUROfusion Consortium and has received funding from the Euratom research and training programme 2014-2018 under grant agreement 633053. The views and opinions expressed herein do not necessarily reflect those of the European Commission or ITER.

    A Framework for the Design and Simulation of Embedded Vision Applications Based on OpenVX and ROS

    Customizing computer vision applications for embedded systems is a common and widespread problem in the cyber-physical systems community. Such customization means parametrizing the algorithm with respect to the external environment and mapping the software application onto heterogeneous hardware resources while satisfying non-functional constraints such as performance, power, and energy consumption. This work presents a framework for the design and simulation of embedded vision applications that integrates the OpenVX standard platform with the Robot Operating System (ROS). The paper shows how the framework has been applied to tune the ORB-SLAM application for an NVIDIA Jetson TX2 board under different environmental contexts and design constraints.
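
    The abstract does not show the framework's API; the following minimal OpenVX fragment illustrates the graph-based execution model it builds on: declare a pipeline once, verify it, then process it per frame (e.g., per image arriving on a ROS topic). The single Gaussian node and image sizes are placeholders, not the framework's actual pipeline.

```c
/* Minimal OpenVX graph: build, verify once, execute per frame. */
#include <VX/vx.h>

int main(void)
{
    vx_context ctx   = vxCreateContext();
    vx_graph   graph = vxCreateGraph(ctx);

    vx_image in   = vxCreateImage(ctx, 640, 480, VX_DF_IMAGE_U8);
    vx_image blur = vxCreateImage(ctx, 640, 480, VX_DF_IMAGE_U8);

    /* One node as a stand-in: a real SLAM front end would chain more
     * nodes (feature extraction, tracking, ...), letting the runtime
     * map them onto the board's heterogeneous resources. */
    vxGaussian3x3Node(graph, in, blur);

    if (vxVerifyGraph(graph) == VX_SUCCESS)
        vxProcessGraph(graph);   /* would run once per incoming ROS frame */

    vxReleaseContext(&ctx);      /* releases the graph and images too */
    return 0;
}
```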

    A domain-specific language and matrix-free stencil code for investigating electronic properties of Dirac and topological materials

    We introduce PVSC-DTM (Parallel Vectorized Stencil Code for Dirac and Topological Materials), a library and code generator based on a domain-specific language, tailored to implement the specific stencil-like algorithms that describe Dirac and topological materials, such as graphene and topological insulators, in a matrix-free way. The generated hybrid-parallel (MPI+OpenMP) code is fully vectorized using Single Instruction Multiple Data (SIMD) extensions. It is significantly faster than matrix-based approaches at the node level and performs in accordance with the roofline model. We demonstrate the chip-level performance and distributed-memory scalability of basic building blocks such as sparse matrix-(multiple-)vector multiplication on modern multicore CPUs. As an application example, we use the PVSC-DTM scheme to (i) explore the scattering of a Dirac wave on an array of gate-defined quantum dots, (ii) calculate a set of interior eigenvalues for strong topological insulators, and (iii) discuss the photoemission spectra of a disordered Weyl semimetal.
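
    A hedged sketch of the matrix-free idea: apply a nearest-neighbor lattice operator y = Hx directly from the stencil, never storing H as a sparse matrix. The real-valued coefficients, lattice layout, and function name below are placeholders, not PVSC-DTM's generated code, which also adds MPI distribution and complex-valued wavefunctions.

```c
/* Matrix-free application of a nearest-neighbor tight-binding operator
 * on a 2D lattice: on-site energy eps plus hopping t to four neighbors.
 * The inner loop is a candidate for SIMD vectorization, as in the
 * generated code the abstract describes. */
void stencil_apply(int nx, int ny, const double *x, double *y,
                   double eps, double t)
{
    #pragma omp parallel for
    for (int j = 1; j < ny - 1; ++j) {
        #pragma omp simd                        /* vectorize inner loop */
        for (int i = 1; i < nx - 1; ++i) {
            int k = j * nx + i;
            y[k] = eps * x[k]
                 + t * (x[k - 1]  + x[k + 1]    /* left/right hops */
                      + x[k - nx] + x[k + nx]); /* up/down hops    */
        }
    }
}
```

    Because the operator entries are regenerated on the fly, the kernel streams only the vectors x and y, which is what lets a matrix-free scheme beat sparse-matrix storage on memory-bandwidth-bound hardware.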

    Parallelization of a two-dimensional flood inundation model based on domain decomposition

    Flood modelling often involves prediction of the inundated extent over large spatial and temporal scales. As the dimensionality of the system and the complexity of the problems increase, the need to obtain quick solutions becomes a priority. However, for large-scale problems or situations where fine-resolution data are required, it is often not possible or practical to run the model on a single computer in a reasonable timeframe. This paper presents the development and testing of a parallelized 2D diffusion-based flood inundation model (FloodMap-Parallel), which enables large-scale simulations to be run on distributed multi-processors. The model has been applied to three locations in the UK with different flow and topographical boundary conditions. The accuracy of the parallelized model and its computational efficiency have been tested. The predictions obtained from the parallelized model match those obtained from the serial simulations. The computational performance of the model has been investigated in relation to the granularity of the domain decomposition, the total number of cells, and the domain decomposition configuration pattern. Results show that the parallelized model is more effective for simulations with low granularity and a large number of cells. The large communication overhead associated with the potential load imbalance between sub-domains is a major bottleneck in utilizing this approach at higher domain granularity.
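
    A minimal sketch of the domain-decomposition pattern the paper tests, assuming a split of the 2D grid into horizontal strips, one per MPI rank: each timestep, ranks exchange one-row "halo" boundaries with their neighbours before updating the interior. The function name, strip layout, and ghost-row convention are assumptions, not FloodMap-Parallel's actual scheme.

```c
/* Halo exchange for a strip-decomposed 2D water-depth grid h.
 * Each rank owns local_ny interior rows of width nx, padded with one
 * ghost row above (row 0) and below (row local_ny + 1). */
#include <mpi.h>

void exchange_halos(double *h, int nx, int local_ny,
                    int rank, int nprocs, MPI_Comm comm)
{
    int up   = (rank > 0)          ? rank - 1 : MPI_PROC_NULL;
    int down = (rank < nprocs - 1) ? rank + 1 : MPI_PROC_NULL;

    /* Send first interior row up, receive into bottom ghost row. */
    MPI_Sendrecv(&h[1 * nx],              nx, MPI_DOUBLE, up,   0,
                 &h[(local_ny + 1) * nx], nx, MPI_DOUBLE, down, 0,
                 comm, MPI_STATUS_IGNORE);

    /* Send last interior row down, receive into top ghost row. */
    MPI_Sendrecv(&h[local_ny * nx],       nx, MPI_DOUBLE, down, 1,
                 &h[0],                   nx, MPI_DOUBLE, up,   1,
                 comm, MPI_STATUS_IGNORE);
}
```

    This per-step exchange is the communication overhead the paper measures: as granularity rises (more, smaller sub-domains), the halo traffic and any load imbalance between strips grow relative to the shrinking interior work.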