
    Parallelization of a Six Degree of Freedom Entry Vehicle Trajectory Simulation Using OpenMP and OpenACC

    The art and science of writing parallelized software, using methods such as Open Multi-Processing (OpenMP) and Open Accelerators (OpenACC), is dominated by computer scientists. Engineers and non-computer scientists looking to apply these techniques to their own applications face a steep learning curve, especially when adapting originally single-threaded software to run multi-threaded on graphics processing units (GPUs). Significant changes in mindset are required, such as how memory is managed, how instructions are organized, and how if statements (also known as branching) are used. The purpose of this work is twofold: 1) to demonstrate the applicability of the parallelized coding methodologies OpenMP and OpenACC to tasks outside of typical large-scale matrix mathematics; and 2) to discuss, from an engineer's perspective, the lessons learned from parallelizing software using these computer science techniques. This work applies OpenMP, on both multi-core central processing units (CPUs) and the Intel Xeon Phi 7210, and OpenACC on GPUs. These parallelization techniques are used to simulate thousands of entry vehicle trajectories through the integration of six degree of freedom (DoF) equations of motion (EoM). The forces and moments acting on the entry vehicle, which drive the EoM, are estimated using multiple models of varying complexity. Several benchmark comparisons are made for the execution of the six DoF trajectory simulation: a single-threaded Intel Xeon E5-2670 CPU, a multi-threaded CPU using OpenMP, a multi-threaded Xeon Phi 7210 using OpenMP, and a multi-threaded NVIDIA Tesla K40 GPU using OpenACC. These benchmarks are run on the Pleiades supercomputer cluster at the National Aeronautics and Space Administration (NASA) Ames Research Center (ARC) and on a Xeon Phi 7210 node at NASA Langley Research Center (LaRC).
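    As a concrete illustration of the pattern this abstract describes, the sketch below parallelizes a batch of independent trajectory integrations with both OpenMP and OpenACC. This is not the paper's code: the State struct and the forward-Euler euler_step() placeholder stand in for the actual six-DoF EoM integrator; only the pragma usage is the point.

```c
/* Minimal sketch (not the paper's code): thousands of independent
 * trajectories, one loop iteration each. */
#include <stddef.h>

typedef struct { double pos[3]; double vel[3]; } State;

/* Placeholder physics: forward-Euler position update only. */
#pragma acc routine seq
static void euler_step(State *s, double dt, int steps)
{
    for (int k = 0; k < steps; ++k)
        for (int j = 0; j < 3; ++j)
            s->pos[j] += s->vel[j] * dt;
}

/* CPU / Xeon Phi build: one trajectory per OpenMP thread. */
void propagate_omp(State *traj, size_t n, double dt, int steps)
{
    #pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; ++i)
        euler_step(&traj[i], dt, steps);
}

/* GPU build: the same loop offloaded with OpenACC; copy() makes the
 * host<->device data movement explicit. */
void propagate_acc(State *traj, size_t n, double dt, int steps)
{
    #pragma acc parallel loop copy(traj[0:n])
    for (size_t i = 0; i < n; ++i)
        euler_step(&traj[i], dt, steps);
}
```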

    GPU in Physics Computation: Case Geant4 Navigation

    General-purpose computing on graphics processing units (GPUs) is a potential method of speeding up scientific computation at low cost and with high energy efficiency. We experimented with the particle physics simulation toolkit Geant4, used at CERN, to benchmark its geometry navigation functionality on a GPU. The goal was to find out whether Geant4 physics simulations could benefit from GPU acceleration and how difficult it is to modify Geant4 code to run on a GPU. We ported selected parts of the Geant4 code to C99 and CUDA and implemented a simple gamma physics simulation utilizing this code to measure efficiency. The performance of the program was tested by running it on two different platforms: an NVIDIA GeForce GTX 470 GPU and a 12-core AMD CPU system. Our conclusion was that GPUs can be a competitive alternative to multi-core computers, but porting existing software in an efficient way is challenging.
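    To give a flavor of what such a port involves, here is a minimal C99 sketch of a geometry-navigation primitive: the distance from a point inside an axis-aligned box to its boundary along a direction, echoing the role of Geant4's G4Box::DistanceToOut. The function name and signature are illustrative, not Geant4's actual code; branch-light C99 like this can be compiled for the CPU or given a CUDA __device__ qualifier for the GPU build.

```c
#include <float.h>

/* Distance from point p (inside a box centered at the origin with
 * half-lengths half[]) to the box surface along direction dir.
 * Returns DBL_MAX only for a zero direction vector. */
double box_distance_to_out(const double half[3], const double p[3],
                           const double dir[3])
{
    double dist = DBL_MAX;
    for (int i = 0; i < 3; ++i) {
        if (dir[i] > 0.0) {
            double t = (half[i] - p[i]) / dir[i];   /* exit through +face */
            if (t < dist) dist = t;
        } else if (dir[i] < 0.0) {
            double t = (-half[i] - p[i]) / dir[i];  /* exit through -face */
            if (t < dist) dist = t;
        }
    }
    return dist; /* step length to the boundary */
}
```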

    Acceleration of Spiking Neural Networks on Single-GPU and Multi-GPU Systems

    There has been strong interest in modeling a mammalian brain in order to study the architectural and functional principles of the brain and to offer tools to neuroscientists and medical researchers for related studies. Artificial Neural Networks (ANNs) are computational models that try to simulate the structure and/or functional behavior of neurons and process information using the connectionist approach to computation; hence, ANNs are viable options for such studies. Of the many classes of ANNs, Spiking Neural Network (SNN) models have been employed to simulate the mammalian brain, capturing its functionality and inference capabilities. In this class of neuron models, some of the biologically accurate models are the Hodgkin-Huxley (HH) model, the Morris-Lecar (ML) model, the Wilson model, and the Izhikevich model. The HH model is the oldest, the most biologically accurate, and the most compute-intensive of the listed models. The Izhikevich model, a more recent development, is sufficiently accurate and involves the fewest computations. Accurate modeling of neurons calls for compute-intensive models, and hence single-core processors are not suitable for large-scale SNN simulations due to their serial computation and low memory bandwidth. Graphics processing units (GPUs) have been used for general-purpose computing as they offer raw computing power, with a majority of their logic dedicated solely to computation. The work presented in this thesis implements two-level character recognition networks using the four previously mentioned SNN models on NVIDIA's Tesla C870 card and investigates performance improvements over the equivalent software implementation on a 2.66 GHz Intel Core 2 Quad. The work probes important parameters such as the kernel time, memory transfer time, and flops offered by the GPU device for these implementations. We report speed-ups as high as 576x on a single GPU device for the most compute-intensive, highly biologically realistic Hodgkin-Huxley model. These results demonstrate the potential of GPUs for large-scale, accurate modeling of the mammalian brain. The thesis also presents several optimization techniques and strategies, and discusses the major bottlenecks that must be avoided in order to achieve maximum performance benefits for applications involving complex computations. The research also investigates an initial multi-GPU implementation to study problem partitioning for simulating biological-scale neuron networks on a cluster of GPU devices.
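    For concreteness, the sketch below steps the Izhikevich model, the least compute-intensive of the four models named above. The update equations are Izhikevich's published ones (2003); the flat-array layout and the OpenMP pragma, standing in for the thesis's GPU kernel launch, are illustrative assumptions.

```c
/* One timestep of n Izhikevich neurons; v in mV, time in ms. */
void izhikevich_step(int n, double *v, double *u, const double *I,
                     const double *a, const double *b,
                     const double *c, const double *d,
                     double dt, int *fired)
{
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        /* dv/dt = 0.04 v^2 + 5 v + 140 - u + I */
        v[i] += dt * (0.04 * v[i] * v[i] + 5.0 * v[i] + 140.0
                      - u[i] + I[i]);
        /* du/dt = a (b v - u) */
        u[i] += dt * a[i] * (b[i] * v[i] - u[i]);
        /* Spike at the 30 mV threshold: reset v, bump u. */
        fired[i] = (v[i] >= 30.0);
        if (fired[i]) { v[i] = c[i]; u[i] += d[i]; }
    }
}
```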

    GPU implementation of block transforms

    Traditionally, the intensive floating-point computational ability of Graphics Processing Units (GPUs) was limited mainly to rendering and visualization applications by their architecture and programming model. However, with increasing programmability and architectural progress, the inherently massively parallel computational ability of GPUs has become an essential part of today's mainstream general-purpose (non-graphical) high-performance computing systems. It has been widely reported that adapted GPU-based algorithms significantly outperform their CPU counterparts. The focus of this thesis is to use NVIDIA CUDA GPUs to implement orthogonal transforms such as the signal-dependent Karhunen-Loeve Transform and the signal-independent Discrete Cosine Transform. The GPU architecture and programming model are examined. Mathematical preliminaries of orthogonal transforms, eigen-analysis, and the algorithms are revisited. Due to their highly parallel structure, GPUs are well suited to such computation. Further, the thesis examines multiple implementation schemes and configurations, and provides performance measurements. A real-time processing and display application framework is developed to visually exhibit GPU compute capability.
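    As a reference point, the sketch below implements the forward DCT-II on one block directly from its O(N²) definition. This is textbook math, not a kernel from the thesis; on a GPU, each block (and each coefficient within it) would map to independent threads.

```c
#include <math.h>

#define PI 3.14159265358979323846

/* Forward DCT-II of one length-N block with orthonormal scaling:
 * X[k] = s(k) * sum_n x[n] cos(pi (2n+1) k / (2N)),
 * where s(0) = sqrt(1/N) and s(k>0) = sqrt(2/N). */
void dct2_forward(const double *x, double *X, int N)
{
    for (int k = 0; k < N; ++k) {
        double sum = 0.0;
        for (int n = 0; n < N; ++n)
            sum += x[n] * cos(PI * (2 * n + 1) * k / (2.0 * N));
        double s = (k == 0) ? sqrt(1.0 / N) : sqrt(2.0 / N);
        X[k] = s * sum;
    }
}
```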

    Explorations of the viability of ARM and Xeon Phi for physics processing

    We report on our investigations into the viability of the ARM processor and the Intel Xeon Phi co-processor for scientific computing. We describe our experience porting software to these processors and running benchmarks using real physics applications to explore the potential of these processors for production physics processing.
    Comment: Submitted to the proceedings of the 20th International Conference on Computing in High Energy and Nuclear Physics (CHEP13), Amsterdam.
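    A minimal sketch of the kind of portable scaling benchmark such a study runs appears below: one compute kernel timed at several thread counts with OpenMP. The daxpy-style kernel and problem size are placeholders, but the same source compiles unchanged on x86, ARM, and Xeon Phi.

```c
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

#define N (1 << 24)

int main(void)
{
    double *x = malloc(N * sizeof *x), *y = malloc(N * sizeof *y);
    if (!x || !y) return 1;
    for (int i = 0; i < N; ++i) { x[i] = 1.0; y[i] = 2.0; }

    /* Capture the limit once; omp_set_num_threads() would otherwise
     * change what omp_get_max_threads() reports inside the loop. */
    int max_t = omp_get_max_threads();
    for (int threads = 1; threads <= max_t; threads *= 2) {
        omp_set_num_threads(threads);
        double t0 = omp_get_wtime();
        #pragma omp parallel for
        for (int i = 0; i < N; ++i)
            y[i] += 2.5 * x[i];            /* placeholder daxpy kernel */
        printf("%3d threads: %.3f ms\n", threads,
               1e3 * (omp_get_wtime() - t0));
    }
    free(x); free(y);
    return 0;
}
```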