Parallelization of a Six Degree of Freedom Entry Vehicle Trajectory Simulation Using OpenMP and OpenACC

Author: Green Justin S.
Gutierrez Julian
Williams R. Anthony

The art and science of writing parallelized software, using methods such as Open Multi-Processing (OpenMP) and Open Accelerators (OpenACC), is dominated by computer scientists. Engineers and non-computer scientists looking to apply these techniques to their project applications face a steep learning curve, especially when adapting their original single-threaded software to run multi-threaded on graphics processing units (GPUs). Significant changes in mindset must occur, such as how to manage memory, how to organize instructions, and how to use if statements (also known as branching). The purpose of this work is twofold: 1) to demonstrate the applicability of parallelized coding methodologies, OpenMP and OpenACC, to tasks outside of the typical large-scale matrix mathematics; and 2) to discuss, from an engineer's perspective, the lessons learned from parallelizing software using these computer science techniques. This work applies OpenMP on both multi-core central processing units (CPUs) and the Intel Xeon Phi 7210, and OpenACC on GPUs. These parallelization techniques are used to tackle the simulation of thousands of entry vehicle trajectories through the integration of six degree of freedom (DoF) equations of motion (EoM). The forces and moments acting on the entry vehicle, and used by the EoM, are estimated using multiple models of varying levels of complexity. Several benchmark comparisons are made on the execution of the six DoF trajectory simulation: single-thread Intel Xeon E5-2670 CPU, multi-thread CPU using OpenMP, multi-thread Xeon Phi 7210 using OpenMP, and multi-thread NVIDIA Tesla K40 GPU using OpenACC. These benchmarks are run on the Pleiades Supercomputer Cluster at the National Aeronautics and Space Administration (NASA) Ames Research Center (ARC), and on a Xeon Phi 7210 node at NASA Langley Research Center (LaRC).

NASA Technical Reports Server

Formalized Design of Efficient Multithreaded Programs (Формализованное проектирование эффективных многопоточных программ)

Author: Дорошенко А.Е.
Жереб К.А.
Яценко Е.А.
Publication venue: Інститут програмних систем НАН України
Publication date: 01/01/2007

The authors propose the combined use of high-level tools for algebra-algorithmic design, augmented with a subsystem of rewriting rules, together with low-level profilers, in order to automate the design and improve the performance of sequential and parallel (multithreaded) shared-memory programs.

Наукова електронна бібліотека періодичних видань НАН України (Vernadsky National Library of Ukraine)

An LLVM Instrumentation Plug-in for Score-P

Author: Brendel Ronny
Döbel Sebastian
Herold Christian
Tschüter Ronny
Weber Matthias
Wesarg Bert
Ziegenbalg Johannes
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/12/2017

Reducing application runtime, scaling parallel applications to higher numbers of processes/threads, and porting applications to new hardware architectures are necessary tasks in the software development process. Developers therefore have to investigate and understand application runtime behavior. Tools such as monitoring infrastructures, which capture performance-relevant data during application execution, assist in this task. The measured data forms the basis for identifying bottlenecks and optimizing the code. Monitoring infrastructures need mechanisms to record application activities in order to conduct measurements. Automatic instrumentation of the source code is the preferred method in most application scenarios. We introduce a plug-in for the LLVM infrastructure that enables automatic source code instrumentation at compile time. In contrast to available instrumentation mechanisms in LLVM/Clang, our plug-in can selectively include/exclude individual application functions. This enables developers to fine-tune the measurement to the required level of detail while avoiding large runtime overheads due to excessive instrumentation. Comment: 8 pages

arXiv.org e-Print Archive

Crossref

Acceleration computing process in wavelength scanning interferometry

Author: Gao F.
Jiang Xiang
Muhamedsalih Hussam
Publication date: 29/06/2011

Optical interferometry has been widely explored for surface measurement because it offers non-contact, high-accuracy interrogation. Some interferometric techniques, such as white light interferometry and wavelength scanning interferometry (WSI), are used to measure both rough and smooth surfaces. WSI can measure large discontinuous surface profiles without phase ambiguity problems. However, WSI usually needs to capture hundreds of interferograms at different wavelengths in order to evaluate the surface finish of a sample. Evaluating this large amount of data takes a long processing time if traditional CPU programming is used. This paper presents a parallel programming model that achieves data parallelism to accelerate the analysis of the captured data. The parallel programming is based on the CUDA™ C program structure developed by NVIDIA. Additionally, this paper explains the mathematical algorithm used for evaluating the surface profiles. The computing time and accuracy obtained from the CUDA program, using a GeForce GTX 280 graphics processing unit (GPU), were compared to those obtained from a sequentially executed Matlab program, using an Intel® Core™2 Duo CPU. The results of measuring a step height sample show that the parallel programming capability of the GPU can greatly accelerate floating point calculation throughput compared to a multicore CPU.

University of Huddersfield Repository

Optimization of the AGATA pulse shape analysis algorithm using graphics processing units

Author: Calore Enrico
Publication date: 08/04/2022

This thesis project addresses the execution speed of the GridSearch algorithm in the context of pulse shape analysis for signals acquired by the AGATA gamma spectrometer. The problem is solved by providing an implementation of the algorithm in OpenCL, so as to exploit the computing power made available by the GPUs of modern graphics cards.

Padua Thesis and Dissertation Archive

Intel oneAPI for Heterogeneous Computing (Intel-oneAPI para Computación Heterogénea)

Author: Castaño Roldán Germán
Publication date: 01/01/2021

Bachelor's thesis (Trabajo de Fin de Grado) in Computer Engineering, Facultad de Informática UCM, Departamento de Arquitectura de Computadores y Automática, academic year 2020/2021. "oneAPI is a cross-industry, open, standards-based unified programming model that delivers a common developer experience across accelerator architectures, for faster application performance, more productivity, and greater innovation." -www.oneapi.com The Intel DPC++ Compatibility Tool is a component of the Intel oneAPI Base Toolkit. This tool automatically transforms CUDA code into Data Parallel C++ (DPC++), assisting in the migration process. This project consists of an analysis of the DPC++ Compatibility Tool, considering the manual intervention required and the problems encountered while migrating the Rodinia benchmarks, and a comparative study of the performance obtained by the migrated code.

Docta Complutense