500 research outputs found

    Parallelization of a Six Degree of Freedom Entry Vehicle Trajectory Simulation Using OpenMP and OpenACC

    The art and science of writing parallelized software, using methods such as Open Multi-Processing (OpenMP) and Open Accelerators (OpenACC), is dominated by computer scientists. Engineers and non-computer scientists looking to apply these techniques to their project applications face a steep learning curve, especially when adapting their original single-threaded software to run multi-threaded on graphics processing units (GPUs). Significant changes in mindset must occur, such as how to manage memory, how to organize instructions, and how to use if statements (also known as branching). The purpose of this work is twofold: 1) to demonstrate the applicability of the parallelized coding methodologies OpenMP and OpenACC to tasks outside of the typical large-scale matrix mathematics; and 2) to discuss, from an engineer's perspective, the lessons learned from parallelizing software using these computer science techniques. This work applies OpenMP on both multi-core central processing units (CPUs) and the Intel Xeon Phi 7210, and OpenACC on GPUs. These parallelization techniques are used to simulate thousands of entry vehicle trajectories through the integration of the six degree of freedom (DoF) equations of motion (EoM). The forces and moments acting on the entry vehicle, which are used by the EoM, are estimated using multiple models of varying levels of complexity. Several benchmark comparisons are made on the execution of the six DoF trajectory simulation: single-thread Intel Xeon E5-2670 CPU, multi-thread CPU using OpenMP, multi-thread Xeon Phi 7210 using OpenMP, and multi-thread NVIDIA Tesla K40 GPU using OpenACC. These benchmarks are run on the Pleiades Supercomputer Cluster at the National Aeronautics and Space Administration (NASA) Ames Research Center (ARC), and on a Xeon Phi 7210 node at NASA Langley Research Center (LaRC).

    Formalized design of efficient multithreaded programs

    The combined use of high-level algebra-algorithmic design tools, supplemented by a subsystem of rewriting rules, together with low-level profilers is proposed for automating the design and improving the performance of sequential and parallel (multithreaded) shared-memory programs.

    An LLVM Instrumentation Plug-in for Score-P

    Reducing application runtime, scaling parallel applications to higher numbers of processes/threads, and porting applications to new hardware architectures are tasks necessary in the software development process. Therefore, developers have to investigate and understand application runtime behavior. Tools such as monitoring infrastructures that capture performance-relevant data during application execution assist in this task. The measured data forms the basis for identifying bottlenecks and optimizing the code. Monitoring infrastructures need mechanisms to record application activities in order to conduct measurements. Automatic instrumentation of the source code is the preferred method in most application scenarios. We introduce a plug-in for the LLVM infrastructure that enables automatic source code instrumentation at compile-time. In contrast to available instrumentation mechanisms in LLVM/Clang, our plug-in can selectively include/exclude individual application functions. This enables developers to fine-tune the measurement to the required level of detail while avoiding large runtime overheads due to excessive instrumentation. Comment: 8 pages

    Acceleration computing process in wavelength scanning interferometry

    Optical interferometry has been widely explored for surface measurement owing to its non-contact, high-accuracy interrogation. Some interferometric techniques, such as white light interferometry and wavelength scanning interferometry (WSI), are used to measure both rough and smooth surfaces. WSI can measure large discontinuous surface profiles without phase ambiguity problems. However, WSI usually needs to capture hundreds of interferograms at different wavelengths in order to evaluate the surface finish of a sample. Evaluating this large amount of data takes a long processing time if traditional CPU programming is used. This paper presents a parallel programming model that achieves data parallelism to accelerate the analysis of the captured data. The parallel program is based on the CUDA™ C program structure developed by NVIDIA. Additionally, this paper explains the mathematical algorithm used for evaluating the surface profiles. The computing time and accuracy obtained from the CUDA program, using a GeForce GTX 280 graphics processing unit (GPU), were compared to those obtained from a sequentially executed Matlab program, using an Intel® Core™2 Duo CPU. The results of measuring a step height sample show that the parallel programming capability of the GPU can greatly increase floating point calculation throughput compared to a multicore CPU.

    Optimization of the AGATA pulse shape analysis algorithm using graphics processing units

    This thesis project addresses the problem of the execution speed of the GridSearch algorithm in the context of pulse shape analysis for the signals acquired by the AGATA gamma spectrometer. The problem is solved by providing an implementation of the algorithm in the OpenCL language, so as to exploit the computing power made available by the GPUs of modern graphics cards.

    Intel oneAPI for Heterogeneous Computing

    Bachelor's thesis (Trabajo de Fin de Grado) in Computer Engineering, Facultad de Informática UCM, Departamento de Arquitectura de Computadores y Automática, academic year 2020/2021. "oneAPI is a cross-industry, open, standards-based unified programming model that delivers a common developer experience across accelerator architectures, for faster application performance, more productivity, and greater innovation." -www.oneapi.com The Intel DPC++ Compatibility Tool is a component of the Intel oneAPI Base Toolkit. This tool automatically transforms CUDA code into Data Parallel C++ (DPC++), assisting in the migration process. This project consists of an analysis of the DPC++ Compatibility Tool, considering the manual intervention required and the problems encountered while migrating the Rodinia benchmarks, and a comparative study of the performance obtained by the migrated code.