
    Astrophysical Supercomputing with GPUs: Critical Decisions for Early Adopters

    General purpose computing on graphics processing units (GPGPU) is dramatically changing the landscape of high performance computing in astronomy. In this paper, we identify and investigate several key decision areas, with a goal of simplifying the early adoption of GPGPU in astronomy. We consider the merits of OpenCL as an open standard in order to reduce the risks associated with coding in a native, vendor-specific programming environment, and present a GPU programming philosophy based on using brute force solutions. We assert that effective use of new GPU-based supercomputing facilities will require a change in approach from astronomers. This will likely include improved programming training, an increased need for software development best practice through the use of profiling and related optimisation tools, and a greater reliance on third-party code libraries. As with any new technology, those willing to take the risks and make the investment of time and effort to become early adopters of GPGPU in astronomy stand to reap great benefits.
    Comment: 13 pages, 5 figures, accepted for publication in PAS
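    The "brute force" philosophy above is easiest to see in code. The CUDA kernel below is a minimal illustrative sketch, not taken from the paper (the kernel name, layout, and softening constant are assumptions): a direct O(N^2) pairwise gravitational sum with one thread per body, with no tree code and no tiling, letting raw throughput do the work.

```cuda
// Illustrative "brute force" GPU kernel: direct O(N^2) pairwise interaction
// sum, one thread per body. pos[j].w holds the mass of body j.
__global__ void pairwise_force(const float4 *pos, float3 *acc, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float3 a = make_float3(0.0f, 0.0f, 0.0f);
    for (int j = 0; j < n; ++j) {
        float dx = pos[j].x - pos[i].x;
        float dy = pos[j].y - pos[i].y;
        float dz = pos[j].z - pos[i].z;
        float r2 = dx * dx + dy * dy + dz * dz + 1e-9f; // softening term
        float inv_r3 = rsqrtf(r2 * r2 * r2);            // 1 / r^3
        float s = pos[j].w * inv_r3;
        a.x += s * dx; a.y += s * dy; a.z += s * dz;
    }
    acc[i] = a;
}
```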

    Survey on Parallel Computing and Performance Modelling in High Performance Computing

    Parallel programming has come a long way with the advances in HPC. The high performance computing landscape is shifting from collections of homogeneous nodes towards heterogeneous systems, in which nodes consist of a combination of traditional out-of-order execution cores and accelerator devices. Accelerators, built around GPUs, many-core chips, FPGAs or DSPs, are used to offload compute-intensive tasks. Large-scale GPU clusters are gaining popularity in the scientific computing community and support a massive range of applications. However, their deployment and production use are associated with a number of new challenges, including programming in CUDA. In this paper, we present our efforts to address some of the issues related to HPC and introduce some performance modelling techniques along with GPU clustering. DOI: 10.17762/ijritcc2321-8169.15029
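    As context for the kind of performance modelling the survey covers, a common starting point is the roofline model, in which attainable throughput is bounded by the lesser of peak compute and peak memory bandwidth times arithmetic intensity. A minimal sketch, with purely illustrative device numbers rather than measurements:

```cuda
#include <cstdio>
#include <algorithm>

// Minimal roofline-model sketch: attainable performance is the lesser of the
// compute peak and bandwidth * arithmetic intensity. Numbers are illustrative.
int main()
{
    const double peak_gflops = 9700.0; // peak FP32 throughput (GFLOP/s)
    const double peak_bw     = 900.0;  // peak memory bandwidth (GB/s)
    const double intensity   = 2.5;    // kernel FLOPs per byte moved

    double attainable = std::min(peak_gflops, peak_bw * intensity);
    std::printf("roofline bound: %.1f GFLOP/s\n", attainable);
    return 0;
}
```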

    Contributions to the efficient use of general purpose coprocessors: kernel density estimation as case study

    The high performance computing landscape is shifting from assemblies of homogeneous nodes towards heterogeneous systems, in which nodes consist of a combination of traditional out-of-order execution cores and accelerator devices. Accelerators provide greater theoretical performance than traditional multi-core CPUs, but exploiting their computing power remains a challenging task. This dissertation discusses the issues that arise when trying to use general purpose accelerators efficiently. As a contribution to aid in this task, we present a thorough survey of performance modeling techniques and tools for general purpose coprocessors. We then use the statistical technique Kernel Density Estimation (KDE) as a case study. KDE is a memory-bound application that poses several challenges for its adaptation to the accelerator-based model. We present a novel algorithm for the computation of KDE, called S-KDE, that considerably reduces its computational complexity. Furthermore, we have carried out two parallel implementations of S-KDE: one for multi- and many-core processors, and another for accelerators. The latter has been implemented in OpenCL in order to make it portable across a wide range of devices. We have evaluated the performance of each implementation of S-KDE on a variety of architectures, trying to highlight the bottlenecks and the limits that the code reaches on each device. Finally, we present an application of our S-KDE algorithm in the field of climatology: a novel methodology for the evaluation of environmental models.
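    The S-KDE algorithm itself is not reproduced in the abstract. As a reference point, the sketch below shows the baseline O(n·m) Gaussian KDE sum that such algorithms set out to reduce, written as a CUDA kernel with one thread per evaluation point; the kernel name and signature are illustrative assumptions.

```cuda
// Baseline 1-D Gaussian KDE: density at each evaluation point x[i] is the
// average of kernels centred on the m samples. This is the O(n*m), memory-
// bound computation that faster algorithms such as S-KDE aim to reduce.
__global__ void kde_gauss(const float *x, int n,  // evaluation points
                          const float *s, int m,  // samples
                          float h, float *out)    // bandwidth, densities
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    const float norm = 1.0f / (sqrtf(2.0f * 3.14159265f) * h * m);
    float acc = 0.0f;
    for (int j = 0; j < m; ++j) {
        float u = (x[i] - s[j]) / h;
        acc += expf(-0.5f * u * u);
    }
    out[i] = norm * acc;
}
```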

    Analytical cost metrics: days of future past

    Future exascale high-performance computing (HPC) systems are expected to be increasingly heterogeneous, consisting of several multi-core CPUs and a large number of accelerators: special-purpose hardware that increases the computing power of the system in a very energy-efficient way. Specialized, energy-efficient accelerators are also an important component in many diverse systems beyond HPC: gaming machines, general purpose workstations, tablets, phones and other media devices. With Moore's law driving the evolution of hardware platforms towards exascale, the dominant performance metric (time efficiency) has now expanded to also incorporate power/energy efficiency. This work builds analytical cost models for metrics such as time, energy, memory access, and silicon area. These models are used to predict the performance of applications, for performance tuning, and for chip design. The idea is to work with domain-specific accelerators, where analytical cost models can be used accurately for performance optimization. The performance optimization problems are formulated as mathematical optimization problems. This work explores the analytical cost modeling and mathematical optimization approach in a few ways. For stencil applications and GPU architectures, analytical cost models are developed for execution time as well as energy. The models are used for performance tuning on existing architectures, and are coupled with silicon area models of GPU architectures to generate highly efficient architecture configurations. For matrix chain products, analytical closed-form solutions for off-chip data movement are built and used to minimize the total data-movement cost of a minimum-op-count tree.
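    The matrix-chain setting in the last sentence builds on the classic minimum-op-count parenthesisation problem. Below is a sketch of that standard textbook dynamic program (not the dissertation's data-movement model), which computes the op count of the tree the abstract refers to:

```cuda
#include <cstdio>
#include <vector>
#include <algorithm>
#include <climits>

// Classic matrix-chain dynamic program: given dimensions d[0..k], matrix i is
// d[i-1] x d[i]; cost[i][j] is the minimum scalar-multiplication count for
// the product of matrices i..j -- the "minimum op count tree".
long long min_op_count(const std::vector<long long> &d)
{
    int k = (int)d.size() - 1; // number of matrices in the chain
    std::vector<std::vector<long long>> cost(k + 1,
        std::vector<long long>(k + 1, 0));

    for (int len = 2; len <= k; ++len)
        for (int i = 1; i + len - 1 <= k; ++i) {
            int j = i + len - 1;
            cost[i][j] = LLONG_MAX;
            for (int split = i; split < j; ++split)
                cost[i][j] = std::min(cost[i][j],
                    cost[i][split] + cost[split + 1][j]
                    + d[i - 1] * d[split] * d[j]);
        }
    return cost[1][k];
}

int main()
{
    std::vector<long long> dims = {40, 20, 30, 10, 30}; // illustrative sizes
    std::printf("min scalar multiplications: %lld\n", min_op_count(dims));
    return 0;
}
```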

    Power And Hotspot Modeling For Modern GPUs

    As General Purpose GPUs (GPGPUs) are increasingly becoming a prominent component of high performance computing platforms, power and thermal dissipation are getting more attention. The trade-offs among performance, power, and heat must be well modeled and evaluated from the early stages of GPU design. This necessitates a tool that allows GPU architects to quickly and accurately evaluate their designs. A few models for GPU power exist, but most of them estimate power at a level above the hardware architecture and therefore cannot capture hardware reconfigurability. In this thesis, we propose a framework that models power and heat dissipation at the hardware architecture level, which allows individual hardware components to be configured and investigated. Our framework is also capable of visualizing the heat map of the processor over different clock cycles. To the best of our knowledge, this is the first comprehensive framework that integrates and visualizes power consumption and heat dissipation of GPUs.
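    The thesis framework is not spelled out in the abstract. As background, architecture-level power models are typically built per component from the standard dynamic-power relation P = alpha * C * V^2 * f plus a static leakage term; the toy sketch below uses placeholder constants, not real GPU numbers.

```cuda
#include <cstdio>

// Toy architecture-level power estimate for one hardware component:
// dynamic power = activity * C * V^2 * f, plus a fixed leakage term.
// All constants are illustrative placeholders.
struct Component {
    double activity;    // switching activity factor (0..1)
    double capacitance; // effective switched capacitance (F)
    double voltage;     // supply voltage (V)
    double freq;        // clock frequency (Hz)
    double leakage;     // static leakage power (W)
};

double component_power(const Component &c)
{
    double dynamic = c.activity * c.capacitance
                   * c.voltage * c.voltage * c.freq;
    return dynamic + c.leakage;
}

int main()
{
    Component alu = {0.3, 1.2e-9, 1.0, 1.5e9, 0.4};
    std::printf("estimated component power: %.2f W\n", component_power(alu));
    return 0;
}
```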

    On Designing Multicore-Aware Simulators for Systems Biology Endowed with OnLine Statistics

    This paper discusses enabling methodologies for the design of a fully parallel, online, interactive tool aiming to support bioinformatics scientists. In particular, the features of these methodologies, supported by the FastFlow parallel programming framework, are shown on a simulation tool that performs the modeling, tuning, and sensitivity analysis of stochastic biological models. A stochastic simulation needs thousands of independent simulation trajectories, which turn into big data that should be analysed by statistical and data-mining tools. In the considered approach, the two stages are pipelined in such a way that the simulation stage streams out the partial results of all simulation trajectories to the analysis stage, which immediately produces a partial result. The simulation-analysis workflow is validated, for performance and for the effectiveness of the online analysis in capturing the behavior of biological systems, on a multicore platform and on representative proof-of-concept biological systems. The exploited methodologies include pattern-based parallel programming and data streaming, which provide key features to software designers such as performance portability and efficient in-memory (big) data management and movement. Two paradigmatic classes of biological systems, exhibiting multistable and oscillatory behavior, are used as a testbed.
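    The FastFlow-based tool itself is not shown here. The sketch below illustrates the same pipelined simulation-to-analysis pattern using standard C++ threads rather than FastFlow's API: a simulation stage streams trajectory values to an analysis stage that folds each one into a running (Welford) mean and variance, so a partial statistical result is available at any moment.

```cuda
#include <cstdio>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <queue>
#include <random>

// Sketch of the pipelined simulation -> online-analysis pattern.
std::queue<double> q;
std::mutex mu;
std::condition_variable cv;
bool done = false;

void simulate(int trajectories)
{
    std::mt19937 rng(42);
    std::normal_distribution<double> step(0.0, 1.0);
    for (int t = 0; t < trajectories; ++t) {
        double value = step(rng); // stand-in for one trajectory result
        { std::lock_guard<std::mutex> lk(mu); q.push(value); }
        cv.notify_one();
    }
    { std::lock_guard<std::mutex> lk(mu); done = true; }
    cv.notify_one();
}

void analyse()
{
    long n = 0; double mean = 0.0, m2 = 0.0;
    for (;;) {
        std::unique_lock<std::mutex> lk(mu);
        cv.wait(lk, [] { return !q.empty() || done; });
        if (q.empty() && done) break;
        double x = q.front(); q.pop();
        lk.unlock();
        ++n; // Welford online update: a partial result always exists
        double d = x - mean; mean += d / n; m2 += d * (x - mean);
    }
    std::printf("n=%ld mean=%.4f var=%.4f\n",
                n, mean, n > 1 ? m2 / (n - 1) : 0.0);
}

int main()
{
    std::thread sim(simulate, 10000), ana(analyse);
    sim.join(); ana.join();
    return 0;
}
```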

    Development of a GPGPU accelerated tool to simulate advection-reaction-diffusion phenomena in 2D

    Computational models are powerful tools for the study of environmental systems, playing a fundamental role in several fields of research (hydrological sciences, biomathematics, atmospheric sciences, geosciences, among others). Most of these models require high computational capacity, especially when one considers high spatial resolution and application to large areas. In this context, the exponential increase in computational power brought by General Purpose Graphics Processing Units (GPGPU) has drawn the attention of scientists and engineers to the development of low cost and high performance parallel implementations of environmental models. In this research, we apply GPGPU computing to the development of a model that describes the physical processes of advection, reaction and diffusion. This work is presented in the form of three self-contained articles. In the first one, we present a GPGPU implementation for the solution of the 2D groundwater flow equation in unconfined aquifers for heterogeneous and anisotropic media. We implement a finite difference solution scheme based on the Crank-Nicolson method and show that the GPGPU accelerated solution implemented using CUDA C/C++ (Compute Unified Device Architecture) greatly outperforms the corresponding serial solution implemented in C/C++. The results show that the accelerated GPGPU implementation is capable of delivering up to 56 times acceleration in the solution process using an ordinary office computer. In the second article, we study the application of a diffusive-logistic growth (DLG) model to the problem of forest growth and regeneration. The study focuses on vegetation belonging to preservation areas, such as riparian buffer zones, and was developed in two stages: (i) a methodology based on Artificial Neural Network Ensembles (ANNE) was applied to evaluate the width of the riparian buffer required to filter 90% of the residual nitrogen; (ii) the DLG model was calibrated and validated to generate a prognosis of forest regeneration in riparian protection bands, considering the minimum widths indicated by the ANNE. The solution was implemented in GPGPU and applied to simulate the forest regeneration process for forty years on the riparian protection bands along the Ligeiro river, in Brazil. The results from calibration and validation showed that the DLG model provides fairly accurate results for the modelling of forest regeneration. In the third manuscript, we present a GPGPU implementation of the solution of the advection-reaction-diffusion equation in 2D. The implementation is designed to be general and flexible, allowing the modeling of a wide range of processes, including those with heterogeneity and anisotropy of the medium. We show that simulations performed in GPGPU allow the use of mesh grids containing more than 20 million points, corresponding to an area of 18,000 km² at the standard 30 m resolution of Landsat images.
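    For illustration, the kernel below sketches one time step of 2D diffusion using an explicit (FTCS) update with one thread per grid point. The thesis itself uses the implicit Crank-Nicolson scheme, which requires a linear solve per step; this simplified stencil only shows the grid-to-thread mapping that makes such solvers GPU-friendly. Names and signature are assumptions.

```cuda
// One explicit (FTCS) 2-D diffusion step, one thread per interior grid point.
// u and u_new are nx*ny row-major grids; alpha is the diffusion coefficient.
__global__ void diffuse_step(const float *u, float *u_new,
                             int nx, int ny, float alpha,
                             float dt, float dx)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;
    if (i <= 0 || j <= 0 || i >= nx - 1 || j >= ny - 1) return;

    int c = j * nx + i;
    float lap = (u[c - 1] + u[c + 1] + u[c - nx] + u[c + nx] - 4.0f * u[c])
                / (dx * dx);                 // 5-point discrete Laplacian
    u_new[c] = u[c] + alpha * dt * lap;      // forward Euler in time
}
```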

    Molecular dynamics simulations through GPU video games technologies

    Bioinformatics is the scientific field that focuses on the application of computer technology to the management of biological information. Over the years, bioinformatics applications have been used to store, process and integrate biological and genetic information, using a wide range of methodologies. One of the main techniques used to understand the physical movements of atoms and molecules is molecular dynamics (MD). MD is an in silico method to simulate the physical motions of atoms and molecules under certain conditions. It has become a strategic technique and now plays a key role in many areas of the exact sciences, such as chemistry, biology, physics and medicine. Due to their complexity, MD calculations can require enormous amounts of computer memory and time, and their execution has therefore been a major problem. Despite the huge computational cost, molecular dynamics has traditionally been implemented on computers built around the central processing unit (CPU). Graphics processing unit (GPU) computing technology was first designed with the goal of improving video games, by rapidly creating and displaying images in a frame buffer for output to a screen. The hybrid GPU-CPU implementation, combined with parallel computing, is a novel approach for performing a wide range of calculations. GPUs have been proposed and used to accelerate many scientific computations, including MD simulations. Herein, we describe these methodologies, developed initially for video games, and how they are now applied in MD simulations.
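    As a concrete illustration of why MD maps well onto GPUs, the sketch below computes brute-force Lennard-Jones forces in reduced units with one thread per atom. It is not taken from any particular MD package, and omits the cutoffs and neighbour lists real codes use.

```cuda
// Minimal brute-force Lennard-Jones force kernel (reduced units, no cutoff
// or neighbour lists), one thread per atom.
__global__ void lj_forces(const float3 *pos, float3 *force, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float3 f = make_float3(0.0f, 0.0f, 0.0f);
    for (int j = 0; j < n; ++j) {
        if (j == i) continue;
        float dx = pos[i].x - pos[j].x;
        float dy = pos[i].y - pos[j].y;
        float dz = pos[i].z - pos[j].z;
        float r2 = dx * dx + dy * dy + dz * dz;
        float inv_r2 = 1.0f / r2;
        float inv_r6 = inv_r2 * inv_r2 * inv_r2;
        // Force magnitude over r for V(r) = 4 (r^-12 - r^-6)
        float s = 24.0f * inv_r2 * inv_r6 * (2.0f * inv_r6 - 1.0f);
        f.x += s * dx; f.y += s * dy; f.z += s * dz;
    }
    force[i] = f;
}
```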

    An Investigation into Animating Plant Structures within Real-time Constraints

    This paper is an analysis of current developments in rendering botanical structures for scientific and entertainment purposes, with a focus on visualising growth. The practical investigations produce a novel approach for the parallel parsing of difficult bracketed L-systems, based upon the work of Lipp, Wonka and Wimmer (2010). Alongside this is a general overview of the issues involved when looking at growing systems, technical details involved in programming for the Graphics Processing Unit (GPU), and other possible solutions for further work that could also achieve the project's goals.
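    The paper's parallel GPU parsing approach is not reproduced here. For terminology, the sketch below shows what one sequential derivation step of a bracketed L-system does; the brackets save and restore turtle state when the resulting string is later interpreted for rendering.

```cuda
#include <cstdio>
#include <string>
#include <map>

// Minimal sequential bracketed L-system rewriting: every symbol with a rule
// is replaced in parallel conceptually, but here via one left-to-right pass.
std::string rewrite(const std::string &axiom,
                    const std::map<char, std::string> &rules, int steps)
{
    std::string s = axiom;
    for (int k = 0; k < steps; ++k) {
        std::string next;
        for (char c : s) {
            auto it = rules.find(c);
            next += (it != rules.end()) ? it->second : std::string(1, c);
        }
        s = next;
    }
    return s;
}

int main()
{
    // A classic branching rule: F -> F[+F]F[-F]F
    std::map<char, std::string> rules = {{'F', "F[+F]F[-F]F"}};
    std::printf("%s\n", rewrite("F", rules, 2).c_str());
    return 0;
}
```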