1,166 research outputs found

    Massively Parallel Ray Tracing Algorithm Using GPU

    Ray tracing is a technique for generating an image by tracing the path of light through each pixel in an image plane and simulating the effects of high-quality global illumination, at a heavy computational cost. Because of this high computational complexity, it traditionally cannot meet the requirements of real-time rendering. The emergence of many-core architectures makes it possible to reduce the running time of the ray tracing algorithm significantly by exploiting their powerful floating-point computation capability. In this paper, a new GPU implementation and optimization of ray tracing that accelerates the rendering process is presented.
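
    As a rough illustration of the per-pixel parallelism such an implementation exploits, the sketch below maps one CUDA thread to one pixel and traces a primary ray against a single hard-coded sphere. This is a minimal example under our own assumptions, not the paper's implementation; the kernel name, image size, and scene are invented for illustration.

        // Minimal sketch (not the paper's code): one CUDA thread per pixel
        // traces a primary ray against a hard-coded sphere.
        #include <cstdio>
        #include <cuda_runtime.h>

        #define W 640
        #define H 480

        __global__ void traceKernel(float* img) {
            int x = blockIdx.x * blockDim.x + threadIdx.x;
            int y = blockIdx.y * blockDim.y + threadIdx.y;
            if (x >= W || y >= H) return;

            // Camera at the origin, image plane at z = -1.
            float u = 2.0f * x / W - 1.0f;
            float v = 2.0f * y / H - 1.0f;
            float3 dir = make_float3(u, v, -1.0f);
            float len = sqrtf(dir.x*dir.x + dir.y*dir.y + dir.z*dir.z);
            dir.x /= len; dir.y /= len; dir.z /= len;

            // Ray-sphere intersection: sphere at (0,0,-3), radius 1.
            float3 oc = make_float3(0.0f, 0.0f, 3.0f);   // origin - center
            float b = 2.0f * (oc.x*dir.x + oc.y*dir.y + oc.z*dir.z);
            float c = oc.x*oc.x + oc.y*oc.y + oc.z*oc.z - 1.0f;
            float disc = b*b - 4.0f*c;

            // Shade: white on hit, black on miss.
            img[y * W + x] = disc > 0.0f ? 1.0f : 0.0f;
        }

        int main() {
            float* d_img;
            cudaMalloc(&d_img, W * H * sizeof(float));
            dim3 block(16, 16), grid((W + 15) / 16, (H + 15) / 16);
            traceKernel<<<grid, block>>>(d_img);
            float h_center;
            cudaMemcpy(&h_center, d_img + (H / 2) * W + W / 2, sizeof(float),
                       cudaMemcpyDeviceToHost);
            printf("center pixel: %.1f\n", h_center);   // 1.0 = sphere hit
            cudaFree(d_img);
            return 0;
        }

    A full renderer would replace the single sphere with traversal of an acceleration structure (e.g., a BVH) and add secondary rays for global illumination.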

    Using Graph Properties to Speed-up GPU-based Graph Traversal: A Model-driven Approach

    While it is well known and acknowledged that the performance of graph algorithms is heavily dependent on the input data, there has been surprisingly little research to quantify and predict the impact the graph structure has on performance. Parallel graph algorithms, running on many-core systems such as GPUs, are no exception: most research has focused on how to efficiently implement and tune different graph operations on a specific GPU. However, the performance impact of the input graph has only been taken into account indirectly, as a result of the graphs used to benchmark the system. In this work, we present a case study investigating how to use the properties of the input graph to improve the performance of breadth-first search (BFS) graph traversal. To do so, we first study the performance variation of 15 different BFS implementations across 248 graphs. Using this performance data, we show that a significant speedup can be achieved by combining the best implementation for each level of the traversal. To make use of this data-dependent optimization, we must correctly predict the relative performance of the algorithms for each graph level and enable dynamic switching to the optimal algorithm at runtime. We use the collected performance data to train a binary decision tree that enables high-accuracy predictions and fast switching. We demonstrate empirically that our decision tree is both fast enough to allow dynamic switching between implementations without noticeable overhead, and accurate enough in its predictions to enable a significant BFS speedup. We conclude that our model-driven approach (1) enables BFS to outperform state-of-the-art GPU algorithms, and (2) can be adapted to other BFS variants, other algorithms, or more specific datasets.
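
    The sketch below illustrates the general shape of this per-level switching, under assumptions of ours rather than the authors' code: a CSR graph, one top-down and one bottom-up kernel, and a stub predictor standing in for the trained decision tree (which in the paper uses more graph features than frontier size).

        // Illustrative sketch of per-level switching (not the authors' code).
        #include <cstdio>
        #include <cuda_runtime.h>

        // Top-down step: one thread scans one frontier vertex's edges (CSR).
        __global__ void bfsTopDown(const int* rowPtr, const int* colIdx, int* dist,
                                   const int* frontier, int frontierSize,
                                   int* nextFrontier, int* nextSize, int level) {
            int i = blockIdx.x * blockDim.x + threadIdx.x;
            if (i >= frontierSize) return;
            int v = frontier[i];
            for (int e = rowPtr[v]; e < rowPtr[v + 1]; ++e) {
                int w = colIdx[e];
                if (atomicCAS(&dist[w], -1, level + 1) == -1)   // first visit
                    nextFrontier[atomicAdd(nextSize, 1)] = w;
            }
        }

        // Bottom-up step: one thread checks whether an unvisited vertex has a
        // neighbour in the current level (profitable for very wide frontiers).
        __global__ void bfsBottomUp(const int* rowPtr, const int* colIdx, int* dist,
                                    int level, int n, int* nextSize) {
            int v = blockIdx.x * blockDim.x + threadIdx.x;
            if (v >= n || dist[v] != -1) return;
            for (int e = rowPtr[v]; e < rowPtr[v + 1]; ++e)
                if (dist[colIdx[e]] == level) {
                    dist[v] = level + 1;
                    atomicAdd(nextSize, 1);
                    break;
                }
        }

        // Stand-in for the trained binary decision tree: a single threshold on
        // the frontier fraction. The real model uses richer per-level features.
        bool useBottomUp(int frontierSize, int n) { return frontierSize > n / 4; }

        int main() {
            // Toy undirected path graph 0-1-2-3 in CSR form.
            int h_rowPtr[5] = {0, 1, 3, 5, 6}, h_colIdx[6] = {1, 0, 2, 1, 3, 2};
            int n = 4, zero = 0;
            int *rowPtr, *colIdx, *dist, *front, *next, *nextSize;
            cudaMalloc(&rowPtr, sizeof(h_rowPtr));
            cudaMalloc(&colIdx, sizeof(h_colIdx));
            cudaMalloc(&dist, n * sizeof(int));
            cudaMalloc(&front, n * sizeof(int));
            cudaMalloc(&next, n * sizeof(int));
            cudaMalloc(&nextSize, sizeof(int));
            cudaMemcpy(rowPtr, h_rowPtr, sizeof(h_rowPtr), cudaMemcpyHostToDevice);
            cudaMemcpy(colIdx, h_colIdx, sizeof(h_colIdx), cudaMemcpyHostToDevice);
            cudaMemset(dist, 0xFF, n * sizeof(int));   // dist = -1 everywhere
            cudaMemcpy(dist, &zero, sizeof(int), cudaMemcpyHostToDevice);  // src
            cudaMemcpy(front, &zero, sizeof(int), cudaMemcpyHostToDevice); // {0}
            for (int level = 0, frontierSize = 1; frontierSize > 0; ++level) {
                cudaMemcpy(nextSize, &zero, sizeof(int), cudaMemcpyHostToDevice);
                if (useBottomUp(frontierSize, n)) {
                    // NOTE: a full implementation rebuilds `front` afterwards.
                    bfsBottomUp<<<(n + 255) / 256, 256>>>(rowPtr, colIdx, dist,
                                                          level, n, nextSize);
                } else {
                    bfsTopDown<<<(frontierSize + 255) / 256, 256>>>(rowPtr, colIdx,
                        dist, front, frontierSize, next, nextSize, level);
                    int* t = front; front = next; next = t;   // swap frontiers
                }
                cudaMemcpy(&frontierSize, nextSize, sizeof(int),
                           cudaMemcpyDeviceToHost);
            }
            int h_dist[4];
            cudaMemcpy(h_dist, dist, sizeof(h_dist), cudaMemcpyDeviceToHost);
            printf("dist: %d %d %d %d\n", h_dist[0], h_dist[1], h_dist[2], h_dist[3]);
            return 0;
        }

    On real graphs the predictor flips to the bottom-up kernel on the wide middle levels of the traversal; this toy path graph stays top-down throughout.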

    A Study of Hardware Performance Counters Selection for Cross Architectural GPU Power Modeling

    In the exascale race, where huge corporations are spending billions of dollars on designing highly efficient heterogeneous supercomputers, the real need to reduce power envelopes forces current technologies to face crucial challenges and demands that the scientific community evaluate and optimize the performance-power ratio. As energy consumption continues to climb, the viability of these massive systems becomes a growing concern. In this context, specific power-related research becomes a priority. We therefore develop an exhaustive step-by-step process for selecting a comprehensive set of hardware performance counters to serve as input to an eventual cross-architectural GPU power consumption model. Our experiments show a high power-performance correlation among shared GPU events. We also present a set of events that delivers exclusive performance information, enabling accurate prediction of GPU power fluctuations.
    XX Workshop Procesamiento Distribuido y Paralelo. Red de Universidades con Carreras en Informática.
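
    A toy illustration of the screening step, under our own assumptions rather than the paper's exact pipeline: candidate events can be ranked by the Pearson correlation of their readings with measured board power across benchmark runs, keeping events above a threshold as model inputs. The event name and sample values below are invented.

        // Rank a candidate hardware event by its correlation with power.
        #include <cmath>
        #include <cstdio>

        // Pearson correlation coefficient between two series of length n.
        double pearson(const double* x, const double* y, int n) {
            double sx = 0, sy = 0, sxx = 0, syy = 0, sxy = 0;
            for (int i = 0; i < n; ++i) {
                sx += x[i]; sy += y[i];
                sxx += x[i] * x[i]; syy += y[i] * y[i]; sxy += x[i] * y[i];
            }
            double cov = sxy - sx * sy / n;
            double vx  = sxx - sx * sx / n;
            double vy  = syy - sy * sy / n;
            return cov / std::sqrt(vx * vy);
        }

        int main() {
            // Hypothetical per-run readings: one event vs. measured watts.
            double instIssued[] = {1.2e9, 2.5e9, 3.1e9, 4.0e9, 5.2e9};
            double watts[]      = {  95,   130,   148,   170,   205};
            printf("corr(inst_issued, power) = %.3f\n",
                   pearson(instIssued, watts, 5));
            // Events above a correlation threshold would be kept as inputs to
            // a linear model of the form power = w0 + sum_i(w_i * event_i).
            return 0;
        }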

    Power And Hotspot Modeling For Modern GPUs

    As General Purpose GPUs (GPGPUs) are increasingly becoming a prominent component of high-performance computing platforms, power and thermal dissipation are receiving more attention. The trade-offs among performance, power, and heat must be well modeled and evaluated from the early stages of GPU design. This necessitates a tool that allows GPU architects to evaluate their designs quickly and accurately. A few models for GPU power exist, but most of them estimate power at a level higher than the architecture and therefore miss hardware reconfigurability. In this thesis, we propose a framework that models power and heat dissipation at the hardware architecture level, which allows individual hardware components to be configured and investigated. Our framework is also capable of visualizing the heat map of the processor over different clock cycles. To the best of our knowledge, this is the first comprehensive framework that integrates and visualizes both the power consumption and the heat dissipation of GPUs.
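
    As a hint of how such a heat map can be produced (a generic grid-based approach of ours, not necessarily the thesis framework), the sketch below treats the die as a 2D grid of architectural blocks and advances temperature one step at a time, diffusing heat between neighbours and injecting each block's power. All names and constants are illustrative.

        // Grid-based hotspot update: explicit Euler diffusion + power injection.
        #include <cuda_runtime.h>

        #define N 64          // grid side length (blocks per edge)

        __global__ void thermalStep(const float* T, float* Tnext,
                                    const float* power, float alpha, float dt) {
            int x = blockIdx.x * blockDim.x + threadIdx.x;
            int y = blockIdx.y * blockDim.y + threadIdx.y;
            if (x >= N || y >= N) return;
            int i = y * N + x;

            // 4-neighbour Laplacian with insulated (clamped) borders.
            float c = T[i];
            float up    = (y > 0)     ? T[i - N] : c;
            float down  = (y < N - 1) ? T[i + N] : c;
            float left  = (x > 0)     ? T[i - 1] : c;
            float right = (x < N - 1) ? T[i + 1] : c;

            // Diffusion plus local power injection.
            Tnext[i] = c + dt * (alpha * (up + down + left + right - 4.0f * c)
                                 + power[i]);
        }

        int main() {
            float *T, *Tn, *P;
            size_t bytes = N * N * sizeof(float);
            cudaMalloc(&T, bytes); cudaMalloc(&Tn, bytes); cudaMalloc(&P, bytes);
            cudaMemset(T, 0, bytes); cudaMemset(P, 0, bytes); // ambient, idle
            dim3 b(16, 16), g(N / 16, N / 16);
            for (int step = 0; step < 1000; ++step) {         // ping-pong buffers
                thermalStep<<<g, b>>>(T, Tn, P, 0.1f, 0.01f);
                float* tmp = T; T = Tn; Tn = tmp;
            }
            cudaDeviceSynchronize();
            cudaFree(T); cudaFree(Tn); cudaFree(P);
            return 0;
        }

    Rendering the T buffer as colours after each simulated interval yields the kind of per-cycle heat-map animation the abstract describes.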

    Soft Computing Techniques for the Protein Folding Problem on High Performance Computing Architectures

    The protein-folding problem has been extensively studied during the last fifty years. Understanding the dynamics of a protein's global shape and its influence on biological function can help us discover new and more effective drugs for diseases of pharmacological relevance. Different computational approaches have been developed to predict the three-dimensional arrangement of a protein's atoms from its sequence. However, the computational complexity of this problem makes it mandatory to search for new models, novel algorithmic strategies, and hardware platforms that provide solutions in a reasonable time frame. In this review we present past and current trends in protein folding simulation from both perspectives: hardware and software. Of particular interest to us are the use of inexact solutions to this computationally hard problem and the hardware platforms that have been used to run this kind of soft computing technique.
    This work is jointly supported by the Fundación Séneca (Agencia Regional de Ciencia y Tecnología, Región de Murcia) under grants 15290/PI/2010 and 18946/JLI/13, by the Spanish MEC and European Commission FEDER under grants TEC2012-37945-C02-02 and TIN2012-31345, and by the Nils Coordinated Mobility programme under grant 012-ABEL-CM-2014A, financed in part by the European Regional Development Fund (ERDF). We also thank NVIDIA for hardware donation within the UCAM GPU educational and research centers.
    Ingeniería, Industria y Construcción.
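
    To make the hardware/software connection concrete, here is a small sketch of our own (not taken from the review): in population-based soft computing methods such as genetic algorithms applied to the HP lattice model, the dominant cost is scoring candidate conformations, which maps naturally onto a GPU with one thread per candidate. All names and layouts are assumptions.

        // Score a population of HP-lattice conformations, one thread each.
        #include <cstdio>
        #include <cuda_runtime.h>

        #define LEN 16        // residues per conformation

        // coords: per conformation, LEN (x,y) lattice positions, stored flat.
        // isH:    1 if a residue is hydrophobic. energy: one score per candidate.
        __global__ void scoreHP(const int* coords, const char* isH,
                                float* energy, int popSize) {
            int c = blockIdx.x * blockDim.x + threadIdx.x;
            if (c >= popSize) return;
            const int* p = coords + c * LEN * 2;
            int contacts = 0;
            for (int i = 0; i < LEN; ++i) {
                if (!isH[i]) continue;
                for (int j = i + 2; j < LEN; ++j) {   // skip chain neighbours
                    if (!isH[j]) continue;
                    int dx = p[2*i] - p[2*j], dy = p[2*i+1] - p[2*j+1];
                    if (dx*dx + dy*dy == 1) ++contacts;  // adjacent on lattice
                }
            }
            energy[c] = -(float)contacts;   // lower energy = more H-H contacts
        }

        int main() {
            int pop = 256;
            int *coords; char *isH; float *energy;
            cudaMalloc(&coords, pop * LEN * 2 * sizeof(int));
            cudaMalloc(&isH, LEN);
            cudaMalloc(&energy, pop * sizeof(float));
            cudaMemset(coords, 0, pop * LEN * 2 * sizeof(int)); // dummy population
            cudaMemset(isH, 1, LEN);                            // all-H toy sequence
            scoreHP<<<(pop + 127) / 128, 128>>>(coords, isH, energy, pop);
            float h_e;
            cudaMemcpy(&h_e, energy, sizeof(float), cudaMemcpyDeviceToHost);
            printf("energy[0] = %.1f\n", h_e);
            cudaFree(coords); cudaFree(isH); cudaFree(energy);
            return 0;
        }

    A genetic algorithm would call such a kernel once per generation on the whole population, keeping selection and mutation on the host or on the device as the platform allows.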

    Improving GPGPU Energy-Efficiency through Concurrent Kernel Execution and DVFS

    Master's thesis (Master of Science).

    Power Bounded Computing on Current & Emerging HPC Systems

    Power has become a critical constraint on the evolution of large-scale High Performance Computing (HPC) systems and commercial data centers. This constraint spans almost every level of computing technology, from IC chips all the way up to data centers, for physical, technical, and economic reasons. To cope with this reality, it is necessary to understand how available or permissible power impacts the design and performance of emergent computer systems. For this reason, we propose power bounded computing and corresponding technologies to optimize performance on HPC systems with limited power budgets. The research objectives of this dissertation center on understanding the interaction between performance, power bounds, and a hierarchical power management strategy. First, we develop heuristic and application-aware power allocation methods to improve application performance on a single node. Second, we develop algorithms to coordinate power across nodes and components based on application characteristics and the power budget of a cluster. Third, we investigate performance interference induced by hardware and power contention, and propose contention-aware job scheduling to maximize system throughput under a given power budget on node-sharing systems. Fourth, we extend the approach to GPU-accelerated systems and workloads, and develop an online dynamic performance and power management approach that meets both performance requirements and power-efficiency goals. Power bounded computing improves performance scalability and power efficiency and decreases the operating costs of HPC systems and data centers. This dissertation opens up several new avenues of research in power bounded computing to address the power challenges in HPC systems. The proposed power and resource management techniques provide new directions and guidelines for green exascale computing and other computing systems.
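
    A minimal sketch of the budget-splitting idea at the node level, assuming invented component names, power ranges, and gain values (the dissertation's actual algorithms are application-aware and hierarchical): spend the power budget in fixed steps, each time on the component with the best marginal performance gain per watt.

        // Greedy node-level power allocation under a fixed budget.
        #include <cstdio>

        #define NCOMP 3
        const char*  names[NCOMP] = {"cpu", "gpu", "mem"};
        const double pmin[NCOMP]  = {20, 30, 10};   // watts at minimum point
        const double pmax[NCOMP]  = {95, 250, 40};  // component power caps
        const double gain[NCOMP]  = {0.8, 1.5, 0.4}; // toy perf gain per watt

        int main() {
            double budget = 200.0, step = 5.0;
            double alloc[NCOMP];
            // Every component first gets its minimum operating power.
            for (int i = 0; i < NCOMP; ++i) { alloc[i] = pmin[i]; budget -= pmin[i]; }
            // Greedy: spend what remains where it buys the most performance.
            while (budget >= step) {
                int best = -1;
                for (int i = 0; i < NCOMP; ++i)
                    if (alloc[i] + step <= pmax[i] &&
                        (best < 0 || gain[i] > gain[best]))
                        best = i;
                if (best < 0) break;     // all components at their caps
                alloc[best] += step; budget -= step;
            }
            for (int i = 0; i < NCOMP; ++i)
                printf("%s: %.0f W\n", names[i], alloc[i]);
            return 0;
        }

    Real allocators would draw the gain values from measured, diminishing performance-power curves and re-run the loop as application behavior changes.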