
    Statistical Regression Methods for GPGPU Design Space Exploration

    General Purpose Graphics Processing Units (GPGPUs) have leveraged the performance and power efficiency of today's heterogeneous systems to usher in a new era of innovation in high-performance scientific computing. These systems can offer significantly higher performance for massively parallel applications; however, their resources may be wasted due to inefficient tuning strategies. Previous application tuning studies predominantly employ low-level, architecture-specific tuning, which can make the performance modeling task difficult and less generic. In this research, we explore the GPGPU design space featuring the memory hierarchy for application tuning using a regression-based performance prediction framework and rank the design space based on runtime performance. The regression-based framework models the GPGPU device computations using algorithm characteristics, such as the number of floating-point operations and total number of bytes, together with hardware parameters pertaining to the GPGPU memory hierarchy, as predictor variables. The computation component regression models are developed using several instrumented executions of the algorithms that span a range of FLOPs-to-byte requirements. We validate our model with a Synchronous Iterative Algorithm (SIA) set that includes Spiking Neural Networks (SNNs) and Anisotropic Diffusion Filtering (ADF) for massive images. The highly parallel nature of the above-mentioned algorithms, in addition to their wide range of communication-to-computation complexities, makes them good candidates for this study. A hierarchy of implementations for the SNNs and ADF is constructed and ranked using the regression-based framework. We further illustrate the Synchronous Iterative GPGPU Execution (SIGE) model on the GPGPU-augmented Palmetto Cluster. The performance prediction framework maps the appropriate design space implementation for 4 out of 5 case studies used in this research.
The final goal of this research is to establish the efficacy of the regression-based framework to accurately predict the application kernel runtime, allowing developers to correctly rank their design space prior to large-scale implementation.
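The core idea described above can be sketched as an ordinary least-squares fit: kernel runtime is regressed on algorithm characteristics (FLOPs, bytes moved) and a memory-hierarchy parameter, and the fitted model is then used to rank candidate implementations. The data, feature choice, and implementation names below are illustrative placeholders, not values from the paper.

```python
import numpy as np

# Hypothetical instrumented executions: each row is one kernel run.
# Columns: total floating-point ops, total bytes moved, and a
# memory-hierarchy parameter (e.g. shared-memory bytes per block).
X = np.array([
    [1e9,  4e8,   16384],
    [2e9,  6e8,   16384],
    [4e9,  1.2e9, 32768],
    [8e9,  2.0e9, 32768],
    [1e10, 3.0e9, 49152],
])
runtimes = np.array([0.8, 1.3, 2.6, 4.9, 6.5])  # measured times (s), made up

# Fit runtime ~ b0 + b1*flops + b2*bytes + b3*smem by least squares.
A = np.hstack([np.ones((X.shape[0], 1)), X])
coef, *_ = np.linalg.lstsq(A, runtimes, rcond=None)

def predict_runtime(flops, nbytes, smem):
    """Predict kernel runtime for a candidate design-space point."""
    return coef @ np.array([1.0, flops, nbytes, smem])

# Rank two hypothetical implementations by predicted runtime.
candidates = {"impl_a": (5e9, 1.5e9, 32768), "impl_b": (5e9, 2.5e9, 16384)}
ranked = sorted(candidates, key=lambda k: predict_runtime(*candidates[k]))
print(ranked)
```

The actual framework builds separate regression models per computation component; this sketch collapses that into a single model purely to show the predict-then-rank workflow.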

    Using Graph Properties to Speed-up GPU-based Graph Traversal: A Model-driven Approach

    While it is well known and acknowledged that the performance of graph algorithms is heavily dependent on the input data, there has been surprisingly little research to quantify and predict the impact the graph structure has on performance. Parallel graph algorithms, running on many-core systems such as GPUs, are no exception: most research has focused on how to efficiently implement and tune different graph operations on a specific GPU. However, the performance impact of the input graph has only been taken into account indirectly, as a result of the graphs used to benchmark the system. In this work, we present a case study investigating how to use the properties of the input graph to improve the performance of breadth-first search (BFS) graph traversal. To do so, we first study the performance variation of 15 different BFS implementations across 248 graphs. Using this performance data, we show that significant speed-up can be achieved by combining the best implementation for each level of the traversal. To make use of this data-dependent optimization, we must correctly predict the relative performance of algorithms per graph level, and enable dynamic switching to the optimal algorithm for each level at runtime. We use the collected performance data to train a binary decision tree, to enable high-accuracy predictions and fast switching. We demonstrate empirically that our decision tree is both fast enough to allow dynamic switching between implementations, without noticeable overhead, and accurate enough in its prediction to enable significant BFS speedup. We conclude that our model-driven approach (1) enables BFS to outperform state-of-the-art GPU algorithms, and (2) can be adapted for other BFS variants, other algorithms, or more specific datasets.
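The per-level switching idea can be illustrated with a toy BFS driver: before expanding each level, a small decision rule (standing in for the paper's trained binary decision tree) picks a traversal strategy from frontier features. The thresholds, feature set, and two strategies below are illustrative assumptions, not the paper's learned model.

```python
# Illustrative sketch: choose a BFS strategy per level from simple
# frontier features, mimicking a trained decision tree with hard-coded
# thresholds. "top_down" scans the frontier's outgoing edges;
# "bottom_up" scans unvisited vertices looking for a frontier parent.
def choose_strategy(frontier_size, unvisited, avg_degree):
    # A depth-1 decision rule: large, dense frontiers favour bottom-up.
    if frontier_size * avg_degree > unvisited:
        return "bottom_up"
    return "top_down"

def bfs_levels(adj, source):
    n = len(adj)
    avg_degree = sum(len(a) for a in adj) / n
    dist = {source: 0}
    frontier = [source]
    while frontier:
        strategy = choose_strategy(len(frontier), n - len(dist), avg_degree)
        nxt = []
        if strategy == "top_down":
            for u in frontier:
                for v in adj[u]:
                    if v not in dist:
                        dist[v] = dist[u] + 1
                        nxt.append(v)
        else:  # bottom_up: every frontier vertex is at the same depth
            fset = set(frontier)
            for v in range(n):
                if v not in dist and any(u in fset for u in adj[v]):
                    dist[v] = dist[frontier[0]] + 1
                    nxt.append(v)
        frontier = nxt
    return dist

# Small undirected graph: edges 0-1, 0-2, 1-3, 2-3, 3-4
adj = [[1, 2], [0, 3], [0, 3], [1, 2, 4], [3]]
print(bfs_levels(adj, 0))  # {0: 0, 1: 1, 2: 1, 3: 2, 4: 3}
```

On a GPU, each strategy would be a separate kernel and the decision rule would run on the host between level-synchronous steps; the sequential version above only demonstrates the control flow.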

    A Taxonomy of Workflow Management Systems for Grid Computing

    Full text link
    With the advent of Grid and application technologies, scientists and engineers are building more and more complex applications to manage and process large data sets, and execute scientific experiments on distributed resources. Such application scenarios require means for composing and executing complex workflows. Therefore, many efforts have been made towards the development of workflow management systems for Grid computing. In this paper, we propose a taxonomy that characterizes and classifies various approaches for building and executing workflows on Grids. We also survey several representative Grid workflow systems developed by various projects worldwide to demonstrate the comprehensiveness of the taxonomy. The taxonomy not only highlights the design and engineering similarities and differences of the state of the art in Grid workflow systems, but also identifies the areas that need further research.

    CT Automated Exposure Control Using A Generalized Detectability Index

    Purpose: Identifying an appropriate tube current setting can be challenging when using iterative reconstruction due to the varying relationship between spatial resolution, contrast, noise, and dose across different algorithms. This study developed and investigated the application of a generalized detectability index (d'gen) to determine the noise parameter to input to existing automated exposure control (AEC) systems, providing consistent image quality (IQ) across different reconstruction approaches.
Methods: This study proposes a task-based AEC method using a generalized detectability index (d'gen). The proposed method leverages existing AEC methods that are based on a prescribed noise level. The generalized d'gen metric is calculated using lookup tables of the task-based modulation transfer function (MTF) and noise power spectrum (NPS). To generate the lookup tables, the American College of Radiology (ACR) CT accreditation phantom was scanned on a multidetector CT scanner (Revolution CT, GE Healthcare) at 120 kV with tube current varied manually from 20 to 240 mAs. Images were reconstructed using a reference reconstruction algorithm and four levels of an in-house iterative reconstruction algorithm with different regularization strengths (IR1-IR4). The task-based MTF and NPS were estimated from the measured images to create lookup tables of scaling factors that convert between d'gen and noise standard deviation. The performance of the proposed d'gen-AEC method in providing a desired IQ level over a range of iterative reconstruction algorithms was evaluated using the ACR phantom with an elliptical shell and a human reader evaluation of anthropomorphic phantom images.
Results: The study of the ACR phantom with the elliptical shell demonstrated reasonable agreement between the d'gen predicted by the lookup table and the d' measured in the images, with a mean absolute error of 15% across all dose levels and a maximum error of 45% at the lowest dose level with the elliptical shell. For the anthropomorphic phantom study, the mean reader scores for images resulting from the d'gen-AEC method were 3.3 (reference image), 3.5 (IR1), 3.6 (IR2), 3.5 (IR3), and 2.2 (IR4). When using the d'gen-AEC method, the observers' IQ scores for the reference reconstruction were statistically equivalent to the scores for the IR1, IR2, and IR3 iterative reconstructions (P > 0.35). The d'gen-AEC method achieved this equivalent IQ at lower dose for the IR scans compared to the reference scans.
Conclusions: A novel AEC method, based on a generalized detectability index, was investigated. The proposed method can be used with some existing AEC systems to derive the tube current profile for iterative reconstruction algorithms. The results provide preliminary evidence that the proposed d'gen-AEC method can produce similar IQ across different iterative reconstruction approaches at different dose levels.
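The detectability-index machinery above can be illustrated with a standard non-prewhitening (NPW) observer model, which combines a task function W(f) with the measured MTF and NPS. The frequency grid and all three curves below are synthetic placeholders (chosen only to be smooth and plausible), not measurements from the study, and this sketch uses a 1-D radial integral rather than the full 2-D form.

```python
import numpy as np

# Synthetic 1-D radial-frequency curves (illustrative only).
f = np.linspace(0.01, 1.0, 200)      # spatial frequency (cycles/mm)
df = f[1] - f[0]                     # uniform grid spacing
W = np.exp(-(f / 0.3) ** 2)          # low-frequency, disk-like task function
mtf = np.exp(-2.0 * f)               # assumed system MTF
nps = 1e-3 * f * np.exp(-1.5 * f)    # assumed ramp-shaped NPS

def d_prime(W, mtf, nps, df):
    """NPW detectability: d'^2 = (integral W^2 MTF^2)^2 / integral W^2 MTF^2 NPS."""
    signal = W**2 * mtf**2
    num = (np.sum(signal) * df) ** 2          # rectangle-rule integration
    den = np.sum(signal * nps) * df
    return np.sqrt(num / den)

d_ref = d_prime(W, mtf, nps, df)
# A hypothetical stronger-regularization setting: blurrier MTF, halved noise.
d_ir = d_prime(W, np.exp(-2.5 * f), 0.5 * nps, df)
print(d_ref, d_ir)
```

In the paper's workflow, d'gen values like these (tabulated against noise standard deviation) feed a lookup table that translates a target detectability into the noise level a conventional AEC system already understands.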