Statistical Regression Methods for GPGPU Design Space Exploration
General Purpose Graphics Processing Units (GPGPUs) have leveraged the performance and power efficiency of today's heterogeneous systems to usher in a new era of innovation in high-performance scientific computing. These systems can offer significantly higher performance for massively parallel applications; however, their resources may be wasted due to inefficient tuning strategies. Previous application tuning studies predominantly employ low-level, architecture-specific tuning, which can make the performance modeling task difficult and less generic. In this research, we explore the GPGPU design space featuring the memory hierarchy for application tuning using a regression-based performance prediction framework, and rank the design space based on runtime performance. The regression-based framework models the GPGPU device computations using algorithm characteristics, such as the number of floating-point operations and the total number of bytes, together with hardware parameters pertaining to the GPGPU memory hierarchy, as predictor variables. The computation-component regression models are developed from several instrumented executions of algorithms spanning a range of FLOPS-to-byte requirements. We validate our model with a Synchronous Iterative Algorithm (SIA) set that includes Spiking Neural Networks (SNNs) and Anisotropic Diffusion Filtering (ADF) for massive images. The highly parallel nature of these algorithms, in addition to their wide range of communication-to-computation complexities, makes them good candidates for this study. A hierarchy of implementations for the SNNs and ADF is constructed and ranked using the regression-based framework. We further illustrate the Synchronous Iterative GPGPU Execution (SIGE) model on the GPGPU-augmented Palmetto Cluster. The performance prediction framework maps the appropriate design-space implementation for 4 out of 5 case studies used in this research.
The final goal of this research is to establish the efficacy of the regression-based framework in accurately predicting application kernel runtimes, allowing developers to correctly rank their design space prior to large-scale implementation.
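The abstract above describes fitting a regression model on instrumented runs, with FLOP and byte counts as predictors, and then ranking candidate implementations by predicted runtime. A minimal sketch of that workflow follows; the linear model form, the feature names (`flops`, `nbytes`), the synthetic measurements, and the candidate implementation names are all illustrative assumptions, not the paper's actual models or data.

```python
# Hedged sketch: a least-squares runtime model in the spirit of the abstract.
# Ordinary least squares with a bias term stands in for whatever richer
# regression variant the authors actually used.

def fit_linear(X, y):
    """Fit y ~ X @ beta by solving the normal equations (X^T X) beta = X^T y
    with Gaussian elimination. X rows include a leading bias column."""
    n = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(n)] for i in range(n)]
    b = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(n)]
    for col in range(n):                       # forward elimination with pivoting
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    beta = [0.0] * n                           # back substitution
    for r in range(n - 1, -1, -1):
        beta[r] = (b[r] - sum(A[r][c] * beta[c] for c in range(r + 1, n))) / A[r][r]
    return beta

# Instrumented runs: (GFLOPs, GB moved) -> measured kernel time (ms); synthetic.
runs = [((10.0, 2.0), 7.0), ((20.0, 4.0), 14.0),
        ((5.0, 8.0), 10.5), ((40.0, 1.0), 21.0)]
X = [[1.0, f, by] for (f, by), _ in runs]
y = [t for _, t in runs]
beta = fit_linear(X, y)

def predict(flops, nbytes):
    """Predicted kernel runtime (ms) for a design-space point."""
    return beta[0] + beta[1] * flops + beta[2] * nbytes

# Rank hypothetical design-space implementations by predicted runtime.
candidates = {"impl_shared_mem": (12.0, 3.0), "impl_texture": (12.0, 6.0)}
ranking = sorted(candidates, key=lambda k: predict(*candidates[k]))
print(ranking)
```

The point of the sketch is the last step: once the model is fit, ranking a design space is just sorting candidate feature vectors by predicted runtime, which is what lets developers rank implementations before committing to a large-scale one.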
Using Graph Properties to Speed-up GPU-based Graph Traversal: A Model-driven Approach
While it is well-known and acknowledged that the performance of graph
algorithms is heavily dependent on the input data, there has been surprisingly
little research to quantify and predict the impact the graph structure has on
performance. Parallel graph algorithms, running on many-core systems such as
GPUs, are no exception: most research has focused on how to efficiently
implement and tune different graph operations on a specific GPU. However, the
performance impact of the input graph has only been taken into account
indirectly as a result of the graphs used to benchmark the system.
In this work, we present a case study investigating how to use the properties
of the input graph to improve the performance of the breadth-first search (BFS)
graph traversal. To do so, we first study the performance variation of 15
different BFS implementations across 248 graphs. Using this performance data,
we show that significant speed-up can be achieved by combining the best
implementation for each level of the traversal. To make use of this
data-dependent optimization, we must correctly predict the relative performance
of algorithms per graph level, and enable dynamic switching to the optimal
algorithm for each level at runtime.
We use the collected performance data to train a binary decision tree that
enables high-accuracy predictions and fast switching. We demonstrate empirically
that our decision tree is both fast enough to allow dynamic switching between
implementations, without noticeable overhead, and accurate enough in its
predictions to enable significant BFS speedup. We conclude that our model-driven
approach (1) enables BFS to outperform state-of-the-art GPU algorithms, and (2)
can be adapted for other BFS variants, other algorithms, or more specific
datasets.
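The core mechanism in this abstract, training a tree on per-level performance data and then switching implementations at each level of the traversal, can be illustrated with a deliberately tiny version. The two strategies (top-down vs. bottom-up, as in direction-optimizing BFS), the single feature (frontier fraction), and the training samples are simplifying assumptions; the paper uses 15 implementations, 248 graphs, and a full binary decision tree.

```python
# Hedged sketch: learn from per-level features which BFS strategy to run at
# each traversal level, then switch dynamically at runtime.

# Training data: (fraction of vertices in the frontier, fastest strategy).
# These labeled samples are synthetic stand-ins for measured per-level timings.
samples = [(0.01, "top_down"), (0.03, "top_down"), (0.05, "top_down"),
           (0.20, "bottom_up"), (0.45, "bottom_up"), (0.60, "bottom_up")]

def train_stump(samples):
    """Pick the feature threshold that minimizes misclassifications,
    i.e. a depth-1 decision tree (the simplest binary decision tree)."""
    best = None
    for thresh, _ in samples:
        errs = sum((("top_down" if f < thresh else "bottom_up") != label)
                   for f, label in samples)
        if best is None or errs < best[1]:
            best = (thresh, errs)
    return best[0]

threshold = train_stump(samples)

def choose_strategy(frontier_size, num_vertices):
    """Called before expanding each BFS level: small frontiers favor
    top-down expansion, large frontiers favor bottom-up."""
    frac = frontier_size / num_vertices
    return "top_down" if frac < threshold else "bottom_up"

print(choose_strategy(50, 10_000), choose_strategy(3_000, 10_000))
```

Because the classifier reduces to one comparison per level, its runtime cost is negligible next to the level expansion itself, which is the property the abstract relies on when it claims switching adds no noticeable overhead.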
A Taxonomy of Workflow Management Systems for Grid Computing
With the advent of Grid and application technologies, scientists and
engineers are building more and more complex applications to manage and process
large data sets, and execute scientific experiments on distributed resources.
Such application scenarios require means for composing and executing complex
workflows. Therefore, many efforts have been made towards the development of
workflow management systems for Grid computing. In this paper, we propose a
taxonomy that characterizes and classifies various approaches for building and
executing workflows on Grids. We also survey several representative Grid
workflow systems developed by various projects world-wide to demonstrate the
comprehensiveness of the taxonomy. The taxonomy not only highlights the design
and engineering similarities and differences of the state of the art in Grid
workflow systems, but also identifies areas that need further research.
Comment: 29 pages, 15 figures
CT Automated Exposure Control Using A Generalized Detectability Index
Purpose
Identifying an appropriate tube current setting can be challenging when using iterative reconstruction due to the varying relationship between spatial resolution, contrast, noise, and dose across different algorithms. This study developed and investigated the application of a generalized detectability index (d'gen) to determine the noise parameter to input to existing automated exposure control (AEC) systems, in order to provide consistent image quality (IQ) across different reconstruction approaches.
Methods
This study proposes a task-based automated exposure control (AEC) method using a generalized detectability index (d'gen). The proposed method leverages existing AEC methods that are based on a prescribed noise level. The generalized d'gen metric is calculated using lookup tables of the task-based modulation transfer function (MTF) and noise power spectrum (NPS). To generate the lookup tables, the American College of Radiology (ACR) CT accreditation phantom was scanned on a multidetector CT scanner (Revolution CT, GE Healthcare) at 120 kV, with the tube current varied manually from 20 to 240 mAs. Images were reconstructed using a reference reconstruction algorithm and four levels of an in-house iterative reconstruction algorithm with different regularization strengths (IR1-IR4). The task-based MTF and NPS were estimated from the measured images to create lookup tables of scaling factors that convert between d'gen and noise standard deviation. The performance of the proposed d'gen-AEC method in providing a desired IQ level over a range of iterative reconstruction algorithms was evaluated using the ACR phantom with an elliptical shell and a human reader evaluation of anthropomorphic phantom images.
Results
The study of the ACR phantom with the elliptical shell demonstrated reasonable agreement between the d'gen predicted by the lookup table and the d' measured in the images, with a mean absolute error of 15% across all dose levels and a maximum error of 45% at the lowest dose level with the elliptical shell. For the anthropomorphic phantom study, the mean reader scores for images resulting from the d'gen-AEC method were 3.3 (reference image), 3.5 (IR1), 3.6 (IR2), 3.5 (IR3), and 2.2 (IR4). When using the d'gen-AEC method, the observers' IQ scores for the reference reconstruction were statistically equivalent to the scores for the IR1, IR2, and IR3 iterative reconstructions (P > 0.35). The d'gen-AEC method achieved this equivalent IQ at lower dose for the IR scans compared to the reference scans.
Conclusions
A novel AEC method, based on a generalized detectability index, was investigated. The proposed method can be used with some existing AEC systems to derive the tube current profile for iterative reconstruction algorithms. The results provide preliminary evidence that the proposed d'gen-AEC method can produce similar IQ across different iterative reconstruction approaches at different dose levels.
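For background on how a detectability index is built from the MTF and NPS measurements the abstract describes: one standard task-based formulation, the non-prewhitening (NPW) model-observer detectability index, is shown below. This is the conventional definition from the model-observer literature, not necessarily the paper's exact generalized d'gen, and W_task denotes the frequency-domain task function (the spectrum of the signal to be detected).

```latex
d'^{\,2}_{\mathrm{NPW}} =
\frac{\left[\displaystyle\iint \left|W_{\mathrm{task}}(u,v)\right|^{2}\,
      \mathrm{MTF}^{2}(u,v)\, du\, dv\right]^{2}}
     {\displaystyle\iint \left|W_{\mathrm{task}}(u,v)\right|^{2}\,
      \mathrm{MTF}^{2}(u,v)\,\mathrm{NPS}(u,v)\, du\, dv}
```

Since the MTF and NPS of an iterative algorithm vary with dose and regularization strength, tabulating this quantity against noise standard deviation is what allows a noise-prescribing AEC system to be driven by a detectability target instead of a raw noise target.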