Exploring Multiple Levels of Performance Modeling for Heterogeneous Systems
The current trend in High-Performance Computing (HPC) is to extract concurrency from clusters that include heterogeneous resources such as General Purpose Graphical Processing Units (GPGPUs) and Field Programmable Gate Arrays (FPGAs). Although these heterogeneous systems can provide substantial performance for massively parallel applications, many of the available computing resources are often under-utilized due to inefficient application mapping, load balancing, and tuning. While several performance prediction models exist to efficiently tune applications, they often require significant computing architecture knowledge for reliable prediction. In addition, they do not address multiple levels of design space abstraction, and it is often difficult to choose a reliable prediction model for a given design. In this research, we develop a multi-level suite of performance prediction models for heterogeneous systems that primarily targets Synchronous Iterative Algorithms (SIAs). The modeling suite aims to produce accurate and straightforward application runtime predictions prior to the actual large-scale implementation. The suite addresses two levels of system abstraction: 1) a low level, where partial knowledge of the application implementation is available along with the system specifications, and 2) a high level, where implementation details are minimal and only high-level computing system specifications are given. The performance prediction modeling suite is developed using our proposed Synchronous Iterative GPGPU Execution (SIGE) model for GPGPU clusters, motivated by the RC Amenability Test for Scalable Systems (RATSS) model for FPGA clusters. The low-level abstraction for GPGPU clusters consists of a regression-based performance prediction framework that statistically abstracts system architecture characteristics, enabling performance prediction without detailed architecture knowledge. In this framework, the overall execution time of an application is predicted using regression models developed for the host-device computations and network-level communications performed in the algorithm. We have used a family of Spiking Neural Network (SNN) models and an Anisotropic Diffusion Filter (ADF) algorithm as SIA case studies for verification of the regression-based framework, achieving over 90% prediction accuracy compared to the actual implementations for the several GPGPU cluster configurations tested. The results establish the adequacy of the low-level abstraction model for advanced, fine-grained performance prediction and design space exploration (DSE). The high-level abstraction consists of two primary modeling approaches: qualitative modeling, which uses existing subjective-analytical models for computation and communication, and quantitative modeling, which predicts computation and communication performance by measuring hardware events associated with objective-analytical models using micro-benchmarks. The performance prediction provided by the high-level abstraction approaches, albeit coarse-grained, delivers useful insight into application performance on the chosen heterogeneous system. A blend of the two high-level modeling approaches, labeled hybrid modeling, is explored for insightful preliminary performance prediction. The performance prediction models in the multi-level suite are verified and compared for their accuracy and ease of use, allowing developers to choose a model that best suits their design space abstraction.
We also construct a roadmap that guides users from optimal Application-to-Accelerator (A2A) mapping to fine-grained performance prediction, thereby providing a hierarchical approach to optimal application porting on the target heterogeneous system. The end goal of this dissertation research is to offer the HPC community a thorough, non-architecture-specific performance prediction framework in the form of a hierarchical modeling suite that enables them to optimally utilize the heterogeneous resources.
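To make the low-level regression idea concrete, here is a minimal sketch of how per-component regression models could be combined into a runtime prediction for an SIA. The feature sets, training values, and function names are illustrative assumptions, not the dissertation's actual predictor variables.

```python
# A minimal sketch, assuming hypothetical training data from instrumented runs.
# Features (flops, bytes_moved, msg_size, n_nodes) are illustrative only.
import numpy as np
from sklearn.linear_model import LinearRegression

# Instrumented small-scale runs: per-iteration computation features -> time (s)
X_comp = np.array([[1e9, 4e8], [2e9, 8e8], [4e9, 1.6e9]])  # [flops, bytes_moved]
t_comp = np.array([0.011, 0.021, 0.043])

# Per-iteration communication features -> time (s)
X_comm = np.array([[1e6, 4], [2e6, 8], [4e6, 16]])         # [msg_size, n_nodes]
t_comm = np.array([0.002, 0.005, 0.012])

comp_model = LinearRegression().fit(X_comp, t_comp)
comm_model = LinearRegression().fit(X_comm, t_comm)

def predict_runtime(comp_features, comm_features, n_iterations):
    """Total SIA runtime = iterations * (predicted compute + communication)."""
    t = comp_model.predict([comp_features])[0] + comm_model.predict([comm_features])[0]
    return n_iterations * t

print(predict_runtime([8e9, 3.2e9], [8e6, 32], n_iterations=1000))
```

The key design choice mirrored here is that computation and communication are modeled separately, so each regression stays simple and the total is reassembled per iteration of the synchronous algorithm.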
Verifying a Systematic Application to Accelerator Roadmap using Shallow Water Wave Equations
With the advent of parallel computing, a number of hardware architectures have become available for data parallel applications. Every architecture is unique with respect to characteristics such as floating point operations per second, memory bandwidth, and synchronization costs. Data parallel applications possess inherent parallelism that needs to be studied so that the hardware best able to exploit it can be identified and selected for large-scale implementation. The application considered for this thesis is the numerical solution of shallow water wave equations using the finite difference method. These equations are a set of partial differential equations that model the propagation of disturbances in water and other incompressible liquids. This application falls in the category of a Synchronous Iterative Algorithm (SIA), and hence the Synchronous Iterative GPGPU Execution (SIGE) model can be directly applied for performance modeling. In the high performance computing community, Graphical Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs) have become highly popular architectures. Homogeneous clusters comprising multiple processors and heterogeneous clusters whose nodes consist of both a CPU and a GPU are the architectures of interest for this thesis. An initial, high-level comparison between the two architectures is performed for the chosen application using a technique known as Initial Application-to-Accelerator (A2A) mapping, which ranks the architectures by the execution-time performance expected for a large-scale implementation. The subsequent part of the thesis focuses on a low-level abstraction of the application of interest to accurately predict the runtime using the multi-level SIGE performance-modeling suite. Through this abstraction, performance modeling of the computation and communication portions of the application is undertaken. The behavior of the computation and communication portions is captured through several instrumented iterations of the application, and regression analysis is performed on the execution times. The predicted runtime is the sum of the computation and communication runtime predictions and is validated by executing the application at larger data sizes. The thesis concludes with the pros and cons of applying the A2A fitness model and the low-level abstraction for runtime prediction to the chosen application. A critique of the SIGE model is given, along with a Strengths, Weaknesses, Opportunities (SWO) analysis.
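For readers unfamiliar with the application, the following is a minimal, illustrative finite-difference step for the 1D shallow water equations using a Lax-Friedrichs scheme. The grid, scheme, and parameters are assumptions; the thesis itself targets the full problem on homogeneous and heterogeneous clusters.

```python
# A sketch of one synchronous iteration of a shallow-water solver.
# Periodic boundaries and the Lax-Friedrichs scheme are chosen for brevity.
import numpy as np

g = 9.81                          # gravitational acceleration (m/s^2)
nx, dx, dt = 200, 1.0, 0.01       # illustrative grid spacing and time step
h = np.ones(nx); h[90:110] = 1.5  # water height with a central disturbance
hu = np.zeros(nx)                 # momentum (height * velocity)

def step(h, hu):
    """One Lax-Friedrichs update of the conserved variables (h, hu)."""
    f_h, f_hu = hu, hu**2 / h + 0.5 * g * h**2   # flux terms
    h_new = 0.5 * (np.roll(h, 1) + np.roll(h, -1)) \
        - 0.5 * dt / dx * (np.roll(f_h, -1) - np.roll(f_h, 1))
    hu_new = 0.5 * (np.roll(hu, 1) + np.roll(hu, -1)) \
        - 0.5 * dt / dx * (np.roll(f_hu, -1) - np.roll(f_hu, 1))
    return h_new, hu_new

for _ in range(100):              # each loop pass is one synchronous SIA step
    h, hu = step(h, hu)
```

Each time step depends only on the previous one and requires a global exchange of boundary data between nodes, which is exactly the synchronous-iterative structure the SIGE model assumes.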
Statistical Regression Methods for GPGPU Design Space Exploration
General Purpose Graphics Processing Units (GPGPUs) have leveraged the performance and power efficiency of today's heterogeneous systems to usher in a new era of innovation in high-performance scientific computing. These systems can offer significantly high performance for massively parallel applications; however, their resources may be wasted due to inefficient tuning strategies. Previous application tuning studies predominantly employ low-level, architecture-specific tuning, which can make the performance modeling task difficult and less generic. In this research, we explore the GPGPU design space featuring the memory hierarchy for application tuning using a regression-based performance prediction framework and rank the design space based on runtime performance. The regression-based framework models the GPGPU device computations using algorithm characteristics such as the number of floating-point operations, the total number of bytes, and hardware parameters pertaining to the GPGPU memory hierarchy as predictor variables. The computation component regression models are developed using several instrumented executions of algorithms that span a range of FLOPS-to-byte requirements. We validate our model with a Synchronous Iterative Algorithm (SIA) set that includes Spiking Neural Networks (SNNs) and Anisotropic Diffusion Filtering (ADF) for massive images. The highly parallel nature of the above-mentioned algorithms, in addition to their wide range of communication-to-computation complexities, makes them good candidates for this study. A hierarchy of implementations for the SNNs and ADF is constructed and ranked using the regression-based framework. We further illustrate the Synchronous Iterative GPGPU Execution (SIGE) model on the GPGPU-augmented Palmetto Cluster. The performance prediction framework maps the appropriate design space implementation for 4 of the 5 case studies used in this research. The final goal of this research is to establish the efficacy of the regression-based framework to accurately predict the application kernel runtime, allowing developers to correctly rank their design space prior to the large-scale implementation.
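A hedged sketch of the ranking idea: fit a regression model on instrumented kernel runs, then order candidate implementations by predicted runtime. The predictor set and candidate design points below are invented for illustration, not taken from the study.

```python
# Rank a toy GPGPU design space by predicted kernel runtime.
# Features [flops, bytes, shared_mem_bytes, occupancy] are assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression

# Instrumented runs: features -> measured kernel runtime (ms)
X = np.array([
    [1e9, 4e8, 0,     0.50],
    [1e9, 2e8, 16384, 0.75],
    [2e9, 8e8, 0,     0.50],
    [2e9, 4e8, 32768, 0.75],
])
y = np.array([12.0, 7.5, 24.5, 14.0])
model = LinearRegression().fit(X, y)

# Candidate implementations of one kernel, differing in memory-hierarchy use
candidates = {
    "global-memory": [4e9, 1.6e9, 0,     0.50],
    "shared-memory": [4e9, 8e8,   49152, 0.75],
}
ranked = sorted(candidates, key=lambda k: model.predict([candidates[k]])[0])
print("predicted fastest first:", ranked)
```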
Doctor of Philosophy dissertation
Stochastic methods, dense free-form mapping, atlas construction, and total variation are examples of advanced image processing techniques which are robust but computationally demanding. These algorithms often require a large amount of computational power as well as massive memory bandwidth. These requirements used to be fulfilled only by supercomputers. The development of heterogeneous parallel subsystems and computation-specialized devices such as Graphic Processing Units (GPUs) has brought the requisite power to commodity hardware, opening up opportunities for scientists to experiment and evaluate the influence of these techniques on their research and practical applications. However, harnessing the processing power of modern hardware is challenging. The differences between multicore parallel processing systems and conventional models are significant, often requiring algorithms and data structures to be redesigned significantly for efficiency. It also demands in-depth knowledge about modern hardware architectures to optimize these implementations, sometimes on a per-architecture basis. The goal of this dissertation is to introduce a solution to this problem based on a 3D image processing framework, using high performance APIs at the core level to utilize the parallel processing power of the GPUs. The design of the framework facilitates an efficient application development process, which does not require scientists to have extensive knowledge about GPU systems, and encourages them to harness this power to solve their computationally challenging problems. To present the development of this framework, four main problems are described, and the solutions are discussed and evaluated: (1) essential components of a general 3D image processing library: data structures and algorithms, as well as how to implement these building blocks on the GPU architecture for optimal performance; (2) an implementation of unbiased atlas construction algorithms, an illustration of how to solve a highly complex and computationally expensive algorithm using this framework; (3) an extension of the framework to account for geometry descriptors to solve registration challenges with large-scale shape changes and high intensity-contrast differences; and (4) an out-of-core streaming model, which enables developers to implement multi-image processing techniques on commodity hardware.
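As a rough illustration of the kind of building block such a framework exposes, here is a CPU-only NumPy sketch of a separable 3D box filter; the real framework implements primitives like this on the GPU, and all names and parameters here are illustrative assumptions.

```python
# A minimal sketch of a 3D image processing primitive: separable smoothing.
# Periodic boundaries (via np.roll) are used for brevity; a production
# library would clamp or mirror at the edges and run this on the GPU.
import numpy as np

def box_filter_1d(vol, axis, radius=1):
    """Average each voxel with its neighbors along one axis."""
    acc = vol.copy()
    for shift in range(1, radius + 1):
        acc += np.roll(vol, shift, axis=axis) + np.roll(vol, -shift, axis=axis)
    return acc / (2 * radius + 1)

def box_filter_3d(vol, radius=1):
    """Separable 3D smoothing: three cheap 1D passes instead of one cubic stencil."""
    for axis in range(3):
        vol = box_filter_1d(vol, axis, radius)
    return vol

volume = np.random.rand(64, 64, 64).astype(np.float32)  # a toy 3D image
smoothed = box_filter_3d(volume)
```

The separable formulation matters on GPUs because each 1D pass reads memory in a regular pattern, which is the kind of architecture-aware detail the framework aims to hide from its users.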
A Survey on Vertical and Horizontal Scaling Platforms for Big Data Analytics
There is no doubt that we are entering the era of big data. The challenge is how to store, search, and analyze the huge amount of data being generated every second. One of the main obstacles for big data researchers is finding the appropriate big data analysis platform. The basic aim of this work is to present a complete investigation of the available platforms for big data analysis in terms of vertical and horizontal scaling, together with their compatible frameworks and applications in detail. Finally, this article outlines some research trends and other open issues in big data analytics.
Non-parametric Bayesian models for structured output prediction
Structured output prediction is a machine learning task in which an input object is assigned not just a single class, as in classification, but multiple, interdependent labels. This means that the presence or value of a given label affects the other labels, for instance in text labelling problems, where output labels are applied to each word and their interdependencies must be modelled.
Non-parametric Bayesian (NPB) methods are probabilistic modelling techniques with the interesting property of allowing model capacity to grow, in a controllable way, with data complexity, while maintaining the advantages of Bayesian modelling. In this thesis, we develop NPB algorithms to solve structured output problems.
We first study a map-reduce implementation of a stochastic inference method designed for the infinite hidden Markov model, applied to a computational linguistics task, part-of-speech tagging. We show that mainstream map-reduce frameworks do not easily support highly iterative algorithms.
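A toy sketch of the mismatch just described: each sweep of an iterative inference algorithm becomes a complete map + reduce pass, with the model state re-broadcast every time. The corpus and update rule below are placeholders, not the infinite hidden Markov model inference itself.

```python
# Illustrative only: iterative algorithms on map-reduce pay one full job
# (map, shuffle, reduce, state re-broadcast) per sweep of the sampler.
from functools import reduce

corpus = [["the", "dog", "runs"], ["a", "cat", "sleeps"], ["the", "cat", "runs"]]
state = {}  # global model state, shipped to every mapper each sweep

def map_phase(sentence, state):
    """Emit local sufficient statistics given the broadcast state."""
    return {w: 1 for w in sentence}

def reduce_phase(a, b):
    """Merge partial statistics from two mappers."""
    out = dict(a)
    for k, v in b.items():
        out[k] = out.get(k, 0) + v
    return out

for sweep in range(10):  # one complete map-reduce job per iteration
    partials = [map_phase(s, state) for s in corpus]
    state = reduce(reduce_phase, partials, {})
```

In a real sampler the state changes on every sweep, so none of this per-job overhead can be amortized, which is the difficulty the thesis observes in mainstream map-reduce frameworks.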
The main contribution of this thesis consists in a conceptually novel discriminative model, GPstruct. It is motivated by labelling tasks, and combines attractive properties of conditional random fields (CRF), structured support vector machines, and Gaussian process (GP) classifiers. In probabilistic terms, GPstruct combines a CRF likelihood with a GP prior on factors; it can also be described as a Bayesian kernelized CRF.
To train this model, we develop a Markov chain Monte Carlo algorithm based on elliptical slice sampling and investigate its properties. We then validate it on real data experiments, and explore two topologies: sequence output with text labelling tasks, and grid output with semantic segmentation of images. The latter case poses scalability issues, which are addressed using likelihood approximations and an ensemble method which allows distributed inference and prediction.
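For reference, here is a minimal sketch of the elliptical slice sampling transition (Murray et al., 2010) that this sampler builds on. The toy Gaussian log-likelihood stands in for the CRF-factor likelihood used in GPstruct.

```python
# Elliptical slice sampling: one transition for f with a N(0, Sigma) prior.
# The prior draw and the log-likelihood below are placeholder assumptions.
import numpy as np

def elliptical_slice(f, prior_sample, log_lik):
    """Return a new sample on the ellipse through f and a fresh prior draw."""
    nu = prior_sample
    log_y = log_lik(f) + np.log(np.random.rand())   # slice height under L(f)
    theta = np.random.uniform(0.0, 2.0 * np.pi)
    theta_min, theta_max = theta - 2.0 * np.pi, theta
    while True:
        f_new = f * np.cos(theta) + nu * np.sin(theta)  # stay on the ellipse
        if log_lik(f_new) > log_y:
            return f_new
        # shrink the angle bracket toward the current state and retry
        if theta < 0.0:
            theta_min = theta
        else:
            theta_max = theta
        theta = np.random.uniform(theta_min, theta_max)

# Toy usage: standard normal prior, Gaussian "likelihood" pulling f toward 1
dim = 5
f = np.zeros(dim)
log_lik = lambda f: -0.5 * np.sum((f - 1.0) ** 2)
for _ in range(1000):
    f = elliptical_slice(f, np.random.randn(dim), log_lik)
```

The appeal for models like GPstruct is that the transition has no step-size parameter to tune and always accepts, which keeps the Gaussian prior and the likelihood cleanly separated.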
The experimental validation demonstrates that: (a) the model is flexible and its constituent parts are modular and easy to engineer; (b) predictive performance and, most crucially, the probabilistic calibration of predictions are better than or equal to those of competitor models; and (c) model hyperparameters can be learnt from data.
Power System Stability Analysis using Neural Network
This work focuses on the design of modern power system controllers for automatic voltage regulators (AVR) and the application of machine learning (ML) algorithms to correctly classify the stability of the IEEE 14-bus system. The LQG controller exhibits the best time-domain characteristics compared to the PID controller while the sensor and amplifier gains are varied dynamically. The IEEE 14-bus system is then modeled, and contingency scenarios are simulated in the Modelica/Dymola environment. Application of the Monte Carlo principle with a modified Poisson probability distribution, reviewed from the literature, reduces the total number of contingencies from 1000k to 20k. The damping ratio of each contingency is then extracted, pre-processed, and fed to ML algorithms such as logistic regression, support vector machines, decision trees, random forests, Naive Bayes, and k-nearest neighbors. Neural networks (NNs) with one, two, three, five, seven, and ten hidden layers, trained on 25%, 50%, 75%, and 100% of the data, are considered to observe and compare prediction time, accuracy, precision, and recall. At the lowest data size, 25%, the accuracy of the two-hidden-layer and single-hidden-layer networks reaches 95.70% and 97.38%, respectively. Increasing the depth of the NN beyond two hidden layers does not improve the overall score and incurs much longer prediction times; deeper networks can thus be discarded for similar analyses. Moreover, when five, seven, and ten hidden layers are used, the F1 score decreases. In practical scenarios, however, where the data set contains more features and a wider variety of classes, a larger data size is required to train the NN properly. This research provides insight into damping-ratio-based system stability prediction with traditional ML algorithms and neural networks.
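A minimal sketch of the depth-versus-data-size comparison described above, using scikit-learn's MLPClassifier on placeholder data; the actual study trains on damping ratios extracted from the Dymola contingency simulations, so the features and labels here are assumptions.

```python
# Sweep hidden-layer depth and training-data fraction, reporting F1 scores.
# The random features stand in for pre-processed damping-ratio data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import f1_score

X = np.random.rand(2000, 8)              # stand-in for damping-ratio features
y = (X.mean(axis=1) > 0.5).astype(int)   # stand-in stability labels

for hidden in [(25,), (25, 25), (25, 25, 25)]:   # 1, 2, 3 hidden layers
    for frac in [0.25, 0.5, 1.0]:                # fraction of data used
        n = int(frac * len(X))
        X_tr, X_te, y_tr, y_te = train_test_split(
            X[:n], y[:n], test_size=0.2, random_state=0)
        clf = MLPClassifier(hidden_layer_sizes=hidden, max_iter=500,
                            random_state=0).fit(X_tr, y_tr)
        print(hidden, frac, round(f1_score(y_te, clf.predict(X_te)), 3))
```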
Fundamentals
Volume 1 establishes the foundations of this new field. It goes through all the steps from data collection, through summarization and clustering, to different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to their resource requirements and how to enhance scalability on diverse computing architectures, ranging from embedded systems to large computing clusters.