21,256 research outputs found

    A MapReduce-based rotation forest classifier for epileptic seizure prediction

    Big data applications, including biomedical ones, have become increasingly attractive as data generation and storage have grown in recent years. Extracting knowledge from such data is challenging because conventional data mining techniques are not adapted to the new requirements. In this study, we analyse EEG signals for epileptic seizure detection in a big data setting using the Rotation Forest classifier. Specifically, MSPCA is used for denoising, WPD for feature extraction, and Rotation Forest for classification, all within a MapReduce framework, to predict epileptic seizures. The paper presents a MapReduce-based distributed ensemble algorithm for epileptic seizure prediction that trains a Rotation Forest on each data partition in parallel using a cluster of computers. The results show that the proposed MapReduce-based Rotation Forest reduces training time significantly while achieving a high level of classification performance.
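
    As a rough illustration of the distributed training scheme sketched in this abstract, the following Python fragment trains one PCA-rotated decision tree per data partition in parallel and combines them by majority vote. It is a minimal sketch only: the data, features, and labels are synthetic placeholders, a full Rotation Forest (which rotates random feature subsets per tree) is replaced by a single PCA rotation per partition, and a local process pool stands in for the MapReduce cluster.

        # Sketch of MapReduce-style ensemble training (synthetic data, simplified
        # stand-in for Rotation Forest: one PCA-rotated decision tree per partition).
        import numpy as np
        from multiprocessing import Pool
        from sklearn.decomposition import PCA
        from sklearn.tree import DecisionTreeClassifier

        def train_partition(partition):
            """Map step: fit one PCA-rotated decision tree on a data partition."""
            X, y = partition
            rotation = PCA().fit(X)                      # feature-space rotation
            tree = DecisionTreeClassifier(random_state=0).fit(rotation.transform(X), y)
            return rotation, tree

        def predict_ensemble(models, X):
            """Reduce step: majority vote over the per-partition models."""
            votes = np.stack([t.predict(r.transform(X)) for r, t in models]).astype(int)
            return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)

        if __name__ == "__main__":
            rng = np.random.default_rng(0)
            X = rng.normal(size=(4000, 16))              # stand-in for WPD features
            y = (X[:, 0] + X[:, 1] > 0).astype(int)      # stand-in for seizure labels
            parts = list(zip(np.array_split(X, 4), np.array_split(y, 4)))
            with Pool(4) as pool:                        # the "map" phase runs in parallel
                models = pool.map(train_partition, parts)
            print("training accuracy:", (predict_ensemble(models, X) == y).mean())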

    Lightweight Task Analysis for Cache-Aware Scheduling on Heterogeneous Clusters

    We present a novel characterization of how a program stresses the cache. This characterization permits fast performance prediction in order to simulate and assist task scheduling on heterogeneous clusters. It is based on the estimation of stack distance probability distributions. The analysis requires observing only a very small subset of memory accesses, and yields a reasonable to very accurate prediction in constant time.
    Comment: Originally published in the Proceedings of the 2008 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA'08), ISBN 1-60132-084-1 (a two-volume set), edited by Hamid R. Arabnia and Youngsong Mu.
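
    For readers unfamiliar with the underlying metric: the stack (reuse) distance of a memory access is the number of distinct addresses touched since the previous access to the same address. The sketch below estimates a stack distance probability distribution from a small synthetic address trace; it is not the paper's sampling scheme, only an illustration of the quantity being estimated.

        # Sketch: estimate a stack (reuse) distance probability distribution from a
        # small address trace, as used for the cache-stress characterization.
        from collections import Counter, OrderedDict
        import random

        def stack_distances(trace):
            """Yield, per access, the number of distinct addresses seen since the
            previous access to the same address (None for first-time accesses)."""
            stack = OrderedDict()                  # LRU order: most recent access last
            for addr in trace:
                if addr in stack:
                    # Number of distinct addresses referenced after addr's last access.
                    yield list(stack.keys())[::-1].index(addr)
                    del stack[addr]
                else:
                    yield None
                stack[addr] = True                 # move/insert addr to most-recent slot

        def distance_histogram(trace):
            dists = [d for d in stack_distances(trace) if d is not None]
            total = len(dists)
            return {d: c / total for d, c in sorted(Counter(dists).items())}

        if __name__ == "__main__":
            random.seed(0)
            trace = [random.randrange(64) for _ in range(10_000)]  # synthetic addresses
            hist = distance_histogram(trace)
            print({d: round(p, 3) for d, p in list(hist.items())[:8]})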

    OpenCL Performance Prediction using Architecture-Independent Features

    OpenCL is an attractive programming model for heterogeneous high-performance computing systems, with wide support from hardware vendors and significant performance portability. To support efficient scheduling on HPC systems, it is necessary to perform accurate performance predictions for OpenCL workloads on varied compute devices, which is challenging because diverse computation, communication, and memory access characteristics result in varying performance between devices. The Architecture Independent Workload Characterization (AIWC) tool can be used to characterize OpenCL kernels according to a set of architecture-independent features. This work presents a methodology in which AIWC features are used to form a model capable of predicting accelerator execution times. We used this methodology to predict execution times for a set of 37 computational kernels running on 15 different devices representing a broad range of CPU, GPU and MIC architectures. The predictions are highly accurate, differing from the measured experimental run-times by an average of only 1.2%, which corresponds to absolute execution-time mispredictions of 9 µs to 1 s, depending on problem size. A previously unencountered code can be instrumented once and the AIWC metrics embedded in the kernel, allowing performance prediction across the full range of modelled devices. The results suggest that this methodology supports correct selection of the most appropriate device for a previously unencountered code, which is highly relevant to the HPC scheduling setting.
    Comment: 9 pages, 6 figures. International Workshop on High Performance and Dynamic Reconfigurable Systems and Networks (DRSN-2018), published in conjunction with the 2018 International Conference on High Performance Computing & Simulation (HPCS 2018).
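
    A minimal sketch of the modelling step under stated assumptions: a table of per-kernel, architecture-independent features (random placeholders here, standing in for AIWC metrics) plus a device identifier is regressed against measured run-times, using a random-forest model on the log run-time. The model choice and features are illustrative, not necessarily those used in the paper.

        # Sketch: regress kernel run-time on architecture-independent features plus a
        # device identifier, mimicking an AIWC-style prediction workflow (synthetic data).
        import numpy as np
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(1)
        n_kernels, n_devices, n_features = 37, 15, 10
        feats = rng.uniform(size=(n_kernels, n_features))     # stand-in AIWC metrics
        device_speed = rng.uniform(0.5, 4.0, size=n_devices)  # hidden per-device factor

        rows, runtimes = [], []
        for k in range(n_kernels):
            for d in range(n_devices):
                rows.append(np.concatenate([feats[k], [d]]))  # features + device id
                runtimes.append(feats[k].sum() / device_speed[d] * rng.lognormal(0, 0.05))

        X, y = np.array(rows), np.log(runtimes)               # predict log run-time
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
        model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
        rel_err = np.abs(np.exp(model.predict(X_te)) - np.exp(y_te)) / np.exp(y_te)
        print(f"mean relative error: {rel_err.mean():.1%}")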

    Predictive Performance Modeling for Distributed Computing using Black-Box Monitoring and Machine Learning

    In many domains, the previous decade was characterized by increasing data volumes and growing complexity of computational workloads, creating new demands for highly data-parallel computing in distributed systems. Effective operation of these systems is challenging when facing uncertainties about the performance of jobs and tasks under varying resource configurations, e.g., for scheduling and resource allocation. We survey predictive performance modeling (PPM) approaches that estimate performance metrics such as execution duration, required memory, or wait times of future jobs and tasks based on past performance observations. We focus on non-intrusive methods, i.e., methods that can be applied to any workload without modification, since the workload is usually a black box from the perspective of the systems managing the computational infrastructure. We classify and compare sources of performance variation, predicted performance metrics, required training data, use cases, and the underlying prediction techniques. We conclude by identifying several open problems and pressing research needs in the field.
    Comment: 19 pages, 3 figures, 5 tables.
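
    As a toy example of the non-intrusive setting the survey targets, the sketch below predicts a new job's run-time from the k most similar past jobs using only externally observable metadata; the metadata fields and history values are hypothetical.

        # Toy non-intrusive predictor: estimate a job's run-time from the k most
        # similar past jobs, using only black-box metadata (hypothetical values).
        import numpy as np

        history = np.array([          # columns: input size (GB), cores, runtime (s)
            [ 10,  4,  120], [ 20,  4,  250], [ 40,  8,  260],
            [ 80,  8,  540], [160, 16,  580], [320, 16, 1150],
        ], dtype=float)

        def predict_runtime(size_gb, cores, k=3):
            X, t = history[:, :2], history[:, 2]
            scaled = (X - X.mean(0)) / X.std(0)               # normalise metadata
            q = (np.array([size_gb, cores]) - X.mean(0)) / X.std(0)
            nearest = np.argsort(np.linalg.norm(scaled - q, axis=1))[:k]
            return t[nearest].mean()                          # mean of k nearest jobs

        print(f"predicted runtime: {predict_runtime(100, 8):.0f} s")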

    A Loop-Based Methodology for Reducing Computational Redundancy in Workload Sets

    The design of general-purpose processors relies heavily on a workload gathering step in which representative programs are collected from various application domains. Processor performance, when running the workload set, is profiled using simulators that model the targeted processor architecture. However, simulating the entire workload set is prohibitively time-consuming, which precludes considering a large number of programs. To reduce simulation time, several techniques in the literature have exploited internal program repetitiveness to extract and execute only representative code segments. Existing solutions are based on reducing cross-program computational redundancy or on eliminating internal-program redundancy to decrease execution time. In this work, we propose an orthogonal and complementary loop-centric methodology that targets loop-dominant programs by exploiting internal-program characteristics to reduce cross-program computational redundancy. The approach employs a newly developed framework that extracts and analyzes core loops within workloads. The collected characteristics model the memory behavior, computational complexity, and data structures of a program, and are used to construct a signature vector for each program. From these vectors, cross-workload similarity metrics are extracted and processed by a novel heuristic that excludes similar programs to reduce redundancy within the set. Finally, a reverse-engineering approach is introduced that synthesizes executable micro-benchmarks with the same instruction mix as the loops in the original workload. A tool that automates the steps of the proposed methodology has been developed. Simulation results demonstrate that applying the proposed methodology to a set of workloads reduces the set size by half while preserving the main characteristics of the initial workloads.
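
    A sketch of the redundancy-reduction idea under simple assumptions: each program is summarised by a signature vector (random vectors stand in for the loop characteristics), pairwise cosine similarity is computed, and a greedy pass drops any program whose signature is too similar to one already kept. The paper's actual heuristic may differ.

        # Sketch: greedy removal of near-duplicate workloads based on cosine similarity
        # of their loop signature vectors (random vectors stand in for real signatures).
        import numpy as np

        def cosine(a, b):
            return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

        def reduce_workload_set(signatures, threshold=0.95):
            """Keep a program only if it is not too similar to any already-kept one."""
            kept = []
            for name, vec in signatures.items():
                if all(cosine(vec, signatures[k]) < threshold for k in kept):
                    kept.append(name)
            return kept

        rng = np.random.default_rng(2)
        base = rng.uniform(size=8)
        # Every third program is near-identical to the base signature.
        signatures = {f"prog{i}": base + rng.normal(0, 0.02 + 0.3 * (i % 3), size=8)
                      for i in range(12)}
        print("kept:", reduce_workload_set(signatures))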

    A general guide to applying machine learning to computer architecture

    The resurgence of machine learning since the late 1990s has been enabled by significant advances in computing performance and the growth of big data. The ability of these algorithms to detect complex patterns in data, which is extremely difficult to achieve manually, helps to produce effective predictive models. Whilst computer architects have been accelerating the performance of machine learning algorithms with GPUs and custom hardware, there have been few implementations that leverage these algorithms to improve computer system performance. The work that has been conducted, however, has produced considerably promising results. The purpose of this paper is to serve as a foundational base and guide for future computer architecture research seeking to make use of machine learning models to improve system efficiency. We describe a method that highlights when, why, and how to utilize machine learning models to improve system performance, and provide a relevant example showcasing the effectiveness of applying machine learning in computer architecture. We describe a process of generating data at every execution quantum and of parameter engineering. This is followed by a survey of a set of popular machine learning models. We discuss their strengths and weaknesses and evaluate implementations for the purpose of creating a workload performance predictor for different core types in an x86 processor. The predictions can then be exploited by a scheduler for heterogeneous processors to improve system throughput. The algorithms of focus are stochastic gradient descent based linear regression, decision trees, random forests, artificial neural networks, and k-nearest neighbors.
    This work has been supported by the European Research Council (ERC) Advanced Grant RoMoL (Grant Agreement 321253) and by the Spanish Ministry of Science and Innovation (contract TIN 2015-65316P).
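
    A minimal sketch of the end use described here: one regression model per core type predicts a workload's performance from quantum-level counters (synthetic stand-ins below), and a scheduler places the workload on whichever core type is predicted to be faster. The linear models and counter set are illustrative only.

        # Sketch: per-core-type performance predictors driving a heterogeneous-scheduler
        # decision (synthetic performance-counter features, linear models for brevity).
        import numpy as np
        from sklearn.linear_model import LinearRegression

        rng = np.random.default_rng(3)
        counters = rng.uniform(size=(500, 4))   # e.g. IPC, cache miss rate, branch MPKI, ...
        perf_big    = 2.0 * counters[:, 0] - 1.5 * counters[:, 1] + rng.normal(0, 0.05, 500)
        perf_little = 1.0 * counters[:, 0] - 0.3 * counters[:, 1] + rng.normal(0, 0.05, 500)

        model_big    = LinearRegression().fit(counters, perf_big)
        model_little = LinearRegression().fit(counters, perf_little)

        def schedule(workload_counters):
            """Pick the core type with the higher predicted performance."""
            x = np.asarray(workload_counters).reshape(1, -1)
            return "big" if model_big.predict(x)[0] >= model_little.predict(x)[0] else "little"

        print(schedule([0.9, 0.1, 0.2, 0.5]))   # compute-friendly workload -> likely "big"
        print(schedule([0.2, 0.9, 0.7, 0.1]))   # memory-bound workload -> likely "little"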

    Analytical Cost Metrics : Days of Future Past

    As we move towards the exascale era, new architectures must be capable of running massive computational problems efficiently. Scientists and researchers are continuously investing in tuning the performance of extreme-scale computational problems. These problems arise in almost all areas of computing, ranging from big data analytics, artificial intelligence, search, machine learning, virtual/augmented reality, computer vision, and image/signal processing to computational science and bioinformatics. With Moore's law driving the evolution of hardware platforms towards exascale, the dominant performance metric (time efficiency) has now expanded to also incorporate power/energy efficiency. Therefore, the major challenge we face in computing systems research is: "how to solve massive-scale computational problems in the most time/power/energy efficient manner?" Architectures are constantly evolving, making current performance-optimization strategies less applicable and requiring new strategies to be invented. The solution is for new architectures, new programming models, and applications to move forward together. Doing this is, however, extremely hard: there are too many design choices in too many dimensions. We propose the following strategy to solve the problem: (i) Models - develop accurate analytical models (e.g. execution time, energy, silicon area) to predict the cost of executing a given program, and (ii) Complete System Design - simultaneously optimize all the cost models for the programs (computational problems) to obtain the most time/area/power/energy efficient solution. Such an optimization problem evokes the notion of codesign.
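
    A toy instance of the kind of analytical cost model advocated here: a roofline-style execution-time bound combined with a simple energy estimate. All hardware constants are illustrative assumptions, not measurements.

        # Toy analytical cost model: roofline-style time bound plus a simple energy
        # estimate. All hardware constants below are illustrative assumptions.
        PEAK_FLOPS = 2.0e12          # FLOP/s
        PEAK_BW    = 200e9           # bytes/s of memory bandwidth
        POWER_W    = 150.0           # assumed average package power, watts

        def predict_cost(flops, bytes_moved):
            """Return (seconds, joules) for a kernel with the given work and traffic."""
            t_compute = flops / PEAK_FLOPS
            t_memory  = bytes_moved / PEAK_BW
            t = max(t_compute, t_memory)          # whichever resource is the bottleneck
            return t, t * POWER_W

        # Example: matrix multiply C = A * B with n = 2048 (2n^3 FLOPs, ~3n^2 * 8 bytes).
        n = 2048
        t, e = predict_cost(2 * n**3, 3 * n * n * 8)
        print(f"predicted time {t * 1e3:.2f} ms, energy {e:.2f} J")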

    Cloud engineering is search based software engineering too

    Many of the problems posed by the migration of computation to cloud platforms can be formulated and solved using techniques associated with Search Based Software Engineering (SBSE). Much of cloud software engineering involves problems of optimisation: performance, allocation, assignment, and the dynamic balancing of resources to achieve pragmatic trade-offs between many competing technical and business objectives. SBSE is concerned with the application of computational search and optimisation to solve precisely these kinds of software engineering challenges. Interest in both cloud computing and SBSE has grown rapidly in the past five years, yet there has been little work on SBSE as a means of addressing cloud computing challenges. Like many computationally demanding activities, SBSE has the potential to benefit from the cloud: ‘SBSE in the cloud’. This paper focuses, instead, on the ways in which SBSE can benefit cloud computing. It thus develops the theme of ‘SBSE for the cloud’, formulating cloud computing challenges in ways that can be addressed using SBSE.
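
    To make the formulation concrete, here is one such cloud problem cast as computational search: a random-restart hill climber chooses a VM type and instance count to minimise hourly cost subject to a throughput requirement. The VM catalogue, prices, capacities, and single-objective form are invented for illustration; real SBSE formulations are typically multi-objective.

        # Sketch: cloud resource allocation cast as a search problem, solved with a
        # random-restart hill climber. VM types, prices, and capacities are made up.
        import random

        VM_TYPES = {"small": (0.05, 100), "medium": (0.18, 420), "large": (0.70, 1800)}
        REQUIRED_THROUGHPUT = 5000             # requests/s the deployment must serve

        def cost(solution):
            vm, count = solution
            price, capacity = VM_TYPES[vm]
            if count * capacity < REQUIRED_THROUGHPUT:
                return float("inf")            # infeasible: not enough throughput
            return price * count               # objective: hourly cost

        def neighbours(solution):
            vm, count = solution
            yield from ((v, count) for v in VM_TYPES if v != vm)
            yield from ((vm, c) for c in (count - 1, count + 1) if c >= 1)

        def hill_climb(restarts=20):
            best = None
            for _ in range(restarts):
                current = (random.choice(list(VM_TYPES)), random.randint(1, 60))
                while True:
                    step = min(neighbours(current), key=cost)
                    if cost(step) >= cost(current):
                        break                  # local optimum reached
                    current = step
                if best is None or cost(current) < cost(best):
                    best = current
            return best

        random.seed(4)
        best = hill_climb()
        print("allocation:", best, "hourly cost:", cost(best))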

    Improving GPU-accelerated Adaptive IDW Interpolation Algorithm Using Fast kNN Search

    This paper presents an efficient parallel Adaptive Inverse Distance Weighting (AIDW) interpolation algorithm for modern Graphics Processing Units (GPUs). The presented algorithm improves our previous GPU-accelerated AIDW algorithm by adopting fast k-Nearest Neighbor (kNN) search. AIDW needs to find several nearest neighboring data points for each interpolated point in order to adaptively determine the power parameter; the desired prediction value at the interpolated point is then obtained by weighted interpolation using that power parameter. In this work, we develop a fast kNN search approach based on a space-partitioning data structure, the even grid, to improve the previous GPU-accelerated AIDW algorithm. The improved algorithm is composed of a kNN search stage and a weighted interpolation stage. To evaluate the performance of the improved algorithm, we perform five groups of experimental tests. Experimental results show that: (1) the improved algorithm achieves a speedup of up to 1017 over the corresponding serial algorithm; (2) the improved algorithm is at least two times faster than our previous GPU-accelerated AIDW algorithm; and (3) the use of fast kNN search significantly improves the computational efficiency of the entire GPU-accelerated AIDW algorithm.
    Comment: Submitted manuscript; 9 figures, 3 tables.
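
    A CPU-side sketch of the adaptive IDW idea (not the GPU implementation): the k nearest known points are found for each query point, a power parameter is adapted from the local neighbour spacing, and the prediction is the inverse-distance-weighted mean. Brute-force kNN replaces the even-grid search, and the spacing-to-power mapping is a simplified stand-in for AIDW's rule.

        # Sketch of adaptive IDW interpolation: adapt the power parameter from local
        # neighbour spacing, then take an inverse-distance-weighted mean (CPU, brute force).
        import numpy as np

        def aidw_interpolate(known_xy, known_z, query_xy, k=8):
            preds = []
            for q in query_xy:
                d = np.linalg.norm(known_xy - q, axis=1)
                nn = np.argsort(d)[:k]                     # brute-force kNN search
                dn, zn = d[nn], known_z[nn]
                if dn[0] < 1e-12:                          # query coincides with a sample
                    preds.append(float(zn[0]))
                    continue
                # Sparser neighbourhoods (larger mean kNN distance) get a larger power,
                # weighting the closest samples more heavily -- a simplified stand-in
                # for AIDW's density-based adaptation.
                power = np.clip(1.0 + 4.0 * dn.mean() / (known_xy.std() + 1e-12), 1.0, 5.0)
                w = 1.0 / dn ** power
                preds.append(float(w @ zn / w.sum()))
            return np.array(preds)

        rng = np.random.default_rng(5)
        pts = rng.uniform(0, 10, size=(500, 2))            # scattered known samples
        vals = np.sin(pts[:, 0]) + np.cos(pts[:, 1])
        queries = rng.uniform(0, 10, size=(5, 2))
        print(aidw_interpolate(pts, vals, queries).round(3))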