A MapReduce-based rotation forest classifier for epileptic seizure prediction
Big data applications, including biomedical ones, have become increasingly
attractive as data generation and storage have grown in recent years.
Extracting knowledge from big data is challenging because conventional data
mining techniques are not adapted to the new requirements. In this study, we
analyse EEG signals for epileptic seizure detection in the big data
scenario using Rotation Forest classifier. Specifically, MSPCA is used for
denoising, WPD is used for feature extraction and Rotation Forest is used for
classification within a MapReduce framework to predict epileptic seizures.
This paper presents a MapReduce-based distributed ensemble algorithm for
epileptic seizure prediction that trains a Rotation Forest on each dataset in
parallel using a cluster of computers. The results of the MapReduce-based
Rotation Forest show that the proposed framework reduces training time
significantly while achieving a high level of classification performance.
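As a rough illustration of the distributed ensemble idea, the sketch below trains one rotated tree per worker process and combines them by majority vote. It is not the authors' implementation: scikit-learn and Python's multiprocessing stand in for the MapReduce cluster, the MSPCA denoising and WPD feature extraction steps are omitted, and the data and labels are synthetic placeholders for EEG features.

```python
# Minimal single-machine sketch of a Rotation Forest trained in a map/reduce style.
# Each "map" task fits one rotated tree on the data; the "reduce" step collects them.
# Illustrative only -- the paper runs on a real MapReduce cluster with MSPCA
# denoising and WPD feature extraction, both omitted here.
import numpy as np
from multiprocessing import Pool
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

def fit_rotated_tree(args):
    X, y, seed, n_subsets = args
    rng = np.random.default_rng(seed)
    features = rng.permutation(X.shape[1])
    rotation = np.zeros((X.shape[1], X.shape[1]))
    for subset in np.array_split(features, n_subsets):
        pca = PCA(n_components=len(subset)).fit(X[:, subset])
        rotation[np.ix_(subset, subset)] = pca.components_.T   # per-subset PCA rotation
    tree = DecisionTreeClassifier(random_state=seed).fit(X @ rotation, y)
    return rotation, tree

def predict(ensemble, X):
    votes = np.array([t.predict(X @ R) for R, t in ensemble])
    return np.round(votes.mean(axis=0))                        # majority vote

if __name__ == "__main__":
    X = np.random.randn(1000, 16)            # stand-in for WPD features of EEG segments
    y = (X[:, 0] + X[:, 1] > 0).astype(int)  # stand-in seizure / non-seizure labels
    with Pool(4) as pool:                    # "map" phase: one tree per worker
        ensemble = pool.map(fit_rotated_tree, [(X, y, s, 4) for s in range(8)])
    print("train accuracy:", (predict(ensemble, X) == y).mean())
```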
Lightweight Task Analysis for Cache-Aware Scheduling on Heterogeneous Clusters
We present a novel characterization of how a program stresses cache. This
characterization permits fast performance prediction in order to simulate and
assist task scheduling on heterogeneous clusters. It is based on the estimation
of stack distance probability distributions. The analysis requires the
observation of a very small subset of memory accesses, and yields a
reasonable-to-very-accurate prediction in constant time.
Comment: The paper was originally published in the Proceedings of the 2008
International Conference on Parallel and Distributed Processing Techniques and
Applications (PDPTA'08), ISBN 1-60132-084-1 (a two-volume set), editors
Hamid R. Arabnia and Youngsong Mu.
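A minimal sketch of the underlying quantity, the stack (reuse) distance distribution, is given below. The toy trace, the list-based LRU stack, and the trivial sampling are illustrative simplifications, not the paper's analysis.

```python
# Estimate a stack (reuse) distance distribution from a small sample of memory
# accesses -- the quantity used here to characterize how a program stresses cache.
from collections import Counter

def stack_distances(addresses):
    stack, dists = [], []
    for addr in addresses:
        if addr in stack:
            depth = len(stack) - 1 - stack.index(addr)  # distinct lines since last use
            dists.append(depth)
            stack.remove(addr)
        else:
            dists.append(float("inf"))                   # cold miss
        stack.append(addr)                               # most recently used on top
    return dists

trace = [0x10, 0x20, 0x10, 0x30, 0x20, 0x10, 0x40, 0x10]  # toy cache-line trace
hist = Counter(stack_distances(trace))
total = sum(hist.values())
for d in sorted(hist, key=lambda x: (x == float("inf"), x)):
    print(f"stack distance {d}: p = {hist[d] / total:.2f}")
```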
OpenCL Performance Prediction using Architecture-Independent Features
OpenCL is an attractive model for heterogeneous high-performance computing
systems, with wide support from hardware vendors and significant performance
portability. To support efficient scheduling on HPC systems it is necessary to
perform accurate performance predictions for OpenCL workloads on varied compute
devices, which is challenging due to diverse computation, communication and
memory access characteristics which result in varying performance between
devices. The Architecture Independent Workload Characterization (AIWC) tool can
be used to characterize OpenCL kernels according to a set of
architecture-independent features. This work presents a methodology where AIWC
features are used to form a model capable of predicting accelerator execution
times. We used this methodology to predict execution times for a set of 37
computational kernels running on 15 different devices representing a broad
range of CPU, GPU and MIC architectures. The predictions are highly accurate,
differing from the measured experimental run-times by an average of only 1.2%,
and correspond to actual execution-time mispredictions of 9 µs to 1 s,
depending on problem size. A previously unencountered code can be instrumented
once and the AIWC metrics embedded in the kernel, to allow performance
prediction across the full range of modelled devices. The results suggest that
this methodology supports correct selection of the most appropriate device for
a previously unencountered code, which is highly relevant to the HPC scheduling
setting.
Comment: 9 pages, 6 figures. International Workshop on High Performance and
Dynamic Reconfigurable Systems and Networks (DRSN-2018), published in
conjunction with the 2018 International Conference on High Performance
Computing & Simulation (HPCS 2018).
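The modelling step can be pictured roughly as below: a regression model maps architecture-independent kernel features plus a device identifier to an execution time. The random "AIWC-like" features, the synthetic runtimes, and the choice of a random forest are assumptions for the sketch, not the paper's exact pipeline.

```python
# Hedged sketch: predict kernel execution time on a device from
# architecture-independent features (placeholders for AIWC metrics).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_kernels, n_devices, n_features = 37, 15, 6
features = rng.random((n_kernels, n_features))        # stand-in AIWC feature vectors
device_speed = rng.uniform(0.5, 2.0, n_devices)       # synthetic per-device scaling

rows, runtimes = [], []
for k in range(n_kernels):
    for d in range(n_devices):
        rows.append(np.concatenate([features[k], [d]]))       # features + device id
        runtimes.append(features[k].sum() * device_speed[d])  # synthetic ground truth

X_train, X_test, y_train, y_test = train_test_split(
    np.array(rows), np.array(runtimes), test_size=0.2, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
err = np.abs(model.predict(X_test) - y_test) / y_test
print(f"mean relative error: {err.mean():.1%}")
```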
Predictive Performance Modeling for Distributed Computing using Black-Box Monitoring and Machine Learning
In many domains, the previous decade was characterized by increasing data
volumes and growing complexity of computational workloads, creating new demands
for highly data-parallel computing in distributed systems. Effective operation
of these systems is challenging when facing uncertainties about the performance
of jobs and tasks under varying resource configurations, e.g., for scheduling
and resource allocation. We survey predictive performance modeling (PPM)
approaches to estimate performance metrics such as execution duration, required
memory or wait times of future jobs and tasks based on past performance
observations. We focus on non-intrusive methods, i.e., methods that can be
applied to any workload without modification, since the workload is usually a
black-box from the perspective of the systems managing the computational
infrastructure. We classify and compare sources of performance variation,
predicted performance metrics, required training data, use cases, and the
underlying prediction techniques. We conclude by identifying several open
problems and pressing research needs in the field.
Comment: 19 pages, 3 figures, 5 tables.
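As a toy example of the kind of non-intrusive prediction surveyed here, the sketch below fits a model to past jobs' observable metadata (input size and allocated cores) and estimates the runtime of a future job; the feature choice, the linear model, and the synthetic history are assumptions, not a method from the survey.

```python
# Black-box runtime prediction from past observations: no workload modification,
# only metadata visible to the infrastructure (input size, allocated cores).
import numpy as np
from sklearn.linear_model import LinearRegression

history = np.array([            # [input GB, cores, observed runtime in s]
    [10, 4, 310], [20, 4, 600], [10, 8, 170], [40, 8, 620], [20, 16, 170],
])
X, y = history[:, :2], history[:, 2]
gb_per_core = np.column_stack([X[:, 0] / X[:, 1]])   # simple derived feature
model = LinearRegression().fit(gb_per_core, y)
pred = model.predict([[30 / 8]])[0]                   # a hypothetical future job
print(f"predicted runtime for 30 GB on 8 cores: {pred:.0f} s")
```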
A Loop-Based Methodology for Reducing Computational Redundancy in Workload Sets
The design of general purpose processors relies heavily on a workload
gathering step in which representative programs are collected from various
application domains. Processor performance, when running the workload set, is
profiled using simulators that model the targeted processor architecture.
However, simulating the entire workload set is prohibitively time-consuming,
which precludes considering a large number of programs. To reduce simulation
time, several techniques in the literature have exploited the internal program
repetitiveness to extract and execute only representative code segments.
Existing solutions are based on reducing cross-program computational
redundancy or on eliminating internal-program redundancy to decrease execution
time. In this work, we propose an orthogonal and complementary loop-centric
methodology that targets loop-dominant programs by exploiting internal-program
characteristics to reduce cross-program computational redundancy. The approach
employs a newly developed framework that extracts and analyzes core loops
within workloads. The collected characteristics model memory behavior,
computational complexity, and data structures of a program, and are used to
construct a signature vector for each program. From these vectors,
cross-workload similarity metrics are extracted, which are processed by a novel
heuristic to exclude similar programs and reduce redundancy within the set.
Finally, a reverse engineering approach that synthesizes executable
micro-benchmarks having the same instruction mix as the loops in the original
workload is introduced. A tool that automates the flow steps of the proposed
methodology is developed. Simulation results demonstrate that applying the
proposed methodology to a set of workloads reduces the set size by half while
preserving the main characterizations of the initial workloads.
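The exclusion heuristic can be illustrated roughly as follows: build a signature vector per workload, measure pairwise cosine similarity, and greedily drop workloads that closely match one already kept. The feature names, vectors, and similarity threshold below are placeholders, not the paper's actual loop characterization.

```python
# Greedy redundancy reduction over per-workload signature vectors.
import numpy as np

signatures = {                                 # e.g. [mem intensity, ILP, branch rate]
    "kernelA": np.array([0.9, 0.2, 0.1]),
    "kernelB": np.array([0.85, 0.25, 0.1]),    # nearly identical to kernelA
    "kernelC": np.array([0.1, 0.8, 0.4]),
    "kernelD": np.array([0.15, 0.75, 0.45]),   # nearly identical to kernelC
    "kernelE": np.array([0.5, 0.5, 0.9]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

kept = []
for name, vec in signatures.items():
    if all(cosine(vec, signatures[k]) < 0.99 for k in kept):
        kept.append(name)                      # sufficiently different: keep it
print("reduced workload set:", kept)           # drops kernelB and kernelD
```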
A general guide to applying machine learning to computer architecture
The resurgence of machine learning since the late 1990s has been enabled by significant advances in computing performance and the growth of big data. The ability of these algorithms to detect complex patterns in data, which would be extremely difficult to achieve manually, helps to produce effective predictive models. Whilst computer architects have been accelerating machine learning algorithms with GPUs and custom hardware, there have been few implementations leveraging these algorithms to improve computer system performance itself. The work that has been conducted, however, has produced promising results.
The purpose of this paper is to serve as a foundational base and guide to future computer
architecture research seeking to make use of machine learning models for improving system efficiency.
We describe a method that highlights when, why, and how to utilize machine learning
models for improving system performance, and provide a relevant example showcasing the effectiveness of applying machine learning in computer architecture. We describe a process of data
generation at every execution quantum together with parameter engineering. This is followed by a survey of a
set of popular machine learning models. We discuss their strengths and weaknesses and provide
an evaluation of implementations for the purpose of creating a workload performance predictor
for different core types in an x86 processor. The predictions can then be exploited by a scheduler
for heterogeneous processors to improve the system throughput. The algorithms of focus are
stochastic gradient descent based linear regression, decision trees, random forests, artificial neural
networks, and k-nearest neighbors.
This work has been supported by the European Research Council (ERC) Advanced Grant RoMoL (Grant Agreement 321253) and by the Spanish Ministry of Science and Innovation (contract TIN 2015-65316P).
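A compressed sketch of the showcased use case follows: features gathered each execution quantum on one core type are used to train the listed model families to predict performance on another core type. The stand-in counters, the synthetic relationship between them, and the model settings are illustrative only.

```python
# Compare the surveyed model families on a synthetic core-to-core prediction task.
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.random((500, 3))                      # stand-in counters: [IPC_small, L2 MPKI, branch MPKI]
y = 1.8 * X[:, 0] - 0.6 * X[:, 1] - 0.3 * X[:, 2] + rng.normal(0, 0.02, 500)  # "big core" IPC

models = {
    "SGD linear regression": SGDRegressor(max_iter=2000, random_state=1),
    "decision tree": DecisionTreeRegressor(max_depth=6, random_state=1),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=1),
    "k-nearest neighbors": KNeighborsRegressor(n_neighbors=5),
    "neural network": MLPRegressor(hidden_layer_sizes=(32,), max_iter=3000, random_state=1),
}
for name, model in models.items():
    model.fit(X[:400], y[:400])
    err = np.abs(model.predict(X[400:]) - y[400:]).mean()
    print(f"{name:>24s}: mean absolute error {err:.3f}")
```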
Analytical Cost Metrics: Days of Future Past
As we move towards the exascale era, new architectures must be capable of
running massive computational problems efficiently. Scientists and
researchers are continuously investing in tuning the performance of
extreme-scale computational problems. These problems arise in almost all areas
of computing, ranging from big data analytics, artificial intelligence, search,
machine learning, virtual/augmented reality, computer vision, image/signal
processing to computational science and bioinformatics. With Moore's law
driving the evolution of hardware platforms towards exascale, the dominant
performance metric (time efficiency) has now expanded to also incorporate
power/energy efficiency. Therefore, the major challenge that we face in
computing systems research is: "how to solve massive-scale computational
problems in the most time/power/energy efficient manner?"
Architectures are constantly evolving, making current performance-optimization
strategies less applicable and requiring new strategies to be invented. The
solution is for the new architectures, new programming models, and applications
to go forward together. Doing this is, however, extremely hard. There are too
many design choices in too many dimensions. We propose the following strategy
to solve the problem: (i) Models - Develop accurate analytical models (e.g.
execution time, energy, silicon area) to predict the cost of executing a given
program, and (ii) Complete System Design - Simultaneously optimize all the cost
models for the programs (computational problems) to obtain the most
time/area/power/energy efficient solution. Such an optimization problem evokes
the notion of codesign.
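To make the "Models" step concrete, here is a toy analytical cost model that predicts execution time and energy from a handful of parameters and ranks configurations by energy-delay product. The roofline-style formulas and hardware numbers are assumptions for illustration, not the authors' models.

```python
# Toy analytical cost model: execution time bounded by compute or memory traffic,
# energy as average power times time, ranked by energy-delay product (EDP).
def predict(flops, bytes_moved, peak_gflops, bandwidth_gbs, watts):
    time_s = max(flops / (peak_gflops * 1e9), bytes_moved / (bandwidth_gbs * 1e9))
    energy_j = watts * time_s
    return time_s, energy_j

kernel = dict(flops=2e12, bytes_moved=4e11)     # hypothetical kernel
devices = {
    "many-core CPU": dict(peak_gflops=1500, bandwidth_gbs=200, watts=180),
    "GPU":           dict(peak_gflops=7000, bandwidth_gbs=900, watts=300),
}
for name, hw in devices.items():
    t, e = predict(**kernel, **hw)
    print(f"{name}: time {t:.3f} s, energy {e:.1f} J, EDP {e * t:.2f} J*s")
```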
Privacy-preserving model learning on a blockchain network-of-networks.
Objective: To facilitate clinical/genomic/biomedical research, constructing generalizable predictive models using cross-institutional methods while protecting privacy is imperative. However, state-of-the-art methods assume a "flattened" topology, while real-world research networks may consist of a "network-of-networks", which can imply practical issues including training on small data for rare diseases/conditions, prioritizing locally trained models, and maintaining models for each level of the hierarchy. In this study, we focus on developing a hierarchical approach to inherit the benefits of privacy-preserving methods, retain the advantages of adopting blockchain, and address practical concerns on a research network-of-networks.
Materials and methods: We propose a framework to combine level-wise model learning, blockchain-based model dissemination, and a novel hierarchical consensus algorithm for model ensemble. We developed an example implementation, HierarchicalChain (hierarchical privacy-preserving modeling on blockchain), evaluated it on 3 healthcare/genomic datasets, and compared its predictive correctness, learning iteration, and execution time with a state-of-the-art method designed for a flattened network topology.
Results: HierarchicalChain improves predictive correctness for small training datasets and provides comparable correctness to the competing method, with a higher number of learning iterations and similar per-iteration execution time; it inherits the benefits of privacy-preserving learning and the advantages of blockchain technology, and immutably records models for each level.
Discussion: HierarchicalChain is independent of the core privacy-preserving learning method, as well as of the underlying blockchain platform. Further studies are warranted for various types of network topology, complex data, and privacy concerns.
Conclusion: We demonstrated the potential of utilizing the information from the hierarchical network-of-networks topology to improve prediction.
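The level-wise ensemble idea can be sketched, very loosely, as below: sites train local models, each sub-network combines its sites' predictions weighted by training-set size, and the top level combines the sub-networks the same way. This is not HierarchicalChain's actual consensus algorithm and it omits the blockchain and privacy-preserving layers entirely; all data are synthetic.

```python
# Two-level weighted ensemble over a toy "network-of-networks".
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
def make_site(n):                                  # synthetic per-site data and local model
    X = rng.normal(size=(n, 4))
    y = (X[:, 0] - X[:, 1] > 0).astype(int)
    return X, y, LogisticRegression().fit(X, y)

network = {"subnetA": [make_site(200), make_site(30)],   # a large and a small site
           "subnetB": [make_site(150), make_site(60)]}

X_test = rng.normal(size=(100, 4))
y_test = (X_test[:, 0] - X_test[:, 1] > 0).astype(int)

subnet_preds, subnet_sizes = [], []
for sites in network.values():
    sizes = np.array([len(y) for _, y, _ in sites])
    probs = np.array([m.predict_proba(X_test)[:, 1] for _, _, m in sites])
    subnet_preds.append(np.average(probs, axis=0, weights=sizes))  # level-1 consensus
    subnet_sizes.append(sizes.sum())
final = np.average(subnet_preds, axis=0, weights=subnet_sizes)     # level-2 consensus
print("hierarchical ensemble accuracy:", ((final > 0.5) == y_test).mean())
```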
Cloud engineering is search based software engineering too
Many of the problems posed by the migration of computation to cloud platforms can be formulated and solved using techniques associated with Search Based Software Engineering (SBSE). Much of cloud software engineering involves problems of optimisation: performance, allocation, assignment and the dynamic balancing of resources to achieve pragmatic trade-offs between many competing technical and business objectives. SBSE is concerned with the application of computational search and optimisation to solve precisely these kinds of software engineering challenges. Interest in both cloud computing and SBSE has grown rapidly in the past five years, yet there has been little work on SBSE as a means of addressing cloud computing challenges. Like many computationally demanding activities, SBSE has the potential to benefit from the cloud; ‘SBSE in the cloud’. However, this paper focuses, instead, on the ways in which SBSE can benefit cloud computing. It thus develops the theme of ‘SBSE for the cloud’, formulating cloud computing challenges in ways that can be addressed using SBSE
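As a toy instance of formulating a cloud concern as a search problem, the sketch below uses a simple hill climb to assign tasks to VM types under a single weighted cost/makespan objective. The task sizes, VM prices, and scalarized objective are assumptions for illustration; a real SBSE treatment would typically use richer, often multi-objective, search.

```python
# Search-based task-to-VM assignment: hill climbing over a weighted cost/makespan fitness.
import random

tasks = [4, 8, 2, 6, 3, 7, 5]                              # task sizes (arbitrary work units)
vm_types = {"small": (1.0, 0.05), "large": (4.0, 0.17)}    # (speed, $ per unit time)

def fitness(assignment):
    cost = makespan = 0.0
    for size, vm in zip(tasks, assignment):
        speed, price = vm_types[vm]
        t = size / speed
        cost += t * price
        makespan = max(makespan, t)
    return 0.5 * cost + 0.5 * makespan                     # single weighted objective

def hill_climb(iters=2000, seed=0):
    rng = random.Random(seed)
    best = [rng.choice(list(vm_types)) for _ in tasks]
    for _ in range(iters):
        cand = best.copy()
        cand[rng.randrange(len(tasks))] = rng.choice(list(vm_types))  # mutate one task
        if fitness(cand) < fitness(best):
            best = cand
    return best

solution = hill_climb()
print("assignment:", solution, "fitness:", round(fitness(solution), 3))
```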
Improving GPU-accelerated Adaptive IDW Interpolation Algorithm Using Fast kNN Search
This paper presents an efficient parallel Adaptive Inverse Distance Weighting
(AIDW) interpolation algorithm on modern Graphics Processing Unit (GPU). The
presented algorithm is an improvement of our previous GPU-accelerated AIDW
algorithm by adopting a fast k-Nearest Neighbors (kNN) search. In AIDW, several
nearest neighboring data points must be found for each interpolated point to
adaptively determine the power parameter; the desired prediction value of the
interpolated point is then obtained by weighted interpolation using that power
parameter. In this work, we develop a fast kNN search approach based on an
even-grid space-partitioning data structure to improve the previous
GPU-accelerated AIDW algorithm. The improved algorithm is composed of the
stages of kNN search and weighted interpolating. To evaluate the performance of
the improved algorithm, we perform five groups of experimental tests.
Experimental results show that: (1) the improved algorithm can achieve a
speedup of up to 1017 over the corresponding serial algorithm; (2) the improved
algorithm is at least two times faster than our previous GPU-accelerated AIDW
algorithm; and (3) the utilization of fast kNN search can significantly improve
the computational efficiency of the entire GPU-accelerated AIDW algorithm.
Comment: Submitted manuscript. 9 figures, 3 tables.
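A CPU-side sketch of the adaptive IDW idea follows: for each interpolated point, find its k nearest samples, derive a local density measure, map it to a distance-decay power, and compute the inverse-distance-weighted value. The paper's even-grid GPU kNN is replaced here by SciPy's KD-tree for brevity, and the density-to-power mapping is a simplified assumption rather than the paper's formulation.

```python
# Adaptive IDW interpolation with a kNN search (KD-tree stands in for the even grid).
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
samples = rng.random((500, 2))                      # known data point locations
values = np.sin(samples[:, 0] * 6) + samples[:, 1]  # known data values
queries = rng.random((5, 2))                        # points to interpolate
k = 10

tree = cKDTree(samples)
dists, idx = tree.query(queries, k=k)               # kNN search (grid-based on the GPU)

expected = 0.5 / np.sqrt(len(samples))               # expected spacing for uniform data
density = expected / dists.mean(axis=1)              # >1 locally dense, <1 sparse
power = np.clip(1.0 + 2.0 * density, 1.0, 5.0)       # simplified adaptive power parameter

w = 1.0 / np.maximum(dists, 1e-12) ** power[:, None]
pred = (w * values[idx]).sum(axis=1) / w.sum(axis=1)
print(np.round(pred, 3))
```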