A MapReduce-based rotation forest classifier for epileptic seizure prediction
Big data applications, including biomedical ones, have become increasingly
attractive as data generation and storage have grown in recent years.
Extracting knowledge from big data is challenging because conventional data
mining techniques are not adapted to the new requirements. In this study, we
analyse EEG signals for epileptic seizure detection in the big data
scenario using Rotation Forest classifier. Specifically, MSPCA is used for
denoising, WPD is used for feature extraction and Rotation Forest is used for
classification within a MapReduce framework to predict epileptic seizures.
This paper presents a MapReduce-based distributed ensemble algorithm for
epileptic seizure prediction that trains a Rotation Forest on each dataset in
parallel using a cluster of computers. The results of the MapReduce-based
Rotation Forest show that the proposed framework reduces training time
significantly while achieving a high level of classification performance.
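As a rough illustration of the distributed ensemble idea, the sketch below trains one rotated tree per worker process and combines them by majority vote. It is not the authors' implementation: scikit-learn and Python's multiprocessing stand in for the MapReduce cluster, the MSPCA denoising and WPD feature extraction steps are omitted, and the data and labels are synthetic placeholders for EEG features.

```python
# Minimal single-machine sketch of a Rotation Forest trained in a map/reduce style.
# Each "map" task fits one rotated tree on the data; the "reduce" step collects them.
# Illustrative only -- the paper runs on a real MapReduce cluster with MSPCA
# denoising and WPD feature extraction, both omitted here.
import numpy as np
from multiprocessing import Pool
from sklearn.decomposition import PCA
from sklearn.tree import DecisionTreeClassifier

def fit_rotated_tree(args):
    X, y, seed, n_subsets = args
    rng = np.random.default_rng(seed)
    features = rng.permutation(X.shape[1])
    rotation = np.zeros((X.shape[1], X.shape[1]))
    for subset in np.array_split(features, n_subsets):
        pca = PCA(n_components=len(subset)).fit(X[:, subset])
        rotation[np.ix_(subset, subset)] = pca.components_.T   # per-subset PCA rotation
    tree = DecisionTreeClassifier(random_state=seed).fit(X @ rotation, y)
    return rotation, tree

def predict(ensemble, X):
    votes = np.array([t.predict(X @ R) for R, t in ensemble])
    return np.round(votes.mean(axis=0))                        # majority vote

if __name__ == "__main__":
    X = np.random.randn(1000, 16)            # stand-in for WPD features of EEG segments
    y = (X[:, 0] + X[:, 1] > 0).astype(int)  # stand-in seizure / non-seizure labels
    with Pool(4) as pool:                    # "map" phase: one tree per worker
        ensemble = pool.map(fit_rotated_tree, [(X, y, s, 4) for s in range(8)])
    print("train accuracy:", (predict(ensemble, X) == y).mean())
```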
Lightweight Task Analysis for Cache-Aware Scheduling on Heterogeneous Clusters
We present a novel characterization of how a program stresses cache. This
characterization permits fast performance prediction in order to simulate and
assist task scheduling on heterogeneous clusters. It is based on the estimation
of stack distance probability distributions. The analysis requires the
observation of a very small subset of memory accesses, and yields a
reasonable-to-very-accurate prediction in constant time.
Comment: The paper was originally published in the Proceedings of the 2008
International Conference on Parallel and Distributed Processing Techniques and
Applications (PDPTA'08), ISBN 1-60132-084-1 (a two-volume set), editors
Hamid R. Arabnia and Youngsong Mu.
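A minimal sketch of the underlying quantity, the stack (reuse) distance distribution, is given below. The toy trace, the list-based LRU stack, and the trivial sampling are illustrative simplifications, not the paper's analysis.

```python
# Estimate a stack (reuse) distance distribution from a small sample of memory
# accesses -- the quantity used here to characterize how a program stresses cache.
from collections import Counter

def stack_distances(addresses):
    stack, dists = [], []
    for addr in addresses:
        if addr in stack:
            depth = len(stack) - 1 - stack.index(addr)  # distinct lines since last use
            dists.append(depth)
            stack.remove(addr)
        else:
            dists.append(float("inf"))                   # cold miss
        stack.append(addr)                               # most recently used on top
    return dists

trace = [0x10, 0x20, 0x10, 0x30, 0x20, 0x10, 0x40, 0x10]  # toy cache-line trace
hist = Counter(stack_distances(trace))
total = sum(hist.values())
for d in sorted(hist, key=lambda x: (x == float("inf"), x)):
    print(f"stack distance {d}: p = {hist[d] / total:.2f}")
```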
OpenCL Performance Prediction using Architecture-Independent Features
OpenCL is an attractive model for heterogeneous high-performance computing
systems, with wide support from hardware vendors and significant performance
portability. To support efficient scheduling on HPC systems it is necessary to
perform accurate performance predictions for OpenCL workloads on varied compute
devices, which is challenging due to diverse computation, communication and
memory access characteristics which result in varying performance between
devices. The Architecture Independent Workload Characterization (AIWC) tool can
be used to characterize OpenCL kernels according to a set of
architecture-independent features. This work presents a methodology where AIWC
features are used to form a model capable of predicting accelerator execution
times. We used this methodology to predict execution times for a set of 37
computational kernels running on 15 different devices representing a broad
range of CPU, GPU and MIC architectures. The predictions are highly accurate,
differing from the measured experimental run-times by an average of only 1.2%,
and correspond to actual execution-time mispredictions of 9 µs to 1 s,
depending on problem size. A previously unencountered code can be instrumented
once and the AIWC metrics embedded in the kernel, to allow performance
prediction across the full range of modelled devices. The results suggest that
this methodology supports correct selection of the most appropriate device for
a previously unencountered code, which is highly relevant to the HPC scheduling
setting.
Comment: 9 pages, 6 figures. International Workshop on High Performance and
Dynamic Reconfigurable Systems and Networks (DRSN-2018), published in
conjunction with the 2018 International Conference on High Performance
Computing & Simulation (HPCS 2018).
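The modelling step can be pictured roughly as below: a regression model maps architecture-independent kernel features plus a device identifier to an execution time. The random "AIWC-like" features, the synthetic runtimes, and the choice of a random forest are assumptions for the sketch, not the paper's exact pipeline.

```python
# Hedged sketch: predict kernel execution time on a device from
# architecture-independent features (placeholders for AIWC metrics).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_kernels, n_devices, n_features = 37, 15, 6
features = rng.random((n_kernels, n_features))        # stand-in AIWC feature vectors
device_speed = rng.uniform(0.5, 2.0, n_devices)       # synthetic per-device scaling

rows, runtimes = [], []
for k in range(n_kernels):
    for d in range(n_devices):
        rows.append(np.concatenate([features[k], [d]]))       # features + device id
        runtimes.append(features[k].sum() * device_speed[d])  # synthetic ground truth

X_train, X_test, y_train, y_test = train_test_split(
    np.array(rows), np.array(runtimes), test_size=0.2, random_state=0)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)
err = np.abs(model.predict(X_test) - y_test) / y_test
print(f"mean relative error: {err.mean():.1%}")
```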
Predictive Performance Modeling for Distributed Computing using Black-Box Monitoring and Machine Learning
In many domains, the previous decade was characterized by increasing data
volumes and growing complexity of computational workloads, creating new demands
for highly data-parallel computing in distributed systems. Effective operation
of these systems is challenging when facing uncertainties about the performance
of jobs and tasks under varying resource configurations, e.g., for scheduling
and resource allocation. We survey predictive performance modeling (PPM)
approaches to estimate performance metrics such as execution duration, required
memory or wait times of future jobs and tasks based on past performance
observations. We focus on non-intrusive methods, i.e., methods that can be
applied to any workload without modification, since the workload is usually a
black-box from the perspective of the systems managing the computational
infrastructure. We classify and compare sources of performance variation,
predicted performance metrics, required training data, use cases, and the
underlying prediction techniques. We conclude by identifying several open
problems and pressing research needs in the field.
Comment: 19 pages, 3 figures, 5 tables.
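As a toy example of the kind of non-intrusive prediction surveyed here, the sketch below fits a model to past jobs' observable metadata (input size and allocated cores) and estimates the runtime of a future job; the feature choice, the linear model, and the synthetic history are assumptions, not a method from the survey.

```python
# Black-box runtime prediction from past observations: no workload modification,
# only metadata visible to the infrastructure (input size, allocated cores).
import numpy as np
from sklearn.linear_model import LinearRegression

history = np.array([            # [input GB, cores, observed runtime in s]
    [10, 4, 310], [20, 4, 600], [10, 8, 170], [40, 8, 620], [20, 16, 170],
])
X, y = history[:, :2], history[:, 2]
gb_per_core = np.column_stack([X[:, 0] / X[:, 1]])   # simple derived feature
model = LinearRegression().fit(gb_per_core, y)
pred = model.predict([[30 / 8]])[0]                   # a hypothetical future job
print(f"predicted runtime for 30 GB on 8 cores: {pred:.0f} s")
```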
A Loop-Based Methodology for Reducing Computational Redundancy in Workload Sets
The design of general purpose processors relies heavily on a workload
gathering step in which representative programs are collected from various
application domains. Processor performance, when running the workload set, is
profiled using simulators that model the targeted processor architecture.
However, simulating the entire workload set is prohibitively time-consuming,
which precludes considering a large number of programs. To reduce simulation
time, several techniques in the literature have exploited the internal program
repetitiveness to extract and execute only representative code segments.
Existing solutions are based on reducing cross-program computational
redundancy or on eliminating internal-program redundancy to decrease execution
time. In this work, we propose an orthogonal and complementary loop-centric
methodology that targets loop-dominant programs by exploiting internal-program
characteristics to reduce cross-program computational redundancy. The approach
employs a newly developed framework that extracts and analyzes core loops
within workloads. The collected characteristics model memory behavior,
computational complexity, and data structures of a program, and are used to
construct a signature vector for each program. From these vectors,
cross-workload similarity metrics are extracted, which are processed by a novel
heuristic to exclude similar programs and reduce redundancy within the set.
Finally, a reverse engineering approach that synthesizes executable
micro-benchmarks having the same instruction mix as the loops in the original
workload is introduced. A tool that automates the flow steps of the proposed
methodology is developed. Simulation results demonstrate that applying the
proposed methodology to a set of workloads reduces the set size by half while
preserving the main characterizations of the initial workloads.
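The exclusion heuristic can be illustrated roughly as follows: build a signature vector per workload, measure pairwise cosine similarity, and greedily drop workloads that closely match one already kept. The feature names, vectors, and similarity threshold below are placeholders, not the paper's actual loop characterization.

```python
# Greedy redundancy reduction over per-workload signature vectors.
import numpy as np

signatures = {                                 # e.g. [mem intensity, ILP, branch rate]
    "kernelA": np.array([0.9, 0.2, 0.1]),
    "kernelB": np.array([0.85, 0.25, 0.1]),    # nearly identical to kernelA
    "kernelC": np.array([0.1, 0.8, 0.4]),
    "kernelD": np.array([0.15, 0.75, 0.45]),   # nearly identical to kernelC
    "kernelE": np.array([0.5, 0.5, 0.9]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

kept = []
for name, vec in signatures.items():
    if all(cosine(vec, signatures[k]) < 0.99 for k in kept):
        kept.append(name)                      # sufficiently different: keep it
print("reduced workload set:", kept)           # drops kernelB and kernelD
```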
A general guide to applying machine learning to computer architecture
The resurgence of machine learning since the late 1990s has been enabled by significant advances in computing performance and the growth of big data. The ability of these algorithms to detect complex patterns in data, which would be extremely difficult to achieve manually, helps to produce effective predictive models. Whilst computer architects have been accelerating machine learning algorithms with GPUs and custom hardware, there have been few implementations leveraging these algorithms to improve computer system performance itself. The work that has been conducted, however, has produced promising results.
The purpose of this paper is to serve as a foundational base and guide to future computer
architecture research seeking to make use of machine learning models for improving system efficiency.
We describe a method that highlights when, why, and how to utilize machine learning
models for improving system performance, and provide a relevant example showcasing the effectiveness of applying machine learning in computer architecture. We describe a process of data
generation at every execution quantum together with parameter engineering. This is followed by a survey of a
set of popular machine learning models. We discuss their strengths and weaknesses and provide
an evaluation of implementations for the purpose of creating a workload performance predictor
for different core types in an x86 processor. The predictions can then be exploited by a scheduler
for heterogeneous processors to improve the system throughput. The algorithms of focus are
stochastic gradient descent based linear regression, decision trees, random forests, artificial neural
networks, and k-nearest neighbors.
This work has been supported by the European Research Council (ERC) Advanced Grant RoMoL (Grant Agreement 321253) and by the Spanish Ministry of Science and Innovation (contract TIN 2015-65316P).
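A compressed sketch of the showcased use case follows: features gathered each execution quantum on one core type are used to train the listed model families to predict performance on another core type. The stand-in counters, the synthetic relationship between them, and the model settings are illustrative only.

```python
# Compare the surveyed model families on a synthetic core-to-core prediction task.
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
X = rng.random((500, 3))                      # stand-in counters: [IPC_small, L2 MPKI, branch MPKI]
y = 1.8 * X[:, 0] - 0.6 * X[:, 1] - 0.3 * X[:, 2] + rng.normal(0, 0.02, 500)  # "big core" IPC

models = {
    "SGD linear regression": SGDRegressor(max_iter=2000, random_state=1),
    "decision tree": DecisionTreeRegressor(max_depth=6, random_state=1),
    "random forest": RandomForestRegressor(n_estimators=100, random_state=1),
    "k-nearest neighbors": KNeighborsRegressor(n_neighbors=5),
    "neural network": MLPRegressor(hidden_layer_sizes=(32,), max_iter=3000, random_state=1),
}
for name, model in models.items():
    model.fit(X[:400], y[:400])
    err = np.abs(model.predict(X[400:]) - y[400:]).mean()
    print(f"{name:>24s}: mean absolute error {err:.3f}")
```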
Analytical Cost Metrics: Days of Future Past
As we move towards the exascale era, new architectures must be capable of
running massive computational problems efficiently. Scientists and
researchers are continuously investing in tuning the performance of
extreme-scale computational problems. These problems arise in almost all areas
of computing, ranging from big data analytics, artificial intelligence, search,
machine learning, virtual/augmented reality, computer vision, image/signal
processing to computational science and bioinformatics. With Moore's law
driving the evolution of hardware platforms towards exascale, the dominant
performance metric (time efficiency) has now expanded to also incorporate
power/energy efficiency. Therefore, the major challenge that we face in
computing systems research is: "how to solve massive-scale computational
problems in the most time/power/energy efficient manner?"
Architectures are constantly evolving, making current performance-optimization
strategies less applicable and requiring new strategies to be invented. The
solution is for the new architectures, new programming models, and applications
to go forward together. Doing this is, however, extremely hard. There are too
many design choices in too many dimensions. We propose the following strategy
to solve the problem: (i) Models - Develop accurate analytical models (e.g.
execution time, energy, silicon area) to predict the cost of executing a given
program, and (ii) Complete System Design - Simultaneously optimize all the cost
models for the programs (computational problems) to obtain the most
time/area/power/energy efficient solution. Such an optimization problem evokes
the notion of codesign.
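To make the "Models" step concrete, here is a toy analytical cost model that predicts execution time and energy from a handful of parameters and ranks configurations by energy-delay product. The roofline-style formulas and hardware numbers are assumptions for illustration, not the authors' models.

```python
# Toy analytical cost model: execution time bounded by compute or memory traffic,
# energy as average power times time, ranked by energy-delay product (EDP).
def predict(flops, bytes_moved, peak_gflops, bandwidth_gbs, watts):
    time_s = max(flops / (peak_gflops * 1e9), bytes_moved / (bandwidth_gbs * 1e9))
    energy_j = watts * time_s
    return time_s, energy_j

kernel = dict(flops=2e12, bytes_moved=4e11)     # hypothetical kernel
devices = {
    "many-core CPU": dict(peak_gflops=1500, bandwidth_gbs=200, watts=180),
    "GPU":           dict(peak_gflops=7000, bandwidth_gbs=900, watts=300),
}
for name, hw in devices.items():
    t, e = predict(**kernel, **hw)
    print(f"{name}: time {t:.3f} s, energy {e:.1f} J, EDP {e * t:.2f} J*s")
```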
Privacy-preserving model learning on a blockchain network-of-networks.
Objective: To facilitate clinical/genomic/biomedical research, constructing generalizable predictive models using cross-institutional methods while protecting privacy is imperative. However, state-of-the-art methods assume a "flattened" topology, while real-world research networks may consist of a "network-of-networks", which can imply practical issues including training on small data for rare diseases/conditions, prioritizing locally trained models, and maintaining models for each level of the hierarchy. In this study, we focus on developing a hierarchical approach to inherit the benefits of privacy-preserving methods, retain the advantages of adopting blockchain, and address practical concerns on a research network-of-networks.
Materials and methods: We propose a framework to combine level-wise model learning, blockchain-based model dissemination, and a novel hierarchical consensus algorithm for model ensemble. We developed an example implementation, HierarchicalChain (hierarchical privacy-preserving modeling on blockchain), evaluated it on 3 healthcare/genomic datasets, and compared its predictive correctness, learning iteration, and execution time with a state-of-the-art method designed for a flattened network topology.
Results: HierarchicalChain improves predictive correctness for small training datasets and provides comparable correctness to the competing method, with a higher number of learning iterations and similar per-iteration execution time; it inherits the benefits of privacy-preserving learning and the advantages of blockchain technology, and immutably records models for each level.
Discussion: HierarchicalChain is independent of the core privacy-preserving learning method, as well as of the underlying blockchain platform. Further studies are warranted for various types of network topology, complex data, and privacy concerns.
Conclusion: We demonstrated the potential of utilizing the information from the hierarchical network-of-networks topology to improve prediction.
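The level-wise ensemble idea can be sketched, very loosely, as below: sites train local models, each sub-network combines its sites' predictions weighted by training-set size, and the top level combines the sub-networks the same way. This is not HierarchicalChain's actual consensus algorithm and it omits the blockchain and privacy-preserving layers entirely; all data are synthetic.

```python
# Two-level weighted ensemble over a toy "network-of-networks".
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
def make_site(n):                                  # synthetic per-site data and local model
    X = rng.normal(size=(n, 4))
    y = (X[:, 0] - X[:, 1] > 0).astype(int)
    return X, y, LogisticRegression().fit(X, y)

network = {"subnetA": [make_site(200), make_site(30)],   # a large and a small site
           "subnetB": [make_site(150), make_site(60)]}

X_test = rng.normal(size=(100, 4))
y_test = (X_test[:, 0] - X_test[:, 1] > 0).astype(int)

subnet_preds, subnet_sizes = [], []
for sites in network.values():
    sizes = np.array([len(y) for _, y, _ in sites])
    probs = np.array([m.predict_proba(X_test)[:, 1] for _, _, m in sites])
    subnet_preds.append(np.average(probs, axis=0, weights=sizes))  # level-1 consensus
    subnet_sizes.append(sizes.sum())
final = np.average(subnet_preds, axis=0, weights=subnet_sizes)     # level-2 consensus
print("hierarchical ensemble accuracy:", ((final > 0.5) == y_test).mean())
```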
Cloud engineering is search based software engineering too
Many of the problems posed by the migration of computation to cloud platforms can be formulated and solved using techniques associated with Search Based Software Engineering (SBSE). Much of cloud software engineering involves problems of optimisation: performance, allocation, assignment and the dynamic balancing of resources to achieve pragmatic trade-offs between many competing technical and business objectives. SBSE is concerned with the application of computational search and optimisation to solve precisely these kinds of software engineering challenges. Interest in both cloud computing and SBSE has grown rapidly in the past five years, yet there has been little work on SBSE as a means of addressing cloud computing challenges. Like many computationally demanding activities, SBSE has the potential to benefit from the cloud; ‘SBSE in the cloud’. However, this paper focuses, instead, on the ways in which SBSE can benefit cloud computing. It thus develops the theme of ‘SBSE for the cloud’, formulating cloud computing challenges in ways that can be addressed using SBSE
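As a toy instance of formulating a cloud concern as a search problem, the sketch below uses a simple hill climb to assign tasks to VM types under a single weighted cost/makespan objective. The task sizes, VM prices, and scalarized objective are assumptions for illustration; a real SBSE treatment would typically use richer, often multi-objective, search.

```python
# Search-based task-to-VM assignment: hill climbing over a weighted cost/makespan fitness.
import random

tasks = [4, 8, 2, 6, 3, 7, 5]                              # task sizes (arbitrary work units)
vm_types = {"small": (1.0, 0.05), "large": (4.0, 0.17)}    # (speed, $ per unit time)

def fitness(assignment):
    cost = makespan = 0.0
    for size, vm in zip(tasks, assignment):
        speed, price = vm_types[vm]
        t = size / speed
        cost += t * price
        makespan = max(makespan, t)
    return 0.5 * cost + 0.5 * makespan                     # single weighted objective

def hill_climb(iters=2000, seed=0):
    rng = random.Random(seed)
    best = [rng.choice(list(vm_types)) for _ in tasks]
    for _ in range(iters):
        cand = best.copy()
        cand[rng.randrange(len(tasks))] = rng.choice(list(vm_types))  # mutate one task
        if fitness(cand) < fitness(best):
            best = cand
    return best

solution = hill_climb()
print("assignment:", solution, "fitness:", round(fitness(solution), 3))
```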
Improving GPU-accelerated Adaptive IDW Interpolation Algorithm Using Fast kNN Search
This paper presents an efficient parallel Adaptive Inverse Distance Weighting
(AIDW) interpolation algorithm on modern Graphics Processing Unit (GPU). The
presented algorithm is an improvement of our previous GPU-accelerated AIDW
algorithm by adopting a fast k-Nearest Neighbors (kNN) search. In AIDW, several
nearest neighboring data points must be found for each interpolated point to
adaptively determine the power parameter; the desired prediction value of the
interpolated point is then obtained by weighted interpolation using that power
parameter. In this work, we develop a fast kNN search approach based on an
even-grid space-partitioning data structure to improve the previous
GPU-accelerated AIDW algorithm. The improved algorithm is composed of the
stages of kNN search and weighted interpolating. To evaluate the performance of
the improved algorithm, we perform five groups of experimental tests.
Experimental results show that: (1) the improved algorithm can achieve a
speedup of up to 1017 over the corresponding serial algorithm; (2) the improved
algorithm is at least two times faster than our previous GPU-accelerated AIDW
algorithm; and (3) the utilization of fast kNN search can significantly improve
the computational efficiency of the entire GPU-accelerated AIDW algorithm.
Comment: Submitted manuscript. 9 figures, 3 tables.
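A CPU-side sketch of the adaptive IDW idea follows: for each interpolated point, find its k nearest samples, derive a local density measure, map it to a distance-decay power, and compute the inverse-distance-weighted value. The paper's even-grid GPU kNN is replaced here by SciPy's KD-tree for brevity, and the density-to-power mapping is a simplified assumption rather than the paper's formulation.

```python
# Adaptive IDW interpolation with a kNN search (KD-tree stands in for the even grid).
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
samples = rng.random((500, 2))                      # known data point locations
values = np.sin(samples[:, 0] * 6) + samples[:, 1]  # known data values
queries = rng.random((5, 2))                        # points to interpolate
k = 10

tree = cKDTree(samples)
dists, idx = tree.query(queries, k=k)               # kNN search (grid-based on the GPU)

expected = 0.5 / np.sqrt(len(samples))               # expected spacing for uniform data
density = expected / dists.mean(axis=1)              # >1 locally dense, <1 sparse
power = np.clip(1.0 + 2.0 * density, 1.0, 5.0)       # simplified adaptive power parameter

w = 1.0 / np.maximum(dists, 1e-12) ** power[:, None]
pred = (w * values[idx]).sum(axis=1) / w.sum(axis=1)
print(np.round(pred, 3))
```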