
    A general guide to applying machine learning to computer architecture

    The resurgence of machine learning since the late 1990s has been enabled by significant advances in computing performance and the growth of big data. The ability of these algorithms to detect complex patterns in data that would be extremely difficult to find manually makes it possible to build effective predictive models. While computer architects have been accelerating the performance of machine learning algorithms with GPUs and custom hardware, there have been few implementations leveraging these algorithms to improve computer system performance. The work that has been conducted, however, has produced considerably promising results. The purpose of this paper is to serve as a foundational base and guide for future computer architecture research seeking to make use of machine learning models to improve system efficiency. We describe a method that highlights when, why, and how to utilize machine learning models for improving system performance and provide a relevant example showcasing the effectiveness of applying machine learning in computer architecture. We describe a process of generating data at every execution quantum and of parameter engineering. This is followed by a survey of a set of popular machine learning models. We discuss their strengths and weaknesses and provide an evaluation of implementations for the purpose of creating a workload performance predictor for different core types in an x86 processor. The predictions can then be exploited by a scheduler for heterogeneous processors to improve system throughput. The algorithms of focus are stochastic gradient descent based linear regression, decision trees, random forests, artificial neural networks, and k-nearest neighbors. This work has been supported by the European Research Council (ERC) Advanced Grant RoMoL (Grant Agreement 321253) and by the Spanish Ministry of Science and Innovation (contract TIN 2015-65316P).
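    As a hedged illustration of the kind of predictor the paper describes, the sketch below trains a random-forest regressor that maps per-quantum performance-counter features collected on one core type to the performance observed on another core type. The feature set, data shapes, and use of scikit-learn are illustrative assumptions, not the paper's actual pipeline.

    ```python
    # Minimal sketch (assumptions: scikit-learn available; features and targets
    # are per-quantum samples; this is not the paper's actual implementation).
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import mean_absolute_error

    rng = np.random.default_rng(0)

    # Hypothetical per-quantum features gathered on a "small" core:
    # e.g. IPC, L2 miss rate, branch mispredict rate, memory bandwidth.
    X = rng.random((5000, 4))
    # Hypothetical target: IPC the same quantum would achieve on a "big" core.
    y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.3 * rng.standard_normal(5000)

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X_train, y_train)

    pred = model.predict(X_test)
    print("MAE:", mean_absolute_error(y_test, pred))
    # A heterogeneous scheduler could rank threads by predicted big-core benefit
    # each quantum and map the most sensitive threads to the big cores.
    ```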

    An imperialist competitive algorithm for a bi-objective parallel machine scheduling problem with load balancing consideration

    In this paper, we present a new Imperialist Competitive Algorithm (ICA) to solve a bi-objective unrelated parallel machine scheduling problem where setup times are sequence dependent. The objectives are the mean completion time of jobs and the mean squared deviation of machine workloads from their average. The performance of the proposed ICA (PICA) is examined on randomly generated data and compared, in terms of objective function values, with three alternative methods: particle swarm optimization (PSO), the original version of the imperialist competitive algorithm (OICA), and a genetic algorithm (GA). The preliminary results indicate that the proposed method outperforms the alternatives. In addition, while OICA performs worst among the alternative solution strategies, PSO and GA perform better.
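    A minimal sketch of the two objectives, assuming a fixed assignment of job sequences to machines, unrelated processing times, and sequence-dependent setup times; the data structures and the toy instance below are illustrative placeholders, not the paper's notation or benchmark data.

    ```python
    # Minimal sketch: evaluate the two objectives for a fixed schedule.
    # Assumptions: proc[m][j] is the processing time of job j on machine m,
    # setup[m][i][j] is the setup time on machine m when job j follows job i,
    # and schedule[m] is the ordered job list assigned to machine m.
    import numpy as np

    def evaluate(schedule, proc, setup):
        completion_times = []
        workloads = []
        for m, jobs in enumerate(schedule):
            t = 0.0
            prev = None
            for j in jobs:
                if prev is not None:
                    t += setup[m][prev][j]
                t += proc[m][j]
                completion_times.append(t)
                prev = j
            workloads.append(t)
        workloads = np.array(workloads)
        mean_completion = float(np.mean(completion_times)) if completion_times else 0.0
        # Mean squared deviation of machine workloads from their average.
        load_balance = float(np.mean((workloads - workloads.mean()) ** 2))
        return mean_completion, load_balance

    # Toy usage: 2 machines, 3 jobs, machine 0 runs jobs 0 then 2, machine 1 runs job 1.
    proc = [[3, 2, 4], [2, 5, 3]]
    setup = [[[0, 1, 2], [1, 0, 1], [2, 1, 0]]] * 2
    print(evaluate([[0, 2], [1]], proc, setup))
    ```

    A metaheuristic such as PICA would search over schedules while scoring candidates with an evaluation of this kind, typically scalarizing or Pareto-ranking the two objectives.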

    Power-Performance Modeling and Adaptive Management of Heterogeneous Mobile Platforms

    Nearly 60% of the world population uses a mobile phone, which is typically powered by a system-on-chip (SoC). While mobile platform capabilities range widely, responsiveness, long battery life, and reliability are common design concerns that are crucial to remaining competitive. Consequently, state-of-the-art mobile platforms have become highly heterogeneous by combining a powerful SoC with numerous other resources, including the display, memory, power management IC, battery, and wireless modems. Furthermore, the SoC itself is a heterogeneous resource that integrates many processing elements, such as CPU cores, GPU, and video, image, and audio processors. Therefore, CPU cores do not dominate the platform power consumption under many application scenarios. Competitive performance requires higher operating frequencies and leads to larger power consumption. In turn, power consumption increases the junction and skin temperatures, which have adverse effects on device reliability and user experience. As a result, allocating the power budget among the major platform resources and controlling temperature have become fundamental considerations for mobile platforms. Dynamic thermal and power management algorithms address this problem by putting a subset of the processing elements or shared resources into sleep states, or by throttling their frequencies. However, an ad hoc approach can easily cripple performance if it slows down a performance-critical processing element. Furthermore, mobile platforms run a wide range of applications with time-varying workload characteristics, unlike early generations, which supported only limited functionality. As a result, there is a need for adaptive power and performance management approaches that consider the platform as a whole, rather than focusing on a subset. Towards this need, our specific contributions include (a) a framework to dynamically select the Pareto-optimal frequency and active cores for heterogeneous CPUs, such as the ARM big.LITTLE architecture, (b) a dynamic power budgeting approach for allocating optimal power consumption to the CPU and GPU using performance sensitivity models for each processing element, (c) an adaptive GPU frame-time sensitivity prediction model to aid power management algorithms, and (d) an online learning algorithm that constructs adaptive run-time models for non-stationary workloads.
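    As a hedged illustration of contribution (a), the sketch below enumerates (core count, frequency) configurations of a hypothetical CPU cluster, keeps the Pareto-optimal ones with respect to modeled performance and power, and picks the fastest configuration that fits a power budget. The performance and power models are simple placeholders, not the dissertation's actual models.

    ```python
    # Minimal sketch with assumed analytic models (not the dissertation's):
    # performance ~ cores * frequency, power ~ cores * (static + c * f^3).
    from itertools import product

    FREQS_GHZ = [0.6, 1.0, 1.4, 1.8]
    CORES = [1, 2, 3, 4]

    def perf(cores, f):   # hypothetical throughput model
        return cores * f

    def power(cores, f):  # hypothetical power model (watts)
        return cores * (0.2 + 0.5 * f ** 3)

    configs = [(c, f, perf(c, f), power(c, f)) for c, f in product(CORES, FREQS_GHZ)]

    # Keep Pareto-optimal points: no other config is both faster and lower power.
    pareto = [a for a in configs
              if not any((b[2] >= a[2] and b[3] < a[3]) or (b[2] > a[2] and b[3] <= a[3])
                         for b in configs)]

    def best_under_budget(budget_w):
        feasible = [c for c in pareto if c[3] <= budget_w]
        return max(feasible, key=lambda c: c[2]) if feasible else None

    print(best_under_budget(3.0))  # -> (cores, GHz, modeled perf, modeled watts)
    ```

    A runtime governor would refresh such models from measured counters and re-solve the selection each control interval rather than using fixed analytic formulas.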

    Mechanistic modeling of architectural vulnerability factor

    Reliability against soft errors is a significant design challenge in modern microprocessors owing to the exponential increase in the number of transistors on chip and the reduction in operating voltages with each process generation. Architectural Vulnerability Factor (AVF) modeling using microarchitectural simulators enables architects to make informed performance, power, and reliability tradeoffs. However, such simulators are time-consuming and do not reveal the microarchitectural mechanisms that influence AVF. In this article, we present an accurate first-order mechanistic analytical model to compute AVF, developed from the first principles of out-of-order superscalar execution. This model provides insight into the fundamental interactions between the workload and the microarchitecture that together influence AVF. We use the model to perform design space exploration, parametric sweeps, and workload characterization for AVF.
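    For context, the AVF of a hardware structure is commonly defined as the fraction of its bit-cycles that hold ACE (Architecturally Correct Execution) state. The sketch below computes that definition from a hypothetical per-cycle occupancy trace, which is the simulation-based quantity the paper's analytical model approximates; the variable names and trace format are illustrative assumptions.

    ```python
    # Minimal sketch: AVF of a structure over an execution window.
    # AVF = (sum over cycles of resident ACE bits) / (structure bits * cycles).
    # ace_bits_per_cycle is a hypothetical trace from a microarchitectural simulator.

    def structure_avf(ace_bits_per_cycle, structure_bits):
        cycles = len(ace_bits_per_cycle)
        if cycles == 0 or structure_bits == 0:
            return 0.0
        return sum(ace_bits_per_cycle) / (structure_bits * cycles)

    # Toy usage: a 64-entry, 32-bit-per-entry issue queue observed over 4 cycles.
    trace = [256, 512, 384, 128]          # ACE bits resident each cycle
    print(structure_avf(trace, 64 * 32))  # -> fraction between 0 and 1
    ```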

    User-aware performance evaluation and optimization of parallel job schedulers

    The dissertation "User-Aware Performance Evaluation and Optimization of Parallel Job Schedulers" deals with the realistic, dynamic simulation and optimization of load situations in parallel computing systems, taking into account feedback effects between performance and user behavior. The distinctive property of such systems is their shared use by multiple users, which restricts the availability of resources. If not all compute requests, the so-called jobs, can be executed at the same time, they are buffered in queues. Since user behavior is not known exactly, there is great uncertainty about future load situations. The goal is to find methods that produce a resource allocation matching the users' objectives as closely as possible, and to evaluate these methods realistically. It must also be taken into account that user behavior and user objectives vary depending on the load situation and the resource allocation. A three-part research approach is taken.

    Analysis of user behavior under resource constraints: Traces of parallel computing systems show that the waiting time for compute results correlates with future user behavior, i.e., on average it takes longer for a user who had to wait a long time to use the system again. Within the doctoral project this analysis was continued and additional correlations between further system parameters (besides the waiting time) and user behavior were uncovered. Furthermore, functions describing user satisfaction and user reaction to varying response times of computing systems were developed. These results were obtained through a survey among users of parallel computers at TU Dortmund, for which a dedicated questionnaire was designed.

    Modeling of user behavior and feedback: Because of the dynamic relationship between system speed and user behavior, allocation strategies have to be evaluated in dynamic, feedback-driven simulations. To this end, a multi-stage user model was developed that incorporates the current assumptions about user behavior and allows additional behavioral components to be added in the future. Its core elements so far comprise models for the day-and-night cycle, the working rhythm, and the properties of the submitted jobs. The dynamic feedback is designed such that only the completion of certain jobs triggers future job submissions; a sketch of this loop follows below.

    Optimization of allocation strategies to increase user satisfaction: The users' waiting-time acceptance, derived with the help of the questionnaire, has been optimized using a MILP. The MILP searches for solutions that start as many jobs as possible within an accepted waiting-time window, or that minimize the sum of delays. Due to the complexity of this optimization algorithm, the evaluation so far covers only fixed, static scenarios that reproduce snapshots of particular system and queue states. It is therefore planned to evaluate scheduling methods for increasing the number of submitted jobs and the waiting-time satisfaction with the help of the dynamic model.
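    A minimal sketch of the feedback loop described above, assuming a single first-come-first-served machine and a placeholder rule in which a user's think time before the next submission grows with the waiting time just experienced; the rule and all constants are illustrative, not the dissertation's fitted user model.

    ```python
    # Minimal sketch: feedback-driven user model on a single FCFS machine.
    # Assumption (illustrative only): think time before the next submission
    # grows linearly with the waiting time the user just experienced.
    import heapq

    def simulate(num_users=3, runtime=10.0, horizon=200.0):
        events = [(i * 1.0, i) for i in range(num_users)]  # (submit_time, user)
        heapq.heapify(events)
        machine_free = 0.0
        waits = []
        while events:
            submit, user = heapq.heappop(events)
            if submit > horizon:
                continue  # stop generating work past the simulation horizon
            start = max(submit, machine_free)
            machine_free = start + runtime
            wait = start - submit
            waits.append(wait)
            # Feedback: job completion triggers the next submission; a longer
            # wait leads to a longer pause before the user returns.
            think = 5.0 + 2.0 * wait
            heapq.heappush(events, (machine_free + think, user))
        return waits

    waits = simulate()
    print(f"jobs: {len(waits)}, mean wait: {sum(waits) / len(waits):.2f}")
    ```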

    High precision simulations of weak lensing effect on Cosmic Microwave Background polarization

    We study the accuracy, robustness, and self-consistency of pixel-domain simulations of the gravitational lensing effect on the primordial CMB anisotropies due to the large-scale structure of the Universe. In particular, we investigate how the precision of the results depends on some crucial parameters of such techniques, and we propose a semi-analytic framework to determine their values so that the required precision is assured a priori and the numerical workload is simultaneously optimized. Our focus is on the B-mode signal, but we also discuss other CMB observables, such as total intensity, T, and E-mode polarization, emphasizing differences and similarities between all these cases. Our semi-analytic considerations are backed up by extensive numerical results. Those are obtained using a code, nicknamed lenS2HAT -- for Lensing using Scalable Spherical Harmonic Transforms (S2HAT) -- which we have developed in the course of this work. The code implements a version of the pixel-domain approach of Lewis (2005) and permits performing the simulations at very high resolutions and data volumes, thanks to its efficient parallelization provided by the S2HAT library -- a parallel library for the calculation of spherical harmonic transforms. The code is made publicly available. Comment: 20 pages, 14 figures, submitted to A&A; matches version accepted for publication in A&A.
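    As a hedged, heavily simplified illustration of the pixel-domain idea, the sketch below remaps a flat-sky temperature map by the gradient of a lensing potential computed with FFTs and spline interpolation. The actual code (lenS2HAT) works on the curved sky with spherical harmonic transforms, so everything here, including the map sizes and the toy potential, is an assumption for illustration only.

    ```python
    # Minimal flat-sky sketch of pixel-domain lensing: T_len(x) = T_unl(x + grad(phi)).
    # Not the curved-sky lenS2HAT algorithm; all inputs are toy placeholders.
    import numpy as np
    from scipy.ndimage import map_coordinates

    n, pix = 256, 1.0e-3                           # grid size, pixel size in radians
    rng = np.random.default_rng(1)
    t_unlensed = rng.standard_normal((n, n))       # toy "CMB" map
    phi = rng.standard_normal((n, n)) * 1.0e-7     # toy lensing potential

    # Gradient of phi via FFTs (periodic flat-sky approximation).
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=pix)
    ky, kx = np.meshgrid(k, k, indexing="ij")
    phi_k = np.fft.fft2(phi)
    dphi_y = np.real(np.fft.ifft2(1j * ky * phi_k))
    dphi_x = np.real(np.fft.ifft2(1j * kx * phi_k))

    # Deflect pixel positions and interpolate the unlensed map at those points.
    yy, xx = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    coords = [yy + dphi_y / pix, xx + dphi_x / pix]
    t_lensed = map_coordinates(t_unlensed, coords, order=3, mode="wrap")

    print("rms change due to lensing remap:", np.std(t_lensed - t_unlensed))
    ```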

    GAMER: a GPU-Accelerated Adaptive Mesh Refinement Code for Astrophysics

    We present the newly developed code GAMER (GPU-accelerated Adaptive MEsh Refinement code), which adopts a novel approach to improving the performance of adaptive mesh refinement (AMR) astrophysical simulations by a large factor through the use of the graphics processing unit (GPU). The AMR implementation is based on a hierarchy of grid patches with an oct-tree data structure. We adopt a three-dimensional relaxing TVD scheme for the hydrodynamic solver and a multi-level relaxation scheme for the Poisson solver. Both solvers have been implemented on the GPU, by which hundreds of patches can be advanced in parallel. The computational overhead associated with data transfer between CPU and GPU is carefully reduced by utilizing the GPU's capability for asynchronous memory copies, and the computation of the ghost-zone values for each patch is hidden by overlapping it with the GPU computations. We demonstrate the accuracy of the code by performing several standard test problems in astrophysics. GAMER is a parallel code that can be run on a multi-GPU cluster system. We measure the performance of the code by performing purely baryonic cosmological simulations on different hardware configurations, in which detailed timing analyses provide comparisons between the computations with and without GPU acceleration. Maximum speed-up factors of 12.19 and 10.47 are demonstrated using 1 GPU with 4096^3 effective resolution and 16 GPUs with 8192^3 effective resolution, respectively. Comment: 60 pages, 22 figures, 3 tables; more accuracy tests are included; accepted for publication in ApJ.
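    To give a flavor of what a TVD hydrodynamic update looks like, the sketch below advances a 1D scalar advection equation with a minmod-limited second-order upwind scheme. This is only a toy illustration of the TVD idea, not GAMER's three-dimensional relaxing TVD solver or its GPU implementation.

    ```python
    # Toy 1D TVD illustration: linear advection u_t + a u_x = 0 with a minmod
    # limiter (not GAMER's 3D relaxing TVD scheme; purely a pedagogical sketch).
    import numpy as np

    def minmod(a, b):
        """Return the smaller-magnitude slope when signs agree, else zero."""
        return np.where(a * b > 0.0, np.sign(a) * np.minimum(np.abs(a), np.abs(b)), 0.0)

    def advect_tvd(u, a=1.0, cfl=0.5, steps=200):
        """Advance u on a periodic unit-length grid with a limited upwind scheme (a > 0)."""
        dx = 1.0 / u.size
        dt = cfl * dx / abs(a)
        for _ in range(steps):
            slope = minmod(u - np.roll(u, 1), np.roll(u, -1) - u)
            # Second-order upwind value at the right face of each cell.
            u_face = u + 0.5 * (1.0 - a * dt / dx) * slope
            flux = a * u_face
            u = u - dt / dx * (flux - np.roll(flux, 1))
        return u

    # Toy usage: a square pulse is advected without spurious oscillations.
    x = np.linspace(0.0, 1.0, 200, endpoint=False)
    u0 = np.where((x > 0.25) & (x < 0.5), 1.0, 0.0)
    u1 = advect_tvd(u0)
    print("min/max after advection:", u1.min(), u1.max())
    ```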