
    RPPM: Rapid Performance Prediction of Multithreaded workloads on multicore processors

    Analytical performance modeling is a useful complement to detailed cycle-level simulation to quickly explore the design space in an early design stage. Mechanistic analytical modeling is particularly interesting as it provides deep insight and does not require expensive offline profiling, unlike empirical modeling. Previous work in mechanistic analytical modeling, unfortunately, is limited to single-threaded applications running on single-core processors. This work proposes RPPM, a mechanistic analytical performance model for multi-threaded applications on multicore hardware. RPPM collects microarchitecture-independent characteristics of a multi-threaded workload to predict performance on a previously unseen multicore architecture. The profile needs to be collected only once to predict performance across a range of processor architectures. We evaluate RPPM's accuracy against simulation and report a performance prediction error of 11.2% on average (23% max). We demonstrate RPPM's usefulness for conducting design space exploration experiments as well as for analyzing parallel application performance.
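
    To make the idea of a mechanistic model driven by a microarchitecture-independent profile more concrete, here is a minimal sketch in the spirit of interval-style analytical modeling. The profile fields, penalty terms, and the simple additive-penalty formula are illustrative assumptions for exposition, not RPPM's actual equations.

```python
# Illustrative sketch only: the profile fields and the additive-penalty formula
# are assumptions for exposition, not RPPM's published model.
from dataclasses import dataclass

@dataclass
class ThreadProfile:              # microarchitecture-independent statistics
    instructions: int             # dynamic instruction count
    branch_mispredictions: int    # mispredicted branches
    llc_misses: int               # misses for the target's last-level cache size

@dataclass
class TargetCore:                 # parameters of the (unseen) target core
    dispatch_width: int           # instructions dispatched per cycle
    branch_penalty: int           # cycles lost per misprediction
    memory_latency: int           # cycles per off-chip access

def predict_active_cycles(p: ThreadProfile, c: TargetCore) -> float:
    """Interval-style estimate: base dispatch time plus stall penalties."""
    base = p.instructions / c.dispatch_width
    stalls = (p.branch_mispredictions * c.branch_penalty
              + p.llc_misses * c.memory_latency)
    return base + stalls

# The same profile, collected once, evaluated on two hypothetical core configurations.
profile = ThreadProfile(instructions=10_000_000,
                        branch_mispredictions=50_000,
                        llc_misses=20_000)
for core in (TargetCore(2, 15, 200), TargetCore(4, 15, 150)):
    print(core, predict_active_cycles(profile, core))
```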

    Bringing Real Processors to Labs

    This is the accepted version of the following article: Gómez, C., Gómez, M. E. and Sahuquillo, J. (2015), Bringing real processors to labs. Comput Appl Eng Educ, 23: 724–732, which has been published in final form at http://dx.doi.org/10.1002/cae.21645.

    The architecture of current processors has experienced great changes in recent years, leading to sophisticated multithreaded multicore processors. The inherent complexity of such processors makes it difficult to update processor teaching to include current commercial products, especially in lab sessions, where simplistic simulators are usually used. However, instructors must reduce this gap if they want to properly prepare students in this topic. Dealing with these complex concepts in labs not only helps reinforce theoretical concepts but also has a positive effect on student motivation. This article presents a methodology designed for the study of current microprocessor mechanisms in a gradual way without overwhelming students. The methodology is based on the use of a detailed simulation framework, used both in academia and in industry, which accurately models features of current processors. Due to the huge complexity of the simulator, it is introduced through several learning phases. Qualitative and quantitative results demonstrate that students are able to develop skills in a detailed simulator in a reasonable time period and, at the same time, learn the details of complex architectural mechanisms of commercial microprocessors.

    Contract grant sponsor: Spanish Government; Contract grant number: TIN2012-38341-C04-01

    An Efficient Thread Mapping Strategy for Multiprogramming on Manycore Processors

    The emergence of multicore and manycore processors is set to change the parallel computing world. Applications are shifting towards increased parallelism in order to utilise these architectures efficiently. This leads to a situation where every application creates its desirable number of threads, based on its degree of parallelism and the system resources available to it. Task scheduling in such a multithreaded multiprogramming environment is a significant challenge. In task scheduling, not only the order of execution, but also the mapping of threads to the execution resources is of great importance. In this paper we state and discuss some fundamental rules based on results obtained from selected applications of the BOTS benchmarks on the 64-core TILEPro64 processor. We demonstrate how previously efficient mapping policies, such as those of the SMP Linux scheduler, become inefficient when the number of threads and cores grows. We propose a novel, low-overhead technique, a heuristic based on the amount of time spent by each CPU doing useful work, to fairly distribute the workloads amongst the cores in a multiprogramming environment. Our novel approach could be implemented as a pragma similar to those in the new task-based OpenMP versions, or can be incorporated as a distributed thread mapping mechanism in future manycore programming frameworks. We show that our thread mapping scheme can outperform the native GNU/Linux thread scheduler in both single-programming and multiprogramming environments. Comment: ParCo Conference, Munich, Germany, 201
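
    The abstract describes the heuristic only at a high level, so the following sketch is a hedged illustration of one plausible reading: greedily map each incoming thread to the core that has so far accumulated the least useful-work time. The data source and the greedy policy are assumptions, not the authors' exact algorithm.

```python
# Hedged sketch: per-core busy time and the "least busy core" greedy policy are
# illustrative assumptions, not the paper's exact thread mapping mechanism.
import heapq

def map_threads(thread_ids, busy_time_per_core):
    """Assign each thread to the core that has spent the least time doing
    useful work so far, to balance load across cores."""
    # Min-heap of (accumulated_busy_time, core_id)
    heap = [(busy, core) for core, busy in enumerate(busy_time_per_core)]
    heapq.heapify(heap)
    mapping = {}
    for tid in thread_ids:
        busy, core = heapq.heappop(heap)
        mapping[tid] = core
        # Assume one unit of work per mapped thread when updating the estimate;
        # a real scheduler would feed back measured per-CPU useful-work time.
        heapq.heappush(heap, (busy + 1, core))
    return mapping

# Example: map 8 threads onto 4 cores with uneven prior load.
print(map_threads(range(8), [5, 0, 3, 1]))
```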

    RPPM: rapid performance prediction of multithreaded applications on multicore hardware

    This paper proposes RPPM which, based on a microarchitecture-independent profile of a multithreaded application, predicts its performance on a previously unseen multicore platform. RPPM breaks up multithreaded program execution into epochs based on synchronization primitives, and then predicts per-epoch active execution times for each thread and synchronization overhead to arrive at a prediction for overall application performance. RPPM predicts performance within 12 percent on average (27 percent max error) compared to cycle-level simulation. We present a case study to illustrate that RPPM can be used for making accurate multicore design trade-offs early in the design cycle.
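
    The epoch-based aggregation described above can be illustrated with a short sketch. The per-epoch data layout and the "slowest thread plus synchronization overhead" rule are assumptions about how the predicted pieces might combine, not RPPM's published formulation.

```python
# Hedged sketch of the epoch-based aggregation idea; the combination rule
# below is an illustrative assumption, not RPPM's exact model.

def predict_total_time(epochs):
    """epochs: list of dicts with predicted per-thread active times (seconds)
    and a predicted synchronization overhead for the epoch boundary."""
    total = 0.0
    for epoch in epochs:
        # Threads synchronize at the end of the epoch (e.g., at a barrier),
        # so the epoch lasts as long as its slowest thread...
        active = max(epoch["per_thread_active_time"])
        # ...plus the predicted cost of the synchronization primitive itself.
        total += active + epoch["sync_overhead"]
    return total

example = [
    {"per_thread_active_time": [1.2, 1.5, 1.1, 1.4], "sync_overhead": 0.05},
    {"per_thread_active_time": [0.9, 0.8, 1.0, 0.7], "sync_overhead": 0.02},
]
print(predict_total_time(example))  # 1.55 + 1.02 = 2.57
```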

    The "MIND" Scalable PIM Architecture

    MIND (Memory, Intelligence, and Network Device) is an advanced parallel computer architecture for high performance computing and scalable embedded processing. It is a Processor-in-Memory (PIM) architecture integrating both DRAM bit cells and CMOS logic devices on the same silicon die. MIND is multicore with multiple memory/processor nodes on each chip and supports global shared memory across systems of MIND components. MIND is distinguished from other PIM architectures in that it incorporates mechanisms for efficient support of a global parallel execution model based on the semantics of message-driven multithreaded split-transaction processing. MIND is designed to operate either in conjunction with other conventional microprocessors or in standalone arrays of like devices. It also incorporates mechanisms for fault tolerance, real time execution, and active power management. This paper describes the major elements and operational methods of the MIND architecture.
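
    As a rough illustration of the execution-model terms used above (message-driven work initiation and split-transaction remote accesses), here is a toy software analogue. The parcel format, handler registry, and two-phase read are invented for exposition; they are not MIND's actual hardware mechanisms.

```python
# Toy analogue of message-driven, split-transaction processing across PIM nodes.
# Parcel format, handler registry, and node structure are illustrative assumptions.
from collections import deque

HANDLERS = {}
def handler(name):
    def register(fn):
        HANDLERS[name] = fn
        return fn
    return register

class Node:
    def __init__(self, node_id, network):
        self.id, self.network = node_id, network
        self.inbox, self.memory = deque(), {}

    def send(self, dest, action, **payload):
        # Message-driven: work starts when a parcel arrives, not by polling.
        self.network[dest].inbox.append((action, payload))

    def run_once(self):
        if self.inbox:
            action, payload = self.inbox.popleft()
            HANDLERS[action](self, **payload)

@handler("read_request")   # split transaction, phase 1: request travels to the data
def read_request(node, addr, reply_to):
    node.send(reply_to, "read_reply", addr=addr, value=node.memory.get(addr))

@handler("read_reply")     # split transaction, phase 2: reply resumes the waiting work
def read_reply(node, addr, value):
    print(f"node {node.id}: remote [{addr}] = {value}")

# Two nodes sharing a global address space via parcels instead of blocking loads.
network = {}
network[0], network[1] = Node(0, network), Node(1, network)
network[1].memory["x"] = 42
network[0].send(1, "read_request", addr="x", reply_to=0)
for _ in range(2):
    for n in network.values():
        n.run_once()
```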