21,178 research outputs found

Investigation into runtime workload classification and management for energy-efficient many-core systems

Author: Aalsaud Ali Majeed Mohammed.
Publication venue: Newcastle University
Publication date: 01/01/2019

PhD Thesis. Recent advances in semiconductor technology have facilitated placing many cores on a single chip. This has led to increases in system architecture complexity with diverse application workloads, with single or multiple applications running concurrently. Determining the most energy-efficient system configuration, i.e. the number of parallel threads, their core allocations and operating frequencies, tailored for each kind of workload and application concurrency scenario is extremely challenging because of the multifaceted relationships between these configuration knobs. Modelling and classifying the workloads can greatly simplify the runtime formulation of these relationships, delivering on energy efficiency, which is the key aim of this thesis. This thesis is focused on the development of new models for classifying single- and multi-application workloads in relation to how these workloads depend on the aforementioned system configurations. Underpinning these models, we implement and practically validate low-cost runtime methodologies for energy-efficient many-core processors. This thesis makes four major contributions. Firstly, a comprehensive study is presented that profiles the power consumption and performance characteristics of a multi-threaded many-core system workload, associating power consumption and performance with multiple concurrent applications. These applications are exercised on a heterogeneous platform generating varying system workloads, viz. CPU-intensive, memory-intensive or a combination of both. Fundamental to this study is an investigation of the trade-offs between inter-application concurrency, performance and power consumption under different system configurations. The second contribution is a novel model-based runtime optimization approach that aims to maximize power-normalized performance under dynamic variations of workload and application scenarios. 
Using real experimental measurements on a heterogeneous platform with a number of PARSEC benchmark applications, we study power-normalized performance (in terms of IPS/Watt) underpinned by analytical power and performance models, derived through multivariate linear regression (MLR). Using these models we show that CPU-intensive applications behave differently in IPS/Watt compared to memory-intensive applications in both sequential and concurrent application scenarios. Furthermore, this approach demonstrates that it is possible to continuously adapt system configuration through a per-application runtime optimization algorithm, which can improve the IPS/Watt compared to the existing approach. Runtime overheads are at least three cycles for each frequency to determine the control action. To reduce overheads and complexity, a novel model-free runtime optimization approach with the aim of maximizing power-normalized performance considering dynamic workload variations has been proposed. This approach is the third contribution. It is based on workload classification, which is supported by analysis of data collected from a comprehensive study investigating the trade-offs between inter-application concurrency, performance and power under different system configurations. Extensive experiments have been carried out on heterogeneous and homogeneous platforms with synthetic and standard benchmark applications to develop the control policies and validate our approach. These experiments show that workload classification into CPU-intensive and memory-intensive types provides the foundation for scalable energy minimization with low complexity. The fourth contribution combines workload classification with model-based multivariate linear regression. The first approach has been used to reduce the problem complexity, and the second approach has been used for optimization in a reduced decision space using linear regression. 
This approach further improves IPS/Watt significantly compared to existing approaches. This thesis presents a new runtime governor framework which interfaces runtime management algorithms with system monitors and actuators. This tool is not tied down to the specific control algorithms presented in this thesis and therefore has much wider applications. Iraqi Ministry of Higher Education and Scientific Research and Mustansiriyah Universit
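
The model-based idea in the abstract above (fit power and performance models by multivariate linear regression, then choose the configuration with the best IPS/Watt) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the two features (thread count and frequency in GHz), the synthetic training values, the helper names `fit_mlr`, `predict` and `ips_per_watt`, and the exhaustive search over candidates, which merely stands in for the thesis's runtime algorithm.

```python
def fit_mlr(features, targets):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y."""
    X = [[1.0, *row] for row in features]  # prepend an intercept column
    n = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(n)]
         + [sum(x[i] * y for x, y in zip(X, targets))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(n):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][n] / A[i][i] for i in range(n)]


def predict(coeffs, config):
    """Evaluate the fitted linear model for one (threads, freq_ghz) config."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], config))


# Synthetic "profiling" samples (NOT measurements from the thesis):
# each config is (threads, frequency in GHz).
CONFIGS = [(t, f) for t in (1, 2, 4) for f in (0.6, 1.0, 1.4)]
POWER_W = [0.5 + 0.3 * t + 1.0 * f for t, f in CONFIGS]  # made-up watts
GIPS = [0.1 + 0.4 * t + 0.8 * f for t, f in CONFIGS]     # made-up giga-IPS

power_model = fit_mlr(CONFIGS, POWER_W)
ips_model = fit_mlr(CONFIGS, GIPS)


def ips_per_watt(config):
    """Power-normalized performance predicted by the two fitted models."""
    return predict(ips_model, config) / predict(power_model, config)


# Exhaustive search over a small candidate space (illustrative only).
candidates = [(t, f) for t in range(1, 5) for f in (0.6, 1.0, 1.4)]
best = max(candidates, key=ips_per_watt)
print("best (threads, GHz):", best)
```

In a real governor the training samples would come from performance counters and power sensors, and the candidate space would be pruned first, for example by workload class as the fourth contribution describes.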

Newcastle University eTheses

Performance Characterization of Multi-threaded Graph Processing Applications on Intel Many-Integrated-Core Architecture

Author: Chen Langshi
Jiang Lei
Qiu Judy
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 24/02/2019

Intel Xeon Phi many-integrated-core (MIC) architectures usher in a new era of terascale integration. Among emerging killer applications, parallel graph processing has been a critical technique to analyze connected data. In this paper, we empirically evaluate various computing platforms including an Intel Xeon E5 CPU, an NVIDIA GeForce GTX1070 GPU and a Xeon Phi 7210 processor codenamed Knights Landing (KNL) in the domain of parallel graph processing. We show that the KNL achieves encouraging performance when processing graphs, making it a promising solution for accelerating multi-threaded graph applications. We further characterize the impact of KNL architectural enhancements on the performance of a state-of-the-art graph framework. We have four key observations: (1) Different graph applications require different numbers of threads to reach peak performance. Even for the same application, different datasets need different numbers of threads to achieve the best performance. (2) Only a few graph applications benefit from the high-bandwidth MCDRAM, while others favor the low-latency DDR4 DRAM. (3) Vector processing units executing AVX512 SIMD instructions on KNLs are underutilized when running the state-of-the-art graph framework. (4) The sub-NUMA cache clustering mode, which offers the lowest local memory access latency, hurts the performance of graph benchmarks that lack NUMA awareness. Finally, we suggest future work including system auto-tuning tools and graph framework optimizations to fully exploit the potential of KNL for parallel graph processing. Comment: published as L. Jiang, L. Chen and J. Qiu, "Performance Characterization of Multi-threaded Graph Processing Applications on Many-Integrated-Core Architecture," 2018 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Belfast, United Kingdom, 2018, pp. 199-20

arXiv.org e-Print Archive

IUScholarWorks Open

Adaptive energy minimization of OpenMP parallel applications on many-core systems

Author: Al-Hashimi Bashir
Das Anup K.
Merrett Geoff V.
Shafik Rishad Ahmed
Yang Sheng

Energy minimization of parallel applications is an emerging challenge for current and future generations of many-core computing systems. In this paper, we propose a novel and scalable energy minimization approach that suitably applies DVFS in the sequential part and jointly considers DVFS and dynamic core allocations in the parallel part. Fundamental to this approach is an iterative learning-based control algorithm that adapts the voltage/frequency scaling and core allocations dynamically based on workload predictions and is guided by the CPU performance counters at regular intervals. The adaptation is facilitated through performance annotations in the application codes, defined in a modified OpenMP runtime library. The proposed approach is validated on an Intel Xeon E5-2630 platform with up to 24 CPUs running NAS parallel benchmark applications. We show that our proposed approach can effectively adapt to different architecture and core allocations and minimize energy consumption by up to 17% compared to the existing approaches for a given performance requirement.
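
A minimal sketch of the kind of control loop this abstract describes: predict the next interval's workload from observed performance counters, then pick the frequency and core count that minimize estimated energy while meeting a performance requirement. The paper's actual learning algorithm and models are not given here, so every concrete element below is an assumption: the exponentially weighted moving average predictor, the cubic dynamic-power model and its constants, the ideal parallel scaling, and the names `ewma_predict` and `pick_setting`.

```python
# Hypothetical power-model constants, per core (not taken from the paper).
STATIC_W = 0.5          # static power in watts
DYN_W_PER_GHZ3 = 1.0    # dynamic power coefficient in watts per GHz^3


def ewma_predict(previous, observed, alpha=0.5):
    """Blend the last prediction with the newly observed cycle count."""
    return alpha * observed + (1.0 - alpha) * previous


def pick_setting(predicted_cycles, deadline_s, freqs_ghz, max_cores):
    """Return (energy_J, cores, freq_GHz) with minimal estimated energy
    among settings that meet the deadline, or None if none is feasible.
    Ties in energy resolve to fewer cores via tuple ordering."""
    feasible = []
    for cores in range(1, max_cores + 1):
        for f in freqs_ghz:
            # Assumes the parallel part scales ideally with core count.
            time_s = predicted_cycles / (f * 1e9 * cores)
            if time_s <= deadline_s:
                power_w = cores * (STATIC_W + DYN_W_PER_GHZ3 * f ** 3)
                feasible.append((power_w * time_s, cores, f))
    return min(feasible) if feasible else None


# Toy control loop over made-up per-interval cycle counts.
prediction = 2.0e9
setting = None
for observed in (4.0e9, 3.0e9, 3.0e9):
    prediction = ewma_predict(prediction, observed)
    setting = pick_setting(prediction, deadline_s=1.2,
                           freqs_ghz=[1.0, 1.5, 2.0], max_cores=4)
print("chosen (energy J, cores, GHz):", setting)
```

Under this simplified model the energy of an ideally scaling region does not depend on the core count, so the search effectively selects the lowest frequency that still meets the deadline with the available cores; a more realistic model would add parallel overheads and shared (uncore) power.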

Southampton (e-Prints Soton)

Waveform Design for Secure SISO Transmissions and Multicasting

Author: Dimitris A. Pados
Ipan Kundu
Ming Li
Stella N. Batalama
Publication date: 03/06/2013

Wireless physical-layer security is an emerging field of research aiming at preventing eavesdropping in an open wireless medium. In this paper, we propose a novel waveform design approach to minimize the likelihood that a message transmitted between trusted single-antenna nodes is intercepted by an eavesdropper. In particular, first assuming knowledge of the eavesdropper's channel state information (CSI), we find the optimum waveform and transmit energy that minimize the signal-to-interference-plus-noise ratio (SINR) at the output of the eavesdropper's maximum-SINR linear filter, while at the same time providing the intended receiver with a required pre-specified SINR at the output of its own max-SINR filter. Next, if prior knowledge of the eavesdropper's CSI is unavailable, we design a waveform that maximizes the amount of energy available for generating disturbance to eavesdroppers, termed artificial noise (AN), while the SINR of the intended receiver is maintained at the pre-specified level. The extensions of the secure waveform design problem to multiple intended receivers are also investigated, and semidefinite relaxation (SDR), an approximation technique based on convex optimization, is utilized to solve the arising NP-hard design problems. Extensive simulation studies confirm our analytical performance predictions and illustrate the benefits of the designed waveforms on securing single-input single-output (SISO) transmissions and multicasting.

arXiv.org e-Print Archive

CiteSeerX