21,178 research outputs found

    Investigation into runtime workload classification and management for energy-efficient many-core systems

    Get PDF
    PhD ThesisRecent advances in semiconductor technology have facilitated placing many cores on a single chip. This has led to increases in system architecture complexity with diverse application workloads, with single or multiple applications running concurrently. Determining the most energy-efficient system configuration, i.e. the number of parallel threads, their core allocations and operating frequencies, tailored for each kind of workload and application concurrency scenario is extremely challenging because of the multifaceted relationships between these configuration knobs. Modelling and classifying the workloads can greatly simplify the runtime formulation of these relationships, delivering on energy efficiency, which is the key aim of this thesis. This thesis is focused on the development of new models for classifying single- and multi-application workloads in relation to how these workloads depend on the aforementioned system configurations. Underpinning these models, we implement and practically validate low-cost runtime methodologies for energy-efficient many-core processors. This thesis makes four major contributions. Firstly, a comprehensive study is presented that profiles the power consumption and performance characteristics of a multi-threaded many-core system workload, associating power consumption and performance with multiple concurrent applications. These applications are exercised on a heterogeneous platform generating varying system workloads, viz. CPU-intensive or memory-intensive or a combination of both. Fundamental to this study is an investigation of the tradeoffs between inter-application concurrency with performance and power consumption under different system configurations. The second is a novel model-based runtime optimization approach with the aim of achieving maximized power normalized performance considering dynamic variations of workload and application scenarios. Using real experimental measurements on a heterogeneous platform with a number of PARSEC benchmark applications, we study power normalized performance (in terms of IPS/Watt) underpinned with analytical power and performance models, derived through multivariate linear regression (MLR). Using these models we show that CPU intensive applications behave differently in IPS/Watt compared to memory intensive applications in both sequential and concurrent application scenarios. Furthermore, this approach demonstrate that it is possible to continuously adapt system configuration through a per-application runtime optimization algorithm, which can improve the IPS/Watt compared to the existing approach. Runtime overheads vii are at least three cycles for each frequency to determine the control action. To reduce overheads and complexity, a novel model-free runtime optimization approach with the aim of maximizing power-normalized performance considering dynamic workload variations has been proposed. This approach is the third contribution. This approach is based on workload classification. This classification is supported by analysis of data collected from a comprehensive study investigating the tradeoffsbetweeninter-applicationconcurrencywithperformanceand power under different system configurations. Extensive experiments have been carried out on heterogeneous and homogeneous platforms with synthetic and standard benchmark applications to develop the control policies and validate our approach. These experiments show that workload classification into CPU-intensive and memory-intensive types provides the foundation for scalable energy minimization with low complexity. Thefourthcontributioncombinesworkloadclassificationwithmodel based multivariate linear regression. The first approach has been used to reduce the problem complexity, and the second approach has been used for optimization in a reduced decision space using linearregression. This approach further improves IPS/Watt significantly compared to existing approaches. This thesis presents a new runtime governor framework which interfaces runtime management algorithms with system monitors and actuators. This tool is not tied down to the specific control algorithms presented in this thesis and therefore has much wider applications.Iraqi Ministry of Higher Education and Scientific Research and Mustansiriyah Universit

    Performance Characterization of Multi-threaded Graph Processing Applications on Intel Many-Integrated-Core Architecture

    Full text link
    Intel Xeon Phi many-integrated-core (MIC) architectures usher in a new era of terascale integration. Among emerging killer applications, parallel graph processing has been a critical technique to analyze connected data. In this paper, we empirically evaluate various computing platforms including an Intel Xeon E5 CPU, a Nvidia Geforce GTX1070 GPU and an Xeon Phi 7210 processor codenamed Knights Landing (KNL) in the domain of parallel graph processing. We show that the KNL gains encouraging performance when processing graphs, so that it can become a promising solution to accelerating multi-threaded graph applications. We further characterize the impact of KNL architectural enhancements on the performance of a state-of-the art graph framework.We have four key observations: 1 Different graph applications require distinctive numbers of threads to reach the peak performance. For the same application, various datasets need even different numbers of threads to achieve the best performance. 2 Only a few graph applications benefit from the high bandwidth MCDRAM, while others favor the low latency DDR4 DRAM. 3 Vector processing units executing AVX512 SIMD instructions on KNLs are underutilized when running the state-of-the-art graph framework. 4 The sub-NUMA cache clustering mode offering the lowest local memory access latency hurts the performance of graph benchmarks that are lack of NUMA awareness. At last, We suggest future works including system auto-tuning tools and graph framework optimizations to fully exploit the potential of KNL for parallel graph processing.Comment: published as L. Jiang, L. Chen and J. Qiu, "Performance Characterization of Multi-threaded Graph Processing Applications on Many-Integrated-Core Architecture," 2018 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), Belfast, United Kingdom, 2018, pp. 199-20

    Adaptive energy minimization of OpenMP parallel applications on many-core systems

    No full text
    Energy minimization of parallel applications is an emerging challenge for current and future generations of many-core computing systems. In this paper, we propose a novel and scalable energy minimization approach that suitably applies DVFS in the sequential part and jointly considers DVFS and dynamic core allocations in the parallel part. Fundamental to this approach is an iterative learning based control algorithm that adapt the voltage/frequency scaling and core allocations dynamically based on workload predictions and is guided by the CPU performance counters at regular intervals. The adaptation is facilitated through performance annotations in the application codes, defined in a modified OpenMP runtime library. The proposed approach is validated on an Intel Xeon E5-2630 platform with up to 24 CPUs running NAS parallel benchmark applications. We show that our proposed approach can effectively adapt to different architecture and core allocations and minimize energy consumption by up to 17% compared to the existing approaches for a given performance requirement

    Waveform Design for Secure SISO Transmissions and Multicasting

    Full text link
    Wireless physical-layer security is an emerging field of research aiming at preventing eavesdropping in an open wireless medium. In this paper, we propose a novel waveform design approach to minimize the likelihood that a message transmitted between trusted single-antenna nodes is intercepted by an eavesdropper. In particular, with knowledge first of the eavesdropper's channel state information (CSI), we find the optimum waveform and transmit energy that minimize the signal-to-interference-plus-noise ratio (SINR) at the output of the eavesdropper's maximum-SINR linear filter, while at the same time provide the intended receiver with a required pre-specified SINR at the output of its own max-SINR filter. Next, if prior knowledge of the eavesdropper's CSI is unavailable, we design a waveform that maximizes the amount of energy available for generating disturbance to eavesdroppers, termed artificial noise (AN), while the SINR of the intended receiver is maintained at the pre-specified level. The extensions of the secure waveform design problem to multiple intended receivers are also investigated and semidefinite relaxation (SDR) -an approximation technique based on convex optimization- is utilized to solve the arising NP-hard design problems. Extensive simulation studies confirm our analytical performance predictions and illustrate the benefits of the designed waveforms on securing single-input single-output (SISO) transmissions and multicasting
    corecore