
    PowerPack: Energy Profiling and Analysis of High-Performance Systems and Applications

    Energy efficiency is a major concern in modern high-performance computing system design. In the past few years, there has been mounting evidence that power usage limits system scale and computing density, and thus, ultimately, system performance. However, despite the impact of power and energy on the computer systems community, few studies provide insight into where and how power is consumed on high-performance systems and applications. In previous work, we designed a framework called PowerPack that was the first tool to isolate the power consumption of devices including disks, memory, NICs, and processors in a high-performance cluster and to correlate these measurements to application functions. In this work, we extend our framework to support systems with multicore, multiprocessor-based nodes, and then provide in-depth analyses of the energy consumption of parallel applications on clusters of these systems. These analyses include the impact of chip multiprocessing on power and energy efficiency and its interaction with application execution. In addition, we use PowerPack to study the power dynamics and energy efficiency of dynamic voltage and frequency scaling (DVFS) techniques on clusters. Our experiments reveal conclusively how intelligent DVFS scheduling can enhance system energy efficiency while maintaining performance.
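
    As a loose illustration of the kind of measurement hooks such a framework relies on, and not PowerPack's actual API, the Python sketch below timestamps application phases so externally logged power samples can be aligned with them, and requests a lower CPU frequency through the Linux cpufreq sysfs interface. The sysfs path, the 'userspace' governor, and root permissions are assumptions; the phase names are hypothetical.

    ```python
    # Illustrative sketch only, not the PowerPack API: tag application phases with
    # timestamps for later correlation with power samples, and optionally request a
    # fixed CPU frequency via the Linux cpufreq sysfs interface (assumed path).
    import time

    CPUFREQ = "/sys/devices/system/cpu/cpu{cpu}/cpufreq/scaling_setspeed"

    def set_frequency_khz(cpu: int, freq_khz: int) -> None:
        """Request a fixed frequency (assumes the 'userspace' governor and root)."""
        with open(CPUFREQ.format(cpu=cpu), "w") as f:
            f.write(str(freq_khz))

    phase_log = []  # (phase name, start time, end time) for later correlation

    def run_phase(name, func, *args):
        start = time.time()
        result = func(*args)
        phase_log.append((name, start, time.time()))
        return result

    if __name__ == "__main__":
        # Example idea: drop to a lower frequency during a memory-bound phase,
        # where DVFS typically costs little performance but saves energy.
        # set_frequency_khz(0, 1_200_000)   # hypothetical 1.2 GHz setting
        run_phase("demo_phase", sum, range(1_000_000))
        print(phase_log)
    ```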

    FIFTY YEARS OF MICROPROCESSOR EVOLUTION: FROM SINGLE CPU TO MULTICORE AND MANYCORE SYSTEMS

    Nowadays microprocessors are among the most complex electronic systems that man has ever designed. One small silicon chip can contain the complete processor, large memory, and the logic needed to connect it to input-output devices. The performance of today's processors implemented on a single chip surpasses the performance of a room-sized supercomputer from just 50 years ago, which cost over $10 million [1]. Even the embedded processors found in everyday devices such as mobile phones are far more powerful than computer developers once imagined. The main components of a modern microprocessor are a number of general-purpose cores, a graphics processing unit, a shared cache, a memory and input-output interface, and a network on chip to interconnect all these components [2]. The speed of the microprocessor is determined by its clock frequency and cannot exceed a certain limit: as the frequency increases, the power dissipation increases too, and consequently the amount of heating becomes critical. Silicon manufacturers therefore decided to design a new processor architecture, called multicore processors [3]. To increase performance and efficiency, these multiple cores execute multiple instructions simultaneously, increasing the amount of parallel computing, or parallelism [4]. Despite these advantages, numerous challenges must be addressed carefully when more cores and more parallelism are used. This paper presents a review of microprocessor microarchitectures, discussing their generations over the past 50 years. It then describes the currently used implementations of the microarchitecture of modern microprocessors, pointing out the specifics of parallel computing in heterogeneous microprocessor systems. To use multi-core technology efficiently, software applications must be multithreaded, and program execution must be distributed among the cores so they can operate simultaneously; to use multi-threading, it is imperative for the programmer to understand the basic principles of parallel computing and parallel hardware. Finally, the paper provides details on how to implement hardware parallelism in multicore systems.
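
    As a minimal, generic illustration of the kind of work distribution the abstract calls for, and not code from the paper, the Python sketch below splits a CPU-bound sum across all available cores. Process-based workers are used because CPython threads do not execute CPU-bound code in parallel; the workload itself is invented for illustration.

    ```python
    # Minimal sketch: distribute a CPU-bound computation across all cores by
    # splitting the index range into one chunk per worker process.
    from multiprocessing import Pool, cpu_count

    def partial_sum(bounds):
        lo, hi = bounds
        return sum(i * i for i in range(lo, hi))

    if __name__ == "__main__":
        n = 10_000_000
        workers = cpu_count()
        step = n // workers
        chunks = [(w * step, n if w == workers - 1 else (w + 1) * step)
                  for w in range(workers)]
        with Pool(workers) as pool:
            total = sum(pool.map(partial_sum, chunks))
        print(total)
    ```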

    Algorithms for Extracting Frequent Episodes in the Process of Temporal Data Mining

    An important aspect of the data mining process is the discovery of patterns that have a great influence on the studied problem. The purpose of this paper is to study frequent episode data mining through the use of parallel pattern discovery algorithms. Parallel pattern discovery algorithms offer better performance and scalability, so they are of great interest to the data mining research community. In the following, several parallel and distributed frequent pattern mining algorithms on various platforms are highlighted, and a comparative study of their main features is presented. The study takes into account the new possibilities that arise with the emerging Compute Unified Device Architecture of the latest generation of graphics processing units. Based on their high performance, low cost, and the increasing number of features offered, GPUs are viable solutions for an optimal implementation of frequent pattern mining algorithms. Keywords: Frequent Pattern Mining, Parallel Computing, Dynamic Load Balancing, Temporal Data Mining, CUDA, GPU, Fermi, Thread
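
    As a hedged sketch of one common parallelization strategy for this problem family, count-distribution style support counting rather than any algorithm taken from the paper, the Python example below partitions a transaction database among worker processes, counts candidate 2-itemsets locally, and merges the partial counts. The toy database and the minimum support value are invented for illustration.

    ```python
    # Generic count-distribution sketch: each worker counts candidate 2-itemsets in
    # its own partition of the database, and the partial counts are summed.
    from collections import Counter
    from itertools import combinations
    from multiprocessing import Pool

    def count_pairs(transactions):
        """Count all 2-itemsets occurring in one partition of the database."""
        counts = Counter()
        for t in transactions:
            for pair in combinations(sorted(set(t)), 2):
                counts[pair] += 1
        return counts

    if __name__ == "__main__":
        db = [["a", "b", "c"], ["a", "c"], ["b", "c", "d"], ["a", "b", "c", "d"]]
        partitions = [db[0::2], db[1::2]]          # split transactions among workers
        with Pool(len(partitions)) as pool:
            partials = pool.map(count_pairs, partitions)
        total = sum(partials, Counter())
        min_support = 2
        frequent = {k: v for k, v in total.items() if v >= min_support}
        print(frequent)
    ```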

    Multiprocessing techniques for unmanned multifunctional satellites: Final report

    Simulation of an on-board multiprocessor for long-lived unmanned space satellite control.

    Partitioning of computationally intensive tasks between FPGA and CPUs

    With the recent development of faster and more complex Multiprocessor Systems-on-Chip (MPSoCs), a large number of different resources have become available on a single chip. For example, Xilinx's UltraScale+ is a powerful MPSoC with four ARM Cortex-A53 CPUs, two Cortex-R5 real-time cores, an FPGA fabric, and a Mali-400 GPU. Optimal partitioning between CPUs, real-time cores, GPU, and FPGA is therefore a challenge. For many scientific applications with high sampling rates and real-time signal analysis, an FFT needs to be calculated and analyzed directly in the measuring device. The goal of partitioning such an FFT in an MPSoC is to make the best use of the available resources, to minimize latency, and to optimize performance. The paper compares different partitioning designs and discusses their advantages and disadvantages. Measurement results with up to 250 MSamples per second are shown.
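
    As a rough illustration of one input to such a partitioning decision, and not the UltraScale+ design evaluated in the paper, the Python sketch below measures CPU-side FFT latency for several block sizes so it can be compared against the throughput available in the FPGA fabric. NumPy is assumed to be available; the block sizes and repeat count are arbitrary choices.

    ```python
    # Measure mean CPU-side FFT latency per block size; such numbers can be weighed
    # against FPGA throughput when deciding where the transform should run.
    import time
    import numpy as np

    def cpu_fft_latency(n, repeats=50):
        x = np.random.random(n).astype(np.complex64)
        start = time.perf_counter()
        for _ in range(repeats):
            np.fft.fft(x)
        return (time.perf_counter() - start) / repeats

    if __name__ == "__main__":
        for n in (1024, 4096, 16384, 65536):
            t = cpu_fft_latency(n)
            print(f"N={n:6d}  mean latency {t * 1e6:8.1f} us  "
                  f"~{n / t / 1e6:6.1f} MSamples/s")
    ```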

    Implementation of Asymmetric Multiprocessing Support in a Real-Time Operating System

    The semiconductor industry can no longer afford to rely on decreasing the size of the die and increasing the frequency of operation to achieve higher performance. An alternative that has been proven to increase performance is multiprocessing. Multiprocessing refers to the concept of running more than one application or task on more than one central processor. Multi-core processors are the main engine of multiprocessing. In asymmetric multiprocessing, each core in a multi-core system is independent and has its own code that determines its execution. These cores must be able to communicate and to synchronize access to resources.
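
    As a loose host-side analogy rather than real-time operating system code, the Python sketch below models two independent workers, standing in for the independent cores of an asymmetric multiprocessing system: each runs its own code, they exchange messages through a queue, and they synchronize access to a shared counter with a lock. The worker names and message contents are hypothetical.

    ```python
    # Analogy only: two independent workers communicate via a queue and synchronize
    # access to a shared resource (a counter) with a lock.
    from multiprocessing import Process, Queue, Lock, Value

    def core0(q, lock, shared):
        for i in range(3):
            with lock:                 # synchronized access to the shared resource
                shared.value += 1
            q.put(("core0", i))        # message to the other worker

    def core1(q, lock, shared):
        for _ in range(3):
            sender, seq = q.get()      # blocks until core0 sends
            with lock:
                print(f"core1 got {sender}/{seq}, shared={shared.value}")

    if __name__ == "__main__":
        q, lock, shared = Queue(), Lock(), Value("i", 0)
        p0 = Process(target=core0, args=(q, lock, shared))
        p1 = Process(target=core1, args=(q, lock, shared))
        p0.start(); p1.start()
        p0.join(); p1.join()
    ```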