34,814 research outputs found

    Power Modelling for Heterogeneous Cloud-Edge Data Centers

    Get PDF
    Existing power modelling research focuses not on the method used for developing models but rather on the model itself. This paper aims to develop a method for deploying power models on emerging processors that will be used, for example, in cloud-edge data centers. Our research first develops a hardware counter selection method that appropriately selects counters most correlated to power on ARM and Intel processors. Then, we propose a two stage power model that works across multiple architectures. The key results are: (i) the automated hardware performance counter selection method achieves comparable selection to the manual selection methods reported in literature, and (ii) the two stage power model can predict dynamic power more accurately on both ARM and Intel processors when compared to classic power models.Comment: 10 pages,10 figures,conferenc

    Coarse-grained reconfigurable array architectures

    Get PDF
    Coarse-Grained Reconfigurable Array (CGRA) architectures accelerate the same inner loops that benefit from the high ILP support in VLIW architectures. By executing non-loop code on other cores, however, CGRAs can focus on such loops to execute them more efficiently. This chapter discusses the basic principles of CGRAs, and the wide range of design options available to a CGRA designer, covering a large number of existing CGRA designs. The impact of different options on flexibility, performance, and power-efficiency is discussed, as well as the need for compiler support. The ADRES CGRA design template is studied in more detail as a use case to illustrate the need for design space exploration, for compiler support and for the manual fine-tuning of source code

    Ithemal: Accurate, Portable and Fast Basic Block Throughput Estimation using Deep Neural Networks

    Full text link
    Predicting the number of clock cycles a processor takes to execute a block of assembly instructions in steady state (the throughput) is important for both compiler designers and performance engineers. Building an analytical model to do so is especially complicated in modern x86-64 Complex Instruction Set Computer (CISC) machines with sophisticated processor microarchitectures in that it is tedious, error prone, and must be performed from scratch for each processor generation. In this paper we present Ithemal, the first tool which learns to predict the throughput of a set of instructions. Ithemal uses a hierarchical LSTM--based approach to predict throughput based on the opcodes and operands of instructions in a basic block. We show that Ithemal is more accurate than state-of-the-art hand-written tools currently used in compiler backends and static machine code analyzers. In particular, our model has less than half the error of state-of-the-art analytical models (LLVM's llvm-mca and Intel's IACA). Ithemal is also able to predict these throughput values just as fast as the aforementioned tools, and is easily ported across a variety of processor microarchitectures with minimal developer effort.Comment: Published at 36th International Conference on Machine Learning (ICML) 201

    Minimum entropy restoration using FPGAs and high-level techniques

    Get PDF
    One of the greatest perceived barriers to the widespread use of FPGAs in image processing is the difficulty for application specialists of developing algorithms on reconfigurable hardware. Minimum entropy deconvolution (MED) techniques have been shown to be effective in the restoration of star-field images. This paper reports on an attempt to implement a MED algorithm using simulated annealing, first on a microprocessor, then on an FPGA. The FPGA implementation uses DIME-C, a C-to-gates compiler, coupled with a low-level core library to simplify the design task. Analysis of the C code and output from the DIME-C compiler guided the code optimisation. The paper reports on the design effort that this entailed and the resultant performance improvements

    An overview of data acquisition, signal coding and data analysis techniques for MST radars

    Get PDF
    An overview is given of the data acquisition, signal processing, and data analysis techniques that are currently in use with high power MST/ST (mesosphere stratosphere troposphere/stratosphere troposphere) radars. This review supplements the works of Rastogi (1983) and Farley (1984) presented at previous MAP workshops. A general description is given of data acquisition and signal processing operations and they are characterized on the basis of their disparate time scales. Then signal coding, a brief description of frequently used codes, and their limitations are discussed, and finally, several aspects of statistical data processing such as signal statistics, power spectrum and autocovariance analysis, outlier removal techniques are discussed

    REPP-H: runtime estimation of power and performance on heterogeneous data centers

    Get PDF
    Modern data centers increasingly demand improved performance with minimal power consumption. Managing the power and performance requirements of the applications is challenging because these data centers, incidentally or intentionally, have to deal with server architecture heterogeneity [19], [22]. One critical challenge that data centers have to face is how to manage system power and performance given the different application behavior across multiple different architectures.This work has been supported by the EU FP7 program (Mont-Blanc 2, ICT-610402), by the Ministerio de Economia (CAP-VII, TIN2015-65316-P), and the Generalitat de Catalunya (MPEXPAR, 2014-SGR-1051). The material herein is based in part upon work supported by the US NSF, grant numbers ACI-1535232 and CNS-1305220.Peer ReviewedPostprint (author's final draft

    Temperature Regulation in Multicore Processors Using Adjustable-Gain Integral Controllers

    Full text link
    This paper considers the problem of temperature regulation in multicore processors by dynamic voltage-frequency scaling. We propose a feedback law that is based on an integral controller with adjustable gain, designed for fast tracking convergence in the face of model uncertainties, time-varying plants, and tight computing-timing constraints. Moreover, unlike prior works we consider a nonlinear, time-varying plant model that trades off precision for simple and efficient on-line computations. Cycle-level, full system simulator implementation and evaluation illustrates fast and accurate tracking of given temperature reference values, and compares favorably with fixed-gain controllers.Comment: 8 pages, 6 figures, IEEE Conference on Control Applications 2015, Accepted Versio
    corecore