34,814 research outputs found
Power Modelling for Heterogeneous Cloud-Edge Data Centers
Existing power modelling research focuses not on the method used for
developing models but rather on the model itself. This paper aims to develop a
method for deploying power models on emerging processors that will be used, for
example, in cloud-edge data centers. Our research first develops a hardware
counter selection method that appropriately selects counters most correlated to
power on ARM and Intel processors. Then, we propose a two stage power model
that works across multiple architectures. The key results are: (i) the
automated hardware performance counter selection method achieves comparable
selection to the manual selection methods reported in literature, and (ii) the
two stage power model can predict dynamic power more accurately on both ARM and
Intel processors when compared to classic power models.Comment: 10 pages,10 figures,conferenc
Coarse-grained reconfigurable array architectures
Coarse-Grained Reconfigurable Array (CGRA) architectures accelerate the same inner loops that benefit from the high ILP support in VLIW architectures. By executing non-loop code on other cores, however, CGRAs can focus on such loops to execute them more efficiently. This chapter discusses the basic principles of CGRAs, and the wide range of design options available to a CGRA designer, covering a large number of existing CGRA designs. The impact of different options on flexibility, performance, and power-efficiency is discussed, as well as the need for compiler support. The ADRES CGRA design template is studied in more detail as a use case to illustrate the need for design space exploration, for compiler support and for the manual fine-tuning of source code
Ithemal: Accurate, Portable and Fast Basic Block Throughput Estimation using Deep Neural Networks
Predicting the number of clock cycles a processor takes to execute a block of
assembly instructions in steady state (the throughput) is important for both
compiler designers and performance engineers. Building an analytical model to
do so is especially complicated in modern x86-64 Complex Instruction Set
Computer (CISC) machines with sophisticated processor microarchitectures in
that it is tedious, error prone, and must be performed from scratch for each
processor generation. In this paper we present Ithemal, the first tool which
learns to predict the throughput of a set of instructions. Ithemal uses a
hierarchical LSTM--based approach to predict throughput based on the opcodes
and operands of instructions in a basic block. We show that Ithemal is more
accurate than state-of-the-art hand-written tools currently used in compiler
backends and static machine code analyzers. In particular, our model has less
than half the error of state-of-the-art analytical models (LLVM's llvm-mca and
Intel's IACA). Ithemal is also able to predict these throughput values just as
fast as the aforementioned tools, and is easily ported across a variety of
processor microarchitectures with minimal developer effort.Comment: Published at 36th International Conference on Machine Learning (ICML)
201
Minimum entropy restoration using FPGAs and high-level techniques
One of the greatest perceived barriers to the widespread use of FPGAs in image processing is the difficulty for application specialists of developing algorithms on reconfigurable hardware. Minimum entropy deconvolution (MED) techniques have been shown to be effective in the restoration of star-field images. This paper reports on an attempt to implement a MED algorithm using simulated annealing, first on a microprocessor, then on an FPGA. The FPGA implementation uses DIME-C, a C-to-gates compiler, coupled with a low-level core library to simplify the design task. Analysis of the C code and output from the DIME-C compiler guided the code optimisation. The paper reports on the design effort that this entailed and the resultant performance improvements
An overview of data acquisition, signal coding and data analysis techniques for MST radars
An overview is given of the data acquisition, signal processing, and data analysis techniques that are currently in use with high power MST/ST (mesosphere stratosphere troposphere/stratosphere troposphere) radars. This review supplements the works of Rastogi (1983) and Farley (1984) presented at previous MAP workshops. A general description is given of data acquisition and signal processing operations and they are characterized on the basis of their disparate time scales. Then signal coding, a brief description of frequently used codes, and their limitations are discussed, and finally, several aspects of statistical data processing such as signal statistics, power spectrum and autocovariance analysis, outlier removal techniques are discussed
REPP-H: runtime estimation of power and performance on heterogeneous data centers
Modern data centers increasingly demand improved performance with minimal power consumption. Managing the power and performance requirements of the applications is challenging because these data centers, incidentally or intentionally, have to deal with server architecture heterogeneity [19], [22]. One critical challenge that data centers have to face is how to manage system power and performance given the different application behavior across multiple different architectures.This work has been supported by the EU FP7 program (Mont-Blanc 2, ICT-610402), by the
Ministerio de Economia (CAP-VII, TIN2015-65316-P), and the Generalitat de Catalunya (MPEXPAR, 2014-SGR-1051).
The material herein is based in part upon work supported by the US NSF, grant numbers ACI-1535232 and CNS-1305220.Peer ReviewedPostprint (author's final draft
Temperature Regulation in Multicore Processors Using Adjustable-Gain Integral Controllers
This paper considers the problem of temperature regulation in multicore
processors by dynamic voltage-frequency scaling. We propose a feedback law that
is based on an integral controller with adjustable gain, designed for fast
tracking convergence in the face of model uncertainties, time-varying plants,
and tight computing-timing constraints. Moreover, unlike prior works we
consider a nonlinear, time-varying plant model that trades off precision for
simple and efficient on-line computations. Cycle-level, full system simulator
implementation and evaluation illustrates fast and accurate tracking of given
temperature reference values, and compares favorably with fixed-gain
controllers.Comment: 8 pages, 6 figures, IEEE Conference on Control Applications 2015,
Accepted Versio
- …