7 research outputs found

    Compiler-assisted Adaptive Program Scheduling in big.LITTLE Systems

    Energy-aware architectures provide applications with a mix of low-frequency (LITTLE) and high-frequency (big) cores. Choosing the best hardware configuration for a program running on such an architecture is difficult, because different parts of the program benefit differently from the same hardware configuration. State-of-the-art techniques for this problem adapt the program's execution to dynamic characteristics of the runtime environment, such as energy consumption and throughput. We claim that these purely dynamic techniques can be improved if they are aware of the program's syntactic structure. To support this claim, we show how to use the compiler to partition source code into program phases: regions whose syntactic characteristics lead to similar runtime behavior. We use reinforcement learning to map pairs formed by a program phase and a hardware state to the configuration that best fits this setup. To demonstrate the effectiveness of our ideas, we have implemented the Astro system. Astro uses Q-learning to associate syntactic features of programs with hardware configurations. As a proof of concept, we provide evidence that Astro outperforms GTS, the ARM-based Linux scheduler tailored for heterogeneous architectures, on the parallel benchmarks from Rodinia and Parsec.
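The mapping from (program phase, hardware state) pairs to configurations described in this abstract can be illustrated with a minimal tabular Q-learning sketch. It is not Astro's implementation: the phase names, hardware states, configuration labels, reward model, and hyperparameters below are all hypothetical and chosen only to keep the example small.

```python
import random
from collections import defaultdict

# All names below are illustrative assumptions, not taken from Astro.
PHASES = ["cpu_bound", "memory_bound"]
HW_STATES = ["cool", "hot"]
CONFIGS = ["4big", "4little"]


def toy_reward(phase, config):
    """Made-up reward: big cores pay off only for CPU-bound phases."""
    if phase == "cpu_bound":
        return 1.0 if config == "4big" else 0.2
    return 1.0 if config == "4little" else 0.3


def train(episodes=5000, alpha=0.1, gamma=0.5, epsilon=0.2, seed=0):
    """Learn Q-values keyed by ((phase, hw_state), config)."""
    rng = random.Random(seed)
    q = defaultdict(float)
    state = (rng.choice(PHASES), rng.choice(HW_STATES))
    for _ in range(episodes):
        # Epsilon-greedy choice of a hardware configuration.
        if rng.random() < epsilon:
            config = rng.choice(CONFIGS)
        else:
            config = max(CONFIGS, key=lambda c: q[(state, c)])
        reward = toy_reward(state[0], config)
        # In this toy environment the next phase and state are random.
        next_state = (rng.choice(PHASES), rng.choice(HW_STATES))
        best_next = max(q[(next_state, c)] for c in CONFIGS)
        # Standard Q-learning update rule.
        q[(state, config)] += alpha * (
            reward + gamma * best_next - q[(state, config)]
        )
        state = next_state
    return q


def best_config(q, phase, hw_state):
    """Return the configuration with the highest learned Q-value."""
    return max(CONFIGS, key=lambda c: q[((phase, hw_state), c)])
```

After training, `best_config(q, "memory_bound", "hot")` looks up the learned preference for that pair. In a real system the reward would come from measured energy and throughput rather than a fixed table.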

    Consumo de energía y asignaturas de arquitectura y tecnología de computadores

    The energy consumption of programs has become a performance measure as important as runtime, although it is rarely included in program performance metrics. For this reason, courses in computer architecture and technology should cover the energy performance of programs and architectures, and should have tools available to characterize the instantaneous power and the energy consumed according to the profile of the code being executed. This could enable approaches that distribute the workload among the hardware to reach a trade-off between time and energy efficiency. The search for such trade-offs amounts to a multi-objective approach to performance evaluation that should be introduced in Computer Engineering and Computer Science courses. This paper also describes an Arduino-based system for measuring instantaneous power and consumed energy in practical exercises and projects concerned with generating optimal code for a given platform.
    Universidad de Granada: Departamento de Arquitectura y Tecnología de Computadores; Vicerrectorado para la Garantía de la Calidad. Project TIN2015-67020-P (Ministerio de Economía y Competitividad and FEDER funds).

    Energy-aware Load Balancing of Parallel Evolutionary Algorithms with Heavy Fitness Functions in Heterogeneous CPU-GPU Architectures

    Thanks to mechanisms such as Dynamic Voltage and Frequency Scaling (DVFS) and to heterogeneous architectures that include processors with different power consumption profiles, it is possible to devise scheduling algorithms that are aware of both runtime and energy consumption in parallel programs. In this paper, we propose and evaluate a multi-objective (more specifically, a bi-objective) approach to distribute the workload among the processing cores of a given heterogeneous parallel CPU-GPU architecture. The aim of this distribution may be either to save energy without increasing the running time or to reach a trade-off between time and energy consumption. The parallel programs considered here are master-worker evolutionary algorithms in which evaluating the fitness function for the individuals in the population accounts for most of the computing time. As many useful bioinformatics and data mining applications exhibit this kind of parallel profile, the proposed energy-aware approach to workload scheduling could be applied frequently.
    Spanish Ministerio de Economía y Competitividad under grant TIN2015-67020-P ERDF fun
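The bi-objective view of workload distribution can be sketched as follows. This is an illustration under simple assumptions, not the paper's method: the linear time and energy model, the throughput and power figures, and the function names are all hypothetical, and idle power is ignored.

```python
def evaluate_split(gpu_fraction, work=1000.0,
                   cpu_rate=50.0, gpu_rate=200.0,
                   cpu_power=60.0, gpu_power=150.0):
    """Return (runtime, energy) for a CPU/GPU split under a toy model.

    Rates are work units per second and powers are watts; all values
    are made-up assumptions. Devices run in parallel, so the runtime
    is set by the slower one.
    """
    cpu_time = (1.0 - gpu_fraction) * work / cpu_rate
    gpu_time = gpu_fraction * work / gpu_rate
    runtime = max(cpu_time, gpu_time)
    energy = cpu_power * cpu_time + gpu_power * gpu_time
    return runtime, energy


def pareto_splits(fractions):
    """Keep the splits whose (runtime, energy) no other split dominates."""
    scored = [(f, *evaluate_split(f)) for f in fractions]
    front = []
    for f, t, e in scored:
        dominated = any(
            t2 <= t and e2 <= e and (t2 < t or e2 < e)
            for f2, t2, e2 in scored if f2 != f
        )
        if not dominated:
            front.append((f, t, e))
    return front


if __name__ == "__main__":
    for f, t, e in pareto_splits([i / 10 for i in range(11)]):
        print(f"gpu_fraction={f:.1f} runtime={t:.2f}s energy={e:.0f}J")
```

Under these made-up numbers the non-dominated splits are the ones that trade a slightly longer runtime for lower energy. A scheduler would then pick one point on that front according to its goal (save energy at equal runtime, or accept a trade-off).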

    SoCodeCNN: Program Source Code for Visual CNN Classification Using Computer Vision Methodology

    Automated feature extraction from program source-code such that proper computing resources could be allocated to the program is very difficult given the current state of technology. Therefore, conventional methods call for skilled human intervention in order to achieve the task of feature extraction from programs. This research is the first to propose a novel human-inspired approach to automatically convert program source-codes to visual images. The images could then be utilized for automated classification by a visual convolutional neural network (CNN) based algorithm. Experimental results show high prediction accuracy in classifying the types of program in a completely automated manner using this approach.
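The abstract does not say how source code is turned into an image, so the following is only one simple possibility, not the SoCodeCNN pipeline: treat the UTF-8 bytes of the source text as grayscale pixel intensities and pad them into a square grid. The function name and the zero-padding choice are assumptions.

```python
import math


def source_to_grayscale(source: str):
    """Map source text to a square grid of 0-255 pixel values.

    Each UTF-8 byte becomes one pixel; the tail is zero-padded so the
    grid is square. This is an illustrative encoding only.
    """
    data = source.encode("utf-8")
    side = math.isqrt(len(data))
    if side * side < len(data):
        side += 1
    side = max(side, 1)  # an empty source still yields a 1x1 image
    padded = data + bytes(side * side - len(data))
    return [list(padded[r * side:(r + 1) * side]) for r in range(side)]
```

A grid like this could be resized and passed to an image classifier; whether a byte-level encoding preserves enough structure for a CNN is an empirical question that this sketch does not address.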

    An Intelligent Framework for Energy-Aware Mobile Computing Subject to Stochastic System Dynamics

    User satisfaction is pivotal to the success of mobile applications. At the same time, it is imperative to maximize the energy efficiency of the mobile device, so that its limited energy source is used optimally while the necessary levels of user satisfaction are maintained. However, this is complicated by user interactions, numerous shared resources, and network conditions, which introduce substantial uncertainty into the mobile device's performance and power characteristics. In this dissertation, a new approach is presented to characterize and control mobile devices that accurately models these uncertainties. The proposed modeling framework is a completely data-driven approach to predicting power and performance. The approach makes no assumptions about the distributions of the underlying sources of uncertainty and is capable of predicting power and performance with over 93% accuracy. Using this data-driven prediction framework, a closed-loop solution to the DEM problem is derived to maximize the energy efficiency of the mobile device subject to various thermal, reliability and deadline constraints. The design of the controller imposes minimal operational overhead and is able to tune the performance and power prediction models to changing system conditions. The proposed controller is implemented on a real mobile platform, the Google Pixel smartphone, and demonstrates a 19% improvement in energy efficiency over the standard frequency governor implemented on all Android devices.
    Dissertation/Thesis: Doctoral Dissertation, Computer Engineering 201

    Worst-Case Execution Time Guarantees for Runtime-Reconfigurable Architectures

    Real-time systems are ubiquitous in our everyday life, e.g., in safety-critical domains such as automotive, avionics or robotics. The correctness of a real-time system does not only depend on the correctness of its calculations, but also on the non-functional requirement of adhering to deadlines. Failing to meet a deadline may lead to severe malfunctions, therefore worst-case execution times (WCET) need to be guaranteed. Despite significant scientific advances, however, timing analysis of WCET guarantees lags years behind current high-performance microarchitectures with out-of-order scheduling pipelines, several hardware threads and multiple (shared) cache layers. To satisfy the increasing performance demands of real-time systems, analyzable performance features are required. In order to escape the scarcity of timing-analyzable performance features, the main contribution of this thesis is the introduction of runtime reconfiguration of hardware accelerators onto a field-programmable gate array (FPGA) as a novel means to achieve performance that is amenable to WCET guarantees. Instead of designing an architecture for a specific application domain, this approach preserves the flexibility of the system. First, this thesis contributes novel co-scheduling approaches to distribute work among CPU and GPU in an extensive analysis of how (average-case) performance is achieved on fused CPU-GPU architectures, a main trend in current high-performance microarchitectures that combines a CPU and a GPU on a single chip. Being able to employ such architectures in real-time systems would be highly desirable, because they provide high performance within a limited area and power budget. As a result of this analysis, however, a cache coherency bottleneck is uncovered in recent fused CPU-GPU architectures that share the last level cache between CPU and GPU. 
This insight (i) complicates performance predictions and (ii) adds a shared last level cache between CPU and GPU to the growing list of microarchitectural features that benefit average-case performance but render the analysis of WCET guarantees on high-performance architectures virtually infeasible. This further motivates the need for novel microarchitectural features that provide predictable performance and are amenable to timing analysis. Towards this end, a runtime reconfiguration controller called "Command-based Reconfiguration Queue" (CoRQ) is presented that provides guaranteed latencies for its operations, especially for the reconfiguration delay, i.e., the time it takes to reconfigure a hardware accelerator onto a reconfigurable fabric (e.g., FPGA). CoRQ enables the design of timing-analyzable runtime-reconfigurable architectures that support WCET guarantees. Based on the now feasible guaranteed reconfiguration delay of accelerators, a WCET analysis is introduced that enables tasks to reconfigure application-specific custom instructions (CIs) at runtime. CIs are executed by a processor pipeline and invoke execution of one or more accelerators. Different measures to deal with reconfiguration delays are compared for their impact on accelerated WCET guarantees and overestimation. The timing anomaly of runtime reconfiguration is identified and safely bounded: a case where executing iterations of a computational kernel faster than in WCET during reconfiguration of CIs can prolong the total execution time of a task. Once tasks that perform runtime reconfiguration of CIs can be analyzed for WCET guarantees, the question arises of which CIs to configure on a constrained reconfigurable area to optimize the WCET. The question is addressed for systems where multiple CIs, each with different implementations (allowing latency to be traded off against area requirements), can be selected. This is generally the case, e.g., when employing high-level synthesis. 
This so-called WCET-optimizing instruction set selection problem is modeled based on the Implicit Path Enumeration Technique (IPET), which is the path analysis technique state-of-the-art timing analyzers rely on. To our knowledge, this is the first approach that enables WCET optimization with support for making use of global program flow information (and information about reconfiguration delay). An optimal algorithm (similar to Branch and Bound) and a fast greedy heuristic algorithm (that achieves the optimal solution in most cases) are presented. Finally, an approach is presented that, for the first time, combines optimized static WCET guarantees and runtime optimization of the average-case execution (maintaining WCET guarantees) using runtime reconfiguration of hardware accelerators by leveraging runtime slack (the amount of time that program parts are executed faster than in WCET). It comprises an analysis of runtime slack bounds that enable safe reconfiguration for average-case performance under WCET guarantees and presents a mechanism to monitor runtime slack using a simple performance counter that is commonly available in many microprocessors. Ultimately, this thesis shows that runtime reconfiguration of accelerators is a key feature to achieve predictable performance.