424 research outputs found

    Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review

    Get PDF
    The technological evolution has increased the number of transistors for a given die area significantly and increased the switching speed from few MHz to GHz range. Such inversely proportional decline in size and boost in performance consequently demands shrinking of supply voltage and effective power dissipation in chips with millions of transistors. This has triggered substantial amount of research in power reduction techniques into almost every aspect of the chip and particularly the processor cores contained in the chip. This paper presents an overview of techniques for achieving the power efficiency mainly at the processor core level but also visits related domains such as buses and memories. There are various processor parameters and features such as supply voltage, clock frequency, cache and pipelining which can be optimized to reduce the power consumption of the processor. This paper discusses various ways in which these parameters can be optimized. Also, emerging power efficient processor architectures are overviewed and research activities are discussed which should help reader identify how these factors in a processor contribute to power consumption. Some of these concepts have been already established whereas others are still active research areas. © 2009 ACADEMY PUBLISHER

    Proactive Aging Mitigation in CGRAs through Utilization-Aware Allocation

    Full text link
    Resource balancing has been effectively used to mitigate the long-term aging effects of Negative Bias Temperature Instability (NBTI) in multi-core and Graphics Processing Unit (GPU) architectures. In this work, we investigate this strategy in Coarse-Grained Reconfigurable Arrays (CGRAs) with a novel application-to-CGRA allocation approach. By introducing important extensions to the reconfiguration logic and the datapath, we enable the dynamic movement of configurations throughout the fabric and allow overutilized Functional Units (FUs) to recover from stress-induced NBTI aging. Implementing the approach in a resource-constrained state-of-the-art CGRA reveals 2.2×2.2\times lifetime improvement with negligible performance overheads and less than 10%10\% increase in area.Comment: Please cite this as: M. Brandalero, B. N. Lignati, A. Carlos Schneider Beck, M. Shafique and M. H\"ubner, "Proactive Aging Mitigation in CGRAs through Utilization-Aware Allocation," 2020 57th ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 2020, pp. 1-6, doi: 10.1109/DAC18072.2020.921858

    MURAC: A unified machine model for heterogeneous computers

    Get PDF
    Includes bibliographical referencesHeterogeneous computing enables the performance and energy advantages of multiple distinct processing architectures to be efficiently exploited within a single machine. These systems are capable of delivering large performance increases by matching the applications to architectures that are most suited to them. The Multiple Runtime-reconfigurable Architecture Computer (MURAC) model has been proposed to tackle the problems commonly found in the design and usage of these machines. This model presents a system-level approach that creates a clear separation of concerns between the system implementer and the application developer. The three key concepts that make up the MURAC model are a unified machine model, a unified instruction stream and a unified memory space. A simple programming model built upon these abstractions provides a consistent interface for interacting with the underlying machine to the user application. This programming model simplifies application partitioning between hardware and software and allows the easy integration of different execution models within the single control ow of a mixed-architecture application. The theoretical and practical trade-offs of the proposed model have been explored through the design of several systems. An instruction-accurate system simulator has been developed that supports the simulated execution of mixed-architecture applications. An embedded System-on-Chip implementation has been used to measure the overhead in hardware resources required to support the model, which was found to be minimal. An implementation of the model within an operating system on a tightly-coupled reconfigurable processor platform has been created. This implementation is used to extend the software scheduler to allow for the full support of mixed-architecture applications in a multitasking environment. Different scheduling strategies have been tested using this scheduler for mixed-architecture applications. The design and implementation of these systems has shown that a unified abstraction model for heterogeneous computers provides important usability benefits to system and application designers. These benefits are achieved through a consistent view of the multiple different architectures to the operating system and user applications. This allows them to focus on achieving their performance and efficiency goals by gaining the benefits of different execution models during runtime without the complex implementation details of the system-level synchronisation and coordination

    Rewriting History: Repurposing Domain-Specific CGRAs

    Full text link
    Coarse-grained reconfigurable arrays (CGRAs) are domain-specific devices promising both the flexibility of FPGAs and the performance of ASICs. However, with restricted domains comes a danger: designing chips that cannot accelerate enough current and future software to justify the hardware cost. We introduce FlexC, the first flexible CGRA compiler, which allows CGRAs to be adapted to operations they do not natively support. FlexC uses dataflow rewriting, replacing unsupported regions of code with equivalent operations that are supported by the CGRA. We use equality saturation, a technique enabling efficient exploration of a large space of rewrite rules, to effectively search through the program-space for supported programs. We applied FlexC to over 2,000 loop kernels, compiling to four different research CGRAs and 300 generated CGRAs and demonstrate a 2.2×\times increase in the number of loop kernels accelerated leading to 3×\times speedup compared to an Arm A5 CPU on kernels that would otherwise be unsupported by the accelerator

    Extending the HybridThread SMP Model for Distributed Memory Systems

    Get PDF
    Memory Hierarchy is of growing importance in system design today. As Moore\u27s Law allows system designers to include more processors within their designs, data locality becomes a priority. Traditional multiprocessor systems on chip (MPSoC) experience difficulty scaling as the quantity of processors increases. This challenge is common behavior of memory accesses in a shared memory environment and causes a decrease in memory bandwidth as processor numbers increase. In order to provide the necessary levels of scalability, the computer architecture community has sought to decentralize memory accesses by distributing memory throughout the system. Distributed memory offers greater bandwidth due to decoupled access paths. Today\u27s million gate Field Programmable Gate Arrays (FPGA) offer an invaluable opportunity to explore this type of memory hierarchy. FPGA vendors such as Xilinx provide dual-ported on-chip memory for decoupled access in addition to configurable sized memories. In this work, a new platform was created around the use of dual-ported SRAMs for distributed memory to explore the possible scalability of this form of memory hierarchy. However, developing distributed memory poses a tremendous challenge: supporting a linear address space that allows wide applicability to be achieved. Many have agreed that a linear address space eases the programmability of a system. Although the abstraction of disjointed memories via underlying architecture and/or new programming presents an advantage in exploring the possibilities of distributed memory, automatic data partitioning and migration remains a considerable challenge. In this research this challenge was dealt with by the inclusion of both a shared memory and distributed memory model. This research is vital because exposing the programmer to the underlying architecture while providing a linear address space results in desired standards of programmability and performance alike. In addition, standard shared memory programming models can be applied allowing the user to enjoy full scalable performance potential

    Embedded electronic systems driven by run-time reconfigurable hardware

    Get PDF
    Abstract This doctoral thesis addresses the design of embedded electronic systems based on run-time reconfigurable hardware technology –available through SRAM-based FPGA/SoC devices– aimed at contributing to enhance the life quality of the human beings. This work does research on the conception of the system architecture and the reconfiguration engine that provides to the FPGA the capability of dynamic partial reconfiguration in order to synthesize, by means of hardware/software co-design, a given application partitioned in processing tasks which are multiplexed in time and space, optimizing thus its physical implementation –silicon area, processing time, complexity, flexibility, functional density, cost and power consumption– in comparison with other alternatives based on static hardware (MCU, DSP, GPU, ASSP, ASIC, etc.). The design flow of such technology is evaluated through the prototyping of several engineering applications (control systems, mathematical coprocessors, complex image processors, etc.), showing a high enough level of maturity for its exploitation in the industry.Resumen Esta tesis doctoral abarca el diseño de sistemas electrónicos embebidos basados en tecnología hardware dinámicamente reconfigurable –disponible a través de dispositivos lógicos programables SRAM FPGA/SoC– que contribuyan a la mejora de la calidad de vida de la sociedad. Se investiga la arquitectura del sistema y del motor de reconfiguración que proporcione a la FPGA la capacidad de reconfiguración dinámica parcial de sus recursos programables, con objeto de sintetizar, mediante codiseño hardware/software, una determinada aplicación particionada en tareas multiplexadas en tiempo y en espacio, optimizando así su implementación física –área de silicio, tiempo de procesado, complejidad, flexibilidad, densidad funcional, coste y potencia disipada– comparada con otras alternativas basadas en hardware estático (MCU, DSP, GPU, ASSP, ASIC, etc.). Se evalúa el flujo de diseño de dicha tecnología a través del prototipado de varias aplicaciones de ingeniería (sistemas de control, coprocesadores aritméticos, procesadores de imagen, etc.), evidenciando un nivel de madurez viable ya para su explotación en la industria.Resum Aquesta tesi doctoral està orientada al disseny de sistemes electrònics empotrats basats en tecnologia hardware dinàmicament reconfigurable –disponible mitjançant dispositius lògics programables SRAM FPGA/SoC– que contribueixin a la millora de la qualitat de vida de la societat. S’investiga l’arquitectura del sistema i del motor de reconfiguració que proporcioni a la FPGA la capacitat de reconfiguració dinàmica parcial dels seus recursos programables, amb l’objectiu de sintetitzar, mitjançant codisseny hardware/software, una determinada aplicació particionada en tasques multiplexades en temps i en espai, optimizant així la seva implementació física –àrea de silici, temps de processat, complexitat, flexibilitat, densitat funcional, cost i potència dissipada– comparada amb altres alternatives basades en hardware estàtic (MCU, DSP, GPU, ASSP, ASIC, etc.). S’evalúa el fluxe de disseny d’aquesta tecnologia a través del prototipat de varies aplicacions d’enginyeria (sistemes de control, coprocessadors aritmètics, processadors d’imatge, etc.), demostrant un nivell de maduresa viable ja per a la seva explotació a la indústria

    Proceedings of the 5th International Workshop on Reconfigurable Communication-centric Systems on Chip 2010 - ReCoSoC\u2710 - May 17-19, 2010 Karlsruhe, Germany. (KIT Scientific Reports ; 7551)

    Get PDF
    ReCoSoC is intended to be a periodic annual meeting to expose and discuss gathered expertise as well as state of the art research around SoC related topics through plenary invited papers and posters. The workshop aims to provide a prospective view of tomorrow\u27s challenges in the multibillion transistor era, taking into account the emerging techniques and architectures exploring the synergy between flexible on-chip communication and system reconfigurability
    • …
    corecore