
    Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review

    Technological evolution has significantly increased the number of transistors for a given die area and raised switching speeds from a few MHz to the GHz range. This shrinking of feature sizes coupled with the boost in performance demands lower supply voltages and effective power dissipation in chips containing millions of transistors, and has triggered a substantial amount of research into power reduction in almost every aspect of the chip, particularly the processor cores it contains. This paper presents an overview of techniques for achieving power efficiency mainly at the processor core level, but also visits related domains such as buses and memories. Various processor parameters and features, such as supply voltage, clock frequency, caches and pipelining, can be optimized to reduce the power consumption of the processor; this paper discusses the ways in which each can be optimized. Emerging power-efficient processor architectures and ongoing research activities are also reviewed, which should help the reader identify how these factors contribute to a processor's power consumption. Some of these concepts are already well established, whereas others are still active research areas. © 2009 ACADEMY PUBLISHER
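
    All of the knobs named above act on the standard dynamic-power relation for CMOS logic; stating it explicitly (textbook form, not taken from the paper itself) makes the trade-offs concrete:

        P_{\mathrm{dyn}} = \alpha \, C \, V_{dd}^{2} \, f

    where \alpha is the switching activity factor, C the switched capacitance, V_{dd} the supply voltage and f the clock frequency. The quadratic dependence on V_{dd} is why supply-voltage scaling dominates the surveyed techniques: halving the voltage cuts dynamic power roughly fourfold, at the price of a lower attainable clock frequency.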

    A hardware-assisted translation cache for dynamic binary translation in embedded systems

    Approaches to Dynamic Binary Translation (DBT) on resource-constrained embedded systems are not straightforward, which has led to several improvement and acceleration proposals that rely on dedicated hardware. Software-to-hardware offloading is a common acceleration procedure when software-only approaches do not meet performance requirements, making it well suited to DBT. This article applies hardware offloading to address limitations of an in-house DBT engine, DBTOR, in its Translation Cache (TCache) management mechanism. The suggested approaches are non-intrusive to the target architecture, which copes with the commercial-off-the-shelf (COTS)-driven deployment of DBT on resource-constrained embedded devices. This work proposes a TCache management hardware module that outperforms the linked-list and hash-table software-only approaches, resulting in performance improvements of 25% and 26%, respectively. This work has been supported by COMPETE: POCI-01-0145-FEDER-007043 and FCT - Fundação para a Ciência e Tecnologia within the Project Scope: UID/CEC/00319/2013.
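
    The abstract does not detail DBTOR's internals, but the operation the proposed hardware module replaces is easy to picture. The C sketch below (all names and sizes hypothetical) shows a chained hash-table lookup from a guest program counter to a translated code block, i.e. the kind of software-only TCache baseline that the hardware variant outperforms:

        #include <stdint.h>
        #include <stddef.h>

        #define TCACHE_SLOTS 1024          /* power of two; size is hypothetical */

        typedef struct tc_entry {
            uint32_t guest_pc;             /* address of the block in the guest binary */
            void    *host_code;            /* pointer to the translated host code      */
            struct tc_entry *next;         /* chain for colliding guest addresses      */
        } tc_entry;

        static tc_entry *tcache[TCACHE_SLOTS];

        /* Look up a previously translated block; NULL means "translate first". */
        static void *tcache_lookup(uint32_t guest_pc)
        {
            tc_entry *e = tcache[guest_pc & (TCACHE_SLOTS - 1)];
            while (e != NULL) {
                if (e->guest_pc == guest_pc)
                    return e->host_code;   /* hit: dispatch to cached translation */
                e = e->next;               /* collision: walk the chain           */
            }
            return NULL;                   /* cold block: run the translator      */
        }

    Every indirect branch in the translated program pays for this hashing and chain walk; moving the map into a dedicated hardware module takes it off the critical dispatch path, which is the cost the reported 25% and 26% improvements eliminate.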

    Instrumenting self-modifying code

    Adding small code snippets at key points in existing code fragments is called instrumentation. It is an established technique for debugging certain otherwise hard-to-solve faults, such as memory management issues and data races. Dynamic instrumentation can already be used to analyse code that is loaded or even generated at run time. With the advent of environments such as the Java Virtual Machine with optimizing Just-In-Time compilers, a new obstacle arises: self-modifying code. To instrument this kind of code correctly, one must be able to detect modifications and adapt the instrumentation code accordingly, preferably without incurring a high speed penalty. In this paper we propose an innovative technique that uses the hardware page-protection mechanism of modern processors to detect such modifications. We also show how an instrumentor can adapt the instrumented version depending on the kind of modifications, and present an experimental evaluation of these techniques. Comment: In M. Ronsse, K. De Bosschere (eds), proceedings of the Fifth International Workshop on Automated Debugging (AADEBUG 2003), September 2003, Ghent. cs.SE/030902
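
    The page-protection trick at the core of the paper can be sketched with standard POSIX calls. The following C fragment is a minimal illustration, not the authors' implementation, and assumes the OS permits writable-executable mappings: it write-protects a buffer holding generated code and treats the resulting fault as a "code was modified" notification:

        #define _GNU_SOURCE
        #include <signal.h>
        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>
        #include <sys/mman.h>
        #include <unistd.h>

        static void *code_page;            /* page holding instrumented JIT code */
        static long  page_size;

        /* Fault handler: fires when the running program rewrites protected code. */
        static void on_write(int sig, siginfo_t *si, void *ctx)
        {
            (void)sig; (void)ctx;
            char *addr = (char *)si->si_addr;
            if (addr >= (char *)code_page && addr < (char *)code_page + page_size) {
                /* A real instrumentor would re-instrument the modified block here,
                   then re-protect the page once the writer is done. */
                mprotect(code_page, page_size, PROT_READ | PROT_WRITE | PROT_EXEC);
            } else {
                abort();                   /* unrelated segfault: do not swallow it */
            }
        }

        int main(void)
        {
            page_size = sysconf(_SC_PAGESIZE);
            code_page = mmap(NULL, page_size, PROT_READ | PROT_WRITE | PROT_EXEC,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
            if (code_page == MAP_FAILED)
                return 1;

            struct sigaction sa;
            memset(&sa, 0, sizeof sa);
            sa.sa_sigaction = on_write;
            sa.sa_flags = SA_SIGINFO;
            sigemptyset(&sa.sa_mask);
            sigaction(SIGSEGV, &sa, NULL);

            /* Revoke write access: any later self-modification now traps. */
            mprotect(code_page, page_size, PROT_READ | PROT_EXEC);

            ((volatile char *)code_page)[0] = (char)0xc3;   /* triggers on_write */
            puts("modification detected and handled");
            return 0;
        }

    Pages that are never modified run at full speed and the instrumentor only pays on actual writes, which is how the technique keeps the speed penalty low.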

    On the Feasibility and Limitations of Just-in-Time Instruction Set Extension for FPGA-Based Reconfigurable Processors

    Reconfigurable instruction set processors provide the possibility of tailoring the instruction set of a CPU to a particular application. While this customization process could be performed at runtime in order to adapt the CPU to the currently executing workload, this use case has hardly been investigated. In this paper, we study the feasibility of moving the customization process to runtime and evaluate the relation between the expected speedups and the associated overheads. To this end, we present a tool flow tailored to the requirements of this just-in-time ASIP specialization scenario. We evaluate our methods by targeting our previously introduced Woolcano reconfigurable ASIP architecture for a set of applications from the SPEC2006, SPEC2000, MiBench, and SciMark2 benchmark suites. Our results show that just-in-time ASIP specialization is promising for embedded computing applications, where average speedups of 5x can be achieved by spending 50 minutes on custom instruction identification and hardware generation. These overheads will be compensated if the applications execute for more than 2 hours. For the scientific computing benchmarks, the achievable speedup is only 1.2x, which requires significant execution times on the order of days to amortize the overheads.
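
    The hours-versus-days contrast follows from a simple amortization argument. In a first-order model (which ignores that the custom instructions accelerate only part of the workload, so the paper's measured break-even points are necessarily larger), a specialization overhead T_{\mathrm{ovh}} pays off once

        T_{\mathrm{run}} \left(1 - \frac{1}{S}\right) \ge T_{\mathrm{ovh}}
        \quad\Longleftrightarrow\quad
        T_{\mathrm{run}} \ge \frac{S}{S - 1} \, T_{\mathrm{ovh}},

    where S is the speedup and T_{\mathrm{run}} the original software execution time. The factor S/(S-1) is 1.25 at S = 5 but 6 at S = 1.2, which is why a modest speedup on the scientific workloads needs far longer execution times to recover the same 50-minute generation cost.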

    Generation of Application Specific Hardware Extensions for Hybrid Architectures: The Development of PIRANHA - A GCC Plugin for High-Level-Synthesis

    Architectures combining a field programmable gate array (FPGA) and a general-purpose processor on a single chip have become increasingly popular in recent years. On the one hand, such hybrid architectures facilitate the use of application-specific hardware accelerators that improve the performance of the software on the host processor. On the other hand, they oblige system designers to handle the whole process of hardware/software co-design. The complexity of this process is still one of the main reasons hindering the widespread use of hybrid architectures, so an automated process that aids programmers with hardware/software partitioning and the generation of application-specific accelerators is an important issue. The method presented in this thesis requires neither restrictions of the high-level language used nor special source code annotations; such requirements are usually an entry barrier for programmers without a deeper understanding of the underlying hardware platform. This thesis introduces a seamless programming flow that allows generating hardware accelerators for unrestricted, legacy C code. The implementation consists of a GCC plugin that automatically identifies application hot spots and generates hardware accelerators accordingly. Apart from the accelerator implementation in a hardware description language, the compiler plugin provides the generation of a host processor interface and, if necessary, a prototypical integration with the host operating system. An evaluation with typical embedded applications shows the general benefits of the approach, but also reveals limiting factors that hamper possible performance improvements.
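
    The abstract does not show PIRANHA's code, but the entry point of any GCC plugin follows a fixed pattern. The C skeleton below (callback choice and names are illustrative, not PIRANHA's actual structure) shows the hook through which such a plugin gets to see each parsed function, which is where hot-spot identification would start:

        /* Build against the GCC plugin headers, e.g.:
           g++ -shared -fPIC -fno-rtti -I`gcc -print-file-name=plugin`/include \
               -o plugin_sketch.so plugin_sketch.c */
        #include "gcc-plugin.h"
        #include "plugin-version.h"

        int plugin_is_GPL_compatible;      /* required by GCC's plugin loader */

        /* Hypothetical callback: a real HLS plugin would analyse the function's
           intermediate representation here to find acceleration candidates. */
        static void on_function(void *gcc_data, void *user_data)
        {
            (void)gcc_data; (void)user_data;
            /* hot-spot analysis / accelerator generation would go here */
        }

        int plugin_init(struct plugin_name_args *info,
                        struct plugin_gcc_version *version)
        {
            if (!plugin_default_version_check(version, &gcc_version))
                return 1;                  /* built against a different GCC version */
            register_callback(info->base_name, PLUGIN_FINISH_PARSED_FUNCTION,
                              on_function, NULL);
            return 0;
        }

    A production plugin would more likely register a full GIMPLE pass via PLUGIN_PASS_MANAGER_SETUP so that it runs after GCC's own optimizations, but the registration mechanics are the same.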