3 research outputs found

    Dynamic HW/SW Partitioning: Configuration Scheduling and Design Space Exploration

    Get PDF
    Hardware/software partitioning is a process that occurs frequently in embedded system design. It is the procedure of determining whether a part of a system should be implemented in software or hardware. This dissertation is a study of hardware/software partitioning and the use of scheduling algorithms to improve the performance of dynamically reconfigurable computing devices. Reconfigurable computing devices are devices that are adaptable at the logic level to solve specific problems [Tes05]. One example of a reconfigurable computing device is the field programmable gate array (FPGA). The emergence of dynamically reconfigurable FPGAs made it possible to configure FPGAs at runtime. Most current approaches use a simple on demand configuration scheduling algorithm for the FPGA configurations. The on demand configuration scheduling algorithm reconfigures the FPGA at runtime, whenever a configuration is needed and is found not to be configured. The problem with this approach of dynamic reconfiguration is the reconfiguration time overhead, which is the time it takes to reconfigure the FPGA with a new configuration at runtime. Configuration caches and partial configuration have been proposed as possible solutions to this problem, but these techniques suffer from various limitations. The emergence of dynamically reconfigurable FPGAs also made it possible to perform dynamic hardware/software partitioning (DHSP), which is the procedure of determining at runtime whether a computation should be performed using its software or hardware implementation. The drawback of performing DHSP using configurations that are generated at runtime is that the profiling and the dynamic generation of configurations require profiling tool and synthesis tool access at runtime. This study proposes that configuration scheduling algorithms, which perform DHSP using statically generated configurations, can be developed to combine the advantages and reduce the major disadvantages of current approaches. A case study is used to compare and evaluate the tradeoffs between the currently existing approach for dynamic reconfiguration and the DHSP configuration scheduling algorithm based approach proposed in the study. A simulation model is developed to examine the performance of the various configuration scheduling algorithms. First, the difference in the execution time between the different approaches is analyzed. Afterwards, other important design criteria such as power consumption, energy consumption, area requirements and unit cost are analyzed and estimated. Also, business and marketing considerations such as time to market and development cost are considered. The study illustrates how different types of DHSP configuration scheduling algorithms can be implemented and how their performance can be evaluated using a variety of software applications. It is also shown how to evaluate when which of the approaches would be more advantageous by determining the tradeoffs that exist between them. Also the underlying factors that affect when which design alternative is more advantageous are determined and analyzed. The study shows that configuration scheduling algorithms, which perform DHSP using statically generated configurations, can be developed to combine the advantages and reduce some major disadvantages of current approaches. It is shown that there are situations where DHSP configuration scheduling algorithms can be more advantageous than the other approaches

    Generation of Application Specific Hardware Extensions for Hybrid Architectures: The Development of PIRANHA - A GCC Plugin for High-Level-Synthesis

    Get PDF
    Architectures combining a field programmable gate array (FPGA) and a general-purpose processor on a single chip became increasingly popular in recent years. On the one hand, such hybrid architectures facilitate the use of application specific hardware accelerators that improve the performance of the software on the host processor. On the other hand, it obliges system designers to handle the whole process of hardware/software co-design. The complexity of this process is still one of the main reasons, that hinders the widespread use of hybrid architectures. Thus, an automated process that aids programmers with the hardware/software partitioning and the generation of application specific accelerators is an important issue. The method presented in this thesis neither requires restrictions of the used high-level-language nor special source code annotations. Usually, this is an entry barrier for programmers without deeper understanding of the underlying hardware platform. This thesis introduces a seamless programming flow that allows generating hardware accelerators for unrestricted, legacy C code. The implementation consists of a GCC plugin that automatically identifies application hot-spots and generates hardware accelerators accordingly. Apart from the accelerator implementation in a hardware description language, the compiler plugin provides the generation of a host processor interfaces and, if necessary, a prototypical integration with the host operating system. An evaluation with typical embedded applications shows general benefits of the approach, but also reveals limiting factors that hamper possible performance improvements

    Implementation of an AMIDAR-based Java Processor

    Get PDF
    This thesis presents a Java processor based on the Adaptive Microinstruction Driven Architecture (AMIDAR). This processor is intended as a research platform for investigating adaptive processor architectures. Combined with a configurable accelerator, it is able to detect and speed up hot spots of arbitrary applications dynamically. In contrast to classical RISC processors, an AMIDAR-based processor consists of four main types of components: a token machine, functional units (FUs), a token distribution network and an FU interconnect structure. The token machine is a specialized functional unit and controls the other FUs by means of tokens. These tokens are delivered to the FUs over the token distribution network. The tokens inform the FUs about what to do with input data and where to send the results. Data is exchanged among the FUs over the FU interconnect structure. Based on the virtual machine architecture defined by the Java bytecode, a total of six FUs have been developed for the Java processor, namely a frame stack, a heap manager, a thread scheduler, a debugger, an integer ALU and a floating-point unit. Using these FUs, the processor can already execute the SPEC JVM98 benchmark suite properly. This indicates that it can be employed to run a broad variety of applications rather than embedded software only. Besides bytecode execution, several enhanced features have also been implemented in the processor to improve its performance and usability. First, the processor includes an object cache using a novel cache index generation scheme that provides a better average hit rate than the classical XOR-based scheme. Second, a hardware garbage collector has been integrated into the heap manager, which greatly reduces the overhead caused by the garbage collection process. Third, thread scheduling has been realized in hardware as well, which allows it to be performed concurrently with the running application. Furthermore, a complete debugging framework has been developed for the processor, which provides powerful debugging functionalities at both software and hardware levels
    corecore