Dynamic HW/SW Partitioning: Configuration Scheduling and Design Space Exploration
Hardware/software partitioning is a process that occurs frequently in embedded system design. It is the
procedure of determining whether a part of a system should be implemented in software or hardware.
This dissertation is a study of hardware/software partitioning and the use of scheduling algorithms to
improve the performance of dynamically reconfigurable computing devices. Reconfigurable computing
devices are devices that are adaptable at the logic level to solve specific problems [Tes05]. One example
of a reconfigurable computing device is the field programmable gate array (FPGA). The emergence of
dynamically reconfigurable FPGAs made it possible to configure FPGAs at runtime. Most current
approaches use a simple on-demand configuration scheduling algorithm for the FPGA configurations. The
on-demand algorithm reconfigures the FPGA at runtime whenever a needed configuration is found not to
be loaded. The problem with this approach to dynamic
reconfiguration is the reconfiguration time overhead, which is the time it takes to reconfigure the FPGA
with a new configuration at runtime. Configuration caches and partial configuration have been proposed
as possible solutions to this problem, but these techniques suffer from various limitations.
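The on-demand policy described above can be illustrated with a minimal simulation sketch. The class name, task names, and timing values are assumptions chosen for illustration; they are not taken from the dissertation.

```python
# Minimal sketch of on-demand configuration scheduling: the FPGA is
# reconfigured only when the requested configuration is not already
# loaded, and each reload pays a fixed time overhead.
# All names and timing values below are illustrative assumptions.

RECONFIG_TIME_MS = 20.0  # assumed time to load one full configuration


class OnDemandFPGA:
    """Reconfigures only when the requested configuration is absent."""

    def __init__(self):
        self.loaded = None        # configuration currently on the fabric
        self.overhead_ms = 0.0    # accumulated reconfiguration time

    def execute(self, config, exec_time_ms):
        if self.loaded != config:  # miss: pay the reconfiguration overhead
            self.overhead_ms += RECONFIG_TIME_MS
            self.loaded = config
        return exec_time_ms        # hit, or run after the reload


fpga = OnDemandFPGA()
for task in ["fft", "fir", "fft", "fft", "aes"]:
    fpga.execute(task, exec_time_ms=5.0)
print(fpga.overhead_ms)  # 80.0: four misses, one hit
```

Even this toy trace shows the cost the abstract refers to: every switch between configurations incurs the full reconfiguration time, which is what configuration caches and partial reconfiguration try to mitigate.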
The emergence of dynamically reconfigurable FPGAs also made it possible to perform dynamic
hardware/software partitioning (DHSP), which is the procedure of determining at runtime whether a
computation should be performed using its software or hardware implementation. The drawback of
performing DHSP with configurations that are generated at runtime is that the profiling and the dynamic
generation of configurations require access to a profiling tool and a synthesis tool at runtime. This study
proposes that configuration scheduling algorithms, which perform DHSP using statically generated
configurations, can be developed to combine the advantages and reduce the major disadvantages of
current approaches. A case study is used to compare and evaluate the tradeoffs between the currently
existing approach for dynamic reconfiguration and the DHSP configuration scheduling algorithm based
approach proposed in the study. A simulation model is developed to examine the performance of the
various configuration scheduling algorithms. First, the difference in the execution time between the
different approaches is analyzed. Afterwards, other important design criteria such as power consumption,
energy consumption, area requirements and unit cost are analyzed and estimated. Also, business and
marketing considerations such as time to market and development cost are considered.
The study illustrates how different types of DHSP configuration scheduling algorithms can be
implemented and how their performance can be evaluated using a variety of software applications. It is
also shown how to determine under which conditions each approach is more advantageous by analyzing
the tradeoffs that exist between them. The underlying factors that influence which design alternative is
preferable are also identified and analyzed. The study shows that configuration
scheduling algorithms, which perform DHSP using statically generated configurations, can be developed
to combine the advantages and reduce some major disadvantages of current approaches. It is shown that
there are situations where DHSP configuration scheduling algorithms can be more advantageous than the
other approaches.
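The core decision made by a DHSP configuration scheduling algorithm with statically generated configurations can be sketched as a simple amortized-cost comparison. The function name, cost model, and numbers are assumptions for illustration, not the specific algorithms evaluated in the dissertation.

```python
# Hedged sketch of a runtime DHSP decision: choose the hardware
# implementation (a statically generated configuration) only when the
# reconfiguration overhead is expected to amortize over future calls.
# All names and values are illustrative assumptions.

RECONFIG_MS = 20.0  # assumed configuration load time


def choose_implementation(loaded, task, hw_ms, sw_ms, expected_calls):
    """Return ('hw' or 'sw', configuration left on the fabric)."""
    if loaded == task:
        return "hw", loaded                    # already configured: no overhead
    hw_total = RECONFIG_MS + expected_calls * hw_ms
    sw_total = expected_calls * sw_ms
    if hw_total < sw_total:                    # overhead amortizes: reconfigure
        return "hw", task
    return "sw", loaded                        # stay in software


impl, loaded = choose_implementation(None, "fft", hw_ms=1.0, sw_ms=6.0,
                                     expected_calls=10)
print(impl)  # hw: 20 + 10*1 = 30 < 10*6 = 60
```

The sketch captures why such algorithms can outperform plain on-demand scheduling: a computation that is invoked only once may be left in software, avoiding a reconfiguration that would never pay for itself.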
Generation of Application Specific Hardware Extensions for Hybrid Architectures: The Development of PIRANHA - A GCC Plugin for High-Level-Synthesis
Architectures combining a field programmable gate array (FPGA) and a general-purpose processor on a single chip have become increasingly popular in recent years. On the one hand, such hybrid architectures facilitate the use of application-specific hardware accelerators that improve the performance of the software on the host processor. On the other hand, they oblige system designers to handle the whole process of hardware/software co-design. The complexity of this process is still one of the main reasons that hinder the widespread use of hybrid architectures. Thus, an automated process that aids programmers with the hardware/software partitioning and the generation of application-specific accelerators is an important issue. The method presented in this thesis requires neither restrictions of the high-level language used nor special source code annotations. Such requirements are usually an entry barrier for programmers without a deeper understanding of the underlying hardware platform.
This thesis introduces a seamless programming flow that allows generating hardware accelerators for unrestricted, legacy C code. The implementation consists of a GCC plugin that automatically identifies application hot-spots and generates hardware accelerators accordingly. Apart from the accelerator implementation in a hardware description language, the compiler plugin provides the generation of host processor interfaces and, if necessary, a prototypical integration with the host operating system. An evaluation with typical embedded applications shows general benefits of the approach, but also reveals limiting factors that hamper possible performance improvements.
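The hot-spot identification step of such a flow can be sketched as picking the functions that dominate the runtime profile. The profile data, function names, and coverage threshold below are assumptions for illustration; the actual PIRANHA plugin operates inside GCC on real profiling information.

```python
# Illustrative sketch of hot-spot identification: select the smallest
# set of functions that together cover a given fraction of the total
# execution time. All data and the threshold are assumptions.

def find_hot_spots(profile, threshold=0.9):
    """Return functions covering `threshold` of total runtime, hottest first."""
    total = sum(profile.values())
    hot, covered = [], 0.0
    for fn, t in sorted(profile.items(), key=lambda kv: -kv[1]):
        hot.append(fn)
        covered += t
        if covered / total >= threshold:
            break
    return hot


# Hypothetical profile: runtime in milliseconds per function.
profile = {"idct": 40.0, "quantize": 35.0, "parse_header": 5.0}
print(find_hot_spots(profile))  # ['idct', 'quantize']
```

The functions returned by such a pass would then be the candidates handed to the high-level synthesis back end for accelerator generation.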
Implementation of an AMIDAR-based Java Processor
This thesis presents a Java processor based on the Adaptive Microinstruction Driven Architecture (AMIDAR). This processor is intended as a research platform for investigating adaptive processor architectures. Combined with a configurable accelerator, it is able to detect and speed up hot spots of arbitrary applications dynamically. In contrast to classical RISC processors, an AMIDAR-based processor consists of four main types of components: a token machine, functional units (FUs), a token distribution network and an FU interconnect structure. The token machine is a specialized functional unit and controls the other FUs by means of tokens. These tokens are delivered to the FUs over the token distribution network. The tokens inform the FUs about what to do with input data and where to send the results. Data is exchanged among the FUs over the FU interconnect structure. Based on the virtual machine architecture defined by the Java bytecode, a total of six FUs have been developed for the Java processor, namely a frame stack, a heap manager, a thread scheduler, a debugger, an integer ALU and a floating-point unit. Using these FUs, the processor can already execute the SPEC JVM98 benchmark suite properly. This indicates that it can be employed to run a broad variety of applications rather than embedded software only. Besides bytecode execution, several enhanced features have also been implemented in the processor to improve its performance and usability. First, the processor includes an object cache using a novel cache index generation scheme that provides a better average hit rate than the classical XOR-based scheme. Second, a hardware garbage collector has been integrated into the heap manager, which greatly reduces the overhead caused by the garbage collection process. Third, thread scheduling has been realized in hardware as well, which allows it to be performed concurrently with the running application. 
Furthermore, a complete debugging framework has been developed for the processor, which provides powerful debugging functionalities at both the software and hardware levels.
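For context on the object cache comparison mentioned above, the classical XOR-based index scheme that the thesis improves upon can be sketched as follows. The bit width and the example values are assumptions; the thesis's own index generation scheme is not reproduced here.

```python
# Sketch of a classical XOR-based object-cache index scheme: the cache
# set index is derived by XOR-ing bits of the object handle with the
# field offset. Bit widths and example values are illustrative
# assumptions, not taken from the thesis.

INDEX_BITS = 6                   # assumed: a 64-set cache
MASK = (1 << INDEX_BITS) - 1


def xor_index(handle, offset):
    """Classical scheme: XOR the object handle with the field offset."""
    return (handle ^ offset) & MASK


# Consecutive fields of the same object map to consecutive sets.
print(xor_index(0x2A40, 0), xor_index(0x2A40, 1))  # 0 1
```

Schemes of this kind spread an object's fields across cache sets cheaply, but their hit rate depends on how handle bits and offsets interact, which is the weakness the thesis's novel index generation scheme targets.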