6 research outputs found
Defragmenting the Module Layout of a Partially Reconfigurable Device
Modern generations of field-programmable gate arrays (FPGAs) allow for
partial reconfiguration. In an online context, where the sequence of modules to
be loaded on the FPGA is unknown beforehand, repeated insertion and deletion of
modules leads to progressive fragmentation of the available space, making
defragmentation an important issue. We address this problem by propose an
online and an offline component for the defragmentation of the available space.
We consider defragmenting the module layout on a reconfigurable device. This
corresponds to solving a two-dimensional strip packing problem. Problems of
this type are NP-hard in the strong sense, and previous algorithmic results are
rather limited. Based on a graph-theoretic characterization of feasible
packings, we develop a method that can solve two-dimensional defragmentation
instances of practical size to optimality. Our approach is validated for a set
of benchmark instances.Comment: 10 pages, 11 figures, 1 table, Latex, to appear in "Engineering of
Reconfigurable Systems and Algorithms" as a "Distinguished Paper
Resource-efficient dynamic partial reconfiguration on FPGAs for space instruments
Field-Programmable Gate Arrays (FPGAs) provide highly flexible platforms to implement sophisticated data processing for scientific space instruments. The dynamic partial reconfiguration (DPR) capability of FPGAs allows it to schedule HW tasks. While this feature adds another dimension of processing power that can be exploited without significantly increasing system complexity and power consumption, there are still several challenges for an efficient DPR use. State-of-the-art concepts concentrate either on resource-efficient implementations at design time or flexible HW task scheduling at runtime. In this paper we propose a balanced algorithm that considers both optimization goals and is well suited for resource-limited space applications
Task scheduling and placement for reconfigurable devices
Partially reconfigurable devices allow the execution of different tasks at the same time, removing tasks when they finish and inserting new tasks when they arrive. This dissertation investigates scheduling and placing real-time tasks (tasks with deadline) on reconfigurable devices. One basic scheduler is the First-Fit scheduler. By allowing the First-Fit scheduler to retry tasks while they can satisfy their deadlines, we found that its performance can be enhanced to be better than other schedulers. We also proposed a placement idea based on partitioning the reconfigurable area into regions of various widths, assigning a task to a region based on its width. This idea has a similar rejection rate to a First-Fit scheduler that retries placing tasks and performs better than the First-Fit that does not retry tasks. Also, this regions-based scheduling method has a better running time. Managing how the space will be shared among tasks is a problems of interest. The main function of the free-space manager is to maintain information about the free space (areas not used by active tasks) after any placement or deletion of a task. Speed and efficiency of the free-space data structure are important as well as its effect on scheduler performance. We introduce the use of maximal horizontal strips and maximal vertical strips to represent free space. This resulted in a faster free space manager compared to what has been used in the area. Most researchers in the area of scheduling on reconfigurable devices assumed a homogeneous FPGA with only CLBs in the reconfigurable area. Most reconfigurable devices offered in the market, however, are not homogeneous but heterogeneous with other components between CLBs. We studied the effect of heterogeneity on the performance of schedulers designed for a homogeneous structure. We found that current schedulers result in worse performance when applied to a heterogeneous structure, but by simple modifications, we can apply them to a heterogeneous structure and achieve good performance. Consequently, the approach of studying homogeneous FPGAs is a valid one, as the scheduling ideas discovered there do carry over to heterogeneous FPGAs
Parallel and Distributed Computing
The 14 chapters presented in this book cover a wide variety of representative works ranging from hardware design to application development. Particularly, the topics that are addressed are programmable and reconfigurable devices and systems, dependability of GPUs (General Purpose Units), network topologies, cache coherence protocols, resource allocation, scheduling algorithms, peertopeer networks, largescale network simulation, and parallel routines and algorithms. In this way, the articles included in this book constitute an excellent reference for engineers and researchers who have particular interests in each of these topics in parallel and distributed computing
Towards the development of a reliable reconfigurable real-time operating system on FPGAs
In the last two decades, Field Programmable Gate Arrays (FPGAs) have been
rapidly developed from simple “glue-logic” to a powerful platform capable of
implementing a System on Chip (SoC). Modern FPGAs achieve not only the high
performance compared with General Purpose Processors (GPPs), thanks to hardware
parallelism and dedication, but also better programming flexibility, in comparison to
Application Specific Integrated Circuits (ASICs). Moreover, the hardware
programming flexibility of FPGAs is further harnessed for both performance and
manipulability, which makes Dynamic Partial Reconfiguration (DPR) possible. DPR
allows a part or parts of a circuit to be reconfigured at run-time, without interrupting
the rest of the chip’s operation. As a result, hardware resources can be more
efficiently exploited since the chip resources can be reused by swapping in or out
hardware tasks to or from the chip in a time-multiplexed fashion. In addition, DPR
improves fault tolerance against transient errors and permanent damage, such as
Single Event Upsets (SEUs) can be mitigated by reconfiguring the FPGA to avoid
error accumulation. Furthermore, power and heat can be reduced by removing
finished or idle tasks from the chip. For all these reasons above, DPR has
significantly promoted Reconfigurable Computing (RC) and has become a very hot
topic. However, since hardware integration is increasing at an exponential rate, and
applications are becoming more complex with the growth of user demands, highlevel
application design and low-level hardware implementation are increasingly
separated and layered. As a consequence, users can obtain little advantage from DPR
without the support of system-level middleware.
To bridge the gap between the high-level application and the low-level hardware
implementation, this thesis presents the important contributions towards a Reliable,
Reconfigurable and Real-Time Operating System (R3TOS), which facilitates the
user exploitation of DPR from the application level, by managing the complex
hardware in the background. In R3TOS, hardware tasks behave just like software
tasks, which can be created, scheduled, and mapped to different computing resources
on the fly. The novel contributions of this work are: 1) a novel implementation of an efficient task scheduler and allocator; 2) implementation of a novel real-time
scheduling algorithm (FAEDF) and two efficacious allocating algorithms (EAC and
EVC), which schedule tasks in real-time and circumvent emerging faults while
maintaining more compact empty areas. 3) Design and implementation of a faulttolerant
microprocessor by harnessing the existing FPGA resources, such as Error
Correction Code (ECC) and configuration primitives. 4) A novel symmetric
multiprocessing (SMP)-based architectures that supports shared memory programing
interface. 5) Two demonstrations of the integrated system, including a) the K-Nearest
Neighbour classifier, which is a non-parametric classification algorithm widely used
in various fields of data mining; and b) pairwise sequence alignment, namely the
Smith Waterman algorithm, used for identifying similarities between two biological
sequences.
R3TOS gives considerably higher flexibility to support scalable multi-user, multitasking
applications, whereby resources can be dynamically managed in respect of
user requirements and hardware availability. Benefiting from this, not only the
hardware resources can be more efficiently used, but also the system performance
can be significantly increased. Results show that the scheduling and allocating
efficiencies have been improved up to 2x, and the overall system performance is
further improved by ~2.5x. Future work includes the development of Network on
Chip (NoC), which is expected to further increase the communication throughput; as
well as the standardization and automation of our system design, which will be
carried out in line with the enablement of other high-level synthesis tools, to allow
application developers to benefit from the system in a more efficient manner