A Functionality-Based Runtime Relocation System for Circuits on Heterogeneous FPGAs
Runtime relocation of circuits on field-programmable gate arrays (FPGAs) has been proposed for achieving many desirable features including fault tolerance, defragmentation, and system load balancing. However, changes in the architectural composition of FPGAs have made relocation more challenging, mainly because FPGAs have become more heterogeneous. Previous and state-of-the-art circuit relocation systems on FPGAs have relied only on direct bitstream relocation, which requires the source and destination resource layouts to be the same, as well as access to the design bitstream for manipulation. Hence, their efficiency is greatly reduced on modern heterogeneous chips, and they mostly cannot be applied to encrypted bitstreams of intellectual property blocks. In this brief, we present a circuit relocator which augments direct bitstream relocation with a functionality-based relocation scheme. We demonstrate the feasibility of the proposed technique using a CORDIC application and show that an average increase of over 2.6-fold in the number of relocations can be obtained compared to direct bitstream relocation alone, at the expense of a small memory overhead and a manageable relocation time for this case study.
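The layout constraint described in this abstract can be illustrated with a toy model. The sketch below is not the relocator from the paper: it assumes a one-dimensional fabric in which each column has a single resource type (`C` for logic, `D` for DSP, `B` for block RAM), and it assumes that a functionality-based scheme keeps several functionally equivalent implementations of a circuit, each with its own footprint. The names `FABRIC`, `VARIANTS`, `direct_targets`, and `functional_targets`, and the two variant names, are all hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical 1-D fabric: one resource type per column (C = logic, D = DSP, B = BRAM).
FABRIC = list("CCDCCBCCDCC")

# Hypothetical functionally equivalent implementations of one circuit,
# each described only by the column types it occupies.
VARIANTS: Dict[str, Sequence[str]] = {
    "dsp_based": list("CCD"),
    "bram_based": list("CCB"),
}


def direct_targets(fabric: Sequence[str], footprint: Sequence[str], source: int) -> List[int]:
    """Offsets (excluding the source) whose column layout equals the footprint.

    This models direct bitstream relocation: the destination must have
    exactly the same resource layout as the source region.
    """
    width = len(footprint)
    return [
        x
        for x in range(len(fabric) - width + 1)
        if x != source and list(fabric[x:x + width]) == list(footprint)
    ]


def functional_targets(
    fabric: Sequence[str], variants: Dict[str, Sequence[str]], source: int
) -> Dict[int, str]:
    """Map each reachable offset to the first variant whose footprint fits there.

    This models augmenting direct relocation with alternative
    implementations of the same function.
    """
    targets: Dict[int, str] = {}
    for name, footprint in variants.items():
        for x in direct_targets(fabric, footprint, source):
            targets.setdefault(x, name)
    return targets
```

With the circuit initially placed at offset 0 using the `dsp_based` footprint, `direct_targets` finds a single matching destination in this fabric, while `functional_targets` finds an additional one that only the `bram_based` footprint fits. The real system also has to handle bitstream manipulation, routing, and two-dimensional regions, none of which this sketch attempts.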
Efficient runtime placement management for high performance and reliability in COTS FPGAs
Designing high-performance, fault-tolerant multisensory electronic systems for
hostile environments such as nuclear plants and outer space within the constraints of
cost, power and flexibility is challenging. Issues such as ionizing radiation, extreme
temperature and ageing can lead to faults in the electronics of these systems. In
addition, the remote nature of these environments demands a level of flexibility and
autonomy in their operations. The standard practice of using specially hardened
electronic devices for such systems is not only very expensive but also has limited
flexibility.
This thesis proposes novel techniques that promote the use of Commercial Off-The-
Shelf (COTS) reconfigurable devices to meet the challenges of high-performance
systems for hostile environments. Reconfigurable hardware such as
Field-Programmable Gate Arrays (FPGAs) has a unique combination of flexibility and
high performance. The flexibility offered through features such as dynamic partial
reconfiguration (DPR) can be harnessed not only to achieve cost-effective designs as
a smaller area can be used to execute multiple tasks, but also to improve the
reliability of a system as a circuit on one portion of the device can be physically
relocated to another portion in the case of fault occurrence. However, to harness
this potential for high performance and reliability in a cost-effective manner, novel
runtime management tools are required. Most runtime support tools for
reconfigurable devices are based on ideal models which do not adequately consider
the limitations of realistic FPGAs, in particular modern FPGAs which are
increasingly heterogeneous. Specifically, these tools lack efficient mechanisms for
ensuring a high utilization of FPGA resources, including the FPGA area and the
configuration port and clocking resources, in a reliable manner.
To ensure high utilization of reconfigurable device area, placement management is a
key aspect of these tools. This thesis presents novel techniques for the management
of hardware task placement on COTS reconfigurable devices for high performance
and reliability. To this end, it addresses design-time issues that affect efficient
hardware task placement, with a focus on reliability. It also presents techniques to
maximize the utilization of the FPGA area in runtime, including techniques to
minimize fragmentation. Fragmentation leads to the creation of unusable areas due to
dynamic placement of tasks and the heterogeneity of the resources on the chip.
Moreover, this thesis also presents an efficient task reuse mechanism to improve the
availability of the internal configuration infrastructure of the FPGA for critical
responsibilities like error mitigation. The task reuse scheme, unlike previous
approaches, also improves the utilization of the chip area by offering
defragmentation.
Task relocation, which involves changing the physical location of circuits, is a
technique for error mitigation and high performance. Hence, this thesis also provides
a functionality-based relocation mechanism for improving the number of locations to
which tasks can be relocated on heterogeneous FPGAs. As tasks are relocated, clock
networks need to be routed to them. As such, a reliability-aware technique of clock
network routing to tasks after placement is also proposed.
Finally, this thesis offers a prototype implementation and characterization of a
placement management system (PMS) which is an integration of the aforementioned
techniques. The performance of most of the proposed techniques is tested using
data processing tasks of a NASA JPL spectrometer application. The results show that
the proposed techniques have the potential to improve the reliability and performance of
applications in hostile environments compared to state-of-the-art techniques. The task
optimization technique presented leads to better capacity to circumvent permanent
faults on COTS FPGAs compared to state-of-the-art approaches (48.6% more errors
were circumvented for the JPL spectrometer application). The proposed task reuse
scheme leads to approximately 29% saving in the amount of configuration time. This
frees up the internal configuration interface for more error mitigation operations. In
addition, the proposed PMS has a worst-case latency of less than 50% of that of state-of-
the-art runtime placement systems, while maintaining the same level of placement
quality and resource overhead.
Dynamic reconfiguration frameworks for high-performance reliable real-time reconfigurable computing
The sheer hardware-based computational performance and programming flexibility
offered by reconfigurable hardware like Field-Programmable Gate Arrays (FPGAs)
make them attractive for computing in applications that require high performance,
availability, reliability, real-time processing, and high efficiency. Fueled by fabrication
process scaling, modern reconfigurable devices come with ever greater quantities of
on-chip resources, allowing a more complex variety of applications to be developed.
Thus, the trend is that technology giants like Microsoft, Amazon, and Baidu now
embrace reconfigurable computing devices like FPGAs to meet their critical
computing needs. In addition, the capability to autonomously reprogramme these
devices in the field is being exploited for reliability in application domains like
aerospace, defence, military, and nuclear power stations. In such applications, real-time
computing is important and is often a necessity for reliability. As such, applications and
algorithms resident on these devices must be implemented with sufficient
considerations for real-time processing and reliability.
Often, to manage a reconfigurable hardware device as a computing platform for a
multiplicity of homogeneous and heterogeneous tasks, reconfigurable operating systems
(ROSes) have been proposed to give a software look to hardware-based computation.
The key requirements of a ROS include partitioning, task scheduling and allocation,
task configuration or loading, and inter-task communication and synchronization.
Existing ROSes have met these requirements to varied extents. However, they are
limited in reliability, especially regarding the flexibility of placing the hardware circuits
of tasks on the device’s chip area, the problem arising more from the partitioning
approaches used. Indeed, this problem is deeply rooted in the static nature of the on-chip
inter-communication among tasks, hampering the flexibility of runtime task
relocation for reliability.
This thesis proposes the enabling frameworks for reliable, available, real-time,
efficient, secure, and high-performance reconfigurable computing by providing
techniques and mechanisms for reliable runtime reconfiguration, and dynamic inter-circuit communication and synchronization for circuits on reconfigurable hardware.
This work provides task configuration infrastructures for reliable reconfigurable
computing. Key features, especially reliability-enabling functionalities, which have
been given little or no attention in the state of the art, are implemented. These features
include internal register read and write for device diagnosis, a configuration operation
abort mechanism, and tightly integrated selective-area scanning, which aims to
optimize access to the device’s reconfiguration port for both task loading and error
mitigation.
In addition, this thesis proposes a novel reliability-aware inter-task communication
framework that exploits the availability of dedicated clocking infrastructures in a
typical FPGA to provide inter-task communication and synchronization. The clock
buffers and networks of an FPGA use dedicated routing resources, which are distinct
from the general routing resources. As such, deploying these dedicated resources for
communication sidesteps the restriction of static routes and allows a better relocation
of circuits for reliability purposes.
For evaluation, a case study that uses a NASA/JPL spectrometer data processing
application is employed to demonstrate the improved reliability brought about by the
implemented configuration controller and the reliability-aware dynamic
communication infrastructure. It is observed that a time saving of up to 74% can be achieved
for selective-area error mitigation when compared to state-of-the-art vendor
implementations. Moreover, an improvement in overall system reliability is observed
when the proposed dynamic communication scheme is deployed in the data processing
application.
Finally, one area of reconfigurable computing that has received insufficient
attention is security. Meanwhile, considering the nature of applications which now turn
to reconfigurable computing for accelerating compute-intensive processes, a high
premium is now placed on security, not only of the device but also of the applications,
from loading to runtime execution. To address security concerns, a novel secure and
efficient task configuration technique for task relocation is also investigated, providing
configuration time savings of up to 32% or 83%, depending on the device, and resource
usage savings in excess of 90% compared to the state of the art.