3 research outputs found
Proxy Circuits for Fault-Tolerant Primitive Interfacing in Reconfigurable Devices Targeting Extreme Environments
Continuous interface access to device-level primitives in reconfigurable devices in extreme environments is key to reliable operation. However, a primitive's interface controller, which is static, can be rendered non-operational by permanent damage to the controller's circuitry. To mitigate this, this paper proposes the use of relocatable proxy circuits to provide remote interfacing capability to primitives from anywhere on a reconfigurable device. A demonstration with a device register read controller shows that an improvement in fault tolerance can be achieved.
Dynamic reconfiguration frameworks for high-performance reliable real-time reconfigurable computing
The sheer hardware-based computational performance and programming flexibility
offered by reconfigurable hardware like Field-Programmable Gate Arrays (FPGAs)
make them attractive for computing in applications that require high performance,
availability, reliability, real-time processing, and high efficiency. Fueled by fabrication
process scaling, modern reconfigurable devices come with ever greater quantities of
on-chip resources, allowing a more complex variety of applications to be developed.
Thus, the trend is that technology giants like Microsoft, Amazon, and Baidu now
embrace reconfigurable computing devices like FPGAs to meet their critical
computing needs. In addition, the capability to autonomously reprogramme these
devices in the field is being exploited for reliability in application domains like
aerospace, defence, military, and nuclear power stations. In such applications, real-time
computing is important and is often a necessity for reliability. As such, applications and
algorithms resident on these devices must be implemented with sufficient
considerations for real-time processing and reliability.
Often, to manage a reconfigurable hardware device as a computing platform for a
multiplicity of homogeneous and heterogeneous tasks, reconfigurable operating systems
(ROSes) have been proposed to give a software look to hardware-based computation.
The key requirements of a ROS include partitioning, task scheduling and allocation,
task configuration or loading, and inter-task communication and synchronization.
Existing ROSes have met these requirements to varying extents. However, they are
limited in reliability, especially regarding the flexibility of placing the hardware circuits
of tasks on the device's chip area, a problem that arises mainly from the partitioning
approaches used. Indeed, this problem is deeply rooted in the static nature of on-chip
communication among tasks, which hampers the flexibility of runtime task
relocation for reliability.
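As a rough illustration of why placement flexibility matters for reliability, the sketch below models the chip as a row of identical slots. This model is a simplification and is not taken from the thesis; the names `place_task`, `faulty`, `occupied`, and `fixed_slot` are hypothetical. A task bound statically to one slot cannot be placed once that slot is damaged, whereas a relocatable task can move to any healthy free slot.

```python
from typing import Optional, Set


def place_task(num_slots: int, faulty: Set[int], occupied: Set[int],
               fixed_slot: Optional[int] = None) -> Optional[int]:
    """Return a slot index for a task, or None if no placement exists.

    Toy model (assumption): the device is a row of `num_slots` identical
    slots. If `fixed_slot` is given, the task is statically bound to it;
    otherwise it may be relocated to any healthy, unoccupied slot.
    """
    candidates = range(num_slots) if fixed_slot is None else [fixed_slot]
    for slot in candidates:
        if slot not in faulty and slot not in occupied:
            return slot
    return None


# Slot 1 has suffered permanent damage; slot 0 is already in use.
static_result = place_task(4, faulty={1}, occupied={0}, fixed_slot=1)
relocatable_result = place_task(4, faulty={1}, occupied={0})
print(static_result, relocatable_result)
```

In this toy setting the statically bound task has no valid placement, while the relocatable one lands in the first healthy free slot. A real ROS would also need to re-establish the task's communication links after the move, which is the difficulty the paragraph above attributes to static on-chip communication.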
This thesis proposes the enabling frameworks for reliable, available, real-time,
efficient, secure, and high-performance reconfigurable computing by providing
techniques and mechanisms for reliable runtime reconfiguration, and dynamic inter-circuit communication and synchronization for circuits on reconfigurable hardware.
This work provides task configuration infrastructures for reliable reconfigurable
computing. Key features, especially reliability-enabling functionalities, which have
been given little or no attention in the state of the art, are implemented. These features
include internal register read and write for device diagnosis, a configuration operation
abort mechanism, and tightly integrated selective-area scanning, which aims to
optimize access to the device's reconfiguration port for both task loading and error
mitigation.
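The intuition behind selective-area scanning can be sketched with a simple cost model. The model is an assumption made here for illustration, not the thesis's implementation: configuration memory is treated as a sequence of equally costly frames, and only the frame ranges occupied by tasks are scanned instead of the whole device. The function names and the frame counts are hypothetical and unrelated to the reported results.

```python
from typing import Iterable, Tuple


def frames_to_scan(regions: Iterable[Tuple[int, int]]) -> int:
    """Count frames covered by half-open (start, end) frame ranges.

    Assumption: ranges do not overlap and every frame takes the same
    time to read back through the reconfiguration port.
    """
    return sum(end - start for start, end in regions)


def scan_time_saving(total_frames: int,
                     regions: Iterable[Tuple[int, int]]) -> float:
    """Fraction of scan time saved versus scanning every frame."""
    return 1.0 - frames_to_scan(regions) / total_frames


# Hypothetical device with 1000 frames; two task regions cover 200 frames.
saving = scan_time_saving(1000, [(100, 200), (600, 700)])
print(f"{saving:.0%}")
```

Under these assumptions the saving depends only on the fraction of the device that the scanned regions occupy; on real hardware, port access overheads and addressing costs would also contribute.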
In addition, this thesis proposes a novel reliability-aware inter-task communication
framework that exploits the availability of dedicated clocking infrastructures in a
typical FPGA to provide inter-task communication and synchronization. The clock
buffers and networks of an FPGA use dedicated routing resources, which are distinct
from the general routing resources. As such, deploying these dedicated resources for
communication sidesteps the restriction of static routes and allows a better relocation
of circuits for reliability purposes.
For evaluation, a case study that uses a NASA/JPL spectrometer data processing
application is employed to demonstrate the improved reliability brought about by the
implemented configuration controller and the reliability-aware dynamic
communication infrastructure. It is observed that time savings of up to 74% can be achieved
for selective-area error mitigation when compared to state-of-the-art vendor
implementations. Moreover, an improvement in overall system reliability is observed
when the proposed dynamic communication scheme is deployed in the data processing
application.
Finally, one area of reconfigurable computing that has received insufficient
attention is security. Meanwhile, considering the nature of applications which now turn
to reconfigurable computing for accelerating compute-intensive processes, a high
premium is now placed on security, not only of the device but also of the applications,
from loading to runtime execution. To address security concerns, a novel secure and
efficient task configuration technique for task relocation is also investigated, providing
configuration time savings of up to 32% or 83%, depending on the device, and resource
usage savings in excess of 90% compared to the state of the art.