Dynamic partial reconfiguration management for high performance and reliability in FPGAs
Modern Field-Programmable Gate Arrays (FPGAs) are no longer used to implement
small “glue logic” circuits. The high density of reconfigurable logic resources in
today’s FPGAs enables the implementation of large systems in a single chip. FPGAs
are highly flexible devices; their functionality can be altered by simply loading a new
binary file in their configuration memory. While the flexibility of FPGAs is
comparable to General-Purpose Processors (GPPs), in the sense that different
functions can be performed using the same hardware, the performance gain that can
be achieved using FPGAs can be orders of magnitude higher as FPGAs offer the
ability for customisation of parallel computational architectures.
Dynamic Partial Reconfiguration (DPR) allows for changing the functionality of
certain blocks on the chip while the rest of the FPGA is operational. DPR has
sparked the interest of researchers to explore new computational platforms where
computational tasks are off-loaded from a main CPU to be executed using dedicated
reconfigurable hardware accelerators configured on demand at run-time. By having a
battery of custom accelerators which can be swapped in and out of the FPGA at runtime,
a higher computational density can be achieved compared to static systems
where the accelerators are bound to fixed locations within the chip. Furthermore, the
ability of relocating these accelerators across several locations on the chip allows for
the implementation of adaptive systems which can mitigate emerging faults in the
FPGA chip when operating in harsh environments. By porting the appropriate fault
mitigation techniques in such computational platforms, the advantages of FPGAs can
be harnessed in different applications in space and military electronics where FPGAs
are usually seen as unreliable devices due to their sensitivity to radiation and extreme
environmental conditions.
In light of the above, this thesis investigates the deployment of DPR as: 1) a method
for enhancing performance by efficient exploitation of the FPGA resources, and 2) a
method for enhancing the reliability of systems intended to operate in harsh
environments. Achieving optimal performance in such systems requires an efficient
internal configuration management system to manage the reconfiguration and
execution of the reconfigurable modules in the FPGA. In addition, the system needs
to support “fault-resilience” features by integrating parameterisable fault detection
and recovery capabilities to meet the reliability standard of fault-tolerant
applications. This thesis addresses all the design and implementation aspects of an
Internal Configuration Manager (ICM) which supports a novel bitstream relocation
model to enable the placement of relocatable accelerators across several locations on
the FPGA chip. In addition to supporting all the configuration capabilities required to
implement a Reconfigurable Operating System (ROS), the proposed ICM also
supports the novel multiple-clone configuration technique which allows for cloning
several instances of the same hardware accelerator at the same time resulting in much
shorter configuration time compared to traditional configuration techniques. A fault-tolerant
(FT) version of the proposed ICM which supports a comprehensive fault-recovery
scheme is also introduced in this thesis. The proposed FT-ICM is designed
with a much smaller area footprint compared to Triple Modular Redundancy (TMR)
hardening techniques while keeping a comparable level of fault-resilience.
The capabilities of the proposed ICM system are demonstrated with two novel
applications. The first application demonstrates a proof-of-concept reliable FPGA
server solution used for executing encryption/decryption queries. The proposed
server deploys bitstream relocation and modular redundancy to mitigate both
permanent and transient faults in the device. It also deploys a novel Built-In Self-
Test (BIST) diagnosis scheme, specifically designed to detect emerging permanent
faults in the system at run-time. The second application is a data mining application
where DPR is used to increase the computational density of a system used to
solve the Frequent Itemset Mining (FIM) problem.
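Triple Modular Redundancy (TMR), the baseline against which the abstract above compares the FT-ICM, can be illustrated with a bitwise majority voter. The following is only a minimal software sketch of the general technique, not the thesis's hardware design; the function names `majority_vote` and `disagreeing_replicas` are hypothetical and chosen for illustration.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Return the bitwise 2-of-3 majority of three replica outputs.

    Each bit of the result takes the value held by at least two of the
    three inputs, so a fault confined to one replica is masked.
    """
    return (a & b) | (a & c) | (b & c)


def disagreeing_replicas(a: int, b: int, c: int) -> list:
    """Return the indices (0, 1, 2) of replicas that differ from the voted value.

    Illustrative helper: a real system could use this information to decide
    which module to repair, for example by reconfiguration.
    """
    voted = majority_vote(a, b, c)
    return [i for i, value in enumerate((a, b, c)) if value != voted]


if __name__ == "__main__":
    # The third replica has two corrupted bits; the voter still returns 0b1010.
    print(bin(majority_vote(0b1010, 0b1010, 0b0110)))
    print(disagreeing_replicas(0b1010, 0b1010, 0b0110))
```

The voter masks any fault confined to a single replica, at the cost of roughly three times the logic plus the voter itself, which is the area overhead the abstract refers to.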
Dynamic reconfiguration frameworks for high-performance reliable real-time reconfigurable computing
The sheer hardware-based computational performance and programming flexibility
offered by reconfigurable hardware like Field-Programmable Gate Arrays (FPGAs)
make them attractive for computing in applications that require high performance,
availability, reliability, real-time processing, and high efficiency. Fueled by fabrication
process scaling, modern reconfigurable devices come with ever greater quantities of
on-chip resources, allowing a more complex variety of applications to be developed.
Thus, the trend is that technology giants like Microsoft, Amazon, and Baidu now
embrace reconfigurable computing devices like FPGAs to meet their critical
computing needs. In addition, the capability to autonomously reprogramme these
devices in the field is being exploited for reliability in application domains like
aerospace, defence, military, and nuclear power stations. In such applications, real-time
computing is important and is often a necessity for reliability. As such, applications and
algorithms resident on these devices must be implemented with sufficient
considerations for real-time processing and reliability.
Often, to manage a reconfigurable hardware device as a computing platform for a
multiplicity of homogeneous and heterogeneous tasks, reconfigurable operating systems
(ROSes) have been proposed to give a software look to hardware-based computation.
The key requirements of a ROS include partitioning, task scheduling and allocation,
task configuration or loading, and inter-task communication and synchronization.
Existing ROSes have met these requirements to varied extents. However, they are
limited in reliability, especially regarding the flexibility of placing the hardware circuits
of tasks on the device’s chip area, the problem arising more from the partitioning
approaches used. Indeed, this problem is deeply rooted in the static nature of the on-chip
inter-communication among tasks, hampering the flexibility of runtime task
relocation for reliability.
This thesis proposes the enabling frameworks for reliable, available, real-time,
efficient, secure, and high-performance reconfigurable computing by providing
techniques and mechanisms for reliable runtime reconfiguration, and dynamic inter-circuit communication and synchronization for circuits on reconfigurable hardware.
This work provides task configuration infrastructures for reliable reconfigurable
computing. Key features, especially reliability-enabling functionalities, which have
been given little or no attention in the state of the art, are implemented. These features
include internal register read and write for device diagnosis, a configuration operation
abort mechanism, and tightly integrated selective-area scanning, which aims to
optimize access to the device’s reconfiguration port for both task loading and error
mitigation.
In addition, this thesis proposes a novel reliability-aware inter-task communication
framework that exploits the availability of dedicated clocking infrastructures in a
typical FPGA to provide inter-task communication and synchronization. The clock
buffers and networks of an FPGA use dedicated routing resources, which are distinct
from the general routing resources. As such, deploying these dedicated resources for
communication sidesteps the restriction of static routes and allows a better relocation
of circuits for reliability purposes.
For evaluation, a case study that uses a NASA/JPL spectrometer data processing
application is employed to demonstrate the improved reliability brought about by the
implemented configuration controller and the reliability-aware dynamic
communication infrastructure. It is observed that a time saving of up to 74% can be achieved
for selective-area error mitigation when compared to state-of-the-art vendor
implementations. Moreover, an improvement in overall system reliability is observed
when the proposed dynamic communication scheme is deployed in the data processing
application.
Finally, one area of reconfigurable computing that has received insufficient
attention is security. Meanwhile, considering the nature of applications which now turn
to reconfigurable computing for accelerating compute-intensive processes, a high
premium is now placed on security, not only of the device but also of the applications,
from loading to runtime execution. To address security concerns, a novel secure and
efficient task configuration technique for task relocation is also investigated, providing
configuration time savings of up to 32% or 83%, depending on the device, and resource
usage savings in excess of 90% compared to the state of the art.
Towards the development of a reliable reconfigurable real-time operating system on FPGAs
In the last two decades, Field Programmable Gate Arrays (FPGAs) have been
rapidly developed from simple “glue-logic” to a powerful platform capable of
implementing a System on Chip (SoC). Modern FPGAs achieve not only higher
performance than General Purpose Processors (GPPs), thanks to hardware
parallelism and dedication, but also better programming flexibility, in comparison to
Application Specific Integrated Circuits (ASICs). Moreover, the hardware
programming flexibility of FPGAs is further harnessed for both performance and
manipulability, which makes Dynamic Partial Reconfiguration (DPR) possible. DPR
allows a part or parts of a circuit to be reconfigured at run-time, without interrupting
the rest of the chip’s operation. As a result, hardware resources can be more
efficiently exploited since the chip resources can be reused by swapping in or out
hardware tasks to or from the chip in a time-multiplexed fashion. In addition, DPR
improves fault tolerance against transient errors and permanent damage; for example,
Single Event Upsets (SEUs) can be mitigated by reconfiguring the FPGA to avoid
error accumulation. Furthermore, power and heat can be reduced by removing
finished or idle tasks from the chip. For all the reasons above, DPR has
significantly promoted Reconfigurable Computing (RC) and has become a very hot
topic. However, since hardware integration is increasing at an exponential rate, and
applications are becoming more complex with the growth of user demands, high-level
application design and low-level hardware implementation are increasingly
separated and layered. As a consequence, users can obtain little advantage from DPR
without the support of system-level middleware.
To bridge the gap between the high-level application and the low-level hardware
implementation, this thesis presents important contributions towards a Reliable,
Reconfigurable and Real-Time Operating System (R3TOS), which facilitates the
user exploitation of DPR from the application level, by managing the complex
hardware in the background. In R3TOS, hardware tasks behave just like software
tasks, which can be created, scheduled, and mapped to different computing resources
on the fly. The novel contributions of this work are: 1) a novel implementation of an
efficient task scheduler and allocator; 2) the implementation of a novel real-time
scheduling algorithm (FAEDF) and two efficacious allocating algorithms (EAC and
EVC), which schedule tasks in real-time and circumvent emerging faults while
maintaining more compact empty areas; 3) the design and implementation of a fault-tolerant
microprocessor by harnessing the existing FPGA resources, such as Error
Correction Code (ECC) and configuration primitives; 4) a novel symmetric
multiprocessing (SMP)-based architecture that supports a shared-memory programming
interface; and 5) two demonstrations of the integrated system, including a) the K-Nearest
Neighbour classifier, which is a non-parametric classification algorithm widely used
in various fields of data mining; and b) pairwise sequence alignment, namely the
Smith-Waterman algorithm, used for identifying similarities between two biological
sequences.
R3TOS gives considerably higher flexibility to support scalable multi-user, multitasking
applications, whereby resources can be dynamically managed in respect of
user requirements and hardware availability. Benefiting from this, not only can the
hardware resources be used more efficiently, but the system performance
can also be significantly increased. Results show that the scheduling and allocating
efficiencies have been improved by up to 2x, and the overall system performance is
further improved by ~2.5x. Future work includes the development of Network on
Chip (NoC), which is expected to further increase the communication throughput; as
well as the standardization and automation of our system design, which will be
carried out in line with the enablement of other high-level synthesis tools, to allow
application developers to benefit from the system in a more efficient manner.
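The Smith-Waterman algorithm named above as a demonstration workload can be sketched in a few lines of software. This is a minimal reference sketch of the standard local-alignment recurrence with a linear gap penalty, not the hardware implementation described in the thesis; the scoring values (`match=2`, `mismatch=-1`, `gap=-1`) and the function name are assumptions chosen for illustration.

```python
def smith_waterman_score(a: str, b: str,
                         match: int = 2,
                         mismatch: int = -1,
                         gap: int = -1) -> int:
    """Return the best local-alignment score between sequences a and b.

    Uses the standard recurrence
        H[i][j] = max(0,
                      H[i-1][j-1] + s(a[i-1], b[j-1]),
                      H[i-1][j] + gap,
                      H[i][j-1] + gap)
    and keeps only two rows of the matrix, since only the score is needed.
    The scoring parameters are illustrative assumptions.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(0,
                          prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                          prev[j] + gap,               # gap in b
                          curr[j - 1] + gap)           # gap in a
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

Each cell depends only on its left, upper, and upper-left neighbours, so all cells on one anti-diagonal can be computed in parallel. That data dependence pattern is what makes the algorithm a common candidate for hardware acceleration.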
Towards the development of flexible, reliable, reconfigurable, and high-performance imaging systems
Current FPGAs can implement large systems because of the high density of
reconfigurable logic resources in a single chip. FPGAs are comprehensive devices
that combine flexibility and high performance in the same platform compared to
other platforms such as General-Purpose Processors (GPPs) and Application Specific
Integrated Circuits (ASICs). The flexibility of modern FPGAs is further enhanced by
introducing the Dynamic Partial Reconfiguration (DPR) feature, which allows for
changing the functionality of part of the system while other parts are functioning.
FPGAs became an important platform for digital image processing applications
because of the aforementioned features. They can fulfil the need for efficient and
flexible platforms that execute imaging tasks efficiently as well as reliably, with
low power, high performance and high flexibility. The use of FPGAs as accelerators
for image processing outperforms most of the current solutions. Current FPGA
solutions can load the part of the imaging application that needs high computational
power onto dedicated reconfigurable hardware accelerators while other parts
run on the traditional solution, increasing the system performance. Moreover,
the use of the DPR feature enhances the flexibility of image processing further by
swapping accelerators in and out at run-time. The use of fault mitigation techniques
in FPGAs enables imaging applications to operate in harsh environments even
though FPGAs are sensitive to radiation and extreme conditions.
The aim of this thesis is to present a platform for efficient implementations of
imaging tasks. The research uses FPGAs as the key component of this platform and
uses the concept of DPR to increase performance and flexibility, to reduce power
dissipation and to expand the range of possible imaging applications. In this context,
it proposes the use of FPGAs to accelerate the Image Processing Pipeline (IPP)
stages, the core part of most imaging devices. The thesis has a number of novel
concepts. The first novel concept is the use of the FPGA hardware environment and
the DPR feature to increase the parallelism and achieve high flexibility. The concept also
increases the performance and reduces the power consumption and area utilisation.
Based on this concept, the following implementations are presented in this thesis: an
implementation of the Adams-Hamilton demosaicing algorithm for camera colour
interpolation, which exploits the FPGA parallelism to outperform other equivalents;
and an implementation of Automatic White Balance (AWB), another IPP
stage, which employs the DPR feature to demonstrate the aforementioned novel aspects. Another
novel concept in this thesis is presented in chapter 6, which uses the DPR feature to
develop a novel flexible imaging system that requires less logic and can be
implemented in small FPGAs. The system can be employed as a template for any
imaging application with no limitation. Moreover, discussed in this thesis is a novel
reliable version of the imaging system that adopts novel techniques including
scrubbing, Built-In Self Test (BIST), and Triple Modular Redundancy (TMR) to
detect and correct errors using the Internal Configuration Access Port (ICAP)
primitive. These techniques exploit the datapath-based nature of the implemented
imaging system to improve the system's overall reliability. The thesis presents a
proposal for integrating the imaging system with the Robust Reliable Reconfigurable
Real-Time Heterogeneous Operating System (R4THOS) to get the best out of the
system. The proposal shows the suitability of the proposed DPR imaging system to
be used as part of the core system of autonomous cars because of its unbounded
flexibility. These novel works are presented in a number of publications as shown in section
1.3 later in this thesis.
A Practical Investigation into Achieving Bio-Plausibility in Evo-Devo Neural Microcircuits Feasible in an FPGA
Many researchers have conjectured, argued, or in some cases demonstrated that bio-plausibility can bring about emergent properties such as adaptability, scalability, fault-tolerance, self-repair, reliability, and autonomy in bio-inspired intelligent systems. Evolutionary-developmental (evo-devo) spiking neural networks are a very bio-plausible mixture of such bio-inspired intelligent systems that have been proposed and studied by a few researchers. However, the general trend is that the complexity, and thus the computational cost, grows with the bio-plausibility of the system. FPGAs (Field-Programmable Gate Arrays) have been used as, and have proved to be, flexible and cost-efficient hardware platforms for research and development of such evo-devo systems. However, mapping a bio-plausible evo-devo spiking neural network to an FPGA is a daunting task, full of constraints and trade-offs that make it very challenging, if not infeasible.
This thesis explores the challenges, trade-offs, constraints, practical issues, and some possible approaches in achieving bio-plausibility in creating evolutionary developmental spiking neural microcircuits in an FPGA through a practical investigation along with a series of case studies. In this study, system performance, cost, reliability, scalability, availability, and design and testing time and complexity are defined as measures of the feasibility of a system, and structural accuracy and consistency with the current knowledge in biology as measures of bio-plausibility. The investigation of the challenges starts with hardware platform selection; then the neuron, cortex, and evo-devo models, and the integration of these models into a whole bio-inspired intelligent system, are examined one by one. For further practical investigation, a new PLAQIF Digital Neuron model, a novel Cortex model, and a new multicellular LGRN evo-devo model are designed, implemented and tested as case studies. Results and their implications for researchers, designers of such systems, and FPGA manufacturers are discussed and summarised in the form of general trends, trade-offs, suggestions, and recommendations.
Sustainable Trusted Computing: A Novel Approach for a Flexible and Secure Update of Cryptographic Engines on a Trusted Platform Module
Trusted computing is gaining increasing acceptance in the industry and finding its way to cloud computing.
With this penetration, the question arises whether the concept of hardwired security modules will cope with the increasing sophistication and security requirements of future IT systems and the ever-expanding threats and violations.
So far, embedding cryptographic hardware engines into the Trusted Platform Module (TPM) has been regarded as a security feature. However, new developments in cryptanalysis, side-channel analysis, and the emergence of novel powerful computing systems, such as quantum computers, can render this approach useless.
Given that, the question arises: do we have to throw away all TPMs and lose the data protected by them if someday a cryptographic engine on the TPM becomes insecure?
To address this question, we present a novel architecture called Sustainable Trusted Platform Module (STPM), which guarantees a secure update of the TPM cryptographic engines without compromising the system’s trustworthiness.
The STPM architecture has been implemented as a proof-of-concept on top of a Xilinx Virtex-5 FPGA platform, demonstrating the test cases with an update of the fundamental hash and asymmetric engines of the TPM.
Readout Electronics for a Novel Animal PET Scanner using Field Programmable Gate Arrays : System Definition, Implementation and Assessment
A digital FPGA-based data acquisition system for a novel preclinical PET detector developed at the University of Oslo is described. The detector, called ComPET, employs an inventive geometry with 600 LYSO scintillator crystals interleaved with 400 wavelength-shifters, grouped into 4 modules and arranged in a rectangular fashion to attain high photon sensitivity and high spatial resolution with minimal shift-variance. By means of APDs and a custom analog front-end, the detector response is converted to a digital output, its rising edge and width being a measure of the γ-photon arrival time and energy, respectively. An FPGA samples up to 84 of these channels with deserialisers clocked at up to 1 GHz, computes and stores the event photon arrival time, energy and location, provides a fan-in structure to collect data from these channels, and sends the data over Ethernet to a data acquisition system. The system allows coincidence and energy windows to be set for improved contrast resolution, can handle sustained event rates of 100 Mevents/s with full 3D readout, and is parametrised for maintainability and flexibility.
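The readout principle described above, in which the rising edge of a digital pulse encodes arrival time and its width encodes energy, can be illustrated with a short software sketch. This is only an illustration of the idea on a stream of 1-bit samples, not the FPGA design itself; the function name `extract_pulses`, the default sample period of 1 ns (corresponding to 1 GHz sampling), and the decision to drop a pulse that is still high at the end of the window are all assumptions. The conversion from pulse width to physical energy depends on the analog front-end and is not modelled here.

```python
from typing import List, Tuple


def extract_pulses(samples: List[int],
                   sample_period_ns: float = 1.0) -> List[Tuple[float, float]]:
    """Return (arrival_time_ns, width_ns) for each complete pulse in samples.

    samples is a sequence of 0/1 values taken at a fixed rate. The arrival
    time is the time of the first high sample of a pulse, and the width is
    the number of consecutive high samples times the sample period.
    A pulse that has not ended by the last sample is dropped (assumption).
    A pulse already high at sample 0 is treated as starting at time 0.
    """
    pulses = []
    start = None  # index of the first high sample of the current pulse
    for i, bit in enumerate(samples):
        if bit and start is None:
            start = i  # rising edge
        elif not bit and start is not None:
            pulses.append((start * sample_period_ns,
                           (i - start) * sample_period_ns))  # falling edge
            start = None
    return pulses


if __name__ == "__main__":
    print(extract_pulses([0, 0, 1, 1, 1, 0, 0, 1, 0]))
```

An energy window in this picture would simply keep the pulses whose width falls between two thresholds, and a coincidence window would keep pairs of pulses from different channels whose arrival times differ by less than a set limit.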