33 research outputs found
Runtime Scheduling, Allocation, and Execution of Real-Time Hardware Tasks onto Xilinx FPGAs Subject to Fault Occurrence
This paper describes a novel way to exploit the computation capabilities delivered by modern Field-Programmable Gate Arrays (FPGAs), not only towards a higher performance, but also towards an improved reliability. Computation-specific pieces of circuitry are dynamically scheduled and allocated to different resources on the chip based on a set of novel algorithms which are described in detail in this article. These algorithms consider most of the technological constraints existing in modern partially reconfigurable FPGAs as well as spontaneously occurring faults and emerging permanent damage in the silicon substrate of the chip. In addition, the algorithms target other important aspects such as communications and synchronization among the different computations that are carried out, either concurrently or at different times. The effectiveness of the proposed algorithms is tested by means of a wide range of synthetic simulations, and, notably, a proof-of-concept implementation of them using real FPGA hardware is outlined
An Efficient Data Structure for Dynamic Two-Dimensional Reconfiguration
In the presence of dynamic insertions and deletions into a partially
reconfigurable FPGA, fragmentation is unavoidable. This poses the challenge of
developing efficient approaches to dynamic defragmentation and reallocation.
One key aspect is to develop efficient algorithms and data structures that
exploit the two-dimensional geometry of a chip, instead of just one. We propose
a new method for this task, based on the fractal structure of a quadtree, which
allows dynamic segmentation of the chip area, along with dynamically adjusting
the necessary communication infrastructure. We describe a number of algorithmic
aspects, and present different solutions. We also provide a number of basic
simulations that indicate that the theoretical worst-case bound may be
pessimistic.
Comment: 11 pages, 12 figures; full version of extended abstract that appeared
in ARCS 201
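As a rough illustration of the quadtree idea in the abstract above, the following minimal sketch allocates square modules on a chip whose side is a power of two by recursively splitting free blocks into four quadrants, and merges quadrants back together when they become free. It is not the paper's data structure: the names Node, allocate and release are hypothetical, modules are assumed square and rounded up to a power-of-two block, and the communication infrastructure and defragmentation steps are left out.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """One square block of the chip (hypothetical illustration type)."""
    x: int
    y: int
    size: int                      # side length, assumed to be a power of two
    used: bool = False             # True if a module occupies this whole block
    children: List["Node"] = field(default_factory=list)

    def is_free_leaf(self) -> bool:
        return not self.used and not self.children


def allocate(node: Node, want: int) -> Optional[Tuple[int, int, int]]:
    """Place a square module of side `want` (>= 1).

    Returns (x, y, block_size) of the block taken, or None if nothing fits.
    """
    if node.used or want > node.size:
        return None
    half = node.size // 2
    if not node.children:
        if half < want:
            # The module would not fit in a quadrant: take the whole block.
            node.used = True
            return (node.x, node.y, node.size)
        # Otherwise split the free block into four quadrants.
        node.children = [Node(node.x + dx, node.y + dy, half)
                         for dy in (0, half) for dx in (0, half)]
    elif half < want:
        # Block is already subdivided (partly occupied) and too small per quadrant.
        return None
    for child in node.children:
        spot = allocate(child, want)
        if spot is not None:
            return spot
    return None


def release(node: Node, x: int, y: int) -> bool:
    """Free the occupied block whose origin is (x, y); merge free quadrants."""
    if not node.children:
        if node.used and (node.x, node.y) == (x, y):
            node.used = False
            return True
        return False
    for child in node.children:
        inside_x = child.x <= x < child.x + child.size
        inside_y = child.y <= y < child.y + child.size
        if inside_x and inside_y:
            ok = release(child, x, y)
            # If all four quadrants are free again, collapse them into one block.
            if ok and all(c.is_free_leaf() for c in node.children):
                node.children = []
            return ok
    return False
```

For an 8x8 chip created with Node(0, 0, 8), a request for side 3 takes the 4x4 block at the origin, and releasing every module collapses the tree back to a single free block. Rounding to power-of-two blocks wastes area inside each block; that trade-off is a property of this simplified sketch, not a claim about the paper's method.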
Towards the development of a reliable reconfigurable real-time operating system on FPGAs
In the last two decades, Field Programmable Gate Arrays (FPGAs) have been
rapidly developed from simple "glue-logic" to a powerful platform capable of
implementing a System on Chip (SoC). Modern FPGAs achieve not only higher
performance than General Purpose Processors (GPPs), thanks to hardware
parallelism and dedication, but also better programming flexibility than
Application Specific Integrated Circuits (ASICs). Moreover, the hardware
programming flexibility of FPGAs is further harnessed for both performance and
manipulability, which makes Dynamic Partial Reconfiguration (DPR) possible. DPR
allows a part or parts of a circuit to be reconfigured at run-time, without interrupting
the rest of the chip's operation. As a result, hardware resources can be more
efficiently exploited since the chip resources can be reused by swapping in or out
hardware tasks to or from the chip in a time-multiplexed fashion. In addition, DPR
improves fault tolerance against transient errors and permanent damage; for example,
Single Event Upsets (SEUs) can be mitigated by reconfiguring the FPGA to avoid
error accumulation. Furthermore, power and heat can be reduced by removing
finished or idle tasks from the chip. For all these reasons above, DPR has
significantly promoted Reconfigurable Computing (RC) and has become a very hot
topic. However, since hardware integration is increasing at an exponential rate, and
applications are becoming more complex with the growth of user demands, high-level
application design and low-level hardware implementation are increasingly
separated and layered. As a consequence, users can obtain little advantage from DPR
without the support of system-level middleware.
To bridge the gap between the high-level application and the low-level hardware
implementation, this thesis presents the important contributions towards a Reliable,
Reconfigurable and Real-Time Operating System (R3TOS), which facilitates the
user exploitation of DPR from the application level, by managing the complex
hardware in the background. In R3TOS, hardware tasks behave just like software
tasks, which can be created, scheduled, and mapped to different computing resources
on the fly. The novel contributions of this work are: 1) a novel implementation of an efficient task scheduler and allocator; 2) implementation of a novel real-time
scheduling algorithm (FAEDF) and two efficacious allocating algorithms (EAC and
EVC), which schedule tasks in real time and circumvent emerging faults while
maintaining more compact empty areas; 3) design and implementation of a fault-tolerant
microprocessor by harnessing the existing FPGA resources, such as Error
Correction Code (ECC) and configuration primitives; 4) a novel symmetric
multiprocessing (SMP)-based architecture that supports a shared-memory programming
interface; 5) two demonstrations of the integrated system, including a) the K-Nearest
Neighbour classifier, which is a non-parametric classification algorithm widely used
in various fields of data mining; and b) pairwise sequence alignment, namely the
Smith-Waterman algorithm, used for identifying similarities between two biological
sequences.
R3TOS gives considerably higher flexibility to support scalable multi-user, multitasking
applications, whereby resources can be dynamically managed in respect of
user requirements and hardware availability. As a result, not only can the
hardware resources be used more efficiently, but the system performance
can also be significantly increased. Results show that the scheduling and allocating
efficiencies have been improved by up to 2x, and the overall system performance is
further improved by ~2.5x. Future work includes the development of Network on
Chip (NoC), which is expected to further increase the communication throughput; as
well as the standardization and automation of our system design, which will be
carried out in line with the enablement of other high-level synthesis tools, to allow
application developers to benefit from the system in a more efficient manner
Performance and Power Optimization of Multi-kernel Applications on Multi-FPGA Platforms
The abstract is in the attachment.
Efficient Algorithms for Large-Scale Image Analysis
This work develops highly efficient algorithms for analyzing large images. Applications include object-based change detection and screening. The algorithms are 10-100 times as fast as existing software, sometimes even outperforming FPGA/GPU hardware, because they are designed to suit the computer architecture. This thesis describes the implementation details and the underlying algorithm engineering methodology, so that both may also be applied to other applications.
An automated OpenCL FPGA compilation framework targeting a configurable, VLIW chip multiprocessor
Modern systems-on-chip augment their baseline CPU with coprocessors and accelerators to increase overall computational capacity and power efficiency, and thus have evolved into heterogeneous systems. Several languages have been developed to enable this paradigm shift, including CUDA and OpenCL. This thesis discusses a unified compilation environment to enable heterogeneous system design through the use of OpenCL and a customised VLIW chip multiprocessor (CMP) architecture, known as the LE1. An LLVM compilation framework was researched and a prototype developed to enable the execution of OpenCL applications on the LE1 CPU. The framework fully automates the compilation flow and supports work-item coalescing to better utilise the CPU cores and alleviate the effects of thread divergence. This thesis discusses in detail both the software stack and target hardware architecture and evaluates the scalability of the proposed framework on a highly precise cycle-accurate simulator. This is achieved through the execution of 12 benchmarks across 240 different machine configurations, as well as further results utilising an incomplete development branch of the compiler. It is shown that the problems generally scale well with the LE1 architecture up to eight cores, at which point the memory system becomes a serious bottleneck. Results demonstrate superlinear performance on certain benchmarks (x9 for the bitonic sort benchmark with 8 dual-issue cores) with further improvements from compiler optimisations (x14 for bitonic with the same configuration).
Dynamic partial reconfiguration management for high performance and reliability in FPGAs
Modern Field-Programmable Gate Arrays (FPGAs) are no longer used to implement
small "glue logic" circuitries. The high density of reconfigurable logic resources in
today's FPGAs enables the implementation of large systems in a single chip. FPGAs
are highly flexible devices; their functionality can be altered by simply loading a new
binary file in their configuration memory. While the flexibility of FPGAs is
comparable to General-Purpose Processors (GPPs), in the sense that different
functions can be performed using the same hardware, the performance gain that can
be achieved using FPGAs can be orders of magnitude higher as FPGAs offer the
ability for customisation of parallel computational architectures.
Dynamic Partial Reconfiguration (DPR) allows for changing the functionality of
certain blocks on the chip while the rest of the FPGA is operational. DPR has
sparked the interest of researchers to explore new computational platforms where
computational tasks are off-loaded from a main CPU to be executed using dedicated
reconfigurable hardware accelerators configured on demand at run-time. By having a
battery of custom accelerators which can be swapped in and out of the FPGA at run-time,
a higher computational density can be achieved compared to static systems
where the accelerators are bound to fixed locations within the chip. Furthermore, the
ability to relocate these accelerators across several locations on the chip allows for
the implementation of adaptive systems which can mitigate emerging faults in the
FPGA chip when operating in harsh environments. By porting the appropriate fault
mitigation techniques in such computational platforms, the advantages of FPGAs can
be harnessed in different applications in space and military electronics where FPGAs
are usually seen as unreliable devices due to their sensitivity to radiation and extreme
environmental conditions.
In light of the above, this thesis investigates the deployment of DPR as: 1) a method
for enhancing performance by efficient exploitation of the FPGA resources, and 2) a
method for enhancing the reliability of systems intended to operate in harsh
environments. Achieving optimal performance in such systems requires an efficient
internal configuration management system to manage the reconfiguration and
execution of the reconfigurable modules in the FPGA. In addition, the system needs
to support "fault-resilience" features by integrating parameterisable fault detection
and recovery capabilities to meet the reliability standard of fault-tolerant
applications. This thesis addresses all the design and implementation aspects of an
Internal Configuration Manager (ICM) which supports a novel bitstream relocation
model to enable the placement of relocatable accelerators across several locations on
the FPGA chip. In addition to supporting all the configuration capabilities required to
implement a Reconfigurable Operating System (ROS), the proposed ICM also
supports the novel multiple-clone configuration technique which allows for cloning
several instances of the same hardware accelerator at the same time resulting in much
shorter configuration time compared to traditional configuration techniques. A fault-tolerant
(FT) version of the proposed ICM which supports a comprehensive fault-recovery
scheme is also introduced in this thesis. The proposed FT-ICM is designed
with a much smaller area footprint compared to Triple Modular Redundancy (TMR)
hardening techniques while keeping a comparable level of fault-resilience.
The capabilities of the proposed ICM system are demonstrated with two novel
applications. The first application demonstrates a proof-of-concept reliable FPGA
server solution used for executing encryption/decryption queries. The proposed
server deploys bitstream relocation and modular redundancy to mitigate both
permanent and transient faults in the device. It also deploys a novel Built-In Self-
Test (BIST) diagnosis scheme, specifically designed to detect emerging permanent
faults in the system at run-time. The second application is a data mining application
where DPR is used to increase the computational density of a system used to
solve the Frequent Itemset Mining (FIM) problem.
Proceedings of the First International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2014): Porto, Portugal
Proceedings of: First International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2014). Porto (Portugal), August 27-28, 2014
Parallel and Distributed Computing
The 14 chapters presented in this book cover a wide variety of representative works ranging from hardware design to application development. In particular, the topics addressed are programmable and reconfigurable devices and systems, dependability of GPUs (Graphics Processing Units), network topologies, cache coherence protocols, resource allocation, scheduling algorithms, peer-to-peer networks, large-scale network simulation, and parallel routines and algorithms. The articles included in this book thus constitute an excellent reference for engineers and researchers who have particular interests in each of these topics in parallel and distributed computing.