7 research outputs found
Fault-tolerant fpga for mission-critical applications.
One of the devices that play a great role in electronic circuits design, specifically safety-critical design applications, is Field programmable Gate Arrays (FPGAs). This is because of its high performance, re-configurability and low development cost. FPGAs are used in many applications such as data processing, networks, automotive, space and industrial applications. Negative impacts on the reliability of such applications result from moving to smaller feature sizes in the latest FPGA architectures. This increases the need for fault-tolerant techniques to improve reliability and extend system lifetime of FPGA-based applications. In this thesis, two fault-tolerant techniques for FPGA-based applications are proposed with a built-in fault detection region. A low cost fault detection scheme is proposed for detecting faults using the fault detection region used in both schemes. The fault detection scheme primarily detects open faults in the programmable interconnect resources in the FPGAs. In addition, Stuck-At faults and Single Event Upsets (SEUs) fault can be detected. For fault recovery, each scheme has its own fault recovery approach. The first approach uses a spare module and a 2-to-1 multiplexer to recover from any fault detected. On the other hand, the second approach recovers from any fault detected using the property of Partial Reconfiguration (PR) in the FPGAs. It relies on identifying a Partially Reconfigurable block (P_b) in the FPGA that is used in the recovery process after the first faulty module is identified in the system. This technique uses only one location to recover from faults in any of the FPGA’s modules and the FPGA interconnects. Simulation results show that both techniques can detect and recover from open faults. In addition, Stuck-At faults and Single Event Upsets (SEUs) fault can also be detected. Finally, both techniques require low area overhead
Towards the development of a reliable reconfigurable real-time operating system on FPGAs
In the last two decades, Field Programmable Gate Arrays (FPGAs) have been
rapidly developed from simple “glue-logic” to a powerful platform capable of
implementing a System on Chip (SoC). Modern FPGAs achieve not only the high
performance compared with General Purpose Processors (GPPs), thanks to hardware
parallelism and dedication, but also better programming flexibility, in comparison to
Application Specific Integrated Circuits (ASICs). Moreover, the hardware
programming flexibility of FPGAs is further harnessed for both performance and
manipulability, which makes Dynamic Partial Reconfiguration (DPR) possible. DPR
allows a part or parts of a circuit to be reconfigured at run-time, without interrupting
the rest of the chip’s operation. As a result, hardware resources can be more
efficiently exploited since the chip resources can be reused by swapping in or out
hardware tasks to or from the chip in a time-multiplexed fashion. In addition, DPR
improves fault tolerance against transient errors and permanent damage, such as
Single Event Upsets (SEUs) can be mitigated by reconfiguring the FPGA to avoid
error accumulation. Furthermore, power and heat can be reduced by removing
finished or idle tasks from the chip. For all these reasons above, DPR has
significantly promoted Reconfigurable Computing (RC) and has become a very hot
topic. However, since hardware integration is increasing at an exponential rate, and
applications are becoming more complex with the growth of user demands, highlevel
application design and low-level hardware implementation are increasingly
separated and layered. As a consequence, users can obtain little advantage from DPR
without the support of system-level middleware.
To bridge the gap between the high-level application and the low-level hardware
implementation, this thesis presents the important contributions towards a Reliable,
Reconfigurable and Real-Time Operating System (R3TOS), which facilitates the
user exploitation of DPR from the application level, by managing the complex
hardware in the background. In R3TOS, hardware tasks behave just like software
tasks, which can be created, scheduled, and mapped to different computing resources
on the fly. The novel contributions of this work are: 1) a novel implementation of an efficient task scheduler and allocator; 2) implementation of a novel real-time
scheduling algorithm (FAEDF) and two efficacious allocating algorithms (EAC and
EVC), which schedule tasks in real-time and circumvent emerging faults while
maintaining more compact empty areas. 3) Design and implementation of a faulttolerant
microprocessor by harnessing the existing FPGA resources, such as Error
Correction Code (ECC) and configuration primitives. 4) A novel symmetric
multiprocessing (SMP)-based architectures that supports shared memory programing
interface. 5) Two demonstrations of the integrated system, including a) the K-Nearest
Neighbour classifier, which is a non-parametric classification algorithm widely used
in various fields of data mining; and b) pairwise sequence alignment, namely the
Smith Waterman algorithm, used for identifying similarities between two biological
sequences.
R3TOS gives considerably higher flexibility to support scalable multi-user, multitasking
applications, whereby resources can be dynamically managed in respect of
user requirements and hardware availability. Benefiting from this, not only the
hardware resources can be more efficiently used, but also the system performance
can be significantly increased. Results show that the scheduling and allocating
efficiencies have been improved up to 2x, and the overall system performance is
further improved by ~2.5x. Future work includes the development of Network on
Chip (NoC), which is expected to further increase the communication throughput; as
well as the standardization and automation of our system design, which will be
carried out in line with the enablement of other high-level synthesis tools, to allow
application developers to benefit from the system in a more efficient manner
Towards the development of flexible, reliable, reconfigurable, and high-performance imaging systems
Current FPGAs can implement large systems because of the high density of
reconfigurable logic resources in a single chip. FPGAs are comprehensive devices
that combine flexibility and high performance in the same platform compared to
other platform such as General-Purpose Processors (GPPs) and Application Specific
Integrated Circuits (ASICs). The flexibility of modern FPGAs is further enhanced by
introducing Dynamic Partial Reconfiguration (DPR) feature, which allows for
changing the functionality of part of the system while other parts are functioning.
FPGAs became an important platform for digital image processing applications
because of the aforementioned features. They can fulfil the need of efficient and
flexible platforms that execute imaging tasks efficiently as well as the reliably with
low power, high performance and high flexibility. The use of FPGAs as accelerators
for image processing outperforms most of the current solutions. Current FPGA
solutions can to load part of the imaging application that needs high computational
power on dedicated reconfigurable hardware accelerators while other parts are
working on the traditional solution to increase the system performance. Moreover,
the use of the DPR feature enhances the flexibility of image processing further by
swapping accelerators in and out at run-time. The use of fault mitigation techniques
in FPGAs enables imaging applications to operate in harsh environments following
the fact that FPGAs are sensitive to radiation and extreme conditions.
The aim of this thesis is to present a platform for efficient implementations of
imaging tasks. The research uses FPGAs as the key component of this platform and
uses the concept of DPR to increase the performance, flexibility, to reduce the power
dissipation and to expand the cycle of possible imaging applications. In this context,
it proposes the use of FPGAs to accelerate the Image Processing Pipeline (IPP)
stages, the core part of most imaging devices. The thesis has a number of novel
concepts. The first novel concept is the use of FPGA hardware environment and
DPR feature to increase the parallelism and achieve high flexibility. The concept also
increases the performance and reduces the power consumption and area utilisation.
Based on this concept, the following implementations are presented in this thesis: An
implementation of Adams Hamilton Demosaicing algorithm for camera colour
interpolation, which exploits the FPGA parallelism to outperform other equivalents.
In addition, an implementation of Automatic White Balance (AWB), another IPP
stage that employs DPR feature to prove the mentioned novelty aspects. Another
novel concept in this thesis is presented in chapter 6, which uses DPR feature to
develop a novel flexible imaging system that requires less logic and can be
implemented in small FPGAs. The system can be employed as a template for any
imaging application with no limitation. Moreover, discussed in this thesis is a novel
reliable version of the imaging system that adopts novel techniques including
scrubbing, Built-In Self Test (BIST), and Triple Modular Redundancy (TMR) to
detect and correct errors using the Internal Configuration Access Port (ICAP)
primitive. These techniques exploit the datapath-based nature of the implemented
imaging system to improve the system's overall reliability. The thesis presents a
proposal for integrating the imaging system with the Robust Reliable Reconfigurable
Real-Time Heterogeneous Operating System (R4THOS) to get the best out of the
system. The proposal shows the suitability of the proposed DPR imaging system to
be used as part of the core system of autonomous cars because of its unbounded
flexibility. These novel works are presented in a number of publications as shown in section
1.3 later in this thesis
High performance reconfigurable architectures for biological sequence alignment
Bioinformatics and computational biology (BCB) is a rapidly developing
multidisciplinary field which encompasses a wide range of domains, including genomic
sequence alignments. It is a fundamental tool in molecular biology in searching for
homology between sequences. Sequence alignments are currently gaining close attention due
to their great impact on the quality aspects of life such as facilitating early disease diagnosis,
identifying the characteristics of a newly discovered sequence, and drug engineering. With
the vast growth of genomic data, searching for a sequence homology over huge databases
(often measured in gigabytes) is unable to produce results within a realistic time, hence the
need for acceleration. Since the exponential increase of biological databases as a result of the
human genome project (HGP), supercomputers and other parallel architectures such as the
special purpose Very Large Scale Integration (VLSI) chip, Graphic Processing Unit (GPUs)
and Field Programmable Gate Arrays (FPGAs) have become popular acceleration platforms.
Nevertheless, there are always trade-off between area, speed, power, cost, development time
and reusability when selecting an acceleration platform. FPGAs generally offer more
flexibility, higher performance and lower overheads. However, they suffer from a relatively
low level programming model as compared with off-the-shelf microprocessors such as
standard microprocessors and GPUs. Due to the aforementioned limitations, the need has
arisen for optimized FPGA core implementations which are crucial for this technology to
become viable in high performance computing (HPC).
This research proposes the use of state-of-the-art reprogrammable system-on-chip
technology on FPGAs to accelerate three widely-used sequence alignment algorithms; the
Smith-Waterman with affine gap penalty algorithm, the profile hidden Markov model
(HMM) algorithm and the Basic Local Alignment Search Tool (BLAST) algorithm. The
three novel aspects of this research are firstly that the algorithms are designed and
implemented in hardware, with each core achieving the highest performance compared to the
state-of-the-art. Secondly, an efficient scheduling strategy based on the double buffering
technique is adopted into the hardware architectures. Here, when the alignment matrix
computation task is overlapped with the PE configuration in a folded systolic array, the
overall throughput of the core is significantly increased. This is due to the bound PE
configuration time and the parallel PE configuration approach irrespective of the number of
PEs in a systolic array. In addition, the use of only two configuration elements in the PE optimizes hardware resources and enables the scalability of PE systolic arrays without
relying on restricted onboard memory resources. Finally, a new performance metric is
devised, which facilitates the effective comparison of design performance between different
FPGA devices and families. The normalized performance indicator (speed-up per area per
process technology) takes out advantages of the area and lithography technology of any
FPGA resulting in fairer comparisons.
The cores have been designed using Verilog HDL and prototyped on the Alpha Data
ADM-XRC-5LX card with the Virtex-5 XC5VLX110-3FF1153 FPGA. The implementation
results show that the proposed architectures achieved giga cell updates per second (GCUPS)
performances of 26.8, 29.5 and 24.2 respectively for the acceleration of the Smith-Waterman
with affine gap penalty algorithm, the profile HMM algorithm and the BLAST algorithm. In
terms of speed-up improvements, comparisons were made on performance of the designed
cores against their corresponding software and the reported FPGA implementations. In the
case of comparison with equivalent software execution, acceleration of the optimal
alignment algorithm in hardware yielded an average speed-up of 269x as compared to the
SSEARCH 35 software. For the profile HMM-based sequence alignment, the designed core
achieved speed-up of 103x and 8.3x against the HMMER 2.0 and the latest version of
HMMER (version 3.0) respectively. On the other hand, the implementation of the gapped
BLAST with the two-hit method in hardware achieved a greater than tenfold speed-up
compared to the latest NCBI BLAST software. In terms of comparison against other reported
FPGA implementations, the proposed normalized performance indicator was used to
evaluate the designed architectures fairly. The results showed that the first architecture
achieved more than 50 percent improvement, while acceleration of the profile HMM
sequence alignment in hardware gained a normalized speed-up of 1.34. In the case of the
gapped BLAST with the two-hit method, the designed core achieved 11x speed-up after
taking out advantages of the Virtex-5 FPGA. In addition, further analysis was conducted in
terms of cost and power performances; it was noted that, the core achieved 0.46 MCUPS per
dollar spent and 958.1 MCUPS per watt. This shows that FPGAs can be an attractive
platform for high performance computation with advantages of smaller area footprint as well
as represent economic ‘green’ solution compared to the other acceleration platforms. Higher
throughput can be achieved by redeploying the cores on newer, bigger and faster FPGAs
with minimal design effort
Nova combinação de hardware e de software para veículos de desporto automóvel baseada no processamento directo de funções gráficas
Doutoramento em Engenharia EletrónicaThe main motivation for the work presented here began with previously
conducted experiments with a programming concept at the time named
"Macro". These experiments led to the conviction that it would be possible to
build a system of engine control from scratch, which could eliminate many of
the current problems of engine management systems in a direct and intrinsic
way. It was also hoped that it would minimize the full range of software and
hardware needed to make a final and fully functional system.
Initially, this paper proposes to make a comprehensive survey of the state of
the art in the specific area of software and corresponding hardware of
automotive tools and automotive ECUs. Problems arising from such software
will be identified, and it will be clear that practically all of these problems stem
directly or indirectly from the fact that we continue to make comprehensive use
of extremely long and complex "tool chains". Similarly, in the hardware, it will
be argued that the problems stem from the extreme complexity and
inter-dependency inside processor architectures. The conclusions are
presented through an extensive list of "pitfalls" which will be thoroughly
enumerated, identified and characterized.
Solutions will also be proposed for the various current issues and for the
implementation of these same solutions. All this final work will be part of a
"proof-of-concept" system called "ECU2010". The central element of this
system is the before mentioned "Macro" concept, which is an graphical block
representing one of many operations required in a automotive system having
arithmetic, logic, filtering, integration, multiplexing functions among others. The
end result of the proposed work is a single tool, fully integrated, enabling the
development and management of the entire system in one simple visual
interface. Part of the presented result relies on a hardware platform fully
adapted to the software, as well as enabling high flexibility and scalability in
addition to using exactly the same technology for ECU, data logger and
peripherals alike.
Current systems rely on a mostly evolutionary path, only allowing online
calibration of parameters, but never the online alteration of their own
automotive functionality algorithms. By contrast, the system developed and
described in this thesis had the advantage of following a "clean-slate"
approach, whereby everything could be rethought globally. In the end, out of all
the system characteristics, "LIVE-Prototyping" is the most relevant feature,
allowing the adjustment of automotive algorithms (eg. Injection, ignition,
lambda control, etc.) 100% online, keeping the engine constantly working,
without ever having to stop or reboot to make such changes. This consequently
eliminates any "turnaround delay" typically present in current automotive
systems, thereby enhancing the efficiency and handling of such systems.A principal motivação para o trabalho que conduziu a esta tese residiu na
constatação de que os actuais métodos de modelação de centralinas
automóveis conduzem a significativos problemas de desenvolvimento e
manutenção. Como resultado dessa constatação, o objectivo deste trabalho
centrou-se no desenvolvimento de um conceito de arquitectura que rompe
radicalmente com os modelos state-of-the-art e que assenta num conjunto de
conceitos que vieram a ser designados de "Macro" e "Celular ECU". Com este
modelo pretendeu-se simultaneamente minimizar a panóplia de software e de
hardware necessários à obtenção de uma sistema funcional final.
Inicialmente, esta tese propõem-se fazer um levantamento exaustivo do
estado da arte na área específica do software e correspondente hardware das
ferramentas e centralinas automóveis. Os problemas decorrentes de tal
software serão identificados e, dessa identificação deverá ficar claro, que
praticamente todos esses problemas têm origem directa ou indirecta no facto
de se continuar a fazer um uso exaustivo de "tool chains" extremamente
compridas e complexas. De forma semelhante, no hardware, os problemas
têm origem na extrema complexidade e inter-dependência das arquitecturas
dos processadores. As consequências distribuem-se por uma extensa lista de
"pitfalls" que também serão exaustivamente enumeradas, identificadas e
caracterizadas.
São ainda propostas soluções para os diversos problemas actuais e
correspondentes implementações dessas mesmas soluções. Todo este
trabalho final faz parte de um sistema "proof-of-concept" designado
"ECU2010". O elemento central deste sistema é o já referido conceito de
“Macro”, que consiste num bloco gráfico que representa uma de muitas
operações necessárias num sistema automóvel, como sejam funções
aritméticas, lógicas, de filtragem, de integração, de multiplexagem, entre
outras. O resultado final do trabalho proposto assenta numa única ferramenta,
totalmente integrada que permite o desenvolvimento e gestão de todo o
sistema de forma simples numa única interface visual. Parte do resultado
apresentado assenta numa plataforma hardware totalmente adaptada ao
software, bem como na elevada flexibilidade e escalabilidade, para além de
permitir a utilização de exactamente a mesma tecnologia quer para a
centralina, como para o datalogger e para os periféricos.
Os sistemas actuais assentam num percurso maioritariamente evolutivo,
apenas permitindo a calibração online de parâmetros, mas nunca a alteração
online dos próprios algoritmos das funcionalidades automóveis. Pelo contrário,
o sistema desenvolvido e descrito nesta tese apresenta a vantagem de seguir
um "clean-slate approach", pelo que tudo pode ser globalmente repensado. No
final e para além de todas as restantes características, o
“LIVE-PROTOTYPING” é a funcionalidade mais relevante, ao permitir alterar
algoritmos automóveis (ex: injecção, ignição, controlo lambda, etc.) de forma
100% online, mantendo o motor constantemente a trabalhar e sem nunca ter
de o parar ou re-arrancar para efectuar tais alterações. Isto elimina
consequentemente qualquer "turnaround delay" tipicamente presente em
qualquer sistema automóvel actual, aumentando de forma significativa a
eficiência global do sistema e da sua utilização
Parallel and Distributed Computing
The 14 chapters presented in this book cover a wide variety of representative works ranging from hardware design to application development. Particularly, the topics that are addressed are programmable and reconfigurable devices and systems, dependability of GPUs (General Purpose Units), network topologies, cache coherence protocols, resource allocation, scheduling algorithms, peertopeer networks, largescale network simulation, and parallel routines and algorithms. In this way, the articles included in this book constitute an excellent reference for engineers and researchers who have particular interests in each of these topics in parallel and distributed computing