15 research outputs found
10281 Abstracts Collection -- Dynamically Reconfigurable Architectures
From 11.07.10 to 16.07.10, Dagstuhl Seminar 10281 ``Dynamically Reconfigurable Architectures \u27\u27 was held
in Schloss Dagstuhl~--~Leibniz Center for Informatics.
During the seminar, several participants presented their current
research, and ongoing work and open problems were discussed. Abstracts of
the presentations given during the seminar as well as abstracts of
seminar results and ideas are put together in this paper. The first section
describes the seminar topics and goals in general.
Links to extended abstracts or full papers are provided, if available
Towards Dilated Placement of Dynamic NoC Cores
Instead of mapping application task graphs in a compact manner onto
reconfigurable devices using a network-on-chip for interconnecting
application cores, we propose dilating the mappings as much as the
available latencies on critical connections allow. In a dilated mapping,
the unused resources between an application\u27s configured components can
be used to provide additional flexibility when the configuration needs
to change. We motivate the reasons for dilating application task graphs
targeted at reconfigurable devices; derive a simulated annealing approach
to dilating the placement of such graphs; and present preliminary results
of applying the algorithm to synthetic test cases. The method appears to
result in successful and meaningful graph dilation and could be further
tuned to satisfy desired power constraints
Embedded electronic systems driven by run-time reconfigurable hardware
Abstract
This doctoral thesis addresses the design of embedded electronic systems based on run-time reconfigurable hardware technology –available through SRAM-based FPGA/SoC devices– aimed at contributing to enhance the life quality of the human beings. This work does research on the conception of the system architecture and the reconfiguration engine that provides to the FPGA the capability of dynamic partial reconfiguration in order to synthesize, by means of hardware/software co-design, a given application partitioned in processing tasks which are multiplexed in time and space, optimizing thus its physical implementation –silicon area, processing time, complexity, flexibility, functional density, cost and power consumption– in comparison with other alternatives based on static hardware (MCU, DSP, GPU, ASSP, ASIC, etc.). The design flow of such technology is evaluated through the prototyping of several engineering applications (control systems, mathematical coprocessors, complex image processors, etc.), showing a high enough level of maturity for its exploitation in the industry.Resumen
Esta tesis doctoral abarca el diseño de sistemas electrónicos embebidos basados en tecnología hardware dinámicamente reconfigurable –disponible a través de dispositivos lógicos programables SRAM FPGA/SoC– que contribuyan a la mejora de la calidad de vida de la sociedad. Se investiga la arquitectura del sistema y del motor de reconfiguración que proporcione a la FPGA la capacidad de reconfiguración dinámica parcial de sus recursos programables, con objeto de sintetizar, mediante codiseño hardware/software, una determinada aplicación particionada en tareas multiplexadas en tiempo y en espacio, optimizando así su implementación física –área de silicio, tiempo de procesado, complejidad, flexibilidad, densidad funcional, coste y potencia disipada– comparada con otras alternativas basadas en hardware estático (MCU, DSP, GPU, ASSP, ASIC, etc.). Se evalúa el flujo de diseño de dicha tecnología a través del prototipado de varias aplicaciones de ingeniería (sistemas de control, coprocesadores aritméticos, procesadores de imagen, etc.), evidenciando un nivel de madurez viable ya para su explotación en la industria.Resum
Aquesta tesi doctoral està orientada al disseny de sistemes electrònics empotrats basats en tecnologia hardware dinàmicament reconfigurable –disponible mitjançant dispositius lògics programables SRAM FPGA/SoC– que contribueixin a la millora de la qualitat de vida de la societat. S’investiga l’arquitectura del sistema i del motor de reconfiguració que proporcioni a la FPGA la capacitat de reconfiguració dinàmica parcial dels seus recursos programables, amb l’objectiu de sintetitzar, mitjançant codisseny hardware/software, una determinada aplicació particionada en tasques multiplexades en temps i en espai, optimizant així la seva implementació física –àrea de silici, temps de processat, complexitat, flexibilitat, densitat funcional, cost i potència dissipada– comparada amb altres alternatives basades en hardware estàtic (MCU, DSP, GPU, ASSP, ASIC, etc.). S’evalúa el fluxe de disseny d’aquesta tecnologia a través del prototipat de varies aplicacions d’enginyeria (sistemes de control, coprocessadors aritmètics, processadors d’imatge, etc.), demostrant un nivell de maduresa viable ja per a la seva explotació a la indústria
Rapid Prototyping and Exploration Environment for Generating C-to-Hardware-Compilers
There is today an ever-increasing demand for more computational power
coupled with a desire to minimize energy requirements. Hardware
accelerators currently appear to be the best solution to this problem.
While general purpose computation with GPUs seem to be
very successful in this area, they perform adequately only in those cases where the data
access patterns and utilized algorithms fit the underlying
architecture. ASICs on the other hand can yield even better results
in terms of performance and energy consumption, but are very
inflexible, as they are manufactured with an application specific
circuitry. Field Programmable Gate Arrays (FPGAs) represent a combination of approaches: With their application specific hardware they provide
high computational power while requiring, for many applications, less
energy than a CPU or a GPU. On the other hand they are far more
flexible than an ASIC due to their reconfigurability.
The only remaining problem is the programming of
the FPGAs, as they are far more difficult to program compared to
regular software. To allow common software developers, who have at
best very limited
knowledge in hardware design, to make use of these devices, tools were
developed that take a regular high level language and generate
hardware from it.
Among such tools, C-to-HDL compilers are a particularly wide-spread approach. These compilers attempt to translate common C code into a hardware description language from which a
datapath is generated. Most of these compilers have many
restrictions for the input and differ in their underlying generated
micro architecture, their scheduling method, their applied
optimizations, their execution model and even their target
hardware. Thus, a comparison of a certain aspect alone, like their
implemented scheduling method or their generated micro architecture,
is almost impossible, as they differ in so many other aspects.
This work provides a survey of the existing C-to-HDL compilers and
presents a new approach to evaluating and exploring
different micro architectures for dynamic scheduling used by such
compilers. From a mathematically formulated rule set the Triad
compiler generates a backend for the Scale compiler framework,
which then implements a hardware generation backend with described dynamic
scheduling.
While more than a factor of four slower than hardware from highly
optimized compilers, this environment allows easy comparison and
exploration of different rule sets and the micro architecture for the
dynamically scheduled datapaths generated from them. For demonstration
purposes a rule set modeling the COCOMA token flow model from
the COMRADE 2.0 compiler was implemented. Multiple variants of
it were explored: Savings of up to 11% of the required hardware
resources were possible
Towards the development of flexible, reliable, reconfigurable, and high-performance imaging systems
Current FPGAs can implement large systems because of the high density of
reconfigurable logic resources in a single chip. FPGAs are comprehensive devices
that combine flexibility and high performance in the same platform compared to
other platform such as General-Purpose Processors (GPPs) and Application Specific
Integrated Circuits (ASICs). The flexibility of modern FPGAs is further enhanced by
introducing Dynamic Partial Reconfiguration (DPR) feature, which allows for
changing the functionality of part of the system while other parts are functioning.
FPGAs became an important platform for digital image processing applications
because of the aforementioned features. They can fulfil the need of efficient and
flexible platforms that execute imaging tasks efficiently as well as the reliably with
low power, high performance and high flexibility. The use of FPGAs as accelerators
for image processing outperforms most of the current solutions. Current FPGA
solutions can to load part of the imaging application that needs high computational
power on dedicated reconfigurable hardware accelerators while other parts are
working on the traditional solution to increase the system performance. Moreover,
the use of the DPR feature enhances the flexibility of image processing further by
swapping accelerators in and out at run-time. The use of fault mitigation techniques
in FPGAs enables imaging applications to operate in harsh environments following
the fact that FPGAs are sensitive to radiation and extreme conditions.
The aim of this thesis is to present a platform for efficient implementations of
imaging tasks. The research uses FPGAs as the key component of this platform and
uses the concept of DPR to increase the performance, flexibility, to reduce the power
dissipation and to expand the cycle of possible imaging applications. In this context,
it proposes the use of FPGAs to accelerate the Image Processing Pipeline (IPP)
stages, the core part of most imaging devices. The thesis has a number of novel
concepts. The first novel concept is the use of FPGA hardware environment and
DPR feature to increase the parallelism and achieve high flexibility. The concept also
increases the performance and reduces the power consumption and area utilisation.
Based on this concept, the following implementations are presented in this thesis: An
implementation of Adams Hamilton Demosaicing algorithm for camera colour
interpolation, which exploits the FPGA parallelism to outperform other equivalents.
In addition, an implementation of Automatic White Balance (AWB), another IPP
stage that employs DPR feature to prove the mentioned novelty aspects. Another
novel concept in this thesis is presented in chapter 6, which uses DPR feature to
develop a novel flexible imaging system that requires less logic and can be
implemented in small FPGAs. The system can be employed as a template for any
imaging application with no limitation. Moreover, discussed in this thesis is a novel
reliable version of the imaging system that adopts novel techniques including
scrubbing, Built-In Self Test (BIST), and Triple Modular Redundancy (TMR) to
detect and correct errors using the Internal Configuration Access Port (ICAP)
primitive. These techniques exploit the datapath-based nature of the implemented
imaging system to improve the system's overall reliability. The thesis presents a
proposal for integrating the imaging system with the Robust Reliable Reconfigurable
Real-Time Heterogeneous Operating System (R4THOS) to get the best out of the
system. The proposal shows the suitability of the proposed DPR imaging system to
be used as part of the core system of autonomous cars because of its unbounded
flexibility. These novel works are presented in a number of publications as shown in section
1.3 later in this thesis
Dynamic partial reconfiguration management for high performance and reliability in FPGAs
Modern Field-Programmable Gate Arrays (FPGAs) are no longer used to implement
small “glue logic” circuitries. The high-density of reconfigurable logic resources in
today’s FPGAs enable the implementation of large systems in a single chip. FPGAs
are highly flexible devices; their functionality can be altered by simply loading a new
binary file in their configuration memory. While the flexibility of FPGAs is
comparable to General-Purpose Processors (GPPs), in the sense that different
functions can be performed using the same hardware, the performance gain that can
be achieved using FPGAs can be orders of magnitudes higher as FPGAs offer the
ability for customisation of parallel computational architectures.
Dynamic Partial Reconfiguration (DPR) allows for changing the functionality of
certain blocks on the chip while the rest of the FPGA is operational. DPR has
sparked the interest of researchers to explore new computational platforms where
computational tasks are off-loaded from a main CPU to be executed using dedicated
reconfigurable hardware accelerators configured on demand at run-time. By having a
battery of custom accelerators which can be swapped in and out of the FPGA at runtime,
a higher computational density can be achieved compared to static systems
where the accelerators are bound to fixed locations within the chip. Furthermore, the
ability of relocating these accelerators across several locations on the chip allows for
the implementation of adaptive systems which can mitigate emerging faults in the
FPGA chip when operating in harsh environments. By porting the appropriate fault
mitigation techniques in such computational platforms, the advantages of FPGAs can
be harnessed in different applications in space and military electronics where FPGAs
are usually seen as unreliable devices due to their sensitivity to radiation and extreme
environmental conditions.
In light of the above, this thesis investigates the deployment of DPR as: 1) a method
for enhancing performance by efficient exploitation of the FPGA resources, and 2) a
method for enhancing the reliability of systems intended to operate in harsh
environments. Achieving optimal performance in such systems requires an efficient
internal configuration management system to manage the reconfiguration and
execution of the reconfigurable modules in the FPGA. In addition, the system needs
to support “fault-resilience” features by integrating parameterisable fault detection
and recovery capabilities to meet the reliability standard of fault-tolerant
applications. This thesis addresses all the design and implementation aspects of an
Internal Configuration Manger (ICM) which supports a novel bitstream relocation
model to enable the placement of relocatable accelerators across several locations on
the FPGA chip. In addition to supporting all the configuration capabilities required to
implement a Reconfigurable Operating System (ROS), the proposed ICM also
supports the novel multiple-clone configuration technique which allows for cloning
several instances of the same hardware accelerator at the same time resulting in much
shorter configuration time compared to traditional configuration techniques. A faulttolerant
(FT) version of the proposed ICM which supports a comprehensive faultrecovery
scheme is also introduced in this thesis. The proposed FT-ICM is designed
with a much smaller area footprint compared to Triple Modular Redundancy (TMR)
hardening techniques while keeping a comparable level of fault-resilience.
The capabilities of the proposed ICM system are demonstrated with two novel
applications. The first application demonstrates a proof-of-concept reliable FPGA
server solution used for executing encryption/decryption queries. The proposed
server deploys bitstream relocation and modular redundancy to mitigate both
permanent and transient faults in the device. It also deploys a novel Built-In Self-
Test (BIST) diagnosis scheme, specifically designed to detect emerging permanent
faults in the system at run-time. The second application is a data mining application
where DPR is used to increase the computational density of a system used to
implement the Frequent Itemset Mining (FIM) problem
A multi-level functional IR with rewrites for higher-level synthesis of accelerators
Specialised accelerators deliver orders of magnitude higher energy-efficiency than
general-purpose processors. Field Programmable Gate Arrays (FPGAs) have become
the substrate of choice, because the ever-changing nature of modern workloads, such
as machine learning, demands reconfigurability. However, they are notoriously hard
to program directly using Hardware Description Languages (HDLs). Traditional High-Level Synthesis (HLS) tools improve productivity, but come with their own problems.
They often produce sub-optimal designs and programmers are still required to write
hardware-specific code, thus development cycles remain long.
This thesis proposes Shir, a higher-level synthesis approach for high-performance
accelerator design with a hardware-agnostic programming entry point, a multi-level
Intermediate Representation (IR), a compiler and rewrite rules for optimisation.
First, a novel, multi-level functional IR structure for accelerator design is described.
The IRs operate on different levels of abstraction, cleanly separating different hardware
concerns. They enable the expression of different forms of parallelism and standard
memory features, such as asynchronous off-chip memories or synchronous on-chip
buffers, as well as arbitration of such shared resources. Exposing these features at the
IR level is essential for achieving high performance.
Next, mechanical lowering procedures are introduced to automatically compile
a program specification through Shir’s functional IRs until low-level HDL code for
FPGA synthesis is emitted. Each lowering step gradually adds implementation details.
Finally, this thesis presents rewrite rules for automatic optimisations around parallelisation, buffering and data reshaping. Reshaping operations pose a challenge to
functional approaches in particular. They introduce overheads that compromise performance or even prevent the generation of synthesisable hardware designs altogether.
This fundamental issue is solved by the application of rewrite rules.
The viability of this approach is demonstrated by running matrix multiplication
and 2D convolution on an Intel Arria 10 FPGA. A limited design space exploration is
conducted, confirming the ability of the IR to exploit various hardware features. Using
rewrite rules for optimisation, it is possible to generate high-performance designs
that are competitive with highly tuned OpenCL implementations and that outperform
hardware-agnostic OpenCL code. The performance impact of the optimisations is
further evaluated showing that they are essential to achieving high performance, and
in many cases also necessary to produce hardware that fits the resource constraints
Aeronautical engineering: A continuing bibliography with indexes (supplement 301)
This bibliography lists 1291 reports, articles, and other documents introduced into the NASA scientific and technical information system in Feb. 1994. Subject coverage includes: design, construction and testing of aircraft and aircraft engines; aircraft components, equipment, and systems; ground support systems; and theoretical and applied aspects of aerodynamics and general fluid dynamics
Aeronautical engineering: A cumulative index to a continuing bibliography (supplement 248)
This publication is a cumulative index to the abstracts contained in Supplements 236 through 247 of Aeronautical Engineering: A Continuing Bibliography. The bibliographic series is compiled through the cooperative efforts of the American Institute of Aeronautics and Astronautics (AIAA) and the National Aeronautics and Space Administration (NASA). Seven indexes are included -- subject, personal author, corporate source, foreign technology, contract number, report number and accession number