15 research outputs found
Evaluation of advanced techniques for structural FPGA self-test
This thesis presents a comprehensive test generation framework for FPGA logic elements and interconnects. It is based on and extends the current state-of-the-art. The purpose of FPGA testing in this work is to achieve reliable reconfiguration for a FPGA-based runtime reconfigurable system. A pre-configuration test is performed on a portion of the FPGA before it is reconfigured as part of the system to ensure that the FPGA fabric is fault-free. The implementation platform is the Xilinx Virtex-5 FPGA family.
Existing literature in FPGA testing is evaluated and reviewed thoroughly. The various approaches are compared against one another qualitatively and the approach most suitable to the target platform is chosen. The array testing method is employed in testing the FPGA logic for its low hardware overhead and optimal test time. All tests are additionally pipelined to reduce test application time and use a high test clock frequency. A hybrid fault model including both structural and functional faults is assumed.
An algorithm for the optimization of the number of required FPGA test configurations is developed and implemented in Java using a pseudo-random set-covering heuristic. Optimal solutions are obtained for Virtex-5 logic slices. The algorithm effort is parameterizable with the number of loop iterations each of which take approximately one second for a Virtex-5 sliceL circuit.
A flexible test architecture for interconnects is developed. Arbitrary wire types can be tested in the same test configuration with no hardware overhead. Furthermore, a routing algorithm is integrated with the test template generation to select the wires under test and route them appropriately.
Nine test configurations are required to achieve full test coverage for the FPGA logic. For interconnect testing, a local router-based on depth-first graph traversal is implemented in Java as the basis for creating systematic interconnect test templates. Pent wire testing is additionally implemented as a proof of concept. The test clock frequency for all tests exceeds 170 MHz and the hardware overhead is always lower than seven CLBs. All implemented tests are parameterizable such that they can be applied to any portion of the FPGA regardless of size or position
High-level synthesis of triple modular redundant FPGA circuits with energy efficient error recovery mechanisms
There is a growing interest in deploying commercial SRAM-based Field Programmable Gate Array (FPGA) circuits in space due to their low cost, reconfigurability, high logic capacity and rich I/O interfaces. However, their configuration memory (CM) is vulnerable to ionising radiation which raises the need for effective fault-tolerant design techniques. This thesis provides the following contributions to mitigate the negative effects of soft errors in SRAM FPGA circuits.
Triple Modular Redundancy (TMR) with periodic CM scrubbing or Module-based CM error recovery (MER) are popular techniques for mitigating soft errors in FPGA circuits. However, this thesis shows that MER does not recover CM soft errors in logic instantiated outside the reconfigurable regions of TMR modules. To address this limitation, a hybrid error recovery mechanism, namely FMER, is proposed. FMER uses selective periodic scrubbing and MER to recover CM soft errors inside and outside the reconfigurable regions of TMR modules, respectively. Experimental results indicate that TMR circuits with FMER achieve higher dependability with less energy consumption than those using periodic scrubbing or MER alone.
An imperative component of MER and FMER is the reconfiguration control network (RCN) that transfers the minority reports of TMR components, i.e., which, if any, TMR module needs recovery, to the FPGA's reconfiguration controller (RC). Although several reliable RCs have been proposed, a study of reliable RCNs has not been previously reported. This thesis fills this research gap, by proposing a technique that transfers the circuit's minority reports to the RC via the configuration-layer of the FPGA. This reduces the resource utilisation of the RCN and therefore its failure rate. Results show that the proposed RCN achieves higher reliability than alternative RCN architectures reported in the literature.
The last contribution of this thesis is a high-level synthesis (HLS) tool, namely TLegUp, developed within the LegUp HLS framework. TLegUp triplicates Xilinx 7-series FPGA circuits during HLS rather than during the register-transfer level pre- or post-synthesis flow stage, as existing computer-aided design tools do. Results show that TLegUp can generate non-partitioned TMR circuits with 500x less soft error sensitivity than non-triplicated functional equivalent baseline circuits, while utilising 3-4x more resources and having 11% lower frequency
Recommended from our members
Efficient FPGA implementation and power modelling of image and signal processing IP cores
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.Field Programmable Gate Arrays (FPGAs) are the technology of choice in a number ofimage
and signal processing application areas such as consumer electronics, instrumentation,
medical data processing and avionics due to their reasonable energy consumption, high performance, security, low design-turnaround time and reconfigurability. Low power FPGA
devices are also emerging as competitive solutions for mobile and thermally constrained platforms. Most computationally intensive image and signal processing algorithms also consume a lot of power leading to a number of issues including reduced mobility, reliability concerns and increased design cost among others. Power dissipation has become one of the most important challenges, particularly for FPGAs. Addressing this problem requires optimisation and awareness at all levels in the design flow. The key achievements of the
work presented in this thesis are summarised here. Behavioural level optimisation strategies have been used for implementing matrix product and inner product through the use of mathematical techniques such as Distributed Arithmetic (DA) and its variations including offset binary coding, sparse factorisation and novel vector level transformations. Applications to test the impact of these algorithmic and arithmetic transformations include the fast Hadamard/Walsh transforms and Gaussian mixture models. Complete design space exploration has been performed on these cores, and where appropriate, they have been shown to clearly outperform comparable existing implementations. At the architectural level, strategies such as parallelism, pipelining and systolisation have been successfully applied for the design and optimisation of a number of
cores including colour space conversion, finite Radon transform, finite ridgelet transform and circular convolution. A pioneering study into the influence of supply voltage scaling for FPGA based designs, used in conjunction with performance enhancing strategies such as parallelism and pipelining has been performed. Initial results are very promising and indicated significant potential for future research in this area.
A key contribution of this work includes the development of a novel high level power macromodelling technique for design space exploration and characterisation of custom IP cores for FPGAs, called Functional Level Power Analysis and Modelling (FLPAM). FLPAM
is scalable, platform independent and compares favourably with existing approaches. A hybrid, top-down design flow paradigm integrating FLPAM with commercially available design tools for systematic optimisation of IP cores has also been developed
Generic low power reconfigurable distributed arithmetic processor
Higher performance, lower cost, increasingly minimizing integrated circuit components, and
higher packaging density of chips are ongoing goals of the microelectronic and computer
industry. As these goals are being achieved, however, power consumption and flexibility are
increasingly becoming bottlenecks that need to be addressed with the new technology in Very
Large-Scale Integrated (VLSI) design.
For modern systems, more energy is required to support the powerful computational capability
which accords with the increasing requirements, and these requirements cause the change of
standards not only in audio and video broadcasting but also in communication such as wireless
connection and network protocols. Powerful flexibility and low consumption are repellent, but
their combination in one system is the ultimate goal of designers.
A generic domain-specific low-power reconfigurable processor for the distributed
arithmetic algorithm is presented in this dissertation. This domain reconfigurable processor
features high efficiency in terms of area, power and delay, which approaches the
performance of an ASIC design, while retaining the flexibility of programmable platforms.
The architecture not only supports typical distributed arithmetic algorithms which can be
found in most still picture compression standards and video conferencing standards, but
also offers implementation ability for other distributed arithmetic algorithms found in
digital signal processing, telecommunication protocols and automatic control.
In this processor, a simple reconfigurable low power control unit is implemented with
good performance in area, power and timing. The generic characteristic of the architecture
makes it applicable for any small and medium size finite state machines which can be used
as control units to implement complex system behaviour and can be found in almost all
engineering disciplines. Furthermore, to map target applications efficiently onto the
proposed architecture, a new algorithm is introduced for searching for the best common
sharing terms set and it keeps the area and power consumption of the implementation at
low level. The software implementation of this algorithm is presented, which can be used
not only for the proposed architecture in this dissertation but also for all the
implementations with adder-based distributed arithmetic algorithms. In addition, some low
power design techniques are applied in the architecture, such as unsymmetrical design
style including unsymmetrical interconnection arranging, unsymmetrical PTBs selection
and unsymmetrical mapping basic computing units. All these design techniques achieve
extraordinary power consumption saving. It is believed that they can be extended to more
low power designs and architectures.
The processor presented in this dissertation can be used to implement complex, high
performance distributed arithmetic algorithms for communication and image processing
applications with low cost in area and power compared with the traditional
methods
The 1991 3rd NASA Symposium on VLSI Design
Papers from the symposium are presented from the following sessions: (1) featured presentations 1; (2) very large scale integration (VLSI) circuit design; (3) VLSI architecture 1; (4) featured presentations 2; (5) neural networks; (6) VLSI architectures 2; (7) featured presentations 3; (8) verification 1; (9) analog design; (10) verification 2; (11) design innovations 1; (12) asynchronous design; and (13) design innovations 2
Compilation efficace pour FPGA reconfigurable dynamiquement
Thèse numérisée par la Division de la gestion de documents et des archives de l'Université de Montréal
Modelli e strumenti di programmazione parallela per piattaforme many-core
The negotiation between power consumption, performance, programmability, and portability drives all computing industry designs, in particular the mobile and embedded systems domains.
Two design paradigms have proven particularly promising in this context: architectural heterogeneity and many-core processors.
Parallel programming models are key to effectively harness the computational power of heterogeneous many-core SoC.
This thesis presents a set of techniques and HW/SW extensions that enable performance improvements and that simplify programmability for heterogeneous many-core platforms.
The thesis contributions cover vertically the entire software stack for many-core platforms, from hardware abstraction layers running on top of bare-metal, to programming models; from hardware extensions for efficient parallelism support to middleware that enables optimized resource management within many-core platforms.
First, we present mechanisms to decrease parallelism overheads on parallel programming runtimes for many-core platforms, targeting fine-grain parallelism.
Second, we present programming model support that enables the offload of computational kernels within heterogeneous many-core systems.
Third, we present a novel approach to dynamically sharing and managing many-core platforms when multiple applications coded with different programming models execute concurrently.
All these contributions were validated using STMicroelectronics STHORM, a real embodiment of a state-of-the-art many-core system. Hardware extensions and architectural explorations were explored using VirtualSoC, a SystemC based cycle-accurate simulator of many-core platforms