9 research outputs found
Reliability-energy-performance optimisation in combinational circuits in presence of soft errors
PhD ThesisThe reliability metric has a direct relationship to the amount of value produced
by a circuit, similar to the performance metric. With advances in CMOS
technology, digital circuits become increasingly more susceptible to soft errors.
Therefore, it is imperative to be able to assess and improve the level of reliability
of these circuits. A framework for evaluating and improving the reliability of
combinational circuits is proposed, and an interplay between the metrics of
reliability, energy and performance is explored.
Reliability evaluation is divided into two levels of characterisation: stochastic
fault model (SFM) of the component library and a design-specific critical vector
model (CVM). The SFM captures the properties of components with regard to
the interference which causes error. The CVM is derived from a limited number
of simulation runs on the specific design at the design time and producing
the reliability metric. The idea is to move the high-complexity problem of the
stochastic characterisation of components to the generic part of the design
process, and to do it just once for a large number of specific designs. The
method is demonstrated on a range of circuits with various structures.
A three-way trade-off between reliability, energy, and performance has
been discovered; this trade-off facilitates optimisations of circuits and their
operating conditions.
A technique for improving the reliability of a circuit is proposed, based on
adding a slow stage at the primary output. Slow stages have the ability to
absorb narrow glitches from prior stages, thus reducing the error probability.
Such stages, or filters, suppress most of the glitches generated in prior stages
and prevent them from arriving at the primary output of the circuit. Two filter
solutions have been developed and analysed. The results show a dramatic
improvement in reliability at the expense of minor performance and energy
penalties.
To alleviate the problem of the time-consuming analogue simulations involved in the proposed method, a simplification technique is proposed. This
technique exploits the equivalence between the properties of the gates within
a path and the equivalence between paths. On the basis of these equivalences,
it is possible to reduce the number of simulation runs. The effectiveness of
the proposed technique is evaluated by applying it to different circuits with
a representative variety of path topologies. The results show a significant
decrease in the time taken to estimate reliability at the expense of a minor
decrease in the accuracy of estimation. The simplification technique enables
the use of the proposed method in applications with complex circuits.Ministry of Education and Scientific Research in Liby
Analysis and Design of Resilient VLSI Circuits
The reliable operation of Integrated Circuits (ICs) has become increasingly difficult to
achieve in the deep sub-micron (DSM) era. With continuously decreasing device feature
sizes, combined with lower supply voltages and higher operating frequencies, the noise
immunity of VLSI circuits is decreasing alarmingly. Thus, VLSI circuits are becoming
more vulnerable to noise effects such as crosstalk, power supply variations and radiation-induced
soft errors. Among these noise sources, soft errors (or error caused by radiation
particle strikes) have become an increasingly troublesome issue for memory arrays as well
as combinational logic circuits. Also, in the DSM era, process variations are increasing
at an alarming rate, making it more difficult to design reliable VLSI circuits. Hence, it
is important to efficiently design robust VLSI circuits that are resilient to radiation particle
strikes and process variations. The work presented in this dissertation presents several
analysis and design techniques with the goal of realizing VLSI circuits which are tolerant
to radiation particle strikes and process variations.
This dissertation consists of two parts. The first part proposes four analysis and two
design approaches to address radiation particle strikes. The analysis techniques for the
radiation particle strikes include: an approach to analytically determine the pulse width
and the pulse shape of a radiation induced voltage glitch in combinational circuits, a technique
to model the dynamic stability of SRAMs, and a 3D device-level analysis of the
radiation tolerance of voltage scaled circuits. Experimental results demonstrate that the proposed techniques for analyzing radiation particle strikes in combinational circuits and
SRAMs are fast and accurate compared to SPICE. Therefore, these analysis approaches
can be easily integrated in a VLSI design flow to analyze the radiation tolerance of such
circuits, and harden them early in the design flow. From 3D device-level analysis of the radiation
tolerance of voltage scaled circuits, several non-intuitive observations are made and
correspondingly, a set of guidelines are proposed, which are important to consider to realize
radiation hardened circuits. Two circuit level hardening approaches are also presented
to harden combinational circuits against a radiation particle strike. These hardening approaches
significantly improve the tolerance of combinational circuits against low and very
high energy radiation particle strikes respectively, with modest area and delay overheads.
The second part of this dissertation addresses process variations. A technique is developed
to perform sensitizable statistical timing analysis of a circuit, and thereby improve the
accuracy of timing analysis under process variations. Experimental results demonstrate that
this technique is able to significantly reduce the pessimism due to two sources of inaccuracy
which plague current statistical static timing analysis (SSTA) tools. Two design approaches
are also proposed to improve the process variation tolerance of combinational circuits and
voltage level shifters (which are used in circuits with multiple interacting power supply
domains), respectively. The variation tolerant design approach for combinational circuits
significantly improves the resilience of these circuits to random process variations, with a
reduction in the worst case delay and low area penalty. The proposed voltage level shifter
is faster, requires lower dynamic power and area, has lower leakage currents, and is more
tolerant to process variations, compared to the best known previous approach.
In summary, this dissertation presents several analysis and design techniques which
significantly augment the existing work in the area of resilient VLSI circuit design
Soft Error Analysis and Mitigation at High Abstraction Levels
Radiation-induced soft errors, as one of the major reliability challenges in future technology nodes, have to be carefully taken into consideration in the design space exploration. This thesis presents several novel and efficient techniques for soft error evaluation and mitigation at high abstract levels, i.e. from register transfer level up to behavioral algorithmic level. The effectiveness of proposed techniques is demonstrated with extensive synthesis experiments
Dynamic reconfiguration frameworks for high-performance reliable real-time reconfigurable computing
The sheer hardware-based computational performance and programming flexibility
offered by reconfigurable hardware like Field-Programmable Gate Arrays (FPGAs)
make them attractive for computing in applications that require high performance,
availability, reliability, real-time processing, and high efficiency. Fueled by fabrication
process scaling, modern reconfigurable devices come with ever greater quantities of
on-chip resources, allowing a more complex variety of applications to be developed.
Thus, the trend is that technology giants like Microsoft, Amazon, and Baidu now
embrace reconfigurable computing devices likes FPGAs to meet their critical
computing needs. In addition, the capability to autonomously reprogramme these
devices in the field is being exploited for reliability in application domains like
aerospace, defence, military, and nuclear power stations. In such applications, real-time
computing is important and is often a necessity for reliability. As such, applications and
algorithms resident on these devices must be implemented with sufficient
considerations for real-time processing and reliability.
Often, to manage a reconfigurable hardware device as a computing platform for a
multiplicity of homogenous and heterogeneous tasks, reconfigurable operating systems
(ROSes) have been proposed to give a software look to hardware-based computation.
The key requirements of a ROS include partitioning, task scheduling and allocation,
task configuration or loading, and inter-task communication and synchronization.
Existing ROSes have met these requirements to varied extents. However, they are
limited in reliability, especially regarding the flexibility of placing the hardware circuits
of tasks on device’s chip area, the problem arising more from the partitioning
approaches used. Indeed, this problem is deeply rooted in the static nature of the on-chip
inter-communication among tasks, hampering the flexibility of runtime task
relocation for reliability.
This thesis proposes the enabling frameworks for reliable, available, real-time,
efficient, secure, and high-performance reconfigurable computing by providing
techniques and mechanisms for reliable runtime reconfiguration, and dynamic inter-circuit communication and synchronization for circuits on reconfigurable hardware.
This work provides task configuration infrastructures for reliable reconfigurable
computing. Key features, especially reliability-enabling functionalities, which have
been given little or no attention in state-of-the-art are implemented. These features
include internal register read and write for device diagnosis; configuration operation
abort mechanism, and tightly integrated selective-area scanning, which aims to
optimize access to the device’s reconfiguration port for both task loading and error
mitigation.
In addition, this thesis proposes a novel reliability-aware inter-task communication
framework that exploits the availability of dedicated clocking infrastructures in a
typical FPGA to provide inter-task communication and synchronization. The clock
buffers and networks of an FPGA use dedicated routing resources, which are distinct
from the general routing resources. As such, deploying these dedicated resources for
communication sidesteps the restriction of static routes and allows a better relocation
of circuits for reliability purposes.
For evaluation, a case study that uses a NASA/JPL spectrometer data processing
application is employed to demonstrate the improved reliability brought about by the
implemented configuration controller and the reliability-aware dynamic
communication infrastructure. It is observed that up to 74% time saving can be achieved
for selective-area error mitigation when compared to state-of-the-art vendor
implementations. Moreover, an improvement in overall system reliability is observed
when the proposed dynamic communication scheme is deployed in the data processing
application.
Finally, one area of reconfigurable computing that has received insufficient
attention is security. Meanwhile, considering the nature of applications which now turn
to reconfigurable computing for accelerating compute-intensive processes, a high
premium is now placed on security, not only of the device but also of the applications,
from loading to runtime execution. To address security concerns, a novel secure and
efficient task configuration technique for task relocation is also investigated, providing
configuration time savings of up to 32% or 83%, depending on the device; and resource
usage savings in excess of 90% compared to state-of-the-art