96 research outputs found

Power efficient resilient microarchitectures for PVT variability mitigation

Author: Agwa Shady
Publication venue: AUC Knowledge Fountain
Publication date: 01/06/2018

The continued scaling of fabrication technology has made high power density and process, voltage, and temperature (PVT) variations the most critical issues limiting the performance of digital integrated circuits. Dynamic voltage and frequency scaling is used to reduce power consumption, while various time-relaxation techniques and error-recovery microarchitectures are used to tolerate PVT variations. These techniques reduce throughput by scaling down the frequency or by flushing and restarting the errant pipeline. This thesis presents a novel resilient microarchitecture, called the ERSUT-based resilient microarchitecture, that tolerates the delays induced by voltage scaling or by PVT variations. The resilient microarchitecture detects and recovers from the induced errors without flushing the pipeline and without scaling down the operating frequency. An ERSUT-based resilient 16 × 16 bit MAC unit, implemented using Global Foundries 65 nm technology and the ARM standard cell library, is introduced as a case study with 18.26% area overhead and up to 1.5x speedup. Under typical conditions, the maximum frequency of the conventional MAC unit is about 375 MHz, while the resilient MAC unit operates correctly at frequencies up to 565 MHz. Under variations, the resilient MAC unit tolerates induced delays of up to 50% of the clock period while keeping its throughput equal to the conventional MAC unit's maximum throughput. At 375 MHz, the resilient MAC unit can scale the supply voltage down from 1.2 V to 1.0 V, saving about 29% of the power consumed by the conventional MAC unit. A double-edge-triggered microarchitecture is also introduced to reduce power consumption substantially by halving the clock-tree frequency while preserving the same maximum throughput. This microarchitecture is applied to different ISCAS'89 benchmark circuits in addition to the 16 × 16 bit MAC unit; the average power reduction across these circuits is 63.58%, and the average area overhead is 31.02%. All these circuits are designed using Global Foundries 65 nm technology and the ARM standard cell library. Toward full automation of the ERSUT-based resilient microarchitecture, an ERSUT-based algorithm implemented in C++ is introduced to accelerate the design process. The developed algorithm dramatically reduces design-time effort and allows the ERSUT-based microarchitecture to be adopted by larger industrial designs. Using the ERSUT-based algorithm, a validation study is presented that applies the ERSUT-based microarchitecture to the MAC unit and to ISCAS'89 benchmark circuits of different complexity. This study shows that 72% of these circuits tolerate delays of more than 14% of their clock periods, 54.5% tolerate more than 20%, and 27% tolerate more than 30%. Consequently, the validation study shows that the ERSUT-based resilient microarchitecture is a valid and applicable solution for circuits of different complexity.

AUC Knowledge Fountain (American Univ. in Cairo)

PieceTimer: A Holistic Timing Analysis Framework Considering Setup/Hold Time Interdependency Using A Piecewise Model

Author: Li Bing
Schlichtmann Ulf
Zhang Grace Li
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 14/05/2017

In static timing analysis, clock-to-q delays of flip-flops are considered as constants. Setup times and hold times are characterized separately and also used as constants. The characterized delays, setup times and hold times are applied in timing analysis independently to verify the performance of circuits. In reality, however, clock-to-q delays of flip-flops depend on both setup and hold times. Instead of being constants, these delays change with respect to different setup/hold time combinations. Consequently, the simple abstraction of setup/hold times and constant clock-to-q delays introduces inaccuracy in timing analysis. In this paper, we propose a holistic method to consider the relation between clock-to-q delays and setup/hold time combinations with a piecewise linear model. The result is more accurate than that of traditional timing analysis, and the incorporation of the interdependency between clock-to-q delays, setup times and hold times may also improve circuit performance. Comment: IEEE/ACM International Conference on Computer-Aided Design (ICCAD), November 201
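As a rough illustration of the idea in this abstract, the sketch below models clock-to-q delay as a piecewise linear function of setup time. It is a one-dimensional simplification and not the paper's method: the name `clock_to_q`, the breakpoints in `POINTS`, and the choice to clamp outside the characterized range are all assumptions made for the example, and the numbers are made up.

```python
from bisect import bisect_right

# Hypothetical characterization points (NOT from the paper):
# (setup time in ps, clock-to-q delay in ps), sorted by setup time.
POINTS = [(20.0, 120.0), (40.0, 90.0), (80.0, 80.0)]


def clock_to_q(setup_ps, points=POINTS):
    """Interpolate clock-to-q delay linearly between characterized points."""
    xs = [x for x, _ in points]
    # Outside the characterized range, clamp to the nearest endpoint (an assumption).
    if setup_ps <= xs[0]:
        return points[0][1]
    if setup_ps >= xs[-1]:
        return points[-1][1]
    # Find the segment [x0, x1] that contains setup_ps.
    i = bisect_right(xs, setup_ps)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (setup_ps - x0) / (x1 - x0)


if __name__ == "__main__":
    for s in (30.0, 60.0, 100.0):
        print(f"setup={s} ps -> clock-to-q={clock_to_q(s)} ps")
```

A constant-delay model would return a single value for every setup time; here a tighter setup time yields a longer clock-to-q delay, which is the interdependency the abstract describes.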

arXiv.org e-Print Archive

Mix & Latch: An Optimization Flow for High-Performance Designs with Single-Clock Mixed-Polarity Latches and Flip-Flops

Author: Filippo Minnella
Jordi Cortadella
Luciano Lavagno
Mario R. Casu
Mihai T. Lazarescu
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2023

Flip-flops are the most used sequential elements in synchronous circuits, but designs based on latches can operate at higher frequencies and occupy less area. Techniques to increase the maximum operating frequency of flip-flop based designs, such as time-borrowing, rely on tight hold constraints that are difficult to satisfy using traditional back-end optimization techniques. We propose Mix & Latch, a methodology to increase the operating frequency of synchronous digital circuits using a single clock tree and a mixed distribution of positive- and negative-edge-triggered flops, and positive- and negative-level-sensitive latches. An efficient mathematical model is proposed to optimize the type and location of the sequential elements of the circuit. We ensure that the initial registers are not moved from their initial location, although they may change type, thus allowing the use of equivalence checking and static timing analysis to formally verify the correctness of the transformation. The technique is validated using a 28 nm CMOS FDSOI technology, obtaining 1.33X post-layout average operating frequency improvement on a broad set of benchmarks over a standard commercial design flow. Additionally, the circuit area was reduced by more than 1.19X on average for the same benchmarks, although overall area reduction is not a goal of the optimization algorithm. To the best of our knowledge, this is the first work that proposes combining mixed-polarity flip-flops and latches to improve circuit performance.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Timing error detection and correction for power efficiency: an aggressive scaling approach

Author: Kharaz Ahmad H.
Rathnala Prasanthi
Wilmshurst Tim
Publication venue: 'Institution of Engineering and Technology (IET)'
Publication date: 01/01/2018

Low power consumption has become an important aspect of processor and system design. Many techniques, ranging from the architectural to the system level, are available. Voltage scaling and frequency boosting methods are the most effective ways to achieve low power consumption, as the dynamic power is proportional to the frequency and to the square of the supply voltage. The basic principle of aggressive voltage scaling is to adjust the supply voltage to the lowest possible level to achieve minimum power consumption while maintaining reliable operation. Similarly, aggressive frequency boosting alters the operating frequency to achieve the best performance improvement. In this study, an aggressive technique is presented that employs a voltage- or frequency-varying hardware circuit with a time-borrowing feature. The proposed technique double-samples the data to detect any timing violations as the frequency or voltage is scaled. The detected violations are masked by phase-delaying the flip-flop clock to capture the late-arriving data. This makes the system tolerant of timing errors without incurring an error-correction timing penalty. The proposed technique is implemented in a field-programmable gate array using a two-stage arithmetic pipeline. Results on various benchmarks clearly demonstrate the achieved power savings and performance improvement.
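The proportionality stated in this abstract (dynamic power proportional to frequency and to the square of the supply voltage) can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `dynamic_power_ratio` is hypothetical, the voltage and frequency values are illustrative rather than taken from the paper, and switched capacitance and activity factor are assumed to stay constant.

```python
def dynamic_power_ratio(v_old, v_new, f_old, f_new):
    """Return P_new / P_old assuming P_dyn is proportional to f * V**2.

    Assumes switched capacitance and activity factor are unchanged.
    """
    return (f_new / f_old) * (v_new / v_old) ** 2


if __name__ == "__main__":
    # Illustrative values only: lower the supply from 1.2 V to 1.0 V
    # while holding the clock at 100 MHz.
    ratio = dynamic_power_ratio(v_old=1.2, v_new=1.0, f_old=100e6, f_new=100e6)
    print(f"dynamic power scales by {ratio:.3f}")
```

Because the voltage term is squared, a modest voltage reduction at fixed frequency removes a disproportionate share of dynamic power, which is why aggressive voltage scaling is attractive when timing errors can be tolerated.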

Voltage stacking for near/sub-threshold operation

Author: Singh Kamlesh
Publication venue: Eindhoven University of Technology
Publication date: 21/10/2021

inSense: A Variation and Fault Tolerant Architecture for Nanoscale Devices

Author: Frye John A
Publication venue: RIT Scholar Works
Publication date: 01/09/2015

Transistor technology scaling has been the driving force in improving the size, speed, and power consumption of digital systems. As devices approach atomic size, however, their reliability and performance are increasingly compromised due to reduced noise margins, difficulties in fabrication, and emergent nano-scale phenomena. Scaled CMOS devices, in particular, suffer from process variations such as random dopant fluctuation (RDF) and line edge roughness (LER), transistor degradation mechanisms such as negative-bias temperature instability (NBTI) and hot-carrier injection (HCI), and increased sensitivity to single event upsets (SEUs). Consequently, future devices may exhibit reduced performance, diminished lifetimes, and poor reliability. This research proposes a variation and fault tolerant architecture, the inSense architecture, as a circuit-level solution to the problems induced by the aforementioned phenomena. The inSense architecture entails augmenting circuits with introspective and sensory capabilities which are able to dynamically detect and compensate for process variations, transistor degradation, and soft errors. This approach creates "smart" circuits able to function despite the use of unreliable devices and is applicable to current CMOS technology as well as next-generation devices using new materials and structures. Furthermore, this work presents an automated prototype implementation of the inSense architecture targeted to CMOS devices, evaluated via implementation in ISCAS '85 benchmark circuits. The automated prototype implementation is functionally verified and characterized: it is found that error detection capability (with error windows of approximately 30-400 ps) can be added for less than 2% area overhead for circuits of non-trivial complexity. Single event transient (SET) detection capability (configurable with target set-points) is found to be functional, although it generally tracks the standard DMR implementation with respect to overheads.

RIT Scholar Works

A Survey on Flip Flop Replacement to Latch on Various Design

Author: Murlidharan D
Vasumathi S.P
Publication venue: http://www.ijpam.eu
Publication date: 01/01/2018

This paper presents a survey of replacing flip-flops with latches and of the advantages of latch-based sequential design. Flip-flops make up a major part of a design's sequential elements, but they have disadvantages: performance decreases and area increases. An alternative that increases performance and reduces area is to use latches. Latches are used instead of flip-flops in certain places to increase performance and decrease area.

Linearization of The Timing Analysis and Optimization of Level-Sensitive Circuits

Author: Taskin Baris
Publication venue
Publication date: 08/05/2003

This thesis describes a linear programming (LP) formulation applicable to the static timing analysis of large scale synchronous circuits with level-sensitive latches. The automatic timing analysis procedure presented here is composed of deriving the connectivity information, constructing the LP model and solving the clock period minimization problem of synchronous digital VLSI circuits. In synchronous circuits with level-sensitive latches, operation at a reduced clock period (higher clock frequency) is possible by taking advantage of both non-zero clock skew scheduling and time borrowing. Clock skew scheduling is performed in order to exploit the benefits of nonidentical clock signal delays on circuit timing. The time borrowing property of level-sensitive circuits permits higher operating frequencies compared to edge-sensitive circuits. Considering time borrowing in the timing analysis, however, introduces non-linearity in this timing analysis. The modified big M (MBM) method is defined in order to transform the non-linear constraints arising in the problem formulation into solvable linear constraints. Equivalent LP model problems for single-phase clock synchronization of the ISCAS'89 benchmark circuits are generated and these problems are solved by the industrial LP solver CPLEX. Through the simultaneous application of time borrowing and clock skew scheduling, up to 63% improvements are demonstrated in minimum clock period with respect to zero-skew edge-sensitive synchronous circuits. The timing constraints governing the level-sensitive synchronous circuit operation not only solve the clock period minimization problem but also provide a common framework for the general timing analysis of such circuits. The inclusion of additional constraints into the problem formulation in order to meet the timing requirements imposed by specific application environments is discussed.

Power and area efficient clock stretching and critical path reshaping for error resilience

Author: Chang Alan
Jayakrishnan Mini
Kim Tony
Publication venue
Publication date: 20/01/2019

Process, voltage and temperature variations are on the rise with technology scaling. Nano-scale technology requires huge design margins to ensure reliable operation. Worst-case design margining consumes a significant amount of circuit and system resources. In-situ error detection or correction is an alternative method for cost-effective variation tolerance. However, existing in-situ error detection and correction circuits are power and area hungry since they use speculative error management, which gives lower power savings at higher error rates. This paper proposes an error resilience technique that utilizes the available slack in the design. The proposed method uses a clock stretching circuit to relax timing margins on selected critical paths that have sufficient consecutive stage slack. We also propose a power optimization method which reshapes the critical path logic in proportion to the consecutive stage slack. Experimental results show that the proposed method achieves power and area savings of 40% and 8% respectively compared to the worst-case design approach. When compared to the TIMBER error resilience approach, the proposed method saves more than 74% in power and more than 13% in area at design time. Document type: Articl

Scipedia