Design of variation-tolerant synchronizers for multiple clock and voltage domains
PhD Thesis
Parametric variability increasingly affects the performance of electronic circuits as
the fabrication technology has reached the level of 32nm and beyond. These
parameters may include transistor Process parameters (such as threshold
voltage), supply Voltage and Temperature (PVT), all of which could have a
significant impact on the speed and power consumption of the circuit, particularly
if the variations exceed the design margins. As systems are designed with more
asynchronous protocols, there is a need for highly robust synchronizers and
arbiters. These components are often used as interfaces between communication
links of different timing domains as well as sampling devices for asynchronous
inputs coming from external components. These applications have created a need
for new robust designs of synchronizers and arbiters that can tolerate process,
voltage and temperature variations.
The aim of this study was to investigate how synchronizers and arbiters should be
designed to tolerate parametric variations. All investigations focused mainly on
circuit-level and transistor level designs and were modeled and simulated in the
UMC90nm CMOS technology process. Analog simulations were used to measure
timing parameters and power consumption along with a “Monte Carlo” statistical
analysis to account for process variations.
Two main components of synchronizers and arbiters were primarily investigated:
flip-flop and mutual-exclusion element (MUTEX). Both components can violate the
input timing conditions, setup and hold window times, which could cause
metastability inside their bistable elements and possibly end in failures. The
mean-time between failures is an important reliability feature of any synchronizer
delay through the synchronizer.
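The dependence of mean time between failures on the resolution time constant τ can be illustrated with the standard textbook synchronizer model, MTBF = exp(t_res / τ) / (T_w · f_clk · f_data). The sketch below is not taken from the thesis: the function name and all numeric values are assumptions chosen only to show how strongly MTBF reacts to a change in τ.

```python
import math


def synchronizer_mtbf(t_res, tau, t_w, f_clk, f_data):
    """Textbook MTBF model for a synchronizer, in seconds.

    t_res  : time allowed for metastability to resolve (s)
    tau    : resolution time constant of the bistable element (s)
    t_w    : metastability window (s)
    f_clk  : sampling clock frequency (Hz)
    f_data : average rate of asynchronous input transitions (Hz)
    """
    return math.exp(t_res / tau) / (t_w * f_clk * f_data)


# Illustrative values only (assumptions, not measurements from the thesis).
F_CLK = 500e6      # 500 MHz sampling clock
F_DATA = 50e6      # 50 MHz asynchronous data rate
T_W = 30e-12       # 30 ps metastability window
T_RES = 1 / F_CLK  # one clock period allowed for resolution

mtbf_nominal = synchronizer_mtbf(T_RES, 20e-12, T_W, F_CLK, F_DATA)
# A variation that doubles tau takes the square root of the exponential term.
mtbf_degraded = synchronizer_mtbf(T_RES, 40e-12, T_W, F_CLK, F_DATA)

if __name__ == "__main__":
    print(f"nominal tau : MTBF = {mtbf_nominal:.3e} s")
    print(f"doubled tau : MTBF = {mtbf_degraded:.3e} s")
```

Because τ sits in the exponent, even a modest variation-induced increase in τ can reduce MTBF by many orders of magnitude, which is why the techniques listed in the abstract target τ directly.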
The MUTEX study focused on the classical circuit, in addition to a number of
tolerance, based on increasing internal gain by adding current sources, reducing
the capacitive loading, boosting the transconductance of the latch, compensating
the existing Miller capacitance, and adding asymmetry to maneuver the metastable
point. The results showed that some circuits had little or almost no improvements,
while five techniques showed significant improvements by reducing τ and
maintaining high tolerance.
Three design approaches are proposed to provide variation-tolerant
synchronizers. First, the wagging synchronizer is proposed to significantly
increase reliability over that of the conventional two flip-flop synchronizer. The
robustness of the wagging technique can be enhanced by using robust τ latches or
adding one more cycle of synchronization. The second approach is the
Metastability Auto-Detection and Correction (MADAC) latch which relies on swiftly
detecting a metastable event and correcting it by enforcing the previously stored
logic value. This technique significantly reduces the resolution time down from
uncertain
synchronization technique is proposed to transfer signals between Multiple-
Voltage Multiple-Clock Domains (MVD/MCD) that do not require conventional
level-shifters between the domains or multiple power supplies within each
domain. This interface circuit uses a synchronous set and feedback reset protocol
which provides level-shifting and synchronization of all signals between the
domains, from a wide range of voltage-supplies and clock frequencies.
Overall, synchronizer circuits can tolerate variations to a greater extent by
employing the wagging technique or using a MADAC latch, while MUTEX tolerance
can suffice with small circuit modifications. Communication between MVD/MCD
can be achieved by an asynchronous handshake
without a need for adding level-shifters.
The Saudi Arabian Embassy in London, Umm Al-Qura University, Saudi Arabia
Design of asynchronous microprocessor for power proportionality
PhD Thesis
Microprocessors continue to get exponentially cheaper for end users following Moore’s
law, while the costs involved in their design keep growing, also at an exponential rate.
The reason is the ever increasing complexity of processors, which modern EDA tools
struggle to keep up with. This makes further scaling for performance subject to a high
risk in the reliability of the system. To keep this risk low, yet improve the performance,
CPU designers try to optimise various parts of the processor. Instruction Set Architecture
(ISA) is a significant part of the whole processor design flow, whose optimal design
for a particular combination of available hardware resources and software requirements
is crucial for building processors with high performance and efficient energy utilisation.
This is a challenging task involving a lot of heuristics and high-level design decisions.
Another issue impacting CPU reliability is continuous scaling for power consumption. For
the last decades CPU designers have been mainly focused on improving performance, but
“keeping energy and power consumption in mind”. The consequence of this was a development
of energy-efficient systems, where energy was considered as a resource whose
consumption should be optimised. As CMOS technology was progressing, with feature
size decreasing and power delivered to circuit components becoming less stable, the
energy resource turned from an optimisation criterion into a constraint, sometimes a critical
one. At this point power proportionality becomes one of the most important aspects
in system design. Developing methods and techniques which will address the problem
of designing a power-proportional microprocessor, capable of adapting to varying operating
conditions (such as low or even unstable voltage levels) and application requirements at
runtime, is one of today’s grand challenges. In this thesis this challenge is addressed
by proposing a new design flow for the development of an ISA for microprocessors, which
can be altered to suit a particular hardware platform or a specific operating mode. This
flow uses an expressive and powerful formalism for the specification of processor instruction
sets called the Conditional Partial Order Graph (CPOG). The CPOG model captures
large sets of behavioural scenarios for a microarchitectural level in a computationally
efficient form amenable to formal transformations for synthesis, verification and automated
derivation of asynchronous hardware for the CPU microcontrol. The feasibility of
the methodology, novel design flow and a number of optimisation techniques was proven
in a full size asynchronous Intel 8051 microprocessor and its demonstrator silicon. The
chip showed the ability to work in a wide range of operating voltage and environmental
conditions. Depending on application requirements and power budget our ASIC supports
several operating modes: one optimised for energy consumption and the other one for
performance. This was achieved by extending a traditional datapath structure with an
auxiliary control layer for adaptable and fault tolerant operation. These and other optimisations
resulted in a reconfigurable and adaptable implementation, which was proven
by measurements, analysis and evaluation of the chip.
EPSR
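The idea behind a Conditional Partial Order Graph, as the abstract describes it, is that one graph with conditions on its vertices and arcs encodes many behavioural scenarios, and fixing the opcode variables selects one partial order of events. The following is a minimal sketch of that projection step only. The event names, the two opcode variables and the helper functions are hypothetical and are not the instruction set or tooling used in the thesis.

```python
from graphlib import TopologicalSorter

# Hypothetical micro-events; each carries a condition over the opcode variables.
VERTICES = {
    "fetch": lambda op: True,
    "decode": lambda op: True,
    "alu": lambda op: op["is_alu"],
    "mem": lambda op: op["is_load"],
    "write": lambda op: op["is_alu"] or op["is_load"],
}

# Hypothetical causality arcs (src must precede dst), also conditional.
ARCS = {
    ("fetch", "decode"): lambda op: True,
    ("decode", "alu"): lambda op: op["is_alu"],
    ("decode", "mem"): lambda op: op["is_load"],
    ("alu", "write"): lambda op: op["is_alu"],
    ("mem", "write"): lambda op: op["is_load"],
}


def project(op):
    """Return {event: set(predecessors)} for the scenario selected by `op`."""
    active = {v for v, cond in VERTICES.items() if cond(op)}
    preds = {v: set() for v in active}
    for (src, dst), cond in ARCS.items():
        if src in active and dst in active and cond(op):
            preds[dst].add(src)
    return preds


def schedule(op):
    """One event order that is consistent with the projected partial order."""
    return list(TopologicalSorter(project(op)).static_order())


if __name__ == "__main__":
    print(schedule({"is_alu": True, "is_load": False}))
    print(schedule({"is_alu": False, "is_load": True}))
    print(schedule({"is_alu": False, "is_load": False}))
```

Each opcode assignment yields a different set of active events and orderings from the same compact description, which is the property that makes the model suitable for specifying a whole instruction set at once.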
A low-power cache system for high-performance processors
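Degree system: New ; Report number: Kō No. 3439 ; Degree type: Doctor of Engineering ; Date conferred: 12-Sep-11 ; Waseda University degree certificate number: Shin 576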
Energy-efficient hardware design based on high-level synthesis
This dissertation describes research broadly concerning high-level synthesis (HLS) and, more specifically, the HLS-based design of energy-efficient hardware (HW) accelerators. HW accelerators, mostly implemented on FPGAs, are integral to the heterogeneous architectures employed in modern high performance computing (HPC) systems due to their ability to speed up execution while dramatically reducing the energy consumption of computationally challenging portions of complex applications. Hence, the first activity concerned an HLS-based approach to directly execute OpenCL code on an FPGA instead of its traditional GPU-based counterpart. Modern FPGAs offer considerable computational capabilities while consuming significantly less power than high-end GPUs. Several different implementations of the K-Nearest Neighbor algorithm were considered on both FPGA- and GPU-based platforms and their performance was compared. FPGAs were generally more energy-efficient than the GPUs in all the test cases. Eventually, we were also able to obtain a faster (in terms of execution time) FPGA implementation by using an FPGA-specific OpenCL coding style and suitable HLS directives.
The second activity targeted the development of a methodology complementing HLS to automatically derive power optimization directives (also known as "power intent") from a system-level design description and use them to drive the design steps after HLS, by producing a directive file written in the Common Power Format (CPF) to achieve power shut-off (PSO) in the case of an ASIC design. The proposed LP-HLS methodology reduces the design effort by enabling designers to infer low power information from the system-level description of a design rather than at the RTL. This methodology required a SystemC description of a generic power management module to describe the design context of a HW module also modeled in SystemC, along with the development of a tool to automatically produce the CPF file to accomplish PSO. Several test cases were considered to validate the proposed methodology and the results demonstrated its ability to correctly extract the low power information and apply it to achieve power optimization in the backend flow.
TinyVers: A Tiny Versatile System-on-chip with State-Retentive eMRAM for ML Inference at the Extreme Edge
Extreme edge devices or Internet-of-Things nodes require both ultra-low power
always-on processing as well as the ability to do on-demand sampling and
processing. Moreover, support for IoT applications like voice recognition,
machine monitoring, etc., requires the ability to execute a wide range of ML
workloads. This brings challenges in hardware design to build flexible
processors operating in ultra-low power regime. This paper presents TinyVers, a
tiny versatile ultra-low power ML system-on-chip to enable enhanced
intelligence at the Extreme Edge. TinyVers exploits dataflow reconfiguration to
enable multi-modal support and aggressive on-chip power management for
duty-cycling to enable smart sensing applications. The SoC combines a RISC-V
host processor, a 17 TOPS/W dataflow reconfigurable ML accelerator, a 1.7
μW deep sleep wake-up controller, and an eMRAM for boot code and ML
parameter retention. The SoC can perform up to 17.6 GOPS while achieving a
power consumption range from 1.7 μW to 20 mW. Multiple ML workloads aimed at
diverse applications are mapped on the SoC to showcase its flexibility and
efficiency. All the models achieve 1-2 TOPS/W of energy efficiency with power
consumption below 230 μW in continuous operation. In a duty-cycling use
case for machine monitoring, this power is reduced to below 10 μW.
Comment: Accepted in IEEE Journal of Solid-State Circuits
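How duty-cycling brings average power down can be illustrated with a simple two-state model, taking the deep-sleep and continuous-operation figures as 1.7 µW and 230 µW. This is only a sketch: it ignores wake-up and eMRAM restore energy, and the function names and the 3 % duty cycle are assumptions rather than the measured operating point from the paper.

```python
def average_power_w(p_active_w, p_sleep_w, duty_cycle):
    """Average power of a two-state (active/sleep) duty-cycled system, in watts."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must be between 0 and 1")
    return duty_cycle * p_active_w + (1.0 - duty_cycle) * p_sleep_w


def max_duty_cycle(p_target_w, p_active_w, p_sleep_w):
    """Largest active fraction that keeps average power at or below p_target_w."""
    if p_target_w < p_sleep_w:
        raise ValueError("target is below the sleep power floor")
    return min(1.0, (p_target_w - p_sleep_w) / (p_active_w - p_sleep_w))


P_ACTIVE_W = 230e-6  # upper bound quoted for continuous operation
P_SLEEP_W = 1.7e-6   # deep-sleep wake-up controller power

if __name__ == "__main__":
    avg = average_power_w(P_ACTIVE_W, P_SLEEP_W, 0.03)  # hypothetical 3 % duty cycle
    limit = max_duty_cycle(10e-6, P_ACTIVE_W, P_SLEEP_W)
    print(f"average power at 3% duty cycle: {avg * 1e6:.2f} uW")
    print(f"duty cycle limit for a 10 uW budget: {limit:.2%}")
```

Under this simplified model the active fraction would need to stay below roughly 3.6 % to meet a 10 µW budget. A real design would also have to account for the energy cost of each wake-up.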
Design Automation and Application for Emerging Reconfigurable Nanotechnologies
In the last few decades, two major phenomena have revolutionized the electronic industry – the ever-increasing dependence on electronic circuits and the Complementary Metal Oxide Semiconductor (CMOS) downscaling. These two phenomena have been complementing each other in a way that while electronics, in general, have demanded more computations per functional unit, CMOS downscaling has aptly supported such needs. However, while the computational demand is still rising exponentially, CMOS downscaling is reaching its physical limits. Hence, the need to explore viable emerging nanotechnologies is more imperative than ever. This thesis focuses on streamlining the existing design automation techniques for a class of emerging reconfigurable nanotechnologies. Transistors based on this technology exhibit duality in conduction, i.e. they can be configured dynamically either as a p-type or an n-type device on the application of an external bias. Owing to this dynamic reconfiguration, these transistors are also referred to as Reconfigurable Field-Effect Transistors (RFETs).
Exploring and developing new technologies, just as with CMOS, requires tackling two main challenges. First, the design automation flow has to be modified to enable tailor-made circuit designs. Second, possible application opportunities should be explored where such technologies can outperform the existing CMOS technologies. This thesis targets these two objectives for emerging reconfigurable nanotechnologies by proposing approaches for enabling an Electronic Design Automation (EDA) flow for circuits based on RFETs and by exploring hardware security as an application that exploits the transistor-level dynamic reconfiguration offered by this technology.
This thesis explains the bottom-up approach adopted to propose a logic synthesis flow by identifying new logic gates and circuit design paradigms that can particularly exploit the dynamic reconfiguration offered by these novel nanotechnologies. This led to the subsequent need of finding natural Boolean logic abstraction for emerging reconfigurable nanotechnologies as it is shown that the existing abstraction of negative unate logic for CMOS technologies is sub-optimal for RFETs-based circuits. In this direction, it has been shown that duality in Boolean logic is a natural abstraction for this technology and can truly represent the duality in conduction offered by individual transistors. Finding this abstraction paved the way for defining suitable primitives and proposing various algorithms for logic synthesis and technology mapping.
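The notion of duality in Boolean logic referred to above can be made concrete with a small check: a function f is self-dual when complementing every input complements the output, that is, f(x) = ¬f(¬x) for every input vector. The sketch below is a brute-force illustration of that definition, not the thesis's synthesis algorithm, and the helper names are assumptions.

```python
from itertools import product


def is_self_dual(f, n):
    """True if f(x) == NOT f(NOT x) for every n-bit input vector x."""
    for bits in product((0, 1), repeat=n):
        complemented = tuple(1 - b for b in bits)
        # For 0/1 outputs, f(x) == NOT f(NOT x) is the same as f(x) != f(NOT x).
        if bool(f(*bits)) == bool(f(*complemented)):
            return False
    return True


def maj3(a, b, c):
    """Three-input majority."""
    return (a & b) | (b & c) | (a & c)


def xor3(a, b, c):
    """Three-input exclusive OR."""
    return a ^ b ^ c


def and2(a, b):
    """Two-input AND, whose dual is OR, so it is not self-dual."""
    return a & b


if __name__ == "__main__":
    for name, fn, arity in (("MAJ3", maj3, 3), ("XOR3", xor3, 3), ("AND2", and2, 2)):
        print(name, "self-dual:", is_self_dual(fn, arity))
```

Three-input majority and three-input XOR both pass this check, which is consistent with the use of XOR-Majority Graphs as a logic representation in the thesis outline.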
The next step is to explore a compatible physical synthesis flow for emerging reconfigurable nanotechnologies. Using silicon nanowire-based RFETs, .lef and .lib files have been provided, which enable an end-to-end flow to generate a GDSII file for circuits exclusively based on RFETs. Additionally, new approaches have been explored to improve placement and routing for circuits based on reconfigurable nanotechnologies. It has been demonstrated how these approaches lead to superior results compared to the native flow meant for CMOS.
Lastly, the unique property of transistor-level reconfiguration offered by RFETs is utilized to implement efficient Intellectual Property (IP) protection schemes against adversarial attacks. The ability to control the conduction of individual transistors is arguably one of the impactful features of this technology and fits well into the paradigm of security measures. Prior security schemes based on CMOS technology often come with large overheads in terms of area, power, and delay. In contrast, the RFET-based hardware security measures proposed in this thesis, such as logic locking and split manufacturing, demonstrate affordable security solutions with low overheads.
Overall, this thesis lays a strong foundation for the two main objectives, design automation and hardware security as an application, to push emerging reconfigurable nanotechnologies towards commercial integration. Additionally, the contributions of this thesis are made available under open-source licenses so as to foster new research directions and collaborations.
Abstract
List of Figures
List of Tables
1 Introduction
1.1 What are emerging reconfigurable nanotechnologies?
1.2 Why does this technology look so promising?
1.3 Electronics Design Automation
1.4 The game of see-saw: key challenges vs benefits for emerging reconfigurable nanotechnologies
1.4.1 Abstracting ambipolarity in logic gate designs
1.4.2 Enabling electronic design automation for RFETs
1.4.3 Enhanced functionality: a suitable fit for hardware security applications
1.5 Research questions
1.6 Entire RFET-centric EDA Flow
1.7 Key Contributions and Thesis Organization
2 Preliminaries
2.1 Reconfigurable Nanotechnology
2.1.1 1D devices
2.1.2 2D devices
2.1.3 Factors favoring circuit-flexibility
2.2 Feasibility aspects of RFET technology
2.3 Logic Synthesis Preliminaries
2.3.1 Circuit Model
2.3.2 Boolean Algebra
2.3.3 Monotone Function and the property of Unateness
2.3.4 Logic Representations
3 Exploring Circuit Design Topologies for RFETs
3.1 Contributions
3.2 Organization
3.3 Related Works
3.4 Exploring design topologies for combinational circuits: functionality-enhanced logic gates
3.4.1 List of Combinational Functionality-Enhanced Logic Gates based on RFETs
3.4.2 Estimation of gate delay using the logical effort theory
3.5 Invariable design of Inverters
3.6 Sequential Circuits
3.6.1 Dual edge-triggered TSPC-based D-flip flop
3.6.2 Exploiting RFET’s ambipolarity for metastability
3.7 Evaluations
3.7.1 Evaluation of combinational logic gates
3.7.2 Novel design of 1-bit ALU
3.7.3 Comparison of the sequential circuit with an equivalent CMOS-based design
3.8 Concluding remarks
4 Standard Cells and Technology Mapping
4.1 Contributions
4.2 Organization
4.3 Related Work
4.4 Standard cells based on RFETs
4.4.1 Interchangeable Pull-Up and Pull-Down Networks
4.4.2 Reconfigurable Truth-Table
4.5 Distilling standard cells
4.6 HOF-based Technology Mapping Flow for RFETs-based circuits
4.6.1 Area adjustments through inverter sharings
4.6.2 Technology Mapping Flow
4.6.3 Realizing Parameters For The Generic Library
4.6.4 Defining RFETs-based Genlib for HOF-based mapping
4.7 Experiments
4.7.1 Experiment 1: Distilling standard-cells from a benchmark suite
4.7.2 Experiment 2A: HOF-based mapping
4.7.3 Experiment 2B: Using the distilled standard-cells during mapping
4.8 Concluding Remarks
5 Logic Synthesis with XOR-Majority Graphs
5.1 Contributions
5.2 Organization
5.3 Motivation
5.4 Background and Preliminaries
5.4.1 Terminologies
5.4.2 Self-duality in NPN classes
5.4.3 Majority logic synthesis
5.4.4 Earlier work on XMG
5.4.5 Classification of Boolean functions
5.5 Preserving Self-Duality
5.5.1 During logic synthesis
5.5.2 During versatile technology mapping
5.6 Advanced Logic synthesis techniques
5.6.1 XMG resubstitution
5.6.2 Exact XMG rewriting
5.7 Logic representation-agnostic Mapping
5.7.1 Versatile Mapper
5.7.2 Support of supergates
5.8 Creating Self-dual Benchmarks
5.9 Experiments
5.9.1 XMG-based Flow
5.9.2 Experimental Setup
5.9.3 Synthetic self-dual benchmarks
5.9.4 Cryptographic benchmark suite
5.10 Concluding remarks and future research directions
6 Physical synthesis flow and liberty generation
6.1 Contributions
6.2 Organization
6.3 Background and Related Work
6.3.1 Related Works
6.3.2 Motivation
6.4 Silicon Nanowire Reconfigurable Transistors
6.5 Layouts for Logic Gates
6.5.1 Layouts for Static Functional Logic Gates
6.5.2 Layout for Reconfigurable Logic Gate
6.6 Table Model for Silicon Nanowire RFETs
6.7 Exploring Approaches for Physical Synthesis
6.7.1 Using the Standard Place & Route Flow
6.7.2 Open-source Flow
6.7.3 Concept of Driver Cells
6.7.4 Native Approach
6.7.5 Island-based Approach
6.7.6 Utilization Factor
6.7.7 Placement of the Island on the Chip
6.8 Experiments
6.8.1 Preliminary comparison with CMOS technology
6.8.2 Evaluating different physical synthesis approaches
6.9 Results and discussions
6.9.1 Parameters Which Affect The Area
6.9.2 Use of Germanium Nanowires Channels
6.10 Concluding Remarks
7 Polymorphic Primitives for Hardware Security
7.1 Contributions
7.2 Organization
7.3 The Shift To Explore Emerging Technologies For Security
7.4 Background
7.4.1 IP protection schemes
7.4.2 Preliminaries
7.5 Security Promises
7.5.1 RFETs for logic locking (transistor-level locking)
7.5.2 RFETs for split manufacturing
7.6 Security Vulnerabilities
7.6.1 Realization of short-circuit and open-circuit scenarios in an RFET-based inverter
7.6.2 Circuit evaluation on sub-circuits
7.6.3 Reliability concerns: A consequence of short-circuit scenario
7.6.4 Implication of the proposed security vulnerability
7.7 Analytical Evaluation
7.7.1 Investigating the security promises
7.7.2 Investigating the security vulnerabilities
7.8 Concluding remarks and future research directions
8 Conclusion
8.1 Concluding Remarks
8.2 Directions for Future Work
Appendices
A Distilling standard-cells
B RFETs-based Genlib
C Layout Extraction File (.lef) for Silicon Nanowire-based RFET
D Liberty (.lib) file for Silicon Nanowire-based RFET
- …