30 research outputs found
Reclaiming Fault Resilience and Energy Efficiency With Enhanced Performance in Low Power Architectures
Rapid developments of the AI domain has revolutionized the computing industry by the introduction of state-of-art AI architectures. This growth is also accompanied by a massive increase in the power consumption. Near-Theshold Computing (NTC) has emerged as a viable solution by offering significant savings in power consumption paving the way for an energy efficient design paradigm. However, these benefits are accompanied by a deterioration in performance due to the severe process variation and slower transistor switching at Near-Threshold operation. These problems severely restrict the usage of Near-Threshold operation in commercial applications. In this work, a novel AI architecture, Tensor Processing Unit, operating at NTC is thoroughly investigated to tackle the issues hindering system performance. Research problems are demonstrated in a scientific manner and unique opportunities are explored to propose novel design methodologies
Approximate Multiplier based on Low power and reduced latency with Modified LSB design
The devised approximation multiplier can adapt the precision and processing power needed formul triplication sat run-time based on the needs of the user. To decrease error distance, we also suggest a straight forward error compensation circuit. There are two types of approximate multi pliers. Dynamic voltages caling can be used for the first kind, which controls the timing route of the multiplier. If the voltage is lower, the critical path will take longer to complete. As a result, when the time path is violated, errors occurs and approximated results are produced. These cond types involves redesigning precise multiplier circuits like the Wallace Tree Multiplier and Dadda Tree Multiplier in order to change the functional behaviors of multipliers. Most of the earlier research on rebuilding multipliers suggested erroneous m-n compressors, which have m inputs and producen outputs. It dynamically reduces the area covered under the multiplier LSB which enables the MSB in accurate manner and LSB in approximate manner. This convolution al system approach is regarded to sequential cover up more than 32 bit multiplier. Since the accompanied circuit reduce then tire area by10times lesser than original multiplier, this conventional unit is regarded as abled circuit in the segment. Since the process of compressing partial products absorbed the majority of the multiplier energy and resulted in a consider able route delay, these incorrect compressors were utilized to compress the partial products within multiplication. These functionality are over come through our experimental setup
Investigation of Multiple-valued Logic Technologies for Beyond-binary Era
Computing technologies are currently based on the binary logic/number system, which is dependent on the
simple on and off switching mechanism of the prevailing transistors. With the exponential increase of data
processing and storage needs, there is a strong push to move to a higher radix logic/number system that
can eradicate or lessen many limitations of the binary system. Anticipated saturation of Moore’s law and
the necessity to increase information density and processing speed in the future micro and nanoelectronic
circuits and systems provide a strong background and motivation for the beyond-binary logic system. In this
review article, different technologies for Multiple-valued-Logic (MVL) devices and the associated prospects
and constraints are discussed. The feasibility of the MVL system in real-world applications rests on resolving
two major challenges: (i) development of an efficient mathematical approach to implement the MVL logic
using available technologies, and (ii) availability of effective synthesis techniques. This review of different
technologies for the MVL system is intended to perform a comprehensive investigation of various MVL technologies and a comparative analysis of the feasible approaches to implement MVL devices, especially ternary
logic
Building Efficient and Reliable Emerging Technology Systems
The semiconductor industry has been reaping the benefits of Moore’s law powered by Dennard’s voltage scaling for the past fifty years. However, with the end of Dennard scaling, silicon chip manufacturers are facing a widespread plateau in performance improvements. While the architecture community has focused its effort on exploring parallelism, such as with multi-core, many-core and accelerator-based systems, chip manufacturers have been forced to explore beyond-Moore technologies to improve performance while maintaining power density. Examples of such technologies include monolithic 3D integration, carbon nanotube transistors, tunneling-based transistors, spintronics and quantum computing. However, the infancy of the manufacturing process of these new technologies impedes their usage in commercial products. The goal of this dissertation is to combine both architectural and device-level efforts to provide solutions across the computing stack that can overcome the reliability concerns of emerging technologies. This allows for beyond-Moore systems to compete with highly optimized silicon-based processors, thus, enabling faster commercialization of such systems. This dissertation proposes the following key steps: (i) Multifaceted understanding and modeling of variation and yield issues that occur in emerging technologies, such as carbon nanotube transistors (CNFETs). (ii) Design of systems using suitable logic families such as pass transistor logic that provide high performance. (iii) Design of a multi-granular fault-tolerant reconfigurable architecture that enhances yield and performance. (iv) Design of a multi-technology, multi-accelerator heterogeneous system (v) Development of real-time constrained efficient workload scheduling mechanism for heterogeneous systems. This dissertation first presents the use of pass transistor logic family as an alternate to the CMOS logic family for CNFETs to improve performance. It explores various architectural design choices for CNFETs using pass transistor logic (PTL) to create an energy-efficient RISC-V processor. Our results show that while a CNFET RISC-V processor using CMOS logic achieves a 2.9x energy-delay product (EDP) improvement over a silicon design, using PTL along the critical path components of the processor can boost EDP improvement by 5x as well as reduce area by 17% over 16 nm silicon CMOS. This document further builds on providing fault-tolerant and yield enhancing solutions for emerging 3D integration compatible technologies in the context of CNFETs. The proposed framework can efficiently support high-variation technologies by providing protection against manufacturing defects at multiple granularities: module and pipeline-stage levels. Based on the variation observed in a synthesized design, a reliable CNFET-based 3D multi-granular reconfigurable architecture, 3DTUBE, is presented to overcome the manufacturing difficulties. For 0.4-0.7 V, 3DTUBE provides up to 6.0x higher throughput and 3.1x lower EDP compared to a silicon-based multi-core design evaluated at 1 part per billion transistor failure rate, which is 10,000x lower in comparison to CNFET’s failure rate. This dissertation then ventures into building multi-accelerator heterogeneous systems and real-time schedulers that cater to the requirements of the applications while taking advantage of the underlying heterogeneous system. We introduce optimizations like task pruning, hierarchical hetero-ranking and rank update built upon two scheduler policies (MS-static and MS-dynamic), that result in a performance improvement of 3.5x (average) for real-world autonomous vehicle applications, when compared against state-of-the-art schedulers. Adopting insights from the above work, this thesis presents a multi-accelerator, multi-technology heterogeneous system powered by a multi-constrained scheduler that optimizes for varying task requirements to achieve up to 6.1x better energy over a baseline silicon-based system.PHDElectrical and Computer EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/169699/1/aporvaa_1.pd
Multiple-valued logic: technology and circuit implementation
Title from PDF of title page, viewed March 1, 2023Dissertation advisors: Masud H. Chowdhury and Yugyung LeeVitaIncludes bibliographical references (pages 91-107)Dissertation (Ph.D.)--Department of Computer Science and Electrical Engineering. University of Missouri--Kansas City, 2021Computing technologies are currently based on the binary logic/number system, which is dependent on the simple on and off switching mechanism of the prevailing transistors. With the exponential increase of data processing and storage needs, there is a strong push to move to a higher radix logic/number system that can eradicate or lessen many limitations of the binary system. Anticipated saturation of Moore's law and the necessity to increase information density and processing speed in the future micro and nanoelectronic circuits and systems provide a strong background and motivation for the beyond-binary logic system. During this project, different technologies for Multiple-Valued-Logic (MVL) devices and the associated prospects and constraints are discussed. The feasibility of the MVL system in real-world applications rests on resolving two major challenges: (i) development of an efficient mathematical approach to implement the MVL logic using available technologies and (ii) availability of effective synthesis techniques. The main part of this project can be divided into two categories: (i) proposing different novel and efficient design for various logic and arithmetic circuits such as inverter, NAND, NOR, adder, multiplexer etc. (ii) proposing different fast and efficient design for various sequential and memory circuits. For the operation of the device, two of the very promising emerging technologies are used: Graphene Nanoribbon Field Effect Transistor (GNRFET) and Carbon Nano Tube Field Effect Transistor (CNTFET). A comparative analysis of the proposed designs and several state-of-the-art designs are also given in all the cases in terms of delay, total power, and power-delay-product (PDP). The simulation and analysis are performed using the H-SPICE tool with a GNRFET model available on the Nanohub website and CNTFET model available from Standford University website.Introduction -- Fundamentals and scope of multiple valued logic -- Technological aspect of multiple valued logic circuit -- Ternary logic gates using Graphene Nano Ribbon Field Effect Transistor (GNRFET) -- Ternary arithmetic circuits using Graphene Nano Ribbon Field Effect Transistor (GNRFET) -- Ternary sequential circuits using Graphene Nano Ribbon Field Effect Transistor (GNRFET) -- Ternary memory circuits using Carbon Nano Tube Field Effect Transistor (CNTFET) -- Conclusions & future wor
Appropriateness of Imperfect CNFET Based Circuits for Error Resilient Computing Systems
With superior device performance consistently reported in extremely scaled dimensions, low dimensional materials (LDMs), including Carbon Nanotube Field Effect Transistor (CNFET) based technology, have shown the potential to outperform silicon for future transistors in advanced technology nodes. Studies have also demonstrated orders of magnitude improvement in energy efficiency possible with LDMs, in comparison to silicon at competing technology nodes. However, the current fabrication processes for these materials suffer from process imperfections and still appear to be inadequate to compete with silicon for the mainstream high volume manufacturing. Among the LDMs, CNFETs are the most widely studied and closest to high volume manufacturing. Recent works have shown a significant increase in the complexity of CNFET based systems, including demonstration of a 16-bit microprocessor. However, the design of such systems has involved significantly wider-than-usual transistors and avoidance of certain logic combinations. The resulting complexity of several thousand transistors in such systems is still far from the requirements of high-performance general-purpose computing systems having billions of transistors. With the current progress of the process to fabricate CNFETs, their introduction in mainstream manufacturing is expected to take several more years. For an earlier technology adoption, CNFETs appear to be suited for error-resilient computing systems where errors during computation can be tolerated to a certain degree. Such systems relax the need for precise circuits and a perfect process while leveraging the potential energy benefits of CNFET technology in comparison to conventional Si technology. In this thesis, we explore the potential applications using an imperfect CNFET process for error-resilient computing systems, including the impact of the process imperfections at the system level and methods to improve it.
The current most widely adopted fabrication process for CNFETs (separation and placement of solution-based CNTs) still suffers from process imperfections, mainly from open CNTs due to missing of CNTs (in trenches connecting source and drain of CNFET). A fair evaluation of the performance of CNFET based circuits should thus take into consideration the effect of open CNTs, resulting in reduced drive currents. At the circuit level, this leads to failures in meeting 1) the minimum frequency requirement (due to an increase in critical path delay), and 2) the noise suppression requirement. We present a methodology to accurately capture the effect of open CNT imperfection in the state-of-the-art CNFET model, for circuit-level performance evaluation (both delay and glitch vulnerability) of CNFET based circuits using SPICE. A Monte Carlo simulation framework is also provided to investigate the statistical effect of open CNT imperfection on circuit-level performance. We introduce essential metrics to evaluate glitch vulnerability and also provide an effective link between glitch vulnerability and circuit topology.
The past few years have observed significant growth of interest in approximate computing for a wide range of applications, including signal processing, data mining, machine learning, image, video processing, etc. In such applications, the result quality is not compromised appreciably, even in the presence of few errors during computation. The ability to tolerate few errors during computation relaxes the need to have precise circuits. Thus the approximate circuits can be designed, with lesser nodes, reduced stages, and reduced capacitance at few nodes. Consequently, the approximate circuits could reduce critical path delays and enhanced noise suppression in comparison to precise circuits. We present a systematic methodology utilizing Reduced Ordered Binary Decision Diagrams (ROBDD) for generating approximate circuits by taking an example of 16-bit parallel prefix CNFET adder. The approximate adder generated using the proposed algorithm has ~ 5x reduction in the average number of nodes failing glitch criteria (along paths to primary output) and 43.4% lesser Energy Delay Product (EDP) even at high open CNT imperfection, in comparison to the ideal case of no open CNT imperfection, at a mean relative error of 3.3%.
The recent boom of deep learning has been made possible by VLSI technology advancement resulting in hardware systems, which can support deep learning algorithms. These hardware systems intend to satisfy the high-energy efficiency requirement of such algorithms. The hardware supporting such algorithms adopts neuromorphic-computing architectures with significantly less energy compared to traditional Von Neumann architectures. Deep Neural Networks (DNNs) belonging to deep learning domain find its use in a wide range of applications such as image classification, speech recognition, etc. Recent hardware systems have demonstrated the implementation of complex neural networks at significantly less power. However, the complexity of applications and depths of DNNs are expected to drastically increase in the future, imposing a demanding requirement in terms of scalability and energy efficiency of hardware technology. CNFET technology can be an excellent alternative to meet the aggressive energy efficiency requirement for future DNNs. However, degradation in circuit-level performance due to open CNT imperfection can result in timing failure, thus distorting the shape of non-linear activation function, leading to a significant degradation in classification accuracy. We present a framework to obtain sigmoid activation function considering the effect of open CNT imperfection. A digital neuron is explored to generate the sigmoid activation function, which deviates from the ideal case under imperfect process and reduced time period (increased clock frequency). The inherent error resilience of DNNs, on the other hand, can be utilized to mitigate the impact of imperfect process and maintain the shape of the activation function. We use pruning of synaptic weights, which, combined with the proposed approximate neuron, significantly reduces the chance of timing failures and helps to maintain the activation function shape even at high process imperfection and higher clock frequencies. We also provide a framework to obtain classification accuracy of Deep Belief Networks (class of DNNs based on unsupervised learning) using the activation functions obtained from SPICE simulations. By using both approximate neurons and pruning of synaptic weights, we achieve excellent system accuracy (only < 0.5% accuracy drop) with 25% improvement in speed, significant EDP advantage (56.7% less) even at high process imperfection, in comparison to a base configuration of the precise neuron and no pruning with the ideal process, at no area penalty.
In conclusion, this thesis provides directions for the potential applicability of CNFET based technology for error-resilient computing systems. For this purpose, we present methodologies, which provide approaches to assess and design CNFET based circuits, considering process imperfections. We accomplish a DBN framework for digit recognition, considering activation functions from SPICE simulations incorporating process imperfections. We demonstrate the effectiveness of using approximate neuron and synaptic weight pruning to mitigate the impact of high process imperfection on system accuracy
Design of Ternary Logic and Arithmetic Circuits Using GNRFET
Multiple valued logic (MVL) can represent an exponentially higher number of data/information compared to the binary logic for the same number of logic bits. Compared to the conventional and other
emerging device technologies, Graphene Nano Ribbon Field Effect Transistor (GNRFET) appears to be very promising for designing MVL logic gates and arithmetic circuits due to some exceptional electrical properties of the GNRFET, e.g., the ability to control the threshold voltage by changing the width of the GNR. Variation of the threshold voltage is one of the prescribed techniques to achieve multiple voltage levels to implement the MVL circuit. This paper introduces a design approach for ternary logic gates and circuits using MOS-type GNRFET. The designs of basic ternary logic gates like inverters, NAND, NOR, and ternary arithmetic circuits like the ternary decoder, 3:1 multiplexer, and ternary half-adder are demonstrated using GNRFET. A comparative analysis of the GNRFET based ternary logic gates and circuits and those based on the conventional CMOS and CNTFET technologies is performed using delay, total power, and power-delay-product (PDP) as the metrics. The simulation and analysis are performed using the H-SPICE tool with a GNRFET model available on the Nanohub website
Heterogeneous Reconfigurable Fabrics for In-circuit Training and Evaluation of Neuromorphic Architectures
A heterogeneous device technology reconfigurable logic fabric is proposed which leverages the cooperating advantages of distinct magnetic random access memory (MRAM)-based look-up tables (LUTs) to realize sequential logic circuits, along with conventional SRAM-based LUTs to realize combinational logic paths. The resulting Hybrid Spin/Charge FPGA (HSC-FPGA) using magnetic tunnel junction (MTJ) devices within this topology demonstrates commensurate reductions in area and power consumption over fabrics having LUTs constructed with either individual technology alone. Herein, a hierarchical top-down design approach is used to develop the HSCFPGA starting from the configurable logic block (CLB) and slice structures down to LUT circuits and the corresponding device fabrication paradigms. This facilitates a novel architectural approach to reduce leakage energy, minimize communication occurrence and energy cost by eliminating unnecessary data transfer, and support auto-tuning for resilience. Furthermore, HSC-FPGA enables new advantages of technology co-design which trades off alternative mappings between emerging devices and transistors at runtime by allowing dynamic remapping to adaptively leverage the intrinsic computing features of each device technology. HSC-FPGA offers a platform for fine-grained Logic-In-Memory architectures and runtime adaptive hardware. An orthogonal dimension of fabric heterogeneity is also non-determinism enabled by either low-voltage CMOS or probabilistic emerging devices. It can be realized using probabilistic devices within a reconfigurable network to blend deterministic and probabilistic computational models. Herein, consider the probabilistic spin logic p-bit device as a fabric element comprising a crossbar-structured weighted array. The Programmability of the resistive network interconnecting p-bit devices can be achieved by modifying the resistive states of the array\u27s weighted connections. Thus, the programmable weighted array forms a CLB-scale macro co-processing element with bitstream programmability. This allows field programmability for a wide range of classification problems and recognition tasks to allow fluid mappings of probabilistic and deterministic computing approaches. In particular, a Deep Belief Network (DBN) is implemented in the field using recurrent layers of co-processing elements to form an n x m1 x m2 x ::: x mi weighted array as a configurable hardware circuit with an n-input layer followed by i ≥ 1 hidden layers. As neuromorphic architectures using post-CMOS devices increase in capability and network size, the utility and benefits of reconfigurable fabrics of neuromorphic modules can be anticipated to continue to accelerate
Novel Ternary Logic Gates Design in Nanoelectronics
In this paper, standard ternary logic gates are initially designed to considerably reduce static power consumption. This study proposes novel ternary gates based on two supply voltages in which the direct current is eliminated and the leakage current is reduced considerably. In addition, ST-OR and ST-AND are generated directly instead of ST-NAND and ST-NOR. The proposed gates have a high noise margin near V_(DD)/4. The simulation results indicated that the power consumption and PDP underwent a~sharp decrease and noise margin showed a considerable increase in comparison to both one supply and two supply based designs in previous works. PDP is improved in the proposed OR, as compared to one supply and two supply based previous works about 83% and 63%, respectively. Also, a memory cell is designed using the proposed STI logic gate, which has a considerably lower static power to store logic ‘1’ and the static noise margin, as compared to other designs