324 research outputs found

    Energy consumption by reversible circuits in the 130 nm and 65 nm nodes

    Get PDF
    We show that both 130 nm and 65 nm technologies are suitable for reversible computation

    Power-efficient design of 16-bit mixed-operand multipliers

    Get PDF
    Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004.Includes bibliographical references (p. 53).Multiplication is an expensive and slow arithmetic operation, which plays an important role in many DSP algorithms. It usually lies in the critical-delay paths, having an effect on performance of the system as well as consuming large power. Consequently, significant improvements in both power and performance can be achieved in the overall DSP system by carefully designing and optimizing power and performance of the multiplier. This thesis explores several circuit-level techniques for power-efficiently designing multipliers, including supply voltage reduction, efficient multiplication algorithms, low power circuit logic styles, and transistor sizing using dynamic and static tuners. Based on these techniques, several 16-bit multipliers have been successfully designed and implemented in 0.13[micro]m CMOS technology at the supply voltage of 1.5V and 0.9V. The multipliers are modified to handle multiplications of two 16-bit operands in which each can be either signed magnitude or two's complement formats. Examining power-performance characteristics of these multipliers reveals that both array and tree structures are feasible solutions for designing 16-bit multipliers, and complementary CMOS and single-ended CPL-TG logics are promising candidates for power-efficient design. The appropriate choices of structures and logic styles depend on power and performance constraints of the particular design.by Sataporn Pornpromlikit.M.Eng

    On The Design Of Low-Complexity High-Speed Arithmetic Circuits In Quantum-Dot Cellular Automata Nanotechnology

    Get PDF
    For the last four decades, the implementation of very large-scale integrated systems has largely based on complementary metal-oxide semiconductor (CMOS) technology. However, this technology has reached its physical limitations. Emerging nanoscale technologies such as quantum-dot cellular automata (QCA), single electron tunneling (SET), and tunneling phase logic (TPL) are major candidate for possible replacements of CMOS. These nanotechnologies use majority and/or minority logic and inverters as circuit primitives. In this dissertation, a comprehensive methodology for majority/minority logic networks synthesis is developed. This method is capable of processing any arbitrary multi-output Boolean function to nd its equivalent optimal majority logic network targeting to optimize either the number of gates or levels. The proposed method results in different primary equivalent majority expression networks. However, the most optimized network will be generated as a nal solution. The obtained results for 15 MCNC benchmark circuits show that when the number of majority gates is the rst optimization priority, there is an average reduction of 45.3% in the number of gates and 15.1% in the number of levels. They also show that when the rst priority is the number of levels, an average reduction of 23.5% in the number of levels and 43.1% in the number of gates is possible, compared to the majority AND/OR mapping method. These results are better compared to those obtained from the best existing methods. In this dissertation, our approach is to exploit QCA technology because of its capability to implement high-density, very high-speed switching and tremendously lowpower integrated systems and is more amenable to digital circuits design. In particular, we have developed algorithms for the QCA designs of various single- and multi-operation arithmetic arrays. Even though, majority/minority logic are the basic units in promising nanotechnologies, an XOR function can be constructed in QCA as a single device. The basic cells of the proposed arrays are developed based on the fundamental logic devices in QCA and a single-layer structure of the three-input XOR function. This process leads to QCA arithmetic circuits with better results in view of dierent aspects such as cell count, area, and latency, compared to their best counterparts. The proposed arrays can be formed in a pipeline manner to perform the arithmetic operations for any number of bits which could be quite valuable while considering the future design of large-scale QCA circuits

    A Study on Efficient Designs of Approximate Arithmetic Circuits

    Get PDF
    Approximate computing is a popular field where accuracy is traded with energy. It can benefit applications such as multimedia, mobile computing and machine learning which are inherently error resilient. Error introduced in these applications to a certain degree is beyond human perception. This flexibility can be exploited to design area, delay and power efficient architectures. However, care must be taken on how approximation compromises the correctness of results. This research work aims to provide approximate hardware architectures with error metrics and design metrics analyzed and their effects in image processing applications. Firstly, we study and propose unsigned array multipliers based on probability statistics and with approximate 4-2 compressors, full adders and half adders. This work deals with a new design approach for approximation of multipliers. The partial products of the multiplier are altered to introduce varying probability terms. Logic complexity of approximation is varied for the accumulation of altered partial products based on their probability. The proposed approximation is utilized in two variants of 16-bit multipliers. Synthesis results reveal that two proposed multipliers achieve power savings of 72% and 38% respectively compared to an exact multiplier. They have better precision when compared to existing approximate multipliers. Mean relative error distance (MRED) figures are as low as 7.6% and 0.02% for the proposed approximate multipliers, which are better than the previous state-of-the-art works. Performance of the proposed multipliers is evaluated with geometric mean filtering application, where one of the proposed models achieves the highest peak signal to noise ratio (PSNR). Second, approximation is proposed for signed Booth multiplication. Approximation is introduced in partial product generation and partial product accumulation circuits. In this work, three multipliers (ABM-M1, ABM-M2, and ABM-M3) are proposed in which the modified Booth algorithm is approximated. In all three designs, approximate Booth partial product generators are designed with different variations of approximation. The approximations are performed by reducing the logic complexity of the Booth partial product generator, and the accumulation of partial products is slightly modified to improve circuit performance. Compared to the exact Booth multiplier, ABM-M1 achieves up to 15% reduction in power consumption with an MRED value of 7.9 × 10-4. ABM-M2 has power savings of up to 60% with an MRED of 1.1 × 10-1. ABM-M3 has power savings of up to 50% with an MRED of 3.4 × 10-3. Compared to existing approximate Booth multipliers, the proposed multipliers ABM-M1 and ABM-M3 achieve up to a 41% reduction in power consumption while exhibiting very similar error metrics. Image multiplication and matrix multiplication are used as case studies to illustrate the high performance of the proposed approximate multipliers. Third, distributed arithmetic based sum of products units approximation is analyzed. Sum of products units are key elements in many digital signal processing applications. Three approximate sum of products models which are based on distributed arithmetic are proposed. They are designed for different levels of accuracy. First model of approximate sum of products achieves an improvement up to 64% on area and 70% on power, when compared to conventional unit. Other two models provide an improvement of 32% and 48% on area and 54% and 58% on power, respectively, with a reduced error rate compared to the first model. Third model achieves MRED and normalized mean error distance (NMED) as low as 0.05% and 0.009%. Performance of approximate units is evaluated with a noisy image smoothing application, where the proposed models are capable of achieving higher PSNR than existing state of the art techniques. Fourth, approximation is applied in division architecture. Two approximation models are proposed for restoring divider. In the first design, approximation is performed at circuit level, where approximate divider cells are utilized in place of exact ones by simplifying the logic equations. In the second model, restoring divider is analyzed strategically and number of restoring divider cells are reduced by finding the portions of divisor and dividend with significant information. An approximation factor pp is used in both designs. In model 1, the design with p=8 has a 58% reduction in both area and power consumption compared to exact design, with a Q-MRED of 1.909 × 10-2 and Q-NMED of 0.449 × 10-2. The second model with an approximation factor p=4 has 54% area savings and 62% power savings compared to exact design. The proposed models are found to have better error metrics compared to existing designs, with better performance at similar error values. A change detection image processing application is used for real time assessment of proposed and existing approximate dividers and one of the models achieves a PSNR of 54.27 dB

    MULTIPAC, a multiple pool processor and computer for a spacecraft central data system, phase 2 Final report

    Get PDF
    MULTIPAC, multiple pool processor and computer for deep space probe central data syste

    Building Efficient and Reliable Emerging Technology Systems

    Full text link
    The semiconductor industry has been reaping the benefits of Moore’s law powered by Dennard’s voltage scaling for the past fifty years. However, with the end of Dennard scaling, silicon chip manufacturers are facing a widespread plateau in performance improvements. While the architecture community has focused its effort on exploring parallelism, such as with multi-core, many-core and accelerator-based systems, chip manufacturers have been forced to explore beyond-Moore technologies to improve performance while maintaining power density. Examples of such technologies include monolithic 3D integration, carbon nanotube transistors, tunneling-based transistors, spintronics and quantum computing. However, the infancy of the manufacturing process of these new technologies impedes their usage in commercial products. The goal of this dissertation is to combine both architectural and device-level efforts to provide solutions across the computing stack that can overcome the reliability concerns of emerging technologies. This allows for beyond-Moore systems to compete with highly optimized silicon-based processors, thus, enabling faster commercialization of such systems. This dissertation proposes the following key steps: (i) Multifaceted understanding and modeling of variation and yield issues that occur in emerging technologies, such as carbon nanotube transistors (CNFETs). (ii) Design of systems using suitable logic families such as pass transistor logic that provide high performance. (iii) Design of a multi-granular fault-tolerant reconfigurable architecture that enhances yield and performance. (iv) Design of a multi-technology, multi-accelerator heterogeneous system (v) Development of real-time constrained efficient workload scheduling mechanism for heterogeneous systems. This dissertation first presents the use of pass transistor logic family as an alternate to the CMOS logic family for CNFETs to improve performance. It explores various architectural design choices for CNFETs using pass transistor logic (PTL) to create an energy-efficient RISC-V processor. Our results show that while a CNFET RISC-V processor using CMOS logic achieves a 2.9x energy-delay product (EDP) improvement over a silicon design, using PTL along the critical path components of the processor can boost EDP improvement by 5x as well as reduce area by 17% over 16 nm silicon CMOS. This document further builds on providing fault-tolerant and yield enhancing solutions for emerging 3D integration compatible technologies in the context of CNFETs. The proposed framework can efficiently support high-variation technologies by providing protection against manufacturing defects at multiple granularities: module and pipeline-stage levels. Based on the variation observed in a synthesized design, a reliable CNFET-based 3D multi-granular reconfigurable architecture, 3DTUBE, is presented to overcome the manufacturing difficulties. For 0.4-0.7 V, 3DTUBE provides up to 6.0x higher throughput and 3.1x lower EDP compared to a silicon-based multi-core design evaluated at 1 part per billion transistor failure rate, which is 10,000x lower in comparison to CNFET’s failure rate. This dissertation then ventures into building multi-accelerator heterogeneous systems and real-time schedulers that cater to the requirements of the applications while taking advantage of the underlying heterogeneous system. We introduce optimizations like task pruning, hierarchical hetero-ranking and rank update built upon two scheduler policies (MS-static and MS-dynamic), that result in a performance improvement of 3.5x (average) for real-world autonomous vehicle applications, when compared against state-of-the-art schedulers. Adopting insights from the above work, this thesis presents a multi-accelerator, multi-technology heterogeneous system powered by a multi-constrained scheduler that optimizes for varying task requirements to achieve up to 6.1x better energy over a baseline silicon-based system.PHDElectrical and Computer EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/169699/1/aporvaa_1.pd

    Memristor-Based Digital Systems Design and Architectures

    Get PDF
    Memristor is considered as a suitable alternative solution to resolve the scaling limitation of CMOS technology. In recent years, the use of memristors in circuits design has rapidly increased and attracted researcher’s interest. Advances have been made to both size and complexity of memristor designs. The development of CMOS transistors shows major concerns, such as, increased leakage power, reduced reliability, and high fabrication cost. These factors have affected chip manufacturing process and functionality severely. Therefore, the demand for new devices is increasing. Memristor, is considered as one of the key element in memory and information processing design due to its small size, long-term data storage, low power, and CMOS compatibility. The main objective in this research is to design memristor-based arithmetic circuits and to overcome some of the Memristor based logic design issues. In this thesis, a fast, low area and low power hybrid CMOS memristor based digital circuit design were implemented. Small and large-scale memristor based digital circuits are implemented and provided a solutions for overcoming the memristor degradation and fan-out challenges. As an example, a 4- bit LFSR has been implemented by using MRL scheme with 64 CMOS devices and 64 memristors. The proposed design is more efficient in terms of the area when compared with CMOS- based LFSR circuits. The simulation results proves the functionality of the design. This approach presents acceptable speed in comparison with CMOS-based design and it is faster than IMPLY-based memrisitive LFSR. The propped LFSR has 841 ps de-lay. Furthermore, the proposed design has a significant power reduction of over 66% less than CMOS-based approach. This thesis proposes implementation of memristive 2-D median filter and extends previously published works on memristive Filter design to include this emerging technology characteristics in image processing. The proposed circuit was designed based on Pt/TaOx/Ta redox-based device and Memristor Ratioed Logic (MRL). The proposed filter is designed in Cadence and the memristive median approved tested circuit is translated to Verilog-XL as a behavioral model. Different 512 _ 512 pixels input images contain salt and pepper noise with various noise density ratios are applied to the proposed median filter and the design successfully has substantially removed the noise. The implementation results in comparison with the conventional filters, it gives better Peak Signal to Noise Ratio (PSNR) and Mean Absolute Error (MAE) for different images with different noise density ratios while it saves more area as compared to CMOS-based design. This dissertation proposes a comprehensive framework for design, mapping and synthesis of large-scale memristor-CMOS circuits. This framework provides a synthesis approach that can be applied to all memristor-based digital logic designs. In particular, it is a proposal for a characterization methodology of memristor-based logic cells to generate a standard cell library for large scale simulation. The proposed framework is implemented in the Cadence Virtuoso schematic-level environment and was veri_ed with Verilog-XL, MATLAB, and the Electronic Design Automation (EDA) Synopses compiler after being translated to the behavioral level. The proposed method can be applied to implement any digital logic design. The frame work is deployed for design of the memristor-based parallel 8-bit adder/subtractor and a 2-D memristive-based median filter

    A full-custom digital-signal-processing unit for real-time cortical blood flow monitoring

    Get PDF
    Master'sMASTER OF ENGINEERIN
    corecore