429 research outputs found

    Low power predictable memory and processing architectures

    Get PDF
    Great demand in power optimized devices shows promising economic potential and draws lots of attention in industry and research area. Due to the continuously shrinking CMOS process, not only dynamic power but also static power has emerged as a big concern in power reduction. Other than power optimization, average-case power estimation is quite significant for power budget allocation but also challenging in terms of time and effort. In this thesis, we will introduce a methodology to support modular quantitative analysis in order to estimate average power of circuits, on the basis of two concepts named Random Bag Preserving and Linear Compositionality. It can shorten simulation time and sustain high accuracy, resulting in increasing the feasibility of power estimation of big systems. For power saving, firstly, we take advantages of the low power characteristic of adiabatic logic and asynchronous logic to achieve ultra-low dynamic and static power. We will propose two memory cells, which could run in adiabatic and non-adiabatic mode. About 90% dynamic power can be saved in adiabatic mode when compared to other up-to-date designs. About 90% leakage power is saved. Secondly, a novel logic, named Asynchronous Charge Sharing Logic (ACSL), will be introduced. The realization of completion detection is simplified considerably. Not just the power reduction improvement, ACSL brings another promising feature in average power estimation called data-independency where this characteristic would make power estimation effortless and be meaningful for modular quantitative average case analysis. Finally, a new asynchronous Arithmetic Logic Unit (ALU) with a ripple carry adder implemented using the logically reversible/bidirectional characteristic exhibiting ultra-low power dissipation with sub-threshold region operating point will be presented. The proposed adder is able to operate multi-functionally

    Techniques of Energy-Efficient VLSI Chip Design for High-Performance Computing

    Get PDF
    How to implement quality computing with the limited power budget is the key factor to move very large scale integration (VLSI) chip design forward. This work introduces various techniques of low power VLSI design used for state of art computing. From the viewpoint of power supply, conventional in-chip voltage regulators based on analog blocks bring the large overhead of both power and area to computational chips. Motivated by this, a digital based switchable pin method to dynamically regulate power at low circuit cost has been proposed to make computing to be executed with a stable voltage supply. For one of the widely used and time consuming arithmetic units, multiplier, its operation in logarithmic domain shows an advantageous performance compared to that in binary domain considering computation latency, power and area. However, the introduced conversion error reduces the reliability of the following computation (e.g. multiplication and division.). In this work, a fast calibration method suppressing the conversion error and its VLSI implementation are proposed. The proposed logarithmic converter can be supplied by dc power to achieve fast conversion and clocked power to reduce the power dissipated during conversion. Going out of traditional computation methods and widely used static logic, neuron-like cell is also studied in this work. Using multiple input floating gate (MIFG) metal-oxide semiconductor field-effect transistor (MOSFET) based logic, a 32-bit, 16-operation arithmetic logic unit (ALU) with zipped decoding and a feedback loop is designed. The proposed ALU can reduce the switching power and has a strong driven-in capability due to coupling capacitors compared to static logic based ALU. Besides, recent neural computations bring serious challenges to digital VLSI implementation due to overload matrix multiplications and non-linear functions. An analog VLSI design which is compatible to external digital environment is proposed for the network of long short-term memory (LSTM). The entire analog based network computes much faster and has higher energy efficiency than the digital one

    Comparative Study on Performance and Variation Tolerance of Low Power Circuit

    Get PDF
    The demand for low-power electronic devices is increasing rapidly in current VLSI technology. Instead of conventional CMOS circuit operating at nominal supply voltage, several kinds of circuits are brought about with the goal of reducing power consumption. This research is mainly focused on evaluating performance, power and variation tolerance of near/sub-threshold computing and adiabatic logic circuits. Arithmetic logic units (ALUs) are designed with 15nm FinFET process technologies for these circuit styles. The evaluation is carried out by simulations on these ALU designs. The variation model considers ambient temperature variations and power supply fluctuations that emulate wireless sensor node applications. The results shows that conventional static CMOS circuit operating in near-threshold region exhibits similar power efficiency with adiabatic logic circuit operating in the same region, while at the same time it bears better temperature and voltage variation tolerance in most of the cases. The study results provide helpful guidance to low-power electronic system designs

    Adiabatic Approach for Low-Power Passive Near Field Communication Systems

    Get PDF
    This thesis tackles the need of ultra-low power electronics in the power limited passive Near Field Communication (NFC) systems. One of the techniques that has proven the potential of delivering low power operation is the Adiabatic Logic Technique. However, the low power benefits of the adiabatic circuits come with the challenges due to the absence of single opinion on the most energy efficient adiabatic logic family which constitute appropriate trade-offs between computation time, area and complexity based on the circuit and the power-clocking schemes. Therefore, five energy efficient adiabatic logic families working in single-phase, 2-phase and 4-phase power-clocking schemes were chosen. Since flip-flops are the basic building blocks of any sequential circuit and the existing flip-flops are MUX-based (having more transistors) design, therefore a novel single-phase, 2-phase and 4-phase reset based flip-flops were proposed. The performance of the multi-phase adiabatic families was evaluated and compared based on the design examples such as 2-bit ring counter, 3-bit Up-Down counter and 16-bit Cyclic Redundancy Check (CRC) circuit (benchmark circuit) based on ISO 14443-3A standard. Several trade-offs, design rules, and an appropriate range for the supply voltage scaling for multi-phase adiabatic logic are proposed. Furthermore, based on the NFC standard (ISO 14443-3A), data is frequently encoded using Manchester coding technique before transmitting it to the reader. Therefore, if Manchester encoding can be implemented using adiabatic logic technique, energy benefits are expected. However, adiabatic implementation of Manchester encoding presents a challenge. Therefore, a novel method for implementing Manchester encoding using adiabatic logic is proposed overcoming the challenges arising due to the AC power-clock. Other challenges that come with the dynamic nature of the adiabatic gates and the complexity of the 4-phase power-clocking scheme is in synchronizing the power-clock v phases and the time spent in designing, validation and debugging of errors. This requires a specific modelling approach to describe the adiabatic logic behaviour at the higher level of abstraction. However, describing adiabatic logic behaviour using Hardware Description Languages (HDLs) is a challenging problem due to the requirement of modelling the AC power-clock and the dual-rail inputs and outputs. Therefore, a VHDL-based modelling approach for the 4-phase adiabatic logic technique is developed for functional simulation, precise timing analysis and as an improvement over the previously described approaches

    A 2x2 bit multiplier using hybrid 13t full adder with vedic mathematics method

    Get PDF
    Various arithmetic circuits such as multipliers require full adder (FA) as the main block for the circuit to operate. Speed and energy consumption become very vital in design consideration for a low power adder. In this paper, a 2x2 bit Vedic multiplier using hybrid full adder (HFA) with 13 transistors (13T) had been designed successfully. The design was simulated using Synopsys Custom Tools in General Purpose Design Kit (GPDK) 90 nm CMOS technology process. In this design, four AND gates and two hybrid FA (HFAs) are cascaded together and each HFA is constructed from three modules. The cascaded module is arranged in the Vedic mathematics algorithm. This algorithm satisfied the requirement of a fast multiplication operation because of the vertical and crosswise architecture from the Urdhva Triyakbyam Sutra which reduced the number of partial products compared to the conventional multiplication algorithm. With the combination of hybrid full adder and Vedic mathematics, a new combination of multiplier method with low power and low delay is produced. Performance parameters such as power consumption and delay were compared to some of the existing designs. With a 1V voltage supply, the average power consumption of the proposed multiplier was found to be 22.96 µW and a delay of 161 ps

    Adiabatic Circuits for Power-Constrained Cryptographic Computations

    Get PDF
    This thesis tackles the need for ultra-low power operation in power-constrained cryptographic computations. An example of such an application could be smartcards. One of the techniques which has proven to have the potential of rendering ultra-low power operation is ‘Adiabatic Logic Technique’. However, the adiabatic circuits has associated challenges due to high energy dissipation of the Power-Clock Generator (PCG) and complexity of the multi-phase power-clocking scheme. Energy efficiency of the adiabatic system is often degraded due to the high energy dissipation of the PCG. In this thesis, nstep charging strategy using tank capacitors is considered for the power-clock generation and several design rules and trade-offs between the circuit complexity and energy efficiency of the PCG using n-step charging circuits have been proposed. Since pipelining is inherent in adiabatic logic design, careful selection of architecture is essential, as otherwise overhead in terms of area and energy due to synchronization buffers is induced specifically, in the case of adiabatic designs using 4-phase power-clocking scheme. Several architectures for the Montgomery multiplier using adiabatic logic technique are implemented and compared. An architecture which constitutes an appropriate trade-off between energy efficiency and throughput is proposed along with its methodology. Also, a strategy to reduce the overhead due to synchronization buffers is proposed. A modification in the Montgomery multiplication algorithm is proposed. Furthermore, a problem due to the application of power-clock gating in cascade stages of adiabatic logic is identified. The problem degrades the energy savings that would otherwise be obtained by the application of power-clock gating. A solution to this problem is proposed. Cryptographic implementations also present an obvious target for Power Analysis Attacks (PAA). There are several existing secure adiabatic logic designs which are proposed as a countermeasure against PAA. Shortcomings of the existing logic designs are identified, and two novel secure adiabatic logic designs are proposed as the countermeasures against PAA and improvement over the existing logic designs

    Predicting power scalability in a reconfigurable platform

    Get PDF
    This thesis focuses on the evolution of digital hardware systems. A reconfigurable platform is proposed and analysed based on thin-body, fully-depleted silicon-on-insulator Schottky-barrier transistors with metal gates and silicide source/drain (TBFDSBSOI). These offer the potential for simplified processing that will allow them to reach ultimate nanoscale gate dimensions. Technology CAD was used to show that the threshold voltage in TBFDSBSOI devices will be controllable by gate potentials that scale down with the channel dimensions while remaining within appropriate gate reliability limits. SPICE simulations determined that the magnitude of the threshold shift predicted by TCAD software would be sufficient to control the logic configuration of a simple, regular array of these TBFDSBSOI transistors as well as to constrain its overall subthreshold power growth. Using these devices, a reconfigurable platform is proposed based on a regular 6-input, 6-output NOR LUT block in which the logic and configuration functions of the array are mapped onto separate gates of the double-gate device. A new analytic model of the relationship between power (P), area (A) and performance (T) has been developed based on a simple VLSI complexity metric of the form ATσ = constant. As σ defines the performance “return” gained as a result of an increase in area, it also represents a bound on the architectural options available in power-scalable digital systems. This analytic model was used to determine that simple computing functions mapped to the reconfigurable platform will exhibit continuous power-area-performance scaling behavior. A number of simple arithmetic circuits were mapped to the array and their delay and subthreshold leakage analysed over a representative range of supply and threshold voltages, thus determining a worse-case range for the device/circuit-level parameters of the model. Finally, an architectural simulation was built in VHDL-AMS. The frequency scaling described by σ, combined with the device/circuit-level parameters predicts the overall power and performance scaling of parallel architectures mapped to the array

    Design and Analysis of Improved Domino Logic with Noise Tolerance and High Performance

    Get PDF
    The demands of upcoming computing, as well as the challenges of nanometer-era of VLSI design necessitate new digital logic techniques and styles that are at the same time high performance, energy efficient and robust to noise and variation. Dynamic CMOS logic gates are broadly used to design high performance circuits due to their high speed. Conversely, the vital demerit of dynamic logic style is its high noise sensitivity. The main reason for this is the sub-threshold leakage current flowing through the pull down network. With continuous technology scaling, this problem is getting more and more severe. In this thesis, a new noise tolerant dynamic CMOS circuit technique is proposed. In the proposed work, we have enhanced the behavior of the domino CMOS logic. This technique also gets benefit in terms of delay and power. This thesis describes the new low power, noise tolerant and high speed domino logic technique and presents a comparison result of this logic with previously reported schemes. Simulation results prove that, in 180 nm CMOS technology when we used this logic style to realize wide fan-in logic gates, it could achieve maximum level of noise robustness as compared to its basic counterpart. In addition, the logic also works efficiently with sequential circuits. The feasibility of this new technique is demonstrated by means of a real hardware, we have built a custom test-chip in the UMC 180 nm process technology with an ALU core, using the proposed domino logic style for each design block. In this thesis, we have also described the design and implementation of this chip. In addition to this, we have also presented initial power and delay performance comparisons between the circuit level simulated ALU and test-chip implemented in the proposed domino logic style. Finally we conclude that, the thesis contributes a very efficient logic style for wide fan-in gates, which is not only noise robust but also energy efficient and high speed

    Side Channel Information Leakage: Design and Implementation of Hardware Countermeasure

    Get PDF
    Deployment of Dynamic Differential Logics (DDL) appears to be a promising choice for providing resistance against leakage of side channel information. However, the resistance provided by these logics is too costly for widespread area-constrained applications. Implementation of a secure DDL-based countermeasure also requires a complex layout methodology for balancing the load at the differential outputs. This thesis, unlike previous logic level approaches, presents a novel exploitation of static and single-ended logic for designing the side channel countermeasure. The proposed technique is used in the implementation of a protected crypto core consisting of the AES “AddRoundKey” and “SubByte” transformation. The test chip including the protected and unprotected crypto cores is fabricated in 180nm CMOS technology. A correlation analysis on the unprotected core results in revealing the key at the output of the combinational networks and the registers. The quality of the measurements is further improved by introducing an enhanced data capturing method that inserts a minimum power consuming input as a reference vector. In comparison, no key-related information is leaked from the protected core even with an order of magnitude increase in the number of averaged traces. For the first time, fabricated chip results are used to validate a new logic level side channel countermeasure that offers lower area and reduced circuit design complexity compared to the DDL-based countermeasures. This thesis also provides insight into the side channel vulnerability of cryptosystems in sub-90nm CMOS technology nodes. In particular, data dependency of leakage power is analyzed. The number of traces to disclose the key is seen to decrease by 35% from 90nm to 45nm CMOS technology nodes. Analysis shows that the temperature dependency of the subthreshold leakage has an important role in increasing the ability to attack future nanoscale crypto cores. For the first time, the effectiveness of a circuit-based leakage reduction technique is examined for side channel security. This investigation demonstrates that high threshold voltage transistor assignment improves resistance against information leakage. The analysis initiated in this thesis is crucial for rolling out the guidelines of side channel security for the next generation of Cryptosystem.1 yea

    Challenges and solutions for large-scale integration of emerging technologies

    Get PDF
    Title from PDF of title page viewed June 15, 2021Dissertation advisor: Mostafizur RahmanVitaIncludes bibliographical references (pages 67-88)Thesis (Ph.D.)--School of Computing and Engineering and Department of Physics and Astronomy. University of Missouri--Kansas City, 2021The semiconductor revolution so far has been primarily driven by the ability to shrink devices and interconnects proportionally (Moore's law) while achieving incremental benefits. In sub-10nm nodes, device scaling reaches its fundamental limits, and the interconnect bottleneck is dominating power and performance. As the traditional way of CMOS scaling comes to an end, it is essential to find an alternative to continue this progress. However, an alternative technology for general-purpose computing remains elusive; currently pursued research directions face adoption challenges in all aspects from materials, devices to architecture, thermal management, integration, and manufacturing. Crosstalk Computing, a novel emerging computing technique, addresses some of the challenges and proposes a new paradigm for circuit design, scaling, and security. However, like other emerging technologies, Crosstalk Computing also faces challenges like designing large-scale circuits using existing CAD tools, scalability, evaluation and benchmarking of large-scale designs, experimentation through commercial foundry processes to compete/co-exist with CMOS for digital logic implementations. This dissertation addresses these issues by providing a methodology for circuit synthesis customizing the existing EDA tool flow, evaluating and benchmarking against state-of-the-art CMOS for large-scale circuits designed at 7nm from MCNC benchmark suits. This research also presents a study on Crosstalk technology's scalability aspects and shows how the circuits' properties evolve from 180nm to 7nm technology nodes. Some significant results are for primitive Crosstalk gate, designed in 180nm, 65nm, 32nm, and 7nm technology nodes, the average reduction in power is 42.5%, and an average improvement in performance is 34.5% comparing to CMOS for all mentioned nodes. For benchmarking large-scale circuits designed at 7nm, there are 48%, 57%, and 10% improvements against CMOS designs in terms of density, power, and performance, respectively. An experimental demonstration of a proof-of-concept prototype chip for Crosstalk Computing at TSMC 65nm technology is also presented in this dissertation, showing the Crosstalk gates can be realized using the existing manufacturing process. Additionally, the dissertation also provides a fine-grained thermal management approach for emerging technologies like transistor-level 3-D integration (Monolithic 3-D, Skybridge, SN3D), which holds the most promise beyond 2-D CMOS technology. However, such 3-D architectures within small form factors increase hotspots and demand careful consideration of thermal management at all integration levels. This research proposes a new direction for fine-grained thermal management approach for transistor-level 3-D integrated circuits through the insertion of architected heat extraction features that can be part of circuit design, and an integrated methodology for thermal evaluation of 3-D circuits combining different simulation outcomes at advanced nodes, which can be integrated to traditional CAD flow. The results show that the proposed heat extraction features effectively reduce the temperature from a heated location. Thus, the dissertation provides a new perspective to overcome the challenges faced by emerging technologies where the device, circuit, connectivity, heat management, and manufacturing are addressed in an integrated manner.Introduction and motivation -- Cross talk computing overview -- Logic simplification approach for Crosstalk circuit design -- Crostalk computing scalability study: from 180 nm to 7 nm -- Designing large*scale circuits in Crosstalk at 7 nm -- Comparison and benchmarking -- Experimental demonstration of Crosstalk computing -- Thermal management challenges and mitigation techniques for transistor-level- 3D integratio
    corecore