Search CORE

40 research outputs found

CAD Tool Design for NCL and MTNCL Asynchronous Circuits

Author: Pillai Vijay Mani
Publication venue: ScholarWorks@UARK
Publication date: 01/08/2013
Field of study

This thesis presents an implementation of a method developed to readily convert Boolean designs into an ultra-low power asynchronous design methodology called MTNCL, which combines multi-threshold CMOS (MTCMOS) with NULL Convention Logic (NCL) systems. MTNCL provides the leakage power advantages of an all high-Vt implementation with a reasonable speed penalty compared to the all low-Vt implementation, and has negligible area overhead. The proposed tool utilizes industry-standard CAD tools. This research also presents an Automated Gate-Level Pipelining with Bit-Wise Completion (AGLPBW) method to maximize throughput of delay-insensitive full-word pipelined NCL circuits. These methods have been integrated into the Mentor Graphics and Synopsis CAD tools, using a C-program, which performs the majority of the computations, such that the method can be easily ported to other CAD tool suites. Both methods have been successfully tested on circuits, including a 4-bit × 4-bit multiplier, an unsigned Booth2 multiplier, and a 4-bit/8-operation arithmetic logic unit (ALU

ScholarWorks@UARK

UARK (University of Arkansas )

Design and Analysis of an Asynchronous Microcontroller

Author: Hinds Michael
Publication venue: ScholarWorks@UARK
Publication date: 01/08/2016
Field of study

This dissertation presents the design of the most complex MTNCL circuit to date. A fully functional MTNCL MSP430 microcontroller is designed and benchmarked against an open source synchronous MSP430. The designs are compared in terms of area, active energy, and leakage energy. Techniques to reduce MTNCL pipeline activity and improve MTNCL register file area and power consumption are introduced. The results show the MTNCL design to have superior leakage power characteristics. The area and active energy comparisons highlight the need for better MTNCL logic synthesis techniques

ScholarWorks@UARK

UARK (University of Arkansas )

Design Methodologies and CAD Tools for Leakage Power Optimization in FPGAs

Author: Hassan Hassan
Publication venue: 'University of Waterloo'
Publication date: 04/07/2008
Field of study

The scaling of the CMOS technology has precipitated an exponential increase in both subthreshold and gate leakage currents in modern VLSI designs. Consequently, the contribution of leakage power to the total chip power dissipation for CMOS designs is increasing rapidly, which is estimated to be 40% for the current technology generations and is expected to exceed 50% by the 65nm CMOS technology. In FPGAs, the power dissipation problem is further aggravated when compared to ASIC designs because FPGA use more transistors per logic function when compared to ASIC designs. Consequently, solving the leakage power problem is pivotal to devising power-aware FPGAs in the nanometer regime. This thesis focuses on devising both architectural and CAD techniques for leakage mitigation in FPGAs. Several CAD and architectural modifications are proposed to reduce the impact of leakage power dissipation on modern FPGAs. Firstly, multi-threshold CMOS (MTCMOS) techniques are introduced to FPGAs to permanently turn OFF the unused resources of the FPGA, FPGAs are characterized with low utilization percentages that can reach 60%. Moreover, such architecture enables the dynamic shutting down of the FPGA idle parts, thus reducing the standby leakage significantly. Employing the MTCMOS technique in FPGAs requires several changes to the FPGA architecture, including the placement and routing of the sleep signals and the MTCMOS granularity. On the CAD level, the packing and placement stages are modified to allow the possibility of dynamically turning OFF the idle parts of the FPGA. A new activity generation algorithm is proposed and implemented that aims to identify the logic blocks in a design that exhibit similar idleness periods. Several criteria for the activity generation algorithm are used, including connectivity and logic function. Several versions of the activity generation algorithm are implemented to trade power savings with runtime. A newly developed packing algorithm uses the resulting activities to minimize leakage power dissipation by packing the logic blocks with similar or close activities together. By proposing an FPGA architecture that supports MTCMOS and developing a CAD tool that supports the new architecture, an average power savings of 30% is achieved for a 90nm CMOS process while incurring a speed penalty of less than 5%. This technique is further extended to provide a timing-sensitive version of the CAD flow to vary the speed penalty according to the criticality of each logic block. Secondly, a new technique for leakage power reduction in FPGAs based on the use of input dependency is developed. Both subthreshold and gate leakage power are heavily dependent on the input state. In FPGAs, the effect of input dependency is exacerbated due to the use of pass-transistor multiplexer logic, which can exhibit up to 50% variation in leakage power due to the input states. In this thesis, a new algorithm is proposed that uses bit permutation to reduce subthreshold and gate leakage power dissipation in FPGAs. The bit permutation algorithm provides an average leakage power reduction of 40% while having less than 2% impact on the performance and no penalty on the design area. Thirdly, an accurate probabilistic power model for FPGAs is developed to quantify the savings from the proposed leakage power reduction techniques. The proposed power model accounts for dynamic, short circuit, and leakage power (including both subthreshold and gate leakage power) dissipation in FPGAs. Moreover, the power model accounts for power due to glitches, which accounts for almost 20% of the dynamic power dissipation in FPGAs. The use of probabilities in the power model makes it more computationally efficient than the other FPGA power models in the literature that rely on long input sequence simulations. One of the main advantages of the proposed power model is the incorporation of spatial correlation while estimating the signal probability. Other probabilistic FPGA power models assume spatial independence among the design signals, thus overestimating the power calculations. In the proposed model, a probabilistic model is proposed for spatial correlations among the design signals. Moreover, a different variation is proposed that manages to capture most of the spatial correlations with minimum impact on runtime. Furthermore, the proposed power model accounts for the input dependency of subthreshold and gate leakage power dissipation. By comparing the proposed power model to HSpice simulation, the estimated power is within 8% and is closer to HSpice simulations than other probabilistic FPGA power models by an average of 20%

University of Waterloo's Institutional Repository

Optimal MTCMOS reactivation under power supply noise and performance constraints

Author
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2008
Field of study

Crossref

Analysis of Parameter Tuning on Energy Efficiency in Asynchronous Circuits

Author: Roark Justin Thomas
Publication venue: ScholarWorks@UARK
Publication date: 01/08/2013
Field of study

Power and energy consumption are the primary concern of the digital integrated circuit (IC) industry. Asynchronous logic, in the past several years, has increased in popularity due to its low power nature. This thesis analyzes a collection of array multipliers with different parameters to compare two asynchronous design paradigms, NULL Convention Logic (NCL) and Multi-Threshold NULL Convention Logic (MTNCL). Several commercially available pieces of software and custom scripts are used to analyze the asynchronous circuits and their components to provide the energy consumption estimation on various parts of each circuit. The analysis of the software results revealed that MTNCL circuits are more energy efficient for any size provided the number of pipeline stages does not become too great. Otherwise NCL would consume less energy. A combinational logic gate count to register gate count ratio of 3 was given to help determine when an MTNCL circuit would have too many pipeline stages for circuits designed with IBM\u27s 130nm 8RF-DM design kit

ScholarWorks@UARK

UARK (University of Arkansas )

ULTRA ENERGY-EFFICIENT SUB-/NEAR-THRESHOLD COMPUTING: PLATFORM AND METHODOLOGY

Author: ZHAO WENFENG
Publication venue
Publication date: 03/01/2014
Field of study

Ph.DDOCTOR OF PHILOSOPH

ScholarBank@NUS

A Micro Power Hardware Fabric for Embedded Computing

Author: Mehta Gayatri
Publication venue
Publication date: 25/09/2009
Field of study

Field Programmable Gate Arrays (FPGAs) mitigate many of the problemsencountered with the development of ASICs by offering flexibility, faster time-to-market, and amortized NRE costs, among other benefits. While FPGAs are increasingly being used for complex computational applications such as signal and image processing, networking, and cryptology, they are far from ideal for these tasks due to relatively high power consumption and silicon usage overheads compared to direct ASIC implementation. A reconfigurable device that exhibits ASIC-like power characteristics and FPGA-like costs and tool support is desirable to fill this void. In this research, a parameterized, reconfigurable fabric model named as domain specific fabric (DSF) is developed that exhibits ASIC-like power characteristics for Digital Signal Processing (DSP) style applications. Using this model, the impact of varying different design parameters on power and performance has been studied. Different optimization techniques like local search and simulated annealing are used to determine the appropriate interconnect for a specific set of applications. A design space exploration tool has been developed to automate and generate a tailored architectural instance of the fabric.The fabric has been synthesized on 160 nm cell-based ASIC fabrication process from OKI and 130 nm from IBM. A detailed power-performance analysis has been completed using signal and image processing benchmarks from the MediaBench benchmark suite and elsewhere with comparisons to other hardware and software implementations. The optimized fabric implemented using the 130 nm process yields energy within 3X of a direct ASIC implementation, 330X better than a Virtex-II Pro FPGA and 2016X better than an Intel XScale processor

D-Scholarship@Pitt

Power Management for Deep Submicron Microprocessors

Author: Youssef Ahmed
Publication venue: 'University of Waterloo'
Publication date: 07/07/2008
Field of study

As VLSI technology scales, the enhanced performance of smaller transistors comes at the expense of increased power consumption. In addition to the dynamic power consumed by the circuits there is a tremendous increase in the leakage power consumption which is further exacerbated by the increasing operating temperatures. The total power consumption of modern processors is distributed between the processor core, memory and interconnects. In this research two novel power management techniques are presented targeting the functional units and the global interconnects. First, since most leakage control schemes for processor functional units are based on circuit level techniques, such schemes inherently lack information about the operational profile of higher-level components of the system. This is a barrier to the pivotal task of predicting standby time. Without this prediction, it is extremely difficult to assess the value of any leakage control scheme. Consequently, a methodology that can predict the standby time is highly beneficial in bridging the gap between the information available at the application level and the circuit implementations. In this work, a novel Dynamic Sleep Signal Generator (DSSG) is presented. It utilizes the usage traces extracted from cycle accurate simulations of benchmark programs to predict the long standby periods associated with the various functional units. The DSSG bases its decisions on the current and previous standby state of the functional units to accurately predict the length of the next standby period. The DSSG presents an alternative to Static Sleep Signal Generation (SSSG) based on static counters that trigger the generation of the sleep signal when the functional units idle for a prespecified number of cycles. The test results of the DSSG are obtained by the use of a modified RISC superscalar processor, implemented by SimpleScalar, the most widely accepted open source vehicle for architectural analysis. In addition, the results are further verified by a Simultaneous Multithreading simulator implemented by SMTSIM. Leakage saving results shows an increase of up to 146% in leakage savings using the DSSG versus the SSSG, with an accuracy of 60-80% for predicting long standby periods. Second, chip designers in their effort to achieve timing closure, have focused on achieving the lowest possible interconnect delay through buffer insertion and routing techniques. This approach, though, taxes the power budget of modern ICs, especially those intended for wireless applications. Also, in order to achieve more functionality, die sizes are constantly increasing. This trend is leading to an increase in the average global interconnect length which, in turn, requires more buffers to achieve timing closure. Unconstrained buffering is bound to adversely affect the overall chip performance, if the power consumption is added as a major performance metric. In fact, the number of global interconnect buffers is expected to reach hundreds of thousands to achieve an appropriate timing closure. To mitigate the impact of the power consumed by the interconnect buffers, a power-efficient multi-pin routing technique is proposed in this research. The problem is based on a graph representation of the routing possibilities, including buffer insertion and identifying the least power path between the interconnect source and set of sinks. The novel multi-pin routing technique is tested by applying it to the ISPD and IBM benchmarks to verify the accuracy, complexity, and solution quality. Results obtained indicate that an average power savings as high as 32% for the 130-nm technology is achieved with no impact on the maximum chip frequency

University of Waterloo's Institutional Repository

Ultra-Low Power Circuit Design for Cubic-Millimeter Wireless Sensor Platform.

Author: Lee Yoonmyung
Publication venue
Publication date
Field of study

Modern daily life is surrounded by smaller and smaller computing devices. As Bell’s Law predicts, the research community is now looking at tiny computing platforms and mm3-scale sensor systems are drawing an increasing amount of attention since they can create a whole new computing environment. Designing mm3-scale sensor nodes raises various circuit and system level challenges and we have addressed and proposed novel solutions for many of these challenges to create the first complete 1.0mm3 sensor system including a commercial microprocessor. We demonstrate a 1.0mm3 form factor sensor whose modular die-stacked structure allows maximum volume utilization. Low power I2C communication enables inter-layer serial communication without losing compatibility to standard I2C communication protocol. A dual microprocessor enables concurrent computation for the sensor node control and measurement data processing. A multi-modal power management unit allowed energy harvesting from various harvesting sources. An optical communication scheme is provided for initial programming, synchronization and re-programming after recovery from battery discharge. Standby power reduction techniques are investigated and a super cut-off power gating scheme with an ultra-low power charge pump reduces the standby power of logic circuits by 2-19× and memory by 30%. Different approaches for designing low-power memory for mm3-scale sensor nodes are also presented in this work. A dual threshold voltage gain cell eDRAM design achieves the lowest eDRAM retention power and a 7T SRAM design based on hetero-junction tunneling transistors reduces the standby power of SRAM by 9-19× with only 15% area overhead. We have paid special attention to the timer for the mm3-scale sensor systems and propose a multi-stage gate-leakage-based timer to limit the standard deviation of the error in hourly measurement to 196ms and a temperature compensation scheme reduces temperature dependency to 31ppm/°C. These techniques for designing ultra-low power circuits for a mm3-scale sensor enable implementation of a 1.0mm3 sensor node, which can be used as a skeleton for future micro-sensor systems in variety of applications. These microsystems imply the continuation of the Bell’s Law, which also predicts the massive deployment of mm3-scale computing systems and emergence of even smaller and more powerful computing systems in the near future.Ph.D.Electrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/91438/1/sori_1.pd

Deep Blue Documents at the University of Michigan