1,763 research outputs found
Approximate Early Output Asynchronous Adders Based on Dual-Rail Data Encoding and 4-Phase Return-to-Zero and Return-to-One Handshaking
Approximate computing is emerging as an alternative to accurate computing due
to its potential for realizing digital circuits and systems with low power
dissipation, less critical path delay, and less area occupancy for an
acceptable trade-off in the accuracy of results. In the domain of computer
arithmetic, several approximate adders and multipliers have been designed and
their potential have been showcased versus accurate adders and multipliers for
practical digital signal processing applications. Nevertheless, in the existing
literature, almost all the approximate adders and multipliers reported
correspond to the synchronous design method. In this work, we consider robust
asynchronous i.e. quasi-delay-insensitive realizations of approximate adders by
employing delay-insensitive codes for data representation and processing, and
the 4-phase handshake protocols for data communication. The 4-phase handshake
protocols used are the return-to-zero and the return-to-one protocols.
Specifically, we consider the implementations of 32-bit approximate adders
based on the return-to-zero and return-to-one handshake protocols by adopting
the delay-insensitive dual-rail code for data encoding. We consider a range of
approximations varying from 4-bits to 20-bits for the least significant
positions of the accurate 32-bit asynchronous adder. The asynchronous adders
correspond to early output (i.e. early reset) type, which are based on the
well-known ripple carry adder architecture. The experimental results show that
approximate asynchronous adders achieve reductions in the design metrics such
as latency, cycle time, average power dissipation, and silicon area compared to
the accurate asynchronous adders. Further, the reductions in the design metrics
are greater for the return-to-one protocol compared to the return-to-zero
protocol. The design metrics were estimated using a 32/28nm CMOS technology.Comment: arXiv admin note: text overlap with arXiv:1711.0233
Custom Cell Placement Automation for Asynchronous VLSI
Asynchronous Very-Large-Scale-Integration (VLSI) integrated circuits have demonstrated many advantages over their synchronous counterparts, including low power consumption, elastic pipelining, robustness against manufacturing and temperature variations, etc. However, the lack of dedicated electronic design automation (EDA) tools, especially physical layout automation tools, largely limits the adoption of asynchronous circuits. Existing commercial placement tools are optimized for synchronous circuits, and require a standard cell library provided by semiconductor foundries to complete the physical design. The physical layouts of cells in this library have the same height to simplify the placement problem and the power distribution network. Although the standard cell methodology also works for asynchronous designs, the performance is inferior compared with counterparts designed using the full-custom design methodology. To tackle this challenge, we propose a gridded cell layout methodology for asynchronous circuits, in which the cell height and cell width can be any integer multiple of two grid values. The gridded cell approach combines the shape regularity of standard cells with the size flexibility of full-custom layouts. Therefore, this approach can achieve a better space utilization ratio and lower wire length for asynchronous designs. Experiments have shown that the gridded cell placement approach reduces area without impacting the routability. We have also used this placer to tape out a chip in a 65nm process technology, demonstrating that our placer generates design-rule clean results
Recommended from our members
Architectures and Circuits Leveraging Injection-Locked Oscillators for Ultra-Low Voltage Clock Synthesis and Reference-less Receivers for Dense Chip-to-Chip Communications
High performance computing is critical for the needs of scientific discovery and economic competitiveness. An extreme-scale computing system at 1000x the performance of today’s petaflop machines will exhibit massive parallelism on multiple vertical fronts, from thousands of computational units on a single processor to thousands of processors in a single data center. To facilitate such a massively-parallel extreme-scale computing, a key challenge is power. The challenge is not power associated with base computation but rather the problem of transporting data from one chip to another at high enough rates. This thesis presents architectures and techniques to achieve low power and area footprint while achieving high data rates in a dense very-short reach (VSR) chip-to-chip (C2C) communication network. High-speed serial communication operating at ultra-low supplies improves the energy-efficiency and lowers the power envelop of a system doing an exaflop of loops. One focus area of this thesis is clock synthesis for such energy-efficient interconnect applications operating at high speeds and ultra-low supplies. A sub-integer clockfrequency synthesizer is presented that incorporates a multi-phase injection-locked ring-oscillator-based prescaler for operation at an ultra-low supply voltage of 0.5V, phase-switching based programmable division for sub-integer clock-frequency synthesis, and automatic calibration to ensure injection lock. A record speed of 9GHz has been demonstrated at 0.5V in 45nm SOI CMOS. It consumes 3.5mW of power at 9.12GHz and 0.052 of area, while showing an output phase noise of -100dBc/Hz at 1MHz offset and RMS jitter of 325fs; it achieves a net of -186.5 in a 45-nm SOI CMOS process. This thesis also describes a receiver with a reference-less clocking architecture for high-density VSR-C2C links. This architecture simplifies clock-tree planning in dense extreme-scaling computing environments and has high-bandwidth CDR to enable SSC for suppressing EMI and to mitigate TX jitter requirements. It features clock-less DFE and a high-bandwidth CDR based on master-slave ILOs for phase generation/rotation. The RX is implemented in 14nm CMOS and characterized at 19Gb/s. It is 1.5x faster that previous reference-less embedded-oscillator based designs with greater than 100MHz jitter tolerance bandwidth and recovers error-free data over VSR-C2C channels. It achieves a power-efficiency of 2.9pJ/b while recovering error-free data (BER 200MHz and the INL of the ILO-based phase-rotator (32- Steps/UI) is <1-LSB. Lastly, this thesis develops a time-domain delay-based modeling of injection locking to describe injection-locking phenomena in nonharmonic oscillators. The model is used to predict the locking bandwidth, and the locking dynamics of the locked oscillator. The model predictions are verified against simulations and measurements of a four-stage differential ring oscillator. The model is further used to predict the injection-locking behavior of a single-ended CMOS inverter based ring oscillator, the lock range of a multi-phase injection-locked ring-oscillator-based prescaler, as well as the dynamics of tracking injection phase perturbations in injection-locked masterslave oscillators; demonstrating its versatility in application to any nonharmonic oscillator
Performance analysis and design optimization of parallel-type slew-rate enhancers for switched-capacitor applications
The design of single-stage OTAs for accurate switched-capacitor circuits involves challenging trade-offs between speed and power consumption. The addition of a Slew-Rate Enhancer (SRE) circuit placed in parallel to the main OTA (parallel-type SRE) constitutes a viable solution to reduce the settling time, at the cost of low-power overhead and no modifications of the main OTA. In this work, a practical analytical model has been developed to predict the settling time reduction achievable with OTA/SRE systems and to show the effect of the various design parameters. The model has been applied to a real case, consisting of the combination of a standard folded-cascode OTA with an existing parallel-type SRE solution. Simulations performed on a circuit designed with a commercial 180-nm CMOS technology revealed that the actual settling-time reduction was significantly smaller than predicted by the model. This discrepancy was explained by taking into account the internal delays of the SRE, which is exacerbated when a high output current gain is combined with high power efficiency. To overcome this problem, we propose a simple modification of the original SRE circuit, consisting in the addition of a single capacitor which temporarily boosts the OTA/SRE currents reducing the internal turn-on delay. With the proposed approach a settling-time reduction of 57% has been demonstrated with an SRE that introduces only a 10% power-overhead with respect of the single OTA solution. The robustness of the results have been validated by means of Monte-Carlo simulations
Design of sigma-delta modulators for analog-to-digital conversion intensively using passive circuits
This thesis presents the analysis, design implementation and experimental evaluation of passiveactive discrete-time and continuous-time Sigma-Delta (ΣΔ) modulators (ΣΔMs) analog-todigital converters (ADCs).
Two prototype circuits were manufactured. The first one, a discrete-time 2nd-order ΣΔM, was designed in a 130 nm CMOS technology. This prototype confirmed the validity of the ultra incomplete settling (UIS) concept used for implementing the passive integrators. This circuit, clocked at 100 MHz and consuming 298 μW, achieves DR/SNR/SNDR of 78.2/73.9/72.8 dB, respectively, for a signal bandwidth of 300 kHz. This results in a Walden FoMW of 139.3 fJ/conv.-step and Schreier FoMS of 168 dB.
The final prototype circuit is a highly area and power efficient ΣΔM using a combination of a cascaded topology, a continuous-time RC loop filter and switched-capacitor feedback paths. The modulator requires only two low gain stages that are based on differential pairs. A systematic design methodology based on genetic algorithm, was used, which allowed decreasing the circuit’s sensitivity to the circuit components’ variations. This continuous-time, 2-1 MASH ΣΔM has been designed in a 65 nm CMOS technology and it occupies an area of just 0.027 mm2. Measurement results show that this modulator achieves a peak SNR/SNDR of 76/72.2 dB and DR of 77dB for an input signal bandwidth of 10 MHz, while dissipating 1.57 mW from a 1 V power supply voltage. The ΣΔM achieves a Walden FoMW of 23.6 fJ/level and a Schreier FoMS of 175 dB. The innovations proposed in this circuit result, both, in the reduction of the power consumption and of the chip size. To the best of the author’s knowledge the circuit achieves the lowest Walden FOMW for ΣΔMs operating at signal bandwidth from 5 MHz to 50 MHz reported to date
- …