164 research outputs found
Evaluation and Analysis of NULL Convention Logic Circuits
Integrated circuit (IC) designers face many challenges in utilizing state-of-the-art technology nodes, such as the increased effects of process variation on timing analysis and heterogeneous multi-die architectures that span across multiple technologies while simultaneously increasing performance and decreasing power consumption. These challenges provide opportunity for utilization of asynchronous design paradigms due to their inherent flexibility and robustness.
While NULL Convention Logic (NCL) has been implemented in a variety of applications, current literature does not fully encompass the intricacies of NCL power performance across a variety of applications, technology nodes, circuit scale, and voltage scaling, thereby preventing further adoption and utilization of this design paradigm.
This dissertation evaluates the nominal dynamic energy, voltage-scaled dynamic energy, and static power consumption of NCL across variations in circuit type, circuit scale, and technology node, including 130 nm, 90 nm, and 45 nm processes. These results are compared with synchronous counterparts and analyzed for a range of trends in order to identify and quantify advantages and disadvantages of NCL across a variety of applications. By providing an evaluation of a broad range of circuits and characteristics, an IC designer may effectively predict the advantages or disadvantages of an NCL implementation for their application
High-Performance Silicon Nanowire Electronics
This thesis explores 10-nm wide Si nanowire (SiNW) field-effect transistors (FETs) for logic applications via the fabrication and testing of SiNW-based ring oscillators. Both SiNW surface treatments and dielectric annealing are reported for producing SiNW FETs that exhibit high performance in terms of large on/off-state current ratio (~108), low drain-induced barrier lowering (~30 mV), high carrier mobilities (~269 cm2/V•s), and low subthreshold swing (~80 mV/dec). The performance of inverter and ring-oscillator circuits fabricated from these nanowire FETs is explored as well. The inverter demonstrates the highest voltage gain (~148) reported for a SiNW-based NOT gate, and the ring oscillator exhibits near rail-to-rail oscillation centered at 13.4 MHz. The static and dynamic characteristics of these NW devices indicate that these SiNW-based FET circuits are excellent candidates for various high-performance nanoelectronic applications.
A set of novel charge-trap non-volatile memory devices based on high-performance SiNW FETs are well investigated. These memory devices integrate Fe2O3 quantum dots (FeO QDs) as charge storage elements. A template-assisted assembly technique is used to align FeO QDs into a close-packed, ordered matrix within the trenches that separate highly aligned SiNWs, and thus store injected charges. A Fowler-Nordheim tunneling mechanism describes both the program and erase operations. The memory prototype demonstrates promising characteristics in terms of large threshold voltage shift (~1.3 V) and long data retention time (~3 × 106 s), and also allows for key components to be systematically varied. For example, varying the size of the QDs indicates that larger diameter QDs exhibit a larger memory window, suggesting the QD charging energy plays an important role in the carrier transport. The device temperature characteristics reveal an optimal window for device performance between 275K and 350K.
The flexibility of integrating the charge-trap memory devices with the SiNW logic devices offers a low-cost embedded non-volatile memory solution. A building block for a SiNW-based field-programmable gate array (FPGA) is proposed in the future work.</p
Recommended from our members
Network-on-Chip Synchronization
Technology scaling has enabled the number of cores within a System on Chip (SoC) to increase significantly. Globally Asynchronous Locally Synchronous (GALS) systems using Dynamic Voltage and Frequency Scaling (DVFS) operate each of these cores on distinct and dynamic clock domains. The main communication method between these cores is increasingly more likely to be a Network-on-Chip (NoC). Typically, the interfaces between these clock domains experience multi-cycle synchronization latencies due to their use of “brute-force” synchronizers. This dissertation aims to improve the performance of NoCs and thereby SoCs as a whole by reducing this synchronization latency.
First, a survey of NoC improvement techniques is presented. One such improvement technique: a multi-layer NoC, has been successfully simulated. Given how one of the most commonly used techniques is DVFS, a thorough analysis and simulation of brute-force synchronizer circuits in both current and future process technologies is presented. Unfortunately, a multi-cycle latency is unavoidable when using brute-force synchronizers, so predictive synchronizers which require only a single cycle of latency have been proposed.
To demonstrate the impact of these predictive synchronizer circuits at a high level, multi-core system simulations incorporating these circuits have been completed. Multiple forms of GALS NoC configurations have been simulated, including multi-synchronous, NoC-synchronous, and single-synchronizer. Speedup on the SPLASH benchmark suite was measured to directly quantify the performance benefit of predictive synchronizers in a full system. Additionally, Mean Time Between Failures (MTBF) has been calculated for each NoC synchronizer configuration to determine the reliability benefit possible when using predictive synchronizers
Embedding Logic and Non-volatile Devices in CMOS Digital Circuits for Improving Energy Efficiency
abstract: Static CMOS logic has remained the dominant design style of digital systems for
more than four decades due to its robustness and near zero standby current. Static
CMOS logic circuits consist of a network of combinational logic cells and clocked sequential
elements, such as latches and flip-flops that are used for sequencing computations
over time. The majority of the digital design techniques to reduce power, area, and
leakage over the past four decades have focused almost entirely on optimizing the
combinational logic. This work explores alternate architectures for the flip-flops for
improving the overall circuit performance, power and area. It consists of three main
sections.
First, is the design of a multi-input configurable flip-flop structure with embedded
logic. A conventional D-type flip-flop may be viewed as realizing an identity function,
in which the output is simply the value of the input sampled at the clock edge. In
contrast, the proposed multi-input flip-flop, named PNAND, can be configured to
realize one of a family of Boolean functions called threshold functions. In essence,
the PNAND is a circuit implementation of the well-known binary perceptron. Unlike
other reconfigurable circuits, a PNAND can be configured by simply changing the
assignment of signals to its inputs. Using a standard cell library of such gates, a technology
mapping algorithm can be applied to transform a given netlist into one with
an optimal mixture of conventional logic gates and threshold gates. This approach
was used to fabricate a 32-bit Wallace Tree multiplier and a 32-bit booth multiplier
in 65nm LP technology. Simulation and chip measurements show more than 30%
improvement in dynamic power and more than 20% reduction in core area.
The functional yield of the PNAND reduces with geometry and voltage scaling.
The second part of this research investigates the use of two mechanisms to improve
the robustness of the PNAND circuit architecture. One is the use of forward and reverse body biases to change the device threshold and the other is the use of RRAM
devices for low voltage operation.
The third part of this research focused on the design of flip-flops with non-volatile
storage. Spin-transfer torque magnetic tunnel junctions (STT-MTJ) are integrated
with both conventional D-flipflop and the PNAND circuits to implement non-volatile
logic (NVL). These non-volatile storage enhanced flip-flops are able to save the state of
system locally when a power interruption occurs. However, manufacturing variations
in the STT-MTJs and in the CMOS transistors significantly reduce the yield, leading
to an overly pessimistic design and consequently, higher energy consumption. A
detailed analysis of the design trade-offs in the driver circuitry for performing backup
and restore, and a novel method to design the energy optimal driver for a given yield is
presented. Efficient designs of two nonvolatile flip-flop (NVFF) circuits are presented,
in which the backup time is determined on a per-chip basis, resulting in minimizing
the energy wastage and satisfying the yield constraint. To achieve a yield of 98%,
the conventional approach would have to expend nearly 5X more energy than the
minimum required, whereas the proposed tunable approach expends only 26% more
energy than the minimum. A non-volatile threshold gate architecture NV-TLFF are
designed with the same backup and restore circuitry in 65nm technology. The embedded
logic in NV-TLFF compensates performance overhead of NVL. This leads to the
possibility of zero-overhead non-volatile datapath circuits. An 8-bit multiply-and-
accumulate (MAC) unit is designed to demonstrate the performance benefits of the
proposed architecture. Based on the results of HSPICE simulations, the MAC circuit
with the proposed NV-TLFF cells is shown to consume at least 20% less power and
area as compared to the circuit designed with conventional DFFs, without sacrificing
any performance.Dissertation/ThesisDoctoral Dissertation Electrical Engineering 201
Custom versus Cell-Based ASIC Design for Many-Channel Correlators
While ASICs are efficient in terms of area utilization, performance, and power dissipation, ASIC design requires significant development resources. We compare two approaches to implementing ASIC correlators for interferometric imagers and spectrometers: The first approach, custom design, gives very high performance and area utilization, but is complex and time consuming. The second approach, cell-based design, reduces design time, but leads to lower performance and area utilization. In our evaluation, we consider two different correlator architectures: Autocorrelators for spectrometry, and cross-correlators for synthetic aperture imaging. Based on both 65-nm CMOS and 28-nm FD-SOI process technologies, our results show that for implementations for a limited number of channels, the cell-based approach may prove useful since it offers relatively short development time while still providing acceptable area utilization and performance. For larger designs, however, the area overhead of cell-based design becomes a major concern, especially for autocorrelator architectures
CMOS Data Converters for Closed-Loop mmWave Transmitters
With the increased amount of data consumed in mobile communication systems, new solutions for the infrastructure are needed. Massive multiple input multiple output (MIMO) is seen as a key enabler for providing this increased capacity. With the use of a large number of transmitters, the cost of each transmitter must be low. Closed-loop transmitters, featuring high-speed data converters is a promising option for achieving this reduced unit cost.In this thesis, both digital-to-analog (D/A) and analog-to-digital (A/D) converters suitable for wideband operation in millimeter wave (mmWave) massive MIMO transmitters are demonstrated. A 2
76 bit radio frequency digital-to-analog converter (RF-DAC)-based in-phase quadrature (IQ) modulator is demonstrated as a compact building block, that to a large extent realizes the transmit path in a closed-loop mmWave transmitter. The evaluation of an successive-approximation register (SAR) analog-to-digital converter (ADC) is also presented in this thesis. Methods for connecting simulated and measured performance has been studied in order to achieve a better understanding about the alternating comparator topology.These contributions show great potential for enabling closed-loop mmWave transmitters for massive MIMO transmitter realizations
Recommended from our members
Architectures and Circuits Leveraging Injection-Locked Oscillators for Ultra-Low Voltage Clock Synthesis and Reference-less Receivers for Dense Chip-to-Chip Communications
High performance computing is critical for the needs of scientific discovery and economic competitiveness. An extreme-scale computing system at 1000x the performance of today’s petaflop machines will exhibit massive parallelism on multiple vertical fronts, from thousands of computational units on a single processor to thousands of processors in a single data center. To facilitate such a massively-parallel extreme-scale computing, a key challenge is power. The challenge is not power associated with base computation but rather the problem of transporting data from one chip to another at high enough rates. This thesis presents architectures and techniques to achieve low power and area footprint while achieving high data rates in a dense very-short reach (VSR) chip-to-chip (C2C) communication network. High-speed serial communication operating at ultra-low supplies improves the energy-efficiency and lowers the power envelop of a system doing an exaflop of loops. One focus area of this thesis is clock synthesis for such energy-efficient interconnect applications operating at high speeds and ultra-low supplies. A sub-integer clockfrequency synthesizer is presented that incorporates a multi-phase injection-locked ring-oscillator-based prescaler for operation at an ultra-low supply voltage of 0.5V, phase-switching based programmable division for sub-integer clock-frequency synthesis, and automatic calibration to ensure injection lock. A record speed of 9GHz has been demonstrated at 0.5V in 45nm SOI CMOS. It consumes 3.5mW of power at 9.12GHz and 0.052 of area, while showing an output phase noise of -100dBc/Hz at 1MHz offset and RMS jitter of 325fs; it achieves a net of -186.5 in a 45-nm SOI CMOS process. This thesis also describes a receiver with a reference-less clocking architecture for high-density VSR-C2C links. This architecture simplifies clock-tree planning in dense extreme-scaling computing environments and has high-bandwidth CDR to enable SSC for suppressing EMI and to mitigate TX jitter requirements. It features clock-less DFE and a high-bandwidth CDR based on master-slave ILOs for phase generation/rotation. The RX is implemented in 14nm CMOS and characterized at 19Gb/s. It is 1.5x faster that previous reference-less embedded-oscillator based designs with greater than 100MHz jitter tolerance bandwidth and recovers error-free data over VSR-C2C channels. It achieves a power-efficiency of 2.9pJ/b while recovering error-free data (BER 200MHz and the INL of the ILO-based phase-rotator (32- Steps/UI) is <1-LSB. Lastly, this thesis develops a time-domain delay-based modeling of injection locking to describe injection-locking phenomena in nonharmonic oscillators. The model is used to predict the locking bandwidth, and the locking dynamics of the locked oscillator. The model predictions are verified against simulations and measurements of a four-stage differential ring oscillator. The model is further used to predict the injection-locking behavior of a single-ended CMOS inverter based ring oscillator, the lock range of a multi-phase injection-locked ring-oscillator-based prescaler, as well as the dynamics of tracking injection phase perturbations in injection-locked masterslave oscillators; demonstrating its versatility in application to any nonharmonic oscillator
- …