143 research outputs found
Implementation of arithmetic primitives using truly deep submicron technology (TDST)
The invention of the transistor in 1947 at Bell Laboratories revolutionised the electronics industry and created a powerful platform for emergence of new industries. The quest to increase the number of devices per chip over the last four decades has resulted in rapid transition from Small-Scale-Integration (SSI) and Large-Scale-lntegration (LSI), through to the Very-Large-Scale-Integration (VLSI) technologies, incorporating approximately 10 to 100 million devices per chip. The next phase in this evolution is the Ultra-Large-Scale-Integration (ULSI) aiming to realise new application domains currently not accessible to CMOS technology. Although technology is continuously evolving to produce smaller systems with minimised power dissipation, the IC industry is facing major challenges due to constraints on power density (W/cm2) and high dynamic (operating) and static (standby) power dissipation. Mobile multimedia communication and optical based technologies have rapidly become a significant area of research and development challenging a variety of technological fronts. The future emergence or 4G (4th Generation) wireless communications networks is further driving this development, requiring increasing levels of media rich content. The processing requirements for capture, conversion, compression, decompression, enhancement and display of higher quality multimedia, place heavy demands on current ULSI systems. This is also apparent for mobile applications and intelligent optical networks where silicon chip area and power dissipation become primary considerations. In addition to the requirements for very low power, compact size and real-time processing, the rapidly evolving nature of telecommunication networks means that flexible soft programmable systems capable of adaptation to support a number of different standards and/or roles become highly desirable. In order to fully realise the capabilities promised by the 4G and supporting intelligent networks, new enabling technologies arc needed to facilitate the next generation of personal communications devices. Most of the current solutions to meet these challenges are based on various implementations of conventional architectures. For decades, silicon has been the main platform of computing, however it is slow, bulky, runs too hot, and is too expensive. Thus, new approaches to architectures, driving multimedia and future telecommunications systems, are needed in order to extend the life cycle of silicon technology. The emergence of Truly Deep Submicron Technology (TDST) and related 3-D interconnection technologies have provided potential alternatives from conventional architectures to 3-D system solutions, through integration of IDST, Vertical Software Mapping and Intelligent Interconnect Technology (IIT). The concept of Soft-Chip Technology (SCT) entails integration of Soft• Processing Circuits with Soft-Configurable Circuits . This concept can effectively manipulate hardware primitives through vertical integration of control and data. Thus the notion of 3-D Soft-Chip emerges as a new design algorithm for content-rich multimedia, telecommunication and intelligent networking system applications. 3•D architectures (design algorithms used suitable for 3-D soft-chip technology), are driven by three factors. The first is development of new device technology (TDST) that can support new architectures with complexities of 100M to 1000M devices. The second is development of advanced wafer bonding techniques such as Indium bump and the more futuristic optical interconnects for 3-D soft-chip mapping. The third is related to improving the performance of silicon CMOS systems as devices continue to scale down in dimensions. One of the fundamental building blocks of any computer system is the arithmetic component. Optimum performance of the system is determined by the efficiency of each individual component, as well as the network as a whole entity. Development of configurable arithmetic primitives is the fundamental focus in 3-D architecture design where functionality can be implemented through soft configurable hardware elements. Therefore the ability to improve the performance capability of a system is of crucial importance for a successful design. Important factors that predict the efficiency of such arithmetic components are: • The propagation delay of the circuit, caused by the gate, diffusion and wire capacitances within !he circuit, minimised through transistor sizing. and • Power dissipation, which is generally based on node transition activity. [2] Although optimum performance of 3-D soft-chip systems is primarily established by the choice of basic primitives such as adders and multipliers, the interconnecting network also has significant degree of influence on !he efficiency of the system. 3-D superposition of devices can decrease interconnect delays by up to 60% compared to a similar planar architecture. This research is based on development and implementation of configurable arithmetic primitives, suitable to the 3-D architecture, and has these foci: • To develop a variety of arithmetic components such as adders and multipliers with particular emphasis on minimum area and compatible with 3-D soft-chip design paradigm. • To explore implementation of configurable distributed primitives for arithmetic processing. This entails optimisation of basic primitives, and using them as part of array processing. In this research the detailed designs of configurable arithmetic primitives are implemented using TDST O.l3µm (130nm) technology, utilising CAD software such as Mentor Graphics and Cadence in Custom design mode, carrying through design, simulation and verification steps
Asynchronous design of a multi-dimensional logarithmic number system processor for digital hearing instruments.
This thesis presents an asynchronous Multi-Dimensional Logarithmic Number System (MDLNS) processor that exhibits very low power dissipation. The target application is for a hearing instrument DSP. The MDLNS is a newly developed number system that has the advantage of reducing hardware complexity compared to the classical Logarithmic Number System (LNS). A synchronous implementation of a 2-digit 2DLNS filterbank, using the MDLNS to construct a FIR filterbank, has successfully proved that this novel number representation can benefit this digital hearing instrument application in the requirement of small size and low power. In this thesis we demonstrate that the combination of using the MDLNS, along with an asynchronous design methodology, produces impressive power savings compared to the previous synchronous design. A 4-phase bundled-data full-handshaking protocol is applied to the asynchronous control design. We adopt the Differential Cascade Voltage Switch Logic (DCVSL) circuit family for the design of the computation cells in this asynchronous MDLNS processor. Besides the asynchronous design methodology, we also use finite ring calculations to reduce adder bit-width to provide improvements compared to the previous MDLNS filterbank architecture. Spectre power simulation results from simulations of this asynchronous MDLNS processor demonstrate that over 70 percent power savings have been achieved compared to the synchronous design. This full-custom asynchronous MDLNS processor has been submitted for fabrication in the TSMC 0.18mum CMOS technology. A further contribution in this thesis is the development of a novel synchronizing method of design for testability (DfT), which is offered as a possible solution for asynchronous DfT methods.Dept. of Electrical and Computer Engineering. Paper copy at Leddy Library: Theses & Major Papers - Basement, West Bldg. / Call Number: Thesis2004 .W85. Source: Masters Abstracts International, Volume: 43-01, page: 0288. Advisers: G. A. Jullien; W. C. Miller. Thesis (M.A.Sc.)--University of Windsor (Canada), 2004
Recommended from our members
Design Techniques for High-Performance SAR A/D Converters
The design of electronics needs to account for the non-ideal characteristics of the device technologies used to realize practical circuits. This is particularly important in mixed analog-digital design since the best device technologies are very different for digital compared to analog circuits. One solution for this problem is to use a calibration correction approach to remove the errors introduced by devices, but this adds complexity and power dissipation, as well as reducing operation speed, and so must be optimised. This thesis addresses such an approach to improve the performance of certain types of analog-to-digital converter (ADC) used in advanced telecommunications, where speed, accuracy and power dissipation currently limit applications. The thesis specifically focuses on the design of compensation circuits for use in successive approximation register (SAR) ADCs.
ADCs are crucial building blocks in communication systems, in general, and for mobile networks, in particular. The recently launched fifth generation of mobile networks (5G) has required new ADC circuit techniques to meet the higher speed and lower power dissipation requirements for 5G technology. The SAR has become one of the most favoured architectures for designing high-performance ADCs, but the successive nature of the circuit operation makes it difficult to reach ∼GS/s sampling rates at reasonable power consumption.
Here, two calibration techniques for high-performance SAR ADCs are presented. The first uses an on-chip stochastic-based mismatch calibration technique that is able to accurately compute and compensate for the mismatch of a capacitive DAC in a SAR ADC. The stochastic nature of the proposed calibration method enables determination of the mismatch of the CAPDAC with a resolution much better than that of the DAC. This allows the unit capacitor to scale down to as low as 280aF for a 9-bit DAC. Since the CAP-DAC causes a large part of the overall dynamic power consumption and directly determines both the sizes of the driving and sampling switches and the size of the input capacitive load of the ADC and the kT/C noise power, a small CAP-DAC helps the power efficiency. To validate the proposed calibration idea, a 10-bit asynchronous SAR ADC was fabricated in 28-nm CMOS. Measurement results show that the proposed stochastic calibration improves the ADC’s SFDR and SNDR by 14.9 dB, 11.5 dB, respectively. After calibration, the fabricated SAR ADC achieves an ENOB of 9.14 bit at a sampling rate of 85 MS/s, resulting in a Walden FoM of 10.9 fJ/c-s.
The second calibration technique is a timing-skew calibration for a time-interleaved (TI) SAR ADC that calibrates/computes the inter-channel timing and offset mismatch simultaneously. Simulation results show the effectiveness of this calibration method. When used together, the proposed mismatch calibration technique and the timing-skew
calibration technique enables a TI SAR ADC to be designed that can achieve a sampling rate of ∼GS/s with 10-bit resolution and a power consumption as low as ∼10mW; specifications that satisfy the requirements of 5G technology
Nimbus B PCM telemetry subsystem, volume 1, sections 1 through 8 Final report,
Design, manufacture, and testing of Nimbus B PCM telemetry subsystem, bench test equipment, and ground station
Recommended from our members
Development of a Portable CMOS Time-Domain Fluorescence Lifetime Imager
Modern laboratory equipments to measure the excited-state lifetime of fluorophores usually include an expensive picosecond pulsed-laser excitation source, a fragile photomultiplier tube, and a large instrument body for optics. A portable and robust device to make fluorescence lifetime measurement in nanosecond scale is of great attraction for chemists and biologists.
This dissertation reports the development of a portable LED time-domain fluorimeter from an all-solid-state discrete-component prototype to its advanced CMOS integrated circuit implementation. The motivation of the research is to develop a multiplexed fluorimeter for point-of-care diagnosis. Instruments developed by this novel method have higher fill factor, are more portable, and are fabricated at lower cost
Synchronous meteorological satellite system description document, volume 3
The structural design, analysis, and mechanical integration of the synchronous meteorological satellite system are presented. The subjects discussed are: (1) spacecraft configuration, (2) structural design, (3) static load tests, (4) fixed base sinusoidal vibration survey, (5) flight configuration sinusoidal vibration tests, (6) spacecraft acoustic test, and (7) separation and shock test. Descriptions of the auxiliary propulsion subsystem, the apogee boost motor, communications system, and thermal control subsystem are included
Null convention logic circuits for asynchronous computer architecture
For most of its history, computer architecture has been able to benefit from a rapid scaling in semiconductor technology, resulting in continuous improvements to CPU design. During that period, synchronous logic has dominated because of its inherent ease of design and abundant tools. However, with the scaling of semiconductor processes into deep sub-micron and then to nano-scale dimensions, computer architecture is hitting a number of roadblocks such as high power and increased process variability. Asynchronous techniques can potentially offer many advantages compared to conventional synchronous design, including average case vs. worse case performance, robustness in the face of process and operating point variability and the ready availability of high performance, fine grained pipeline architectures. Of the many alternative approaches to asynchronous design, Null Convention Logic (NCL) has the advantage that its quasi delay-insensitive behavior makes it relatively easy to set up complex circuits without the need for exhaustive timing analysis. This thesis examines the characteristics of an NCL based asynchronous RISC-V CPU and analyses the problems with applying NCL to CPU design. While a number of university and industry groups have previously developed small 8-bit microprocessor architectures using NCL techniques, it is still unclear whether these offer any real advantages over conventional synchronous design. A key objective of this work has been to analyse the impact of larger word widths and more complex architectures on NCL CPU implementations. The research commenced by re-evaluating existing techniques for implementing NCL on programmable devices such as FPGAs. The little work that has been undertaken previously on FPGA implementations of asynchronous logic has been inconclusive and seems to indicate that asynchronous systems cannot be easily implemented in these devices. However, most of this work related to an alternative technique called bundled data, which is not well suited to FPGA implementation because of the difficulty in controlling and matching delays in a 'bundle' of signals. On the other hand, this thesis clearly shows that such applications are not only possible with NCL, but there are some distinct advantages in being able to prototype complex asynchronous systems in a field-programmable technology such as the FPGA. A large part of the value of NCL derives from its architectural level behavior, inherent pipelining, and optimization opportunities such as the merging of register and combina- tional logic functions. In this work, a number of NCL multiplier architectures have been analyzed to reveal the performance trade-offs between various non-pipelined, 1D and 2D organizations. Two-dimensional pipelining can easily be applied to regular architectures such as array multipliers in a way that is both high performance and area-efficient. It was found that the performance of 2D pipelining for small networks such as multipliers is around 260% faster than the equivalent non-pipelined design. However, the design uses 265% more transistors so the methodology is mainly of benefit where performance is strongly favored over area. A pipelined 32bit x 32bit signed Baugh-Wooley multiplier with Wallace-Tree Carry Save Adders (CSA), which is representative of a real design used for CPUs and DSPs, was used to further explore this concept as it is faster and has fewer pipeline stages compared to the normal array multiplier using Ripple-Carry adders (RCA). It was found that 1D pipelining with ripple-carry chains is an efficient implementation option but becomes less so for larger multipliers, due to the completion logic for which the delay time depends largely on the number of bits involved in the completion network. The average-case performance of ripple-carry adders was explored using random input vectors and it was observed that it offers little advantage on the smaller multiplier blocks, but this particular timing characteristic of asynchronous design styles be- comes increasingly more important as word size grows. Finally, this research has resulted in the development of the first 32-Bit asynchronous RISC-V CPU core. Called the Redback RISC, the architecture is a structure of pipeline rings composed of computational oscillations linked with flow completeness relationships. It has been written using NELL, a commercial description/synthesis tool that outputs standard Verilog. The Redback has been analysed and compared to two approximately equivalent industry standard 32-Bit synchronous RISC-V cores (PicoRV32 and Rocket) that are already fabricated and used in industry. While the NCL implementation is larger than both commercial cores it has similar performance and lower power compared to the PicoRV32. The implementation results were also compared against an existing NCL design tool flow (UNCLE), which showed how much the results of these implementation strategies differ. The Redback RISC has achieved similar level of throughput and 43% better power and 34% better energy compared to one of the synchronous cores with the same benchmark test and test condition such as input sup- ply voltage. However, it was shown that area is the biggest drawback for NCL CPU design. The core is roughly 2.5× larger than synchronous designs. On the other hand its area is still 2.9× smaller than previous designs using UNCLE tools. The area penalty is largely due to the unavoidable translation into a dual-rail topology when using the standard NCL cell library
Hybrid FPGA: Architecture and Interface
Hybrid FPGAs (Field Programmable Gate Arrays) are composed of general-purpose logic resources
with different granularities, together with domain-specific coarse-grained units. This thesis proposes
a novel hybrid FPGA architecture with embedded coarse-grained Floating Point Units (FPUs) to
improve the floating point capability of FPGAs. Based on the proposed hybrid FPGA architecture,
we examine three aspects to optimise the speed and area for domain-specific applications.
First, we examine the interface between large coarse-grained embedded blocks (EBs) and fine-grained
elements in hybrid FPGAs. The interface includes parameters for varying: (1) aspect ratio of EBs,
(2) position of the EBs in the FPGA, (3) I/O pins arrangement of EBs, (4) interconnect flexibility of
EBs, and (5) location of additional embedded elements such as memory.
Second, we examine the interconnect structure for hybrid FPGAs. We investigate how large and highdensity
EBs affect the routing demand for hybrid FPGAs over a set of domain-specific applications.
We then propose three routing optimisation methods to meet the additional routing demand introduced
by large EBs: (1) identifying the best separation distance between EBs, (2) adding routing switches on
EBs to increase routing flexibility, and (3) introducing wider channel width near the edge of EBs. We
study and compare the trade-offs in delay, area and routability of these three optimisation methods.
Finally, we employ common subgraph extraction to determine the number of floating point adders/subtractors,
multipliers and wordblocks in the FPUs. The wordblocks include registers and can implement fixed
point operations. We study the area, speed and utilisation trade-offs of the selected FPU subgraphs
in a set of floating point benchmark circuits. We develop an optimised coarse-grained FPU, taking
into account both architectural and system-level issues. Furthermore, we investigate the trade-offs
between granularities and performance by composing small FPUs into a large FPU.
The results of this thesis would help design a domain-specific hybrid FPGA to meet user requirements,
by optimising for speed, area or a combination of speed and area
Radiation Hardened by Design Methodologies for Soft-Error Mitigated Digital Architectures
abstract: Digital architectures for data encryption, processing, clock synthesis, data transfer, etc. are susceptible to radiation induced soft errors due to charge collection in complementary metal oxide semiconductor (CMOS) integrated circuits (ICs). Radiation hardening by design (RHBD) techniques such as double modular redundancy (DMR) and triple modular redundancy (TMR) are used for error detection and correction respectively in such architectures. Multiple node charge collection (MNCC) causes domain crossing errors (DCE) which can render the redundancy ineffectual. This dissertation describes techniques to ensure DCE mitigation with statistical confidence for various designs. Both sequential and combinatorial logic are separated using these custom and computer aided design (CAD) methodologies.
Radiation vulnerability and design overhead are studied on VLSI sub-systems including an advanced encryption standard (AES) which is DCE mitigated using module level coarse separation on a 90-nm process with 99.999% DCE mitigation. A radiation hardened microprocessor (HERMES2) is implemented in both 90-nm and 55-nm technologies with an interleaved separation methodology with 99.99% DCE mitigation while achieving 4.9% increased cell density, 28.5 % reduced routing and 5.6% reduced power dissipation over the module fences implementation. A DMR register-file (RF) is implemented in 55 nm process and used in the HERMES2 microprocessor. The RF array custom design and the decoders APR designed are explored with a focus on design cycle time. Quality of results (QOR) is studied from power, performance, area and reliability (PPAR) perspective to ascertain the improvement over other design techniques.
A radiation hardened all-digital multiplying pulsed digital delay line (DDL) is designed for double data rate (DDR2/3) applications for data eye centering during high speed off-chip data transfer. The effect of noise, radiation particle strikes and statistical variation on the designed DDL are studied in detail. The design achieves the best in class 22.4 ps peak-to-peak jitter, 100-850 MHz range at 14 pJ/cycle energy consumption. Vulnerability of the non-hardened design is characterized and portions of the redundant DDL are separated in custom and auto-place and route (APR). Thus, a range of designs for mission critical applications are implemented using methodologies proposed in this work and their potential PPAR benefits explored in detail.Dissertation/ThesisDoctoral Dissertation Electrical Engineering 201
Recommended from our members
Utilizing digital design techniques and circuits to improve energy and design efficiency of analog and mixed-signal circuits
Technology scaling has long driven large growth in the electronics market. With each successive technology generation, digital circuits become more power and area efficient. The large performance increases realized for digital circuits due to digital scaling have not translated to similar performance improvements for analog circuits. First, noise-limited analog circuits are not capable of leveraging the reduced parasitics of advanced processes, since capacitor sizes are generally set by noise requirements. Second, analog circuit performance is closely tied to the achievable device intrinsic gain, which degrades as process sizes shrink. Reduced supply voltages further exacerbate this issue, as the achievable gain per stage is limited by the number of devices that can be stacked while maintaining all devices in saturation. Finally, process variation increases with decreased feature sizes, so analog circuits have deal with increased mismatch and wider variations in threshold voltages, increasing the time required to design a circuit that is robust across process, voltage, and temperature (PVT) variation. This work seeks to address the limitations of analog circuits in advanced technologies by leveraging digital techniques and digital-like circuits that offer improved scalability. The first half of this dissertation investigates replacing the traditional closed-loop residue amplifier in a pipeline analog-to-digital converter (ADC) with an open loop dynamic amplifier. Previous works incorporating dynamic amplifiers have struggled to achieve large gains and have suffered from offset mismatch between the comparator and amplifier, which will only get worse in more advanced technologies. We propose the usage of a residue amplifier that combines an integration stage, to ensure low noise operation, with a positive feedback stage, to ensure high gain and high speed operation. By utilizing this topology, the proposed amplifier was the first dynamic amplifier to achieve a high gain of 32. Additionally, the proposed amplifier can reuse existing comparator hardware in the ADC, removing all offset mismatch between comparator and amplifier. Digital calibration techniques were applied to ensure a constant gain across PVT. The next part of this dissertation tries to overcome the scaling challenges for noise-limited ADCs with band-limited input signals. By leveraging digital filtering techniques to generate a prediction of the band-limited signal, the conversion can be limited to a range that is a fraction of the total ADC input range, allowing for significant decreases in reference and comparator power consumption. This work extends previous works by enabling accurate predictions for any band-limited signal characteristic. Previous works only focused on accurate predictions for low-activity signals. Finally, the large compute power enabled by modern technology scaling is leveraged to improve the design efficiency of analog circuits. A new automated circuit sizing tool is proposed that can achieve better performance than manual designs done by experts in a much shorter amount of time. All of these techniques help to alleviate the power and design efficiency limitations caused by technology scaling.Electrical and Computer Engineerin
- …