309 research outputs found
Recommended from our members
Noise shaping Asynchronous SAR ADC based time to digital converter
Time-to-digital converters (TDCs) are key elements for the digitization of timing information in modern mixed-signal circuits such as digital PLLs, DLLs, ADCs, and on-chip jitter-monitoring circuits. Especially, high-resolution TDCs are increasingly employed in on-chip timing tests, such as jitter and clock skew measurements, as advanced fabrication technologies allow fine on-chip time resolutions. Its main purpose is to quantize the time interval of a pulse signal or the time interval between the rising edges of two clock signals. Similarly to ADCs, the performance of TDCs are also primarily characterized by Resolution, Sampling Rate, FOM, SNDR, Dynamic Range and DNL/INL. This work proposes and demonstrates 2nd order noise shaping Asynchronous SAR ADC based TDC architecture with highest resolution of 0.25 ps among current state of art designs with respect to post-layout simulation results. This circuit is a combination of low power/High Resolution 2nd Order Noise Shaped Asynchronous SAR ADC backend with simple Time to Amplitude converter (TAC) front-end and is implemented in 40nm CMOS technology. Additionally, special emphasis is given on the discussion on various current state of art TDC architectures.Electrical and Computer Engineerin
Approximate and timing-speculative hardware design for high-performance and energy-efficient video processing
Since the end of transistor scaling in 2-D appeared on the horizon, innovative circuit design paradigms have been on the rise to go beyond the well-established and ultraconservative exact computing. Many compute-intensive applications – such as video processing – exhibit an intrinsic error resilience and do not necessarily require perfect accuracy in their numerical operations. Approximate computing (AxC) is emerging as a design alternative to improve the performance and energy-efficiency requirements for many applications by trading its intrinsic error tolerance with algorithm and circuit efficiency. Exact computing also imposes a worst-case timing to the conventional design of hardware accelerators to ensure reliability, leading to an efficiency loss. Conversely, the timing-speculative (TS) hardware design paradigm allows increasing the frequency or decreasing the voltage beyond the limits determined by static timing analysis (STA), thereby narrowing pessimistic safety margins that conventional design methods implement to prevent hardware timing errors. Timing errors should be evaluated by an accurate gate-level simulation, but a significant gap remains: How these timing errors propagate from the underlying hardware all the way up to the entire algorithm behavior, where they just may degrade the performance and quality of service of the application at stake? This thesis tackles this issue by developing and demonstrating a cross-layer framework capable of performing investigations of both AxC (i.e., from approximate arithmetic operators, approximate synthesis, gate-level pruning) and TS hardware design (i.e., from voltage over-scaling, frequency over-clocking, temperature rising, and device aging). The cross-layer framework can simulate both timing errors and logic errors at the gate-level by crossing them dynamically, linking the hardware result with the algorithm-level, and vice versa during the evolution of the application’s runtime. Existing frameworks perform investigations of AxC and TS techniques at circuit-level (i.e., at the output of the accelerator) agnostic to the ultimate impact at the application level (i.e., where the impact is truly manifested), leading to less optimization. Unlike state of the art, the framework proposed offers a holistic approach to assessing the tradeoff of AxC and TS techniques at the application-level. This framework maximizes energy efficiency and performance by identifying the maximum approximation levels at the application level to fulfill the required good enough quality. This thesis evaluates the framework with an 8-way SAD (Sum of Absolute Differences) hardware accelerator operating into an HEVC encoder as a case study. Application-level results showed that the SAD based on the approximate adders achieve savings of up to 45% of energy/operation with an increase of only 1.9% in BD-BR. On the other hand, VOS (Voltage Over-Scaling) applied to the SAD generates savings of up to 16.5% in energy/operation with around 6% of increase in BD-BR. The framework also reveals that the boost of about 6.96% (at 50°) to 17.41% (at 75° with 10- Y aging) in the maximum clock frequency achieved with TS hardware design is totally lost by the processing overhead from 8.06% to 46.96% when choosing an unreliable algorithm to the blocking match algorithm (BMA). We also show that the overhead can be avoided by adopting a reliable BMA. This thesis also shows approximate DTT (Discrete Tchebichef Transform) hardware proposals by exploring a transform matrix approximation, truncation and pruning. The results show that the approximate DTT hardware proposal increases the maximum frequency up to 64%, minimizes the circuit area in up to 43.6%, and saves up to 65.4% in power dissipation. The DTT proposal mapped for FPGA shows an increase of up to 58.9% on the maximum frequency and savings of about 28.7% and 32.2% on slices and dynamic power, respectively compared with stat
Architectures and implementations for the Polynomial Ring Engine over small residue rings
This work considers VLSI implementations for the recently introduced Polynomial Ring Engine (PRE) using small residue rings. To allow for a comprehensive approach to the implementation of the PRE mappings for DSP algorithms, this dissertation introduces novel techniques ranging from system level architectures to transistor level considerations. The Polynomial Ring Engine combines both classical residue mappings and new polynomial mappings. This dissertation develops a systematic approach for generating pipelined systolic/ semi-systolic structures for the PRE mappings. An example architecture is constructed and simulated to illustrate the properties of the new architectures. To simultaneously achieve large computational dynamic range and high throughput rate the basic
building blocks of the PRE architecture use transistor size profiling. Transistor sizing software is developed for profiling the Switching Tree dynamic logic used to build the basic modulo blocks. The software handles complex nFET structures using a simple iterative algorithm. Issues such as convergence of the iterative technique and validity of the sizing formulae have been treated with an appropriate mathematical analysis.
As an illustration of the use of PRE architectures for modem DSP computational problems, a Wavelet Transform for HDTV image compression is implemented. An interesting use is made of the PRE technique of using polynomial indeterminates as \u27placeholders\u27 for components of the processed data. In this case we use an indeterminate to symbolically handle the irrational number [square root of 3] of the Daubechie mother wavelet for N = 4.
Finally, a multi-level fault tolerant PRE architecture is developed by combining the classical redundant residue approach and the circuit parity check approach. The proposed architecture uses syndromes to correct faulty residue channels and an embedded parity check to correct faulty computational channels. The architecture offers superior fault detection and correction with online data interruption
The Fifth NASA Symposium on VLSI Design
The fifth annual NASA Symposium on VLSI Design had 13 sessions including Radiation Effects, Architectures, Mixed Signal, Design Techniques, Fault Testing, Synthesis, Signal Processing, and other Featured Presentations. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data systems performance. The presentations share insights into next generation advances that will serve as a basis for future VLSI design
A VCO-based CMOS readout circuit for capacitive MEMS microphones
Microelectromechanical systems (MEMS) microphone sensors have significantly improved in the past years, while the readout electronic is mainly implemented using switched-capacitor technology. The development of new battery powered 'always-on” applications increasingly requires a low power consumption. In this paper, we show a new readout circuit approach which is based on a mostly digital Sigma Delta (SigmaDelta) analog-to-digital converter (ADC). The operating principle of the readout circuit consists of coupling the MEMS sensor to an impedance converter that modulates the frequency of a stacked-ring oscillator—a new voltage-controlled oscillator (VCO) circuit featuring a good trade-off between phase noise and power consumption. The frequency coded signal is then sampled and converted into a noise-shaped digital sequence by a time-to-digital converter (TDC). A time-efficient design methodology has been used to optimize the sensitivity of the oscillator combined with the phase noise induced by 1/𝑓 and thermal noise. The circuit has been prototyped in a 130 nm CMOS process and directly bonded to a standard MEMS microphone. The proposed VCO-based analog-to-digital converter (VCO-ADC) has been characterized electrically and acoustically. The peak signal-to-noise and distortion ratio (SNDR) obtained from measurements is 77.9 dB-A and the dynamic range (DR) is 100 dB-A. The current consumption is 750 muA at 1.8 V and the effective area is 0.12 mm2. This new readout circuit may represent an enabling advance for low-cost digital MEMS microphones.This research was funded by project TEC2017-82653-R of CICYT, Spain
Design of Analog-to-Digital Converters with Embedded Mixing for Ultra-Low-Power Radio Receivers
In the field of radio receivers, down-conversion methods usually rely on one (or more)
explicit mixing stage(s) before the analog-to-digital converter (ADC). These stages not
only contribute to the overall power consumption but also have an impact on area and can
compromise the receiver’s performance in terms of noise and linearity. On the other hand,
most ADCs require some sort of reference signal in order to properly digitize an analog
input signal. The implementation of this reference signal usually relies on bandgap
circuits and reference buffers to generate a constant, stable, dc signal. Disregarding this
conventional approach, the work developed in this thesis aims to explore the viability
behind the usage of a variable reference signal. Moreover, it demonstrates that not only
can an input signal be properly digitized, but also shifted up and down in frequency,
effectively embedding the mixing operation in an ADC. As a result, ADCs in receiver
chains can perform double-duty as both a quantizer and a mixing stage. The lesser known
charge-sharing (CS) topology, within the successive approximation register (SAR) ADCs,
is used for a practical implementation, due to its feature of “pre-charging” the reference
signal prior to the conversion. Simulation results from an 8-bit CS-SAR ADC designed in
a 0.13 μm CMOS technology validate the proposed technique
Algorithms and VLSI architectures for parametric additive synthesis
A parametric additive synthesis approach to sound synthesis is advantageous as it can model sounds in a large scale manner, unlike the classical sinusoidal additive based synthesis paradigms. It is known that a large body of naturally occurring sounds are resonant in character and thus fit the concept well. This thesis is concerned with the computational optimisation of a super class of form ant synthesis which extends the sinusoidal parameters with a spread parameter known as band width. Here a modified formant algorithm is introduced which can be traced back to work done at IRCAM, Paris. When impulse driven, a filter based approach to modelling a formant limits the computational work-load. It is assumed that the filter's coefficients are fixed at initialisation, thus avoiding interpolation which can cause the filter to become chaotic. A filter which is more complex than a second order section is required. Temporal resolution of an impulse generator is achieved by using a two stage polyphase decimator which drives many filterbanks. Each filterbank describes one formant and is composed of sub-elements which allow variation of the formant’s parameters. A resource manager is discussed to overcome the possibility of all sub- banks operating in unison. All filterbanks for one voice are connected in series to the impulse generator and their outputs are summed and scaled accordingly. An explorative study of number systems for DSP algorithms and their architectures is investigated. I invented a new theoretical mechanism for multi-level logic based DSP. Its aims are to reduce the number of transistors and to increase their functionality. A review of synthesis algorithms and VLSI architectures are discussed in a case study between a filter based bit-serial and a CORDIC based sinusoidal generator. They are both of similar size, but the latter is always guaranteed to be stable
Recommended from our members
Concurrent error detection in 2-D separable linear transform
As process technology continues to scale to smaller geometries and reduces the supply voltage, reliability of the resulting semiconductor becomes a greater concern. The effect of deep submicron noise, soft errors, variation, and aging degradation pose challenges on the functional correctness of VLSI systems and places roadblocks on reductions in scale. On the other side, as computing moves toward mobile, the energy efficiency of digital systems becomes one of the most important design metrics. However, reliability and energy efficiency are contradicting design requirements. Adding a voltage guard band is the most common method to mitigate the reliability impacts in such instances. Low power design technique like voltage over-scaling (VOS) even reduces the power by scaling the supply voltage just before data-dependant timing errors start to appear. Concurrent error detection is the solution to tackle reliability and energy-efficiency in a unified manner. Fault tolerance can be deployed at different design hierarchies. Given its low overhead, algorithm level error detection is an attractive approach. In this work, a generic weighted checksum code based error detection algorithm targeted generic 2-D separable linear transform is proposed. This technique encodes the input array at the 2-D linear trans- formation level, and algorithms are designed to operate on encoded data and produce encoded output data. The proposed error detection technique is a system-level method and therefore can be used in existing hardware or software 2-D linear transformation architectures with low overhead. The mathematic proof of the algorithm is provided within the scope of this dissertation. The checksum weighting vector for several common transforms are derived as examples, error detection cost and algorithm effectiveness are analyzed. In traditional fault tolerance study, the error is often evaluated at the boolean level. Many DSP applications, like 2-D linear transformation used in the multimedia compression system, do not require exactly correct results, but rather that the quality of the output is within the acceptable range. A generic quality aware error detection in the 2-D separable linear transform is proposed by extending the above property and defining the errors at the functional level. As an example, the quality-aware error detection technique is deployed on a low-power wavelet lifting transform architecture in JPEG2000. A low-cost Signal to Noise Ratio (SNR) aware detection logic based on proposed scheme is integrated into the discrete wavelet lifting transform architecture. This detection logic checks whether the image quality degradation caused by voltage over-scaling induced timing errors is acceptable and determines the optimal voltage set point in operating conditions at run time. This novel quality-based error detection approach is significantly different from traditional error detection schemes which look for exact data equivalence. A simulation result for one design shows that the supply voltage can be scaled down to 75% of the nominal voltage in typical process corner without significant image quality degradation, which translates to 9.15mW power consumption (44% power saving).Electrical and Computer Engineerin
Low Noise, Jitter Tolerant Continuous-Time Sigma-Delta Modulator
The demand for higher data rates in receivers with carrier aggregation (CA) such as LTE, increases the efforts to integrate large number of wireless services into single receiving path, so it needs to digitize the signal in intermediate or high frequencies. It relaxes most of the front-end blocks but makes the design of ADC very challenging. Solving the bottleneck associated with ADC in receiver architecture is a major focus of many ongoing researches. Recently, continuous time Sigma-Delta analog-to-digital converters (ADCs) are getting more attention due to their inherent filtering properties, lower power consumption and wider input bandwidth. But, it suffers from several non-idealities such as clock jitter and ELD which decrease the ADC performance.
This dissertation presents two projects that address CT-ΣΔ modulator non-idealities. One of the projects is a CT- ΣΔ modulator with 10.9 Effective Number of Bits (ENOB) with Gradient Descent (GD) based calibration technique. The GD algorithm is used to extract loop gain transfer function coefficients. A quantization noise reduction technique is then employed to improve the Signal to Quantization Noise Ratio (SQNR) of the modulator using a 7-bit embedded quantizer. An analog fast path feedback topology is proposed which uses an analog differentiator in order to compensate excess loop delay. This approach relaxes the requirements of the amplifier placed in front of the quantizer. The modulator is implemented using a third order loop filter with a feed-forward compensation paths and a 3-bit quantizer in the feedback loop. In order to save power and improve loop linearity a two-stage class-AB amplifier is developed. The prototype modulator is implemented in 0.13μm CMOS technology, which achieves peak Signal to Noise and Distortion Ratio (SNDR) of 67.5dB while consuming total power of 8.5-mW under a 1.2V supply with an over sampling ratio of 10 at 300MHz sampling frequency. The prototype achieves Walden's Figure of Merit (FoM) of 146fJ/step.
The second project addresses clock jitter non-ideality in Continuous Time Sigma Delta modulators (CT- ΣΔM), the modulator suffer from performance degradation due to uncertainty in timing of clock at digital-to-analog converter (DAC). This thesis proposes to split the loop filter into two parts, analog and digital part to reduce the sensitivity of feedback DAC to clock jitter. By using the digital first-order filter after the quantizer, the effect of clock jitter is reduced without changing signal transfer function (STF). On the other hand, as one pole of the loop filter is implemented digitally, the power and area are reduced by minimizing active analog elements. Moreover, having more digital blocks in the loop of CT- ΣΔM makes it less sensitive to process, voltage, and temperature variations. We also propose the use of a single DAC with a current divider to implement feedback coefficients instead of two DACs to decrease area and clock routing. The prototype is implemented in TSMC 40 nm technology and occupies 0.06 mm^2 area; the proposed solution consumes 6.9 mW, and operates at 500 MS/s. In a 10 MHz bandwidth, the measured dynamic range (DR), peak signal-to-noise-ratio (SNR), and peak signal-to-noise and distortion (SNDR) ratios in presence of 4.5 ps RMS clock jitter (0.22% clock period) are 75 dB, 68 dB, and 67 dB, respectively. The proposed structure is 10 dB more tolerant to clock jitter when compared to the conventional ΣΔM design for similar loop filter
- …