
    Approximate and timing-speculative hardware design for high-performance and energy-efficient video processing

    Since the end of transistor scaling in 2-D appeared on the horizon, innovative circuit design paradigms have been on the rise to go beyond the well-established and ultraconservative exact computing. Many compute-intensive applications, such as video processing, exhibit an intrinsic error resilience and do not necessarily require perfect accuracy in their numerical operations. Approximate computing (AxC) is emerging as a design alternative that improves the performance and energy efficiency of many applications by trading their intrinsic error tolerance for algorithm and circuit efficiency. Exact computing also imposes worst-case timing on the conventional design of hardware accelerators to ensure reliability, leading to an efficiency loss. Conversely, the timing-speculative (TS) hardware design paradigm allows increasing the frequency or decreasing the voltage beyond the limits determined by static timing analysis (STA), thereby narrowing the pessimistic safety margins that conventional design methods implement to prevent hardware timing errors. Timing errors should be evaluated by an accurate gate-level simulation, but a significant gap remains: how do these timing errors propagate from the underlying hardware all the way up to the behavior of the entire algorithm, where they may degrade the performance and quality of service of the application at stake? This thesis tackles this issue by developing and demonstrating a cross-layer framework capable of investigating both AxC (i.e., from approximate arithmetic operators, approximate synthesis, gate-level pruning) and TS hardware design (i.e., from voltage over-scaling, frequency over-clocking, temperature rising, and device aging). The cross-layer framework can simulate both timing errors and logic errors at the gate level by crossing them dynamically, linking the hardware result with the algorithm level, and vice versa, as the application's runtime evolves.
Existing frameworks investigate AxC and TS techniques at the circuit level (i.e., at the output of the accelerator), agnostic to the ultimate impact at the application level (i.e., where the impact is truly manifested), leading to less effective optimization. Unlike the state of the art, the proposed framework offers a holistic approach to assessing the tradeoffs of AxC and TS techniques at the application level. This framework maximizes energy efficiency and performance by identifying the maximum approximation levels at the application level that still fulfill the required "good enough" quality. This thesis evaluates the framework with an 8-way SAD (Sum of Absolute Differences) hardware accelerator operating within an HEVC encoder as a case study. Application-level results showed that the SAD based on approximate adders achieves savings of up to 45% in energy/operation with an increase of only 1.9% in BD-BR. On the other hand, VOS (Voltage Over-Scaling) applied to the SAD generates savings of up to 16.5% in energy/operation with an increase of around 6% in BD-BR. The framework also reveals that the boost of about 6.96% (at 50 °C) to 17.41% (at 75 °C with 10-year aging) in the maximum clock frequency achieved with TS hardware design is entirely offset by a processing overhead of 8.06% to 46.96% when an unreliable block matching algorithm (BMA) is chosen. We also show that the overhead can be avoided by adopting a reliable BMA. This thesis also presents approximate DTT (Discrete Tchebichef Transform) hardware proposals that explore transform matrix approximation, truncation, and pruning. The results show that the approximate DTT hardware proposal increases the maximum frequency by up to 64%, reduces the circuit area by up to 43.6%, and saves up to 65.4% in power dissipation. The DTT proposal mapped to FPGA shows an increase of up to 58.9% in the maximum frequency and savings of about 28.7% and 32.2% in slices and dynamic power, respectively, compared with stat
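
To make the SAD case study concrete, the following is a minimal sketch of an exact 8-sample SAD next to an approximate one. The abstract does not say which approximate adders the thesis uses, so the operand-truncation adder below (`truncated_add`, which drops the `k` least-significant bits of both operands) is only an assumption chosen for illustration, as are all function names and the sample pixel values.

```python
def exact_sad(block_a, block_b):
    """Exact Sum of Absolute Differences over two equally sized blocks."""
    return sum(abs(a - b) for a, b in zip(block_a, block_b))


def truncated_add(x, y, k):
    """Hypothetical approximate adder: ignore the k low bits of both operands.

    This is an illustrative stand-in, not the adder evaluated in the thesis.
    """
    mask = ~((1 << k) - 1)  # clears the k least-significant bits
    return (x & mask) + (y & mask)


def approx_sad(block_a, block_b, k=2):
    """SAD whose accumulation uses the approximate adder above."""
    acc = 0
    for a, b in zip(block_a, block_b):
        acc = truncated_add(acc, abs(a - b), k)
    return acc


# Eight samples per call, mirroring an 8-way SAD (values are made up).
current = [10, 20, 30, 40, 50, 60, 70, 80]
candidate = [12, 18, 33, 41, 47, 60, 75, 79]

exact = exact_sad(current, candidate)
approx = approx_sad(current, candidate, k=2)
print(exact, approx, exact - approx)  # exact value, approximate value, error
```

With `k=0` the two functions agree; larger `k` trades accuracy for a narrower adder. Whether the resulting error matters can only be judged at the application level, for example through its effect on the encoder's BD-BR, which is the point the abstract makes.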

    Serial-data computation in VLSI


    Architectures and implementations for the Polynomial Ring Engine over small residue rings

    This work considers VLSI implementations for the recently introduced Polynomial Ring Engine (PRE) using small residue rings. To allow for a comprehensive approach to the implementation of the PRE mappings for DSP algorithms, this dissertation introduces novel techniques ranging from system-level architectures to transistor-level considerations. The Polynomial Ring Engine combines both classical residue mappings and new polynomial mappings. This dissertation develops a systematic approach for generating pipelined systolic/semi-systolic structures for the PRE mappings. An example architecture is constructed and simulated to illustrate the properties of the new architectures. To simultaneously achieve a large computational dynamic range and a high throughput rate, the basic building blocks of the PRE architecture use transistor size profiling. Transistor sizing software is developed for profiling the Switching Tree dynamic logic used to build the basic modulo blocks. The software handles complex nFET structures using a simple iterative algorithm. Issues such as convergence of the iterative technique and validity of the sizing formulae have been treated with an appropriate mathematical analysis. As an illustration of the use of PRE architectures for modern DSP computational problems, a Wavelet Transform for HDTV image compression is implemented. An interesting use is made of the PRE technique of using polynomial indeterminates as 'placeholders' for components of the processed data. In this case we use an indeterminate to symbolically handle the irrational number √3 of the Daubechies mother wavelet for N = 4. Finally, a multi-level fault-tolerant PRE architecture is developed by combining the classical redundant residue approach and the circuit parity check approach. The proposed architecture uses syndromes to correct faulty residue channels and an embedded parity check to correct faulty computational channels.
The architecture offers superior fault detection and correction with online data interruption
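
The "classical residue mappings" mentioned above can be sketched as follows. This covers only ordinary residue number system arithmetic over small coprime moduli; it does not attempt the PRE's polynomial mappings, and the moduli `(3, 5, 7)` and all function names are assumptions made for illustration.

```python
from math import prod

MODULI = (3, 5, 7)  # hypothetical small, pairwise-coprime moduli (range 0..104)


def to_residues(x, moduli=MODULI):
    """Map an integer to its residue in each small ring."""
    return tuple(x % m for m in moduli)


def rns_add(a, b, moduli=MODULI):
    """Channel-wise addition: no carries propagate between channels."""
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))


def rns_mul(a, b, moduli=MODULI):
    """Channel-wise multiplication, again fully independent per channel."""
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))


def from_residues(residues, moduli=MODULI):
    """Reconstruct the integer with the Chinese Remainder Theorem."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        m_i = big_m // m
        total += r * m_i * pow(m_i, -1, m)  # modular inverse, Python 3.8+
    return total % big_m


a, b = to_residues(17), to_residues(4)
print(from_residues(rns_add(a, b)))  # 21
print(from_residues(rns_mul(a, b)))  # 68
```

Because each channel works independently on small numbers, the per-channel hardware stays small and fast, which is the property that makes small residue rings attractive for high-throughput DSP.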

    The Fifth NASA Symposium on VLSI Design

    The fifth annual NASA Symposium on VLSI Design had 13 sessions including Radiation Effects, Architectures, Mixed Signal, Design Techniques, Fault Testing, Synthesis, Signal Processing, and other Featured Presentations. The symposium provides insights into developments in VLSI and digital systems which can be used to increase data systems performance. The presentations share insights into next generation advances that will serve as a basis for future VLSI design

    A VCO-based CMOS readout circuit for capacitive MEMS microphones

    Microelectromechanical systems (MEMS) microphone sensors have significantly improved in the past years, while the readout electronics are mainly implemented using switched-capacitor technology. The development of new battery-powered "always-on" applications increasingly requires low power consumption. In this paper, we show a new readout circuit approach which is based on a mostly digital Sigma-Delta (ΣΔ) analog-to-digital converter (ADC). The operating principle of the readout circuit consists of coupling the MEMS sensor to an impedance converter that modulates the frequency of a stacked-ring oscillator, a new voltage-controlled oscillator (VCO) circuit featuring a good trade-off between phase noise and power consumption. The frequency-coded signal is then sampled and converted into a noise-shaped digital sequence by a time-to-digital converter (TDC). A time-efficient design methodology has been used to optimize the sensitivity of the oscillator combined with the phase noise induced by 1/f and thermal noise. The circuit has been prototyped in a 130 nm CMOS process and directly bonded to a standard MEMS microphone. The proposed VCO-based analog-to-digital converter (VCO-ADC) has been characterized electrically and acoustically. The peak signal-to-noise and distortion ratio (SNDR) obtained from measurements is 77.9 dB-A and the dynamic range (DR) is 100 dB-A. The current consumption is 750 µA at 1.8 V and the effective area is 0.12 mm². This new readout circuit may represent an enabling advance for low-cost digital MEMS microphones. This research was funded by project TEC2017-82653-R of CICYT, Spain
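
Why a sampled VCO yields a noise-shaped sequence can be shown with a simplified behavioral model. It assumes an ideal counter-based TDC that reports how many whole oscillator cycles elapsed in each sampling period; the paper's actual TDC is not described in the abstract. The parameters `f0` and `kvco` (in cycles per sampling period) and the function name are hypothetical.

```python
import math


def vco_adc(samples, f0=8.0, kvco=4.0):
    """Behavioral VCO-ADC model (illustrative assumption, not the paper's circuit).

    f0   : free-running frequency, in cycles per sampling period
    kvco : frequency gain, in cycles per sampling period per unit input
    """
    phase = 0.0       # accumulated VCO phase, in cycles
    prev_count = 0    # whole cycles counted up to the previous sample
    out = []
    for v in samples:
        phase += f0 + kvco * v      # the input modulates the frequency
        count = math.floor(phase)   # counter output = quantized phase
        out.append(count - prev_count)  # first difference of quantized phase
        prev_count = count
    return out


codes = vco_adc([0.1] * 100)
print(sorted(set(codes)))       # the output toggles between adjacent codes
print(sum(codes) / len(codes))  # the average tracks f0 + kvco * 0.1
```

Each output equals the true phase increment minus the first difference of the phase quantization error, so the error is high-pass filtered, which is first-order noise shaping. The error left over in one sample is carried into the next instead of being lost.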

    Design of Analog-to-Digital Converters with Embedded Mixing for Ultra-Low-Power Radio Receivers

    In the field of radio receivers, down-conversion methods usually rely on one (or more) explicit mixing stage(s) before the analog-to-digital converter (ADC). These stages not only contribute to the overall power consumption but also have an impact on area and can compromise the receiver’s performance in terms of noise and linearity. On the other hand, most ADCs require some sort of reference signal in order to properly digitize an analog input signal. The implementation of this reference signal usually relies on bandgap circuits and reference buffers to generate a constant, stable, dc signal. Disregarding this conventional approach, the work developed in this thesis aims to explore the viability behind the usage of a variable reference signal. Moreover, it demonstrates that not only can an input signal be properly digitized, but also shifted up and down in frequency, effectively embedding the mixing operation in an ADC. As a result, ADCs in receiver chains can perform double-duty as both a quantizer and a mixing stage. The lesser known charge-sharing (CS) topology, within the successive approximation register (SAR) ADCs, is used for a practical implementation, due to its feature of “pre-charging” the reference signal prior to the conversion. Simulation results from an 8-bit CS-SAR ADC designed in a 0.13 μm CMOS technology validate the proposed technique

    Algorithms and VLSI architectures for parametric additive synthesis

    A parametric additive synthesis approach to sound synthesis is advantageous as it can model sounds in a large-scale manner, unlike the classical sinusoidal additive-based synthesis paradigms. It is known that a large body of naturally occurring sounds are resonant in character and thus fit the concept well. This thesis is concerned with the computational optimisation of a superclass of formant synthesis which extends the sinusoidal parameters with a spread parameter known as bandwidth. Here a modified formant algorithm is introduced which can be traced back to work done at IRCAM, Paris. When impulse driven, a filter-based approach to modelling a formant limits the computational workload. It is assumed that the filter's coefficients are fixed at initialisation, thus avoiding interpolation, which can cause the filter to become chaotic. A filter which is more complex than a second-order section is required. Temporal resolution of an impulse generator is achieved by using a two-stage polyphase decimator which drives many filterbanks. Each filterbank describes one formant and is composed of sub-elements which allow variation of the formant's parameters. A resource manager is discussed to overcome the possibility of all sub-banks operating in unison. All filterbanks for one voice are connected in series to the impulse generator and their outputs are summed and scaled accordingly. An exploratory study of number systems for DSP algorithms and their architectures is presented. I invented a new theoretical mechanism for multi-level logic based DSP. Its aims are to reduce the number of transistors and to increase their functionality. Synthesis algorithms and VLSI architectures are reviewed in a case study comparing a filter-based bit-serial generator and a CORDIC-based sinusoidal generator. They are both of similar size, but the latter is always guaranteed to be stable
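
The CORDIC-based sinusoidal generator named in the case study relies on the standard CORDIC rotation, sketched below in floating point for readability. A hardware version would use fixed-point shifts and adds with a small arctangent table. This is the textbook algorithm, not the thesis architecture, and the function name and iteration count are assumptions.

```python
import math


def cordic_sin_cos(theta, iterations=24):
    """Return (sin, cos) of theta by CORDIC rotation.

    Converges for |theta| up to roughly 1.74 rad; larger angles need
    quadrant folding first, which is omitted here.
    """
    # Precomputed scale factor that cancels the gain of the micro-rotations.
    gain = 1.0
    for i in range(iterations):
        gain *= 1.0 / math.sqrt(1.0 + 2.0 ** (-2 * i))

    x, y, z = gain, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0.0 else -1.0  # rotate toward zero residual angle
        shift = 2.0 ** (-i)            # a right shift by i bits in hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(shift)      # table lookup in hardware
    return y, x


s, c = cordic_sin_cos(0.5)
print(s, math.sin(0.5))
print(c, math.cos(0.5))
```

Every output is recomputed from the phase angle and nothing is fed back, so the generator cannot drift or oscillate. This is consistent with the abstract's remark that the CORDIC approach is always guaranteed to be stable.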

    Low Noise, Jitter Tolerant Continuous-Time Sigma-Delta Modulator

    The demand for higher data rates in receivers with carrier aggregation (CA), such as LTE, increases the effort to integrate a large number of wireless services into a single receiving path, so the signal must be digitized at intermediate or high frequencies. This relaxes most of the front-end blocks but makes the design of the ADC very challenging. Solving the bottleneck associated with the ADC in the receiver architecture is a major focus of much ongoing research. Recently, continuous-time Sigma-Delta (CT-ΣΔ) analog-to-digital converters (ADCs) have been getting more attention due to their inherent filtering properties, lower power consumption, and wider input bandwidth. However, they suffer from several non-idealities, such as clock jitter and excess loop delay (ELD), which decrease ADC performance. This dissertation presents two projects that address CT-ΣΔ modulator non-idealities. One of the projects is a CT-ΣΔ modulator with 10.9 Effective Number of Bits (ENOB) and a Gradient Descent (GD) based calibration technique. The GD algorithm is used to extract the loop gain transfer function coefficients. A quantization noise reduction technique is then employed to improve the Signal to Quantization Noise Ratio (SQNR) of the modulator using a 7-bit embedded quantizer. An analog fast-path feedback topology is proposed which uses an analog differentiator to compensate for excess loop delay. This approach relaxes the requirements of the amplifier placed in front of the quantizer. The modulator is implemented using a third-order loop filter with feed-forward compensation paths and a 3-bit quantizer in the feedback loop. In order to save power and improve loop linearity, a two-stage class-AB amplifier is developed. The prototype modulator is implemented in 0.13 μm CMOS technology and achieves a peak Signal to Noise and Distortion Ratio (SNDR) of 67.5 dB while consuming a total power of 8.5 mW under a 1.2 V supply with an oversampling ratio of 10 at a 300 MHz sampling frequency.
The prototype achieves a Walden Figure of Merit (FoM) of 146 fJ/step. The second project addresses the clock jitter non-ideality in continuous-time Sigma-Delta modulators (CT-ΣΔM), which suffer from performance degradation due to uncertainty in the clock timing at the digital-to-analog converter (DAC). This thesis proposes to split the loop filter into two parts, one analog and one digital, to reduce the sensitivity of the feedback DAC to clock jitter. By using a digital first-order filter after the quantizer, the effect of clock jitter is reduced without changing the signal transfer function (STF). In addition, as one pole of the loop filter is implemented digitally, the power and area are reduced by minimizing active analog elements. Moreover, having more digital blocks in the loop of the CT-ΣΔM makes it less sensitive to process, voltage, and temperature variations. We also propose the use of a single DAC with a current divider to implement the feedback coefficients instead of two DACs, to decrease area and clock routing. The prototype is implemented in TSMC 40 nm technology and occupies an area of 0.06 mm²; the proposed solution consumes 6.9 mW and operates at 500 MS/s. In a 10 MHz bandwidth, the measured dynamic range (DR), peak signal-to-noise ratio (SNR), and peak signal-to-noise and distortion ratio (SNDR) in the presence of 4.5 ps RMS clock jitter (0.22% of the clock period) are 75 dB, 68 dB, and 67 dB, respectively. The proposed structure is 10 dB more tolerant to clock jitter when compared to the conventional ΣΔM design for a similar loop filter
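
The figures quoted for the first prototype can be cross-checked with the usual definitions, ENOB = (SNDR − 1.76)/6.02 and Walden FoM = P / (2^ENOB · 2 · BW). The sketch below assumes the signal bandwidth is the sampling frequency divided by twice the oversampling ratio, which the abstract implies but does not state. The variable names are illustrative.

```python
# Values for the first prototype, taken from the abstract.
sndr_db = 67.5   # peak SNDR, dB
power_w = 8.5e-3  # total power, W
fs_hz = 300e6    # sampling frequency, Hz
osr = 10         # oversampling ratio

# Assumption: signal bandwidth follows from fs and OSR.
bw_hz = fs_hz / (2 * osr)  # 15 MHz

enob = (sndr_db - 1.76) / 6.02
fom_j_per_step = power_w / (2 ** enob * 2 * bw_hz)
print(f"ENOB = {enob:.1f} bits, FoM = {fom_j_per_step * 1e15:.0f} fJ/step")

# Second prototype: 4.5 ps RMS jitter relative to the 500 MS/s clock period.
jitter_fraction = 4.5e-12 * 500e6
print(f"jitter = {jitter_fraction * 100:.3f}% of the clock period")
```

Under these assumptions the script reproduces the stated 10.9 ENOB and roughly 146 fJ/step. The jitter comes out at 0.225% of the 2 ns clock period, in line with the 0.22% quoted.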