375 research outputs found

    Design and implementation of DA FIR filter for bio-inspired computing architecture

    This paper describes the system construction of a distributed arithmetic (DA) finite impulse response (FIR) filter based on an architecture with tightly coupled, co-processor-based data processing units. The DA-based FIR filter uses a series of look-up-table (LUT) accesses to emulate multiply-and-accumulate operations and is implemented on an FPGA. The very high speed integrated circuit hardware description language (VHDL) is used to implement the proposed filter, and the design is verified by simulation. The paper discusses two optimization algorithms, and the resulting optimizations are incorporated into the LUT layer and architecture extractions. The proposed method offers an optimized design in the form of an average reduction in the number of LUTs, a reduction in populated slices, and gate minimization for the DA FIR filter. This research points toward the development of bio-inspired computing architectures built without logically intensive operations that still meet the desired specifications for performance, timing, and reliability.
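
The LUT-based emulation of multiply-and-accumulate mentioned in this abstract can be sketched in a few lines of Python. This is an illustration of the general distributed-arithmetic idea, not the authors' VHDL design: the function names (`build_lut`, `da_dot`, `da_fir`), the single unpartitioned table with 2^K entries, and the two's-complement word width `bits` are all assumptions made for the example.

```python
def build_lut(coeffs):
    """Precompute the sum of selected coefficients for every K-bit address."""
    k = len(coeffs)
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << k)]


def da_dot(lut, samples, bits):
    """Inner product of coefficients and samples using LUT reads, shifts, adds.

    Each sample must fit in a signed two's-complement word of `bits` bits.
    """
    mask = (1 << bits) - 1
    words = [s & mask for s in samples]  # two's-complement encoding
    acc = 0
    for b in range(bits):
        # Gather bit b of every sample into one LUT address.
        addr = 0
        for i, w in enumerate(words):
            addr |= ((w >> b) & 1) << i
        partial = lut[addr] << b
        # The most significant bit carries negative weight in two's complement.
        acc += -partial if b == bits - 1 else partial
    return acc


def da_fir(coeffs, signal, bits):
    """FIR filter y[n] = sum_k coeffs[k] * x[n-k], computed with da_dot."""
    lut = build_lut(coeffs)
    history = [0] * len(coeffs)  # history[0] is the newest sample
    out = []
    for x in signal:
        history = [x] + history[:-1]
        out.append(da_dot(lut, history, bits))
    return out
```

For coefficients `[3, -1, 4, 2]` and the 4-bit window `[5, -7, 2, -3]`, `da_dot` returns 24, the same value as the direct inner product. The table grows as 2^K, which is why practical designs usually split long filters across several smaller tables.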

    VLSI signal processing through bit-serial architectures and silicon compilation

    REAL-TIME ADAPTIVE PULSE COMPRESSION ON RECONFIGURABLE, SYSTEM-ON-CHIP (SOC) PLATFORMS

    New radar applications need to run complex algorithms and process large quantities of data to generate useful information for users. This has motivated the search for better processing solutions that include low-power, high-performance processors, efficient algorithms, and high-speed interfaces. This work presents a hardware implementation of adaptive pulse compression algorithms for real-time transceiver optimization, based on a System-on-Chip architecture for reconfigurable hardware devices. The study also evaluates the performance of dedicated coprocessors as hardware accelerator units that speed up compute-intensive tasks such as matrix multiplication and matrix inversion, which are essential for solving the covariance matrix. The tradeoffs between latency and hardware utilization are also presented. Moreover, the system architecture takes advantage of the embedded processor, which is interconnected with the logic resources through high-performance buses, to perform floating-point operations, control the processing blocks, and communicate with an external PC through a customized software interface. The overall system functionality is demonstrated and tested for real-time operation using a Ku-band testbed together with a low-cost channel emulator for different types of waveforms.

    Low Power Adaptive Equaliser Architectures for Wireless LMMSE Receivers

    Power consumption requires critical consideration during system design for portable wireless communication devices, as it has a direct influence on the battery weight and volume required for operation. Wideband Code Division Multiple Access (W-CDMA) techniques are favoured for use in future generation mobile communication systems. This thesis investigates novel low power techniques for use in system blocks within a W-CDMA adaptive linear minimum mean squared error (LMMSE) receiver architecture. Two low power techniques are presented for reducing power dissipation in the LMS adaptive filter, which is the main power consuming block within this receiver. These techniques are the decorrelating transform, which is a differential coefficient technique, and the variable length update algorithm, which is a dynamic tap-length optimisation technique. The decorrelating transform is based on the principle of reducing the wordlength of filter coefficients by using the computed difference between adjacent coefficients in the calculation of the filter output. Reducing the wordlength of the filter coefficients presented to the multipliers in the filter reduces switching activity within the multipliers and thus the power consumed. In the case of the LMS adaptive filter, where coefficients are continuously updated, the decorrelating transform is applied to these calculated coefficients with minimal hardware or computational overhead. The correlation between filter coefficients is exploited to achieve a wordlength reduction from 16 bits down to 10 bits in the FIR filter block. The variable length update algorithm is based on the principle of optimising the number of operational filter taps in the LMS adaptive filter according to operating conditions. The number of taps in operation can be increased or decreased dynamically according to the mean squared error at the output of the filter.
This algorithm exploits the fact that when the SNR in the channel is low, the minimum mean squared error of the short equaliser is almost the same as that of the longer equaliser. Therefore, minimising the length of the equaliser will not result in poorer MSE performance, and there is no disadvantage in having fewer taps in operation. If fewer taps are in operation, switching is reduced not only in the arithmetic blocks but also in the memory blocks required by the LMS algorithm and FIR filter process. This reduces the power consumed by both of these computation-intensive functional blocks. Power results are obtained for equaliser lengths from 73 to 16 taps and for operation with varying input SNR. This thesis then proposes that the variable length LMS adaptive filter be applied in the adaptive LMMSE receiver to create a low power implementation. Power consumption in the receiver is reduced by the dynamic optimisation of the LMS receiver coefficient calculation. A considerable power saving is achieved when moving from a fixed length LMS implementation to the variable length design. All design architectures are coded in Verilog hardware description language at register transfer level (RTL). Once the functional specification of the design is verified, synthesis is carried out using either Synopsys DesignCompiler or Cadence BuildGates to create a gate level netlist. Power consumption results are determined at the gate level and estimated using the Synopsys DesignPower tool.
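
The differential coefficient idea behind the decorrelating transform can be illustrated with a short Python sketch. It assumes a first-order form in which the output is rebuilt recursively from differences between adjacent coefficients; the thesis may use a different formulation. The function names and the zero initial state are assumptions made for the example.

```python
def fir_direct(coeffs, signal):
    """Reference FIR: y[n] = sum_k coeffs[k] * x[n-k], with x[n] = 0 for n < 0."""
    out = []
    for n in range(len(signal)):
        out.append(sum(coeffs[k] * signal[n - k]
                       for k in range(len(coeffs)) if n - k >= 0))
    return out


def adjacent_differences(coeffs):
    """d[k] = coeffs[k] - coeffs[k-1] for k = 1 .. N-1."""
    return [coeffs[k] - coeffs[k - 1] for k in range(1, len(coeffs))]


def fir_differential(coeffs, signal):
    """Same output as fir_direct, but the inner multiplies use small differences.

    Uses the identity
        y[n] = y[n-1] + c[0]*x[n] + sum_{k=1}^{N-1} d[k]*x[n-k] - c[N-1]*x[n-N].
    """
    n_taps = len(coeffs)
    diffs = adjacent_differences(coeffs)

    def x(i):
        # Samples before the start of the signal are taken as zero.
        return signal[i] if i >= 0 else 0

    out = []
    y_prev = 0
    for n in range(len(signal)):
        y = y_prev + coeffs[0] * x(n)
        y += sum(diffs[k - 1] * x(n - k) for k in range(1, n_taps))
        y -= coeffs[-1] * x(n - n_taps)
        out.append(y)
        y_prev = y
    return out
```

With the correlated coefficients `[100, 104, 107, 105, 101]`, the differences are `[4, 3, -2, -4]`, so most multiplier operands need far fewer bits than the original coefficients. Only the first and last coefficients still appear at full width.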

    Multiplierless CSD techniques for high performance FPGA implementation of digital filters.

    I leverage FastCSD to develop a new, high performance iterative multiplierless structure based on a novel real-time CSD recoding, so that more zero partial products are introduced. Up to 66.7% zero partial products occur compared to 50% in the traditional modified Booth's recoding. Also, this structure reduces the non-zero partial products to a minimum. As a result, the number of arithmetic operations in the carry-save structure is reduced. Thus, an overall speed-up, as well as low-power consumption, can be achieved. Furthermore, because the proposed structure involves real-time CSD recoding and does not require a fixed value for the multiplier input to be known a priori, the proposed multiplier can be applied to implement digital filters with non-fixed filter coefficients, such as adaptive filters. My work is based on a dramatic new technique for converting between 2's complement and CSD number systems, and results in high-performance structures that are particularly effective for implementing adaptive systems in reconfigurable logic. My research focus is on two key ideas for improving DSP performance: (1) Develop new high performance, efficient shift-add techniques ("multiplierless") to implement the multiply-add operations without the need for a traditional multiplier structure. (2) There is a growing trend toward design prototyping and even production in FPGAs as opposed to dedicated DSP processors or ASICs; leverage this trend synergistically with the new multiplierless structures to improve performance. Implementation of digital signal processing (DSP) algorithms in hardware, such as field programmable gate arrays (FPGAs), requires a large number of multipliers. Fast, low area multiply-adds have become critical in modern commercial and military DSP applications.
In many contemporary real-time DSP and multimedia applications, system performance is severely impacted by the limitations of currently available speed, energy efficiency, and area requirement of an onboard silicon multiplier. I also introduce a new multi-input Canonical Signed Digit (CSD) multiplier unit, which requires fewer shift/add/subtract operations and reduced CSD number conversion overhead compared to existing techniques. This results in reduced power consumption and area requirements in the hardware implementation of DSP algorithms. Furthermore, because all the products are produced simultaneously, the multiplication speed and thus the throughput are improved. The multi-input multiplier unit is applied to implement digital filters with non-fixed filter coefficients, such as adaptive filters. The implementation cost of these digital filters can be further reduced by limiting the wordlength of the input signal with little or no sacrifice to the filter performance, which is confirmed by my simulation results. The proposed multiplier unit can also be applied to other DSP algorithms, such as digital filter banks or matrix and vector multiplications. Finally, the tradeoff between filter order and coefficient length in the design and implementation of high-performance filters in Field Programmable Gate Arrays (FPGAs) is discussed. Non-minimum order FIR filters are designed for implementation using Canonical Signed Digit (CSD) multiplierless implementation techniques. By increasing the filter order, the length of the coefficients can be decreased without reducing the filter performance. Thus, an overall hardware savings can be achieved. Adaptive system implementations require real-time conversion of coefficients to Canonical Signed Digit (CSD) or similar representations to benefit from multiplierless techniques for implementing filters. Multiplierless approaches are used to reduce the hardware and increase the throughput.
This dissertation introduces the first non-iterative hardware algorithm to convert 2's complement numbers to their CSD representations (FastCSD) using a fixed number of shift and logic operations. As a result, the power consumption and area needed for hardware implementation of DSP algorithms in which the coefficients are not known a priori can be greatly reduced. Because all CSD digits are produced simultaneously, the conversion speed and thus the throughput are improved when compared to overlap-and-scan techniques such as Booth's recoding.
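
For readers unfamiliar with CSD, the following Python sketch shows what the representation looks like and how it turns a multiplication into shifts, additions, and subtractions. It uses the conventional iterative, digit-at-a-time recoding, which is explicitly not the non-iterative FastCSD algorithm the dissertation introduces. The function names `to_csd` and `csd_multiply` are assumptions made for the example.

```python
def to_csd(value):
    """Return the CSD digits of an integer, least significant digit first.

    Each digit is -1, 0, or +1 and no two adjacent digits are both non-zero.
    Conventional iterative recoding, shown for illustration only.
    """
    digits = []
    while value != 0:
        if value & 1:
            # value mod 4 == 1 gives digit +1; value mod 4 == 3 gives digit -1.
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def csd_multiply(x, coeff):
    """Multiply x by coeff using only shifts, adds, and subtracts."""
    acc = 0
    for shift, digit in enumerate(to_csd(coeff)):
        if digit == 1:
            acc += x << shift
        elif digit == -1:
            acc -= x << shift
        # A zero digit contributes no partial product.
    return acc
```

`to_csd(7)` returns `[-1, 0, 0, 1]`, that is 8 - 1, so multiplying by 7 needs two non-zero partial products where the plain binary form 111 needs three.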

    Low power digital signal processing

    Realization of Delayed Least Mean Square Adaptive Algorithm using Verilog HDL for EEG Signals

    This paper presents an efficient architecture for implementing a delayed least mean square (DLMS) adaptive filter. It is shown that the proposed architecture reduces register complexity and also supports faster convergence. Compared to the transpose form, the direct-form LMS adaptive filter converges faster, but both have nearly the same critical path. Further, it is shown that in most practical cases a very small adaptation delay is sufficient to implement a direct-form LMS adaptive filter where, in normal cases, a very high sampling rate is required, and that no pipelining approach is necessary. Based on these estimates, three different architectures of the LMS adaptive filter have been designed: the first has no adaptation delay, the second has one adaptation delay, and the third has two adaptation delays. Of the three designs, the zero-adaptation-delay structure performs most efficiently: it has the lowest energy per sample (EPS) and the smallest area. The aim of this thesis is to design efficient filter structures that form a system-on-chip (SoC) solution, using optimized code, for various adaptive filtering problems. The main focus is on interference cancellation in electroencephalogram (EEG) applications using the proposed filter structures. Modern field programmable gate arrays (FPGAs) have the resources required to build effective adaptive filtering structures. The designs are evaluated in terms of design time, area, and delay.
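
The role of the adaptation delay can be shown with a small floating-point Python sketch of a direct-form delayed LMS filter. It is a behavioural model, not the Verilog architecture from this abstract, and it uses system identification as the test problem rather than EEG interference cancellation. The function names, the step size `mu`, and the synthetic test signal are assumptions made for the example.

```python
import random


def dlms_identify(signal, desired, n_taps, mu, delay):
    """Direct-form delayed LMS: w(n+1) = w(n) + mu * e(n-D) * x(n-D).

    delay = 0 gives the ordinary LMS update. Returns (weights, errors).
    """
    weights = [0.0] * n_taps
    history = [0.0] * n_taps  # history[0] is the newest input sample
    pending = []              # (error, input vector) pairs awaiting their update
    errors = []
    for x, d in zip(signal, desired):
        history = [x] + history[:-1]
        y = sum(w * h for w, h in zip(weights, history))
        e = d - y
        errors.append(e)
        pending.append((e, history))
        if len(pending) > delay:
            # Apply the update computed `delay` samples ago.
            e_old, x_old = pending.pop(0)
            weights = [w + mu * e_old * xo for w, xo in zip(weights, x_old)]
    return weights, errors


def simulate_unknown_system(true_weights, n_samples, seed):
    """Noise-free test data: uniform random input passed through a known FIR."""
    rng = random.Random(seed)
    signal = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
    desired = []
    for n in range(n_samples):
        acc = 0.0
        for k, w in enumerate(true_weights):
            if n - k >= 0:
                acc += w * signal[n - k]
        desired.append(acc)
    return signal, desired
```

Running `dlms_identify` on data from `simulate_unknown_system` with `delay` set to 0, 1, and 2 mirrors the three designs compared in the abstract. With a small step size, all three settle on the same weights in this noise-free setting.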