4,395 research outputs found

    A Comparative Performance of Discrete Wavelet Transform Implementations Using Multiplierless

    Get PDF
    Using discrete wavelet transform (DWT) in high-speed signal-processing applications imposes a high degree of care to hardware resource availability, latency, and power consumption. In this chapter, the design aspects and performance of multiplierless DWT is analyzed. We presented the two key multiplierless approaches, namely the distributed arithmetic algorithm (DAA) and the residue number system (RNS). We aim to estimate the performance requirements and hardware resources for each approach, allowing for the selection of proper algorithm and implementation of multi-level DAA- and RNS-based DWT. The design has been implemented and synthesized in Xilinx Virtex 6 ML605, taking advantage of Virtex 6’s embedded block RAMs (BRAMs)

    Mixed-Signal Neural Network Implementation with Programmable Neuron

    Get PDF
    This thesis introduces implementation of mixed-signal building blocks of an artificial neural network; namely the neuron and the synaptic multiplier. This thesis, also, investigates the nonlinear dynamic behavior of a single artificial neuron and presents a Distributed Arithmetic (DA)-based Finite Impulse Response (FIR) filter. All the introduced structures are designed and custom laid out

    Design and implementation of DA FIR filter for bio-inspired computing architecture

    Get PDF
    This paper elucidates the system construct of DA-FIR filter optimized for design of distributed arithmetic (DA) finite impulse response (FIR) filter and is based on architecture with tightly coupled co-processor based data processing units. With a series of look-up-table (LUT) accesses in order to emulate multiply and accumulate operations the constructed DA based FIR filter is implemented on FPGA. The very high speed integrated circuit hardware description language (VHDL) is used implement the proposed filter and the design is verified using simulation. This paper discusses two optimization algorithms and resulting optimizations are incorporated into LUT layer and architecture extractions. The proposed method offers an optimized design in the form of offers average miminimizations of the number of LUT, reduction in populated slices and gate minimization for DA-finite impulse response filter. This research paves a direction towards development of bio inspired computing architectures developed without logically intensive operations, obtaining the desired specifications with respect to performance, timing, and reliability

    A high-performance inner-product processor for real and complex numbers.

    Get PDF
    A novel, high-performance fixed-point inner-product processor based on a redundant binary number system is investigated in this dissertation. This scheme decreases the number of partial products to 50%, while achieving better speed and area performance, as well as providing pipeline extension opportunities. When modified Booth coding is used, partial products are reduced by almost 75%, thereby significantly reducing the multiplier addition depth. The design is applicable for digital signal and image processing applications that require real and/or complex numbers inner-product arithmetic, such as digital filters, correlation and convolution. This design is well suited for VLSI implementation and can also be embedded as an inner-product core inside a general purpose or DSP FPGA-based processor. Dynamic control of the computing structure permits different computations, such as a variety of inner-product real and complex number computations, parallel multiplication for real and complex numbers, and real and complex number division. The same structure can also be controlled to accept redundant binary number inputs for multiplication and inner-product computations. An improved 2's-complement to redundant binary converter is also presented

    Designing a Modular DSP Core For Real-Time Audio Performance

    Get PDF
    This project provides an overview for building a Digital Signal Processing (DSP) core on a Digilent Nexys2 FPGA board. The DSP core is designed to give Cal Poly students interested in DSP and its applications to audio engineering a usable platform to perform signal processing and analytics. The processes of the DSP core are modular, allowing students to design their own implementations of various adder and multiplier functions. Infinite impulse response (IIR) filters and finite impulse response (FIR) filters using both cascade and parallel implementations are the primary processing tools in the core, and all output can be visually and aurally presented using an oscilloscope and acoustic speakers, respectively. The completed project is intended to act as a platform and guide for students to design their own filters and I/O modules. The completed filters show the correct inputs and outputs necessary to properly implement filters to work with the I/O controllers and external pmod controllers. Overall, the project successfully samples and processes digital signals, and provides a visual tool for understanding how various audio effects physically work

    An Scalable matrix computing unit architecture for FPGA and SCUMO user design interface

    Get PDF
    High dimensional matrix algebra is essential in numerous signal processing and machine learning algorithms. This work describes a scalable square matrix-computing unit designed on the basis of circulant matrices. It optimizes data flow for the computation of any sequence of matrix operations removing the need for data movement for intermediate results, together with the individual matrix operations' performance in direct or transposed form (the transpose matrix operation only requires a data addressing modification). The allowed matrix operations are: matrix-by-matrix addition, subtraction, dot product and multiplication, matrix-by-vector multiplication, and matrix by scalar multiplication. The proposed architecture is fully scalable with the maximum matrix dimension limited by the available resources. In addition, a design environment is also developed, permitting assistance, through a friendly interface, from the customization of the hardware computing unit to the generation of the final synthesizable IP core. For N x N matrices, the architecture requires N ALU-RAM blocks and performs O(N*N), requiring N*N +7 and N +7 clock cycles for matrix-matrix and matrix-vector operations, respectively. For the tested Virtex7 FPGA device, the computation for 500 x 500 matrices allows a maximum clock frequency of 346 MHz, achieving an overall performance of 173 GOPS. This architecture shows higher performance than other state-of-the-art matrix computing units

    Serial-data computation in VLSI

    Get PDF

    CABE : a cloud-based acoustic beamforming emulator for FPGA-based sound source localization

    Get PDF
    Microphone arrays are gaining in popularity thanks to the availability of low-cost microphones. Applications including sonar, binaural hearing aid devices, acoustic indoor localization techniques and speech recognition are proposed by several research groups and companies. In most of the available implementations, the microphones utilized are assumed to offer an ideal response in a given frequency domain. Several toolboxes and software can be used to obtain a theoretical response of a microphone array with a given beamforming algorithm. However, a tool facilitating the design of a microphone array taking into account the non-ideal characteristics could not be found. Moreover, generating packages facilitating the implementation on Field Programmable Gate Arrays has, to our knowledge, not been carried out yet. Visualizing the responses in 2D and 3D also poses an engineering challenge. To alleviate these shortcomings, a scalable Cloud-based Acoustic Beamforming Emulator (CABE) is proposed. The non-ideal characteristics of microphones are considered during the computations and results are validated with acoustic data captured from microphones. It is also possible to generate hardware description language packages containing delay tables facilitating the implementation of Delay-and-Sum beamformers in embedded hardware. Truncation error analysis can also be carried out for fixed-point signal processing. The effects of disabling a given group of microphones within the microphone array can also be calculated. Results and packages can be visualized with a dedicated client application. Users can create and configure several parameters of an emulation, including sound source placement, the shape of the microphone array and the required signal processing flow. Depending on the user configuration, 2D and 3D graphs showing the beamforming results, waterfall diagrams and performance metrics can be generated by the client application. The emulations are also validated with captured data from existing microphone arrays.</jats:p
    corecore