
    Precision-Energy-Throughput Scaling of Generic Matrix Multiplication and Convolution Kernels via Linear Projections

    Generic matrix multiplication (GEMM) and one-dimensional convolution/cross-correlation (CONV) kernels often constitute the bulk of the compute- and memory-intensive processing in image/audio recognition and matching systems. We propose a novel method to scale the energy and processing throughput of GEMM and CONV kernels in such error-tolerant multimedia applications by adjusting the precision of computation. Our technique applies linear projections to the input matrix or signal data during the top-level GEMM and CONV blocking and reordering. The GEMM and CONV kernels then process the projected inputs, and the results are accumulated to form the final outputs. Throughput and energy scale with the number of projections computed by each kernel: fewer projections produce approximate results, i.e., lower the precision of the computation. Results from a voltage- and frequency-scaled ARM Cortex-A15 processor running face recognition and music matching algorithms show that the proposed approach achieves a 280% to 440% increase in processing throughput and a 75% to 80% decrease in energy consumption compared with optimized GEMM and CONV kernels, without any impact on recognition or matching accuracy. Even higher gains are possible if some reduction in recognition and matching accuracy can be tolerated.
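The abstract does not state which projection basis is used. As a minimal sketch, assuming an orthonormal Sylvester-Hadamard basis (our choice, not necessarily the authors'), the code below shows how the number of projections p trades precision for work: because the basis vectors q_i are orthonormal, A·B = Σ_i (A q_i)(q_iᵀ B), so keeping all k terms is exact and dropping terms gives an approximation. The function names `hadamard` and `projected_gemm` are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def hadamard(k: int) -> Matrix:
    """Orthonormal Sylvester-Hadamard matrix of size k (k must be a power of two)."""
    if k < 1 or k & (k - 1):
        raise ValueError("k must be a power of two")
    h = [[1.0]]
    while len(h) < k:
        # Sylvester step: [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    scale = 1.0 / math.sqrt(k)
    return [[scale * v for v in row] for row in h]


def projected_gemm(a: Matrix, b: Matrix, p: int) -> Matrix:
    """Approximate A @ B using the first p of k rank-1 projection terms.

    With orthonormal basis vectors q_i, A @ B = sum_i (A q_i)(q_i^T B),
    so p == k is exact and p < k trades precision for fewer operations.
    """
    m, k, n = len(a), len(b), len(b[0])
    if not 0 <= p <= k:
        raise ValueError("p must be between 0 and k")
    q = hadamard(k)  # row i of q is the basis vector q_i
    out = [[0.0] * n for _ in range(m)]
    for i in range(p):
        qi = q[i]
        # Project the inputs onto q_i: a column vector (length m) and a row vector (length n)
        a_proj = [sum(a[r][t] * qi[t] for t in range(k)) for r in range(m)]
        b_proj = [sum(qi[t] * b[t][c] for t in range(k)) for c in range(n)]
        # Accumulate the rank-1 contribution into the output
        for r in range(m):
            for c in range(n):
                out[r][c] += a_proj[r] * b_proj[c]
    return out
```

Calling `projected_gemm(a, b, k)` reproduces the exact product up to rounding, while a smaller `p` gives a cheaper approximation. A data-dependent basis that concentrates most of the signal energy in the first few projections would make small `p` far more accurate than the fixed Hadamard basis used here.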

    A Low-Memory Spectral-Correlation Analyzer for Digital QAM-SRRC Waveforms

    Cyclostationary signal processing (CSP) makes it possible to estimate statistical features of received waveforms blindly. Quadrature amplitude modulated (QAM) waveforms filtered by the square-root-raised-cosine (SRRC) pulse-shaping function have cyclic features that CSP can exploit to detect waveform parameters such as symbol rate (SR) and center frequency (CF). Estimating these SR-CF pairs enables a cognitive radio (CR) to perform spectrum sensing in support of spectrum sharing and interference mitigation. Here, we investigate a field-programmable gate array (FPGA) implementation of a blind symbol-rate/center-frequency estimator. First, this study provides background on the theory behind the cyclic spectral density (CSD) function, spectral correlation analyzers (SCAs), and spectrum sensing. A discussion of the motivation for CubeSat spectrum sensing follows. An SCA implementation for low-memory devices, such as FPGA-based CubeSats, is then described. The paper concludes by reporting the performance characteristics of the newly developed streaming-based SCA.
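The cyclic features mentioned above can be illustrated with the time-domain cyclic autocorrelation R_x^α(τ) = (1/N) Σ_n x[n+τ] x*[n] e^{-j2παn}, whose Fourier transform over τ is the spectral correlation function. This is not the paper's SCA, which the abstract describes only as streaming-based; it is a hypothetical class, `StreamingCyclicAutocorr`, that shows why a streaming estimator can be low-memory: each (α, τ) pair needs only a τ+1 sample delay line and one complex accumulator, never the full record.

```python
import cmath
import math
from collections import deque


class StreamingCyclicAutocorr:
    """Streaming estimate of R_x^alpha(tau) = (1/N) sum_n x[n+tau] x*[n] exp(-j 2 pi alpha n).

    alpha is in cycles per sample. Memory per (alpha, tau) pair is a
    tau+1 sample delay line plus one complex accumulator.
    """

    def __init__(self, alpha: float, tau: int):
        if tau < 0:
            raise ValueError("tau must be non-negative")
        self.alpha = alpha
        self.tau = tau
        self.delay = deque(maxlen=tau + 1)  # holds x[n] .. x[n+tau]
        self.acc = 0j
        self.count = 0  # lag products accumulated so far (equals the current n)

    def push(self, sample: complex) -> None:
        """Consume one new sample and update the running sum."""
        self.delay.append(sample)
        if len(self.delay) == self.tau + 1:
            n = self.count
            x_n = complex(self.delay[0])
            x_n_tau = complex(self.delay[-1])
            rot = cmath.exp(-2j * math.pi * self.alpha * n)
            self.acc += x_n_tau * x_n.conjugate() * rot
            self.count += 1

    def estimate(self) -> complex:
        """Current estimate; 0 until at least tau+1 samples have been pushed."""
        return self.acc / self.count if self.count else 0j
```

As a sanity check, a real tone cos(2πf₀n) has a cyclic feature at α = 2f₀ whose magnitude at τ = 0 approaches 0.25, and essentially none at unrelated cycle frequencies. For QAM-SRRC signals, a detector along these lines would scan candidate α values related to the symbol rate; which α grid and lags to use is an assumption left open here.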

    ASIC Implementation of Multiplexer Based DAA

    ABSTRACT: In digital image processing (DIP), point, line, and edge detection are usually performed in software. The proposed architecture performs these operations in hardware using distributed arithmetic (DA). DA has been widely used to implement inner-product computations with fixed inputs. Conventional ROM-based DA suffers from large ROM requirements. To reduce memory requirements, adder-based DA uses a predefined structure for computation. However, both methods are suitable only if at least one input is constant. This project implements a new DA architecture for point, line, and edge detection in DIP when both inputs are variable. The new architecture is termed multiplexer-based distributed arithmetic (MUX-based DA). It exploits multiplexers and DA for inner-product computations when both inputs are variable, and it also reduces the ROM requirement and the complexity of constructing an adder-based architecture for higher-order inputs. The performance of the proposed architecture is compared with ROM-based DA, adder-based DA, and a multiplier-based implementation. MUX-based DA reduces power by up to 81% and needs 40% of the area of the multiplier-based implementation.
    KEYWORDS: ROM-based DA, adder-based DA, multiplexer-based DA, Cadence 180 nm technology.
    I. INTRODUCTION
    Distributed arithmetic (DA) has been widely adopted for its computational efficiency in many digital signal processing applications. The most frequently used computation in digital signal processing is a sum of products, i.e., dot-product or inner-product generation. DA is generally a bit-serial computation that forms the product of two vectors in one clock cycle.
    Typical applications include the DCT (discrete cosine transform), DFT (discrete Fourier transform), FIR (finite impulse response) filters, and DHT (discrete Hartley transform), which are found in mainstream multimedia standards and telecommunication protocols. The advantage of DA is its multiplication-free mechanization: it replaces multiplication with addition and therefore simplifies the hardware implementation. Conventional DA, called ROM-based DA, replaces multiplications by precomputing all possible values and storing them in a ROM. Adder-based DA instead uses a fixed architecture, obtained by distributing the fixed input, for the inner-product computation. The DA technique distributes the arithmetic operation rather than lumping it into multipliers. ROM-based DA decomposes the variable input of the inner product to bit level and stores the precomputed data in a ROM table, which makes it regular and area-efficient in VLSI implementation. However, as the size of the inner product increases, the ROM area grows exponentially and becomes impractically large, even with ROM partitioning. In contrast, adder-based DA decomposes the other operand of the inner product to bit level, distributes the multiplication operation, and shares common summation terms. Adder-based DA exploits the distribution of binary value patterns and can maximize hardware sharing in the implementation. Although adder-based DA requires less hardware area and a shorter computation cycle time than ROM-based DA, both existing methods require one input to be fixed, whereas the proposed MUX-based DA computes the result with both inputs variable, just as a MAC unit does.
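The ROM-based scheme described above can be sketched for unsigned B-bit inputs (an assumption for brevity; practical designs usually handle two's-complement inputs by subtracting the MSB bit-plane term instead of adding it). The ROM holds all 2^K sums of the K fixed coefficients, which is exactly the exponential growth the text warns about; each bit plane of the inputs forms one ROM address, and the looked-up values are shift-accumulated. `build_da_rom` and `rom_da_inner_product` are illustrative names.

```python
from typing import List


def build_da_rom(coeffs: List[int]) -> List[int]:
    """Precompute every subset sum of the fixed coefficients: 2**K ROM words."""
    k = len(coeffs)
    return [sum(c for j, c in enumerate(coeffs) if (addr >> j) & 1) for addr in range(1 << k)]


def rom_da_inner_product(rom: List[int], xs: List[int], bits: int) -> int:
    """Bit-serial DA: one ROM lookup and one shift-add per input bit plane.

    Assumes every x in xs is an unsigned integer that fits in `bits` bits.
    """
    for x in xs:
        if x < 0 or x >> bits:
            raise ValueError("inputs must be unsigned and fit in `bits` bits")
    acc = 0
    for b in range(bits):
        # Address = bit b of x_0, x_1, ..., x_{K-1} packed into one word
        addr = 0
        for j, x in enumerate(xs):
            addr |= ((x >> b) & 1) << j
        acc += rom[addr] << b  # weight the partial sum by 2**b
    return acc
```

For example, with coefficients `[3, -2, 5]` and inputs `[7, 1, 4]` over 3 bits, the three lookups return 1, 3, and 8, and the shift-accumulate gives 1 + 6 + 32 = 39, matching the direct sum 21 - 2 + 20.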
    A direct implementation of the filter requires many resources; distributed arithmetic was introduced to reduce them by replacing multiplications with additions and shifts. The proposed DA algorithm uses multiplexers to eliminate the ROM and the complexity of constructing a fixed architecture for higher-order inputs. The proposed MUX based D
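One plausible reading of the MUX-based idea, sketched under the same unsigned-input assumption: when the coefficients a_j are also variable, the ROM lookup for bit plane b can be replaced by 2:1 multiplexers that pass a_j when bit b of x_j is 1 (and 0 otherwise), followed by an adder and a shift-accumulate. This is our illustration of the principle, not the paper's circuit; `mux_da_inner_product` is a hypothetical name.

```python
from typing import List


def mux_da_inner_product(a: List[int], xs: List[int], bits: int) -> int:
    """Inner product sum(a_j * x_j) with both operands variable, no ROM.

    Assumes every x in xs is an unsigned integer that fits in `bits` bits;
    the a_j may be any (signed) integers.
    """
    if len(a) != len(xs):
        raise ValueError("operand vectors must have equal length")
    acc = 0
    for b in range(bits):
        # Each 2:1 mux selects a_j if bit b of x_j is set, else 0
        partial = sum(a_j if (x_j >> b) & 1 else 0 for a_j, x_j in zip(a, xs))
        acc += partial << b  # weight this bit plane by 2**b
    return acc
```

The per-bit-plane `partial` equals the word a ROM would have returned for the same address, but it is computed on the fly from the current coefficients, so nothing needs to be precomputed when those coefficients change.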

    Efficient Fast-Convolution-Based Waveform Processing for 5G Physical Layer

    This paper investigates the application of fast-convolution (FC) filtering schemes for flexible and effective waveform generation and processing in fifth-generation (5G) systems. FC-based filtering is presented as a generic multimode waveform processing engine, while, following the progress of 5G New Radio standardization in the Third-Generation Partnership Project, the main focus is on efficient generation and processing of subband-filtered cyclic prefix orthogonal frequency-division multiplexing (CP-OFDM) signals. First, a matrix model for analyzing FC filter processing responses is presented and used to design optimized multiplexing of filtered groups of CP-OFDM physical resource blocks (PRBs) in a spectrally well-localized manner, i.e., with narrow guardbands. Subband filtering suppresses interference leakage between adjacent subbands, thus supporting independent waveform parametrization and different numerologies for different groups of PRBs, as well as asynchronous multiuser operation in the uplink. These are central ingredients of 5G waveform development, particularly at sub-6-GHz bands. The FC filter optimization criterion is passband error vector magnitude minimization subject to a given subband band-limitation constraint. Optimized designs with different guardband widths, PRB group sizes, and other essential design parameters are compared in terms of interference levels and implementation complexity. Finally, extensive coded 5G radio link simulation results are presented to compare the proposed approach with other subband-filtered CP-OFDM schemes and time-domain windowing methods, considering cases with different numerologies or asynchronous transmissions in adjacent subbands. The feasibility of using independent transmitter and receiver processing for CP-OFDM spectrum control is also demonstrated.
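The core fast-convolution mechanism can be sketched in its classic overlap-save form: the input is cut into overlapping blocks, each block is multiplied by the filter's frequency response in the FFT domain, and the circularly aliased outputs are discarded. This shows only the generic mechanism; the paper's FC engine designs the frequency-domain weights directly for subband-filtered CP-OFDM, which this sketch does not attempt. `fft`, `overlap_save`, and the default `n_fft = 16` are illustrative assumptions.

```python
import cmath
from typing import List, Sequence


def fft(x: List[complex], inverse: bool = False) -> List[complex]:
    """Recursive radix-2 FFT; len(x) must be a power of two. The inverse is unscaled."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def overlap_save(x: Sequence[float], h: Sequence[float], n_fft: int = 16) -> List[float]:
    """Full linear convolution y = x * h computed block-wise in the FFT domain."""
    m = len(h)
    if n_fft & (n_fft - 1):
        raise ValueError("n_fft must be a power of two")
    step = n_fft - (m - 1)  # new valid output samples per block
    if step <= 0:
        raise ValueError("n_fft must exceed len(h) - 1")
    h_f = fft([complex(v) for v in h] + [0j] * (n_fft - m))  # computed once
    total = len(x) + m - 1  # length of the full linear convolution
    # m-1 leading zeros give the first block its "history"; trailing zeros flush the tail
    padded = [0.0] * (m - 1) + list(x) + [0.0] * (step + m)
    y: List[float] = []
    start = 0
    while len(y) < total:
        block = [complex(v) for v in padded[start:start + n_fft]]
        y_blk = fft([u * v for u, v in zip(fft(block), h_f)], inverse=True)
        # Drop the first m-1 circularly aliased samples, keep `step` valid ones
        y.extend((v / n_fft).real for v in y_blk[m - 1:])
        start += step
    return y[:total]
```

Each block costs one forward FFT, one inverse FFT, and `n_fft` complex multiplications, independent of how the frequency response was designed, which is why the same engine can serve several filter configurations.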

    Quantum Image Processing and Its Application to Edge Detection: Theory and Experiment

    Processing of digital images is continuously gaining in volume and relevance, with concomitant demands on data storage, transmission, and processing power. Encoding the image information in quantum-mechanical systems instead of classical ones and replacing classical with quantum information processing may alleviate some of these challenges. By encoding and processing the image information in quantum-mechanical systems, we here demonstrate the framework of quantum image processing, where a pure quantum state encodes the image information: we encode the pixel values in the probability amplitudes and the pixel positions in the computational basis states. Our quantum image representation reduces the required number of qubits compared to existing implementations, and we present image processing algorithms that provide exponential speed-up over their classical counterparts. For the commonly used task of detecting the edge of an image, we propose and implement a quantum algorithm that completes the task with only one single-qubit operation, independent of the size of the image. This demonstrates the potential of quantum image processing for highly efficient image and video processing in the big data era. Comment: 13 pages, including 9 figures and 5 appendixes
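The amplitude encoding described above, with N = 2^n pixel values stored in the amplitudes of an n-qubit state, can be simulated classically. The sketch below assumes that the single-qubit operation is a Hadamard gate on the least-significant qubit; it then turns each pixel pair (2i, 2i+1) into a normalized sum and difference, so the odd-indexed amplitudes flag edges between those pairs. This covers only boundaries inside each pair, and the extra step needed for the remaining boundaries is not shown. The code reads the state vector directly, whereas a quantum device would have to estimate these amplitudes from repeated measurements. All function names are hypothetical.

```python
import math
from typing import List


def amplitude_encode(pixels: List[float]) -> List[float]:
    """Map N = 2**n pixel values to a normalized n-qubit state vector."""
    n = len(pixels)
    if n == 0 or n & (n - 1):
        raise ValueError("number of pixels must be a power of two")
    norm = math.sqrt(sum(p * p for p in pixels))
    if norm == 0:
        raise ValueError("an all-zero image cannot be normalized")
    return [p / norm for p in pixels]


def hadamard_on_lsb(state: List[float]) -> List[float]:
    """Apply H to the least-significant qubit: (c0, c1) -> ((c0+c1)/sqrt2, (c0-c1)/sqrt2) per pair."""
    s = 1 / math.sqrt(2)
    out = list(state)
    for i in range(0, len(state), 2):
        a, b = state[i], state[i + 1]
        out[i] = s * (a + b)
        out[i + 1] = s * (a - b)
    return out


def edge_amplitudes(pixels: List[float]) -> List[float]:
    """Odd-indexed amplitudes after H: scaled differences of pixel pairs (2i, 2i+1)."""
    return hadamard_on_lsb(amplitude_encode(pixels))[1::2]
```

For an image flattened in row-major order with an even width, each pair (2i, 2i+1) is two horizontally adjacent pixels, so a nonzero entry in `edge_amplitudes` marks a horizontal intensity change between them. Eight pixels need only log₂ 8 = 3 qubits in this representation.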