370 research outputs found

    A CORDIC based QR Decomposition Technique for MIMO Detection

    Get PDF
    CORDIC based improved real and complex QR Decomposition (QRD) for channel pre-processing operations in (Multiple-Input Multiple-Output) MIMO detectors are presented in this paper. The proposed design utilizes pipelining and parallel processing techniques and reduces the latency and hardware complexity of the module respectively. Computational complexity analysis report shows the superiority of our module by 16% compared to literature. The implementation results reveal that the proposed QRD takes shorter latency compared to literature. The power consumption of 2x2 real channel matrix and 2x2 complex channel matrix was found to be 12mW and 44mW respectively on the state-of-the-art Xilinx Virtex 5 FPGA

    High-Throughput FPGA Implementation of QR Decomposition

    Get PDF
    Munoz, S.D.; Hormigo, J. "High-Throughput FPGA Implementation of QR Decomposition" IEEE Transactions on in Circuits and Systems II: Express Briefs,vol.62, no.9, pp.861-865, Sept. 2015 doi: 10.1109/TCSII.2015.2435753This brief presents a hardware design to achieve high-throughput QR decomposition, using Givens Rotation Method. It utilizes a new two-dimensional systolic array architecture with pipelined processing elements, which are based on the COordinate Rotation DIgital Computer (CORDIC) algorithm. CORDIC computes vector rotations through shifts and additions. This approach allows a continuous computation of QR factorizations with simple hardware. A fixed-point FPGA architecture for 4 x 4 matrices has been optimized by balancing the number of CORDIC iterations with the final error. As a result, compared to other previous proposals for FPGA, our design achieves at least 50% more throughput, and much less resource utilization.Ministry of Education and Science of Spain and Junta of Andalucia under contracts TIN2013-42253-P and P07-TIC-02630, respectively

    Efficient floating-point givens rotation unit

    Get PDF
    This is a post-peer-review, pre-copyedit version of an article published in Circuits, Systems, and Signal Processing.High-throughput QR decomposition is a key operation in many advanced signal processing and communication applications. For some of these applications, using floating-point computation is becoming almost compulsory. However, there are scarce works in hardware implementations of floating-point QR decomposition for embedded systems. In this paper, we propose a very efficient high-throughput floating-point Givens rotation unit for QR decomposition. Moreover, the initial proposed design for conventional number formats is enhanced by using the new Half-Unit Biased format. The provided error analysis shows the effectiveness of our proposals and the trade-off of different implementation parameters. We also present FPGA implementation results and a thorough comparison between both approaches. These implementation results also reveal outstanding improvements compared to other previous similar designs in terms of area, latency, and throughput.This work was supported in part by following Spanish projects: TIN2016-80920-R, and JA2012 P12-TIC-169

    KAVUAKA: a low-power application-specific processor architecture for digital hearing aids

    Get PDF
    The power consumption of digital hearing aids is very restricted due to their small physical size and the available hardware resources for signal processing are limited. However, there is a demand for more processing performance to make future hearing aids more useful and smarter. Future hearing aids should be able to detect, localize, and recognize target speakers in complex acoustic environments to further improve the speech intelligibility of the individual hearing aid user. Computationally intensive algorithms are required for this task. To maintain acceptable battery life, the hearing aid processing architecture must be highly optimized for extremely low-power consumption and high processing performance.The integration of application-specific instruction-set processors (ASIPs) into hearing aids enables a wide range of architectural customizations to meet the stringent power consumption and performance requirements. In this thesis, the application-specific hearing aid processor KAVUAKA is presented, which is customized and optimized with state-of-the-art hearing aid algorithms such as speaker localization, noise reduction, beamforming algorithms, and speech recognition. Specialized and application-specific instructions are designed and added to the baseline instruction set architecture (ISA). Among the major contributions are a multiply-accumulate (MAC) unit for real- and complex-valued numbers, architectures for power reduction during register accesses, co-processors and a low-latency audio interface. With the proposed MAC architecture, the KAVUAKA processor requires 16 % less cycles for the computation of a 128-point fast Fourier transform (FFT) compared to related programmable digital signal processors. The power consumption during register file accesses is decreased by 6 %to 17 % with isolation and by-pass techniques. The hardware-induced audio latency is 34 %lower compared to related audio interfaces for frame size of 64 samples.The final hearing aid system-on-chip (SoC) with four KAVUAKA processor cores and ten co-processors is integrated as an application-specific integrated circuit (ASIC) using a 40 nm low-power technology. The die size is 3.6 mm2. Each of the processors and co-processors contains individual customizations and hardware features with a varying datapath width between 24-bit to 64-bit. The core area of the 64-bit processor configuration is 0.134 mm2. The processors are organized in two clusters that share memory, an audio interface, co-processors and serial interfaces. The average power consumption at a clock speed of 10 MHz is 2.4 mW for SoC and 0.6 mW for the 64-bit processor.Case studies with four reference hearing aid algorithms are used to present and evaluate the proposed hardware architectures and optimizations. The program code for each processor and co-processor is generated and optimized with evolutionary algorithms for operation merging,instruction scheduling and register allocation. The KAVUAKA processor architecture is com-pared to related processor architectures in terms of processing performance, average power consumption, and silicon area requirements

    Ultrasound Beamforming on a FPGA

    Get PDF

    FPGA-BASED IMPLEMENTATION OF DUAL-FREQUENCY PATTERN SCHEME FOR 3-D SHAPE MEASUREMENT

    Get PDF
    Structured Light Illumination (SLI) is the process where spatially varied patterns are projected onto a 3-D surface and based on the distortion by the surface topology, phase information can be calculated and a 3D model constructed. Phase Measuring Profilometry (PMP) is a particular type of SLI that requires three or more patterns temporarily multiplexed. High speed PMP attempts to scan moving objects whose motion is small so as to have little impact on the 3-D model. Given that practically all machine vision cameras and high speed cameras employ a Field Programmable Gate Array (FPGA) interface directly to the image sensors, the opportunity exists to do the processing on camera. This thesis focuses on the design, implementation, testing, and evaluation of a camera-projector system to implement a PMP dual-frequency scheme for 3-D shape measurement on a single FPGA chip. The processor architecture is implemented and tested using the Xilinx Spartan 3 FPGA chip on an Opal Kelly development board. The hardware is described using VHDL and Verilog Hardware Description Languages (HDLs)

    A review of parallel processing approaches to robot kinematics and Jacobian

    Get PDF
    Due to continuously increasing demands in the area of advanced robot control, it became necessary to speed up the computation. One way to reduce the computation time is to distribute the computation onto several processing units. In this survey we present different approaches to parallel computation of robot kinematics and Jacobian. Thereby, we discuss both the forward and the reverse problem. We introduce a classification scheme and classify the references by this scheme

    FPGA-Based Co-processor for Singular Value Array Reconciliation Tomography

    Get PDF
    This thesis describes a co-processor system that has been designed to accelerate computations associated with Singular Value Array Reconciliation Tomography (SART), a method for locating a wide-band RF source which may be positioned within an indoor environment, where RF propagation characteristics make source localization very challenging. The co-processor system is based on field programmable gate array (FPGA) technology, which offers a low-cost alternative to customized integrated circuits, while still providing the high performance, low power, and small size associated with a custom integrated solution. The system has been developed in VHDL, and implemented on a Virtex-4 SX55 FPGA development platform. The system is easy to use, and may be accessed through a C program or MATLAB script. Compared to a Pentium 4 CPU running at 3 GHz, use of the co-processor system provides a speed-up of about 6 times for the current signal matrix size of 128-by-16. Greater speed-ups may be obtained by using multiple devices in parallel. The system is capable of computing the SART metric to an accuracy of about -145 dB with respect to its true value. This level of accuracy, which is shown to be better than that obtained using single precision floating point arithmetic, allows even relatively weak signals to make a meaningful contribution to the final SART solution
    corecore