626 research outputs found

    FPGA implementation of an OFDM-based WLAN receiver

    Full text link
    This paper deals with the design and implementation on FPGA of a receiver for OFDM-based WLAN. The circuit is particularized for IEEE 802.11a/g standards. The system includes frame detection, time and frequency synchronization, demodulation, equalization and phase tracking. The algorithms to be implemented for each task are selected taking into account performance, hardware cost and latency. Also, a fixed point analysis is made for each algorithm. Our objective is to maintain the PER loss below 0.5 dB for a PER = 10 -2, 64-QAM and error correction. The whole system is composed of two main blocks (correlator and CORDIC) that are reused in different time intervals to perform all the necessary operations, so the required hardware resources are minimized. To verify it, the receiver is physically implemented and tested. © 2011 Elsevier B.V. All rights reserved.This work was supported by the Spanish Ministerio de Educacion y Ciencia under grant TEC2008-06787.Canet Subiela, MJ.; Valls Coquillat, J.; Almenar Terré, V.; Marín-Roig Ramón, J. (2012). FPGA implementation of an OFDM-based WLAN receiver. Microprocessors and Microsystems. 36(3):232-244. https://doi.org/10.1016/j.micpro.2011.11.004S23224436

    New virtually scaling free adaptive CORDIC rotator

    No full text
    In this article we propose a novel CORDIC rotator algorithm that eliminates the problems of scale factor compensation and limited range of convergence associated with the classical CORDIC algorithm. In our scheme, depending on the target angle or the initial coordinate of the vector, a scaling by 1 or 1/?2 is needed that can be realised with minimal hardware. The proposed CORDIC rotator adaptively selects appropriate iteration steps and converges to the final result by executing 50% less number of iterations on an average compared to that required for the classical CORDIC. Unlike classical CORDIC, the final value of the scale factor is completely independent of number of executed iterations. Based on the proposed algorithm, a 16-bit pipelined CORDIC rotator implementation has been described. The silicon area of the fabricated pipelined CORDIC rotator core is 2.73 mm2. This is equivalent to 38 k inverter gates in IHP in-house 0.25 ?m BiCMOS technology. The average dynamic power consumption of the fabricated CORDIC rotator is 17 mW @ 2.5 V supply and 20Msps throughput. Currently, this CORDIC rotator is used as a part of the baseband processor for a project that aims to design a single-chip wireless modem compliant with IEEE 802.11a and Hiperlan/2

    A CORDIC based QR Decomposition Technique for MIMO Detection

    Get PDF
    CORDIC based improved real and complex QR Decomposition (QRD) for channel pre-processing operations in (Multiple-Input Multiple-Output) MIMO detectors are presented in this paper. The proposed design utilizes pipelining and parallel processing techniques and reduces the latency and hardware complexity of the module respectively. Computational complexity analysis report shows the superiority of our module by 16% compared to literature. The implementation results reveal that the proposed QRD takes shorter latency compared to literature. The power consumption of 2x2 real channel matrix and 2x2 complex channel matrix was found to be 12mW and 44mW respectively on the state-of-the-art Xilinx Virtex 5 FPGA

    An approach to the application of shift-and-add algorithms on engineering and industrial processes

    Get PDF
    Different kinds of algorithms can be chosen so as to compute elementary functions. Among all of them, it is worthwhile mentioning the shift-and-add algorithms due to the fact that they have been specifically designed to be very simple and to save computer resources. In fact, almost the only operations usually involved with these methods are additions and shifts, which can be easily and efficiently performed by a digital processor. Shift-and-add algorithms allow fairly good precision with low cost iterations. The most famous algorithm belonging to this type is CORDIC. CORDIC has the capability of approximating a wide variety of functions with only the help of a slight change in their iterations. In this paper, we will analyze the requirements of some engineering and industrial problems in terms of type of operands and functions to approximate. Then, we will propose the application of shift-and-add algorithms based on CORDIC to these problems. We will make a comparison between the different methods applied in terms of the precision of the results and the number of iterations required.This research was supported by the Conselleria de Educacion of the Valencia Region Government under grant number GV/2011/043

    A study and comparison of COordinate Rotation DIgital Computer (CORDIC) architectures

    Full text link
    Most of the digital signal processing applications performs operations like multiplication, addition, square-root calculation, solving linear equations etc. The physical implementation of these operations consumes a lot of hardware and, software implementation consumes large memory. Even if they are implemented in hardware, they do not provide high speed, and due to this reason, even today the software implementation dominates hardware. For realizing operations from basic to very complex ones with less hardware, a Co-ordinate Rotation Digital Computer (CORDIC) proves beneficial. It is capable of performing mathematical operations right from addition to highly complex functions with the help of arithmetic unit and shifters only. This paper gives a brief overview of various existing CORDIC architectures, their working principle, application domain and a comparison of these architectures. Different designs are available as per the target, i.e. high accuracy and precision, low area, low latency, hardware efficient, low power, reconfigurability, etc. that can be used as per the application in which the architecture needs to be employed

    Energy-efficient embedded machine learning algorithms for smart sensing systems

    Get PDF
    Embedded autonomous electronic systems are required in numerous application domains such as Internet of Things (IoT), wearable devices, and biomedical systems. Embedded electronic systems usually host sensors, and each sensor hosts multiple input channels (e.g., tactile, vision), tightly coupled to the electronic computing unit (ECU). The ECU extracts information by often employing sophisticated methods, e.g., Machine Learning. However, embedding Machine Learning algorithms poses essential challenges in terms of hardware resources and energy consumption because of: 1) the high amount of data to be processed; 2) computationally demanding methods. Leveraging on the trade-off between quality requirements versus computational complexity and time latency could reduce the system complexity without affecting the performance. The objectives of the thesis are to develop: 1) energy-efficient arithmetic circuits outperforming state of the art solutions for embedded machine learning algorithms, 2) an energy-efficient embedded electronic system for the \u201celectronic-skin\u201d (e-skin) application. As such, this thesis exploits two main approaches: Approximate Computing: In recent years, the approximate computing paradigm became a significant major field of research since it is able to enhance the energy efficiency and performance of digital systems. \u201cApproximate Computing\u201d(AC) turned out to be a practical approach to trade accuracy for better power, latency, and size . AC targets error-resilient applications and offers promising benefits by conserving some resources. Usually, approximate results are acceptable for many applications, e.g., tactile data processing,image processing , and data mining ; thus, it is highly recommended to take advantage of energy reduction with minimal variation in performance . In our work, we developed two approximate multipliers: 1) the first one is called \u201cMETA\u201d multiplier and is based on the Error Tolerant Adder (ETA), 2) the second one is called \u201cApproximate Baugh-Wooley(BW)\u201d multiplier where the approximations are implemented in the generation of the partial products. We showed that the proposed approximate arithmetic circuits could achieve a relevant reduction in power consumption and time delay around 80.4% and 24%, respectively, with respect to the exact BW multiplier. Next, to prove the feasibility of AC in real world applications, we explored the approximate multipliers on a case study as the e-skin application. The e-skin application is defined as multiple sensing components, including 1) structural materials, 2) signal processing, 3) data acquisition, and 4) data processing. Particularly, processing the originated data from the e-skin into low or high-level information is the main problem to be addressed by the embedded electronic system. Many studies have shown that Machine Learning is a promising approach in processing tactile data when classifying input touch modalities. In our work, we proposed a methodology for evaluating the behavior of the system when introducing approximate arithmetic circuits in the main stages (i.e., signal and data processing stages) of the system. Based on the proposed methodology, we first implemented the approximate multipliers on the low-pass Finite Impulse Response (FIR) filter in the signal processing stage of the application. We noticed that the FIR filter based on (Approx-BW) outperforms state of the art solutions, while respecting the tradeoff between accuracy and power consumption, with an SNR degradation of 1.39dB. Second, we implemented approximate adders and multipliers respectively into the Coordinate Rotational Digital Computer (CORDIC) and the Singular Value Decomposition (SVD) circuits; since CORDIC and SVD take a significant part of the computationally expensive Machine Learning algorithms employed in tactile data processing. We showed benefits of up to 21% and 19% in power reduction at the cost of less than 5% accuracy loss for CORDIC and SVD circuits when scaling the number of approximated bits. 2) Parallel Computing Platforms (PCP): Exploiting parallel architectures for near-threshold computing based on multi-core clusters is a promising approach to improve the performance of smart sensing systems. In our work, we exploited a novel computing platform embedding a Parallel Ultra Low Power processor (PULP), called \u201cMr. Wolf,\u201d for the implementation of Machine Learning (ML) algorithms for touch modalities classification. First, we tested the ML algorithms at the software level; for RGB images as a case study and tactile dataset, we achieved accuracy respectively equal to 97% and 83.5%. After validating the effectiveness of the ML algorithm at the software level, we performed the on-board classification of two touch modalities, demonstrating the promising use of Mr. Wolf for smart sensing systems. Moreover, we proposed a memory management strategy for storing the needed amount of trained tensors (i.e., 50 trained tensors for each class) in the on-chip memory. We evaluated the execution cycles for Mr. Wolf using a single core, 2 cores, and 3 cores, taking advantage of the benefits of the parallelization. We presented a comparison with the popular low power ARM Cortex-M4F microcontroller employed, usually for battery-operated devices. We showed that the ML algorithm on the proposed platform runs 3.7 times faster than ARM Cortex M4F (STM32F40), consuming only 28 mW. The proposed platform achieves 15 7 better energy efficiency than the classification done on the STM32F40, consuming 81mJ per classification and 150 pJ per operation

    KAVUAKA: a low-power application-specific processor architecture for digital hearing aids

    Get PDF
    The power consumption of digital hearing aids is very restricted due to their small physical size and the available hardware resources for signal processing are limited. However, there is a demand for more processing performance to make future hearing aids more useful and smarter. Future hearing aids should be able to detect, localize, and recognize target speakers in complex acoustic environments to further improve the speech intelligibility of the individual hearing aid user. Computationally intensive algorithms are required for this task. To maintain acceptable battery life, the hearing aid processing architecture must be highly optimized for extremely low-power consumption and high processing performance.The integration of application-specific instruction-set processors (ASIPs) into hearing aids enables a wide range of architectural customizations to meet the stringent power consumption and performance requirements. In this thesis, the application-specific hearing aid processor KAVUAKA is presented, which is customized and optimized with state-of-the-art hearing aid algorithms such as speaker localization, noise reduction, beamforming algorithms, and speech recognition. Specialized and application-specific instructions are designed and added to the baseline instruction set architecture (ISA). Among the major contributions are a multiply-accumulate (MAC) unit for real- and complex-valued numbers, architectures for power reduction during register accesses, co-processors and a low-latency audio interface. With the proposed MAC architecture, the KAVUAKA processor requires 16 % less cycles for the computation of a 128-point fast Fourier transform (FFT) compared to related programmable digital signal processors. The power consumption during register file accesses is decreased by 6 %to 17 % with isolation and by-pass techniques. The hardware-induced audio latency is 34 %lower compared to related audio interfaces for frame size of 64 samples.The final hearing aid system-on-chip (SoC) with four KAVUAKA processor cores and ten co-processors is integrated as an application-specific integrated circuit (ASIC) using a 40 nm low-power technology. The die size is 3.6 mm2. Each of the processors and co-processors contains individual customizations and hardware features with a varying datapath width between 24-bit to 64-bit. The core area of the 64-bit processor configuration is 0.134 mm2. The processors are organized in two clusters that share memory, an audio interface, co-processors and serial interfaces. The average power consumption at a clock speed of 10 MHz is 2.4 mW for SoC and 0.6 mW for the 64-bit processor.Case studies with four reference hearing aid algorithms are used to present and evaluate the proposed hardware architectures and optimizations. The program code for each processor and co-processor is generated and optimized with evolutionary algorithms for operation merging,instruction scheduling and register allocation. The KAVUAKA processor architecture is com-pared to related processor architectures in terms of processing performance, average power consumption, and silicon area requirements
    • 

    corecore