29 research outputs found

    Improving Energy Efficiency of OFDM Using Adaptive Precision Reconfigurable FFT

    Energy efficiency is an essential issue in digital systems, especially battery-powered devices, and has been the subject of intensive research. In this work, a multi-precision FFT module with dynamic run-time reconfigurability is proposed to trade off accuracy against the energy efficiency of OFDM in an SDR-based architecture. To support variable-size FFT, a reconfigurable memory-based architecture is investigated. It is shown that the radix-4 FFT has the minimum computational complexity in this architecture. To account for implementation constraints such as fixed-width memory, a noise model is used to analyze the proposed architecture statistically. The required FFT word-lengths for different criteria (namely BER, modulation scheme, FFT size, and SNR) are computed analytically and confirmed by simulations in AWGN and Rayleigh fading channels. At run-time, the most energy-efficient word-length is chosen and the FFT is reconfigured until the required application-specific BER is met. Evaluations show that the implementation area and the number of memory accesses are reduced. Results obtained from synthesizing the basic operators of the proposed design on an FPGA show an energy saving of over 80%.
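
The word-length selection described in this abstract can be illustrated with a small sketch. It is not the authors' design: it assumes a plain radix-2 FFT rather than radix-4, uses output SQNR as a stand-in for the BER criterion, and models the fixed-width memory by halving and rounding every butterfly output. The names `quantize`, `sqnr_db`, and `smallest_word_length` are hypothetical.

```python
import cmath
import math
import random


def quantize(z, frac_bits):
    """Round real and imaginary parts to a grid of step 2**-frac_bits."""
    scale = 1 << frac_bits
    return complex(round(z.real * scale) / scale, round(z.imag * scale) / scale)


def fft(x, frac_bits=None):
    """Recursive radix-2 DIT FFT (len(x) must be a power of two).

    If frac_bits is given, each butterfly output is halved and rounded,
    a crude model of fixed-width storage. The result is then FFT(x) / N.
    """
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], frac_bits)
    odd = fft(x[1::2], frac_bits)
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        a, b = even[k] + t, even[k] - t
        if frac_bits is not None:
            a, b = quantize(a / 2, frac_bits), quantize(b / 2, frac_bits)
        out[k], out[k + n // 2] = a, b
    return out


def sqnr_db(x, frac_bits):
    """Signal-to-quantization-noise ratio of the fixed-point FFT output, in dB."""
    n = len(x)
    ref = [v / n for v in fft(x)]  # floating-point reference, same scaling
    fix = fft([quantize(v, frac_bits) for v in x], frac_bits)
    sig = sum(abs(r) ** 2 for r in ref)
    err = sum(abs(r - f) ** 2 for r, f in zip(ref, fix))
    return float("inf") if err == 0 else 10 * math.log10(sig / err)


def smallest_word_length(x, target_db, candidates=range(4, 25)):
    """Return the fewest fractional bits meeting target_db, or None."""
    for bits in candidates:
        if sqnr_db(x, bits) >= target_db:
            return bits
    return None


# Example input: 64 random QPSK symbols (an assumption for illustration).
random.seed(0)
symbols = [complex(random.choice((-1, 1)), random.choice((-1, 1))) / math.sqrt(2)
           for _ in range(64)]
print("bits needed for 40 dB:", smallest_word_length(symbols, 40.0))
```

Searching from the shortest word-length upward mirrors the idea of picking the cheapest precision that still meets the quality target; a real system would map the target BER to a noise budget through the channel and modulation model.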

    Efficient multiplier-less VLSI architectures for folded pipelined complex FFT core

    The fast Fourier transform (FFT) has become ubiquitous in engineering applications and is one of the most widely used blocks in communication and signal processing systems. Efficient algorithms are being designed to improve the architecture of the FFT. Higher-radix FFT algorithms have the traditional advantage of using fewer computational elements and are better suited to computing the FFT of long data sequences. Among the proposed algorithms, the split-radix FFT has shown considerable improvement in reducing the hardware complexity of the architecture compared with radix-2 and radix-4 FFT algorithms. Here, radix-4, radix-8, and split-radix algorithms are used in the design of the proposed complex FFT cores. The growing adoption of virtual instrumentation (modular, customizable, software-defined instrumentation) has only become possible due to the use of LabVIEW with a highly interactive process known as graphical system design. The CompactRIO programmable automation controller is an advanced embedded control and data acquisition system designed for applications that require high performance and reliability. The work explains the real-time implementation of a 256-point FFT and the computation of the power spectrum using LabVIEW and CompactRIO. New distributed arithmetic (NEDA) is one of the most used techniques for implementing multiplier-less architectures of digital systems. In this thesis, four architectures for different FFT cores are proposed:
    • Real-time implementation of FFT using CompactRIO
    • 32-Point Complex FFT Core Using Split-Radix Algorithm
    • 64-Point Complex FFT Core Using Radix-4 Algorithm
    • 64-Point Complex FFT Core Using Radix-8 Algorithm
    The proposed designs have been implemented in both FPGA and ASIC design flows, with a 180 nm process technology used for the ASIC implementation. The results show the improvements of the proposed designs over other existing architectures.

    Multiplierless Unity-Gain SDF FFTs


    Selected Papers from IEEE ICASI 2019

    The 5th IEEE International Conference on Applied System Innovation 2019 (IEEE ICASI 2019, https://2019.icasi-conf.net/), which was held in Fukuoka, Japan, on 11–15 April, 2019, provided a unified communication platform for a wide range of topics. This Special Issue entitled “Selected Papers from IEEE ICASI 2019” collected nine excellent papers presented on the applied sciences topic during the conference. Mechanical engineering and design innovations are academic and practical engineering fields that involve systematic technological materialization through scientific principles and engineering designs. Technological innovation by mechanical engineering includes information technology (IT)-based intelligent mechanical systems, mechanics and design innovations, and applied materials in nanoscience and nanotechnology. These new technologies that implant intelligence in machine systems represent an interdisciplinary area that combines conventional mechanical technology and new IT. The main goal of this Special Issue is to provide new scientific knowledge relevant to IT-based intelligent mechanical systems, mechanics and design innovations, and applied materials in nanoscience and nanotechnology

    Architectural implementation of cordic unit and its applications

    The ubiquity of DSP has increased the demand for area-efficient and accurate architectures for carrying out nonlinear arithmetic operations. One such architecture is the CORDIC unit, which has many applications in DSP, including the implementation of transforms based on the Fourier basis. This report presents a CORDIC architecture with an embedded scaling unit that uses only a minimal number of adders and shifters. It can be operated in rotation mode as well as vectoring mode. The purpose of the design is to obtain a scaling-free CORDIC unit while preserving the design of the original algorithm. The proposed design achieves a considerable reduction in hardware compared with other scaling-free architectures. An error analysis for different word lengths, and for different input ranges at a fixed word length, guides the choice of parameters. The error in rotation mode for a 16-bit data path is 0.073% for the Y-equivalent input and 0.067% for the X-equivalent input. We also report the architecture of a DFT core implemented using low-latency CORDIC. A scaling unit has been included to obtain scaled outputs. The reported DFT core architecture has 22 adders in total, in addition to 2 CORDIC units. DDSs and NCOs are now prominently used in RF signal processing, satellite communications, and similar applications. This report also describes the FPGA implementation of one such DDS with quadrature outputs. The proposed DDS design, which is based on pipelined CORDIC, shows a considerable improvement in SFDR over other existing designs at reduced hardware cost. This report also proposes a multiplier-less architecture for a radix-2^2 folded pipelined complex FFT core based on the CORDIC technique. The number of points considered in the work is sixteen, and the folding is done by a factor of four.
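
The rotation and vectoring modes mentioned in this abstract can be sketched as follows. This is the conventional CORDIC iteration with explicit gain compensation, not the scaling-free architecture the report proposes; floating-point multiplications by `2.0 ** -i` stand in for hardware shifts, and the function names are hypothetical.

```python
import math


def cordic_gain(iterations):
    """Accumulated magnitude gain of the micro-rotations (about 1.6468)."""
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))
    return gain


def cordic_rotate(x, y, angle, iterations=16):
    """Rotation mode: rotate (x, y) by angle radians (|angle| below about 1.74).

    The residual angle z is driven toward zero; each step only needs a
    shift, an add, and a small arctangent lookup in hardware.
    """
    z = angle
    for i in range(iterations):
        d = 1.0 if z >= 0 else -1.0
        shift = 2.0 ** -i
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(shift)
    g = cordic_gain(iterations)
    return x / g, y / g


def cordic_vector(x, y, iterations=16):
    """Vectoring mode: drive y toward zero; returns (magnitude, angle), x > 0."""
    z = 0.0
    for i in range(iterations):
        d = -1.0 if y >= 0 else 1.0
        shift = 2.0 ** -i
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(shift)
    return x / cordic_gain(iterations), z


xr, yr = cordic_rotate(1.0, 0.0, math.pi / 6)
print("rotate (1, 0) by 30 degrees:", xr, yr)
print("vectoring (3, 4):", cordic_vector(3.0, 4.0))
```

The division by `cordic_gain` is the scaling step that a scaling-free design aims to avoid or absorb; it is kept explicit here so the outputs can be compared directly with `math.cos`, `math.sin`, and `math.atan2`.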

    Digital and Mixed Domain Hardware Reduction Algorithms and Implementations for Massive MIMO

    Emerging 5G- and 6G-based wireless communication systems rely largely on multiple-input multiple-output (MIMO) systems to reduce inherently extensive path losses and to facilitate high data rates and high spatial diversity. Massive MIMO systems used in mmWave and sub-THz applications consist of hundreds, perhaps thousands, of antenna elements at base stations. Digital beamforming techniques provide the highest flexibility and better degrees of freedom for phased antenna arrays than their analog and hybrid alternatives, but have the highest hardware complexity. Conventional digital beamformers at the receiver require a dedicated analog-to-digital converter (ADC) for every antenna element, leading to ADCs for elements. The number of ADCs is the key determining factor for the power consumption of an antenna array system. The digital hardware consists of fast Fourier transform (FFT) cores with a multiplier complexity of (N log2N) for an element system to generate multiple beams. The mixed-signal and digital hardware complexities in MIMO systems must be reduced to lower cost and power consumption while maintaining high performance. The well-known concept has been in use for ADCs to achieve reduced complexities. An extension of the architecture to the multi-dimensional domain is explored in this dissertation to implement a single-port ADC to replace ADCs in an element system, using the correlation of received signals in the spatial domain. This concept has applications in conventional uniform linear arrays (ULAs) as well as in focal plane array (FPA) receivers. Our analysis has shown that sparsity in the spatio-temporal frequency domain can be exploited to reduce the number of ADCs from N to where . By using the limited field of view of practical antennas, multiple sub-arrays are combined without interference to achieve a factor of K increase in the information-carrying capacity of the ADC systems. 
Applications of this concept include ULAs and rectangular array systems. Experimental verifications were done for a element, 1.8 - 2.1 GHz wideband array system to sample using ADCs. This dissertation proposes that frequency division multiplexing (FDM) receiver outputs at an intermediate frequency (IF) can pack multiple (M) narrowband channels with a guard band to avoid interferences. The combined output is then sampled using a single wideband ADC and baseband channels are retrieved in the digital domain. Measurement results were obtained by employing a element, 28 GHz antenna array system to combine channels together to achieve a 75% reduction of ADC requirement. Implementation of FFT cores in the digital domain is not always exact because of the finite precision. Therefore, this dissertation explores the possibility of approximating the discrete Fourier transform (DFT) matrix to achieve reduced hardware complexities at an allowable cost of accuracy. A point approximate DFT (ADFT) core was implemented on digital hardware using radix-32 to achieve savings in cost, size, weight and power (C-SWaP) and synthesized for ASIC at 45-nm technology

    Fast Fourier transforms on energy-efficient application-specific processors

    Many of the current applications used in battery-powered devices come from the digital signal processing, telecommunication, and multimedia domains. Traditionally, application-specific fixed-function circuits in the form of application-specific integrated circuits (ASICs) have been used in these designs to reach the required performance and energy efficiency. The complexity of these applications has increased over the years, and the design complexity has increased even faster, which implies increased design time. At the same time, there are more and more standards to be supported, so using optimised fixed-function implementations for all the functions in all the standards is impractical. The non-recurring engineering costs for integrated circuits have also increased significantly, so manufacturers can afford fewer chip iterations. Although tailoring the circuit for a specific application provides the best performance and/or energy efficiency, such an approach lacks flexibility. For example, if an error is found after manufacturing, an expensive chip iteration is required. In addition, new functionality cannot be added afterwards to support the evolution of standards. Flexibility can be obtained with software-based implementation technologies. Unfortunately, general-purpose processors do not provide the energy efficiency of fixed-function circuit designs. A useful trade-off between flexibility and performance is an implementation based on application-specific processors (ASPs), where programmability provides the flexibility and computational resources customised for the given application provide the performance. In this thesis, application-specific processors are considered using the fast Fourier transform as the representative algorithm. The architectural template used here is the transport triggered architecture (TTA), which resembles very long instruction word machines, but whose operand execution resembles data flow machines rather than traditional operand triggering. 
The developed TTA processors exploit inherent parallelism of the application. In addition, several characteristics of the application have been identified and those are exploited by developing customised functional units for speeding up the execution. Several customisations are proposed for the data path of the processor but it is also important to match the memory bandwidth to the computation speed. This calls for a memory organisation supporting parallel memory accesses. The proposed optimisations have been used to improve the energy-efficiency of the processor and experiments show that a programmable solution can have energy-efficiency comparable to fixed-function ASIC designs

    Algorithm-Architecture Co-Design for Digital Front-Ends in Mobile Receivers

    The methodology behind this work has been to use algorithm-hardware co-design to achieve efficient solutions for the digital front-end in mobile receivers. It has been shown that, by looking at algorithms and hardware architectures together, more efficient solutions can be found, i.e., efficient with respect to some design measure. In this thesis the main focus has been placed on two such measures: first, reduced-complexity algorithms that lower energy consumption at limited performance degradation; second, handling the increasing number of wireless standards, which preferably should run on the same hardware platform. To perform this task it is crucial to understand both sides of the table, i.e., both the algorithms and concepts for wireless communication and the implications arising for the hardware architecture. It is easier to handle the high complexity by separating those disciplines through layered abstraction. However, this representation is imperfect, since many interconnected "details" belonging to different layers are lost in the attempt to handle the complexity. This results in poor implementations, and the design of mobile terminals is no exception. Wireless communication standards are often designed based on mathematical algorithms with theoretical boundaries, with little consideration of actual implementation constraints such as energy consumption and silicon area. This thesis does not try to remove the layer abstraction model, given its undeniable advantages, but rather uses those cross-layer "details" that went missing during the abstraction. This is done in three ways. In the first part, the cross-layer optimization is carried out from the algorithm perspective. Important circuit design parameters, such as quantization, are taken into consideration when designing the algorithm for OFDM symbol timing, CFO, and SNR estimation with a single bit, namely the sign bit. 
Proof-of-concept circuits were fabricated and showed high potential for low-end receivers. In the second part, the cross-layer optimization is accomplished from the opposite side, i.e., the hardware-architectural side. An SDR architecture is known for its flexibility and scalability over many applications. In this work a filtering application is mapped into software instructions in the SDR architecture in order to make filtering-specific modules redundant and thus save silicon area. In the third and last part, the optimization is done from an intermediate point within the algorithm-architecture spectrum. Here, a heterogeneous architecture with a combination of highly efficient and highly flexible modules is used to accomplish initial synchronization in at least two concurrent OFDM standards. A demonstrator was built that is capable of performing synchronization in any two standards, including LTE, WiFi, and DVB-H.

    A power-scalable variable-length analogue DFT processor for multi-standard wireless transceivers

    In Orthogonal Frequency-Division Multiplexing (OFDM) based transceivers, digital computation of the Discrete Fourier Transform (DFT) is a power-hungry process. The hardware cost and power consumption can be reduced by implementing the DFT processor with analogue circuits. This thesis presents a real-time recursive DFT processor. Previously, changing the transform length and scaling the power could only be performed by digital Fast Fourier Transform (FFT) processors. By using the real-time recursive DFT processor, the decimation filter is eliminated. Thus, a further reduction in the hardware cost and power consumption of the multi-standard transceiver is achieved. The real-time recursive DFT processor was designed in 180 nm CMOS technology. Results of device mismatch analysis indicate that the 8-point recursive DFT processor has a yield of 97.5% for the BPSK modulated signal. For the QPSK modulated signal, however, the yield of the 8-point recursive DFT processor is 8.9%. Moreover, doubling the transform length reduces the average dynamic range by 3 dB. Accordingly, the 16-point recursive DFT processor has a yield of 43.4% for the BPSK modulated signal. The power consumption of the recursive DFT processor is about 1/6 of that of a previous analogue FFT processor.
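
The abstract does not give the recursion used by the analogue processor. As a loosely related digital illustration only, the sketch below shows the textbook sliding DFT, in which each new sample updates all bins recursively instead of recomputing the transform. The class name `SlidingDFT` and the helper `direct_dft` are hypothetical, and nothing here models the analogue circuit, its mismatch, or its power scaling.

```python
import cmath
import math
from collections import deque


class SlidingDFT:
    """N-point DFT of the most recent N samples, updated one sample at a time.

    Update rule per bin k: X_k <- (X_k + x_new - x_oldest) * exp(+2j*pi*k/N).
    """

    def __init__(self, n):
        self.n = n
        self.window = deque([0j] * n, maxlen=n)  # oldest sample at index 0
        self.bins = [0j] * n
        self.twiddle = [cmath.exp(2j * math.pi * k / n) for k in range(n)]

    def push(self, sample):
        oldest = self.window[0]
        self.window.append(sample)  # maxlen drops the oldest sample
        delta = sample - oldest
        self.bins = [(b + delta) * w for b, w in zip(self.bins, self.twiddle)]
        return self.bins


def direct_dft(x):
    """Reference O(N^2) DFT used only to check the recursive update."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]


sdft = SlidingDFT(8)
for i in range(20):
    latest_bins = sdft.push(complex(math.cos(0.5 * i), 0.0))
print("bin magnitudes:", [round(abs(b), 3) for b in latest_bins])
```

Changing the transform length in this formulation only means changing `n` and the per-bin rotation factors, which gives some intuition for why a recursive structure lends itself to variable-length operation.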