72 research outputs found

    Complexity Reduction in the CORDIC Algorithm by using MUXes

    Get PDF
    Nowadays, the CORDIC algorithm plays an important role to deal with the non-linear functions in hardware. In this thesis, a novel methodology is described to reduce the complexity in an unrolled CORDIC architecture, which gives higher speed, lesser area, and lower power consumption. That is, MUXes are used to replace adder stages. Five different unrolled CORDIC architectures have been implemented in ASIC using a 65nm CMOS technology with Low Power High V_T transistors. The area, computational speed, accuracy, error behavior, and power consumption have been analyzed. The design aim is to reduce the power consumption, which is more and more important depending on the area. As a result the area and power consumption get 7.9% lower and 27.2% lower separately, and the speed is 22.9% higher compared to the original unrolled CORDIC architecture

    Design and analysis of optimized CORDIC based GMSK system on FPGA platform

    Get PDF
    The Gaussian minimum shift keying (GMSK) is one of the best suited digital modulation schemes in the global system for mobile communication (GSM) because of its constant envelop and spectral efficiency characteristics. Most of the conventional GMSK approaches failed to balance the digital modulation with efficient usage of spectrum. In this article, the hardware architecture of the optimized CORDIC-based GMSK system is designed, which includes GMSK Modulation with the channel and GMSK Demodulation. The modulation consists of non-return zero (NRZ) encoder, an integrator followed by Gaussian filtering and frequency modulation (FM). The GMSK demodulation consists of FM demodulator, followed by differentiation and NRZ decoder. The FM Modulation and demodulation use the optimized CORDIC model for an In-phase (I) and quadrature (Q) phase generation. The optimized CORDIC is designed by using quadrant mapping and pipelined structure to improve the hardware and computational complexity in GMSK systems. The GMSK system is designed on the Xilinx platform and implemented on Artix-7 and Spartan-3EFPGA. The hardware constraints like area, power, and timing utilization are summarized. The comparison of the optimized CORDIC model with similar CORDIC approaches is tabulated with improvements

    KAVUAKA: a low-power application-specific processor architecture for digital hearing aids

    Get PDF
    The power consumption of digital hearing aids is very restricted due to their small physical size and the available hardware resources for signal processing are limited. However, there is a demand for more processing performance to make future hearing aids more useful and smarter. Future hearing aids should be able to detect, localize, and recognize target speakers in complex acoustic environments to further improve the speech intelligibility of the individual hearing aid user. Computationally intensive algorithms are required for this task. To maintain acceptable battery life, the hearing aid processing architecture must be highly optimized for extremely low-power consumption and high processing performance.The integration of application-specific instruction-set processors (ASIPs) into hearing aids enables a wide range of architectural customizations to meet the stringent power consumption and performance requirements. In this thesis, the application-specific hearing aid processor KAVUAKA is presented, which is customized and optimized with state-of-the-art hearing aid algorithms such as speaker localization, noise reduction, beamforming algorithms, and speech recognition. Specialized and application-specific instructions are designed and added to the baseline instruction set architecture (ISA). Among the major contributions are a multiply-accumulate (MAC) unit for real- and complex-valued numbers, architectures for power reduction during register accesses, co-processors and a low-latency audio interface. With the proposed MAC architecture, the KAVUAKA processor requires 16 % less cycles for the computation of a 128-point fast Fourier transform (FFT) compared to related programmable digital signal processors. The power consumption during register file accesses is decreased by 6 %to 17 % with isolation and by-pass techniques. The hardware-induced audio latency is 34 %lower compared to related audio interfaces for frame size of 64 samples.The final hearing aid system-on-chip (SoC) with four KAVUAKA processor cores and ten co-processors is integrated as an application-specific integrated circuit (ASIC) using a 40 nm low-power technology. The die size is 3.6 mm2. Each of the processors and co-processors contains individual customizations and hardware features with a varying datapath width between 24-bit to 64-bit. The core area of the 64-bit processor configuration is 0.134 mm2. The processors are organized in two clusters that share memory, an audio interface, co-processors and serial interfaces. The average power consumption at a clock speed of 10 MHz is 2.4 mW for SoC and 0.6 mW for the 64-bit processor.Case studies with four reference hearing aid algorithms are used to present and evaluate the proposed hardware architectures and optimizations. The program code for each processor and co-processor is generated and optimized with evolutionary algorithms for operation merging,instruction scheduling and register allocation. The KAVUAKA processor architecture is com-pared to related processor architectures in terms of processing performance, average power consumption, and silicon area requirements

    Design and Implementation of an RF Front-End for Software Defined Radios

    Get PDF
    Software Defined Radios have brought a major reformation in the design standards for radios, in which a large portion of the functionality is implemented through pro­ grammable signal processing devices, giving the radio the ability to change its op­ erating parameters to accommodate new features and capabilities. A software radio approach reduces the content of radio frequency and other analog components of the traditional radios and emphasizes digital signal processing to enhance overall receiver flexibility. Field Programmable Gate Arrays (FPGA) are a suitable technology for the hardware platform as they offer the potential of hardware-like performance coupled with software-like programmability. Software defined radio is a very broad field, encompassing the design of various technologies all the way from the antenna to RF, IF, and baseband digital design. The RF section primarily consists of analog hardware modules. The IF and baseband sections are primarily digital. It is the general process of the radio to convert the incoming signal from RF to IF and then IF to baseband for better signal processing system. In this thesis, some of major building blocks of a Software defined radio are de­ signed and implemented using FPGAs. The design of a Digital front end, which provides the bridge between the baseband and analog RF portions of a wireless receiver, is synthesized. The Digital front end receiver consists of a digital down converter(DDC) which in turn comprises of a direct digital frequency synthesizer (DDFS), a phase accumulator and a low pass filter. The signal processing block of the DDFS is executed using Co-ordinate Rotation Digital Computer (CORDIC) iii Abstract algorithm. Cascaded-Integrator-Comb filters (CIC) are implemented for changing the sample rate of the incoming data. Application of a DDC includes software ra­ dios, multicarrier, multimode digital receivers, micro and pico cell systems,broadband data applications, instrumentation and test equipment and in-building wireless tele­ phony. Also, in this thesis, interfaces for connecting Texas Instruments high speed and high resolution Analog-to-Digital converters (ADC) and Digital-to-Analog converters (DAC) with Xilinx Virtex-5 FPGAs are also implemented and demonstrated

    Hardware Implementation of the Logarithm Function using Improved Parabolic Synthesis

    Get PDF
    This thesis presents a design that approximates the fractional part of the based two logarithm function by using Improved Parabolic Synthesis including its CMOS VLSI implementations. Improved Parabolic Synthesis is a novel methodology in favor of implementing unary functions e.g. trigonometric, logarithm, square root etc. in hardware. It is an evolved approach from Parabolic Synthesis by combining it with Second-Degree Interpolation. In the thesis, the design explores a simple and parallel architecture for fast timing and optimizes wordlengths in computing stages for a small design. The error behavior of the design is described and char- acterized to meet the desired error metrics. This implementation is compared to other approaches e.g. Parabolic Synthesis and CORDIC using 65nm standard cell libraries and it is proved to have better performance in terms of smaller chip area, lower dynamic power, and shorter critical path

    Embedded Machine Learning: Emphasis on Hardware Accelerators and Approximate Computing for Tactile Data Processing

    Get PDF
    Machine Learning (ML) a subset of Artificial Intelligence (AI) is driving the industrial and technological revolution of the present and future. We envision a world with smart devices that are able to mimic human behavior (sense, process, and act) and perform tasks that at one time we thought could only be carried out by humans. The vision is to achieve such a level of intelligence with affordable, power-efficient, and fast hardware platforms. However, embedding machine learning algorithms in many application domains such as the internet of things (IoT), prostheses, robotics, and wearable devices is an ongoing challenge. A challenge that is controlled by the computational complexity of ML algorithms, the performance/availability of hardware platforms, and the application\u2019s budget (power constraint, real-time operation, etc.). In this dissertation, we focus on the design and implementation of efficient ML algorithms to handle the aforementioned challenges. First, we apply Approximate Computing Techniques (ACTs) to reduce the computational complexity of ML algorithms. Then, we design custom Hardware Accelerators to improve the performance of the implementation within a specified budget. Finally, a tactile data processing application is adopted for the validation of the proposed exact and approximate embedded machine learning accelerators. The dissertation starts with the introduction of the various ML algorithms used for tactile data processing. These algorithms are assessed in terms of their computational complexity and the available hardware platforms which could be used for implementation. Afterward, a survey on the existing approximate computing techniques and hardware accelerators design methodologies is presented. Based on the findings of the survey, an approach for applying algorithmic-level ACTs on machine learning algorithms is provided. Then three novel hardware accelerators are proposed: (1) k-Nearest Neighbor (kNN) based on a selection-based sorter, (2) Tensorial Support Vector Machine (TSVM) based on Shallow Neural Networks, and (3) Hybrid Precision Binary Convolution Neural Network (BCNN). The three accelerators offer a real-time classification with monumental reductions in the hardware resources and power consumption compared to existing implementations targeting the same tactile data processing application on FPGA. Moreover, the approximate accelerators maintain a high classification accuracy with a loss of at most 5%

    Efficient arithmetic for high speed DSP implementation on FPGAs

    Get PDF
    The author was sponsored by EnTegra Ltd, a company who develop hardware and software products and services for the real time implementation of DSP and RF systems. The field programmable gate array (FPGA) is being used increasingly in the field of DSP. This is due to the fact that the parallel computing power of such devices is ideal for today’s truly demanding DSP algorithms. Algorithms such as the QR-RLS update are computationally intensive and must be carried out at extremely high speeds (MHz). This means that the DSP processor is simply not an option. ASICs can be used but the expense of developing custom logic is prohibitive. The increased use of the FPGA in DSP means that there is a significant requirement for efficient arithmetic cores that utilises the resources on such devices. This thesis presents the research and development effort that was carried out to produce fixed point division and square root cores for use in a new Electronic Design Automation (EDA) tool for EnTegra, which is targeted at FPGA implementation of DSP systems. Further to this, a new technique for predicting the accuracy of CORDIC systems computing vector magnitudes and cosines/sines is presented. This work allows the most efficient CORDIC design for a specified level of accuracy to be found quickly and easily without the need to run lengthy simulations, as was the case before. The CORDIC algorithm is a technique using mainly shifts and additions to compute many arithmetic functions and is thus ideal for FPGA implementation
    corecore