72 research outputs found
Complexity Reduction in the CORDIC Algorithm by using MUXes
Nowadays, the CORDIC algorithm plays an important role to deal with the non-linear functions in hardware. In this thesis, a novel methodology is described to reduce the complexity in an unrolled CORDIC architecture, which gives higher speed, lesser area, and lower power consumption. That is, MUXes are used to replace adder stages. Five different unrolled CORDIC architectures have been implemented in ASIC using a 65nm CMOS technology with Low Power High V_T transistors. The area, computational speed, accuracy, error behavior, and power consumption have been analyzed. The design aim is to reduce the power consumption, which is more and more important depending on the area. As a result the area and power consumption get 7.9% lower and 27.2% lower separately, and the speed is 22.9% higher compared to the original unrolled CORDIC architecture
Design and analysis of optimized CORDIC based GMSK system on FPGA platform
The Gaussian minimum shift keying (GMSK) is one of the best suited digital modulation schemes in the global system for mobile communication (GSM) because of its constant envelop and spectral efficiency characteristics. Most of the conventional GMSK approaches failed to balance the digital modulation with efficient usage of spectrum. In this article, the hardware architecture of the optimized CORDIC-based GMSK system is designed, which includes GMSK Modulation with the channel and GMSK Demodulation. The modulation consists of non-return zero (NRZ) encoder, an integrator followed by Gaussian filtering and frequency modulation (FM). The GMSK demodulation consists of FM demodulator, followed by differentiation and NRZ decoder. The FM Modulation and demodulation use the optimized CORDIC model for an In-phase (I) and quadrature (Q) phase generation. The optimized CORDIC is designed by using quadrant mapping and pipelined structure to improve the hardware and computational complexity in GMSK systems. The GMSK system is designed on the Xilinx platform and implemented on Artix-7 and Spartan-3EFPGA. The hardware constraints like area, power, and timing utilization are summarized. The comparison of the optimized CORDIC model with similar CORDIC approaches is tabulated with improvements
KAVUAKA: a low-power application-specific processor architecture for digital hearing aids
The power consumption of digital hearing aids is very restricted due to their small physical size and the available hardware resources for signal processing are limited. However, there is a demand for more processing performance to make future hearing aids more useful and smarter. Future hearing aids should be able to detect, localize, and recognize target speakers in complex acoustic environments to further improve the speech intelligibility of the individual hearing aid user. Computationally intensive algorithms are required for this task. To maintain acceptable battery life, the hearing aid processing architecture must be highly optimized for extremely low-power consumption and high processing performance.The integration of application-specific instruction-set processors (ASIPs) into hearing aids enables a wide range of architectural customizations to meet the stringent power consumption and performance requirements. In this thesis, the application-specific hearing aid processor KAVUAKA is presented, which is customized and optimized with state-of-the-art hearing aid algorithms such as speaker localization, noise reduction, beamforming algorithms, and speech recognition. Specialized and application-specific instructions are designed and added to the baseline instruction set architecture (ISA). Among the major contributions are a multiply-accumulate (MAC) unit for real- and complex-valued numbers, architectures for power reduction during register accesses, co-processors and a low-latency audio interface. With the proposed MAC architecture, the KAVUAKA processor requires 16 % less cycles for the computation of a 128-point fast Fourier transform (FFT) compared to related programmable digital signal processors. The power consumption during register file accesses is decreased by 6 %to 17 % with isolation and by-pass techniques. The hardware-induced audio latency is 34 %lower compared to related audio interfaces for frame size of 64 samples.The final hearing aid system-on-chip (SoC) with four KAVUAKA processor cores and ten co-processors is integrated as an application-specific integrated circuit (ASIC) using a 40 nm low-power technology. The die size is 3.6 mm2. Each of the processors and co-processors contains individual customizations and hardware features with a varying datapath width between 24-bit to 64-bit. The core area of the 64-bit processor configuration is 0.134 mm2. The processors are organized in two clusters that share memory, an audio interface, co-processors and serial interfaces. The average power consumption at a clock speed of 10 MHz is 2.4 mW for SoC and 0.6 mW for the 64-bit processor.Case studies with four reference hearing aid algorithms are used to present and evaluate the proposed hardware architectures and optimizations. The program code for each processor and co-processor is generated and optimized with evolutionary algorithms for operation merging,instruction scheduling and register allocation. The KAVUAKA processor architecture is com-pared to related processor architectures in terms of processing performance, average power consumption, and silicon area requirements
Design and Implementation of an RF Front-End for Software Defined Radios
Software Defined Radios have brought a major reformation in the design standards for radios, in which a large portion of the functionality is implemented through pro grammable signal processing devices, giving the radio the ability to change its op erating parameters to accommodate new features and capabilities. A software radio approach reduces the content of radio frequency and other analog components of the traditional radios and emphasizes digital signal processing to enhance overall receiver flexibility. Field Programmable Gate Arrays (FPGA) are a suitable technology for the hardware platform as they offer the potential of hardware-like performance coupled with software-like programmability.
Software defined radio is a very broad field, encompassing the design of various technologies all the way from the antenna to RF, IF, and baseband digital design. The RF section primarily consists of analog hardware modules. The IF and baseband sections are primarily digital. It is the general process of the radio to convert the incoming signal from RF to IF and then IF to baseband for better signal processing system.
In this thesis, some of major building blocks of a Software defined radio are de signed and implemented using FPGAs. The design of a Digital front end, which provides the bridge between the baseband and analog RF portions of a wireless receiver, is synthesized. The Digital front end receiver consists of a digital down converter(DDC) which in turn comprises of a direct digital frequency synthesizer (DDFS), a phase accumulator and a low pass filter. The signal processing block
of the DDFS is executed using Co-ordinate Rotation Digital Computer (CORDIC) iii
Abstract
algorithm. Cascaded-Integrator-Comb filters (CIC) are implemented for changing the sample rate of the incoming data. Application of a DDC includes software ra dios, multicarrier, multimode digital receivers, micro and pico cell systems,broadband data applications, instrumentation and test equipment and in-building wireless tele phony. Also, in this thesis, interfaces for connecting Texas Instruments high speed and high resolution Analog-to-Digital converters (ADC) and Digital-to-Analog converters (DAC) with Xilinx Virtex-5 FPGAs are also implemented and demonstrated
Recommended from our members
Energy Efficient Loop Unrolling for Low-Cost FPGAs
Many embedded applications implement block ciphers and sorting and searching algorithms which use multiple loop iterations for computation. These applications often demand low power operation. The power consumption of designs varies with the implementation choices made by designers. The sequential implementation of loop operations consumes minimal area, but latency and clock power are high. Alternatively, loop unrolling causes high glitch power. In this work, we propose a low area overhead approach for unrolling loop iterations that exhibits reduced glitch power. A latch based glitch filter is introduced that reduces the propagation of glitches from one iteration to next. We explore the optimal number of filters to be inserted for different applications that give a good balance between area and power. We also implement partial unrolling with glitch filters. This approach consumes less area while still giving energy savings comparable to the fully unrolled implementation.
Our approach is targeted to Xilinx and Altera FPGAs. We simulate different implementation choices and compare energy results to evaluate the savings. We demonstrate our approach on SIMON-128 and AES-256 block ciphers and a sorting algorithm. We prototype our design on Xilinx Artix-7 and Altera Cyclone-IV-GX FPGA development boards and measure the actual power savings. Results show up-to 90% dynamic energy reduction in Xilinx designs, and 97% reduction in Altera designs with our glitch filtering approach due to glitch power reduction
Hardware Implementation of the Logarithm Function using Improved Parabolic Synthesis
This thesis presents a design that approximates the fractional part of the based two logarithm function by using Improved Parabolic Synthesis including its CMOS VLSI implementations. Improved Parabolic Synthesis is a novel methodology in favor of implementing unary functions e.g. trigonometric, logarithm, square root etc. in hardware. It is an evolved approach from Parabolic Synthesis by combining it with Second-Degree Interpolation. In the thesis, the design explores a simple and parallel architecture for fast timing and optimizes wordlengths in computing stages for a small design. The error behavior of the design is described and char- acterized to meet the desired error metrics. This implementation is compared to other approaches e.g. Parabolic Synthesis and CORDIC using 65nm standard cell libraries and it is proved to have better performance in terms of smaller chip area, lower dynamic power, and shorter critical path
Embedded Machine Learning: Emphasis on Hardware Accelerators and Approximate Computing for Tactile Data Processing
Machine Learning (ML) a subset of Artificial Intelligence (AI) is driving the industrial
and technological revolution of the present and future. We envision a world with smart
devices that are able to mimic human behavior (sense, process, and act) and perform
tasks that at one time we thought could only be carried out by humans. The vision
is to achieve such a level of intelligence with affordable, power-efficient, and fast hardware
platforms. However, embedding machine learning algorithms in many application domains
such as the internet of things (IoT), prostheses, robotics, and wearable devices is an ongoing
challenge. A challenge that is controlled by the computational complexity of ML algorithms,
the performance/availability of hardware platforms, and the application\u2019s budget (power
constraint, real-time operation, etc.). In this dissertation, we focus on the design and
implementation of efficient ML algorithms to handle the aforementioned challenges. First, we
apply Approximate Computing Techniques (ACTs) to reduce the computational complexity of
ML algorithms. Then, we design custom Hardware Accelerators to improve the performance
of the implementation within a specified budget. Finally, a tactile data processing application
is adopted for the validation of the proposed exact and approximate embedded machine
learning accelerators.
The dissertation starts with the introduction of the various ML algorithms used for
tactile data processing. These algorithms are assessed in terms of their computational
complexity and the available hardware platforms which could be used for implementation.
Afterward, a survey on the existing approximate computing techniques and hardware
accelerators design methodologies is presented. Based on the findings of the survey, an
approach for applying algorithmic-level ACTs on machine learning algorithms is provided.
Then three novel hardware accelerators are proposed: (1) k-Nearest Neighbor (kNN) based
on a selection-based sorter, (2) Tensorial Support Vector Machine (TSVM) based on Shallow
Neural Networks, and (3) Hybrid Precision Binary Convolution Neural Network (BCNN).
The three accelerators offer a real-time classification with monumental reductions in the
hardware resources and power consumption compared to existing implementations targeting
the same tactile data processing application on FPGA. Moreover, the approximate accelerators
maintain a high classification accuracy with a loss of at most 5%
Efficient arithmetic for high speed DSP implementation on FPGAs
The author was sponsored by EnTegra Ltd, a company who develop hardware and software products and services for the real time implementation of DSP and RF systems.
The field programmable gate array (FPGA) is being used increasingly in the field of DSP. This is due to the fact that the parallel computing power of such devices is ideal for today’s truly demanding DSP algorithms. Algorithms such as the QR-RLS update are computationally intensive and must be carried out at extremely high speeds (MHz). This means that the DSP processor is simply not an option. ASICs can be used but the expense of developing custom logic is prohibitive.
The increased use of the FPGA in DSP means that there is a significant requirement for efficient arithmetic cores that utilises the resources on such devices. This thesis presents the research and development effort that was carried out to produce fixed point division and square root cores for use in a new Electronic Design Automation (EDA) tool for EnTegra, which is targeted at FPGA implementation of DSP systems. Further to this, a new technique for predicting the accuracy of CORDIC systems computing vector magnitudes and cosines/sines is presented. This work allows the most efficient CORDIC design for a specified level of accuracy to be found quickly and easily without the need to run lengthy simulations, as was the case before. The CORDIC algorithm is a technique using mainly shifts and additions to compute many arithmetic functions and is thus ideal for FPGA implementation
- …