817 research outputs found
A 16-bit CORDIC rotator for high-performance wireless LAN
In this paper we propose a novel 16-bit low power CORDIC rotator that is used for high-speed wireless LAN. The algorithm converges to the final target angle by adaptively selecting appropriate iteration steps while keeping the scale factor virtually constant. The VLSI architecture of the proposed design eliminates the entire arithmetic hardware in the angle approximation datapath and reduces the number of iterations by 50% on an average. The cell area of the processor is 0.7 mm2 and it dissipates 7 mW power at 20 MHz frequency
On the hardware reduction of z-datapath of vectoring CORDIC
In this article we present a novel design of a hardware optimal vectoring CORDIC processor. We present a mathematical theory to show that using bipolar binary notation it is possible to eliminate all the arithmetic computations required along the z-datapath. Using this technique it is possible to achieve three and 1.5 times reduction in the number of registers and adder respectively compared to conventional CORDIC. Following this, a 16-bit vectoring CORDIC is designed for the application in Synchronizer for IEEE 802.11a standard. The total area and dynamic power consumption of the processor is 0.14 mm2 and 700?W respectively when synthesized in 0.18?m CMOS library which shows its effectiveness as a low-area low-power processor
A VLSI Array Architecture for Realization of DFT, DHT, DCT and DST
A unified array architecture is described for computation of DFT, DHT, DCT and DST using a modified CORDIC (CoOrdinate Rotation DIgital Computer) arithmetic unit as the basic Processing Element (PE). All these four transforms can be computed by simple rearrangement of input samples. Compared to five other existing architectures, this one has the advantage in speed in terms of latency and throughput. Moreover, the simple local neighborhood interprocessor connections make it convenient for VLSI implementation. The architecture can be extended to compute transformation of longer length by judicially cascading the modules of shorter transformation length which will be suitable for Wafer Scale Integration (WSI). CORDIC is designed using Transmission Gate Logic (TGL) on sea of gates semicustom environment. Simulation results show that this architecture may be a suitable candidate for low power/low voltage applications
FPGA Implementation of Spectral Subtraction for In-Car Speech Enhancement and Recognition
The use of speech recognition in noisy environments requires the use of speech enhancement algorithms in order to improve recognition performance. Deploying these enhancement techniques requires significant engineering to ensure algorithms are realisable in electronic hardware. This paper describes the design decisions and process to port the popular spectral subtraction algorithm to a Virtex-4 field-programmable gate array (FPGA) device. Resource analysis shows the final design uses only 13% of the total available FPGA resources. Waveforms and spectrograms presented support the validity of the proposed FPGA design
Computational structures for robotic computations
The computational problem of inverse kinematics and inverse dynamics of robot manipulators by taking advantage of parallelism and pipelining architectures is discussed. For the computation of inverse kinematic position solution, a maximum pipelined CORDIC architecture has been designed based on a functional decomposition of the closed-form joint equations. For the inverse dynamics computation, an efficient p-fold parallel algorithm to overcome the recurrence problem of the Newton-Euler equations of motion to achieve the time lower bound of O(log sub 2 n) has also been developed
A low-power geometric mapping co-processor for high-speed graphics application
In this article we present a novel design of a low-power geometric mapping co-processor that can be used for high-performance graphics system. The processor can carry out any single or a combination of transformations belonging to affine transformation family ranging from 1-D to 3-D. It allows interactive operations which can be defined either by a user (allowing it to be a stand-alone geometric transformation processor) or by a host processor (allowing it to be a co-processor to accelerate certain graphics operations). It occupies a silicon area of 6 mm2 and consumes 40 mW power when synthesized with 0.25?m technology
DeepSecure: Scalable Provably-Secure Deep Learning
This paper proposes DeepSecure, a novel framework that enables scalable
execution of the state-of-the-art Deep Learning (DL) models in a
privacy-preserving setting. DeepSecure targets scenarios in which neither of
the involved parties including the cloud servers that hold the DL model
parameters or the delegating clients who own the data is willing to reveal
their information. Our framework is the first to empower accurate and scalable
DL analysis of data generated by distributed clients without sacrificing the
security to maintain efficiency. The secure DL computation in DeepSecure is
performed using Yao's Garbled Circuit (GC) protocol. We devise GC-optimized
realization of various components used in DL. Our optimized implementation
achieves more than 58-fold higher throughput per sample compared with the
best-known prior solution. In addition to our optimized GC realization, we
introduce a set of novel low-overhead pre-processing techniques which further
reduce the GC overall runtime in the context of deep learning. Extensive
evaluations of various DL applications demonstrate up to two
orders-of-magnitude additional runtime improvement achieved as a result of our
pre-processing methodology. This paper also provides mechanisms to securely
delegate GC computations to a third party in constrained embedded settings
- âŠ