904 research outputs found
Recommended from our members
Two-dimensional DCT/IDCT architecture
A fully parallel architecture for the computation of a two-dimensional (2-D) discrete cosine transform (DCT), based on row-column decomposition is presented. It uses the same one dimensional (1-D) DCT unit for the row and column computations and (N2+N) registers to perform the transposition. It possesses features of regularity and modularity, and is thus well suited for VLSI implementation. It can be used for the computation of either the forward or the inverse 2-D DCT. Each 1-D DCT unit uses N fully parallel vector inner product (VIP) units. The design of the VIP units is based on a systematic design methodology using radix-2â arithmetic, which allows partitioning of the elements of each vector into small groups. Array multipliers without the final adder are used to produce the different partial product terms. This allows a more efficient use of 4:2 compressors for the accumulation of the products in the intermediate stages and reduces the number of accumulators from N to one. Using this procedure, the 2-D DCT architecture requires less than N2 multipliers (in terms of area occupied) and only 2N adders. It can compute a N x N-point DCT at a rate of one complete transform per N cycles after an appropriate initial delay
A 64-point Fourier transform chip for high-speed wireless LAN application using OFDM
In this article, we present a novel fixed-point 16-bit word-width 64-point FFT/IFFT processor developed primarily for the application in the OFDM based IEEE 802.11a Wireless LAN (WLAN) baseband processor. The 64-point FFT is realized by decomposing it into a 2-D structure of 8-point FFTs. This approach reduces the number of required complex multiplications compared to the conventional radix-2 64-point FFT algorithm. The complex multiplication operations are realized using shift-and-add operations. Thus, the processor does not use any 2-input digital multiplier. It also does not need any RAM or ROM for internal storage of coefficients. The proposed 64-point FFT/IFFT processor has been fabricated and tested successfully using our in-house 0.25 ?m BiCMOS technology. The core area of this chip is 6.8 mm2. The average dynamic power consumption is 41 mW @ 20 MHz operating frequency and 1.8 V supply voltage. The processor completes one parallel-to-parallel (i. e., when all input data are available in parallel and all output data are generated in parallel) 64-point FFT computation in 23 cycles. These features show that though it has been developed primarily for application in the IEEE 802.11a standard, it can be used for any application that requires fast operation as well as low power consumption
Architectures for block Toeplitz systems
In this paper efficient VLSI architectures of highly concurrent algorithms for the solution of block linear systems with Toeplitz or near-to-Toeplitz entries are presented. The main features of the proposed scheme are the use of scalar only operations, multiplications/divisions and additions, and the local communication which enables the development of wavefront array architecture. Both the mean squared error and the total squared error formulations are described and a variety of implementations are given
A comparison of VLSI architectures for time and transform domain decoding of Reed-Solomon codes
It is well known that the Euclidean algorithm or its equivalent, continued fractions, can be used to find the error locator polynomial needed to decode a Reed-Solomon (RS) code. It is shown that this algorithm can be used for both time and transform domain decoding by replacing its initial conditions with the Forney syndromes and the erasure locator polynomial. By this means both the errata locator polynomial and the errate evaluator polynomial can be obtained with the Euclidean algorithm. With these ideas, both time and transform domain Reed-Solomon decoders for correcting errors and erasures are simplified and compared. As a consequence, the architectures of Reed-Solomon decoders for correcting both errors and erasures can be made more modular, regular, simple, and naturally suitable for VLSI implementation
Bit-level pipelined digit-serial array processors
A new architecture for high performance digit-serial vector inner product (VIP) which can be pipelined to the bit-level is introduced. The design of the digit-serial vector inner product is based on a new systematic design methodology using radix-2n arithmetic. The proposed architecture allows a high level of bit-level pipelining to increase the throughput rate with minimum initial delay and minimum area. This will give designers greater flexibility in finding the best tradeoff between hardware cost and throughput rate. It is shown that sub-digit pipelined digit-serial structure can achieve a higher throughput rate with much less area consumption than an equivalent bit-parallel structure. A twin-pipe architecture to double the throughput rate of digit-serial multipliers and consequently that of the digit-serial vector inner product is also presented. The effect of the number of pipelining levels and the twin-pipe architecture on the throughput rate and hardware cost are discussed. A two's complement digit-serial architecture which can operate on both negative and positive numbers is also presented
A compact multi-chip-module implementation of a multi-precision neural network classifier
This paper describes a novel MCM digital implementation of a reconfigurable multi-precision neural network classifier. The design is based on a scalable systolic architecture with a user defined topology and arithmetic precision of the neural network. Indeed, the MCM integrates 64/32/16 neurons with a corresponding accuracy of 4/8/16-bits. A prototype has been designed and successfully tested in CMOS 0.7 ÎŒm technolog
- âŠ