668 research outputs found
A bibliography on parallel and vector numerical algorithms
This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming language, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are listed also
Arithmetic Operations in Multi-Valued Logic
This paper presents arithmetic operations like addition, subtraction and
multiplications in Modulo-4 arithmetic, and also addition, multiplication in
Galois field, using multi-valued logic (MVL). Quaternary to binary and binary
to quaternary converters are designed using down literal circuits. Negation in
modular arithmetic is designed with only one gate. Logic design of each
operation is achieved by reducing the terms using Karnaugh diagrams, keeping
minimum number of gates and depth of net in to consideration. Quaternary
multiplier circuit is proposed to achieve required optimization. Simulation
result of each operation is shown separately using Hspice.Comment: 12 Pages, VLSICS Journal 201
A VLSI synthesis of a Reed-Solomon processor for digital communication systems
The Reed-Solomon codes have been widely used in digital communication systems such as computer networks, satellites, VCRs, mobile communications and high- definition television (HDTV), in order to protect digital data against erasures, random and burst errors during transmission. Since the encoding and decoding algorithms for such codes are computationally intensive, special purpose hardware implementations are often required to meet the real time requirements. -- One motivation for this thesis is to investigate and introduce reconfigurable Galois field arithmetic structures which exploit the symmetric properties of available architectures. Another is to design and implement an RS encoder/decoder ASIC which can support a wide family of RS codes. -- An m-programmable Galois field multiplier which uses the standard basis representation of the elements is first introduced. It is then demonstrated that the exponentiator can be used to implement a fast inverter which outperforms the available inverters in GF(2m). Using these basic structures, an ASIC design and synthesis of a reconfigurable Reed-Solomon encoder/decoder processor which implements a large family of RS codes is proposed. The design is parameterized in terms of the block length n, Galois field symbol size m, and error correction capability t for the various RS codes. The design has been captured using the VHDL hardware description language and mapped onto CMOS standard cells available in the 0.8-µm BiCMOS design kits for Cadence and Synopsys tools. The experimental chip contains 218,206 logic gates and supports values of the Galois field symbol size m = 3,4,5,6,7,8 and error correction capability t = 1,2,3, ..., 16. Thus, the block length n is variable from 7 to 255. Error correction t and Galois field symbol size m are pin-selectable. -- Since low design complexity and high throughput are desired in the VLSI chip, the algebraic decoding technique has been investigated instead of the time or transform domain. The encoder uses a self-reciprocal generator polynomial which structures the codewords in a systematic form. At the beginning of the decoding process, received words are initially stored in the first-in-first-out (FIFO) buffer as they enter the syndrome module. The Berlekemp-Massey algorithm is used to determine both the error locator and error evaluator polynomials. The Chien Search and Forney's algorithms operate sequentially to solve for the error locations and error values respectively. The error values are exclusive or-ed with the buffered messages in order to correct the errors, as the processed data leave the chip
FFT for the APE Parallel Computer
We present a parallel FFT algorithm for SIMD systems following the `Transpose
Algorithm' approach. The method is based on the assignment of the data field
onto a 1-dimensional ring of systolic cells. The systolic array can be
universally mapped onto any parallel system. In particular for systems with
next-neighbour connectivity our method has the potential to improve the
efficiency of matrix transposition by use of hyper-systolic communication. We
have realized a scalable parallel FFT on the APE100/Quadrics massively parallel
computer, where our implementation is part of a 2-dimensional hydrodynamics
code for turbulence studies. A possible generalization to 4-dimensional FFT is
presented, having in mind QCD applications.Comment: 17 pages, 13 figures, figures include
Towards hardware acceleration of neuroevolution for multimedia processing applications on mobile devices
This paper addresses the problem of accelerating large artificial neural networks (ANN), whose topology and weights can evolve via the use of a genetic algorithm. The proposed digital hardware architecture is capable of processing any evolved network topology, whilst at the same time providing a good trade off between throughput, area and power consumption. The latter is vital for a longer battery life on mobile devices. The architecture uses multiple parallel arithmetic units in each processing element (PE). Memory partitioning and data caching are used to minimise the effects of PE pipeline stalling. A first order minimax polynomial approximation scheme, tuned via a genetic algorithm, is used for the activation function generator. Efficient arithmetic circuitry, which leverages modified Booth recoding, column compressors and carry save adders, is adopted throughout the design
Area- Efficient VLSI Implementation of Serial-In Parallel-Out Multiplier Using Polynomial Representation in Finite Field GF(2m)
Finite field multiplier is mainly used in elliptic curve cryptography,
error-correcting codes and signal processing. Finite field multiplier is
regarded as the bottleneck arithmetic unit for such applications and it is the
most complicated operation over finite field GF(2m) which requires a huge
amount of logic resources. In this paper, a new modified serial-in parallel-out
multiplication algorithm with interleaved modular reduction is suggested. The
proposed method offers efficient area architecture as compared to proposed
algorithms in the literature. The reduced finite field multiplier complexity is
achieved by means of utilizing logic NAND gate in a particular architecture.
The efficiency of the proposed architecture is evaluated based on criteria such
as time (latency, critical path) and space (gate-latch number) complexity. A
detailed comparative analysis indicates that, the proposed finite field
multiplier based on logic NAND gate outperforms previously known resultsComment: 19 pages, 4 figure
Systolic array implementation of Euclid's algorithm for inversion and division in GF(2m)
[[abstract]]This paper presents a new systolic VLSI architecture for computing inverses and divisions in finite fields GF(2m) based on a variant of Euclid's algorithm. It is highly regular, modular, and thus well suited to VLSI implementation. It has O(m2) area complexity and can produce one result per clock cycle with a latency of 8m-2 clock cycles. As compared to existing related systolic architectures with the same throughput performance, the proposed one gains a significant improvement in area complexity[[fileno]]2030102030060[[department]]電機工程學
Speeding up a scalable modular inversion hardware architecture
The modular inversion is a fundamental process in several cryptographic systems.
It can be computed in software or hardware, but hardware computation proven to be
faster and more secure. This research focused on improving an old scalable inversion
hardware architecture proposed in 2004 for finite field GF(p). The architecture has
been made of two parts, a computing unit and a memory unit. The memory unit is to
hold all the data bits of computation whereas the computing unit performs all the
arithmetic operations in word (digit) by word bases known as scalable method.
The main objective of this project was to investigate the cost and benefit of
modifying the memory unit to include parallel shifting, which was one of the tasks of
the scalable computing unit. The study included remodeling the entire hardware
architecture removing the shifter from the scalable computing part embedding it in
the memory unit instead. This modification resulted in a speedup to the complete
inversion process with an area increase due to the new memory shifting unit.
Quantitative measurements of the speed area trade-off have been investigated. The
results showed that the extra hardware to be added for this modification compared to
the speedup gained, giving the user the complete picture to choose from depending on
the application need.the British council in Saudi Arabia, KFUPM, Dr. Tatiana Kalganova at the Electrical &
Computer Engineering Department of Brunel University in Uxbridg
- …