131 research outputs found
Low Latency Prefix Accumulation Driven Compound MAC Unit for Efficient FIR Filter Implementation
135–138This article presents hierarchical single compound adder-based MAC with assertion based error correction for speculation variations in the prefix addition for FIR filter design. The VLSI implementation of approximation in prefix adder results show a significant delay and complexity reductions, all this at the cost of latency measures when speculation fails during carry propagation, which is the main reason preventing the use of speculation in parallel-prefix adders in DSP applications. The speculative adder which is based on Han Carlson parallel prefix adder structure accomplishes better reduction in latency. Introducing a structured and efficient shift-add technique and explore latency reduction by incorporating approximation in addition. The improvements made in terms of reduction in latency and merits in performance by the proposed MAC unit are showed through the synthesis done by FPGA hardware. Results show that proposed method outpaces both formerly projected MAC designs using multiplication methods for attaining high speed
Design and implementation of high-radix arithmetic systems based on the SDNR/RNS data representation
This project involved the design and implementation of high-radix arithmetic systems based on the hybrid SDNRIRNS data representation. Some real-time applications require a real-time arithmetic system. An SDNR/RNS arithmetic system provides parallel, real-time processing. The advantages and disadvantages of high-radix SDNR/RNS arithmetic, and the feasibility of implementing SDNR/RNS arithmetic systems in CMOS VLSI technology, were investigated in this project. A common methodological model, which included the stages of analysis, design, implementation, testing, and simulation, was followed. The combination of the SDNR and RNS transforms potential complex logic networks into simpler logic blocks. It was found that when constructing a SDNRIRNS adder, factors such as the radix, digit set, and moduli must be taken into account. There are many avenues still to explore. For example, implementing other arithmetic systems in the same CMOS VLSI technology used in this project and comparing them to equivalent SDNR/RNS systems would provide a set of benchmarks. These benchmarks would be useful in addressing issues relating to relative performance
Realizing arbitrary-precision modular multiplication with a fixed-precision multiplier datapath
Within the context of cryptographic hardware, the term scalability refers to the ability to process operands of any size, regardless of the precision of the underlying data path or registers. In this paper we present a simple yet effective technique for increasing the scalability of a fixed-precision Montgomery multiplier. Our idea is to extend the datapath of a Montgomery multiplier in such a way that it can also perform an ordinary multiplication of two n-bit operands (without modular reduction), yielding a 2n-bit result. This
conventional (nxn->2n)-bit multiplication is then used as a “sub-routine” to realize arbitrary-precision Montgomery multiplication according to standard software algorithms such as Coarsely Integrated Operand Scanning (CIOS). We
show that performing a 2n-bit modular multiplication on an n-bit multiplier can be done in 5n clock cycles, whereby we assume that the n-bit modular multiplication takes n cycles. Extending a Montgomery multiplier for this extra
functionality requires just some minor modifications of the datapath and entails a slight increase in silicon area
Number Systems for Deep Neural Network Architectures: A Survey
Deep neural networks (DNNs) have become an enabling component for a myriad of
artificial intelligence applications. DNNs have shown sometimes superior
performance, even compared to humans, in cases such as self-driving, health
applications, etc. Because of their computational complexity, deploying DNNs in
resource-constrained devices still faces many challenges related to computing
complexity, energy efficiency, latency, and cost. To this end, several research
directions are being pursued by both academia and industry to accelerate and
efficiently implement DNNs. One important direction is determining the
appropriate data representation for the massive amount of data involved in DNN
processing. Using conventional number systems has been found to be sub-optimal
for DNNs. Alternatively, a great body of research focuses on exploring suitable
number systems. This article aims to provide a comprehensive survey and
discussion about alternative number systems for more efficient representations
of DNN data. Various number systems (conventional/unconventional) exploited for
DNNs are discussed. The impact of these number systems on the performance and
hardware design of DNNs is considered. In addition, this paper highlights the
challenges associated with each number system and various solutions that are
proposed for addressing them. The reader will be able to understand the
importance of an efficient number system for DNN, learn about the widely used
number systems for DNN, understand the trade-offs between various number
systems, and consider various design aspects that affect the impact of number
systems on DNN performance. In addition, the recent trends and related research
opportunities will be highlightedComment: 28 page
A high-speed integrated circuit with applications to RSA Cryptography
Merged with duplicate record 10026.1/833 on 01.02.2017 by CS (TIS)The rapid growth in the use of computers and networks in government, commercial and
private communications systems has led to an increasing need for these systems to be
secure against unauthorised access and eavesdropping. To this end, modern computer
security systems employ public-key ciphers, of which probably the most well known is the
RSA ciphersystem, to provide both secrecy and authentication facilities.
The basic RSA cryptographic operation is a modular exponentiation where the modulus
and exponent are integers typically greater than 500 bits long. Therefore, to obtain reasonable
encryption rates using the RSA cipher requires that it be implemented in hardware.
This thesis presents the design of a high-performance VLSI device, called the WHiSpER
chip, that can perform the modular exponentiations required by the RSA cryptosystem
for moduli and exponents up to 506 bits long. The design has an expected throughput
in excess of 64kbit/s making it attractive for use both as a general RSA processor within
the security function provider of a security system, and for direct use on moderate-speed
public communication networks such as ISDN.
The thesis investigates the low-level techniques used for implementing high-speed arithmetic
hardware in general, and reviews the methods used by designers of existing modular
multiplication/exponentiation circuits with respect to circuit speed and efficiency.
A new modular multiplication algorithm, MMDDAMMM, based on Montgomery arithmetic,
together with an efficient multiplier architecture, are proposed that remove the
speed bottleneck of previous designs.
Finally, the implementation of the new algorithm and architecture within the WHiSpER
chip is detailed, along with a discussion of the application of the chip to ciphering and key
generation
Radix-8 Booth Encoded Modulo Multiplier
Abstract To design an efficient integrated circuit in terms of area, power and speed, has become a challenging task in modern VLSI design field. The encryption and decryption of PKC algorithms are performed by repeated modulo multiplications these multiplications differ from those encountered in signal processing and general computing applications. The Residue Number System (RNS) has emerged as a promising alternative number representation for the design of faster and low power multipliers owing to its merit to distribute a long integer multiplication into several shorter and independent modulo multiplications. The multipliers are the essential elements of the digital signal processing such as filtering, convolution, transformations and Inner products. RNS has also been successfully employed to design fault tolerant digital circuits. The modulo multiplier is usually the noncritical data path among all modulo multipliers in such high-DR RNS multiplier. This timing slack can be exploited to reduce the system area and power consumption without compromising the system performance. With this precept, a family of radix-8 Booth encoded modulo multipliers, with delay adaptable to the RNS multiplier delay, is proposed. In this paper, the radix-8 Booth encoded modulo multipliers whose delay can be tuned to match the RNS delay. In the proposed multiplier, the hard multiple is implemented using small word-length ripple carry adders (RCAs) operating in parallel. The carry-out bits from the adders are not propagated but treated as partial product bits to be accumulated in the CSA tree. The delay of the modulo multiplier can be directly controlled by the word-length of the RCAs to equal the delay of the critical modulo multiplier of the RNS. By combining radix-8 Booth encoded modulo multiplier, CSA and prefix architecture of multiplier, for high speed and low-power is achieved
Algorithms and VLSI architectures for parametric additive synthesis
A parametric additive synthesis approach to sound synthesis is advantageous as it can model sounds in a large scale manner, unlike the classical sinusoidal additive based synthesis paradigms. It is known that a large body of naturally occurring sounds are resonant in character and thus fit the concept well. This thesis is concerned with the computational optimisation of a super class of form ant synthesis which extends the sinusoidal parameters with a spread parameter known as band width. Here a modified formant algorithm is introduced which can be traced back to work done at IRCAM, Paris. When impulse driven, a filter based approach to modelling a formant limits the computational work-load. It is assumed that the filter's coefficients are fixed at initialisation, thus avoiding interpolation which can cause the filter to become chaotic. A filter which is more complex than a second order section is required. Temporal resolution of an impulse generator is achieved by using a two stage polyphase decimator which drives many filterbanks. Each filterbank describes one formant and is composed of sub-elements which allow variation of the formant’s parameters. A resource manager is discussed to overcome the possibility of all sub- banks operating in unison. All filterbanks for one voice are connected in series to the impulse generator and their outputs are summed and scaled accordingly. An explorative study of number systems for DSP algorithms and their architectures is investigated. I invented a new theoretical mechanism for multi-level logic based DSP. Its aims are to reduce the number of transistors and to increase their functionality. A review of synthesis algorithms and VLSI architectures are discussed in a case study between a filter based bit-serial and a CORDIC based sinusoidal generator. They are both of similar size, but the latter is always guaranteed to be stable
A computer-aided design for digital filter implementation
Imperial Users onl
FIR Filter IC Design Using Redundant Binary Number Systems
Conventional number systems is the weighted fixed positive radix number systems, where signed number uses the sign as a symbol followed by the number part either in magnitude or r’s complement form. Addition of conventional number systems requires carry propagation (serial signal propagation) from LSD to MSD and the addition time depends on word-length, which is the main limitation of the VLSI performance.But Redundant number systems (RNS) is to allow addition of two numbers in which no serial signal propagation is required along the adder; that is, the time duration of the operation is independent of length of the operands and is the time required for the addition of two digits. This is the advantage of RNS over conventional number systems. Because of this advantage, in this thesis it proposed to design an FIR filter based on RNS. In order to implement FIR filter, it is necessary to design adder, multiplier and D-FF. For implementation, the structural blocks are to be designed such as PPM adder, MMP subtractor, D-FF, Digit-serial multiplier.In this thesis, a 368.18MHZ 3-tap FIR filter and 80MHZ Box-car FIR filter be designed based on bottom-up design flow using CADENCE 5.1.41, cadence IC design environment. The design was based on the CMOS 90nm technology process. Bottom level transistors are used from gpdk090 library. The advantages of full custom are maximum circuit performance, minimum design size, and minimum high-volume production cost
- …