121 research outputs found
Improved Memoryless RNS Forward Converter Based on the Periodicity of Residues
The residue number system (RNS) is suitable for DSP architectures because of its ability to perform fast carry-free arithmetic. However, this advantage is over-shadowed by the complexity involved in the conversion of numbers between binary and RNS representations. Although the reverse conversion (RNS to binary) is more complex, the forward transformation is not simple either. Most forward converters make use of look-up tables (memory). Recently, a memoryless forward converter architecture for arbitrary moduli sets was proposed by Premkumar in 2002. In this paper, we present an extension to that architecture which results in 44% less hardware for parallel conversion and achieves 43% improvement in speed for serial conversions. It makes use of the periodicity properties of residues obtained using modular exponentiation
Residue Number System Based Building Blocks for Applications in Digital Signal Processing
Předkládaná disertační práce se zabývá návrhem základních bloků v systému zbytkových tříd pro zvýšení výkonu aplikací určených pro digitální zpracování signálů (DSP). Systém zbytkových tříd (RNS) je neváhová číselná soustava, jež umožňuje provádět paralelizovatelné, vysokorychlostní, bezpečné a proti chybám odolné aritmetické operace, které jsou zpracovávány bez přenosu mezi řády. Tyto vlastnosti jej činí značně perspektivním pro použití v DSP aplikacích náročných na výpočetní výkon a odolných proti chybám. Typický RNS systém se skládá ze tří hlavních částí: převodníku z binárního kódu do RNS, který počítá ekvivalent vstupních binárních hodnot v systému zbytkových tříd, dále jsou to paralelně řazené RNS aritmetické jednotky, které provádějí aritmetické operace s operandy již převedenými do RNS. Poslední část pak tvoří převodník z RNS do binárního kódu, který převádí výsledek zpět do výchozího binárního kódu. Hlavním cílem této disertační práce bylo navrhnout nové struktury základních bloků výše zmiňovaného systému zbytkových tříd, které mohou být využity v aplikacích DSP. Tato disertační práce předkládá zlepšení a návrhy nových struktur komponent RNS, simulaci a také ověření jejich funkčnosti prostřednictvím implementace v obvodech FPGA. Kromě návrhů nové struktury základních komponentů RNS je prezentován také podrobný výzkum různých sad modulů, který je srovnává a determinuje nejefektivnější sadu pro různé dynamické rozsahy. Dalším z klíčových přínosů disertační práce je objevení a ověření podmínky určující výběr optimální sady modulů, která umožňuje zvýšit výkonnost aplikací DSP. Dále byla navržena aplikace pro zpracování obrazu využívající RNS, která má vůči klasické binární implementanci nižší spotřebu a vyšší maximální pracovní frekvenci. V závěru práce byla vyhodnocena hlavní kritéria při rozhodování, zda je vhodnější pro danou aplikaci využít binární číselnou soustavu nebo RNS.This doctoral thesis deals with designing residue number system based building blocks to enhance the performance of digital signal processing applications. The residue number system (RNS) is a non-weighted number system that provides carry-free, parallel, high speed, secure and fault tolerant arithmetic operations. These features make it very attractive to be used in high-performance and fault tolerant digital signal processing (DSP) applications. A typical RNS system consists of three main components; the first one is the binary to residue converter that computes the RNS equivalent of the inputs represented in the binary number system. The second component in this system is parallel residue arithmetic units that perform arithmetic operations on the operands already represented in RNS. The last component is the residue to binary converter, which converts the outputs back into their binary representation. The main aim of this thesis was to propose novel structures of the basic components of this system in order to be later used as fundamental units in DSP applications. This thesis encloses improving and designing novel structures of these components, simulating and verifying their efficiency via FPGA implementation. In addition to suggesting novel structures of basic RNS components, a detailed study on different moduli sets that compares and determines the most efficient one for different dynamic range requirements is also presented. One of the main outcomes of this thesis is concluding and verifying the main condition that should be met when choosing a moduli set, in order to improve the timing performance of a DSP application. An RNS-based image processing application is also proposed. Its efficiency, in terms of timing performance and power consumption, is proved via comparing it with a binary-based one. Finally, the main considerations that should be taken into account when choosing to use the binary number system or RNS are also discussed in details.
Optimization of new Chinese Remainder theorems using special moduli sets
The residue number system (RNS) is an integer number representation system, which is capable of supporting parallel, high-speed arithmetic. This system also offers some useful properties for error detection, error correction and fault tolerance. It has numerous applications in computation-intensive digital signal processing (DSP) operations, like digital filtering, convolution, correlation, Discrete Fourier Transform, Fast Fourier Transform, direct digital frequency synthesis, etc. The residue to binary conversion is based on Chinese Remainder Theorem (CRT) and Mixed Radix Conversion (MRC). However, the CRT requires a slow large modulo operation while the MRC requires finding the mixed radix digits which is a slow process. The new Chinese Remainder Theorems (CRT I, CRT II and CRT III) make the computations faster and efficient without any extra overheads. But, New CRTs are hardware intensive as they require many inverse modulus operators, modulus operators, multipliers and dividers. Dividers and inverse modulus operators in turn needs many half and full adders and subtractors. So, some kind of optimization is necessary to implement these theorems practically. In this research, for the optimization, new both co-prime and non co-prime multi modulus sets are proposed that simplify the new Chinese Remainder theorems by eliminating the huge summations, inverse modulo operators, and dividers. Furthermore, the proposed hardware optimization removes the multiplication terms in the theorems, which further simplifies the implementation
Application-Specific Number Representation
Reconfigurable devices, such as Field Programmable Gate Arrays (FPGAs), enable application-
specific number representations. Well-known number formats include fixed-point, floating-
point, logarithmic number system (LNS), and residue number system (RNS). Such different
number representations lead to different arithmetic designs and error behaviours, thus produc-
ing implementations with different performance, accuracy, and cost.
To investigate the design options in number representations, the first part of this thesis presents
a platform that enables automated exploration of the number representation design space. The
second part of the thesis shows case studies that optimise the designs for area, latency or
throughput from the perspective of number representations.
Automated design space exploration in the first part addresses the following two major issues:
² Automation requires arithmetic unit generation. This thesis provides optimised
arithmetic library generators for logarithmic and residue arithmetic units, which support
a wide range of bit widths and achieve significant improvement over previous designs.
² Generation of arithmetic units requires specifying the bit widths for each
variable. This thesis describes an automatic bit-width optimisation tool called R-Tool,
which combines dynamic and static analysis methods, and supports different number
systems (fixed-point, floating-point, and LNS numbers).
Putting it all together, the second part explores the effects of application-specific number
representation on practical benchmarks, such as radiative Monte Carlo simulation, and seismic
imaging computations. Experimental results show that customising the number representations
brings benefits to hardware implementations: by selecting a more appropriate number format,
we can reduce the area cost by up to 73.5% and improve the throughput by 14.2% to 34.1%; by
performing the bit-width optimisation, we can further reduce the area cost by 9.7% to 17.3%.
On the performance side, hardware implementations with customised number formats achieve
5 to potentially over 40 times speedup over software implementations
Vers une arithmétique efficace pour le chiffrement homomorphe basé sur le Ring-LWE
Fully homomorphic encryption is a kind of encryption offering the ability to manipulate encrypted data directly through their ciphertexts. In this way it is possible to process sensitive data without having to decrypt them beforehand, ensuring therefore the datas' confidentiality. At the numeric and cloud computing era this kind of encryption has the potential to considerably enhance privacy protection. However, because of its recent discovery by Gentry in 2009, we do not have enough hindsight about it yet. Therefore several uncertainties remain, in particular concerning its security and efficiency in practice, and should be clarified before an eventual widespread use. This thesis deals with this issue and focus on performance enhancement of this kind of encryption in practice. In this perspective we have been interested in the optimization of the arithmetic used by these schemes, either the arithmetic underlying the Ring Learning With Errors problem on which the security of these schemes is based on, or the arithmetic specific to the computations required by the procedures of some of these schemes. We have also considered the optimization of the computations required by some specific applications of homomorphic encryption, and in particular for the classification of private data, and we propose methods and innovative technics in order to perform these computations efficiently. We illustrate the efficiency of our different methods through different software implementations and comparisons to the related art.Le chiffrement totalement homomorphe est un type de chiffrement qui permet de manipuler directement des données chiffrées. De cette manière, il est possible de traiter des données sensibles sans avoir à les déchiffrer au préalable, permettant ainsi de préserver la confidentialité des données traitées. À l'époque du numérique à outrance et du "cloud computing" ce genre de chiffrement a le potentiel pour impacter considérablement la protection de la vie privée. Cependant, du fait de sa découverte récente par Gentry en 2009, nous manquons encore de recul à son propos. C'est pourquoi de nombreuses incertitudes demeurent, notamment concernant sa sécurité et son efficacité en pratique, et devront être éclaircies avant une éventuelle utilisation à large échelle.Cette thèse s'inscrit dans cette problématique et se concentre sur l'amélioration des performances de ce genre de chiffrement en pratique. Pour cela nous nous sommes intéressés à l'optimisation de l'arithmétique utilisée par ces schémas, qu'elle soit sous-jacente au problème du "Ring-Learning With Errors" sur lequel la sécurité des schémas considérés est basée, ou bien spécifique aux procédures de calculs requises par certains de ces schémas. Nous considérons également l'optimisation des calculs nécessaires à certaines applications possibles du chiffrement homomorphe, et en particulier la classification de données privées, de sorte à proposer des techniques de calculs innovantes ainsi que des méthodes pour effectuer ces calculs de manière efficace. L'efficacité de nos différentes méthodes est illustrée à travers des implémentations logicielles et des comparaisons aux techniques de l'état de l'art
Montgomery and RNS for RSA Hardware Implementation
There are many architectures for RSA hardware implementation which improve its performance. Two main methods for this purpose are Montgomery and RNS. These are fast methods to convert plaintext to ciphertext in RSA algorithm with hardware implementation. RNS is faster than Montgomery but it uses more area. The goal of this paper is to compare these two methods based on the speed and on the used area. For this purpose the architecture that has a better performance for each method is selected, and some modification is done to enhance their performance. This comparison can be used to select the proper method for hardware implementation in both FPGA and ASIC design
Architectures and implementations for the Polynomial Ring Engine over small residue rings
This work considers VLSI implementations for the recently introduced Polynomial Ring Engine (PRE) using small residue rings. To allow for a comprehensive approach to the implementation of the PRE mappings for DSP algorithms, this dissertation introduces novel techniques ranging from system level architectures to transistor level considerations. The Polynomial Ring Engine combines both classical residue mappings and new polynomial mappings. This dissertation develops a systematic approach for generating pipelined systolic/ semi-systolic structures for the PRE mappings. An example architecture is constructed and simulated to illustrate the properties of the new architectures. To simultaneously achieve large computational dynamic range and high throughput rate the basic
building blocks of the PRE architecture use transistor size profiling. Transistor sizing software is developed for profiling the Switching Tree dynamic logic used to build the basic modulo blocks. The software handles complex nFET structures using a simple iterative algorithm. Issues such as convergence of the iterative technique and validity of the sizing formulae have been treated with an appropriate mathematical analysis.
As an illustration of the use of PRE architectures for modem DSP computational problems, a Wavelet Transform for HDTV image compression is implemented. An interesting use is made of the PRE technique of using polynomial indeterminates as \u27placeholders\u27 for components of the processed data. In this case we use an indeterminate to symbolically handle the irrational number [square root of 3] of the Daubechie mother wavelet for N = 4.
Finally, a multi-level fault tolerant PRE architecture is developed by combining the classical redundant residue approach and the circuit parity check approach. The proposed architecture uses syndromes to correct faulty residue channels and an embedded parity check to correct faulty computational channels. The architecture offers superior fault detection and correction with online data interruption
Number Systems for Deep Neural Network Architectures: A Survey
Deep neural networks (DNNs) have become an enabling component for a myriad of
artificial intelligence applications. DNNs have shown sometimes superior
performance, even compared to humans, in cases such as self-driving, health
applications, etc. Because of their computational complexity, deploying DNNs in
resource-constrained devices still faces many challenges related to computing
complexity, energy efficiency, latency, and cost. To this end, several research
directions are being pursued by both academia and industry to accelerate and
efficiently implement DNNs. One important direction is determining the
appropriate data representation for the massive amount of data involved in DNN
processing. Using conventional number systems has been found to be sub-optimal
for DNNs. Alternatively, a great body of research focuses on exploring suitable
number systems. This article aims to provide a comprehensive survey and
discussion about alternative number systems for more efficient representations
of DNN data. Various number systems (conventional/unconventional) exploited for
DNNs are discussed. The impact of these number systems on the performance and
hardware design of DNNs is considered. In addition, this paper highlights the
challenges associated with each number system and various solutions that are
proposed for addressing them. The reader will be able to understand the
importance of an efficient number system for DNN, learn about the widely used
number systems for DNN, understand the trade-offs between various number
systems, and consider various design aspects that affect the impact of number
systems on DNN performance. In addition, the recent trends and related research
opportunities will be highlightedComment: 28 page
Application of the residue number system to the matrix multiplication problem
Due to the character of the original source materials and the nature of batch digitization, quality control issues may be present in this document. Please report any quality issues you encounter to [email protected], referencing the URI of the item.Includes bibliographical references.Not availabl
- …