Search CORE

610 research outputs found

Residue Number System Based Building Blocks for Applications in Digital Signal Processing

Author: Younes Dina
Publication venue: Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií
Publication date: 01/01/2013
Field of study

Předkládaná disertační práce se zabývá návrhem základních bloků v systému zbytkových tříd pro zvýšení výkonu aplikací určených pro digitální zpracování signálů (DSP). Systém zbytkových tříd (RNS) je neváhová číselná soustava, jež umožňuje provádět paralelizovatelné, vysokorychlostní, bezpečné a proti chybám odolné aritmetické operace, které jsou zpracovávány bez přenosu mezi řády. Tyto vlastnosti jej činí značně perspektivním pro použití v DSP aplikacích náročných na výpočetní výkon a odolných proti chybám. Typický RNS systém se skládá ze tří hlavních částí: převodníku z binárního kódu do RNS, který počítá ekvivalent vstupních binárních hodnot v systému zbytkových tříd, dále jsou to paralelně řazené RNS aritmetické jednotky, které provádějí aritmetické operace s operandy již převedenými do RNS. Poslední část pak tvoří převodník z RNS do binárního kódu, který převádí výsledek zpět do výchozího binárního kódu. Hlavním cílem této disertační práce bylo navrhnout nové struktury základních bloků výše zmiňovaného systému zbytkových tříd, které mohou být využity v aplikacích DSP. Tato disertační práce předkládá zlepšení a návrhy nových struktur komponent RNS, simulaci a také ověření jejich funkčnosti prostřednictvím implementace v obvodech FPGA. Kromě návrhů nové struktury základních komponentů RNS je prezentován také podrobný výzkum různých sad modulů, který je srovnává a determinuje nejefektivnější sadu pro různé dynamické rozsahy. Dalším z klíčových přínosů disertační práce je objevení a ověření podmínky určující výběr optimální sady modulů, která umožňuje zvýšit výkonnost aplikací DSP. Dále byla navržena aplikace pro zpracování obrazu využívající RNS, která má vůči klasické binární implementanci nižší spotřebu a vyšší maximální pracovní frekvenci. V závěru práce byla vyhodnocena hlavní kritéria při rozhodování, zda je vhodnější pro danou aplikaci využít binární číselnou soustavu nebo RNS.This doctoral thesis deals with designing residue number system based building blocks to enhance the performance of digital signal processing applications. The residue number system (RNS) is a non-weighted number system that provides carry-free, parallel, high speed, secure and fault tolerant arithmetic operations. These features make it very attractive to be used in high-performance and fault tolerant digital signal processing (DSP) applications. A typical RNS system consists of three main components; the first one is the binary to residue converter that computes the RNS equivalent of the inputs represented in the binary number system. The second component in this system is parallel residue arithmetic units that perform arithmetic operations on the operands already represented in RNS. The last component is the residue to binary converter, which converts the outputs back into their binary representation. The main aim of this thesis was to propose novel structures of the basic components of this system in order to be later used as fundamental units in DSP applications. This thesis encloses improving and designing novel structures of these components, simulating and verifying their efficiency via FPGA implementation. In addition to suggesting novel structures of basic RNS components, a detailed study on different moduli sets that compares and determines the most efficient one for different dynamic range requirements is also presented. One of the main outcomes of this thesis is concluding and verifying the main condition that should be met when choosing a moduli set, in order to improve the timing performance of a DSP application. An RNS-based image processing application is also proposed. Its efficiency, in terms of timing performance and power consumption, is proved via comparing it with a binary-based one. Finally, the main considerations that should be taken into account when choosing to use the binary number system or RNS are also discussed in details.

Digital library of Brno University of Technology

National Repository of Grey Literature

Recommended from our members

A Study of High Performance Multiple Precision Arithmetic on Graphics Processing Units

Author: Emmart Niall
Publication venue: ScholarWorks@UMass Amherst
Publication date: 21/03/2018
Field of study

Multiple precision (MP) arithmetic is a core building block of a wide variety of algorithms in computational mathematics and computer science. In mathematics MP is used in computational number theory, geometric computation, experimental mathematics, and in some random matrix problems. In computer science, MP arithmetic is primarily used in cryptographic algorithms: securing communications, digital signatures, and code breaking. In most of these application areas, the factor that limits performance is the MP arithmetic. The focus of our research is to build and analyze highly optimized libraries that allow the MP operations to be offloaded from the CPU to the GPU. Our goal is to achieve an order of magnitude improvement over the CPU in three key metrics: operations per second per socket, operations per watt, and operation per second per dollar. What we find is that the SIMD design and balance of compute, cache, and bandwidth resources on the GPU is quite different from the CPU, so libraries such as GMP cannot simply be ported to the GPU. New approaches and algorithms are required to achieve high performance and high utilization of GPU resources. Further, we find that low-level ISA differences between GPU generations means that an approach that works well on one generation might not run well on the next. Here we report on our progress towards MP arithmetic libraries on the GPU in four areas: (1) large integer addition, subtraction, and multiplication; (2) high performance modular multiplication and modular exponentiation (the key operations for cryptographic algorithms) across generations of GPUs; (3) high precision floating point addition, subtraction, multiplication, division, and square root; (4) parallel short division, which we prove is asymptotically optimal on EREW and CREW PRAMs

ScholarWorks@UMass Amherst

Number Systems for Deep Neural Network Architectures: A Survey

Author: Al-Qutayri Mahmoud
Alsuhli Ghada
Mohammad Baker
Sakellariou Vasileios
Saleh Hani
Stouraitis Thanos
Publication venue
Publication date: 11/07/2023
Field of study

Deep neural networks (DNNs) have become an enabling component for a myriad of artificial intelligence applications. DNNs have shown sometimes superior performance, even compared to humans, in cases such as self-driving, health applications, etc. Because of their computational complexity, deploying DNNs in resource-constrained devices still faces many challenges related to computing complexity, energy efficiency, latency, and cost. To this end, several research directions are being pursued by both academia and industry to accelerate and efficiently implement DNNs. One important direction is determining the appropriate data representation for the massive amount of data involved in DNN processing. Using conventional number systems has been found to be sub-optimal for DNNs. Alternatively, a great body of research focuses on exploring suitable number systems. This article aims to provide a comprehensive survey and discussion about alternative number systems for more efficient representations of DNN data. Various number systems (conventional/unconventional) exploited for DNNs are discussed. The impact of these number systems on the performance and hardware design of DNNs is considered. In addition, this paper highlights the challenges associated with each number system and various solutions that are proposed for addressing them. The reader will be able to understand the importance of an efficient number system for DNN, learn about the widely used number systems for DNN, understand the trade-offs between various number systems, and consider various design aspects that affect the impact of number systems on DNN performance. In addition, the recent trends and related research opportunities will be highlightedComment: 28 page

arXiv.org e-Print Archive

BINARY TO RNS ENCODER FOR THE MODULI SET {2n−1,2n ,2n+1} WITH EMBEDDED DIMINISHED-1 CHANNEL FOR DSP APPLICATION

Author: Krstić Ivan
Stamenković Negovan
Stojanović Vidosav
Publication venue: 'National Library of Serbia'
Publication date: 08/10/2015
Field of study

Architecture of binary to residue number system encoder based onthe moduli set {2n − 1,2n,2n + 1} with embedded modulo 2n + 1 channel in the diminished-1 representation, which can be used instead of the standard modulo 2n+1 channel, is presented. We consider the binary numbers with dynamic range of proposed moduli set which is 23n−2n. Within this dynamic range, 3n-bit binary number is partitioned into three n-bit parts and converted to residue numbers. The proposed architecture based on moduli set {2n−1,2n ,2n+1} with embedded diminished-1 encoded channel have been mapped on Xilinx FPGA chip. The proposed architecture can be utilized in conjunction with any fast binary adder without requiring any extra hardware

University of Niš: Facta Universitatis (E-Journals) / Универзитет у Нишу

Efficient Computation and FPGA implementation of Fully Homomorphic Encryption with Cloud Computing Significance

Author: Zeng Qiang
Publication venue: 'University of Windsor Leddy Library'
Publication date: 20/12/2018
Field of study

Homomorphic Encryption provides unique security solution for cloud computing. It ensures not only that data in cloud have confidentiality but also that data processing by cloud server does not compromise data privacy. The Fully Homomorphic Encryption (FHE) scheme proposed by Lopez-Alt, Tromer, and Vaikuntanathan (LTV), also known as NTRU(Nth degree truncated polynomial ring) based method, is considered one of the most important FHE methods suitable for practical implementation. In this thesis, an efficient algorithm and architecture for LTV Fully Homomorphic Encryption is proposed. Conventional linear feedback shift register (LFSR) structure is expanded and modified for performing the truncated polynomial ring multiplication in LTV scheme in parallel. Novel and efficient modular multiplier, modular adder and modular subtractor are proposed to support high speed processing of LFSR operations. In addition, a family of special moduli are selected for high speed computation of modular operations. Though the area keeps the complexity of O(Nn^2) with no advantage in circuit level. The proposed architecture effectively reduces the time complexity from O(N log N) to linear time, O(N), compared to the best existing works. An FPGA implementation of the proposed architecture for LTV FHE is achieved and demonstrated. An elaborate comparison of the existing methods and the proposed work is presented, which shows the proposed work gains significant speed up over existing works

Scholarship at UWindsor

Horner's Rule-Based Multiplication over Fp and Fp^n: A Survey

Author: Beuchat Jean-Luc
Miyoshi Takanori
Muller Jean-Michel
Okamoto Eiji
Publication venue: 'Informa UK Limited'
Publication date: 01/07/2008
Field of study

International audienceThis paper aims at surveying multipliers based on Horner's rule for finite field arithmetic. We present a generic architecture based on five processing elements and introduce a classification of several algorithms based on our model. We provide the readers with a detailed description of each scheme which should allow them to write a VHDL description or a VHDL code generator

HAL-ENS-LYON

INRIA a CCSD electronic archive server

Hal-Diderot

Digital signal processing application based on residue number system

Author: Rolko Maroš
Publication venue: Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií
Publication date: 01/01/2011
Field of study

Tato práce se zabývá systémem zbytkových tříd a jeho aplikacemi v digitálních obvodech. První část se zabývá VHDL návrhem různých typů sčítaček v systému zbytkových tříd a jejich porovnání se standartními sčítačkami. V druhé části je implementován obrázkový processor který pracuje v systému zbytkových tříd a jeho výkonostní analýza. V textu je popsán postup návrhu a jsou prezentovány výsledky analýz.This work deals with residue number system and its applications in digital circuits. The first part is VHDL design of different adder types in residue number system and their comparison with regular adders. The second part is VHDL implementation of image processor that computes in residue number system and its performance analysis. Presented text contains description of design procedures and presentation of analysis results.

Digital library of Brno University of Technology

National Repository of Grey Literature

Parallel-prefix structures for binary and modulo {2n - 1, 2n, 2n + 1} adders

Author: Chen Jun
Publication venue
Publication date: 01/12/2008
Field of study

Adders are the among the most essential arithmetic units within digital systems. Parallel-prefix structures are efficient for adders because of their regular topology and logarithmic delay. However, building parallel-prefix adders are barely discussed in literature. This work puts emphasis on how to build prefix trees and simple algorithms for building these architectures. One particular modification of adders is for use with modulo arithmetic. The most common type of modulo adders are modulo 2n -1 and modulo 2n + 1 adders because they have a common base that is a power of 2. In order to improve their speed, parallel-prefix structures can also be employed for modulo 2n +- 1 adders. This dissertation presents the formation of several binary and modulo prefix architectures and their modifications using Ling's algorithm. For all binary and modulo adders, both algorithmic and quantitative analysis are provided to compare the performance of different architectures. Furthermore, to see how process impact the design, three technologies, from deep submicron to nanometer range, are utilized to collect the quantitative data

SHAREOK repository

Emerging Design Methodology And Its Implementation Through Rns And Qca

Author: Dajani Omar
Publication venue: DigitalCommons@WayneState
Publication date: 01/01/2013
Field of study

Digital logic technology has been changing dramatically from integrated circuits, to a Very Large Scale Integrated circuits (VLSI) and to a nanotechnology logic circuits. Research focused on increasing the speed and reducing the size of the circuit design. Residue Number System (RNS) architecture has ability to support high speed concurrent arithmetic applications. To reduce the size, Quantum-Dot Cellular Automata (QCA) has become one of the new nanotechnology research field and has received a lot of attention within the engineering community due to its small size and ultralow power. In the last decade, residue number system has received increased attention due to its ability to support high speed concurrent arithmetic applications such as Fast Fourier Transform (FFT), image processing and digital filters utilizing the efficiencies of RNS arithmetic in addition and multiplication. In spite of its effectiveness, RNS has remained more an academic challenge and has very little impact in practical applications due to the complexity involved in the conversion process, magnitude comparison, overflow detection, sign detection, parity detection, scaling and division. The advancements in very large scale integration technology and demand for parallelism computation have enabled researchers to consider RNS as an alternative approach to high speed concurrent arithmetic. Novel parallel - prefix structure binary to residue number system conversion method and RNS novel scaling method are presented in this thesis. Quantum-dot cellular automata has become one of the new nanotechnology research field and has received a lot of attention within engineering community due to its extremely small feature size and ultralow power consumption compared to COMS technology. Novel methodology for generating QCA Boolean circuits from multi-output Boolean circuits is presented. Our methodology takes as its input a Boolean circuit, generates simplified XOR-AND equivalent circuit and output an equivalent majority gate circuits. During the past decade, quantum-dot cellular automata showed the ability to implement both combinational and sequential logic devices. Unlike conventional Boolean AND-OR-NOT based circuits, the fundamental logical device in QCA Boolean networks is majority gate. With combining these QCA gates with NOT gates any combinational or sequential logical device can be constructed from QCA cells. We present an implementation of generalized pipeline cellular array using quantum-dot cellular automata cells. The proposed QCA pipeline array can perform all basic operations such as multiplication, division, squaring and square rooting. The different mode of operations are controlled by a single control line

Digital Commons@Wayne State University