
    Residue Number Systems: a Survey


    Application-Specific Number Representation

    Reconfigurable devices, such as Field Programmable Gate Arrays (FPGAs), enable application-specific number representations. Well-known number formats include fixed-point, floating-point, logarithmic number system (LNS), and residue number system (RNS). These different number representations lead to different arithmetic designs and error behaviours, and thus to implementations with different performance, accuracy, and cost. To investigate the design options in number representations, the first part of this thesis presents a platform that enables automated exploration of the number representation design space. The second part of the thesis presents case studies that optimise designs for area, latency, or throughput from the perspective of number representation. Automated design space exploration in the first part addresses the following two major issues:
    - Automation requires arithmetic unit generation. This thesis provides optimised arithmetic library generators for logarithmic and residue arithmetic units, which support a wide range of bit widths and achieve significant improvement over previous designs.
    - Generation of arithmetic units requires specifying the bit width of each variable. This thesis describes an automatic bit-width optimisation tool called R-Tool, which combines dynamic and static analysis methods and supports different number systems (fixed-point, floating-point, and LNS).
    Putting it all together, the second part explores the effects of application-specific number representation on practical benchmarks, such as radiative Monte Carlo simulation and seismic imaging computations. Experimental results show that customising the number representation benefits hardware implementations. By selecting a more appropriate number format, we can reduce the area cost by up to 73.5% and improve the throughput by 14.2% to 34.1%. By performing bit-width optimisation, we can further reduce the area cost by 9.7% to 17.3%. On the performance side, hardware implementations with customised number formats achieve speedups of 5 to potentially over 40 times over software implementations.
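The Python sketch below makes the contrast between formats concrete for two of those named above. It is not taken from the thesis or from R-Tool. The first format is a signed fixed-point format with `frac_bits` fractional bits. The second is a simple LNS that stores a sign and a fixed-point log2 magnitude. All function names and the 8-bit fractional width are illustrative assumptions. In the LNS, multiplication reduces to adding the stored logarithms, which is one reason the format can suit multiply-heavy designs. The trade-off is that addition becomes the difficult operation.

```python
import math

def to_fixed(x, frac_bits):
    """Quantise x to a signed fixed-point integer with frac_bits fractional bits (round to nearest)."""
    return round(x * (1 << frac_bits))

def from_fixed(q, frac_bits):
    return q / (1 << frac_bits)

def fixed_mul(a, b, frac_bits):
    # The raw product carries 2*frac_bits fractional bits; an arithmetic right shift
    # restores the format (truncating toward minus infinity).
    return (a * b) >> frac_bits

def to_lns(x, frac_bits):
    """Simple LNS: (sign, log2|x| stored as a fixed-point integer). Zero would need a special flag."""
    if x == 0:
        raise ValueError("zero needs a dedicated encoding in LNS")
    return (1 if x > 0 else -1), to_fixed(math.log2(abs(x)), frac_bits)

def lns_mul(a, b):
    # LNS multiplication: multiply the signs and add the logarithms; no multiplier array needed.
    return a[0] * b[0], a[1] + b[1]

def from_lns(v, frac_bits):
    return v[0] * 2.0 ** from_fixed(v[1], frac_bits)

F = 8  # illustrative fractional bit width
print(from_fixed(fixed_mul(to_fixed(3.25, F), to_fixed(-1.5, F), F), F))  # exact here: -4.875
print(from_lns(lns_mul(to_lns(3.25, F), to_lns(-1.5, F)), F))             # close to -4.875
```

Both formats carry the same number of fractional bits here. Each format has its own rounding behaviour, which is the kind of accuracy/cost difference a bit-width optimiser has to weigh.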

    Fast scaling in the residue number system

    Copyright © 2009 IEEE. A new scheme for precisely scaling numbers in the residue number system (RNS) is presented. The scale factor K can be any number coprime to the RNS moduli. Lookup table implementations are used as a basis for comparing the new scheme with scaling schemes from the literature. It is shown that the new scheme decreases hardware complexity compared to previous schemes without affecting time complexity. Yinan Kong and Braden Phillip.
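The Python sketch below is a rough functional model of what scaling computes. It is not the lookup-table scheme proposed by Kong and Phillip. It returns the residues of floor(X / K) for a K coprime to every modulus. Because K is invertible modulo each m_i, each channel can be updated independently as (x_i - r) * K^-1 mod m_i once r = X mod K is known. The sketch obtains r by full Chinese Remainder Theorem (CRT) reconstruction purely for clarity. Avoiding that expensive step is exactly what practical RNS scaling schemes aim to do. The moduli set (7, 11, 13) and the function names are arbitrary illustrative choices.

```python
from math import prod

MODULI = (7, 11, 13)  # example pairwise-coprime moduli; dynamic range M = 1001

def to_rns(x, moduli=MODULI):
    return tuple(x % m for m in moduli)

def crt(residues, moduli=MODULI):
    """Reconstruct X from its residues with the Chinese Remainder Theorem."""
    M = prod(moduli)
    return sum(r * (M // m) * pow(M // m, -1, m) for r, m in zip(residues, moduli)) % M

def rns_scale(residues, K, moduli=MODULI):
    """Residues of floor(X / K), for K coprime to every modulus.

    X - (X mod K) is an exact multiple of K, so each channel is divided exactly by
    multiplying with K^-1 mod m_i (pow raises ValueError if K shares a factor with m_i).
    """
    r = crt(residues, moduli) % K  # X mod K; full reconstruction used here only for clarity
    return tuple((x - r) * pow(K, -1, m) % m for x, m in zip(residues, moduli))

print(crt(rns_scale(to_rns(500), 3)))  # 166 == 500 // 3
```

The sketch needs Python 3.8 or later, which introduced the modular inverse `pow(a, -1, m)`.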

    Towards the AlexNet Moment for Homomorphic Encryption: HCNN, the First Homomorphic CNN on Encrypted Data with GPUs

    Deep Learning as a Service (DLaaS) is a promising solution for cloud-based inference applications. In this setting, the cloud holds a pre-trained model, while the user has samples on which she wants to run the model. The biggest concern with DLaaS is user privacy when the input samples are sensitive data. We provide an efficient privacy-preserving system that combines Fully Homomorphic Encryption (FHE), Convolutional Neural Networks (CNNs), and Graphics Processing Units (GPUs). FHE, with its widely known ability to compute on encrypted data, enables a wide range of privacy-sensitive applications. This comes at a high cost, as it requires enormous computing power. In this paper, we show how to accelerate CNN inference on encrypted data with GPUs. We evaluated two CNNs that homomorphically classify the MNIST and CIFAR-10 datasets. Our solution achieved a sufficient security level (> 80 bits) and reasonable classification accuracy: 99% on MNIST and 77.55% on CIFAR-10. In terms of latency, we could classify an image in 5.16 seconds for MNIST and 304.43 seconds for CIFAR-10. Our system can also classify a batch of images (> 8,000) without extra overhead.

    Number Systems for Deep Neural Network Architectures: A Survey

    Deep neural networks (DNNs) have become an enabling component for a myriad of artificial intelligence applications. DNNs have sometimes shown superior performance, even compared to humans, in areas such as self-driving and health applications. Because of their computational complexity, deploying DNNs on resource-constrained devices still faces many challenges related to computational cost, energy efficiency, latency, and hardware cost. To this end, academia and industry are pursuing several research directions to accelerate and efficiently implement DNNs. One important direction is determining the appropriate data representation for the massive amount of data involved in DNN processing. Conventional number systems have been found to be sub-optimal for DNNs, so a large body of research focuses on exploring more suitable number systems. This article provides a comprehensive survey and discussion of alternative number systems for more efficient representation of DNN data. It discusses the various number systems, conventional and unconventional, that have been exploited for DNNs. It also considers the impact of these number systems on DNN performance and hardware design. In addition, the paper highlights the challenges associated with each number system and the solutions proposed to address them. The reader will be able to understand the importance of an efficient number system for DNNs and learn about the number systems widely used for them. The reader will also understand the trade-offs between number systems and the design aspects that affect how a number system impacts DNN performance. Finally, recent trends and related research opportunities are highlighted. Comment: 28 pages

    TOA Estimation of Chirp Signal in Dense Multipath Environment for Low-Cost Acoustic Ranging

    In this paper, a novel time of arrival (TOA) estimation method is proposed, based on an iterative cleaning process that extracts the first path signal. The aim is to address a challenge in dense multipath indoor environments. There, the power of the first path component is normally smaller than that of other multipath components, which causes the traditional matched filtering (MF)-based TOA estimator to produce large errors. Along with parameter estimation, the proposed process detects and extracts the first path component by eliminating the strongest multipath component at each iteration. It does so using a band-elimination filter in the fractional Fourier domain. To further improve stability, a slack threshold and a strict threshold are introduced. Six simple, easily calculated termination criteria are proposed to monitor the iterative process. When the iterative 'cleaning' process is done, the outputs include the enhanced first path component and its estimated parameters. Based on these outputs, an optimal reference signal for the MF estimator can be constructed, and a more accurate TOA estimate can be conveniently obtained. Results from numerical simulations and experimental investigations verify that, for acoustic chirp signal TOA estimation, the proposed method is more accurate than conventional MF estimators.
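The conventional MF estimator that the paper improves on can be sketched in a few lines of Python. It correlates the received samples with the known chirp and takes the lag of the largest correlation magnitude. This is only that baseline; the iterative fractional-Fourier-domain cleaning is not reproduced here. The chirp parameters, path delays, gains, and helper names in the example are arbitrary assumptions. They were chosen to show the failure mode described above, where a weak first path is followed by a stronger reflection.

```python
import math

def linear_chirp(n, f0, f1, fs):
    """Real linear chirp sweeping from f0 to f1 Hz over n samples at sample rate fs."""
    k = (f1 - f0) * fs / n  # chirp rate in Hz/s (bandwidth divided by duration n/fs)
    return [math.cos(2 * math.pi * (f0 * t + 0.5 * k * t * t))
            for t in (i / fs for i in range(n))]

def add_path(signal, reference, delay, gain):
    """Add one propagation path: a delayed, scaled copy of the reference, in place."""
    for i, r in enumerate(reference):
        signal[delay + i] += gain * r

def mf_toa(received, reference):
    """Conventional matched-filter TOA estimate: lag (in samples) of the largest |correlation|."""
    n = len(reference)
    corr = [abs(sum(r * received[lag + i] for i, r in enumerate(reference)))
            for lag in range(len(received) - n + 1)]
    return max(range(len(corr)), key=corr.__getitem__)

ref = linear_chirp(128, 50.0, 250.0, 1000.0)
rx = [0.0] * 220
add_path(rx, ref, 40, 0.3)  # weak first (direct) path
add_path(rx, ref, 55, 1.0)  # stronger, later reflection
print(mf_toa(rx, ref))      # the peak follows the strong reflection, not the first path
```

A method that removes the strongest component and re-estimates, as the abstract describes, targets exactly this bias toward the dominant path.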

    Residue Number System Based Building Blocks for Applications in Digital Signal Processing

    This doctoral thesis deals with designing residue number system based building blocks to enhance the performance of digital signal processing applications. The residue number system (RNS) is a non-weighted number system that provides carry-free, parallel, high-speed, secure, and fault-tolerant arithmetic operations. These features make it very attractive for high-performance and fault-tolerant digital signal processing (DSP) applications. A typical RNS system consists of three main components. The first is the binary-to-residue converter, which computes the RNS equivalent of inputs represented in the binary number system. The second is a set of parallel residue arithmetic units that perform arithmetic operations on operands already represented in RNS. The last is the residue-to-binary converter, which converts the outputs back into their binary representation. The main aim of this thesis was to propose novel structures for the basic components of this system, to be used later as fundamental units in DSP applications. The thesis covers improving and designing novel structures for these components, and simulating and verifying their efficiency via FPGA implementation. Beyond these novel component structures, the thesis presents a detailed study that compares different moduli sets and determines the most efficient one for different dynamic range requirements. One of the main outcomes of this thesis is identifying and verifying the main condition that a moduli set should meet in order to improve the timing performance of a DSP application. An RNS-based image processing application is also proposed. Compared with a binary-based implementation, it has lower power consumption and a higher maximum operating frequency. Finally, the main considerations to take into account when choosing between the binary number system and RNS for a given application are discussed in detail.
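The three-stage structure described above can be modelled behaviourally in Python: binary-to-residue conversion, independent parallel channels, then residue-to-binary conversion. This is a sketch under assumptions. It uses the widely studied moduli set {2^n - 1, 2^n, 2^n + 1} purely as an example, since the thesis compares several sets. It also performs reverse conversion with the Chinese Remainder Theorem rather than with any converter structure proposed in the thesis. Function names are hypothetical. Results are only valid modulo the dynamic range M, the product of the moduli.

```python
from math import prod

def moduli_set(n):
    """The three-moduli set {2^n - 1, 2^n, 2^n + 1}, used here only as an illustration."""
    return (2**n - 1, 2**n, 2**n + 1)

def forward(x, moduli):
    """Binary-to-residue converter: one independent residue per modulus."""
    return tuple(x % m for m in moduli)

def rns_add(a, b, moduli):
    # Each channel works independently: no carry propagates between channels.
    return tuple((x + y) % m for x, y, m in zip(a, b, moduli))

def rns_mul(a, b, moduli):
    return tuple((x * y) % m for x, y, m in zip(a, b, moduli))

def reverse(res, moduli):
    """Residue-to-binary converter via the Chinese Remainder Theorem (Python 3.8+ for pow(.., -1, m))."""
    M = prod(moduli)
    return sum(r * (M // m) * pow(M // m, -1, m) for r, m in zip(res, moduli)) % M

mods = moduli_set(3)  # (7, 8, 9), dynamic range M = 504
a, b = forward(23, mods), forward(17, mods)
print(reverse(rns_add(rns_mul(a, b, mods), a, mods), mods))  # 414 == 23 * 17 + 23
```

With n = 3, any computation whose true result stays below 504 is recovered exactly. Each channel only handles values below its own modulus, which is where the carry-free, parallel advantage comes from.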

    Gaussian Message Passing for Overloaded Massive MIMO-NOMA

    This paper considers a low-complexity Gaussian Message Passing (GMP) scheme for a coded massive Multiple-Input Multiple-Output (MIMO) system with Non-Orthogonal Multiple Access (massive MIMO-NOMA). In this system, a base station with $N_s$ antennas serves $N_u$ sources simultaneously in the same frequency band. Both $N_u$ and $N_s$ are large, and we consider the overloaded case $N_u > N_s$. The GMP for MIMO-NOMA is a message passing algorithm operating on a fully connected loopy factor graph, which is well known to fail to converge due to the correlation problem. In this paper, we use the large-scale property of the system to simplify the convergence analysis of the GMP under the overloaded condition. First, we prove that the variances of the GMP are guaranteed to converge to the mean square error (MSE) of Linear Minimum Mean Square Error (LMMSE) multi-user detection. Second, the means of the traditional GMP fail to converge when $N_u/N_s < (\sqrt{2}-1)^{-2} \approx 5.83$. We therefore propose and derive a new convergent GMP, called scale-and-add GMP (SA-GMP). SA-GMP always converges to the LMMSE multi-user detection performance for any $N_u/N_s > 1$, and we show that it converges faster than the traditional GMP at the same complexity. Finally, numerical results are provided to verify the validity and accuracy of the theoretical results. Comment: Accepted by IEEE TWC, 16 pages, 11 figures
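For orientation, the LMMSE detector to which the GMP variances converge can be written out under a standard model. The abstract does not spell this model out, so treat it as an assumption. It takes $\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}$ with $\mathbf{H} \in \mathbb{C}^{N_s \times N_u}$, independent unit-power symbols in $\mathbf{x}$, and noise covariance $\sigma^2\mathbf{I}$. The last line checks the quoted convergence threshold. Since $(\sqrt{2}-1)(\sqrt{2}+1) = 1$, the bound is simply $(\sqrt{2}+1)^2$.

```latex
\begin{aligned}
\hat{\mathbf{x}}_{\mathrm{LMMSE}} &= \left(\mathbf{H}^{H}\mathbf{H} + \sigma^{2}\mathbf{I}_{N_u}\right)^{-1}\mathbf{H}^{H}\mathbf{y}, \\
\mathbf{C}_{\mathrm{LMMSE}} &= \sigma^{2}\left(\mathbf{H}^{H}\mathbf{H} + \sigma^{2}\mathbf{I}_{N_u}\right)^{-1}, \\
\left(\sqrt{2}-1\right)^{-2} &= \left(\sqrt{2}+1\right)^{2} = 3 + 2\sqrt{2} \approx 5.83.
\end{aligned}
```

Under this model, the diagonal entries of $\mathbf{C}_{\mathrm{LMMSE}}$ are the per-user MSEs. Computing the matrix inverse directly costs $O(N_u^3)$, which is the cost a low-complexity message passing detector avoids.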

    Obstacle avoidance and distance measurement for unmanned aerial vehicles using monocular vision

    Unmanned Aerial Vehicles (UAVs), commonly known as drones, are better suited than manned aircraft for "dull, dirty, or dangerous" missions. A drone can either be remotely controlled or travel along a predefined path using complex automation algorithms built during its development. In general, a UAV system combines the drone in the air with a control system on the ground. Designing a UAV means integrating hardware, software, sensors, actuators, communication systems, and payloads into a single unit for the intended application. Obstacle avoidance is the most challenging problem in making a UAV fully autonomous. In this paper, a novel method to detect frontal obstacles using a monocular camera is proposed. Computer vision algorithms such as Scale Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF) are used to detect frontal obstacles, and the distance of the obstacle from the camera is then calculated. To meet the defined objectives, the designed system is tested on self-recorded videos captured with a DJI Phantom 4 Pro.
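The abstract does not say how distance is computed from monocular images. The Python sketch below therefore shows one common approach under an explicit assumption, not necessarily the authors' method. Under the pinhole camera model, an object's image size is inversely proportional to its distance. Suppose matched SIFT/SURF keypoints on an obstacle grow by an average ratio s while the UAV flies a known distance d straight toward it (for example, as measured by odometry). The current distance is then d / (s - 1). The function names are hypothetical.

```python
def scale_ratio(sizes_prev, sizes_curr):
    """Mean ratio of matched keypoint sizes (e.g. SIFT/SURF keypoint scale) between two frames."""
    ratios = [c / p for p, c in zip(sizes_prev, sizes_curr)]
    return sum(ratios) / len(ratios)

def distance_after_move(ratio, travelled_m):
    """Pinhole model: image size w = f*W/Z. Before the move Z1 = Z2 + travelled_m, so the
    size ratio is w2/w1 = Z1/Z2, which gives Z2 = travelled_m / (ratio - 1)."""
    if ratio <= 1.0:
        raise ValueError("obstacle is not growing in the image; no approach detected")
    return travelled_m / (ratio - 1.0)

# Hypothetical matched keypoint sizes on the same obstacle in two frames, 1 m of forward flight apart.
print(distance_after_move(scale_ratio([4.0, 8.0], [5.0, 10.0]), 1.0))  # 4.0
```

Averaging over many matched keypoints reduces the effect of individual mismatches. This assumes the keypoints lie on the obstacle rather than on the background.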

    Searching for continuous gravitational wave sources in binary systems

    We consider the problem of searching for continuous gravitational wave (CW) sources orbiting a companion object. This issue is of particular interest because the low-mass X-ray binaries (LMXBs), among them Sco X-1, might be marginally detectable after 2 years of coherent observation. Such detection would use the Earth-based laser interferometers expected to come online by 2002, and these sources should be clearly observable by the second generation of detectors. Moreover, several radio pulsars, which could be CW sources, are found to orbit a companion star or planet, and the LIGO/VIRGO/GEO network plans to continuously monitor such systems. We estimate the computational costs of a search over the five additional parameters describing generic elliptical orbits using matched filtering techniques. These techniques provide the optimal signal-to-noise ratio and a clear, transparent theoretical framework. We provide ready-to-use analytical expressions for the number of templates required to search the astrophysically relevant regions of the parameter space. We also give expressions for how the computational cost scales with the ranges of the parameters. We also determine the critical accuracy to which a particular parameter must be known so that no search is needed for it. We need to separate the computational burden of the source's orbital motion from that of its other parameters (sky position and spin-down), and to reduce the complexity of the analysis. To do so, we assume that the source is monochromatic and that its sky location is exactly known. The orbital elements, on the other hand, are assumed to be either completely unknown or only partly known. We apply our theoretical analysis to Sco X-1 and to the neutron stars with binary companions listed in the radio pulsar catalogue. Comment: 31 pages, LaTeX, 6 eps figures, submitted to PR