Application-Specific Number Representation
Reconfigurable devices, such as Field Programmable Gate Arrays (FPGAs), enable application-
specific number representations. Well-known number formats include fixed-point, floating-
point, logarithmic number system (LNS), and residue number system (RNS). Such different
number representations lead to different arithmetic designs and error behaviours, thus producing
implementations with different performance, accuracy, and cost.
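To make the contrast between formats concrete, here is a minimal Python sketch of LNS arithmetic. It assumes a sign plus a fixed-point log2 magnitude with a hypothetical 10 fractional bits (`FRAC_BITS`), and it ignores zero and subtraction. It is not the thesis's LNS library generator; it only shows why multiplication is cheap (an addition of logs) while addition needs the nonlinear function log2(1 + 2^d).

```python
import math
from dataclasses import dataclass

FRAC_BITS = 10  # hypothetical fractional precision of the fixed-point log field


@dataclass(frozen=True)
class LNS:
    sign: int   # +1 or -1 (zero is not handled in this sketch)
    log: float  # log2|x| on a fixed-point grid with FRAC_BITS fractional bits


def _quantize(v):
    # Round a log value to the nearest point of the fixed-point grid.
    return round(v * (1 << FRAC_BITS)) / (1 << FRAC_BITS)


def to_lns(x):
    return LNS(1 if x > 0 else -1, _quantize(math.log2(abs(x))))


def from_lns(a):
    return a.sign * 2.0 ** a.log


def lns_mul(a, b):
    # Multiplication reduces to a fixed-point addition of the logs.
    return LNS(a.sign * b.sign, a.log + b.log)


def lns_add_same_sign(a, b):
    # Addition needs log2(1 + 2**d) with d <= 0, which hardware typically
    # approximates with lookup tables or interpolation.
    assert a.sign == b.sign, "subtraction is not covered in this sketch"
    hi, lo = max(a.log, b.log), min(a.log, b.log)
    return LNS(a.sign, _quantize(hi + math.log2(1 + 2.0 ** (lo - hi))))


product = from_lns(lns_mul(to_lns(3.0), to_lns(5.0)))
total = from_lns(lns_add_same_sign(to_lns(3.0), to_lns(5.0)))
print(product, total)  # approximately 15 and 8
```

Both results carry a small error from rounding the logs to the grid; that rounding error, rather than a fixed absolute step, is what distinguishes LNS error behaviour from fixed-point.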
To investigate the design options in number representations, the first part of this thesis presents
a platform that enables automated exploration of the number representation design space. The
second part of the thesis presents case studies that optimise the designs for area, latency or
throughput from the perspective of number representations.
Automated design space exploration in the first part addresses the following two major issues:
- Automation requires arithmetic unit generation. This thesis provides optimised
arithmetic library generators for logarithmic and residue arithmetic units, which support
a wide range of bit widths and achieve significant improvements over previous designs.
- Generation of arithmetic units requires specifying the bit widths for each
variable. This thesis describes an automatic bit-width optimisation tool called R-Tool,
which combines dynamic and static analysis methods, and supports different number
systems (fixed-point, floating-point, and LNS numbers).
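The trade-off between dynamic and static range analysis can be sketched in a few lines of Python. This is not R-Tool: the expression y = (x - 3)(x + 3), the input range [-2, 2], the sample grid, and all helper names are hypothetical. Static interval arithmetic gives a guaranteed but possibly loose bound, while simulation gives a tight bound that holds only for the inputs actually simulated.

```python
def interval_add(a, b):
    return (a[0] + b[0], a[1] + b[1])


def interval_mul(a, b):
    # The product range is bounded by the products of the endpoints.
    p = (a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1])
    return (min(p), max(p))


def integer_bits(max_abs):
    # Smallest signed integer width I (including the sign bit) with max_abs < 2**(I - 1).
    bits = 1
    while max_abs >= 2 ** (bits - 1):
        bits += 1
    return bits


# Hypothetical example: y = (x - 3) * (x + 3) with x known to lie in [-2, 2].
x_range = (-2.0, 2.0)

# Static analysis: propagate intervals through the expression (safe but can be loose).
y_static = interval_mul(interval_add(x_range, (-3.0, -3.0)),
                        interval_add(x_range, (3.0, 3.0)))
static_bits = integer_bits(max(abs(y_static[0]), abs(y_static[1])))

# Dynamic analysis: simulate on sample inputs (tight, but only covers what was simulated).
samples = [i / 10 for i in range(-20, 21)]
dynamic_max = max(abs((x - 3) * (x + 3)) for x in samples)
dynamic_bits = integer_bits(dynamic_max)

print(y_static, static_bits, dynamic_max, dynamic_bits)  # (-25.0, -1.0) 6 9.0 5
```

The true range of y is [-9, -5]. Interval arithmetic treats the two factors as independent and asks for 6 integer bits, whereas simulation finds the exact maximum magnitude of 9 and asks for 5, but only because x = 0 happened to be among the samples.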
Putting it all together, the second part explores the effects of application-specific number
representation on practical benchmarks, such as radiative Monte Carlo simulation, and seismic
imaging computations. Experimental results show that customising the number representations
brings benefits to hardware implementations: by selecting a more appropriate number format,
we can reduce the area cost by up to 73.5% and improve the throughput by 14.2% to 34.1%; by
performing the bit-width optimisation, we can further reduce the area cost by 9.7% to 17.3%.
On the performance side, hardware implementations with customised number formats achieve
speedups of 5 times to potentially over 40 times over software implementations.
Fast scaling in the residue number system
Copyright © 2009 IEEE
A new scheme for precisely scaling numbers in the residue number system (RNS) is presented. The scale factor K can be any number coprime to the RNS moduli. Lookup table implementations are used as a basis for comparing the new scheme with scaling schemes from the literature. It is shown that the new scheme decreases hardware complexity compared to previous schemes without affecting time complexity.
Yinan Kong and Braden Phillip
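As a rough illustration of why K must be coprime to the moduli, the sketch below computes floor(X / K) channel by channel. Subtracting r = X mod K makes X - r divisible by K, and multiplying each residue by the modular inverse of K performs the exact division; that inverse exists only when K is coprime to the modulus. This is a generic formulation, not the Kong and Phillip scheme. The moduli (7, 8, 9) are hypothetical, and r is obtained here by full CRT reconstruction, an expensive step that practical scalers generally try to avoid. Python 3.8+ is assumed for `math.prod` and `pow(K, -1, m)`.

```python
from math import prod

MODULI = (7, 8, 9)  # hypothetical pairwise-coprime moduli, dynamic range M = 504


def to_rns(x, moduli=MODULI):
    return tuple(x % m for m in moduli)


def from_rns(res, moduli=MODULI):
    # Chinese remainder theorem reconstruction.
    M = prod(moduli)
    total = 0
    for r, m in zip(res, moduli):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)
    return total % M


def scale(res, K, moduli=MODULI):
    # floor(X / K) in RNS, valid only when K is coprime to every modulus.
    r = from_rns(res, moduli) % K  # illustration only: obtains X mod K via full CRT
    # (X - r) is divisible by K, so multiplying by K^-1 mod m is an exact division.
    return tuple(((x - r) * pow(K, -1, m)) % m for x, m in zip(res, moduli))


print(from_rns(scale(to_rns(500), 5)))  # 100
```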
Towards the AlexNet Moment for Homomorphic Encryption: HCNN, the First Homomorphic CNN on Encrypted Data with GPUs
Deep Learning as a Service (DLaaS) stands as a promising solution for
cloud-based inference applications. In this setting, the cloud has a
pre-learned model whereas the user has samples on which she wants to run the
model. The biggest concern with DLaaS is user privacy if the input samples are
sensitive data. We provide here an efficient privacy-preserving system by
employing high-end technologies such as Fully Homomorphic Encryption (FHE),
Convolutional Neural Networks (CNNs) and Graphics Processing Units (GPUs). FHE,
with its widely-known feature of computing on encrypted data, empowers a wide
range of privacy-concerned applications. This comes at a high cost, as it requires
enormous computing power. In this paper, we show how GPUs can accelerate
running CNNs on encrypted data. We evaluated two CNNs that homomorphically
classify the MNIST and CIFAR-10 datasets. Our solution achieved a sufficient
security level (> 80 bits) and reasonable classification accuracy: 99% for MNIST
and 77.55% for CIFAR-10. In terms of latency, we could classify an image in
5.16 seconds for MNIST and 304.43 seconds for CIFAR-10. Our system can also
classify a batch of images (> 8,000) without extra overhead.
Number Systems for Deep Neural Network Architectures: A Survey
Deep neural networks (DNNs) have become an enabling component for a myriad of
artificial intelligence applications. DNNs have sometimes shown performance
superior even to that of humans, in areas such as self-driving and health
applications. Because of their computational complexity, deploying DNNs on
resource-constrained devices still faces many challenges related to computing
complexity, energy efficiency, latency, and cost. To this end, several research
directions are being pursued by both academia and industry to accelerate and
efficiently implement DNNs. One important direction is determining the
appropriate data representation for the massive amount of data involved in DNN
processing. Using conventional number systems has been found to be sub-optimal
for DNNs. Consequently, a large body of research focuses on exploring suitable
number systems. This article aims to provide a comprehensive survey and
discussion about alternative number systems for more efficient representations
of DNN data. Various number systems (conventional/unconventional) exploited for
DNNs are discussed. The impact of these number systems on the performance and
hardware design of DNNs is considered. In addition, this paper highlights the
challenges associated with each number system and various solutions that are
proposed for addressing them. The reader will be able to understand the
importance of an efficient number system for DNN, learn about the widely used
number systems for DNN, understand the trade-offs between various number
systems, and consider various design aspects that affect the impact of number
systems on DNN performance. In addition, recent trends and related research
opportunities are highlighted.
Comment: 28 pages
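To illustrate the difference between a conventional and an alternative number system, the Python sketch below quantizes a value to signed fixed-point and to a logarithmic (power-of-two) format, one of the alternatives often considered for DNN weights. The bit widths and exponent limits are hypothetical parameters, and rounding with Python's `round` (ties to even) is only one possible choice.

```python
import math


def quantize_fixed(x, frac_bits, total_bits):
    # Signed fixed-point: round to the nearest multiple of 2**-frac_bits
    # (Python's round breaks ties to even), then saturate to total_bits.
    scale = 1 << frac_bits
    q = round(x * scale)
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, q)) / scale


def quantize_pow2(x, min_exp, max_exp):
    # Logarithmic (power-of-two) format: keep the sign, round log2|x| to an
    # integer exponent, and clamp it to [min_exp, max_exp].
    if x == 0:
        return 0.0
    e = max(min_exp, min(max_exp, round(math.log2(abs(x)))))
    return math.copysign(2.0 ** e, x)


print(quantize_fixed(0.3, 4, 8), quantize_pow2(0.3, -8, 0))  # 0.3125 0.25
```

With a power-of-two weight, a multiplication becomes a shift of the activation, which is the hardware appeal. The cost is coarser resolution: 0.3 maps to 0.3125 in the 8-bit fixed-point format above but to 0.25 in the power-of-two format.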
TOA Estimation of Chirp Signal in Dense Multipath Environment for Low-Cost Acoustic Ranging
In this paper, a novel time of arrival (TOA) estimation method is proposed, based on an iterative cleaning process that extracts the first-path signal. The aim is to address a challenge of dense multipath indoor environments: the first-path component is typically weaker than other multipath components, which causes large errors in the traditional matched filtering (MF)-based TOA estimator. Along with parameter estimation, at each iteration the proposed process detects and extracts the first-path component by eliminating the strongest multipath component with a band-elimination filter in the fractional Fourier domain. To further improve stability, a slack threshold and a strict threshold are introduced. Six simple, easily calculated termination criteria are proposed to monitor the iterative process. When the iterative 'cleaning' process is done, the outputs include the enhanced first-path component and its estimated parameters. Based on these outputs, an optimal reference signal for the MF estimator can be constructed, from which a more accurate TOA estimate can be obtained. Numerical simulations and experimental investigations verify that, for acoustic chirp signal TOA estimation, the proposed method is more accurate than conventional MF estimators.
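The failure mode that motivates the paper can be reproduced with a plain matched filter: correlate the received signal with the reference chirp and take the lag of the largest peak. The Python sketch below is not the proposed cleaning method. The chirp parameters, the two path delays (10 and 80 samples) and their gains are hypothetical, and the paths are deliberately kept far apart so the correlation values are easy to check.

```python
import math


def chirp(n_samples, f0, rate):
    # Real linear chirp: instantaneous frequency f0 + rate * n (cycles per sample).
    return [math.cos(2 * math.pi * (f0 * n + 0.5 * rate * n * n)) for n in range(n_samples)]


def matched_filter(received, reference):
    # Cross-correlation of the received signal with the reference at each lag.
    m = len(reference)
    return [sum(received[lag + n] * reference[n] for n in range(m))
            for lag in range(len(received) - m + 1)]


def mf_toa(received, reference):
    # Conventional MF estimate: the lag of the largest correlation peak.
    corr = matched_filter(received, reference)
    return max(range(len(corr)), key=lambda k: corr[k]), corr


ref = chirp(32, 0.05, 0.01)
received = [0.0] * 112
for delay, gain in ((10, 0.5), (80, 1.0)):  # weak first path, strong later path
    for n, v in enumerate(ref):
        received[delay + n] += gain * v

toa, corr = mf_toa(received, ref)
print(toa)  # 80: the strongest path, not the first path at 10
```

The estimator returns the delay of the stronger path even though the first path arrives at sample 10. In dense multipath the paths also overlap, which makes the weak first-path peak even harder to isolate.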
Residue Number System Based Building Blocks for Applications in Digital Signal Processing
This doctoral thesis deals with the design of residue number system based building blocks to enhance the performance of digital signal processing applications. The residue number system (RNS) is a non-weighted number system that provides carry-free, parallel, high-speed, secure and fault-tolerant arithmetic operations. These features make it very attractive for high-performance and fault-tolerant digital signal processing (DSP) applications. A typical RNS system consists of three main components. The first is the binary-to-residue converter, which computes the RNS equivalent of inputs represented in the binary number system. The second is a set of parallel residue arithmetic units that perform arithmetic operations on operands already represented in RNS. The last is the residue-to-binary converter, which converts the outputs back into their binary representation. The main aim of this thesis was to propose novel structures for the basic components of this system so that they can later be used as fundamental units in DSP applications. The thesis covers the improvement and design of novel structures for these components, together with their simulation and verification via FPGA implementation. In addition to novel structures for the basic RNS components, it presents a detailed study of different moduli sets that compares them and determines the most efficient one for different dynamic range requirements. One of the main outcomes of this thesis is identifying and verifying the main condition that a moduli set should meet in order to improve the timing performance of a DSP application. An RNS-based image processing application is also proposed; compared with a binary-based implementation, it has lower power consumption and a higher maximum operating frequency.
Finally, the main considerations for choosing between the binary number system and RNS for a given application are discussed in detail.
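The three components of a typical RNS system can be sketched in Python as a forward converter, independent per-modulus arithmetic channels, and a reverse converter based on the Chinese remainder theorem. The moduli set {2^n - 1, 2^n, 2^n + 1} with n = 4 is a hypothetical choice for illustration (a well-known set, but not necessarily one studied in the thesis), and the dot product stands in for a DSP kernel such as an FIR filter. Python 3.8+ is assumed for `math.prod` and `pow(x, -1, m)`.

```python
from math import prod

N = 4
MODULI = (2**N - 1, 2**N, 2**N + 1)  # (15, 16, 17): illustrative choice
M = prod(MODULI)  # dynamic range: 4080


def binary_to_residue(x):
    # Forward converter: one residue per modulus (0 <= x < M assumed).
    return tuple(x % m for m in MODULI)


def residue_to_binary(res):
    # Reverse converter via the Chinese remainder theorem.
    total = 0
    for r, m in zip(res, MODULI):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)
    return total % M


def rns_dot(xs, hs):
    # Multiply-accumulate (as in an FIR tap sum); each channel works modulo
    # its own modulus, so no carries propagate between channels.
    acc = [0] * len(MODULI)
    for x, h in zip(xs, hs):
        for i, m in enumerate(MODULI):
            acc[i] = (acc[i] + x[i] * h[i]) % m
    return tuple(acc)


samples = [binary_to_residue(v) for v in (10, 20, 30)]
taps = [binary_to_residue(v) for v in (1, 2, 3)]
y = residue_to_binary(rns_dot(samples, taps))
print(y)  # 140
```

The result is correct only while the true value stays below the dynamic range M, which is why the moduli set has to be chosen against the application's dynamic range requirement.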
Gaussian Message Passing for Overloaded Massive MIMO-NOMA
This paper considers a low-complexity Gaussian Message Passing (GMP) scheme
for a coded massive Multiple-Input Multiple-Output (MIMO) system with
Non-Orthogonal Multiple Access (massive MIMO-NOMA), in which a base station
with many antennas serves many sources simultaneously in the same frequency band.
The numbers of antennas and sources are both large, and we consider the overloaded
case in which the sources outnumber the antennas. The GMP for MIMO-NOMA is a message passing algorithm operating
on a fully-connected loopy factor graph, which is well known to fail to
converge due to the correlation problem. In this paper, we utilize the
large-scale property of the system to simplify the convergence analysis of the
GMP under the overloaded condition. First, we prove that the variances
of the GMP always converge to the mean square error (MSE) of Linear Minimum
Mean Square Error (LMMSE) multi-user detection. Second, the means of
the traditional GMP fail to converge under certain load conditions. We therefore propose and derive a new
convergent GMP, called scale-and-add GMP (SA-GMP), which always converges to the
LMMSE multi-user detection performance for any overloaded configuration, and we show that it
has a faster convergence speed than the traditional GMP with the same
complexity. Finally, numerical results are provided to verify the validity and
accuracy of the presented theoretical results.
Comment: Accepted by IEEE TWC, 16 pages, 11 figures
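In GMP every message is a Gaussian density, so combining incoming messages at a variable node reduces to a closed-form update: the precisions (inverse variances) add, and the mean is the precision-weighted average of the incoming means. The Python sketch below shows only this generic combining rule; it is not the SA-GMP update, and the function name is hypothetical.

```python
def combine_gaussians(messages):
    """Product of Gaussian messages N(mean_k, var_k), returned as (mean, var).

    The product of Gaussian densities is Gaussian up to normalization:
    precisions add, and the mean is the precision-weighted mean.
    """
    precision = sum(1.0 / var for _, var in messages)
    mean = sum(mu / var for mu, var in messages) / precision
    return mean, 1.0 / precision


print(combine_gaussians([(1.0, 1.0), (3.0, 1.0)]))  # (2.0, 0.5)
```

Because each update only passes a mean and a variance per edge, the per-iteration cost stays low; the convergence question studied in the paper is whether these means and variances settle to the LMMSE solution on a loopy graph.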
Obstacle avoidance and distance measurement for unmanned aerial vehicles using monocular vision
Unmanned aerial vehicles (UAVs), commonly known as drones, are better suited than manned aircraft for "dull, dirty, or dangerous" missions. A drone can either be remotely controlled or follow a predefined path using complex automation algorithms built during its development. In general, a UAV combines the drone in the air with a control system on the ground. Designing a UAV means integrating hardware, software, sensors, actuators, communication systems and payloads into a single unit for the intended application. Obstacle avoidance is the most challenging problem in making UAVs fully autonomous. In this paper, a novel method to detect frontal obstacles using a monocular camera is proposed. Computer vision algorithms such as the Scale Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF) are used to detect frontal obstacles, and the distance of each obstacle from the camera is then calculated. To meet the defined objectives, the designed system is tested with self-recorded videos captured by a DJI Phantom 4 Pro.
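One common way to turn feature matches from a monocular camera into a distance is to measure how much the obstacle appears to expand between two frames. The Python sketch below assumes a pinhole camera, pure forward motion by a known distance d between the frames, and matched keypoints (for example, from SIFT or SURF) lying roughly on a plane facing the camera. It illustrates that idea only, not the paper's method, and the function names and numbers are hypothetical.

```python
import math
from itertools import combinations


def mean_pairwise_distance(points):
    pairs = list(combinations(points, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)


def distance_from_expansion(kp_prev, kp_curr, forward_motion):
    """Distance to the obstacle in the current frame from its apparent expansion.

    kp_prev and kp_curr hold image coordinates of the same matched keypoints in
    two frames (hypothetical input; the feature matching itself is not shown).
    """
    # Apparent size is proportional to 1/Z, so s = size_curr / size_prev = Z_prev / Z_curr.
    s = mean_pairwise_distance(kp_curr) / mean_pairwise_distance(kp_prev)
    if s <= 1.0:
        return math.inf  # the obstacle is not getting closer
    # With Z_prev = Z_curr + d:  s = (Z_curr + d) / Z_curr  =>  Z_curr = d / (s - 1).
    return forward_motion / (s - 1.0)


# Hypothetical check: corners at +/-1 m, focal length 100 px, obstacle at 10 m then 8 m.
prev = [(10.0, 10.0), (-10.0, 10.0), (10.0, -10.0), (-10.0, -10.0)]
curr = [(12.5, 12.5), (-12.5, 12.5), (12.5, -12.5), (-12.5, -12.5)]
z = distance_from_expansion(prev, curr, 2.0)
print(z)  # approximately 8.0
```

Using the mean pairwise distance rather than a single pair makes the scale estimate less sensitive to one badly localised keypoint; it still fails if the matched points lie at very different depths.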
Searching for continuous gravitational wave sources in binary systems
We consider the problem of searching for continuous gravitational wave
sources orbiting a companion object. This issue is of particular interest
because low-mass X-ray binaries (LMXBs), and among them Sco X-1, might be marginally
detectable with 2 years of coherent observation time by the Earth-based laser interferometers
expected to come online by 2002, and clearly observable by the second
generation of detectors. Moreover, several radio pulsars, which could be deemed
to be CW sources, are found to orbit a companion star or planet, and the
LIGO/VIRGO/GEO network plans to continuously monitor such systems. We estimate
the computational costs for a search launched over the additional five
parameters describing generic elliptical orbits using matched filtering
techniques. These techniques provide the optimal signal-to-noise ratio and also
a very clear and transparent theoretical framework. We provide ready-to-use
analytical expressions for the number of templates required to carry out the
searches in the astrophysically relevant regions of the parameter space, and
how the computational cost scales with the ranges of the parameters. We also
determine the critical accuracy to which a particular parameter must be known,
so that no search is needed for it. In order to disentangle the computational
burden involved in the orbital motion of the CW source, from the other source
parameters (position in the sky and spin-down), and reduce the complexity of
the analysis, we assume that the source is monochromatic and its location in
the sky is exactly known. The orbital elements, on the other hand, are either
assumed to be completely unknown or only partly known. We apply our theoretical
analysis to Sco X-1 and the neutron stars with binary companions which are
listed in the radio pulsar catalogue.
Comment: 31 pages, LaTeX, 6 eps figures, submitted to PR
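For intuition about the extra orbital parameters, the sketch below models the phase of a monochromatic source in a circular orbit, a simplification of the generic elliptical orbits treated in the paper. Like the paper, it assumes a known sky position and no spin-down; it additionally assumes detector motion has already been corrected for. The parameter names and values are hypothetical.

```python
import math


def cw_phase_circular(t, f0, a_p, omega, psi):
    """Phase (rad) at the detector of a monochromatic CW source in a circular orbit.

    f0: intrinsic frequency (Hz); a_p: projected semi-major axis a*sin(i)/c (s);
    omega: orbital angular frequency (rad/s); psi: orbital phase at t = 0.
    """
    # The light-travel delay across the orbit shifts arrival times by
    # a_p * sin(omega*t + psi); the overall sign convention is absorbed into psi.
    return 2 * math.pi * f0 * (t + a_p * math.sin(omega * t + psi))


def cw_frequency_circular(t, f0, a_p, omega, psi):
    # Instantaneous frequency: time derivative of the phase divided by 2*pi.
    return f0 * (1 + a_p * omega * math.cos(omega * t + psi))


params = (100.0, 1.5, 2 * math.pi / 68000.0, 0.0)  # hypothetical f0, a_p, omega, psi
f_peak = cw_frequency_circular(0.0, *params)
f_trough = cw_frequency_circular(math.pi / params[2], *params)
print(f_peak, f_trough)
```

The fractional frequency modulation a_p * omega, together with the orbital phase psi, is what a matched-filter template bank must resolve; with an elliptical orbit, the eccentricity and argument of periapse add further parameters, giving five orbital parameters in total.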