6,172 research outputs found
Hardware Implementation of the GPS authentication
In this paper, we explore new area/throughput trade- offs for the Girault,
Poupard and Stern authentication protocol (GPS). This authentication protocol
was selected in the NESSIE competition and is even part of the standard ISO/IEC
9798. The originality of our work comes from the fact that we exploit a fixed
key to increase the throughput. It leads us to implement GPS using the Chapman
constant multiplier. This parallel implementation is 40 times faster but 10
times bigger than the reference serial one. We propose to serialize this
multiplier to reduce its area at the cost of lower throughput. Our hybrid
Chapman's multiplier is 8 times faster but only twice bigger than the
reference. Results presented here allow designers to adapt the performance of
GPS authentication to their hardware resources. The complete GPS prover side is
also integrated in the network stack of the PowWow sensor which contains an
Actel IGLOO AGL250 FPGA as a proof of concept.Comment: ReConFig - International Conference on ReConFigurable Computing and
FPGAs (2012
Low-power Programmable Processor for Fast Fourier Transform Based on Transport Triggered Architecture
This paper describes a low-power processor tailored for fast Fourier
transform computations where transport triggering template is exploited. The
processor is software-programmable while retaining an energy-efficiency
comparable to existing fixed-function implementations. The power savings are
achieved by compressing the computation kernel into one instruction word. The
word is stored in an instruction loop buffer, which is more power-efficient
than regular instruction memory storage. The processor supports all
power-of-two FFT sizes from 64 to 16384 and given 1 mJ of energy, it can
compute 20916 transforms of size 1024.Comment: 5 pages, 4 figures, 1 table, ICASSP 2019 conferenc
VLSI Implementation of Deep Neural Network Using Integral Stochastic Computing
The hardware implementation of deep neural networks (DNNs) has recently
received tremendous attention: many applications in fact require high-speed
operations that suit a hardware implementation. However, numerous elements and
complex interconnections are usually required, leading to a large area
occupation and copious power consumption. Stochastic computing has shown
promising results for low-power area-efficient hardware implementations, even
though existing stochastic algorithms require long streams that cause long
latencies. In this paper, we propose an integer form of stochastic computation
and introduce some elementary circuits. We then propose an efficient
implementation of a DNN based on integral stochastic computing. The proposed
architecture has been implemented on a Virtex7 FPGA, resulting in 45% and 62%
average reductions in area and latency compared to the best reported
architecture in literature. We also synthesize the circuits in a 65 nm CMOS
technology and we show that the proposed integral stochastic architecture
results in up to 21% reduction in energy consumption compared to the binary
radix implementation at the same misclassification rate. Due to fault-tolerant
nature of stochastic architectures, we also consider a quasi-synchronous
implementation which yields 33% reduction in energy consumption w.r.t. the
binary radix implementation without any compromise on performance.Comment: 11 pages, 12 figure
Improved Memoryless RNS Forward Converter Based on the Periodicity of Residues
The residue number system (RNS) is suitable for DSP architectures because of its ability to perform fast carry-free arithmetic. However, this advantage is over-shadowed by the complexity involved in the conversion of numbers between binary and RNS representations. Although the reverse conversion (RNS to binary) is more complex, the forward transformation is not simple either. Most forward converters make use of look-up tables (memory). Recently, a memoryless forward converter architecture for arbitrary moduli sets was proposed by Premkumar in 2002. In this paper, we present an extension to that architecture which results in 44% less hardware for parallel conversion and achieves 43% improvement in speed for serial conversions. It makes use of the periodicity properties of residues obtained using modular exponentiation
Recommended from our members
Two-dimensional DCT/IDCT architecture
A fully parallel architecture for the computation of a two-dimensional (2-D) discrete cosine transform (DCT), based on row-column decomposition is presented. It uses the same one dimensional (1-D) DCT unit for the row and column computations and (N2+N) registers to perform the transposition. It possesses features of regularity and modularity, and is thus well suited for VLSI implementation. It can be used for the computation of either the forward or the inverse 2-D DCT. Each 1-D DCT unit uses N fully parallel vector inner product (VIP) units. The design of the VIP units is based on a systematic design methodology using radix-2â arithmetic, which allows partitioning of the elements of each vector into small groups. Array multipliers without the final adder are used to produce the different partial product terms. This allows a more efficient use of 4:2 compressors for the accumulation of the products in the intermediate stages and reduces the number of accumulators from N to one. Using this procedure, the 2-D DCT architecture requires less than N2 multipliers (in terms of area occupied) and only 2N adders. It can compute a N x N-point DCT at a rate of one complete transform per N cycles after an appropriate initial delay
Computational Modalities of Belousov-Zhabotinsky Encapsulated Vesicles
We present both simulated and partial empirical evidence for the
computational utility of many connected vesicle analogs of an encapsulated
non-linear chemical processing medium. By connecting small vesicles containing
a solution of sub-excitable Belousov-Zhabotinsky (BZ) reaction, sustained and
propagating wave fragments are modulated by both spatial geometry, network
connectivity and their interaction with other waves. The processing ability is
demonstrated through the creation of simple Boolean logic gates and then by the
combination of those gates to create more complex circuits
Weighted p-bits for FPGA implementation of probabilistic circuits
Probabilistic spin logic (PSL) is a recently proposed computing paradigm
based on unstable stochastic units called probabilistic bits (p-bits) that can
be correlated to form probabilistic circuits (p-circuits). These p-circuits can
be used to solve problems of optimization, inference and also to implement
precise Boolean functions in an "inverted" mode, where a given Boolean circuit
can operate in reverse to find the input combinations that are consistent with
a given output. In this paper we present a scalable FPGA implementation of such
invertible p-circuits. We implement a "weighted" p-bit that combines stochastic
units with localized memory structures. We also present a generalized tile of
weighted p-bits to which a large class of problems beyond invertible Boolean
logic can be mapped, and how invertibility can be applied to interesting
problems such as the NP-complete Subset Sum Problem by solving a small instance
of this problem in hardware
Fast Quantum Modular Exponentiation
We present a detailed analysis of the impact on modular exponentiation of
architectural features and possible concurrent gate execution. Various
arithmetic algorithms are evaluated for execution time, potential concurrency,
and space tradeoffs. We find that, to exponentiate an n-bit number, for storage
space 100n (twenty times the minimum 5n), we can execute modular exponentiation
two hundred to seven hundred times faster than optimized versions of the basic
algorithms, depending on architecture, for n=128. Addition on a neighbor-only
architecture is limited to O(n) time when non-neighbor architectures can reach
O(log n), demonstrating that physical characteristics of a computing device
have an important impact on both real-world running time and asymptotic
behavior. Our results will help guide experimental implementations of quantum
algorithms and devices.Comment: to appear in PRA 71(5); RevTeX, 12 pages, 12 figures; v2 revision is
substantial, with new algorithmic variants, much shorter and clearer text,
and revised equation formattin
- âŠ