1,237 research outputs found
Hardware Implementation of the GPS authentication
In this paper, we explore new area/throughput trade- offs for the Girault,
Poupard and Stern authentication protocol (GPS). This authentication protocol
was selected in the NESSIE competition and is even part of the standard ISO/IEC
9798. The originality of our work comes from the fact that we exploit a fixed
key to increase the throughput. It leads us to implement GPS using the Chapman
constant multiplier. This parallel implementation is 40 times faster but 10
times bigger than the reference serial one. We propose to serialize this
multiplier to reduce its area at the cost of lower throughput. Our hybrid
Chapman's multiplier is 8 times faster but only twice bigger than the
reference. Results presented here allow designers to adapt the performance of
GPS authentication to their hardware resources. The complete GPS prover side is
also integrated in the network stack of the PowWow sensor which contains an
Actel IGLOO AGL250 FPGA as a proof of concept.Comment: ReConFig - International Conference on ReConFigurable Computing and
FPGAs (2012
Digital implementation of the cellular sensor-computers
Two different kinds of cellular sensor-processor architectures are used nowadays in various
applications. The first is the traditional sensor-processor architecture, where the sensor and the
processor arrays are mapped into each other. The second is the foveal architecture, in which a
small active fovea is navigating in a large sensor array. This second architecture is introduced
and compared here. Both of these architectures can be implemented with analog and digital
processor arrays. The efficiency of the different implementation types, depending on the used
CMOS technology, is analyzed. It turned out, that the finer the technology is, the better to use
digital implementation rather than analog
Design and implementation of DA FIR filter for bio-inspired computing architecture
This paper elucidates the system construct of DA-FIR filter optimized for design of distributed arithmetic (DA) finite impulse response (FIR) filter and is based on architecture with tightly coupled co-processor based data processing units. With a series of look-up-table (LUT) accesses in order to emulate multiply and accumulate operations the constructed DA based FIR filter is implemented on FPGA. The very high speed integrated circuit hardware description language (VHDL) is used implement the proposed filter and the design is verified using simulation. This paper discusses two optimization algorithms and resulting optimizations are incorporated into LUT layer and architecture extractions. The proposed method offers an optimized design in the form of offers average miminimizations of the number of LUT, reduction in populated slices and gate minimization for DA-finite impulse response filter. This research paves a direction towards development of bio inspired computing architectures developed without logically intensive operations, obtaining the desired specifications with respect to performance, timing, and reliability
LUT-NN: Empower Efficient Neural Network Inference with Centroid Learning and Table Lookup
On-device Deep Neural Network (DNN) inference consumes significant computing
resources and development efforts. To alleviate that, we propose LUT-NN, the
first system to empower inference by table lookup, to reduce inference cost.
LUT-NN learns the typical features for each operator, named centroid, and
precompute the results for these centroids to save in lookup tables. During
inference, the results of the closest centroids with the inputs can be read
directly from the table, as the approximated outputs without computations.
LUT-NN integrates two major novel techniques: (1) differentiable centroid
learning through backpropagation, which adapts three levels of approximation to
minimize the accuracy impact by centroids; (2) table lookup inference
execution, which comprehensively considers different levels of parallelism,
memory access reduction, and dedicated hardware units for optimal performance.
LUT-NN is evaluated on multiple real tasks, covering image and speech
recognition, and nature language processing. Compared to related work, LUT-NN
improves accuracy by 66% to 92%, achieving similar level with the original
models. LUT-NN reduces the cost at all dimensions, including FLOPs (
16x), model size ( 7x), latency ( 6.8x), memory ( 6.5x), and
power ( 41.7%)
Time-Precision Flexible Arithmetic Unit
Paper submitted to the XVIII Conference on Design of Circuits and Integrated Systems (DCIS), Ciudad Real, España, 2003.A new conception of flexible calculation that allows us to adjust an operation depending on the available time computation is presented. The proposed arithmetic unit is based on this principle. It contains a control operation module that determines the process time of each calculation. The operation method design uses precalculated data stored in look-up tables, which provide, above all, quality results and systematization in the implementation of low level primitives that set parameters for the processing time. We report an evaluation of the architecture in area, delay and computation error, as well as a suitable implementation in FPGA to validate the design.This work is being backed by grant DPI2002-04434-C04-01 from the Ministerio de Ciencia y TecnologĂa of the Spanish Government
Efficient hardware implementations of low bit depth motion estimation algorithms
In this paper, we present efficient hardware implementation of multiplication free one-bit transform (MF1BT) based and constraint one-bit transform (C-1BT) based motion estimation (ME) algorithms, in order to provide low bit-depth representation based full search block ME hardware for real-time video encoding. We used a source pixel based linear array (SPBLA) hardware architecture for low bit depth ME for the first time in the literature. The proposed SPBLA based implementation results in a genuine data flow scheme which significantly reduces the number of data reads from the current block memory, which in turn reduces the power consumption by at least 50% compared to conventional 1BT based ME hardware architecture presented in the literature. Because of the binary nature of low bit-depth ME algorithms, their hardware architectures are more efficient than existing 8 bits/pixel representation based ME architectures
- âŠ