
1,264 research outputs found

Composite Iterative Algorithm and Architecture for q-th Root Calculation

Author: Bruguera Javier
Vazquez Alvaro
Publication venue: HAL CCSD
Publication date: 10/03/2011

An algorithm for q-th root extraction, where q is any integer, is presented in this paper. The algorithm is based on an optimized implementation of X^{1/q} by a sequence of parallel and/or overlapped operations: (1) reciprocal, (2) digit-recurrence logarithm, (3) left-to-right carry-free multiplication and (4) on-line exponential. A detailed error analysis is given and two architectures are proposed, one for low-precision q and one for higher-precision q. The execution time and hardware requirements are estimated for single and double precision floating-point computations for several radices; this helps to determine which radices result in the most efficient implementations. The proposed architectures improve on the features of other architectures for q-th root extraction.
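The four-step decomposition named in the abstract above rests on the identity X^{1/q} = exp((1/q) · ln X). The following is a minimal software sketch of that identity only, assuming X > 0 and ordinary floating-point library calls; it does not model the digit-recurrence, carry-free or on-line hardware units of the paper, and the function name `qth_root` is hypothetical.

```python
import math


def qth_root(x: float, q: int) -> float:
    """Approximate x**(1/q) for x > 0 via exp((1/q) * ln(x)).

    Illustrative only: each step uses a floating-point library call,
    not the hardware units described in the abstract.
    """
    if x <= 0.0:
        raise ValueError("x must be positive")
    if q == 0:
        raise ValueError("q must be non-zero")
    reciprocal = 1.0 / q              # step (1): reciprocal of q
    logarithm = math.log(x)           # step (2): logarithm of x
    product = reciprocal * logarithm  # step (3): multiplication
    return math.exp(product)          # step (4): exponential
```

For example, `qth_root(27.0, 3)` returns a value close to 3, up to floating-point rounding.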

HAL-ENS-LYON

INRIA a CCSD electronic archive server

Hal-Diderot

A NoC-based hybrid message-passing/shared-memory approach to CMP design

Author: Agarwal
Daemen
Forsell
Grecu
Karniadakis
Lorensen
Mario R. Casu
Massimo Ruo Roch
Maurizio Zamboni
Owens
Paulin
Radulescu
Sergio V. Tota
Snir
Tota
Publication venue: Elsevier
Publication date: 01/01/2011

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

A Scalable Near-Memory Architecture for Training Deep Neural Networks on Large In-Memory Datasets

Author: Benini L.
Gurkaynak F.K.
Schaffner M.
Schuiki F.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2019

Most investigations into near-memory hardware accelerators for deep neural networks have primarily focused on inference, while the potential of accelerating training has received relatively little attention so far. Based on an in-depth analysis of the key computational patterns in state-of-the-art gradient-based training methods, we propose an efficient near-memory acceleration engine called NTX that can be used to train state-of-the-art deep convolutional neural networks at scale. Our main contributions are: (i) a loose coupling of RISC-V cores and NTX co-processors reducing offloading overhead by 7 x over previously published results; (ii) an optimized IEEE 754 compliant data path for fast high-precision convolutions and gradient propagation; (iii) evaluation of near-memory computing with NTX embedded into residual area on the Logic Base die of a Hybrid Memory Cube; and (iv) a scaling analysis to meshes of HMCs in a data center scenario. We demonstrate a 2.7 x energy efficiency improvement of NTX over contemporary GPUs at 4.4 x less silicon area, and a compute performance of 1.2 Tflop/s for training large state-of-the-art networks with full floating-point precision. At the data center scale, a mesh of NTX achieves above 95 percent parallel and energy efficiency, while providing 2.1 x energy savings or 3.1 x performance improvement over a GPU-based system

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

A study and comparison of COordinate Rotation DIgital Computer (CORDIC) architectures

Author: Nawandar Neha K
Satpute Vishal R
Publication venue
Publication date: 08/11/2022

Most digital signal processing applications perform operations such as multiplication, addition, square-root calculation and solving linear equations. The physical implementation of these operations consumes a lot of hardware, and the software implementation consumes a large amount of memory. Even when they are implemented in hardware, they do not provide high speed, and for this reason software implementations still dominate hardware ones today. For realizing operations from basic to very complex ones with less hardware, a COordinate Rotation DIgital Computer (CORDIC) proves beneficial. It is capable of performing mathematical operations ranging from addition to highly complex functions with the help of an arithmetic unit and shifters only. This paper gives a brief overview of various existing CORDIC architectures, their working principles and application domains, and a comparison of these architectures. Different designs are available depending on the target, e.g. high accuracy and precision, low area, low latency, hardware efficiency, low power or reconfigurability, and can be chosen according to the application in which the architecture is to be employed
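The shift-and-add principle mentioned in the abstract above can be illustrated with the classic rotation-mode CORDIC iteration for sine and cosine. This is a floating-point sketch under stated assumptions, not any specific architecture from the surveyed paper: the function name `cordic_sin_cos` and the default of 32 iterations are hypothetical, and the multiplications by 2^-i stand in for the bit shifts a fixed-point hardware design would use.

```python
import math


def cordic_sin_cos(theta: float, iterations: int = 32):
    """Rotation-mode CORDIC; returns (sin(theta), cos(theta)).

    Assumes theta lies in [-pi/2, pi/2], inside the convergence range.
    """
    # Elementary rotation angles atan(2^-i); a hardware design would
    # store these in a small lookup table.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]
    # Each micro-rotation stretches the vector by sqrt(1 + 2^-2i);
    # pre-scale the start vector by the inverse of the total gain.
    scale = 1.0
    for i in range(iterations):
        scale /= math.sqrt(1.0 + 2.0 ** (-2 * i))
    x, y, z = scale, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0.0 else -1.0  # rotate toward residual angle 0
        shift = 2.0 ** -i              # a right shift by i bits in hardware
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * angles[i]
    return y, x
```

Calling `cordic_sin_cos(math.pi / 6)` yields values close to sin and cos of 30 degrees; the accuracy improves by roughly one bit per iteration.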

arXiv.org e-Print Archive

Fast architectures for the $\eta_T$ pairing over small-characteristic supersingular elliptic curves

Author: Beuchat Jean-Luc
Detrey Jérémie
Estibals Nicolas
Okamoto Eiji
Rodríguez-Henríquez Francisco
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/02/2011

This paper is devoted to the design of fast parallel accelerators for the cryptographic $\eta_T$ pairing on supersingular elliptic curves over finite fields of characteristics two and three. We propose here a novel hardware implementation of Miller's algorithm based on a parallel pipelined Karatsuba multiplier. After a short description of the strategies we considered to design our multiplier, we point out the intrinsic parallelism of Miller's loop and outline the architecture of coprocessors for the $\eta_T$ pairing over $\mathbb{F}_{2^m}$ and $\mathbb{F}_{3^m}$. Thanks to a careful choice of algorithms for the tower field arithmetic associated with the $\eta_T$ pairing, we manage to keep the pipelined multiplier at the heart of each coprocessor busy. A final exponentiation is still required to obtain a unique value, which is desirable in most cryptographic protocols. We supplement our pairing accelerators with a coprocessor responsible for this task. An improved exponentiation algorithm allows us to save hardware resources. According to our place-and-route results on Xilinx FPGAs, our designs improve both the computation time and the area-time trade-off compared to previously published coprocessors
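The Karatsuba idea behind the multiplier mentioned in the abstract above can be sketched in software for polynomials over GF(2), where addition is XOR. This is an illustrative recursion only, assuming polynomials packed into Python integers (bit i holds the coefficient of x^i); it omits the reduction modulo an irreducible polynomial needed for $\mathbb{F}_{2^m}$ arithmetic and says nothing about the pipelined hardware design of the paper. The names `clmul_schoolbook`, `karatsuba_gf2` and the `threshold` parameter are hypothetical.

```python
def clmul_schoolbook(a: int, b: int) -> int:
    """Carry-less (GF(2)[x]) product of a and b by shift-and-XOR."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def karatsuba_gf2(a: int, b: int, threshold: int = 8) -> int:
    """Karatsuba product in GF(2)[x]; threshold must be at least 1."""
    n = max(a.bit_length(), b.bit_length())
    if n <= threshold:
        return clmul_schoolbook(a, b)
    half = n // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    # Three recursive products instead of four.
    low = karatsuba_gf2(a_lo, b_lo, threshold)
    high = karatsuba_gf2(a_hi, b_hi, threshold)
    cross = karatsuba_gf2(a_lo ^ a_hi, b_lo ^ b_hi, threshold)
    # Over GF(2), subtraction is XOR, so the middle term is
    # cross - low - high = cross ^ low ^ high.
    middle = cross ^ low ^ high
    return low ^ (middle << half) ^ (high << (2 * half))
```

As a small check, `karatsuba_gf2(0b11, 0b11)` gives `0b101`, matching (x + 1)^2 = x^2 + 1 over GF(2).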

INRIA a CCSD electronic archive server

Fast Architectures for the $\eta_T$ Pairing over Small-Characteristic Supersingular Elliptic Curves

Author: Eiji Okamoto
Francisco Rodríguez-Henríquez
Jean-Luc Beuchat
Jérémie Detrey
Nicolas Estibals
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 19/08/2009

This paper is devoted to the design of fast parallel accelerators for the cryptographic $\eta_T$ pairing on supersingular elliptic curves over finite fields of characteristics two and three. We propose here a novel hardware implementation of Miller's algorithm based on a parallel pipelined Karatsuba multiplier. After a short description of the strategies we considered to design our multiplier, we point out the intrinsic parallelism of Miller's loop and outline the architecture of coprocessors for the $\eta_T$ pairing over $\mathbb{F}_{2^m}$ and $\mathbb{F}_{3^m}$. Thanks to a careful choice of algorithms for the tower field arithmetic associated with the $\eta_T$

Cryptology ePrint Archive

Mapping and assessment of tree roots using ground penetrating radar with low-cost GPS

Author: Alani Amir
Giannakis Iraklis
Sato Motoyuki
Tosti Fabio
Wang Yan
Zou Lilong
Publication venue: 'MDPI AG'
Publication date: 20/04/2020

This paper presents a methodology combining ground penetrating radar (GPR) and a low-cost GPS receiver for three-dimensional detection of tree roots. This research aims to provide an effective and affordable testing tool to assess the root system of a number of trees. For this purpose, a low-cost GPS receiver was used, which recorded the approximate position of each GPR track, collected with a 500 MHz RAMAC shielded antenna. A dedicated post-processing methodology was developed, based on precise satellite position data, satellite clock offset data, and a local reference Global Navigation Satellite System (GNSS) Earth Observation Network System (GEONET) station close to the survey site. First, the positioning information of local GEONET stations was used to filter out the errors caused by satellite position error, satellite clock offset, and the ionosphere. In addition, an advanced Kalman filter was designed to minimise receiver offset and multipath error, in order to obtain a high-precision position for each GPR track. Kirchhoff migration accounting for the near-field effect was used to identify the three-dimensional distribution of the roots. At a later stage, a novel processing scheme was used to detect and clearly map the coarse roots of the investigated tree. A successful case study is presented, which supports the following premise: the current scheme is an affordable and accurate method for mapping the root system architecture

UWL Repository

Kingston University Research Repository