1,264 research outputs found
Composite Iterative Algorithm and Architecture for q-th Root Calculation
This paper presents an algorithm for q-th root extraction, where q is a nonzero natural number. The algorithm is based on an optimized implementation of X^{1/q} by a sequence of parallel and/or overlapped operations: (1) reciprocal, (2) digit-recurrence logarithm, (3) left-to-right carry-free multiplication and (4) on-line exponential. A detailed error analysis is given and two architectures are proposed, one for low-precision q and one for higher-precision q. The execution time and hardware requirements are estimated for single and double precision floating-point computations for several radices; this helps to determine which radices result in the most efficient implementations. The proposed architectures improve on the features of other architectures for q-th root extraction.
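The four-step decomposition in this abstract rests on the identity X^{1/q} = exp((1/q) ln X). The sketch below illustrates only that data flow in software, using Python's standard `math` functions in place of the digit-recurrence and on-line hardware units described in the paper. The function name `qth_root` is an assumption made for illustration and does not come from the paper.

```python
import math


def qth_root(x: float, q: int) -> float:
    """Illustrative only: compute x**(1/q) as exp((1/q) * ln(x)).

    Mirrors the order of operations named in the abstract, but uses
    library floating-point routines rather than the paper's hardware
    algorithms.
    """
    if x <= 0.0:
        raise ValueError("x must be positive for the logarithm to exist")
    if q < 1:
        raise ValueError("q must be a nonzero natural number")
    reciprocal = 1.0 / q            # (1) reciprocal of q
    logarithm = math.log(x)         # (2) logarithm of X
    product = reciprocal * logarithm  # (3) multiplication
    return math.exp(product)        # (4) exponential
```

For example, `qth_root(27.0, 3)` returns a value within floating-point rounding of 3.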
A Scalable Near-Memory Architecture for Training Deep Neural Networks on Large In-Memory Datasets
Most investigations into near-memory hardware accelerators for deep neural networks have primarily focused on inference, while the potential of accelerating training has received relatively little attention so far. Based on an in-depth analysis of the key computational patterns in state-of-the-art gradient-based training methods, we propose an efficient near-memory acceleration engine called NTX that can be used to train state-of-the-art deep convolutional neural networks at scale. Our main contributions are: (i) a loose coupling of RISC-V cores and NTX co-processors reducing offloading overhead by 7 x over previously published results; (ii) an optimized IEEE 754 compliant data path for fast high-precision convolutions and gradient propagation; (iii) evaluation of near-memory computing with NTX embedded into residual area on the Logic Base die of a Hybrid Memory Cube; and (iv) a scaling analysis to meshes of HMCs in a data center scenario. We demonstrate a 2.7 x energy efficiency improvement of NTX over contemporary GPUs at 4.4 x less silicon area, and a compute performance of 1.2 Tflop/s for training large state-of-the-art networks with full floating-point precision. At the data center scale, a mesh of NTX achieves above 95 percent parallel and energy efficiency, while providing 2.1 x energy savings or 3.1 x performance improvement over a GPU-based system
A study and comparison of COordinate Rotation DIgital Computer (CORDIC) architectures
Most digital signal processing applications perform operations such as
multiplication, addition, square-root calculation, and solving linear equations.
Implementing these operations in hardware consumes a lot of hardware resources,
while implementing them in software consumes a large amount of memory. Even when
they are implemented in hardware, they do not provide high speed, and for this
reason software implementations still dominate hardware ones. For
realizing operations from basic to very complex ones with less hardware, a
COordinate Rotation DIgital Computer (CORDIC) proves beneficial. It can perform
mathematical operations ranging from addition to highly complex
functions using only an arithmetic unit and shifters. This paper gives
a brief overview of various existing CORDIC architectures, their working
principles, their application domains, and a comparison of these architectures.
Different designs are available depending on the target (high accuracy and
precision, low area, low latency, hardware efficiency, low power,
reconfigurability, etc.) and can be chosen according to the application in which
the architecture is to be employed.
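To make the "arithmetic unit and shifters only" claim concrete, here is a minimal sketch of the classic rotation-mode CORDIC iteration for sine and cosine. It is a floating-point model, not one of the architectures surveyed in the paper: the multiplication by `2.0 ** -i` stands in for a right shift by `i` bits in fixed-point hardware, and the function name and iteration count are assumptions chosen for illustration.

```python
import math


def cordic_sin_cos(theta: float, iterations: int = 32) -> tuple[float, float]:
    """Rotation-mode CORDIC sketch; theta should lie in [-pi/2, pi/2].

    Returns (sin(theta), cos(theta)) approximations.
    """
    # Precomputed elementary angles atan(2^-i); a small lookup table in hardware.
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]

    # The micro-rotations stretch the vector by a constant gain;
    # compensate by starting from 1/gain instead of 1.
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    x, y, z = 1.0 / gain, 0.0, theta
    for i in range(iterations):
        direction = 1.0 if z >= 0.0 else -1.0  # rotate toward z = 0
        shift = 2.0 ** -i                      # a right shift by i in hardware
        x, y = x - direction * y * shift, y + direction * x * shift
        z -= direction * angles[i]
    return y, x
```

Each iteration uses only additions, subtractions, a shift, and one table lookup, which is why the method maps well to small hardware.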
Fast architectures for the pairing over small-characteristic supersingular elliptic curves
This paper is devoted to the design of fast parallel accelerators for the cryptographic pairing on supersingular elliptic curves over finite fields of characteristics two and three. We propose here a novel hardware implementation of Miller's algorithm based on a parallel pipelined Karatsuba multiplier. After a short description of the strategies we considered to design our multiplier, we point out the intrinsic parallelism of Miller's loop and outline the architecture of coprocessors for the pairing over $\mathbb{F}_{2^m}$ and $\mathbb{F}_{3^m}$. Thanks to a careful choice of algorithms for the tower field arithmetic associated with the pairing, we manage to keep the pipelined multiplier at the heart of each coprocessor busy. A final exponentiation is still required to obtain a unique value, which is desirable in most cryptographic protocols. We supplement our pairing accelerators with a coprocessor responsible for this task. An improved exponentiation algorithm allows us to save hardware resources. According to our place-and-route results on Xilinx FPGAs, our designs improve both the computation time and the area-time trade-off compared to previously published coprocessors.
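As a rough illustration of the Karatsuba idea behind the multiplier mentioned in this abstract, the sketch below multiplies polynomials over GF(2), the characteristic-two case, with coefficients packed into the bits of a Python integer. It is a recursive software model only: the paper's multiplier is a parallel pipelined hardware design, and the function names, the bit-mask representation, and the `threshold` parameter are all assumptions made here. Reduction modulo the field polynomial and the characteristic-three case are not shown.

```python
def clmul_schoolbook(a: int, b: int) -> int:
    """Carry-less (GF(2)[x]) product of two polynomials stored as bit masks."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # addition in GF(2) is XOR
        a <<= 1
        b >>= 1
    return result


def clmul_karatsuba(a: int, b: int, threshold: int = 8) -> int:
    """Karatsuba product in GF(2)[x]: three half-size products instead of four."""
    n = max(a.bit_length(), b.bit_length())
    if n <= max(threshold, 1):
        return clmul_schoolbook(a, b)

    half = n // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half

    lo = clmul_karatsuba(a_lo, b_lo, threshold)
    hi = clmul_karatsuba(a_hi, b_hi, threshold)
    mid = clmul_karatsuba(a_lo ^ a_hi, b_lo ^ b_hi, threshold)
    mid ^= lo ^ hi  # recover the cross term a_lo*b_hi + a_hi*b_lo

    return lo ^ (mid << half) ^ (hi << (2 * half))
```

Because addition and subtraction coincide in characteristic two, the cross term is recovered with XORs alone, which is part of what makes the scheme attractive in hardware.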
Mapping and assessment of tree roots using ground penetrating radar with low-cost GPS
In this paper, we present a methodology combining ground penetrating radar (GPR) and a low-cost GPS receiver for three-dimensional detection of tree roots. This research aims to provide an effective and affordable testing tool to assess the root systems of a number of trees. For this purpose, a low-cost GPS receiver was used to record the approximate position of each GPR track, collected with a 500 MHz RAMAC shielded antenna. A dedicated post-processing methodology was developed, based on precise satellite position data, satellite clock offset data, and a local reference Global Navigation Satellite System (GNSS) Earth Observation Network System (GEONET) station close to the survey site. First, the positioning information of local GEONET stations was used to filter out the errors caused by satellite position error, satellite clock offset, and the ionosphere. In addition, an advanced Kalman filter was designed to minimise receiver offset and multipath error, in order to obtain a high-precision position for each GPR track. Kirchhoff migration accounting for the near-field effect was used to identify the three-dimensional distribution of the roots. In a later stage, a novel processing scheme was used to detect and clearly map the coarse roots of the investigated tree. A successful case study is presented, which supports the following premise: the current scheme is an affordable and accurate method for mapping root system architecture.