1,264 research outputs found

    Composite Iterative Algorithm and Architecture for q-th Root Calculation

    Get PDF
    An algorithm for the q-th root extraction, being q any integer, is presented in this paper. The algorithm is based on an optimized implementation of X^{1/q} by a sequence of parallel and/or overlapped operations: (1) reciprocal, (2) digit-recurrence logarithm, (3) left-to-right carry-free multiplication and (4) on-line exponential. A detailed error analysis and two architectures are proposed, for low precision q and for higher precision q. The execution time and hardware requirements are estimated for single and double precision floating-point computations for several radices; this helps to determine which radices result in the most efficient implementations. The architectures proposed improve the features of other architectures for q-th root extraction.Dans cet article, nous présentons un algorithme matériel pour l'extraction de la racine q-ième d'un nombre X, où q est un entier naturel non nul. Cet algorithme est basé sur une implantation optimisée de la fonction X^{1/q} par une séquence d'opérations parallèles et/ou superposées: (1) réciproque, (2) logarithme chiffre par chiffre, (3) multiplication de gauche-à-droite sans propagation de retenue et (4) exponentielle en ligne. Une analyse détaillée des erreurs et deux architectures sont proposées, pour q de basse précision et pour q de précision plus haute. Le temps d'exécution et les composants matériels à utiliser sont estimés pour des calculs en virgule flottante simple et double précision et pour plusieurs bases. Cette étude aide à déterminer quelles bases mènent aux implantations les plus efficaces. Les architectures proposées améliorent les caractéristiques d'architectures précédentes destinées à l'extraction des racines

    A Scalable Near-Memory Architecture for Training Deep Neural Networks on Large In-Memory Datasets

    Get PDF
    Most investigations into near-memory hardware accelerators for deep neural networks have primarily focused on inference, while the potential of accelerating training has received relatively little attention so far. Based on an in-depth analysis of the key computational patterns in state-of-the-art gradient-based training methods, we propose an efficient near-memory acceleration engine called NTX that can be used to train state-of-the-art deep convolutional neural networks at scale. Our main contributions are: (i) a loose coupling of RISC-V cores and NTX co-processors reducing offloading overhead by 7 x over previously published results; (ii) an optimized IEEE 754 compliant data path for fast high-precision convolutions and gradient propagation; (iii) evaluation of near-memory computing with NTX embedded into residual area on the Logic Base die of a Hybrid Memory Cube; and (iv) a scaling analysis to meshes of HMCs in a data center scenario. We demonstrate a 2.7 x energy efficiency improvement of NTX over contemporary GPUs at 4.4 x less silicon area, and a compute performance of 1.2 Tflop/s for training large state-of-the-art networks with full floating-point precision. At the data center scale, a mesh of NTX achieves above 95 percent parallel and energy efficiency, while providing 2.1 x energy savings or 3.1 x performance improvement over a GPU-based system

    A study and comparison of COordinate Rotation DIgital Computer (CORDIC) architectures

    Full text link
    Most of the digital signal processing applications performs operations like multiplication, addition, square-root calculation, solving linear equations etc. The physical implementation of these operations consumes a lot of hardware and, software implementation consumes large memory. Even if they are implemented in hardware, they do not provide high speed, and due to this reason, even today the software implementation dominates hardware. For realizing operations from basic to very complex ones with less hardware, a Co-ordinate Rotation Digital Computer (CORDIC) proves beneficial. It is capable of performing mathematical operations right from addition to highly complex functions with the help of arithmetic unit and shifters only. This paper gives a brief overview of various existing CORDIC architectures, their working principle, application domain and a comparison of these architectures. Different designs are available as per the target, i.e. high accuracy and precision, low area, low latency, hardware efficient, low power, reconfigurability, etc. that can be used as per the application in which the architecture needs to be employed

    Fast architectures for the ηT\eta_T pairing over small-characteristic supersingular elliptic curves

    Get PDF
    International audienceThis paper is devoted to the design of fast parallel accelerators for the cryptographic ηT\eta_T pairing on supersingular elliptic curves over finite fields of characteristics two and three. We propose here a novel hardware implementation of Miller's algorithm based on a parallel pipelined Karatsuba multiplier. After a short description of the strategies we considered to design our multiplier, we point out the intrinsic parallelism of Miller's loop and outline the architecture of coprocessors for the ηT\eta_T pairing over \F_{2^m} and \F_{3^m}. Thanks to a careful choice of algorithms for the tower field arithmetic associated with the ηT\eta_T pairing, we manage to keep the pipelined multiplier at the heart of each coprocessor busy. A final exponentiation is still required to obtain a unique value, which is desirable in most cryptographic protocols. We supplement our pairing accelerators with a coprocessor responsible for this task. An improved exponentiation algorithm allows us to save hardware resources. According to our place-and-route results on Xilinx FPGAs, our designs improve both the computation time and the area-time trade-off compared to previously published coprocessors

    Fast Architectures for the ηT\eta_T Pairing over Small-Characteristic Supersingular Elliptic Curves

    Get PDF
    This paper is devoted to the design of fast parallel accelerators for the cryptographic ηT\eta_T pairing on supersingular elliptic curves over finite fields of characteristics two and three. We propose here a novel hardware implementation of Miller\u27s algorithm based on a parallel pipelined Karatsuba multiplier. After a short description of the strategies we considered to design our multiplier, we point out the intrinsic parallelism of Miller\u27s loop and outline the architecture of coprocessors for the ηT\eta_T pairing over F2m\mathbb{F}_{2^m} and F3m\mathbb{F}_{3^m}. Thanks to a careful choice of algorithms for the tower field arithmetic associated with the ηT\eta_T pairing, we manage to keep the pipelined multiplier at the heart of each coprocessor busy. A final exponentiation is still required to obtain a unique value, which is desirable in most cryptographic protocols. We supplement our pairing accelerators with a coprocessor responsible for this task. An improved exponentiation algorithm allows us to save hardware resources. According to our place-and-route results on Xilinx FPGAs, our designs improve both the computation time and the area-time trade-off compared to previously published coprocessors

    Mapping and assessment of tree roots using ground penetrating radar with low-cost GPS

    Get PDF
    In this paper, we have presented a methodology combining ground penetrating radar (GPR) and a low-cost GPS receiver for three-dimensional detection of tree roots. This research aims to provide an effective and affordable testing tool to assess the root system of a number of trees. For this purpose, a low-cost GPS receiver was used, which recorded the approximate position of each GPR track, collected with a 500 MHz RAMAC shielded antenna. A dedicated post-processing methodology based on the precise position of the satellite data, satellite clock offsets data, and a local reference Global Navigation Satellite System (GNSS) Earth Observation Network System (GEONET) Station close to the survey site was developed. Firstly, the positioning information of local GEONET stations was used to filter out the errors caused by satellite position error, satellite clock offset, and ionosphere. In addition, the advanced Kalman filter was designed to minimise receiver offset and the multipath error, in order to obtain a high precision position of each GPR track. Kirchhoff migration considering near-field effect was used to identify the three-dimensional distribution of the root. In a later stage, a novel processing scheme was used to detect and clearly map the coarse roots of the investigated tree. A successful case study is proposed, which supports the following premise: the current scheme is an affordable and accurate mapping method of the root system architecture
    • …
    corecore