32,901 research outputs found

    NACU: A Non-Linear Arithmetic Unit for Neural Networks

    Get PDF
    Reconfigurable architectures targeting neural networks are an attractive option. They allow multiple neural networks of different types to be hosted on the same hardware, in parallel or sequence. Reconfigurability also grants the ability to morph into different micro-architectures to meet varying power-performance constraints. In this context, the need for a reconfigurable non-linear computational unit has not been widely researched. In this work, we present a formal and comprehensive method to select the optimal fixed-point representation to achieve the highest accuracy against the floating-point implementation benchmark. We also present a novel design of an optimised reconfigurable arithmetic unit for calculating non-linear functions. The unit can be dynamically configured to calculate the sigmoid, hyperbolic tangent, and exponential function using the same underlying hardware. We compare our work with the state-of-the-art and show that our unit can calculate all three functions without loss of accuracy

    Analysis of the Basic Implementation Aspects of Hardware-Accelerated Density Functional Theory Calculations

    Get PDF
    This paper presents a Field Programmable Gate Array (FPGA) implementation of a calculation module for exponential part of Gaussian Type Orbital (GTO). The module is composed of several specially crafted floating-point modules which are fully pipelined and optimized for high performance. The hardware implementation revealed significant speed-up for the finite sum of the exponential products calculation ranging from 2.5x to 20x in comparison to a general-purpose Central Processing Unit (CPU) version. Calculating values of GTOs is one of computationally critical parts of the Kohn-Sham algorithm. The approach proposed in the paper aims to increase the performance of a part of the quantum chemistry computational system by employing FPGA-based accelerator. Several issues are addressed, such as identification of code fragments which benefit most from hardware acceleration, porting a part of the Kohn-Sham algorithm to FPGA, data precision adjustment and data transfer overhead. The authors' intention was also to make hardware implementation of calculating the orbital function universal and easily attachable to different quantum-chemistry software packages

    Algorithms and architectures for decimal transcendental function computation

    Get PDF
    Nowadays, there are many commercial demands for decimal floating-point (DFP) arithmetic operations such as financial analysis, tax calculation, currency conversion, Internet based applications, and e-commerce. This trend gives rise to further development on DFP arithmetic units which can perform accurate computations with exact decimal operands. Due to the significance of DFP arithmetic, the IEEE 754-2008 standard for floating-point arithmetic includes it in its specifications. The basic decimal arithmetic unit, such as decimal adder, subtracter, multiplier, divider or square-root unit, as a main part of a decimal microprocessor, is attracting more and more researchers' attentions. Recently, the decimal-encoded formats and DFP arithmetic units have been implemented in IBM's system z900, POWER6, and z10 microprocessors. Increasing chip densities and transistor count provide more room for designers to add more essential functions on application domains into upcoming microprocessors. Decimal transcendental functions, such as DFP logarithm, antilogarithm, exponential, reciprocal and trigonometric, etc, as useful arithmetic operations in many areas of science and engineering, has been specified as the recommended arithmetic in the IEEE 754-2008 standard. Thus, virtually all the computing systems that are compliant with the IEEE 754-2008 standard could include a DFP mathematical library providing transcendental function computation. Based on the development of basic decimal arithmetic units, more complex DFP transcendental arithmetic will be the next building blocks in microprocessors. In this dissertation, we researched and developed several new decimal algorithms and architectures for the DFP transcendental function computation. These designs are composed of several different methods: 1) the decimal transcendental function computation based on the table-based first-order polynomial approximation method; 2) DFP logarithmic and antilogarithmic converters based on the decimal digit-recurrence algorithm with selection by rounding; 3) a decimal reciprocal unit using the efficient table look-up based on Newton-Raphson iterations; and 4) a first radix-100 division unit based on the non-restoring algorithm with pre-scaling method. Most decimal algorithms and architectures for the DFP transcendental function computation developed in this dissertation have been the first attempt to analyze and implement the DFP transcendental arithmetic in order to achieve faithful results of DFP operands, specified in IEEE 754-2008. To help researchers evaluate the hardware performance of DFP transcendental arithmetic units, the proposed architectures based on the different methods are modeled, verified and synthesized using FPGAs or with CMOS standard cells libraries in ASIC. Some of implementation results are compared with those of the binary radix-16 logarithmic and exponential converters; recent developed high performance decimal CORDIC based architecture; and Intel's DFP transcendental function computation software library. The comparison results show that the proposed architectures have significant speed-up in contrast to the above designs in terms of the latency. The algorithms and architectures developed in this dissertation provide a useful starting point for future hardware-oriented DFP transcendental function computation researches

    A biophysically accurate floating point somatic neuroprocessor

    Get PDF

    Minimum entropy restoration using FPGAs and high-level techniques

    Get PDF
    One of the greatest perceived barriers to the widespread use of FPGAs in image processing is the difficulty for application specialists of developing algorithms on reconfigurable hardware. Minimum entropy deconvolution (MED) techniques have been shown to be effective in the restoration of star-field images. This paper reports on an attempt to implement a MED algorithm using simulated annealing, first on a microprocessor, then on an FPGA. The FPGA implementation uses DIME-C, a C-to-gates compiler, coupled with a low-level core library to simplify the design task. Analysis of the C code and output from the DIME-C compiler guided the code optimisation. The paper reports on the design effort that this entailed and the resultant performance improvements

    Evaluating critical bits in arithmetic operations due to timing violations

    Full text link
    Various error models are being used in simulation of voltage-scaled arithmetic units to examine application-level tolerance of timing violations. The selection of an error model needs further consideration, as differences in error models drastically affect the performance of the application. Specifically, floating point arithmetic units (FPUs) have architectural characteristics that characterize its behavior. We examine the architecture of FPUs and design a new error model, which we call Critical Bit. We run selected benchmark applications with Critical Bit and other widely used error injection models to demonstrate the differences

    Fourier Transform of the Stretched Exponential Function: Analytic Error Bounds, Double Exponential Transform, and Open-Source Implementation libkww

    Get PDF
    The C library \texttt{libkww} provides functions to compute the Kohlrausch-Williams-Watts function, i.e.\ the Laplace-Fourier transform of the stretched (or compressed) exponential function exp⁡(−tβ)\exp(-t^\beta) for exponents β\beta between 0.1 and 1.9 with sixteen-digits accuracy. Analytic error bounds are derived for the low and high frequency series expansions. For intermediate frequencies the numeric integration is enormously accelerated by using the Ooura-Mori double exponential transformation. The source code is available from the project home page \url{http://apps.jcns.fz-juelich.de/doku/sc/kww}.Comment: Version 3. 11 pages, 4 figures. Describes software version 2.
    • …
    corecore