55 research outputs found

    Solving Systems of Linear Equations in Complex Domain : Complex E-Method

    The E-method, introduced by Ercegovac, allows efficient parallel solution of diagonally dominant systems of linear equations in real domain using simple and highly regular hardware. Since the evaluation of polynomials and certain rational functions can be achieved by solving the corresponding linear systems, the E-method is an attractive general approach for function evaluation. We generalize the E-method to complex linear systems, and show some potential applications such as the evaluation of complex polynomials and rational functions
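The link between polynomial evaluation and linear systems mentioned above can be sketched in a few lines. This is only an illustration of the mapping, not the E-method's digit-recurrence hardware algorithm: it assumes a bidiagonal system with 1 on the diagonal and -x on the superdiagonal, and the function name `poly_via_linear_system` is hypothetical.

```python
def poly_via_linear_system(coeffs, x):
    """Evaluate p(x) = sum(coeffs[i] * x**i) by solving L y = coeffs.

    Assumed system (illustrative): row i reads y[i] - x * y[i+1] = coeffs[i],
    and the last row reads y[n-1] = coeffs[n-1]. The first component of the
    solution, y[0], then equals p(x). x may be real or complex.
    """
    if not coeffs:
        raise ValueError("coeffs must be non-empty")
    n = len(coeffs)
    y = [0] * n
    y[n - 1] = coeffs[n - 1]
    # Back-substitution from the last row up to the first.
    for i in range(n - 2, -1, -1):
        y[i] = coeffs[i] + x * y[i + 1]
    return y[0]


if __name__ == "__main__":
    print(poly_via_linear_system([1, 2, 3], 2))   # 1 + 2*2 + 3*4
    print(poly_via_linear_system([1, 2, 3], 1j))  # complex argument
```

Plain back-substitution is used here for clarity; the abstract's method instead produces the solution digit by digit in hardware.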

    Complex Multiply-Add and Other Related Operators

    In this work, we present algorithms and schemes for computing common arithmetic expressions defined in the complex domain as hardware-implemented operators. The operators include Complex Multiply-Add (CMA: ab+c), Complex Sum of Products (CSP: ab+ce+f), Complex Sum of Squares (CSS: a^2+b^2) and complex Integer Powers. The proposed approach is to map the expression to a system of linear equations, apply a complex-to-real transform, and compute the solutions to the linear system using a digit-by-digit, most-significant-digit-first recurrence method. The components of the solution vector correspond to the expressions being evaluated. The number of digit cycles is about m for m-digit precision. The basic modules are similar to left-to-right multipliers. The interconnections between the modules are digit-wide
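A minimal sketch of the complex-to-real idea for the CMA operator ab+c follows. It assumes the standard representation of multiplication by a = ar + i*ai as the real 2x2 matrix [[ar, -ai], [ai, ar]]; the function name `cma_real` is hypothetical, and the digit-recurrence solver described in the abstract is not modeled.

```python
def cma_real(a, b, c):
    """Compute a*b + c for complex a, b, c using only real arithmetic.

    Multiplication by a = ar + i*ai is applied as the real matrix
    [[ar, -ai], [ai, ar]] acting on the vector (br, bi).
    """
    ar, ai = a.real, a.imag
    br, bi = b.real, b.imag
    re = ar * br - ai * bi + c.real  # first row of the matrix, plus Re(c)
    im = ai * br + ar * bi + c.imag  # second row of the matrix, plus Im(c)
    return complex(re, im)


if __name__ == "__main__":
    a, b, c = complex(1, 2), complex(3, -1), complex(0.5, 0.5)
    print(cma_real(a, b, c), a * b + c)
```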

    Design and Implementation of a Radix-4 Complex Division Unit with Prescaling

    We present a design and implementation of a radix-4 complex division unit with prescaling of the operands. Specifically, we extend the treatment of the residual bound and errors due to the use of truncated redundant representation. The requirements for prescaling tables are simplified and a detailed specification of the table design is given. All principal components used in the design are described and the proposed optimizations are explained. The target platform for implementation was an Altera Stratix II FPGA [15] for which we report timing and area requirements. For a precision of 36 bits, the implementation uses 1093 ALUTs, achieving a latency of 97 ns. The maximum clock frequency is 268.53 MHz

    Low Precision Table Based Complex Reciprocal Approximation

    A recently proposed complex-valued division algorithm designed for efficient hardware implementations requires a prescaling step by a constant factor. The authors mention techniques for obtaining this prescaling factor; these serve to justify the feasibility of the algorithm but are inadequate for obtaining efficient implementations. This paper formulates table-based solutions for obtaining the prescaling factor, a low-precision reciprocal approximation for a complex value, using techniques adopted from univariate function approximations. Two separate designs are proposed, one using a single table (a reference design) and another using generalized multipartite tables. The main contribution of this work is the extension of generalized multipartite table methods to a function of two variables. The multipartite tables derived were up to 67% more memory efficient than their single-table counterparts

    (M,p,k)-friendly points: a table-based method for trigonometric function evaluation

    We present a new way of approximating the sine and cosine functions by a few table look-ups and additions. It consists of first reducing the input range to a very small interval by using rotations with "(M, p, k) friendly angles", proposed in this work, and then using a bipartite table method in that small interval. An implementation of the method for the 24-bit case is described and compared with CORDIC. Roughly, the proposed scheme offers a speedup of 2 compared with an unfolded double-rotation radix-2 CORDIC

    Simple Seed Architectures for Reciprocal and Square Root Reciprocal

    This report presents a simple hardware architecture for computing the seed values for reciprocal and square root reciprocal. These seeds are used in the initialization of floating-point division and square root software iterations. The proposed solution is based on polynomial approximation with specific coefficients and a table lookup. The obtained architectures lead to small and fast circuits

    Improving Goldschmidt Division, Square Root and Square Root Reciprocal

    The aim of this paper is to accelerate division, square root and square root reciprocal computations when the Goldschmidt method is used on a pipelined multiplier. This is done by replacing the last iteration by the addition of a correcting term that can be looked up during the early iterations. We describe several variants of the Goldschmidt algorithm assuming a 4-cycle pipelined multiplier and discuss the number of cycles and the error achieved for each. Extensions to multipliers with other than 4 cycles are given.
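For orientation, the textbook Goldschmidt division iteration can be sketched as follows. This is the plain iteration only, under the assumption of a positive divisor scaled into [0.5, 1) with no table-based seed; it does not implement the paper's variant that replaces the last iteration with a looked-up correcting term. The function name `goldschmidt_divide` is hypothetical.

```python
import math


def goldschmidt_divide(n, d, iterations=6):
    """Approximate n / d with the basic Goldschmidt iteration (d > 0).

    The divisor is scaled to m in [0.5, 1). Each step multiplies both the
    running quotient and the running divisor by f = 2 - m, which drives
    the divisor toward 1 and the quotient toward n / d.
    """
    if d <= 0:
        raise ValueError("this sketch assumes a positive divisor")
    m, e = math.frexp(d)    # d = m * 2**e with 0.5 <= m < 1
    q = math.ldexp(n, -e)   # apply the same scaling to the numerator
    for _ in range(iterations):
        f = 2.0 - m         # correction factor
        q *= f              # two independent multiplications per step,
        m *= f              # which is what suits a pipelined multiplier
    return q


if __name__ == "__main__":
    print(goldschmidt_divide(1.0, 3.0))
    print(goldschmidt_divide(7.0, 2.0))
```

If the scaled divisor is written as 1 - e, the error term squares on every step, so the number of correct bits roughly doubles per iteration.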

    LUXOR: An FPGA Logic Cell Architecture for Efficient Compressor Tree Implementations

    We propose two tiers of modifications to FPGA logic cell architecture to deliver a variety of performance and utilization benefits with only minor area overheads. In the first tier, we augment existing commercial logic cell datapaths with a 6-input XOR gate in order to improve the expressiveness of each element, while maintaining backward compatibility. This new architecture is vendor-agnostic, and we refer to it as LUXOR. We also consider a secondary tier of vendor-specific modifications to both Xilinx and Intel FPGAs, which we refer to as X-LUXOR+ and I-LUXOR+ respectively. We demonstrate that compressor tree synthesis using generalized parallel counters (GPCs) is further improved with the proposed modifications. Using both the Intel adaptive logic module and the Xilinx slice at the 65nm technology node for a comparative study, it is shown that the silicon area overhead is less than 0.5% for LUXOR and 5-6% for LUXOR+, while the delay increments are 1-6% and 3-9% respectively. We demonstrate that LUXOR can deliver an average reduction of 13-19% in logic utilization on micro-benchmarks from a variety of domains. BNN benchmarks benefit the most with an average reduction of 37-47% in logic utilization, which is due to the highly-efficient mapping of the XnorPopcount operation on our proposed LUXOR+ logic cells. Comment: In Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA'20), February 23-25, 2020, Seaside, CA, US
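As a software-level illustration of the XnorPopcount operation named above, the sketch below computes the dot product of two vectors with entries in {+1, -1}. It assumes the common binarized-network encoding in which bit 1 stands for +1 and bit 0 for -1; the function name `xnor_popcount_dot` is hypothetical and nothing here reflects the LUXOR logic cell mapping itself.

```python
def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two length-n {+1, -1} vectors packed into integers.

    Assumed encoding: bit value 1 means +1, bit value 0 means -1.
    XNOR marks the positions where the two vectors agree; each agreement
    contributes +1 and each disagreement -1, so the dot product is
    2 * popcount(xnor) - n.
    """
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask  # XNOR restricted to n bits
    return 2 * bin(agree).count("1") - n


if __name__ == "__main__":
    print(xnor_popcount_dot(0b1011, 0b1101, 4))
    print(xnor_popcount_dot(0b1111, 0b1111, 4))
```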

    A General Method for Evaluation of Functions and Computations in A Digital Computer

    118 p. Thesis (Ph.D.)--University of Illinois at Urbana-Champaign, 1975. U of I Only. Restricted to the U of I community indefinitely during batch ingest of legacy ETD