55 research outputs found

Solving Systems of Linear Equations in Complex Domain : Complex E-Method

Author: Ercegovac Milos
Muller Jean-Michel
Publication venue: HAL CCSD
Publication date: 24/01/2007

The E-method, introduced by Ercegovac, allows efficient parallel solution of diagonally dominant systems of linear equations in the real domain using simple and highly regular hardware. Since the evaluation of polynomials and certain rational functions can be achieved by solving the corresponding linear systems, the E-method is an attractive general approach for function evaluation. We generalize the E-method to complex linear systems, and show some potential applications such as the evaluation of complex polynomials and rational functions.

HAL-ENS-LYON

INRIA a CCSD electronic archive server

Hal-Diderot

Complex Multiply-Add and Other Related Operators

Author: Ercegovac Milos
Muller Jean-Michel
Publication venue: 'Instytut Dermatologii Radoslaw Spiewak'
Publication date: 26/08/2007

In this work, we present algorithms and schemes for computing common arithmetic expressions defined in the complex domain as hardware-implemented operators. The operators include Complex Multiply-Add (CMA: ab+c), Complex Sum of Products (CSP: ab+ce+f), Complex Sum of Squares (CSS: a^2+b^2) and Complex Integer Powers. The proposed approach is to map the expression to a system of linear equations, apply a complex-to-real transform, and compute the solutions to the linear system using a digit-by-digit recurrence method that produces the most significant digit first. The components of the solution vector correspond to the expressions being evaluated. The number of digit cycles is about m for m-digit precision. The basic modules are similar to left-to-right multipliers. The interconnections between the modules are digit-wide.
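
The complex-to-real step mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not spell out the transform used in the paper, so the block below assumes the standard embedding in which each complex entry a+ib becomes the real 2x2 block [[a, -b], [b, a]]. The function name `complex_to_real_system` is hypothetical and chosen only for illustration.

```python
def complex_to_real_system(A, b):
    """Embed an n x n complex system A z = b into a 2n x 2n real system.

    Assumption: standard block embedding [[Re A, -Im A], [Im A, Re A]];
    the paper's own transform may differ. The real unknown vector is
    [Re z_0, ..., Re z_{n-1}, Im z_0, ..., Im z_{n-1}].
    """
    n = len(A)
    M = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            re, im = A[i][j].real, A[i][j].imag
            M[i][j] = re            # Re A block (top left)
            M[i][j + n] = -im       # -Im A block (top right)
            M[i + n][j] = im        # Im A block (bottom left)
            M[i + n][j + n] = re    # Re A block (bottom right)
    rhs = [v.real for v in b] + [v.imag for v in b]
    return M, rhs


# Usage: build b from a known solution z, then check the real system agrees.
A = [[1 + 2j, 0.5j], [0.25 + 0j, 1 - 1j]]
z = [1 + 1j, 2 - 3j]
b = [sum(A[i][j] * z[j] for j in range(2)) for i in range(2)]
M, rhs = complex_to_real_system(A, b)
v = [c.real for c in z] + [c.imag for c in z]
residual = [sum(M[i][j] * v[j] for j in range(4)) - rhs[i] for i in range(4)]
```

Any real linear solver, including a digit-recurrence scheme of the kind the abstract describes, could then be applied to `M` and `rhs`. The sketch only checks that the embedding preserves the solution.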

HAL-ENS-LYON

INRIA a CCSD electronic archive server

Hal-Diderot

Design and Implementation of a Radix-4 Complex Division Unit with Prescaling

Author: Dormiani Pouya
Ercegovac Milos
Muller Jean-Michel
Publication venue: IEEE Computer Society
Publication date: 07/07/2009

We present a design and implementation of a radix-4 complex division unit with prescaling of the operands. Specifically, we extend the treatment of the residual bound and errors due to the use of truncated redundant representation. The requirements for prescaling tables are simplified and a detailed specification of the table design is given. All principal components used in the design are described and the proposed optimizations are explained. The target platform for implementation was an Altera Stratix II FPGA [15], for which we report timing and area requirements. For a precision of 36 bits, the implementation uses 1093 ALUTs, achieving a latency of 97 ns. The maximum clock frequency is 268.53 MHz.

HAL-ENS-LYON

Crossref

INRIA a CCSD electronic archive server

Hal-Diderot

Low Precision Table Based Complex Reciprocal Approximation

Author: Dormiani Pouya
Ercegovac Milos
Muller Jean-Michel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2009

A recently proposed complex-valued division algorithm designed for efficient hardware implementations requires a prescaling step by a constant factor. Techniques for obtaining this prescaling factor have been mentioned by the authors; these serve to justify the feasibility of the algorithm but are inadequate for obtaining efficient implementations. Table-based solutions are formulated in this paper for obtaining the prescaling factor, a low-precision reciprocal approximation for a complex value, using techniques adopted from univariate function approximations. Two separate designs are proposed, one using a single table (a reference design) and another using generalized multipartite tables. The main contribution of this work is the extension of generalized multipartite table methods to a function of two variables. The multipartite tables derived were up to 67% more memory efficient than their single-table counterparts.

HAL-ENS-LYON

Crossref

INRIA a CCSD electronic archive server

Hal-Diderot

(M,p,k)-friendly points: a table-based method for trigonometric function evaluation

Author: Brisebarre Nicolas
Ercegovac Milos
Muller Jean-Michel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/07/2012

We present a new way of approximating the sine and cosine functions by a few table look-ups and additions. It consists of first reducing the input range to a very small interval by using rotations with "(M, p, k) friendly angles", proposed in this work, and then using a bipartite table method in that small interval. An implementation of the method for the 24-bit case is described and compared with CORDIC. Roughly, the proposed scheme offers a speedup of 2 compared with an unfolded double-rotation radix-2 CORDIC.

HAL-ENS-LYON

Crossref

INRIA a CCSD electronic archive server

Hal-Diderot

Simple Seed Architectures for Reciprocal and Square Root Reciprocal

Author: Ercegovac Milos
Muller Jean-Michel
Tisserand Arnaud
Publication venue: HAL CCSD
Publication date: 01/01/2005

This report presents a simple hardware architecture for computing the seed values for reciprocal and square root reciprocal. These seeds are used in the initialization of floating-point division and square root software iterations. The proposed solution is based on polynomial approximation with specific coefficients and a table lookup. The resulting architectures lead to small and fast circuits.

HAL-ENS-LYON

CiteSeerX

INRIA a CCSD electronic archive server

Hal-Diderot

Improving Goldschmidt Division, Square Root and Square Root Reciprocal

Author: Ercegovac Milos
Imbert Laurent
Matula David
Muller Jean-Michel
Wei Guoheng
Publication venue: HAL CCSD
Publication date: 01/01/1999

The aim of this paper is to accelerate division, square root and square root reciprocal computations when the Goldschmidt method is used on a pipelined multiplier. This is done by replacing the last iteration by the addition of a correcting term that can be looked up during the early iterations. We describe several variants of the Goldschmidt algorithm assuming a 4-cycle pipelined multiplier and discuss the number of cycles obtained and the error achieved. Extensions to multipliers with other than 4 cycles are given.
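
For readers unfamiliar with the baseline, the following is a minimal sketch of the plain Goldschmidt division iteration in floating point. It is not the accelerated variant of the paper: it has no correcting-term table and no pipeline model. The function name `goldschmidt_divide` and the assumption that the divisor is pre-normalized to [0.5, 1) are illustrative choices, not taken from the abstract.

```python
def goldschmidt_divide(n, d, iterations=5):
    """Approximate n / d with the textbook Goldschmidt iteration.

    Assumption: d is already normalized to 0.5 <= d < 1, as is typical
    for a floating-point significand. Each step multiplies numerator and
    denominator by f = 2 - d, driving d toward 1 and n toward n / d.
    """
    if not 0.5 <= d < 1.0:
        raise ValueError("divisor must be normalized to [0.5, 1)")
    for _ in range(iterations):
        f = 2.0 - d   # correction factor for this step
        n *= f        # numerator converges to the quotient
        d *= f        # denominator converges quadratically to 1
    return n


# Usage: compare against the built-in division.
approx = goldschmidt_divide(1.0, 0.75)
exact = 1.0 / 0.75
```

Because the error in `d` squares at every step, a handful of iterations suffices. This is the property that makes it attractive to replace the final multiplication pass by a cheaper correction, as the abstract proposes.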

INRIA a CCSD electronic archive server

LUXOR: An FPGA Logic Cell Architecture for Efficient Compressor Tree Implementations

Author: Ajay
Dadda Luigi
Earle J. G.
Ercegovac Milos D
Hubara Itay
IBM ILOG CPLEX.
Kumm Martin
Umuroglu Yaman
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 06/03/2020

We propose two tiers of modifications to FPGA logic cell architecture to deliver a variety of performance and utilization benefits with only minor area overheads. In the first tier, we augment existing commercial logic cell datapaths with a 6-input XOR gate in order to improve the expressiveness of each element, while maintaining backward compatibility. This new architecture is vendor-agnostic, and we refer to it as LUXOR. We also consider a secondary tier of vendor-specific modifications to both Xilinx and Intel FPGAs, which we refer to as X-LUXOR+ and I-LUXOR+ respectively. We demonstrate that compressor tree synthesis using generalized parallel counters (GPCs) is further improved with the proposed modifications. Using both the Intel adaptive logic module and the Xilinx slice at the 65nm technology node for a comparative study, it is shown that the silicon area overhead is less than 0.5% for LUXOR and 5-6% for LUXOR+, while the delay increments are 1-6% and 3-9% respectively. We demonstrate that LUXOR can deliver an average reduction of 13-19% in logic utilization on micro-benchmarks from a variety of domains. BNN benchmarks benefit the most with an average reduction of 37-47% in logic utilization, which is due to the highly efficient mapping of the XnorPopcount operation on our proposed LUXOR+ logic cells. Comment: In Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA'20), February 23-25, 2020, Seaside, CA, US

arXiv.org e-Print Archive

Crossref

A General Method for Evaluation of Functions and Computations in A Digital Computer

Author: Ercegovac Milos Dragutin

118 p. Thesis (Ph.D.)--University of Illinois at Urbana-Champaign, 1975. U of I Only. Restricted to the U of I community indefinitely during batch ingest of legacy ETD.

Illinois Digital Environment for Access to Learning and Scholarship Repository