Synthesis for mixed arithmetic. by Mignotte, Anne et al.
HAL Id: hal-02101886
https://hal-lara.archives-ouvertes.fr/hal-02101886
Submitted on 17 Apr 2019
Synthesis for mixed arithmetic.
Anne Mignotte, Jean-Michel Muller, Olivier Peyran
To cite this version:
Anne Mignotte, Jean-Michel Muller, Olivier Peyran. Synthesis for mixed arithmetic. [Research
Report] LIP RR-1997-11, Laboratoire de l’informatique du parallélisme. 1997, 2+24p. hal-02101886
Laboratoire de l’Informatique du Parallélisme
Ecole Normale Supérieure de Lyon
Unité de recherche associée au CNRS n°1398 
Synthesis for mixed arithmetic
Anne Mignotte
Jean Michel Muller
Olivier Peyran
November  
Research Report No  
Ecole Normale Supérieure de Lyon
Adresse électronique : lip@lip.ens−lyon.fr 
Téléphone : (+33) (0)4.72.72.80.00    Télécopieur : (+33) (0)4.72.72.80.80
46 Allée d’Italie, 69364 Lyon Cedex 07, France
Abstract
This article presents a methodology for using a powerful arithmetic (redundant arithmetic) in some parts of a design in order to speed it up without a large increase in area, thanks to the use of both conventional and redundant number systems. This implies specific constraints in the scheduling process. An integer linear programming (ILP) formulation is proposed, which finds an optimal solution for examples of reasonable size. In order to solve the problem of possibly huge ILP computation times, a general solution based on constraint graph partitioning is proposed.
Keywords: arithmetic, redundant number systems, scheduling, integer linear programming, partitioning
Resume
Cette article presente une methode permettant lutilisation dune arithmetique tres per
formante  larithmetique redondante sur certaines parties dun circuit an daugmenter
sa vitesse sans trop augmenter sa surface gr	ace au melange darithmetiques non re
dondantes conventionnelles et darithmetiques redondantes Cela induit des contraintes
speciques dans le processus dordonnancement Une formulation en programme lineaire
en nombres entiers est proposee an de trouver le resultat optimal pour des exemples
de taille raisonnable Une solution basee sur le partitionnement dun graphe de con
traintes permet de resoudre le probleme des temps de calculs trop importants
Motscles  Arithmetique systeme redondant decriture des nombres ordonnancement program
mation lineaire en nombres entiers partitionnement
Synthesis for mixed arithmetic
A Mignotte JM Muller and O Peyran
LIP CNRS URA   Ecole Normale Sup	erieure de Lyon


 Lyon Cedex  France
e mail AnneMignotte Jean MichelMuller OlivierPeyranlipens lyonfr
November  
Contents
1 Introduction
2 Mixed arithmetic
    2.1 Redundant arithmetic
    2.2 Using redundant arithmetic globally
    2.3 Using mixed arithmetic
3 High level synthesis and mixed arithmetic
4 ILP formulation
    4.1 Definitions
    4.2 The formulation
    4.3 Results
5 Overcoming the problem of drastic ILP computation time
    Partitioning
        Partitioning methodology
    Reduced constraint graph
    Results
6 Conclusion
1 Introduction
When considering an application as a flow of operations, numbers are generally encoded using conventional binary number systems (2's complement, unsigned binary, sign-magnitude). These representations are optimal in terms of compression and offer the smallest possible register size. However, operators for usual operations such as multiplication, division or square root almost systematically use a redundant number representation as an internal encoding, since very fast carry-free additions can be performed using these representations. These operators need a final conversion in order to return a non-redundant result. As this conversion is equivalent to a conventional addition, it can be beneficial to avoid this last operation, which would improve both delay and area. This leads to designs using redundant arithmetic explicitly.
However, we show in the following section that fully redundant arithmetics are in general not attractive regarding area and consumption criteria. Our approach is to mix redundant and non-redundant arithmetics (mixed arithmetic) in order to benefit from the advantages of redundant arithmetic without its drawbacks. In Section 3, we present a methodology to automatically introduce mixed arithmetic in high level synthesis. In Section 4, a solution based on integer linear programming (ILP) is proposed. Finally, in Section 5, the more general question of overcoming the problem of drastic ILP computational time is addressed.
2 Mixed arithmetic
2.1 Redundant arithmetic
Some number systems may allow faster arithmetic operations than our conventional (binary or decimal) number systems. Assume that we want to compute the sum $s = s_n s_{n-1} \cdots s_0$ of two numbers $x = x_{n-1} x_{n-2} \cdots x_0$ and $y = y_{n-1} y_{n-2} \cdots y_0$ represented in the conventional binary number system. By examining the well-known equation that describes the addition process,
\[
\text{(EqAdd)} \qquad
\begin{cases}
c_0 = 0 \\
s_i = x_i \oplus y_i \oplus c_i \\
c_{i+1} = x_i y_i + x_i c_i + y_i c_i
\end{cases}
\]
one can see that there is a dependency relation between $c_i$, the incoming carry at position $i$, and $c_{i+1}$. This does not mean that the addition process is intrinsically sequential, and that the sum of two numbers is necessarily computed in a time that grows linearly with the size of the operands. Many addition algorithms and architectures proposed in the literature are much faster than a straightforward, purely sequential implementation of (EqAdd). Among such adders, one can cite the conditional-sum adder (implemented in the IBM RS  ), which performs the addition of two $n$-bit numbers in time proportional to $\log n$, and the carry-skip adder, which performs the addition of two $n$-bit numbers in time proportional to $\sqrt{n}$. Nevertheless, the dependency relation between the carries makes a fully parallel addition impossible in the conventional number systems.
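To make the carry dependency concrete, here is a minimal Python sketch of the purely sequential reading of (EqAdd). The function names `ripple_carry_add`, `to_bits` and `from_bits` are hypothetical helpers introduced for illustration, and emitting the last carry as the top sum bit $s_n$ is an assumption of this sketch.

```python
def to_bits(value, n):
    """Return the n low-order bits of value, least significant bit first."""
    return [(value >> i) & 1 for i in range(n)]


def from_bits(bits):
    """Inverse of to_bits: rebuild the integer from LSB-first bits."""
    return sum(b << i for i, b in enumerate(bits))


def ripple_carry_add(x_bits, y_bits):
    """Sequential evaluation of (EqAdd) on two n-bit operands (LSB first)."""
    assert len(x_bits) == len(y_bits)
    c = 0  # c_0 = 0
    s = []
    for xi, yi in zip(x_bits, y_bits):
        s.append(xi ^ yi ^ c)                # s_i = x_i XOR y_i XOR c_i
        c = (xi & yi) | (xi & c) | (yi & c)  # c_{i+1} depends on c_i
    s.append(c)  # assumption of this sketch: the last carry is output as s_n
    return s
```

Each loop iteration needs the carry produced by the previous one, which is exactly the dependency discussed above.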
In 1961, Avizienis suggested using different number systems, called signed-digit number systems. Assume that we use radix $r$. In a signed-digit number system, the numbers are no longer represented using digits between 0 and $r - 1$, but with digits between $-a$ and $a$, where $a \le r - 1$. Avizienis showed that every number is representable in such a system provided that $2a \ge r - 1$. Another important property is that if $2a \ge r$, then some numbers have several possible representations: the number system is redundant.
Avizienis also gave addition algorithms adapted to his number systems. The following algorithm performs the addition of two numbers $x = x_{n-1} x_{n-2} \cdots x_0$ and $y = y_{n-1} y_{n-2} \cdots y_0$ represented in radix $r$ with digits between $-a$ and $a$, where $a \le r - 1$ and $2a \ge r + 1$.
Algorithm 1 (Avizienis)
Input: $x = x_{n-1} x_{n-2} \cdots x_0$ and $y = y_{n-1} y_{n-2} \cdots y_0$
Output: $s = s_n s_{n-1} s_{n-2} \cdots s_0$
1. In parallel, for $i = 0, \ldots, n-1$, compute $t_{i+1}$ (carry) and $w_i$ (intermediate sum) satisfying:
\[
t_{i+1} =
\begin{cases}
+1 & \text{if } x_i + y_i \ge a \\
0 & \text{if } -a + 1 \le x_i + y_i \le a - 1 \\
-1 & \text{if } x_i + y_i \le -a
\end{cases}
\qquad
w_i = x_i + y_i - r \, t_{i+1}
\]
2. In parallel, for $i = 0, \ldots, n$, compute $s_i = w_i + t_i$, with $w_n = t_0 = 0$.
By carefully examining that algorithm, one can see that the carry $t_{i+1}$ does not depend on $t_i$. There is no carry propagation any longer. It can be shown that a fully parallel addition can only be performed, under reasonable hypotheses, thanks to a redundant number system.

Now, let us focus on the particular case of radix 2. The conditions $2a \ge r + 1$ and $a \le r - 1$ cannot be simultaneously satisfied in radix 2. However, it is possible to perform totally parallel, carry-free additions in radix 2. In this radix, the two usual redundant number systems are the carry-save (CS) number system and the signed-digit number system. In the carry-save number system, numbers are represented with digits 0, 1 and 2, and each digit $d$ is represented by two bits $d^{(1)}$ and $d^{(2)}$ whose sum equals $d$. In the signed-digit number system, with digits $-1$, 0 and 1, we represent the digits with the borrow-save (BS) encoding: each digit $d$ is represented by two bits $d^{+}$ and $d^{-}$ such that $d = d^{+} - d^{-}$. Those two number systems allow very fast addition/subtraction. The carry-save adder is a very well-known structure used for adding a number represented in the carry-save system and a number represented in the conventional binary system.
Algorithm 2 (Carry Save)
Input: $x = x^{(1)}_{n-1} x^{(2)}_{n-1} x^{(1)}_{n-2} x^{(2)}_{n-2} \cdots x^{(1)}_0 x^{(2)}_0$ and $y = y_{n-1} y_{n-2} \cdots y_0$
Output: $s = s^{(1)}_n s^{(2)}_n s^{(1)}_{n-1} s^{(2)}_{n-1} s^{(1)}_{n-2} s^{(2)}_{n-2} \cdots s^{(1)}_0 s^{(2)}_0$
In parallel, for $i = 0, \ldots, n-1$, compute $s^{(1)}_i$ and $s^{(2)}_{i+1}$, with $s^{(1)}_n = s^{(2)}_0 = 0$:
\[
s^{(1)}_i = x^{(1)}_i \oplus x^{(2)}_i \oplus y_i
\]
\[
s^{(2)}_{i+1} = x^{(1)}_i x^{(2)}_i + x^{(1)}_i y_i + x^{(2)}_i y_i
\]
This algorithm can be implemented by a row of full-adder cells (a full-adder cell computes two bits $t$ and $u$ from three bits $x$, $y$ and $z$ such that $2t + u$ equals $x + y + z$). The addition of two CS operands ($x = x^{(1)} + x^{(2)}$ and $y = y^{(1)} + y^{(2)}$) can obviously be performed by two rows of full-adder cells, as $s = x + y$ can be decomposed into $z = x + y^{(1)}$ followed by $s = z + y^{(2)}$, which both are additions of a CS operand and a non-redundant operand. Such an adder is represented in Fig. 1.
(Footnote to Algorithm 1: the condition $2a \ge r + 1$ is stronger than the condition $2a \ge r - 1$ that is required to represent every number.)
Redundant (resp. non-redundant) number systems are denoted by R (resp. NR). An operator that performs the operation $\circ$ on two operands of types X and Y and gives a result of type Z is denoted by X $\circ$ Y -> Z, and is called redundant if Z is a redundant representation. Similarly, a converter from redundant to non-redundant is denoted by R -> NR. Actually, this operation is a conventional addition for CS, as a CS number is the sum of two NR numbers: if $x$ is a CS number, then $x = x^{(1)} + x^{(2)}$, where $x^{(1)}$ and $x^{(2)}$ are NR numbers. For the same reason, a CS addition with two NR operands (NR + NR -> CS) does not need to be performed by an operator. We call such an addition a virtual addition. The BS system has the same property with subtraction.
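The three kinds of carry-save addition discussed above can be sketched in Python as follows. This is an illustrative sketch only: a CS number is modelled as a pair of LSB-first bit lists, and the function names (`cs_plus_nr`, `cs_plus_cs`, `virtual_add`, `cs_value`, `to_bits`) are hypothetical, not taken from the report.

```python
def to_bits(value, n):
    """Return the n low-order bits of value, least significant bit first."""
    return [(value >> i) & 1 for i in range(n)]


def cs_value(s1, s2):
    """Value of a carry-save number given by its two bit vectors."""
    return sum((b1 + b2) << i for i, (b1, b2) in enumerate(zip(s1, s2)))


def cs_plus_nr(x1, x2, y):
    """CS + NR -> CS: one row of full-adder cells, no carry propagation."""
    n = len(y)
    s1 = [0] * (n + 1)  # s1[n] = 0
    s2 = [0] * (n + 1)  # s2[0] = 0
    for i in range(n):  # every cell works independently
        s1[i] = x1[i] ^ x2[i] ^ y[i]
        s2[i + 1] = (x1[i] & x2[i]) | (x1[i] & y[i]) | (x2[i] & y[i])
    return s1, s2


def cs_plus_cs(x1, x2, y1, y2):
    """CS + CS -> CS as two chained CS + NR -> CS rows."""
    z1, z2 = cs_plus_nr(x1, x2, y1)       # z = x + y1
    return cs_plus_nr(z1, z2, y2 + [0])   # s = z + y2 (y2 padded by one bit)


def virtual_add(x, y):
    """NR + NR -> CS: the pair of operands already is the CS result."""
    return x, y
```

The virtual addition performs no computation at all, which is why it needs no operator in the resource count.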
       
Figure 1: A CS + CS -> CS adder made up of two CS + NR -> CS adders. A full-adder cell computes T and U from X, Y and Z such that X + Y + Z = 2.T + U.
Redundant number systems are rather commonly used inside arithmetic operators such as multipliers and dividers: those operators have their input and output data represented in a non-redundant number system, but perform some of their internal calculations in a redundant number system. For instance, most multipliers use (at least implicitly) the carry-save number system; the multiplier of the TI  chip internally uses the radix-2 signed-digit number system, while the divider of the Pentium actually uses two different redundant number systems: the division iterations are performed in carry-save, and the quotient is first generated in radix 4 with digits between $-2$ and 2, and then converted into the usual radix-2 number system.
All these large operators perform a final conversion in order to convert this internal representation into a conventional one. The drawback is that a conversion from redundant to non-redundant represents an important cost regarding area and speed. It can be beneficial to avoid this final conversion, and thus to use redundant numbers explicitly in the whole design, and not only inside complex operators.
The use of fully redundant arithmetic within a design shows major drawbacks in terms of area and consumption, but it can be avoided by converting the operands, which leads to designs using redundant and non-redundant arithmetics (mixed arithmetic), as explained in the next section.
2.2 Using redundant arithmetic globally
Using, for instance, the CS number system in the whole design would imply replacing the conventional adders by CS + CS -> CS adders. Several types of  -bit adders (redundant and non-redundant) have been implemented. Table 1 shows the results in terms of area, delay and consumption. One can see that a carry look-ahead adder has a comparable delay (the redundant adder is only   better for  -bit operands), whereas a carry-skip adder is better in terms of area and consumption, with a reasonable delay.
However, these results do not address the problem of registers. Indeed, in radix 2, redundant numbers are twice as large as non-redundant ones, which leads to a drastic increase in consumption. Lang, Cortadella and Mussoll studied the problem of redundant addition: their solution uses different adders for different codings of the CS system, considering transition probabilities to avoid critical digit transitions (for instance, transitions in CS where the two bits are changed). However, this solution requires the knowledge of these transition probabilities and brings only a small improvement. Hence, as consumption has become a major constraint, using fully redundant arithmetic seems to be unrealistic.
This work was supported by PRC GDR ANM in the scope of a project with the MASI-Paris VI and CSI-INPG laboratories.
 -bit adder           Delay     Area     Consumption
Ripple Carry            ns                W/MHz
Carry Skip              ns                W/MHz
Carry Look Ahead        ns                W/MHz
CS + CS -> CS           ns                W/MHz
Table 1: Performance of several types of adder. Technology is CMOS  µm.
Another major drawback of fully redundant arithmetic concerns multiplication: one of the multiplication operands has to be NR, otherwise area and consumption are dramatically increased.
Nevertheless, if one of the operands is non-redundant, redundant additions become very powerful. Fig. 2 shows some implementations of various redundant  -bit adders, compared to a carry look-ahead one. A CS + NR -> CS adder is three times faster than the fastest non-redundant one (CLA), and has the same area and consumption as the smallest and least consuming one (ripple carry). Thus, mixed operators are interesting both in terms of speed and in terms of area or consumption.
The problem of registers is also largely reduced, as only half of the operands would be redundant, which increases the register consumption by only   compared to the conventional representation. Besides, using radix-  operators would lead to a   register consumption increase, as redundant numbers would only be   larger. Radix-  redundant operators remain faster and smaller than non-redundant radix-2 adders, and their low consumption would balance the   register consumption increase. We are currently working on the validation of this representation.
All these remarks show the interest of using mixed arithmetic (mixing redundant and non-redundant operands): converters, instead of systematically being placed at the output of large operators (multipliers, dividers), only convert some of the operands. Thus, CS + NR -> CS adders are used instead of fully redundant ones. Moreover, even if the conversion is not always necessary inside a flow of operations, it has to be done before outputting the results. Thus, a converter R -> NR (redundant to non-redundant) is always present in a design, and it can be useful to take advantage of this resource in the whole design instead of using it only for the final conversion.
There are already numerous applications using mixed arithmetic in a way that does not cost time (i.e. by overlapping conversion and computation). Kornerup studied conversions between different redundant and non-redundant systems. Koren et al. proposed an original adder whose operands could be partially redundant, in order to limit the carry propagation with a limited increase in area. Concerning multiplication, Matula and Lyu investigated the problem of converting redundant binary inputs into Booth encoding. They have proposed a general-purpose multiplier using a precoder providing partial compression of a redundant binary value (and with no extra delay for the non-redundant case) in a format that may be directly input to a standard radix-  Booth recoder.
However as the use of such operators requires a good redundant arithmetic expertise these
architectures are generally related to specic applications For example Briggs and Matula 

realized a processor eecting a x bit multiplyandadd implemented into the Cyrix D
numeric coprocessor in which the multiplier result is not converted before being transmitted to the
adder
The problem we address is more general Our aim is to use mixed arithmetic globally during the
design automation ow in order to take benet from the speed of redundant arithmetic without the
drawbacks of area and consumption Therefore our solution is not to design innovating operators
but to propose a global approach of the conversion insertion problem in order to limitate the
redundant operands
Figure 2: Results of different mixed adders on several technologies (Actel ACT3, Xilinx 4000, AMD Mach and ES2 ecpd07). Two panels, AREA (relative area, with the area of NR+CS->CS equal to 1) and DELAY (speed, in ns), compare the CS+NR->CS, BS+NR->BS, BS+CS->BS, BS+BS->BS, CS+CS->CS and NR+NR->NR adders.
2.3 Using mixed arithmetic
We have extracted the following observations from our studies and our redundant arithmetic expertise:
- The problem can be dealt with at the algorithmic level: conversion insertion is equivalent to choosing the variable (operand) encoding.
- Performing redundant operations with only one redundant operand remains reasonable (i.e. NR $\circ$ R -> R).
- A converter R -> NR is always present in a design. Our approach is to also use this converter inside the flow of operations, to reduce the possibility of having redundant operands.
Thus, the problem we address becomes the following.
Mixed arithmetic problem: Given a flow of dependent arithmetic operations (algorithm), several mixed operators for the usual operations (addition, subtraction, multiplication), at least one converter, and considering the number of cycles, the cycle delay and the area:
- What type of operator best fits an operation?
- What is the best choice for the use of the converters (which operands should be converted)?
We have tried to solve this problem manually on different algorithms, and have thus experimented with the use of mixed arithmetic on several benchmarks. Table 2 shows that an interesting improvement of the delay can be achieved without a large increase in the number of cycles. We use a particularity of redundant arithmetic in order to make the problem more manageable: when there is no possibility of keeping one of the operands non-redundant (or if the conversion costs too much), we can perform a fully redundant addition (R + R -> R) using two NR + R -> R adders (see Fig. 1). Conversely, if both operands are non-redundant (NR + NR -> R), the addition is virtual. These two cases match the mixed arithmetic approach, and they only differ from the regular R + NR -> R adder by the number of resources.
An interesting example is the  th-order filter design. There are two critical paths of   cycles, but the fixed number of resources (resource constraint: two adders, one multiplier) makes it impossible to find a schedule with conventional arithmetic in less than   cycles. Figure 3 shows the scheduled graph of the  th-order filter design using mixed arithmetic. Every operation gives a redundant result; thus, as operator outputs become inputs of other operators, every operation has a priori redundant operands. However, we keep the same resource constraint regarding area and consumption, which means that the number of conventional adders in the scheduling using NR arithmetic becomes the number of NR + CS -> CS adders in the scheduling using mixed arithmetic. This implies the conversion of half of the operands, which seems difficult considering that each cycle can use two adders (thus four operands) but only one converter. Intermediate results (the t nodes of the figure) are not always converted, but the final result (out) has to be non-redundant. One can see that we managed to reach the   cycle limit. This example shows that even with one converter for two adders, and with very weak operation mobility, it is possible to find a schedule using mixed arithmetic with the same number of cycles as the classical one. The main improvement is that multiplication results are not converted anymore, which is beneficial both in terms of delay and area. Most adders are NR + R -> R ones.
These benchmarks have convinced us that the mixed arithmetic approach is very realistic and interesting.
We have tried to automate the solution of the mixed arithmetic problem previously defined. The problem is no longer a problem of arithmetic operators, but becomes a high level synthesis one. More precisely, it is an extended problem of scheduling and operator type selection. The next section addresses the solutions we have developed.
3 High level synthesis and mixed arithmetic
High level synthesis (HLS) translates an algorithm (formulated using languages like VHDL or Verilog) into a register transfer level (RTL) description. It can be decomposed into four main steps: data flow graph (DFG) and control flow graph (CFG) extraction, operator type selection, scheduling and resource allocation. These tasks are usually performed successively.
Figure 3: Result of the  th-order filter scheduling using mixed arithmetic. The schedule spans cycles 1 to 16; the legend distinguishes NR+NR->CS additions (null area), NR+CS->CS additions, CS+CS->CS additions (two NR+CS->CS) and conversion nodes.
However, operator type selection with mixed arithmetic implies operand type selection. Such a selection leads to the insertion of converters, which has to be taken into account during scheduling. Thus, operator type selection has to be done while scheduling. This constraint makes scheduling even more complicated, but the problem can be simplified thanks to some particularities of mixed arithmetic. We propose a model of the design adapted to our problem.
Our expertise in mixed arithmetic has led to the following hypotheses:
- Constants and memory inputs should (obviously) be non-redundant.
- Multiplication with one redundant operand can be implemented at a reasonable cost. However, when both operands are redundant, the area increase is too large. Thus, we impose at least one non-redundant operand for every multiplication. Moreover, even if all the operands are non-redundant, the same multiplier is used (the conversion from NR to R is instantaneous, as $x = y + 0$ is a CS number if $y$ is an NR number).
- Among all the possible implementations of redundant addition, we use one that considers an R + R -> R adder (i.e. an adder with two redundant operands and a redundant result) as the concatenation of two R + NR -> R adders (see Fig. 1). We have proposed an original algorithm for 2's complement CS addition in order to keep this property.
Hence we propose the following resource modelisation
An adder is modelised as c instances of the RNR R operator c  f  
g followed by zero
or one instance of a converter A multiplication is modelised as one instance of the R 	NR  R

th order lter Dierential equation
Area Delay cycles Area Delay cycles
Mixed n   n n  	  n   n  
NR n  n  n   	  n  n  

R n  n n  	 
n n  

FFT
Area Delay cycles
Mixed  n  n n 
NR  n  n  n 
R  n  n n 

Table 
 Results of dierent benchmarks using dierent arithmetics  with n bit inputs
resource followed by zero or one instance of a converter As the conversions may not be inserted
in the nal design they are called virtual conversions This modelisation can represent any kind
of operation as shown by Table  In this table all the operations are monocycle even the
CS  CS  CS addition  ie CS NR CS additions are chained Multicycle multiplications
are addressed in Section 
 This modelisation makes the operator type selection easier as the
p
e
o
a
t
i
o
n
s
r
p
e
o
a
t
i
o
n
s
r
M
u
l
t
A
d
d
i
o
t
i
n
M
u
l
t
A
d
d
i
o
t
i
n
◆NR      NR ◆CS       NR ◆CS       CS
CS + NR       CS
CS + NR       CS
CS + NR       CS
NR
CS + NR       CS
CS CS
CS + NR       CS
NR
CS + NR       CS
CS CS
* *
NR CS NRNR
* *
NR CS NRNR
  
  


  
  


  
  
  



  
  
  



  
  
  



NR
NR
R
Not Allowed
Not Allowed
NR
NRNR
NR
CS
NRNR
NRNR
CS
CSCS
Table  Resource modelisation according to the the operand and result types
choice has not to be done explicitly but is handled by the number of resources  for instance a
NR  NR  NR is considered as zero NR  R  R followed by one conversion However
the dierent steps of the HLS are modied Indeed as every operation is followed by a virtual
conversion the extracted DFG is specic to our problem Figure  shows a classical DFG and a
DFG with virtual conversions The conversion being virtual its output is not linked to any other
operation After scheduling there are two alternatives for a virtual conversion either it becomes
eective and the following operations may use the output of the converter or it disappears

The scheduling is also specic because it includes the operator selection and because an opera
tion of the DFG could disappear during scheduling This is the case for the virtual conversions
but also for the additions Indeed when we have an NR  NR  R addition zero instance of
NR  R  R is needed and an operation that succeeds this addition could be scheduled in the
same cycle as the virtual addition
A nal step is needed to specify the connections between converters and operators Figure 
shows a possible result the virtual conversions  and  have become eective conversions  and 
have disappeared Precedences are rebuilt to produce the scheduled DFG  SDFG regarding the
scheduling cycles of the conversion nodes
The main difficulty lies in the scheduling, due to the previous observations. We already proposed a solution to our problem based on an extended list scheduling. The principle of list scheduling is to consider each cycle successively. For a given cycle $j$, all the candidate operations are scheduled according to the resource constraints and a priority function. An operation is a candidate if all its predecessors have already been scheduled. The priority function could be, for instance, the mobility (As Late As Possible date $L_i$ minus As Soon As Possible date $S_i$). The operations scheduled at cycle $j$ are those of highest priority, within the number of available resources. Similarly, we have extended this idea to edges, in order to find which edges should be converted. We first determine the convertible edges, then compute their urgency. The urgency $U(e_{ij})$ of edge $e_{ij} = (o_i \to o_j)$ is defined from $N(i)$, $L_j$ and $T(i)$, where $N(i)$ is the number of operands which would be converted if a conversion was inserted after operation $o_i$, and $T(i)$ is the cycle at which $o_i$ is scheduled. The most urgent edges are converted, within the number of available converters. However, since this is a greedy approach and since our problem needs a global view, the results obtained were not very convincing.
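As an illustration of the classical principle recalled above (not of the extended, edge-based version, whose urgency function is not reproduced here), a minimal resource-constrained list scheduler can be sketched in Python. All names (`list_schedule`, `preds`, `op_type`, `n_resources`, `mobility`) are hypothetical, operations are assumed to take one cycle, and every resource type is assumed to have at least one instance.

```python
def list_schedule(preds, op_type, n_resources, mobility):
    """Greedy list scheduling with mobility as the priority function.

    preds: dict operation -> list of predecessor operations
    op_type: dict operation -> resource type (e.g. "add", "mul")
    n_resources: dict resource type -> number of instances per cycle
    mobility: dict operation -> L_i - S_i (a smaller value is more urgent)
    Returns a dict operation -> cycle (cycles start at 1).
    """
    schedule = {}
    cycle = 1
    while len(schedule) < len(preds):
        # Candidates: unscheduled operations whose predecessors all
        # completed in an earlier cycle.
        candidates = [
            o for o in preds
            if o not in schedule
            and all(p in schedule and schedule[p] < cycle for p in preds[o])
        ]
        candidates.sort(key=lambda o: (mobility[o], o))  # most urgent first
        free = dict(n_resources)
        for o in candidates:
            if free.get(op_type[o], 0) > 0:
                free[op_type[o]] -= 1
                schedule[o] = cycle
        cycle += 1
    return schedule
```

Because each cycle is decided once and never revisited, the result depends entirely on the priority function, which is the greedy behaviour that motivates a global formulation.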
Therefore we propose an ILP formulation which guarantees a completely global approach and
gives an optimal result The formulation and the results are presented in the following sections
Figure 4: A classical DFG and our specific DFG (black circles represent conversions).
ILP formulation
Definitions
Scheduling is a very common application of ILP. For example, formulations of the general problem
of performing scheduling and resource allocation simultaneously have been proposed, and a
methodology to solve a scheduling problem in a three-dimensional design space (the usual
area and schedule length dimensions, plus the clock length dimension) using module libraries has
been described using ILP. Hwang et al. proposed different ILP formulations for different
classical scheduling problems. Their formulations are based on two main constraints: resource
constraints, which are defined by the user, and precedence constraints, which are given by a DFG.
The variables and constants they used were the following:
- $x_{i,j} = 1$ if operation $o_i$ is scheduled into cycle $j$; otherwise $x_{i,j} = 0$.
- $T$ is the final number of cycles, which we wish to minimize, and $N_t$ is the number of resources
  of type $t$.
- $s$ is an overestimation of $T$, obtained by a list scheduling heuristic.
- $L_i$ (resp. $S_i$) is the latest (resp. earliest) possible time to schedule operation $o_i$. The
  scheduling is a classical ALAP scheduling, considering that we have $s$ cycles.
We keep the same conventions and extend them to our specific problem: if $o_i$ is a classical
operation (addition, subtraction, multiplication), it is also related to variables $x_{i,j}$ with $j \in [S_i, L_i]$.
Our model inserts a virtual conversion $o_k$ after each operation. Therefore we need new variables
$x_{k,j}$ representing the conversion.
The operand types depend on the presence of preceding converters; the operator type depends
on the presence of the following converter. The link between converters and operators is handled
in the resource constraints; therefore we introduce new variables $c_{i,j}$ counting the number of
redundant operands of addition $o_i$, i.e. the number of resources used (see Table).
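The time frames $[S_i, L_i]$ used above can be illustrated with a small sketch. It is not the authors' implementation: it assumes unit-latency operations, ignores conversions and resource limits, takes $S_i$ from an ASAP pass (an assumption; the text only names the ALAP pass), and the names `time_frames` and `preds` are hypothetical.

```python
from collections import defaultdict


def time_frames(preds, s):
    """Return {op: (S_i, L_i)} for unit-latency operations scheduled in cycles 1..s.

    preds maps each operation to the list of operations it depends on.
    S_i comes from an ASAP pass, L_i from an ALAP pass with s cycles.
    """
    succs = defaultdict(list)
    for op, ps in preds.items():
        for p in ps:
            succs[p].append(op)

    asap, alap = {}, {}

    def earliest(op):
        # One cycle after the latest predecessor, or cycle 1 for inputs.
        if op not in asap:
            asap[op] = 1 + max((earliest(p) for p in preds[op]), default=0)
        return asap[op]

    def latest(op):
        # One cycle before the earliest successor, or cycle s for outputs.
        if op not in alap:
            alap[op] = min((latest(q) for q in succs[op]), default=s + 1) - 1
        return alap[op]

    return {op: (earliest(op), latest(op)) for op in preds}


frames = time_frames({"a": [], "b": ["a"], "c": ["a"], "d": ["b"]}, s=4)
```

Each operation $o_i$ then contributes one variable $x_{i,j}$ per cycle of its frame, which is why narrow frames keep the formulation small.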
The formulation
Our formulation of a resource-constrained scheduling problem using mixed arithmetic is presented
in the figure below.
In order to simplify the explanation of the constraints, one should keep in mind that
$$\sum_{j=S_i}^{L_i} j\, x_{i,j}$$
is equal to the cycle where $o_i$ is finally scheduled. Therefore
$$D_{k,i} = \sum_{j=S_k}^{L_k} j\, x_{k,j} - \sum_{j=S_i}^{L_i} j\, x_{i,j}$$
is the number of cycles between the schedules of $o_k$ and $o_i$. If $D_{k,i} = 0$, $o_k$ and $o_i$ are scheduled at
the same cycle. To simplify the notation, we use $o_i^t$ to express that operation $o_i$ is of type $t$. Thus
$(o_p^{conv}, o_q^{conv}) \rightarrow o_i^{\overline{add}}$ expresses that $o_p$ and $o_q$ are two converters preceding
(preceding means that there is a data dependency) an operation $o_i$ whose type is not addition. The formulation can
be decomposed as follows.
Temporal constraints
The constraint on operations without successors expresses that $T$ is the last cycle of the schedule
(and $T$ is naturally the value that should be minimized).
The equality constraint expresses that a regular operation is scheduled exactly once.
The inequality constraint on conversions expresses that a virtual conversion may not be scheduled at all.

xi j  N ci j  N
Minimize T  
Temporal constraints
If oi is a conversion
LiX
jSi
xi j    

Else
LiX
jSi
xi j    

oi without successors
LiX
jSi
 j xi j  T  
Resource constraints

j   s
X
oadd
i
ci j  Nadd  

j   s
X
oautre
i
xi j  Nautre  
Calculation of the ci j

j  Si Li
j  X
kSp
xp k 
j  X
kSq
xq k  
xi j  ci j 
  oconvp  oconvq   oaddi  

j  Si Li
j  X
kSp
xp k 
j  X
kSq
xq k  xi j 
 oconvp  oconvq   oaddi  
Data dependency constraints
LkX
jSk
j xk j 
LiX
jSi
j xi j   
 oaddi  oconvk  
LkX
jSk
j xk j 
LiX
jSi
j xi j   Li   
LkX
jSk
xk j  Li 
oaddi  oconvk  

 
LkX
jSk
j xk j 
LiX
jSi
j xi j 
LiX
jSi
ci j 
 oaddi  oconvk  

 
LkX
jSk
j xk j 
LiX
jSi
j xi j 
LiX
jSi
ci j   


 Li   
LkX
jSk
xk j   
 oaddi  oconvk
Figure  ILP formulation of the scheduling problem using mixed arithmetic


Resource constraints
The first resource constraint handles additions: the number of resources used by an
addition $o_i$ at cycle $j$ is equal to $c_{i,j}$.
The second resource constraint handles operations that are not additions (including
conversions): the number of resources used by such an operation $o_i$ at cycle $j$ is equal to $x_{i,j}$.
Calculation of the $c_{i,j}$
$o_p$ and $o_q$ are the two virtual converters preceding $o_i$. The sum $\sum_{k=S_p}^{j-1} x_{p,k}$ is equal to 1 if $o_p$ is
converted before cycle $j$. Thus the left-hand side $K$ of the two equations defining the $c_{i,j}$ is equal to the
number of converters preceding $o_i$ and scheduled before cycle $j$. In other words, $K$ is the number of NR
operands of $o_i$ at cycle $j$. If $o_i$ is an addition scheduled at cycle $j$, then $c_{i,j} = 2 - K$. If $o_i$ is not an
addition, there should be at least one NR operand, and thus $K \ge 1$. As $x_{i,j} = 1$ if operation $o_i$ is scheduled
at cycle $j$, the two equations express these two situations.
Data dependency constraints
The four data dependency equations express the data dependencies between operations $o_i$ and $o_k$ ($o_i$ precedes
$o_k$). These equations are quite particular because virtual additions and virtual conversions may
not be scheduled at all, changing operation precedence.
If $o_i$ is not an addition and $o_k$ is not a conversion, the data dependency equation is the same as
Hwang's one. It expresses that there should be at least one cycle between the $o_i$ and $o_k$ schedules.
If $o_k$ is a conversion, the previous equation is false when $o_k$ is not scheduled at all (i.e.
$\sum_j x_{k,j} = 0$). The second equation fixes this problem: if $o_k$ is scheduled (i.e. $\sum_j x_{k,j} = 1$), it is
equivalent to the first equation. If $o_k$ is not scheduled at all, it is always true.
If $o_i$ is a virtual addition ($\sum_j c_{i,j} = 0$), an operation (except conversions) succeeding $o_i$ could be
scheduled at the same cycle as $o_i$. In such a case, the third equation is equivalent to $D_{k,i} \ge 0$, which is
the correct expression. If $o_i$ is not a virtual addition, the third equation is equivalent to the first one, as we
have integral variables.
The fourth equation is a mixture of the second and third equations, for the case where $o_i$ is an addition
and $o_k$ a conversion.
Feedback outputs
Our formulation easily handles outputs that are fed back (such as P and Q in the figure below), which
means that if a feedback output is converted, the related primary input will be considered as
NR. For instance, the input of the subtracter can come from the converter succeeding the
multiplication. In such a case, the operation concerned could be scheduled at any cycle within a range of cycles.
Multicycle and pipelined operations
The formulation does not handle multicycle operations However the extension is not di cult
as there are no specic arithmetic problems Equations related to data dependencies and resource
constraints have to be modied For example if Ki is the number of cycle needed for operation oi
Equation  does not give exactly the normal values to the ci j  However we have shown 
 that it does not
prevent to nd the optimal solution and reduce the complexity of the formulation

in the case of a multiplication followed by a conversion the equation becomes 
oaddi  oconvk 
LkX
jSk
j xk j 
LiX
jSi
j xi j   Li Ki 
LkX
jSk
xk j  Li
In the case of a multiplication if li is the latency of the multiplication  li  Ki if the operation
is not pipelined li   if there is a pipeline for each level of the multiplier the resource constraint
equation becomes

j   s
X
omult
i
jX
kj  li
xi k  Nmult
The following figure shows a possible result of our linear program. The graph on the left has been scheduled
using one subtracter, one multiplier and one converter. The addition has one constant (thus NR)
input and could use the output of a converter, which represents the converted output of a subtracter.
Thus this addition has two NR inputs (i.e. it is virtual), which allows a subtracter to
be scheduled into the same cycle. Only two conversions were finally scheduled, whereas there are
four operations.
Figure: Possible result of our linear program (right) for the scheduling of the DFG presented on
the left. The schedule spans cycles 1 to 3, with $x_{1,1} = 1$, $c_{1,1} = 1$; $x_{2,3} = 1$, $c_{2,3} = 2$;
$x_{3,2} = 1$ ($x_{3,1} = x_{3,3} = 0$); $x_{4,2} = 1$, $c_{4,2} = 0$ ($x_{4,1} = c_{4,1} = 0$, $x_{4,3} = c_{4,3} = 0$).
P and Q are feedback outputs; cte denotes a constant input.
Results
Our ILP formulation has been tested using LP SOLVE (see the table of ILP results below). The results are optimal
and the computation times remain small for small examples (the fifth-order filter examples will be
discussed later). Moreover, even with small examples, the ILP approach is very useful, particularly
when it comes to feedback outputs, which are very difficult to deal with using
the heuristic approach, as it is a greedy algorithm.
The computation time needed to solve an ILP formulation increases with the number of equations
and the number of variables. This is not an absolute measure, as large formulations can
sometimes be solved very quickly, whereas smaller ones can take a huge amount of CPU time.
However, it gives quite a good estimation of the computation time. Concerning scheduling, these
two values (number of equations and number of variables) depend on the number of operations $n$
and on the initial bound on the number of cycles $s$. In our case, both grow in $O(s \times n)$. The $s$ value
is a large overestimation, and considering the operation frames (difference between $L_i$ and $S_i$) is
more accurate. The largest examples could not be solved (see the same table); thus we have looked for
ways to solve high-complexity benchmarks. The next section addresses this problem.

benchmarks | Nb equ | Nb var | Nb op | Nb cycles | CPU
way method | | | | | s
Fast Fourier Transform | | | | | s
Differential equation | | | | | s
way method | | | | | no sol
way method classic | | | | | s
fifth-order elliptic wave filter (EWF) | | | | | no sol
fifth-order EWF reduced form | | | | | no sol
fifth-order EWF reduced form | | | | | no sol
fifth-order EWF classic | | | | | min s
Table: ILP results using LP SOLVE
Overcoming the problem of drastic ILP computation time
ILP solvers usually work by relaxation. A classical one is the relaxation into a linear program (LP):
the ILP is transformed into an LP, which can be solved in polynomial time. Depending on which values
of the LP result are integral, the initial ILP is decomposed into new ILP problems, which are treated in
the same way. The computation is sped up by choosing which of these subproblems should be solved
by relaxation into a linear program, according to the bounds obtained from previous results
(branch-and-bound algorithm). If the first relaxation gives only integral results, the ILP is solved. This shows that
the number of variables and the number of equations are not an absolute measure of the complexity
of ILP resolution. However, a large number of equations and a large number of variables lead to a large
linear program, generally to a large number of LP resolutions, and thus to large computation
times. The problem can be even worse, as linear programs are solved by numerical algorithms
that are very sensitive to accuracy: large programs lead to poor accuracy, which compromises the
stability of the algorithm. Problems like the fifth-order EW filter could not be solved because of
this numerical instability. We have applied some simple classical heuristics in order to reduce the
variable time frames $[S_i, L_i]$, but this did not solve the problem either (the reduced formulations
are presented in the table of ILP results).
ILP is nevertheless widely used in various other domains, and not only to solve small problems.
For example, DeCastelo-Vide-e-Souza et al. use ILP for throughput and latency optimization when
algorithm-architecture matching, retiming and pipelining are considered simultaneously. ILP is also used for DSP code
generation and embedded systems. For instance, Leupers and Marwedel give a solution to the problem of code
compaction with real-time constraints for processors offering instruction-level parallelism; Tomiyama and Yasuura present
an ILP-based code placement method for embedded software to maximize hit ratios of instruction
caches. ILP is also widespread for HW/SW partitioning (see, e.g., Niemann and Marwedel). In the field of
system-level synthesis, one can also cite Bender, and Schwiegershausen and Pirsch, who deal with the
optimization of heterogeneous multiprocessor systems. Another example is Prakash and Parker, where a static
task execution schedule is generated along with the structure of the multiprocessor system and
a mapping of subtasks to processors.
Even though our ILP solver was not a professional one, the problem we faced is usual when
dealing with ILP formulations. Thus, instead of looking for a solution specific to our scheduling problem,
we have studied a general methodology based on partitioning. Some
partitioning techniques have been proposed in the literature. For instance, Hwang et al. experimented
with an approach called zone scheduling. They partition the set of cycles into zones and decide
which operations will be scheduled into a zone and which ones will be delayed into the next zone.
Their model can be turned into an optimal ILP scheduling, a list scheduling, or something in between.
However, their goal is to find better solutions than list scheduling with comparable computation
times, rather than to find near-optimal solutions, possibly still with large computation times, when ILP
does not succeed. Depuydt et al. have a solution based on
clustering techniques. They do not take the resource constraints into account, but rather variable time
frames, in order to reduce the register cost. Moreover, none of these methods takes the ILP size into account.
In the next section, we propose a general solution which partitions the problem into several small ILP
formulations, solved separately and taking all the constraints into account. The section after it discusses
the results and extensions of this method.
Partitioning
Partitioning methodology
The initial DFG is partitioned into k parts (we call this a k-partition). The k-partition of $DFG = (V, E)$,
with k as small as possible, defines k data flow graphs $DFG_i = (V_i, E_i)$ such that
$V = V_1 \cup V_2 \cup \dots \cup V_k$ and $V_i \cap V_j = \emptyset$ if $i \ne j$. Each partition can be considered as a separate design and
is scheduled using a separate ILP formulation. We obtain several optimal local schedules, which
are concatenated in order to obtain the final global schedule. The main difficulty is to find a
partitioning algorithm, as there are two constraints to deal with: the interdependencies between
partitions, and their size.
The problem of partition interdependencies can be stated as follows: if there is a constraint
between two operations, there is an equation in the initial ILP formulation using variables related
to these operations. If the two operations are in different partitions, the initial equation is split
into two new equations (one for each formulation). The result obtained with these equations will be
consistent with the authorized values defined by the initial equation. However, it may prevent
finding an optimal global solution. We call this a constraint violation. Obviously, the number of violations
should be minimized. Therefore, we propose a general approach based on the ILP formulation, which
consists in partitioning the set of operations, each partition violating as few constraints as possible
(whether data dependency or resource constraints) and being balanced in terms of ILP variables.
Considering a simple DFG would not be satisfactory, as a DFG only reflects data dependency
constraints (see for instance the fifth-order filter DFG figure). Our partitioning is based on a
reduced constraint graph extracted from the ILP formulation, whose vertices represent operations
and whose edges represent constraints between operations. Performing minimum edge cut partitioning
creates partitions with few constraint dependencies. As each partition leads to an optimal partial
schedule, the final schedule obtained by concatenating the partial schedules is likely to be
a good approximation of the optimal one.
A kpartition
In order to determine the best value of k one could iteratively try several decreasing values until
it leads to an infeasible ILP formulation starting with an npartition if all the partitioned
formulations can be solved try with a  n  partition and so on This solution is realistic
as the computation times are largely decreased using the partitioning method  see below and
particularly the comparison between benchmarks that had a solution with the whole formulation
and their partitioned solution Moreover one usually knows an approximate number of variables
 Nmax that his solver can handle Thus an e cient solution is to determine directly as a starting

Figure  th order 	lter data ow graph DFG
value a number of partition which has a good chance to be the optimal  for example it would beP
i
Li Si 
Nmax

with the scheduling problem
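The starting value suggested above is easy to compute from the time frames. In this sketch, rounding up to an integer and the lower bound of one partition are assumptions added for illustration, and the function name is hypothetical.

```python
import math


def initial_partition_count(frames, n_max):
    """Starting number of partitions: sum_i (L_i - S_i) / N_max, rounded up.

    frames maps each operation to its (S_i, L_i) time frame; n_max is the
    approximate number of variables the solver can handle.
    Rounding up and the minimum of 1 are assumptions of this sketch.
    """
    total = sum(latest - earliest for earliest, latest in frames.values())
    return max(1, math.ceil(total / n_max))


k_small_solver = initial_partition_count({"a": (1, 3), "b": (2, 5)}, n_max=2)
k_large_solver = initial_partition_count({"a": (1, 3), "b": (2, 5)}, n_max=100)
```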
The outputs of a partition become the inputs of the following partition, and they can be
represented using a redundant number system. As our model considers that the inputs are
non-redundant, we had to apply a small preprocessing step to the new formulations to emulate these
redundant inputs. This implies that, when scheduling a partition, all the preceding partitions
must already have been scheduled, in order to know which inputs are redundant. The input number
representations can be taken into account during partitioning (to give more accurate information
on ILP size) by making a bipartition after each local schedule rather than an initial k-partition.
The method presented here uses this bipartitioning. However, it can very easily be
extended to direct k-partitions.
Considering the data flow graph $DFG = (V, E)$ such that the ILP formulation related to DFG
could not be solved, the partitioning method is described below (reduced constraint graphs are
defined in the next section; $S(RCG_i)$ is the size of the reduced constraint graph $RCG_i$).
We are dealing with partition $i$:
- $DFG_1$ to $DFG_{i-1}$ have been scheduled.
- Build the reduced constraint graph $RCG_i = (RCV_i, RCE_i, W_i)$, where
  $RCV_i = V \setminus \{V_1 \cup V_2 \cup \dots \cup V_{i-1}\}$.
- Perform a bipartition on $RCG_i$ with minimum edge cut, of partition sizes
  $\frac{S(RCG_i)}{k-i+1}$ and $\frac{(k-i)\, S(RCG_i)}{k-i+1}$.
Figure: Fifth-order filter reduced constraint graph (RCG).
Partitioning with minimum cut is known to be an NP-complete problem, but there are some
efficient heuristics. Our method has been implemented using the Fiduccia and Mattheyses
heuristic, which is an improvement of the Kernighan and Lin min-cut heuristic.
As edges represent constraints, the idea behind min-cut partitioning is to minimize the
constraint violations. Thus this algorithm is efficient if RCG is a good representation of the different
constraints. We will now address the problem of the reduced constraint graph definition.
Reduced constraint graph
We have looked for a definition of the reduced constraint graph RCG that would not depend on
any particular problem. Nevertheless, there are a few observations that the graph should match:
- The input of the ILP formulation is a DFG. Thus the graph vertices represent operations of
  the DFG.
- Our goal is to create partitions whose ILP formulations would take comparable computation
  times. Operations are related to equations and variables, which are our measure of ILP
  computation time. As not every operation has the same influence over the ILP computation
  time (some are related to more equations and/or variables than others), the vertices must have
  a weight $w(e_j)$ reflecting their influence over this computation time.
- Edges must represent constraints, and any constraint must be represented. In fact, this
  solution should not even be tied to a scheduling problem, but more generally to the
  problem of solving large ILP formulations.

benchmarks | Nb equ | Nb var | Nb op | Nb cycles | CPU time
way method partition | | | | | s
way method partition | | | | | s
way method optimal | | | | | s
Fast Fourier Transform partition | | | | | s
Fast Fourier Transform partition | | | | | s
Fast Fourier Transform optimal | | | | | s
Differential equation partition | | | | | s
Differential equation partition | | | | | s
Differential equation optimal | | | | | s
way method partition | | | | | hr min
way method partition | | | | | min s
way method classic partition | | | | | s
way method classic partition | | | | | s
way method classic optimal | | | | | s
fifth-order elliptic wave filter partition | | | | | hr min
fifth-order elliptic wave filter partition | | | | | hrs min
fifth-order elliptic wave filter partition | | | | | hrs min
fifth-order EWF classic partition | | | | | s
fifth-order EWF classic partition | | | | | s
fifth-order EWF classic optimal | | | | | min s
Table: ILP results using LP SOLVE after ILP-based partitioning
Figure: Result of the fifth-order filter scheduling after a 3-partition (cycles 1 to 16; the legend
distinguishes NR values, R values and conversion nodes). Partitions are P1, P2 and P3
with a constraint graph partition, and P'1, P'2 and P'3 with a DFG partition.
We based our solution on a graph used by Pan, Dong and Liu to solve a problem of constraint
reduction in symbolic layout compaction: from a set $S$ of linear programming constraints of the
form $x_i - x_j \ge b$ (we will say that $x \in c_n$ if variable $x$ appears in constraint $c_n$), they create a
directed graph $G = (V, E)$ such that each variable $x_i$ which appears in $S$ is related to a vertex
$v_i \in V$, and such that each constraint $c_n: x_i - x_j \ge b$, $c_n \in S$, is related to an edge $e = (v_i, v_j)$, $e \in E$,
of weight $b$. From this graph, they solve a problem of subgraph reduction (i.e. finding an equivalent
graph with fewer edges).
We have extended this representation to ILP: from a constraint relating $\sum_j a_{i,j} x_j$ to $b_i$, where $x_j$
represents a variable related to operation $Op(x_j)$, we construct a complete graph $CG = (CV, CE)$
where each ILP variable $x_j$ is related to a vertex $v_j \in CV$. This gives a constraint graph of variables.
From this graph, we perform a clustering phase which creates sets $C_i = \{x_p \mid Op(x_p) = o_i\}$ ($C_i$
contains all the variables related to operation $o_i$) in order to get a graph of operations. This
defines a reduced constraint graph $RCG = (RCV, RCE, W)$ as follows.
From a set $ILP$ of ILP constraints:
1. $\forall i$, if $\exists c_n \in ILP$, $\exists j \in \mathbb{N}$ such that $x_j \in c_n$ and $Op(x_j) = o_i$, we construct a vertex
   $V_i \in RCV$, weighted by $w(V_i) = |C_i|$.
2. If $\exists c_n \in ILP$, $\exists (j_1, j_2) \in \mathbb{N}^2$ such that $x_{j_1} \in c_n$, $x_{j_2} \in c_n$ and
   $o_{i_1} = Op(x_{j_1}) \ne o_{i_2} = Op(x_{j_2})$, we construct an edge $e_{i_1,i_2} \in RCE$ between $V_{i_1}$ and $V_{i_2}$.
This definition fits the previous observations, as RCG is an operation graph whose vertices
are weighted by the number of ILP variables linked to an operation, which has a great influence
over the ILP computation time. Furthermore, the edges are constructed from each ILP constraint
explicitly, and all constraints are treated equally. This method could be used for problems other than scheduling.
The only condition is that the ILP formulations have to be generated by acyclic graphs, which is not a
severe limitation. The reduced constraint graph figure shows the RCG for the fifth-order filter design. One can
see that data dependency constraints (i.e. the DFG figure) are far from being the only constraints
of the problem.
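The two construction rules can be sketched in a few lines. In this illustration, each ILP constraint is reduced to the set of variable names it contains, `op_of` plays the role of $Op(\cdot)$, and both the data layout and the function name are hypothetical.

```python
from itertools import combinations


def reduced_constraint_graph(constraints, op_of):
    """Build (vertex weights, edges) of the reduced constraint graph.

    constraints: iterable of sets of variable names (one set per ILP constraint).
    op_of: dict mapping each variable name to its operation.
    A vertex o_i is weighted by |C_i|, the number of variables of o_i; an edge
    links two distinct operations whose variables share a constraint.
    """
    clusters = {}
    for var, op in op_of.items():
        clusters.setdefault(op, set()).add(var)

    vertices, edges = {}, set()
    for constraint in constraints:
        ops = {op_of[var] for var in constraint}
        for op in ops:
            vertices[op] = len(clusters[op])      # rule 1: weighted vertex
        for a, b in combinations(sorted(ops), 2):
            edges.add((a, b))                     # rule 2: edge between operations
    return vertices, edges


op_of = {"x1_1": "o1", "x1_2": "o1", "x2_2": "o2", "x2_3": "o2", "x3_3": "o3"}
constraints = [
    {"x1_1", "x1_2"},                    # o1 scheduled exactly once
    {"x1_1", "x1_2", "x2_2", "x2_3"},    # dependency between o1 and o2
    {"x2_2", "x3_3"},                    # resource constraint shared by o2 and o3
]
vertices, edges = reduced_constraint_graph(constraints, op_of)
```

A constraint that involves a single operation, such as the first one, creates a vertex but no edge. The third constraint creates an edge that a plain DFG would not contain.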


Results
We have tried this solution with the fifth-order elliptic wave filter, using a 3-partition for our
scheduling and operator type selection problem. The fifth-order filter partition shown in the scheduling figure has
been obtained. It defines partitions P1, P2 and P3, and the resulting schedule has the same
number of cycles as the optimal schedule using non-redundant arithmetic (the schedule using
mixed arithmetic is most likely the optimal one, though we cannot prove it). For comparison with
our partitions, the partitions P'1, P'2 and P'3 were obtained with a partition based upon the
DFG instead of the reduced constraint graph. Obviously, the DFG-based partitioning could not be
exploited, as the partition is not temporal (that is to say, with the first operations in the first partition,
and so on). This is not the case with our method, which always gave exploitable solutions even though it
does not introduce any information specific to a scheduling method.
Examples that did not need partitioning have also been tested in order to get an idea of the
degradation compared to the optimal. Only one example had more cycles than the optimal: the
differential equation design (one extra cycle). However, in this case the extra cycle was due to the
junction between the two partitions: the last cycle of the first partition did not use all its resources,
whereas an operation scheduled on the first cycle of the next partition could be scheduled one cycle
earlier and use one of these available resources. A simplified list scheduling managed to find the
optimal result without changing the global schedule: the algorithm checks whether an operation could
be scheduled one cycle earlier. It can be considered as a "smart" concatenation. Another
interesting solution is to apply replication formulations to the partitioning (Hwang and El Gamal;
Kring and Newton). The critical operations (i.e. those scheduled on the last cycle) are duplicated
and introduced in the next partition.
The concatenation is then made automatically.
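The "smart" concatenation can be pictured as a pass that tries to move operations of the later partition one cycle earlier. The sketch below is only a guess at such a pass under simplifying assumptions (unit latency, one resource per operation, no virtual additions); the names are hypothetical and it is not the authors' simplified list scheduling.

```python
def pull_back_one_cycle(schedule, preds, kind, capacity, candidates):
    """Move each candidate one cycle earlier when dependencies and resources allow.

    schedule: op -> cycle; preds: op -> list of predecessors;
    kind: op -> resource type; capacity: resource type -> units per cycle.
    """
    schedule = dict(schedule)
    for op in sorted(candidates, key=lambda o: schedule[o]):
        target = schedule[op] - 1
        if target < 1:
            continue
        # Every predecessor must finish strictly before the target cycle.
        if any(schedule[p] >= target for p in preds.get(op, [])):
            continue
        used = sum(1 for o, c in schedule.items()
                   if c == target and kind[o] == kind[op])
        if used < capacity[kind[op]]:
            schedule[op] = target
    return schedule


schedule = {"a": 1, "b": 2, "c": 3, "d": 4}     # a, b: first partition; c, d: second
preds = {"b": ["a"], "d": ["c"]}
kind = {"a": "add", "b": "mul", "c": "add", "d": "add"}
merged = pull_back_one_cycle(schedule, preds, kind, {"add": 1, "mul": 1}, ["c", "d"])
```

Here the adder is idle in the last cycle of the first partition, so `c` moves into it and `d` follows, which shortens the schedule from four cycles to three.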
All the other examples reached the optimal schedule. Besides, on every benchmark
the CPU time is largely decreased (see the table of ILP results after partitioning). This is particularly
impressive for the large examples, namely the way method and the fifth-order filter.
Conclusion
We have introduced a methodology to use redundant number systems and operators in order to
speed up designs without a large increase in area, thanks to the combined use of other kinds of arithmetic
(non-redundant ones). An ILP formulation has been proposed that finds an optimal solution for examples
of reasonable size. A solution based on the partitioning of a constraint graph has been proposed
in order to overcome the problem of potentially drastic ILP computation times.
Acknowledgment
We would like to thank Regis Leveugle and Xavier Wendling from CSI-INPG, and Habib
Mehrez and Nicolas Vaucher from MASI-Paris VI, for their experiments with mixed operators on
various technologies, and Alain Darte for his constructive comments.
References
A. Avizienis. Signed-digit number representations for fast parallel arithmetic. IRE Transactions on Electronic Computers.
A. Bender. MILP based task mapping for heterogeneous multiprocessor systems. In Proceedings of the European Design Automation Conference (EuroDAC).
N. N. Binh, M. Imai, and A. Shiomi. A new HW/SW partitioning algorithm for synthesizing the highest performance pipelined ASIPs with multiple identical FUs. In Proceedings of the European Design Automation Conference (EuroDAC).
W. S. Briggs and D. W. Matula. A x multiply and add unit with redundant binary feedback and single cycle latency. In E. Swartzlander, M. J. Irwin, and G. Jullien, editors, Proceedings of the Symposium on Computer Arithmetic.
P. K. Chan, V. G. Oklobdzija, M. D. F. Schlag, and C. D. Thomborson. Delay optimization of carry-skip adders and block carry-lookahead. In Proceedings of the IEEE Symposium on Computer Arithmetic, June.
S. Chaudhuri, S. A. Blythe, and R. A. Walker. An exact methodology for scheduling in a 3D design space. In Proceedings of the International Symposium on System Synthesis (ISSS).
F. Depuydt, G. Goossens, and H. De Man. Clustering techniques for register optimization during scheduling preprocessing. In Satoshi Goto and Louise Trevillyan, editors, Proceedings of the IEEE International Conference on Computer Aided Design, Santa Clara, CA, November. IEEE Computer Society Press.
Y. G. DeCastelo-Vide-e-Souza, M. Potkonjak, and A. C. Parker. Optimal ILP-based approach for throughput optimization using simultaneous algorithm/architecture matching and retiming. In Proceedings of the Design Automation Conference.
C. M. Fiduccia and R. M. Mattheyses. A linear-time heuristic for improving network partitions. In Proceedings of the Design Automation Conference.
M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman and Co., New York.
C. H. Gebotys and M. I. Elmasry. Optimal synthesis of high-performance architectures. IEEE Journal of Solid-State Circuits.
J. L. Hennessy and D. A. Patterson. Computer Architecture: A Quantitative Approach. Morgan Kaufmann Publishers.
B. Hochet, A. Guyot, and J. M. Muller. A way to build efficient carry-skip adders. IEEE Transactions on Computers, October.
C. T. Hwang and Y. C. Hsu. Zone scheduling. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
C. T. Hwang, J. H. Lee, and Y. C. Hsu. A formal approach to the scheduling problem in high level synthesis. IEEE Transactions on Computer-Aided Design, April.
J. Hwang and A. El Gamal. Optimal replication for min-cut partitioning. In Proceedings of ICCAD.
I. Koren. Computer Arithmetic Algorithms. Prentice Hall, Englewood Cliffs, NJ.
J. Sklansky. Conditional-sum addition logic. IRE Transactions on Electronic Computers.
I. Karkowski and R. H. J. M. Otten. An automatic hardware-software partitioner based on the possibilistic programming. In Proceedings of the European Design and Test Conference (EDTC).
B. W. Kernighan and S. Lin. An efficient heuristic procedure for partitioning graphs. Bell System Technical Journal.
Peter Kornerup. Digit-set conversions: Generalizations and applications. IEEE Transactions on Computers, May.
C. Kring and A. R. Newton. A cell-replicating approach to mincut-based circuit partitioning. In Proceedings of ICCAD.
B. Landwehr, P. Marwedel, and R. Dömer. OSCAR: Optimum simultaneous scheduling, allocation and resource binding based on integer programming. In Proceedings of EuroDAC.
T. Lang, E. Musoll, and J. Cortadella. Redundant adder for reduced output transitions. In Proceedings of the XI Conference on Design of Integrated Circuits and Systems.
M. Lehman and N. Burla. Skip techniques for high-speed carry propagation in binary arithmetic units. IRE Transactions on Electronic Computers, December.
R. Leupers and P. Marwedel. Time-constrained code compaction for DSPs. In Proceedings of the International Symposium on System Synthesis (ISSS).
C. N. Lyu and David W. Matula. Redundant binary Booth recoding. In S. Knowles and W. H. McAllister, editors, Proceedings of the Symposium on Computer Arithmetic.
C. Mazenc. Systèmes de représentation des nombres et arithmétique sur machines parallèles. PhD thesis, Ecole Normale Supérieure de Lyon.
A. Mignotte, J. M. Muller, and O. Peyran. Mixed arithmetic and operations research report, LIP, Ecole Normale Supérieure de Lyon.
Anne Mignotte, Jean-Michel Muller, and Olivier Peyran. Mixed arithmetics: Introduction and design structure. In Proceedings of MPCS.
R. K. Montoye, E. Hokonek, and S. L. Runyan. Design of the floating-point execution unit of the IBM RISC System. IBM Journal of Research and Development.
R. Niemann and P. Marwedel. Hardware/software partitioning using integer programming. In Proceedings of the European Design and Test Conference (EDTC).
Peichen Pan, Sai-Keung Dong, and C. L. Liu. Optimal graph constraint reduction for symbolic layout compaction. In Proceedings of the Design Automation Conference.
Olivier Peyran. Synthèse d'architectures intégrées utilisant des arithmétiques redondantes. PhD thesis, INPG.
Dhananjay S. Phatak and Israel Koren. Hybrid signed-digit number systems: A unified framework for redundant number representations with bounded carry propagation chains. IEEE Transactions on Computers, August.
S. Prakash and A. C. Parker. SOS: Synthesis of application-specific heterogeneous multiprocessor systems. Journal of Parallel and Distributed Computing.
M. Schwiegershausen and P. Pirsch. A system level design methodology for the optimization of heterogeneous multiprocessors. In Proceedings of the International Symposium on System Level Synthesis (ISSS).
H. Tomiyama and H. Yasuura. Optimal code placement of embedded software for instruction caches. In Proceedings of the European Design and Test Conference (EDTC).