Design method for numerical function generators using recursive segmentation and EVBDDs by Nagayama, Shinobu et al.
2007-12
S. Nagayama, T. Sasao, and J. T. Butler, "Design method for numerical function
generators using recursive segmentation and EVBDDs," IEICE Transaction on
Fundamentals of Electronics, Communications and Computer Sciences, Vol.E90-A,
No.12, Dec. 2007, pp.2752-2761.
http://hdl.handle.net/10945/35831
IEICE TRANS. FUNDAMENTALS, VOL.E90–A, NO.12 DECEMBER 2007
PAPER Special Section on VLSI Design and CAD Algorithms
Design Method for Numerical Function Generators
Using Recursive Segmentation and EVBDDs∗
Shinobu NAGAYAMA†a), Tsutomu SASAO††b), and Jon T. BUTLER†††c), Members
SUMMARY Numerical function generators (NFGs) realize arithmetic
functions, such as $e^x$, sin(πx), and √x, in hardware. They are used in
applications where high speed is essential, such as in digital signal or
graphics applications. We introduce the edge-valued binary decision diagram
(EVBDD) as a means of reducing the delay and memory requirements in
NFGs. We also introduce a recursive segmentation algorithm, which di-
vides the domain of the function to be realized into segments, where the
given function is realized as a polynomial. This design reduces the size
of the multiplier needed and thus reduces delay. It is also shown that an
adder can be replaced by a set of 2-input AND gates, further reducing de-
lay. We compare our results to NFGs designed with multi-terminal BDDs
(MTBDDs). We show that EVBDDs yield a design that has, on the av-
erage, only 39% of the memory and 58% of the delay of NFGs designed
using MTBDDs.
key words: edge-valued binary decision diagrams (EVBDDs), recursive
segmentation, piecewise polynomial approximation, numerical function
generators (NFGs), programmable architecture
1. Introduction
The computation of arithmetic functions, such as trigono-
metric, logarithmic, square root, and reciprocal functions,
has a long history. More than 150 years ago, Charles Bab-
bage designed the diﬀerence machine to compute polynomi-
als that could be used to approximate other functions, such
as logarithms [27]. The introduction of electronic general
purpose computers and special languages like FORTRAN
improved upon the speed and ease at which arithmetic func-
tions could be calculated. Well into the beginning of the 21st
century, the goals remain the same — to compute functions
at high-speed and with relative ease by the user.
Manuscript received March 5, 2007.
Manuscript revised June 12, 2007.
Final manuscript received July 27, 2007.
†The author is with the Department of Computer Engineering,
Hiroshima City University, Hiroshima-shi, 731-3194 Japan.
††The author is with the Department of Computer Science and
Electronics, Kyushu Institute of Technology, Iizuka-shi, 820-8502
Japan.
†††The author is with the Department of Electrical and Computer
Engineering, Naval Postgraduate School, Monterey, CA 93943-
5121 USA.
∗This paper is an extension of [16].
a) E-mail: nagayama@ieee.org
b) E-mail: sasao@cse.kyutech.ac.jp
c) E-mail: jon butler@msn.com
DOI: 10.1093/ietfec/e90–a.12.2752
In this paper, we propose a design of a programmable
numerical function generator (NFG) that computes an arithmetic
function in fixed-point representation. It takes advantage
of large quantities of inexpensive, programmable logic
available in modern FPGAs. Because of FPGAs, there has
been recent interest in NFGs that realize polynomials that
approximate the given function [4], [6]–[8], [18], [25], [26].
In the past, designs have used uniform segments across the
function’s domain, where, in each segment, a (generally
diﬀerent) polynomial is used to realize the function. By
decreasing the segment size, any desired accuracy can be
achieved. Accuracy also depends on the order of the polyno-
mial used in the approximation. Linear [26] and higher or-
der approximations have been considered [4], [6]–[8], [18],
[25]. Linear approximation and uniform segmentation are
well suited for some functions like $2^x$, sin(πx), and cos(πx),
but are not appropriate for other functions like $\sqrt{-\ln(x)}$ and
the entropy function $-(x \log_2(x) + (1-x) \log_2(1-x))$. For
such functions, non-uniform segmentation produces realiza-
tions with the desired accuracy [3]. In this case, segments
are chosen to be as wide as possible while still achieving the
specified accuracy. As a result, the segment width is adapted
to the local characteristics of the function — wide segments
where the function is nearly linear and narrow segments
where the function is nonlinear. It follows that this yields
the fewest segments needed to achieve the given accuracy.
Since the coeﬃcients of the approximating polynomial are
stored in local memory, non-uniform segmentation oﬀers a
way to reduce the memory requirements of an NFG realized
by a memory-constrained FPGA.
In this paper, we propose a new segmentation algorithm
and a new programmable architecture. Specifically, we pro-
pose the edge-valued binary decision diagram (EVBDD) as
a way to design a programmable circuit that maps a given X
into a segment, where the function f (X) is realized by a spe-
cific polynomial. We also propose a recursive segmentation
algorithm that produces segments whose widths are chosen
especially to simplify the hardware, while adapting to the
degree of nonlinearity of f (X). This approach is a hybrid
of an approach [10], [11] that uses a special (non-optimum)
non-uniform segmentation and another [22], [24] that uses
the optimum non-uniform segmentation.
This paper is divided as follows. The next section in-
troduces preliminary concepts, including an introduction to
the EVBDD. In Sect. 3, we discuss the segmentation of the
domain, including a new recursive segmentation method. In
Sect. 4, we discuss the architecture of our proposed NFG.
Experimental results are shown for the realization of vari-
ous functions in Sect. 5. Here, we compare our results to
another programmable architecture that is designed using
the multi-terminal BDD (MTBDD). Finally, in Sect. 6, we
Copyright © 2007 The Institute of Electronics, Information and Communication Engineers




2.1 Number Representation and Error
The value of the input variable X and the function value f (X)
are represented in fixed point. Specifically,
Definition 1: X has a fixed-point representation
$X = (x_{l-1}\, x_{l-2} \ldots x_1\, x_0 \,.\, x_{-1}\, x_{-2} \ldots x_{-m})_2$,
where $x_i \in \{0,1\}$, l is the number of bits in the integer part, and m is
the number of bits in the fractional part. Each bit $x_i$ contributes $2^i x_i$
to the value of X, except $x_{l-1}$, which contributes $-2^{l-1} x_{l-1}$.
That is, the fixed-point representation is in 2's complement.
Definition 2: Error is the absolute diﬀerence between the
exact value and the value produced by the hardware. Ac-
ceptable error is the maximum error that an NFG may as-
sume; it is usually a specification to be satisfied by the hard-
ware. Approximation error is the error caused by a func-
tion approximation. Acceptable approximation error is
the maximum approximation error that a function approxi-
mation may assume. Rounding error is the error caused by
removing certain least significant bits either by rounding or
by truncation.
Definition 3: Precision is the total number of bits for a
binary fixed-point representation. Specifically, n-bit pre-
cision specifies that n bits are used to represent the number;
that is, n = l+m. We assume that an n-bit precision NFG
has an n-bit input.
Definition 4: Accuracy is the number of bits in the fractional
part of a binary fixed-point representation. m-bit accuracy
specifies that m bits are used to represent the fractional
part of the number. When the maximum error is $2^{-m}$,
the accuracy can be expressed as 1 unit in the last place
(ULP). In this paper, an m-bit accuracy NFG is an NFG
with an m-bit fractional part of the input, an m-bit fractional
part of the output, and a 1 ULP error.
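Definition 1 can be read as a short computation. The following is a minimal sketch in Python, not part of the original design; the function name `fixed_point_value` and the list-based bit layout are assumptions made only for illustration.

```python
def fixed_point_value(int_bits, frac_bits):
    """Value of a 2's complement fixed-point number as in Definition 1.

    int_bits  : [x_{l-1}, ..., x_1, x_0]   (most significant bit first)
    frac_bits : [x_{-1}, x_{-2}, ..., x_{-m}]
    Both lists hold 0/1 integers (hypothetical layout chosen for this sketch).
    """
    l = len(int_bits)
    value = 0.0
    for pos, bit in enumerate(int_bits):
        i = l - 1 - pos
        # The most significant bit x_{l-1} has negative weight -2^{l-1}.
        weight = -(2 ** i) if i == l - 1 else 2 ** i
        value += weight * bit
    for pos, bit in enumerate(frac_bits):
        # Fractional bit x_{-k} contributes 2^{-k}.
        value += bit * 2.0 ** (-(pos + 1))
    return value
```

For instance, with l = 3 and m = 2, the bit string 101.11 evaluates to −4 + 1 + 0.5 + 0.25 = −2.25.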
2.2 Edge-Valued Binary Decision Diagram
Definition 5: A binary decision diagram (BDD) [2] is a
rooted directed acyclic graph representing a logic function:
$\{0,1\}^n \to \{0,1\}$. The BDD is obtained by repeatedly applying
the Shannon expansion to the logic function. Each function,
including the original function and all sub-functions resulting
from applying the Shannon expansion, is represented
by a non-terminal node, unless that function is a trivial func-
tion, 0 or 1, in which case, it is represented by a terminal
node. A non-terminal node has two outgoing edges, a 0-
edge and a 1-edge, that correspond to the values of input
variables. A terminal node has no outgoing edges.
Definition 6: A multi-terminal BDD (MTBDD) [5], [19]
is an extension of the BDD, and represents an integer function:
$\{0,1\}^n \to Z$, where Z is a finite set of integers. Specifically,
it is a BDD in which the terminal nodes are not restricted
to 0 and 1. Rather, they are labeled by integer values.
Fig. 1 MTBDD and EVBDD for an integer function. (a) Function table. (b) MTBDD. (c) EVBDD.
Definition 7: An edge-valued BDD (EVBDD) [9], [19] is
an extension of the BDD, and represents an integer function.
An EVBDD consists of one terminal node representing 0
and non-terminal nodes with a weighted 1-edge, where the
weight is an integer. Note that, in the EVBDD, 0-edges have
weight 0.
Example 1: Figures 1(b) and (c) show an MTBDD and an
EVBDD for the integer function f defined by Fig. 1(a). In
Figs. 1(b) and (c), dashed lines and solid lines denote 0-
edges and 1-edges, respectively. Note that the EVBDD has
weighted 1-edges. In the MTBDD, terminal nodes repre-
sent function values. Thus, to evaluate the function, we tra-
verse the MTBDD from the root node to a terminal node
according to the input values, and obtain the function value
(an integer) from the terminal node. On the other hand, in
the EVBDD, we obtain the function value by summing the
weights of the edges traversed from the root node to the ter-
minal node. (End of Example)
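The two evaluation rules described in Example 1 can be sketched in a few lines of Python. The function table below is hypothetical (it is not the function of Fig. 1), and the tuple-based node encodings are assumptions chosen for brevity.

```python
# Hypothetical integer function f(x1, x0); NOT the function shown in Fig. 1.
TABLE = {(0, 0): 0, (0, 1): 1, (1, 0): 3, (1, 1): 6}

# MTBDD: non-terminal node = (variable, 0-child, 1-child); terminal = integer value.
MTBDD = ("x1", ("x0", 0, 1), ("x0", 3, 6))

# EVBDD: non-terminal node = (variable, 0-child, weight of 1-edge, 1-child).
# The single terminal node (value 0) is encoded as None; 0-edges have weight 0.
EVBDD = ("x1", ("x0", None, 1, None), 3, ("x0", None, 3, None))


def eval_mtbdd(node, assign):
    """Follow edges to a terminal node and return its label."""
    while isinstance(node, tuple):
        var, child0, child1 = node
        node = child1 if assign[var] else child0
    return node


def eval_evbdd(node, assign):
    """Sum the weights of the traversed edges until the terminal is reached."""
    total = 0
    while node is not None:
        var, child0, weight1, child1 = node
        if assign[var]:
            total += weight1
            node = child1
        else:
            node = child0
    return total
```

Both diagrams return the same value for every input; the MTBDD needs one terminal node per distinct function value, whereas the EVBDD keeps a single terminal and moves the values onto the 1-edges.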
3. Piecewise Polynomial Approximation Based on Non-uniform Segmentation
3.1 Uniform and Non-uniform Segmentations
The realization of a function f (X) in hardware is done by
dividing the domain X of the function into segments. We
choose non-uniform segments, which means that each segment
width is chosen so that the given acceptable approximation
error $\varepsilon_a$ is just met. Therefore, with a linear approximation,
a wide segment occurs where the function is close to linear,
and a narrow segment occurs where the function is highly
nonlinear. In each case, the maximum error in the segment
is $\varepsilon_a$. Finding an optimum segmentation is an important
part of the design process. Within each segment, a polynomial
is used to approximate f(X) in that segment. If the
segment is sufficiently small, the polynomial will approximate
f(X) to the desired accuracy. For example, if the segment
is very small, even a linear approximation will be sufficiently
accurate. However, when the segment size is small,
too many segments may be required, and there may not be
enough memory to store all the required coefficients. Thus,
for memory-constrained implementations (e.g., FPGAs), it is
important to reduce the number of segments while achieving
the desired accuracy. There are two methods to reduce
the number of segments.
One uses a higher order polynomial to approximate the
function. In general, a higher order polynomial results in
larger segments, and so reduces the number of segments.
However, for certain functions like $\sqrt{-\ln(X)}$ and the
entropy function, just using a higher order polynomial cannot
reduce the number of segments effectively. Most existing
methods [4], [6]–[8], [18], [25], [26] use uniform segmentation,
which partitions the domain into segments of the
same size. In such a segmentation, the most significant bits
of X are used to specify a segment, and the least significant
bits determine a point within that segment. The size of all
segments is the same as the smallest segment size needed to
achieve the desired accuracy. Therefore, depending on the
function, uniform segmentation can yield too many segments
even if a higher order polynomial is used [15], [17].
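The bit-field view of uniform segmentation described above can be illustrated with a small sketch. The function name `uniform_segment` and its parameters n (input width) and k (number of bits that select a segment) are hypothetical.

```python
def uniform_segment(x, n, k):
    """Split an n-bit input word into (segment index, offset within segment).

    With 2**k uniform segments, the k most significant bits select the
    segment and the remaining n - k bits locate a point inside it.
    """
    low_bits = n - k
    index = x >> low_bits                 # most significant k bits
    offset = x & ((1 << low_bits) - 1)    # least significant n - k bits
    return index, offset
```

No encoder circuit is needed in this case because the segment index is just a slice of the input word; the price is that every segment must be as narrow as the narrowest one required anywhere in the domain.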
To reduce the number of segments for such functions,
there is another method, non-uniform segmentation. In this
method, segments are chosen to be as wide as possible while
still achieving the desired accuracy. Such an optimum non-
uniform segmentation yields the fewest segments for the
given function, and so reduces memory size to store all the
coeﬃcients [15], [22], [24].
Fig. 2 Uniform and non-uniform segmentations of arcsin(X). (a) Uniform segmentation. (b) Non-uniform segmentation.
Example 2: Figure 2 shows uniform and non-uniform segmentations
of arcsin(X), where X has 6-bit accuracy, the
function is approximated by quadratic polynomials, and the
acceptable approximation error is $2^{-8}$. The number of uniform
segments is 32, while the number of non-uniform segments
is only 4. (End of Example)
3.2 Recursive Segmentation Algorithm
Although non-uniform segmentation yields fewer segments
than uniform segmentation, non-uniform segmentation re-
quires an additional circuit that maps a given X into a seg-
ment. Lee et al. [11] have proposed a special non-uniform
segmentation, hierarchical segmentation, to simplify the ad-
ditional circuit. However, since their method has only four
segmentation types which simplify the additional circuit, the
generated segmentation does not always adapt to the de-
gree of nonlinearity of the given function. In this section,
to reduce both hardware complexity and the number of seg-
ments, we present a new non-uniform segmentation method,
recursive segmentation, that is a hybrid of the method [11]
and our previous method [15], [22], [24].
Figure 3 shows the recursive segmentation algorithm.
The inputs for this algorithm are a numerical function
f(X), a domain [A,B) for X, an accuracy $m_{in}$ of
X, a polynomial order d, and an acceptable approximation
error $\varepsilon_a$. Then, this algorithm produces t segments
$[A,P_0), [P_0,P_1), \ldots, [P_{t-2},B)$ by recursively partitioning a
segment into two equal-sized segments until achieving the
acceptable approximation error $\varepsilon_a$ in all segments. Note
that this algorithm restricts the width $w_i$ of each segment to
$w_i = 2^{h_i} \times 2^{-m_{in}}$, where $h_i$ is an integer. That is, the
segmentation points $P_i$ are restricted to values of which the least
significant $h_i$ bits are 0 (i.e., $P_i = (\ldots p_{-j+1}\, p_{-j}\, 0 0 \ldots 0)_2$,
where $j = m_{in} - h_i$). As shown in Fig. 3, the number of
segments depends on the maximum approximation error
$\varepsilon_d(A,B)$. In this paper, we use the Chebyshev approximation
polynomials. For a segment [S,E] of f(X), the maximum
approximation error of the dth-order Chebyshev approximation
$\varepsilon_d(S,E)$ is given by [12]:
$$\varepsilon_d(S,E) = \frac{2(E-S)^{d+1}}{4^{d+1}(d+1)!} \max_{S \le X \le E} |f^{(d+1)}(X)|,$$
where $f^{(d+1)}$ is the (d+1)th-order derivative of f.

Input: Numerical function f(X), domain [A,B) for X, accuracy $m_{in}$ of
X, polynomial order d, and acceptable approximation error $\varepsilon_a$.
Output: Segments $[A,P_0), [P_0,P_1), \ldots, [P_{t-2},B)$.
Step:
1. For [A,B), compute the maximum approximation error $\varepsilon_d(A,B)$.
2. If $\varepsilon_d(A,B) < \varepsilon_a$ or $B-A \le 2^{-m_{in}}$, then stop.
3. Else, partition [A,B) into two segments [A,P) and [P,B), where
P = (A+B)/2.
4. Repeat Steps 1, 2, and 3 for each new segment recursively, until
the maximum approximation errors are smaller than $\varepsilon_a$ in all
segments.
Fig. 3 Recursive segmentation algorithm for the domain.
This algorithm can be applied to any given domain
[A,B). However, a wide domain necessarily requires a large
number of segments. In this case, we can reduce the given
domain to a narrower domain by using a range reduction
technique [1], [13], as is done with existing methods based
on uniform segmentation.
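The steps of Fig. 3, together with the Chebyshev error bound from [12], can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the caller supplies a hypothetical helper `max_abs_deriv(s, e)` that returns the maximum of |f^(d+1)(X)| on [s, e], and floating-point arithmetic stands in for the fixed-point representation.

```python
import math


def cheb_error_bound(max_abs_deriv, s, e, d):
    """Error bound of the dth-order Chebyshev approximation on [s, e]."""
    width = e - s
    scale = 2 * width ** (d + 1) / (4 ** (d + 1) * math.factorial(d + 1))
    return scale * max_abs_deriv(s, e)


def recursive_segmentation(max_abs_deriv, a, b, m_in, d, eps_a):
    """Return the list of segments [(start, end), ...] covering [a, b)."""
    err = cheb_error_bound(max_abs_deriv, a, b, d)          # Step 1
    if err < eps_a or (b - a) <= 2.0 ** (-m_in):            # Step 2
        return [(a, b)]
    p = (a + b) / 2                                         # Step 3
    return (recursive_segmentation(max_abs_deriv, a, p, m_in, d, eps_a)
            + recursive_segmentation(max_abs_deriv, p, b, m_in, d, eps_a))  # Step 4


# Usage sketch: f(X) = e^X on [0, 1) with d = 2. The third derivative of
# e^X is e^X itself, which is increasing, so its maximum on [s, e] is exp(e).
segments = recursive_segmentation(lambda s, e: math.exp(e),
                                  0.0, 1.0, m_in=23, d=2, eps_a=2.0 ** -25)
```

Because every split halves a segment, all widths are powers of two and every segmentation point has its low-order bits equal to 0, as noted above. With the inputs of the usage sketch, the function returns 103 segments, which agrees with the "No. of segs 1" entry for e^X in Table 1.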
3.3 Computation of the Approximate Value
For each segment, f(X) is approximated by the corresponding
polynomial function g(X,i). That is, the approximated
value of f(X) is computed by
$g(X,i) = C_d(i)X^d + C_{d-1}(i)X^{d-1} + \ldots + C_0(i)$,
where i is a segment index assigned
to each segment, and the coefficients $C_d(i), C_{d-1}(i), \ldots, C_0(i)$
are derived from the dth-order Chebyshev approximation
polynomial [12].
For each segment $[S_i, E_i)$, since $S_i \le X < E_i$ holds, we
can offset X by $S_i$ to compute the polynomial g(X,i). By
using the offset input $(X - S_i)$ instead of X, we reduce the
size of multipliers needed to compute g(X,i). By substituting
$X - S_i + S_i$ for X, we transform g(X,i) as follows:
$$\begin{aligned}
g(X,i) &= C_d(i)X^d + C_{d-1}(i)X^{d-1} + \ldots + C_0(i) \\
&= C_d(i)(X-S_i+S_i)^d + C_{d-1}(i)(X-S_i+S_i)^{d-1} + \ldots + C_0(i) \\
&= C_d(i)(X-S_i)^d + \{C_{d-1}(i) + d\,C_d(i)S_i\}(X-S_i)^{d-1} + \ldots + C_0(i).
\end{aligned}$$
Let
$$C'_j(i) = \sum_{k=0}^{d-j} \binom{j+k}{k} C_{j+k}(i) S_i^k \quad (j = 0,1,\ldots,d-1).$$
Then, we have
$$g(X,i) = C_d(i)(X-S_i)^d + C'_{d-1}(i)(X-S_i)^{d-1} + \ldots + C'_0(i). \quad (1)$$
This transformation reduces the multiplier size (see
Sect. 4.3). That is, instead of using the entire value X, as in
the approximation $C_d(i)X^d + C_{d-1}(i)X^{d-1} + \ldots + C_0(i)$, which
requires the maximum number of bits to represent X, we use (1)
to approximate the function, where typically a smaller number
of bits is needed to realize $X - S_i$.
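The change of variable from X to X − S_i amounts to a binomial expansion of each power of ((X − S_i) + S_i): the coefficient of (X − S_i)^j collects the terms C_{j+k}(i) S_i^k weighted by the binomial coefficient (j+k choose k). The sketch below checks this numerically; the function names and the coefficient values are hypothetical and serve only as an illustration.

```python
from math import comb


def offset_coefficients(coeffs, s):
    """Coefficients of g in powers of (X - s).

    coeffs[j] is the coefficient C_j of X**j; the result[j] is the
    coefficient of (X - s)**j obtained by binomial expansion.
    """
    d = len(coeffs) - 1
    return [sum(comb(j + k, k) * coeffs[j + k] * s ** k for k in range(d - j + 1))
            for j in range(d + 1)]


def horner(coeffs, x):
    """Evaluate sum(coeffs[j] * x**j) with Horner's rule."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


# Usage sketch with made-up quadratic coefficients and segment start s = 0.25.
original = [1.0, -2.0, 0.5]            # C_0, C_1, C_2
shifted = offset_coefficients(original, 0.25)
```

Evaluating `horner(shifted, x - 0.25)` gives the same value as `horner(original, x)`, while the multiplier operand x − 0.25 is bounded by the segment width rather than by the full range of X. The leading coefficient is unchanged, and the next one equals C_{d−1} + d·C_d·S_i, matching the expansion above.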
4. Architecture of the NFG
Figure 4 shows the architecture of the NFG based on a 2nd-order
polynomial. As shown in Fig. 4(a), polynomials of the
form (1) are realized using a segment index encoder (SIE), a
coefficient memory, circuits for $(X-S_i)^k$ (k = d, d−1, ..., 2),
multipliers, and adders. Since modern FPGAs have logic elements,
synchronous memory blocks, and dedicated multipliers,
this architecture is efficiently implemented by those
hardware resources in an FPGA. This architecture can realize
any non-uniform segmentation. However, when recursive
segmentation is used, we can realize $X - S_i$ using 2-input
AND gates instead of an adder. As mentioned in the
previous section, the least significant $h_i$ bits of $S_i$ are 0, and
$X - S_i < 2^{h_i} \times 2^{-m_{in}}$ (i.e., $x_{l-1} = s_{l-1}, x_{l-2} = s_{l-2}, \ldots, x_{-j} = s_{-j}$
and $s_{-j-1} = s_{-j-2} = \ldots = s_{-m_{in}} = 0$ because of the way $S_i$ is
chosen). Therefore, $X - S_i$ has 1's only in the least significant
$h_i$ bits, and these 1's occur in exactly the same positions
as the 1's in X. Thus, as shown in Fig. 4(b), we realize $X - S_i$
using AND gates driven on one side by $\overline{S_i}$, the complement
of $S_i$. The SIE converts X into a segment index i. It realizes
the segment index function seg_func(X) : $\{0,1\}^n \to \{0,1,\ldots,t-1\}$
shown in Fig. 5(a), where X has n bits, and t
denotes the number of segments.
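The replacement of the subtractor by AND gates can be checked on bit patterns. In the sketch below, X and S_i are treated as unsigned n-bit words (an assumption for illustration), and the function name `offset_by_and` is hypothetical.

```python
def offset_by_and(x, s, n):
    """Compute X - S bitwise as X AND (NOT S) on n-bit words.

    Valid when the low h bits of s are 0 and s <= x < s + 2**h, which is
    what recursive segmentation guarantees for each segment start.
    """
    mask = (1 << n) - 1
    return x & (~s & mask)


# Usage sketch: n = 8, segment start with its 3 low bits equal to 0.
S = 0b10101000
offsets = [offset_by_and(S + r, S, 8) for r in range(8)]
```

In the upper bits X agrees with S_i, so each AND with the complemented bit of S_i yields 0 or clears a 1; in the lower bits the complement of S_i is all 1's, so the bits of X pass through unchanged.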
4.1 Architecture of the SIE
Figure 5(b) shows an LUT cascade [22], [23] that realizes
seg_func(X). The LUT cascade is obtained by functional
decomposition using an MTBDD for seg_func(X) [20],
[21], and can realize any seg_func(X), where the size of
the LUT cascade depends on the number of segments. It has
been shown in [15] that this size can be reduced by reducing the
number of segments. This section presents a new programmable
architecture for the SIE that reduces the size and
delay time. Figure 5(c) shows the new architecture. To realize
seg_func(X) using the SIE in Fig. 5(c), we represent
seg_func(X) using an EVBDD. Then, by decomposing
the EVBDD, we obtain the SIE that consists of an LUT cascade
and adders. In an LUT cascade, the interconnecting
lines between adjacent LUT memories are called rails. In
this case, the rails represent sub-functions in the EVBDD.
The outputs from each LUT memory other than rails represent
the sum of weights of edges. In this paper, we call such
outputs Arails (adder rails). To the best of our knowledge,
this is the first design method using an EVBDD to produce
the cascaded programmable architecture.
Fig. 4 Architecture of the NFG based on 2nd-order polynomials.
Fig. 5 Segment index encoders.
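The roles of rails and Arails can be illustrated with a small behavioral model. The segment index function below is hypothetical (a 3-bit input with segments [0,4), [4,6), [6,7), [7,8)), and the table contents are assumptions derived by hand for this sketch, not taken from Fig. 5.

```python
# Each LUT maps (rail_in, input_bit) -> (arail, rail_out).
# Rail value 0 stands for the constant-0 sub-function, rail value 1 for the
# remaining non-trivial sub-function of the hypothetical EVBDD.
LUT1 = {(0, 0): (0, 0), (0, 1): (1, 1)}
LUT2 = {(0, 0): (0, 0), (0, 1): (0, 0), (1, 0): (0, 0), (1, 1): (1, 1)}
LUT3 = {(0, 0): (0, 0), (0, 1): (0, 0), (1, 0): (0, 0), (1, 1): (1, 0)}


def ev_sie(x):
    """Segment index of a 3-bit input: cascade the LUTs and add the Arails."""
    bits = [(x >> 2) & 1, (x >> 1) & 1, x & 1]   # x2, x1, x0
    rail, index = 0, 0
    for lut, bit in zip((LUT1, LUT2, LUT3), bits):
        arail, rail = lut[(rail, bit)]
        index += arail        # the adders accumulate the edge weights
    return index
```

Each LUT only has to output the partial sum of edge weights seen within its own slice of the EVBDD, so its output word can stay narrow; the adders combine the partial sums into the final segment index.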
Example 3: By decomposing the MTBDD and EVBDD in
Fig. 1, we obtain the SIEs in Fig. 6. Figures 6(a) and (b)
illustrate the correspondences between each LUT memory
and decompositions of the MTBDD and the EVBDD, respectively.
In these figures, the column labeled '$r_i$' in
the table of each LUT denotes the rails that represent
sub-functions in BDDs. The column '$a_i$' in Fig. 6(b) denotes
the Arails that represent the sum of weights of edges. In
the MTBDD, numbers assigned to edges that cut across the
horizontal lines represent sub-functions. In the EVBDD,
"$(a_i, r_i)$" assigned to edges that cut across the horizontal
lines represent the sum of weights and sub-functions,
respectively. The SIE in Fig. 6(a) requires
$2^2 \times 2 + 2^3 \times 3 + 2^4 \times 3 = 80$ bits and 3 levels (3 LUT memories). On the
other hand, the SIE in Fig. 6(b) requires
$2^2 \times 4 + 2^2 \times 2 + 2^2 \times 1 = 28$ bits and 4 levels (3 LUT memories + 1 adder).
(End of Example)
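The memory totals in Example 3 follow the usual rule that an LUT with a address bits and o output bits stores 2^a × o bits. The sketch below reproduces the arithmetic; reading each product in the example as (address bits, output bits) is an interpretation made for this illustration, and the names are hypothetical.

```python
def lut_cascade_bits(luts):
    """Total memory of a cascade given (address_bits, output_bits) per LUT."""
    return sum((1 << address_bits) * output_bits
               for address_bits, output_bits in luts)


# Exponents and multipliers copied from the two sums in Example 3.
MT_SIE_LUTS = [(2, 2), (3, 3), (4, 3)]
EV_SIE_LUTS = [(2, 4), (2, 2), (2, 1)]
```

The MT SIE grows because the rails between LUTs widen the address of each later memory, whereas in the EV SIE every LUT keeps a 2-bit address in this example.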
This paper uses two terms: MT SIE and EV SIE denote the
SIEs designed using an MTBDD (Fig. 5(b)) and an EVBDD
(Fig. 5(c)), respectively. Both the MT SIE and the EV SIE
can realize any non-uniform segmentation. In both cases,
the size of LUT memories depends on the number of segments.
Specifically,
Theorem 1: Let seg_func(X) be a segment index function
with t segments. Then, there exists an EV SIE for
seg_func(X) with at most $\lceil \log_2 t \rceil$ rails and $\lceil \log_2 t \rceil$ Arails.
Proof: See Appendix.
Fig. 6 Example of SIEs. (a) SIE using MTBDD (MT SIE). (b) SIE using EVBDD (EV SIE).
The size of LUT memories and the number of levels of
an EV SIE depend on the decomposition of an EVBDD. To
obtain the optimum decomposition, we can use optimiza-
tion algorithms for heterogeneous multi-valued decision di-
agrams (MDDs) [14].
4.2 Programmability of the SIE
The LUT memories of the SIEs shown in Figs. 5(b) and (c)
are implemented by embedded RAMs (e.g. M4Ks) in an
FPGA. Thus, by changing the data for the LUT memories
and the coeﬃcient memory, a wide class of numerical func-
tions can be realized by a single architecture. Since just
changing the RAM data can switch numerical functions, we
can switch functions even while the FPGA is running.
Figure 7 and Fig. 8 show the details of the LUT cascade
and the control circuit for changing the RAM data, respec-
tively. In these figures, ‘mode’ denotes a signal to switch
between the operation mode and the program mode of the
LUT cascade. The control circuit consists of a counter and
a decoder, and generates address and write enable signal for
each RAM sequentially.
To the best of our knowledge, a programmable archi-
tecture for the SIE has never before been proposed.
4.3 Reduction of the Size of the Multiplier
Since large multipliers have large delay, it is important to
reduce multiplier size. We do this in two ways: by reducing the
number of bits needed to represent (1) the coefficients and (2)
the variable $(X - S_i)$.
To reduce the number of bits in the coeﬃcients, we use
a scaling method [10]. We first shift right the coeﬃcients.
Then, we apply rounding. Then, we do the actual multi-
plication. And, finally, we shift left the product to com-
pensate for the original shift right of the coeﬃcients. This
process is similar to floating point multiplication. A side
eﬀect is that rounding error is increased, since rounding oc-
curs on a smaller value. In applying this method, we choose
the largest exponent (right shift) that produces an error no
greater than the given acceptable error [15]. If this yields an
exponent of 0 (no right shift), in all segments, then we do
not use the scaling method.
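The four steps of the scaling method (shift right, round, multiply, shift left) can be sketched on integers that stand for fixed-point words. This is an illustration under simplifying assumptions, not the procedure of [10] or [15]: it handles one coefficient and one operand bound, and the function names are hypothetical.

```python
def scaled_multiply(coeff, y, shift):
    """Multiply with a coefficient shortened by `shift` bits."""
    if shift == 0:
        return coeff * y
    rounded = (coeff + (1 << (shift - 1))) >> shift   # shift right with rounding
    return (rounded * y) << shift                     # multiply, then shift left


def largest_shift(coeff, y_max, acceptable_error, max_shift):
    """Largest shift whose worst-case error (at y = y_max) is acceptable."""
    best = 0
    for shift in range(1, max_shift + 1):
        error = abs(scaled_multiply(coeff, y_max, shift) - coeff * y_max)
        if error <= acceptable_error:
            best = shift
    return best
```

Rounding changes the coefficient by at most 2^(shift−1), so the product error is bounded by 2^(shift−1) times the operand. A larger shift gives a narrower multiplier input but a larger error, which is why the largest shift that still meets the acceptable error is chosen.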
To reduce the value of the variable $X - S_i$, we make the
following observation. In each segment $[S_i, E_i)$, we have
$X - S_i < E_i - S_i$. Thus, reducing the segment width $E_i - S_i$
reduces $X - S_i$ for X near $E_i$. However, this also increases
the number of segments, and thus the size of coefficient
memory. We show a segment reduction technique that does
not increase the coefficient memory size.
In an FPGA implementation, the coefficient memory
in Fig. 4 has $2^u$ words, where $u = \lceil \log_2 t \rceil$ and t is the number
of segments. Therefore, we can increase the number
of segments up to $t = 2^u$ without increasing the coefficient
memory size. From Theorem 1, the size of the EV SIE also
depends on the value of u. Increasing the number of segments
to $t = 2^u$ rarely increases the size of the EV SIE. We
reduce the size of segments by dividing the largest segment
into two equal sized segments up to $t = 2^u$.
Fig. 7 LUT cascade implemented by embedded RAMs.
Fig. 8 Control circuit for the SIE.
5. Experimental Results
5.1 Number of Segments and Computation Time
Table 1 compares the number of segments for various seg-
mentation methods based on a 2nd-order Chebyshev ap-
proximation. In Table 1, “No. of uniform segs” shows the
number of uniform segments, “No. of nonuni. segs” shows
the number of non-uniform segments produced by [15],
and “Recursive” denotes the recursive segmentation method
shown in this paper. In the column “Recursive,” the sub-
column “No. of segs 1” shows the number of segments pro-
duced by the segmentation algorithm shown in Sect. 3. The
sub-column “No. of segs 2” shows the number of segments
produced by additionally applying the reduction method of
multiplier size shown in Sect. 4. The sub-column “Time”
shows the total CPU time, in milliseconds, for both the seg-
mentation algorithm and the reduction method of multiplier
size.
Table 1 shows that uniform segmentation requires ex-
cessively many segments to approximate certain functions,
such as tan(πX). Existing methods based on uniform seg-
mentation cannot implement those functions in conventional
FPGAs because the required coeﬃcient memory is too large.
Actually, many existing methods have not realized tan(πX)
in domain [0,0.5). This is because tan(πX) in [0,0.5) can be
computed by sin(πX)/cos(πX) or a combination of tan(πX)
in [0,0.25] and 1/ tan(πX′), where X′ = 0.5− X, and those
functions can be implemented by the existing methods.
However, these require multiple NFGs that realize elemen-
tary functions, such as sin, cos, or the reciprocal function.
On the other hand, the non-uniform or recursive segmen-
tation described here can compactly realize tan(πX) with a
single NFG, since non-uniform and recursive segmentation
methods require many fewer segments. For all functions in
Table 1, the non-uniform segmentation method [15] requires
the fewest segments among the three segmentation methods.
Although our recursive segmentation algorithm restricts the
segmentation points, it requires only up to 2.2 times more
segments than non-uniform segmentation [15]. That is, our
recursive segmentation algorithm generates a segmentation
appropriate to the given function, while restricting the
segmentation points. Thus, our recursive segmentation produces
a small coefficient memory for the given function. In
the next section, we show that recursive segmentation reduces
the size of SIE, as well.

Table 1 Number of segments for various segmentation methods.
X has 23-bit accuracy.
Acceptable approximation error: 2^{-25}

Function    Domain   No. of        No. of        Recursive
f(X)        [A,B)    uniform segs  nonuni. segs  No. of segs 1  No. of segs 2  Time [msec.]
e^X         [0,1)    128           67            103            128*           10
sin(πX)     [0,0.5)  128           74            112            128*           10
tan(πX)     [0,0.5)  4,194,304     4,594         5,723          8,192          1,600
arcsin(X)   [0,1)    8,388,608     256           363            512            70
√X          (0,1)    8,388,607     228           322            512            30
√(−ln(X))   (0,1)    8,388,607     698           967            1,024          190
X ln(X)     (0,1)    2,097,152     172           250            256            10

*Uniform segmentation is produced.
Environment: Sun Blade 2500 (Silver), UltraSPARC-IIIi 1.6GHz,
6GB memory, Solaris 9.
5.2 FPGA Implementation of SIEs
Table 2 compares the FPGA implementation results of the
MT SIE and EV SIE. In this table, “LUT size” shows the
total size of LUT memories used in the SIE, in bits. Note
that the size of LUT memory and LE, the number of logic
elements, are 0 for eX and sin(πX) when recursive segmen-
tation is used. This is because our algorithm generated uni-
form segments as shown in Table 1, and so an SIE was not
needed. In the experiment that produced the data in Table 2,
we optimized the decomposition of the MTBDDs and EVB-
DDs by requiring the size of each LUT memory used in
these SIEs to be 4K bits, the same as the RAM block (M4K)
of the FPGA.
Table 2 shows that, for optimum non-uniform segmen-
tation, the EV SIEs have smaller LUT memory size than
the MT SIEs. For example, for tan(πX), the LUT mem-
ory size of the EV SIE is only 10% of the size needed by
the MT SIE. For tan(πX), the LUT memory size of MT SIE
is quite large because the number of non-uniform segments
is large. From experiments with uniform segmentation, we
know that for tan(πX), the total memory size needed by the
NFG using the MT SIE is only 1.5% of the total memory
size needed by the NFG based on uniform segmentation (i.e.
the existing methods). However, this is still too large to im-
plement with the FPGA. By using the EV SIE, we can re-
duce the LUT memory size significantly, and make the NFG
implementable with the FPGA.
Our recursive segmentation can reduce both the LUT
memory size and the delay time of the MT SIEs. In particular,
for X ln(X), the MT SIE designed for recursive
segmentation has only 34% of the LUT memory size and
59% of the delay of the MT SIE designed for optimum
non-uniform segmentation.
Table 2 FPGA implementation of SIEs.
FPGA device: Altera Stratix EP1S10F484C5 (LE: 10,570, M4K: 60, M512: 90)
Logic synthesis tool: Altera QuartusII 5.0 (speed optimization, timing requirement of 200 MHz)

Optimum non-uniform segmentation
Function    MT SIE                                    EV SIE
f(X)        LUT size   LE    Level  Delay             LUT size   LE    Level  Delay
            [bits]                  [nsec.]           [bits]                  [nsec.]
e^X         26,368     115   8      58.8              23,040     154   7      43.5
sin(πX)     26,880     115   8      54.9              23,552     151   7      45.1
tan(πX)     1,802,240  –     5      –                 179,968    340   10     75.3
arcsin(X)   61,440     123   8      53.2              53,824     336   13     87.6
√X          61,440     123   8      53.2              57,408     289   11     75.5
√(−ln(X))   266,240    138   7      56.7              116,160    330   11     81.4
X ln(X)     61,440     129   8      52.5              48,400     250   10     63.8

Recursive segmentation
Function    MT SIE                                    EV SIE
f(X)        LUT size   LE    Level  Delay             LUT size   LE    Level  Delay
            [bits]                  [nsec.]           [bits]                  [nsec.]
e^X         0          0     0      0                 0          0     0      0
sin(πX)     0          0     0      0                 0          0     0      0
tan(πX)     1,687,552  –     5      –                 15,108     201   7      43.6
arcsin(X)   49,152     109   7      46.7              9,984      107   5      28.0
√X          44,544     107   7      46.4              10,752     112   5      27.2
√(−ln(X))   172,032    118   7      54.3              12,736     148   6      38.4
X ln(X)     20,992     67    5      30.8              8,960      74    4      22.9

–: It cannot be mapped into the FPGA due to insufficient RAM blocks.
By using recursive segmentation and the EV SIE, we
can reduce both LUT memory size and delay time of SIEs
significantly. For all functions in Table 2, both the LUT memory
size and the delay time of the EV SIEs for recursive segmentation
are much smaller than those of the MT SIEs. In terms of the
number of LEs, the EV SIEs for recursive segmentation require
only up to 1.3 times more LEs than the MT SIEs. Therefore,
designing an EV SIE for recursive segmentation yields
faster and more compact SIEs than those obtained by previous
methods. The design is formal and is easily programmed.
5.3 FPGA Implementation of NFGs
Table 3 compares the FPGA implementation results of our
NFGs using EV SIE (EVNFGs) with the existing NFGs us-
ing MT SIE (MTNFGs) [15], where EVNFGs are based on
recursive segmentation and MTNFGs are based on the op-
timum non-uniform segmentation. Both NFGs have 23-bit
precision (23-bit accuracy).
From Table 2 and Table 3, we can see that the LUT
memory size of MT SIE accounts for more than 2/3 of the
total memory size of the MTNFG. On the other hand, by
using recursive segmentation and EV SIE, the LUT mem-
ory size needed for the SIE can be reduced to less than 1/4
of the total memory size of the EVNFG. Thereby, the EVN-
FGs require only 21% to 64% of memory size needed for the
MTNFGs. For arcsin(X) and √X, as shown in Table 1, our
recursive segmentation requires a coeﬃcient memory that is
about twice as large as needed for the optimum non-uniform
segmentation. Nevertheless, by using EV SIEs, the total
memory sizes of EVNFGs can be reduced to about 60% of
the memory sizes of MTNFGs. Further, Table 3 shows that
EVNFGs require fewer LEs and levels (i.e., shorter latency)
than MTNFGs, and the delay time of EVNFGs is only about
24% to 94% of the delay time of MTNFGs.
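The memory and delay ratios discussed above can be recomputed directly from the entries of Table 3. The following Python sketch is illustrative only: the dictionary TABLE3 and the helper ratios are names introduced here, not part of the design flow. The "–" entry is stored as None and skipped when averaging delays, which is an assumption about how the averages were formed.

```python
# (memory [bits], delay [nsec.]) for the MTNFG and the EVNFG, copied from Table 3.
# None stands for the "–" entry (the design could not be mapped into the FPGA).
TABLE3 = {
    "e^X":          ((39_040, 99.6),    (8_064, 25.1)),
    "sin(pi X)":    ((36_864, 99.1),    (7_936, 28.3)),
    "tan(pi X)":    ((2_867_200, None), (973_572, 92.3)),
    "arcsin(X)":    ((84_736, 107.3),   (53_504, 80.3)),
    "sqrt(X)":      ((83_712, 116.5),   (53_760, 77.2)),
    "sqrt(-ln(X))": ((357_376, 99.8),   (103_872, 88.3)),
    "X ln(X)":      ((83_200, 116.0),   (31_744, 70.4)),
}


def ratios(field):
    """EVNFG/MTNFG ratios for one field (0: memory, 1: delay), skipping '–'."""
    return [
        ev[field] / mt[field]
        for mt, ev in TABLE3.values()
        if mt[field] is not None and ev[field] is not None
    ]


memory = ratios(0)
delay = ratios(1)
mean_memory = sum(memory) / len(memory)
mean_delay = sum(delay) / len(delay)

print(f"memory: {min(memory):.0%} to {max(memory):.0%}, mean {mean_memory:.0%}")
print(f"delay:  mean {mean_delay:.0%} over {len(delay)} functions")
```

With these entries, the memory ratios span roughly 21% to 64% with a mean near 39%. The mean delay ratio over the six functions for which both designs could be mapped is slightly below 60%.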
Table 3 FPGA implementation of 23-bit precision (23-bit accuracy) NFGs.
FPGA device: Altera Stratix EP1S60F1020C5
(LE: 57,120, DSP: 144, M4K: 292, M512: 574)
Logic synthesis tool: Altera QuartusII 5.0
(speed optimization, timing requirement of 200 MHz)
Function      MTNFG based on optimum non-uniform        EVNFG based on recursive
f(X)          Memory     LE     DSP  Level  Delay       Memory   LE     DSP  Level  Delay
              [bits]                        [nsec.]     [bits]                      [nsec.]
e^X           39,040     689    10   13     99.6        8,064    432    10   3      25.1
sin(πX)       36,864     635    10   13     99.1        7,936    395    10   3      28.3
tan(πX)       2,867,200  –      16   11     –           973,572  1,059  16   12     92.3
arcsin(X)     84,736     1,301  16   14     107.3       53,504   937    16   10     80.3
√X            83,712     1,041  16   14     116.5       53,760   917    16   10     77.2
√(− ln(X))    357,376    950    16   13     99.8        103,872  972    16   11     88.3
X ln(X)       83,200     988    16   14     116.0       31,744   989    16   9      70.4
–: It cannot be mapped into the FPGA due to insufficient RAM blocks.

To compare our NFG with the existing NFG based on another
non-uniform segmentation method (hierarchical segmentation)
shown in [11], we implemented our 24-bit precision NFGs for
X ln(X) and the "humps" function using the Xilinx Virtex-II
FPGA (XC2V4000-6) and Synplify Premier 8.5. The humps
function is a quotient of polynomials:
$$\mathrm{humps} = \frac{0.0004x + 0.0002}{x^4 - 1.96x^3 + 1.348x^2 - 0.378x + 0.0373}.$$

Table 4 FPGA implementation of 24-bit precision NFGs.
FPGA device: Xilinx Virtex-II XC2V4000-6
Logic synthesis tool: Synplify Premier Ver. 8.5
Function   NFG in [11]                              EVNFG
f(X)       Memory  Slice  Mult.  Level  Delay       Memory  Slice  Mult.  Level  Delay
           [bits]                       [nsec.]     [bits]                       [nsec.]
X ln(X)    40,446  871    10     14     103.7       34,560  454    5      9      64.7
humps      NA      409    4      13     82.8        91,648  189    4      8      55.9

Table 4 compares the FPGA implementation results of our
NFGs and the NFGs shown in [11]. For the humps function,
the memory size of the NFG is not shown in [11].
From these results, we can see that our NFGs using
recursive segmentation and the EV SIE can realize a wide
range of functions faster and more compactly than existing
NFGs.
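For reference, the humps function used in Table 4 can be evaluated in software as a plain quotient of polynomials. The sketch below is only an illustration in floating point, not the fixed-point datapath of the NFG. The sample points and the interval [0, 1] are assumptions chosen here, since the domain is not restated in this section.

```python
def humps(x: float) -> float:
    """Quotient-of-polynomials 'humps' function as given in the text."""
    numerator = 0.0004 * x + 0.0002
    denominator = x**4 - 1.96 * x**3 + 1.348 * x**2 - 0.378 * x + 0.0373
    return numerator / denominator


# Sample points are illustrative; the interval [0, 1] is an assumption.
for x in (0.0, 0.3, 0.5, 1.0):
    print(f"humps({x}) = {humps(x):.6f}")
```

The denominator becomes small near x = 0.3, so the function changes rapidly there. This is the kind of behavior for which non-uniform segmentation is intended.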
5.4 Comparison of Design Methods
There are two segmentation methods (optimum non-uniform and
recursive) and two SIE types (MT SIE and EV SIE), giving four
different design methods. Table 5 compares them; they are
abbreviated as Non-uniform MT, Recursive MT, Non-uniform EV,
and Recursive EV.
Roughly speaking, the non-uniform segmentation produces a
smaller coefficient memory, but a larger and slower SIE. On
the other hand, the recursive segmentation produces a larger
coefficient memory, but a smaller and faster SIE. As for the
SIE, the EV SIE requires smaller LUT memories than the MT
SIE. However, the EV SIE requires a cascade of adders that
often makes the NFG slower than one with the MT SIE.
Table 5 Comparison of design methods.
        Segmentation methods
SIEs    Non-uniform                     Recursive
MT      · Smaller coefficient memory    · Larger coefficient memory
        · Largest and slower SIE        · Smaller and faster SIE
EV      · Smaller coefficient memory    · Larger coefficient memory
        · Larger and slowest SIE        · Smallest and fastest SIE

Non-uniform MT produces a small coefficient memory, but its
SIE has the largest LUT memory size among the four methods.
Thus, Non-uniform MT results in NFGs with large memory size.
Recursive MT produces a smaller and faster SIE than
Non-uniform MT. However, the reduction of LUT memory size in
the SIE is insufficient to compensate for the increase in
the coefficient memory. Thus, NFGs with Recursive MT are
faster than with Non-uniform MT, but still require large
memory size.
Non-uniform EV produces a small coefficient memory and an
SIE with smaller LUT memories than Non-uniform MT. However,
the SIE is the slowest among the four methods. Thus, NFGs
with Non-uniform EV require smaller memory size than with
Non-uniform MT, but they are the slowest among the four
methods.
Recursive EV produces the smallest and the fastest SIE among
the four. Moreover, the reduction of LUT memory size in the
SIE is sufficient to compensate for the increase in the
coefficient memory. Thus, NFGs with Recursive EV are the
smallest and the fastest among the four.
6. Concluding Remarks
We have presented design methods for numerical function
generators using recursive segmentation and EVBDDs. Our
recursive segmentation is a hybrid approach of an optimum
non-uniform segmentation that produces the fewest seg-
ments and a segmentation that reduces hardware complexity.
Thus, our recursive segmentation reduces the sizes of both
the coefficient memory and the SIE. We have proposed a new
programmable architecture and its design method using
EVBDDs. We have shown that using both the new segmentation
method and the new architecture can produce faster and more
compact programmable NFGs than the existing NFGs. We have
also shown that an adder can be replaced by a set of 2-input
AND gates, thus reducing the delay. Experimental results
show that
1. recursive segmentation yields MT SIEs that have only
49% of the LUT memory size and 55% of the delay of
MT SIEs based on optimum non-uniform segmentation,
2. by using EVBDDs to realize recursive segmentation,
we reduce LUT memory size and delay of the segment
index encoders. On the average, this yields EV SIEs
that require only 8% of the LUT memory size and 36%
of the delay of segment index encoders designed by
optimum non-uniform segmentation, and
3. overall, our NFGs require, on the average, only 39%
of the memory and 58% of the delay associated with
NFGs based on MT SIEs and optimum non-uniform
segmentation.
These results are for a suite of seven functions, including e^x,
sin(πx), and √x.
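The averages in items 1 and 2 can be cross-checked against the SIE entries of Table 2. The sketch below is a hedged illustration: the names TABLE2 and mean_ratio are introduced here, and the reading of the table columns (optimum non-uniform MT SIE, recursive MT SIE, recursive EV SIE) is an assumption. Entries of 0 are treated as a ratio of 0 and "–" entries are skipped, which is also an assumption about how the averages were formed.

```python
# (LUT memory [bits], delay [nsec.]) per SIE, copied from Table 2.
# Order per row: optimum non-uniform MT, recursive MT, recursive EV (assumed).
# None stands for a "–" entry.
TABLE2 = {
    "e^X":          ((26_368, 58.8),    (0, 0.0),          (0, 0.0)),
    "sin(pi X)":    ((26_880, 54.9),    (0, 0.0),          (0, 0.0)),
    "tan(pi X)":    ((1_802_240, None), (1_687_552, None), (15_108, 43.6)),
    "arcsin(X)":    ((61_440, 53.2),    (49_152, 46.7),    (9_984, 28.0)),
    "sqrt(X)":      ((61_440, 53.2),    (44_544, 46.4),    (10_752, 27.2)),
    "sqrt(-ln(X))": ((266_240, 56.7),   (172_032, 54.3),   (12_736, 38.4)),
    "X ln(X)":      ((61_440, 52.5),    (20_992, 30.8),    (8_960, 22.9)),
}


def mean_ratio(design: int, field: int) -> float:
    """Average of design/baseline over the rows where both entries exist.

    design: 1 for recursive MT, 2 for recursive EV (baseline is column 0).
    field:  0 for LUT memory, 1 for delay.
    """
    ratios = []
    for row in TABLE2.values():
        base, value = row[0][field], row[design][field]
        if base is not None and value is not None:
            ratios.append(value / base)
    return sum(ratios) / len(ratios)


for name, design in (("recursive MT", 1), ("recursive EV", 2)):
    print(f"{name}: memory {mean_ratio(design, 0):.0%}, delay {mean_ratio(design, 1):.0%}")
```

Under these assumptions the averages come out at about 49% and 55% for the recursive MT SIEs, and about 8% and 36% for the recursive EV SIEs.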
Acknowledgments
This research is partly supported by the Grant-in-Aid for
Scientific Research of the Japan Society for the Promotion
of Science (JSPS), funds from the Ministry of Education,
Culture, Sports, Science, and Technology (MEXT) via the
Kitakyushu innovative cluster project, a contract with the
National Security Agency, the MEXT Grant-in-Aid for Young
Scientists (B), 18700048, 2007, and the Hiroshima City
University Grant for Special Academic Research (General
Studies), 6101, 2007. The comments of two reviewers were
useful in improving this paper. Discussion with Prof.
Yukihiro Iguchi improved this paper.
References
[1] N. Brisebarre, D. Defour, P. Kornerup, J.-M. Muller, and N. Revol,
“A new range-reduction algorithm,” IEEE Trans. Comput., vol.54,
no.3, pp.331–339, March 2005.
[2] R.E. Bryant, “Graph-based algorithms for Boolean function manip-
ulation,” IEEE Trans. Comput., vol.C-35, no.8, pp.677–691, Aug.
1986.
[3] A. Cantoni, “Optimal curve fitting with piecewise linear functions,”
IEEE Trans. Comput., vol.C-20, no.1, pp.59–67, Jan. 1971.
[4] J. Cao, B.W.Y. Wei, and J. Cheng, “High-performance architec-
tures for elementary function generation,” Proc. 15th IEEE Symp.
on Computer Arithmetic (ARITH’01), pp.136–144, Vail, Colorado,
June 2001.
[5] E.M. Clarke, K.L. McMillan, X. Zhao, M. Fujita, and J. Yang,
“Spectral transforms for large Boolean functions with applications
to technology mapping,” Proc. 30th ACM/IEEE Design Automation
Conference, pp.54–60, June 1993.
[6] D. Defour, F. de Dinechin, and J.-M. Muller, “A new scheme for
table-based evaluation of functions,” 36th Asilomar Conference on
Signals, Systems, and Computers, pp.1608–1613, Pacific Grove,
California, Nov. 2002.
[7] J. Detrey and F. de Dinechin, “Table-based polynomials for
fast hardware function evaluation,” 16th IEEE Inter. Conf.
on Application-Specific Systems, Architectures, and Processors
(ASAP’05), pp.328–333, 2005.
[8] V.K. Jain, S.A. Wadekar, and L. Lin, “A universal nonlinear com-
ponent and its application to WSI,” IEEE Trans. Compon. Hybrids
Manuf. Technol., vol.16, no.7, pp.656–664, Nov. 1993.
[9] Y.-T. Lai, M. Pedram, and S.B. Vrudhula, “EVBDD-based algo-
rithms for linear integer programming, spectral transformation and
functional decomposition,” IEEE Trans. Comput.-Aided Des. Integr.
Circuits Syst., vol.13, no.8, pp.959–975, Aug. 1994.
[10] D.-U. Lee, W. Luk, J. Villasenor, and P.Y.K. Cheung,
“Non-uniform segmentation for hardware function evaluation,” Proc. Inter.
Conf. on Field Programmable Logic and Applications, pp.796–807,
Lisbon, Portugal, Sept. 2003.
[11] D.-U. Lee, W. Luk, J. Villasenor, and P.Y.K. Cheung, “Hierarchical
segmentation schemes for function evaluation,” Proc. IEEE Conf.
on Field-Programmable Technology, pp.92–99, Tokyo, Japan, Dec.
2003.
[12] J.H. Mathews, Numerical Methods for Computer Science, Engineering
and Mathematics, Prentice-Hall, Englewood Cliffs, NJ, 1987.
[13] J.-M. Muller, Elementary Functions: Algorithms and Implementation,
Birkhäuser Boston, Secaucus, NJ, 1997.
[14] S. Nagayama and T. Sasao, “On the optimization of heterogeneous
MDDs,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.,
vol.24, no.11, pp.1645–1659, Nov. 2005.
[15] S. Nagayama, T. Sasao, and J.T. Butler, “Compact numerical func-
tion generators based on quadratic approximation: Architecture and
synthesis method,” IEICE Trans. Fundamentals, vol.E89-A, no.12,
pp.3510–3518, Dec. 2006.
[16] S. Nagayama, T. Sasao, and J.T. Butler, “Numerical function gen-
erators using edge-valued binary decision diagrams,” Proc. Asia
and South Pacific Design Automation Conference (ASPDAC’07),
pp.535–540, Yokohama, Japan, 2007.
[17] S. Nagayama, T. Sasao, and J.T. Butler, “Design method for nu-
merical function generators based on polynomial approximation for
FPGA implementation,” Proc. 10th EUROMICRO Conference on
Digital System Design Architectures, Methods and Tools (DSD’07),
pp.280–287, Germany, Aug. 2007.
[18] J.-A. Piñeiro, S.F. Oberman, J.-M. Muller, and J.D. Bruguera,
“High-speed function approximation using a minimax quadratic
interpolator,” IEEE Trans. Comput., vol.54, no.3, pp.304–318, March
2005.
[19] T. Sasao and M. Fujita, eds., Representations of Discrete Functions,
Kluwer Academic Publishers, 1996.
[20] T. Sasao, M. Matsuura, and Y. Iguchi, “A cascade realization of
multiple-output function for reconfigurable hardware,” Inter. Work-
shop on Logic Synthesis (IWLS’01), pp.225–230, Lake Tahoe, CA,
June 2001.
[21] T. Sasao and M. Matsuura, “A method to decompose multiple-output
logic functions,” 41st Design Automation Conference, pp.428–433,
San Diego, CA, June 2004.
[22] T. Sasao, J.T. Butler, and M.D. Riedel, “Application of LUT cas-
cades to numerical function generators,” Proc. 12th Workshop on
Synthesis and System Integration of Mixed Information Technolo-
gies (SASIMI’04), pp.422–429, Kanazawa, Japan, Oct. 2004.
[23] T. Sasao, S. Nagayama, and J.T. Butler, “Programmable numerical
function generators: Architectures and synthesis method,” Proc. In-
ter. Conf. on Field Programmable Logic and Applications (FPL’05),
pp.118–123, Tampere, Finland, Aug. 2005.
[24] T. Sasao, S. Nagayama, and J.T. Butler, “Numerical function gen-
erators using LUT cascades,” IEEE Trans. Comput., vol.56, no.6,
pp.826–838, June 2007.
[25] M.J. Schulte and E.E. Swartzlander, “Hardware designs for exactly
rounded elementary functions,” IEEE Trans. Comput., vol.43, no.8,
pp.964–973, Aug. 1994.
[26] J.E. Stine and M.J. Schulte, “The symmetric table addition method
for accurate function approximation,” J. VLSI Signal Processing,
vol.21, no.2, pp.167–177, June 1999.
[27] M.R. Williams, History of Computing Technology, IEEE Computer
Society Press, Los Alamitos, CA, 1997.
Appendix: Proof of Theorem 1
Theorem 1: Let $\mathrm{seg\_func}(X)$ be a segment index function
with $t$ segments. Then, there exists an EV SIE for
$\mathrm{seg\_func}(X)$ with at most $\lceil \log_2 t \rceil$ rails and
$\lceil \log_2 t \rceil$ Arails.
Proof: The second part of the hypothesis concerning the
number of Arails was proven in [22]. Specifically, it was
shown that
Theorem A: [22] Let $\mathrm{seg\_func}(X)$ be a segment index
function with $t$ segments. Then, there exists an LUT cascade
for $\mathrm{seg\_func}(X)$ with at most $\lceil \log_2 t \rceil$ rails.
Each node in an MTBDD represents the Shannon expansion
$x_i \cdot f_1 + \bar{x}_i \cdot f_0$, where $f_0$ and $f_1$ are
sub-functions with respect to $x_i = 0$ and $x_i = 1$, respectively.
On the other hand, each node in an EVBDD represents the expansion
$$x_i(\alpha + f'_1) + \bar{x}_i \cdot f_0,$$
where $f_1 = \alpha + f'_1$, and $\alpha$ is a constant value of the
sub-function $f_1$. Note that, in an EVBDD, $\alpha$ is the weight of
a 1-edge. Let $\mu_e$ be the number of distinct sub-functions
produced by this expansion, and let $\mu_s$ be the number of distinct
sub-functions produced by the Shannon expansion. Then, we have
$$\mu_e \le \mu_s.$$
Theorem A shows that $\mathrm{seg\_func}(X)$ can be represented by an
MTBDD in which the number of distinct sub-functions with respect
to each $x_i$ is at most $t$. Thus, there exists an EVBDD for
$\mathrm{seg\_func}(X)$ that has at most $t$ distinct sub-functions
with respect to each $x_i$. In an EVBDD for $\mathrm{seg\_func}(X)$,
the sum of weights of edges on a path is at most $t$, since each
weight is a non-negative integer. Therefore, we have Theorem 1.
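The expansion used in the proof can be made concrete with a toy example. The Python sketch below is a simplified illustration under stated assumptions, not the authors' EV SIE: the 3-bit segment index function with t = 3 segments ({0, 1}, {2, 3}, {4, ..., 7}) is invented here, nodes are plain tuples, only 1-edges carry a weight α, and no node sharing or reduction is performed. It shows that the function value is the sum of the 1-edge weights along the selected path and that this sum fits in ⌈log2 t⌉ bits.

```python
import math

# A node is a tuple (var, low, weight, high):
#   var    - index into the bit tuple (0 is the most significant bit),
#   low    - child followed when the bit is 0 (this edge adds nothing),
#   weight - the constant alpha attached to the 1-edge,
#   high   - child followed when the bit is 1.
# None is the terminal node.
NODE_X1 = (1, None, 1, None)   # reached when the MSB is 0; tests the middle bit
ROOT = (0, NODE_X1, 2, None)   # tests the MSB; the 1-edge adds 2


def evaluate(root, bits):
    """Return the sum of the 1-edge weights along the path selected by bits."""
    total, node = 0, root
    while node is not None:
        var, low, weight, high = node
        if bits[var]:
            total += weight
            node = high
        else:
            node = low
    return total


def max_path_sum(node):
    """Largest sum of edge weights over all root-to-terminal paths."""
    if node is None:
        return 0
    _, low, weight, high = node
    return max(max_path_sum(low), weight + max_path_sum(high))


T = 3  # number of segments in the toy example (hypothetical)
values = [evaluate(ROOT, ((x >> 2) & 1, (x >> 1) & 1, x & 1)) for x in range(8)]
print(values)
print(max_path_sum(ROOT), math.ceil(math.log2(T)))
```

In this toy case the largest path sum is 2, which can be held in ⌈log2 3⌉ = 2 bits, matching the bound on the adder width (Arails) in Theorem 1.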
Shinobu Nagayama received the B.S.
and M.E. degrees from the Meiji University,
Kanagawa, Japan, in 2000 and 2002, respec-
tively, and the Ph.D. degree in computer science
from the Kyushu Institute of Technology, Iizuka,
Japan, in 2004. He is now a research associate
at the Hiroshima City University, Hiroshima,
Japan. He received the Outstanding Contribu-
tion Paper Award from the IEEE Computer So-
ciety Technical Committee on Multiple-Valued
Logic (MVL-TC) in 2005 for a paper presented
at the International Symposium on Multiple-Valued Logic in 2004, and the
Excellent Paper Award from the Information Processing Society of Japan
(IPS) in 2006. His research interests include numerical function generators,
decision diagrams, software synthesis, and embedded systems.
Tsutomu Sasao received the B.E., M.E.,
and Ph.D. degrees in electronics engineering
from Osaka University, Osaka, Japan, in 1972,
1974, and 1977, respectively. He has held
faculty/research positions at Osaka University,
Japan, the IBM T.J. Watson Research Center,
Yorktown Heights, New York, and the Naval
Postgraduate School, Monterey, California. He
is now a Professor of the Department of Com-
puter Science and Electronics at the Kyushu In-
stitute of Technology, Iizuka, Japan. His re-
search areas include logic design and switching theory, representations of
logic functions, and multiple-valued logic. He has published more than
nine books on logic design, including Logic Synthesis and Optimization,
Representation of Discrete Functions, Switching Theory for Logic Synthe-
sis, and Logic Synthesis and Verification, Kluwer Academic Publishers,
1993, 1996, 1999, and 2001, respectively. He has served as Program
Chairman for the IEEE International Symposium on Multiple-Valued Logic
(ISMVL) many times. Also, he was the Symposium Chairman of the 28th
ISMVL held in Fukuoka, Japan, in 1998. He received the NIWA Memorial
Award in 1979, Distinctive Contribution Awards from the IEEE Computer
Society MVL-TC for papers presented at ISMVLs in 1986, 1996, 2003 and
2004, and Takeda Techno-Entrepreneurship Award in 2001. He has served
as an Associate Editor of the IEEE Transactions on Computers. He is a
fellow of the IEEE.
Jon T. Butler received the B.E.E. and
M.Engr. degrees from Rensselaer Polytechnic
Institute, Troy, New York, in 1966 and 1967, re-
spectively. He received the Ph.D. degree from
The Ohio State University, Columbus, in 1973.
Since 1987, he has been a professor at the
Naval Postgraduate School, Monterey, Califor-
nia. From 1974 to 1987, he was at North-
western University, Evanston, Illinois. During
that time, he served two periods of leave at the
Naval Postgraduate School, first as a National
Research Council Senior Postdoctoral Associate (1980–1981) and second
as the NAVALEX Chair Professor (1985–1987). He served one period of
leave as a foreign visiting professor at the Kyushu Institute of Technol-
ogy, Iizuka, Japan. His research interests include logic optimization and
multiple-valued logic. He has served on the editorial boards of the IEEE
Transactions on Computers, Computer, and IEEE Computer Society Press.
He has served as the editor-in-chief of Computer and IEEE Computer So-
ciety Press. He received the Award of Excellence, the Outstanding Con-
tributed Paper Award, and a Distinctive Contributed Paper Award for pa-
pers presented at the International Symposium on Multiple-Valued Logic.
He received the Distinguished Service Award, two Meritorious Awards, and
nine Certificates of Appreciation for service to the IEEE Computer Society.
He is a fellow of the IEEE.
