Programmable Architectures and design methods for two-variable numeric function generators by Nagayama, Shinobu et al.
Calhoun: The NPS Institutional Archive
Faculty and Researcher Publications Faculty and Researcher Publications Collection
2010-02
S. Nagayama, T. Sasao and J. T. Butler, "Programmable Architectures and design
methods for two-variable numeric function generators," IPSJ Transactions on System
LSI Design Methodology, Vol. 3, pp.118-129, Feb. 2010.
http://hdl.handle.net/10945/35855
IPSJ Transactions on System LSI Design Methodology Vol. 3 118–129 (Feb. 2010)
Regular Paper
Programmable Architectures and Design Methods for
Two-Variable Numeric Function Generators 1
Shinobu Nagayama,†1 Tsutomu Sasao†2
and Jon T. Butler†3
This paper proposes programmable architectures and design methods for numeric
function generators (NFGs) of two-variable functions. To realize a two-variable
function in hardware, we partition the domain of the function into segments and
approximate the function by a polynomial in each segment. This paper introduces
two planar segmentation algorithms that efficiently partition the domain of a
two-variable function. It also introduces a design method for symmetric
two-variable functions (i.e., f(X,Y) = f(Y,X)). This method can reduce the memory
size needed for symmetric functions by nearly half with a small speed penalty.
The proposed architectures allow a systematic design of various two-variable
functions. We compare our approach with one based on a one-variable NFG. FPGA
implementation results show that, for a complicated function, our NFG requires
57% of the memory size and 60% of the delay time of a circuit designed based on a
one-variable NFG.
1. Introduction
The ability to compute numeric functions at high speed is important in many
applications 12), including 3D computer graphics, hardware accelerators for
technical computing packages, direct digital frequency synthesizers 4), and
digital signal processing. Various design methods for numeric function generators
(NFGs) have been devised for numeric functions of one variable
5),10),14),15),18)–20). Only a few methods exist for multi-variable functions
(e.g., $\sqrt{X^2 + Y^2 + Z^2}$ and arctan(X/Y)) 6),7),22). However, these methods
are function-specific; different functions require different methods. As far as
we know, no systematic design method exists for generic multi-variable functions.
†1 Hiroshima City University
†2 Kyushu Institute of Technology
†3 Naval Postgraduate School
1 This paper is an extension of two papers 16), 17).
A straightforward design method for an arbitrary multi-variable function is to use
a single memory in which the address is the combination of the values of the
variables and the content of that address is the corresponding function value.
This method produces a fast implementation, but requires a $2^{mn}$-word memory to
implement an m-variable function with n bits for each variable. Thus, unlike for
one-variable functions, this method is impractical even for a computation with a
small number of bits, because of the large memory size.
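To make the growth concrete, the following minimal sketch evaluates the $2^{mn}$ word count for one and two variables. The function name and the 12-bit input width are illustrative assumptions, not values taken from the paper at this point.

```python
def single_lut_words(num_vars: int, bits_per_var: int) -> int:
    """Words in one memory addressed by the concatenation of all inputs."""
    return 2 ** (num_vars * bits_per_var)


# Same 12-bit input width (an assumed example), one versus two variables.
one_var = single_lut_words(1, 12)
two_var = single_lut_words(2, 12)
print(one_var, two_var, two_var // one_var)
```

Adding a second 12-bit variable multiplies the number of words by $2^{12}$, which is why the single-memory approach does not scale.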
To produce a practical implementation, multi-variable functions are often designed
in a conventional (trivial) manner that uses a combination of one-variable
function generators, multipliers, and adders 6),7). For example, the function
$\sqrt{X^2 + Y^2 + Z^2}$ can be realized using three circuits, each realizing
$a^2$, two adders, and a square root circuit. This design method may require only
a small memory. However, depending on the function implemented, it can produce a
slow implementation because of long path delays. Such circuits also make error
analysis harder; that is, guaranteeing output accuracy becomes harder. In
addition, there are many multi-variable functions that cannot be decomposed into
one-variable functions, such as probability distributions that are functions of
the random variable and a parameter, like variance.
This paper proposes a systematic design method for two-variable functions.
Since our design method is based on a piecewise polynomial approximation,
architectures are simple even for complicated functions. To approximate a given
function using piecewise polynomials, we introduce two planar segmentation
algorithms that efficiently partition a given domain of a two-variable function.
We also introduce programmable architectures that can realize a wide range of
two-variable functions.
The rest of this paper is organized as follows: Section 2 introduces the number
representation and the decision diagrams used in this paper. Section 3 presents
two planar segmentation algorithms and a polynomial approximation method
using bilinear interpolation. Section 4 presents programmable architectures for
two-variable functions. Section 5 presents an architecture and a design method
for symmetric two-variable functions. Section 6 evaluates the performance of our
segmentation algorithms and architectures for two-variable functions. Section 7
concludes the paper. An error analysis for our NFGs is omitted because it is
almost the same as that in Refs. 14), 19).
© 2010 Information Processing Society of Japan
2. Preliminaries
2.1 Number Representation and Errors
Definition 1 A numeric function generator (NFG) is a logic circuit that
computes approximated values for a numeric (real) function within some given
acceptable error ε. A one-variable NFG is a logic circuit for a one-variable
numeric function f(X), whose input is X, and output is an approximated value
for f(X). A two-variable NFG is a logic circuit for a two-variable numeric
function f(X,Y ), whose inputs are X and Y , and output is an approximated
value for f(X,Y ).
Definition 2 A value X represented by the binary fixed-point representation
is denoted by
$X = (x_{l-1}\, x_{l-2} \ldots x_1\, x_0 .\, x_{-1}\, x_{-2} \ldots x_{-m})$,
where $x_i \in \{0, 1\}$, l is the number of bits in the integer part, and m is the
number of bits in the fractional part. Each bit $x_i$ contributes $2^i x_i$ to the
value of X, except $x_{l-1}$, which contributes $-2^{l-1} x_{l-1}$. That is, the
fixed-point representation is in two's complement.
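Definition 2 can be checked with a short sketch that sums the bit weights directly. The function name and the string encoding of the bits are assumptions made for illustration.

```python
def fixed_point_value(bits: str, l: int, m: int) -> float:
    """Value of an (l + m)-bit two's complement fixed-point number.

    bits lists x_{l-1} ... x_0 x_{-1} ... x_{-m}, most significant bit first.
    """
    if len(bits) != l + m or set(bits) - {"0", "1"}:
        raise ValueError("expected exactly l + m binary digits")
    value = 0.0
    for pos, ch in enumerate(bits):
        i = l - 1 - pos              # bit index, running from l-1 down to -m
        weight = 2.0 ** i
        if i == l - 1:
            weight = -weight         # the top bit contributes -2^(l-1)
        value += weight * int(ch)
    return value


print(fixed_point_value("0110", 2, 2))   # bits 01.10
print(fixed_point_value("1110", 2, 2))   # bits 11.10
```

With l = 2 and m = 2, the string 01.10 evaluates to 1.5 and 11.10 to -0.5, since the top bit carries weight -2.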
Definition 3 Error is the absolute difference between the exact value and
the value produced by the hardware. Acceptable error is the maximum error
that an NFG may assume; it is usually a specification to be satisfied by the
hardware. Approximation error is the error caused by a function approximation.
Acceptable approximation error is the maximum approximation error that a
function approximation may assume. Rounding error is the error caused by a
binary fixed-point representation.
Definition 4 Accuracy is the number of bits in the fractional part of a binary
fixed-point representation. m-bit accuracy specifies that m bits are used to
represent the fractional part of the number. When the maximum error is $2^{-m}$,
the error is no greater than 1 unit in the last place (ULP) 12). In this
paper, an m-bit accuracy NFG is an NFG with an m-bit fractional part of the
inputs, an m-bit fractional part of the output, and a 1 ULP error.
2.2 Decision Diagrams
The proposed design uses binary decision diagrams.
Definition 5 A binary decision diagram (BDD) 2),11) is a rooted directed
acyclic graph (DAG) representing a logic function. The BDD is obtained by
recursively applying the Shannon expansion $f = \bar{x}_i f_0 + x_i f_1$ to the
logic function, where f, $f_0$, and $f_1$ are represented by nodes. There are two
types of nodes: terminal nodes, which are labeled by the two function values, 0
and 1, and non-terminal nodes, which are labeled by variable names. Each
non-terminal node has two unweighted outgoing edges labeled 0 and 1,
corresponding to the value of the node's variable. The terminal nodes have no
outgoing edges. We consider only ordered BDDs, where the order of the variables
is the same for every path from the root node to a terminal node. We consider
only reduced BDDs, where identical subtrees are combined into a single tree.
Definition 6 A multi-terminal BDD (MTBDD) 3) is an extension of a
BDD that represents an integer-valued function: $\{0, 1\}^n \to S \subseteq Z$,
where S is a finite subset of the set Z of integers. In an MTBDD, the terminal
nodes are labeled by values of S.
Definition 7 An edge-valued BDD (EVBDD) 8),9) is also an extension
of a BDD that represents an integer-valued function. The EVBDD is obtained
by repeatedly applying the expansion $f = \bar{x}_i f_0 + x_i (f'_1 + \alpha)$ to
the integer-valued function, where $f_1 = f'_1 + \alpha$, and $\alpha$ is the
constant term of $f_1$. In an EVBDD, all 1-edges have an integer weight and all
0-edges have weight 0. There is only one terminal node; it is labeled 0. The
incoming edge into the root node can have a non-zero weight. A non-zero weight
$\alpha$ on the incoming edge of the root node adds $\alpha$ to all sums
associated with all paths from the root to the terminal node of the EVBDD. This
occurs when the EVBDD is a sub-EVBDD of a larger EVBDD.
Example 1 Figure 1 (b) and (c) show an MTBDD and an EVBDD for the
integer-valued function f defined by Fig. 1 (a). In Fig. 1 (b) and (c), dashed lines
and solid lines denote 0-edges and 1-edges, respectively. Note the non-zero weights
on 1-edges of the EVBDD. In the MTBDD, terminal nodes represent function
values. Thus, to evaluate the function, we traverse the MTBDD from the root
node to a terminal node according to the input values, and obtain the function
value (an integer) from the terminal node. On the other hand, in the EVBDD,
we obtain the function value by summing the weights of the edges traversed from
the root node to the terminal node. (End of Example)
x1  y1  x0  y0  f
0   0   0   0   0
0   0   0   1   0
0   0   1   0   0
0   0   1   1   0
0   1   0   0   1
0   1   0   1   1
0   1   1   0   1
0   1   1   1   1
1   0   0   0   2
1   0   0   1   2
1   0   1   0   2
1   0   1   1   2
1   1   0   0   3
1   1   0   1   4
1   1   1   0   5
1   1   1   1   6
(a) Function table. (b) MTBDD. (c) EVBDD.
Fig. 1 MTBDD and EVBDD for an integer-valued function.
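The two evaluation styles of Example 1 can be compared in a short sketch. The graphs below are hand-built from the function table of Fig. 1 (a) with the variable order x1, y1, x0, y0; they are assumptions for illustration and may differ in shape from the diagrams drawn in Fig. 1 (b) and (c).

```python
from itertools import product

# Function table of Fig. 1 (a), rows ordered by (x1, y1, x0, y0).
VALUES = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 4, 5, 6]
TABLE = dict(zip(product((0, 1), repeat=4), VALUES))
VAR = {"x1": 0, "y1": 1, "x0": 2, "y0": 3}

# MTBDD (assumed shape): node = (variable, 0-child, 1-child); ints are terminals.
MTBDD = ("x1",
         ("y1", 0, 1),
         ("y1", 2, ("x0", ("y0", 3, 4), ("y0", 5, 6))))

# EVBDD (assumed shape): name -> (variable, 0-child, 1-child, weight of 1-edge).
# All 0-edges have weight 0 and "T" is the single terminal node.
EVBDD = {
    "root": ("x1", "a", "b", 2),
    "a": ("y1", "T", "T", 1),
    "b": ("y1", "T", "c", 1),
    "c": ("x0", "d", "d", 2),
    "d": ("y0", "T", "T", 1),
}


def eval_mtbdd(assignment):
    """Follow edges to a terminal node and read the value stored there."""
    node = MTBDD
    while not isinstance(node, int):
        var, low, high = node
        node = high if assignment[VAR[var]] else low
    return node


def eval_evbdd(assignment):
    """Sum the weights of the edges traversed on the way to the terminal."""
    node, total = "root", 0
    while node != "T":
        var, low, high, weight = EVBDD[node]
        if assignment[VAR[var]]:
            total += weight
            node = high
        else:
            node = low
    return total


print(all(eval_mtbdd(a) == eval_evbdd(a) == TABLE[a] for a in TABLE))
```

In this sketch the MTBDD needs seven terminal values, while the EVBDD moves the same information onto five edge weights, which is what the adders in the later architecture accumulate.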
3. Piecewise Polynomial Approximation Based on Planar Segmen-
tation
3.1 Planar Segmentation Problem
To approximate a given two-variable function by piecewise polynomials, we
partition the domain of the function into segments and approximate the
function by a polynomial in each segment. By narrowing segments, and thus
increasing the number of segments, we can decrease the approximation error to the
desired value. The memory size and speed of an NFG therefore depend strongly on
the segmentation of the domain. Thus, to design fast and compact NFGs, we
need to solve the following segmentation problem: Given a two-variable function,
its domain, and an acceptable approximation error, find an optimum segmentation.
To find an optimum segmentation, we consider the following:
(1) the number of words in the coefficients memory, which is the number of
segments, and
(2) the complexity of the hardware that realizes the segmentation, called the
segment index encoder, which maps values of X and Y to a segment number.
Fewer segments are preferred because the number of segments directly affects
the size of the coefficients memory of the NFG. However, the complexity of the
segment index encoder is important as well. Even if the number of segments is
minimum, a large NFG is produced if the segment index encoder is very large.
For one-variable functions, since the domain is one-dimensional (a line),
any segmentation can be realized compactly. Thus, we considered only the number
of segments to find an optimum segmentation 14),19). On the other hand, for
two-variable functions, since the domain is two-dimensional (a plane),
the segment index encoders tend to be much more complex than for one-variable
functions. Thus, to find the optimum design of two-variable NFGs, it is necessary
to carefully consider the complexity of the segment index encoder.
For one-variable functions, we have proposed linear segmentation algorithms
14),19) to efficiently find an optimum segmentation of a linear domain (an
approximation with the fewest segments). For two-variable functions, however, a
planar segmentation algorithm is required to find an optimum segmentation
of a planar domain. Planar segmentation has a higher degree of freedom,
and thus finding an optimum segmentation becomes much more difficult than
in linear segmentation. Because many segments may be involved in a practical
design, the time needed to find an optimum segmentation can be very long. To
produce an efficient planar segmentation in a short computation time, we focus
on heuristic planar segmentation algorithms. The next subsection presents
two heuristic planar segmentation algorithms that produce an efficient planar
segmentation by regularly partitioning a given domain using squares.
3.2 Planar Segmentation Algorithms
We first present a recursive planar segmentation algorithm that reduces the
hardware complexity of both the coefficients memory (the number of segments) and
the segment index encoder. Figure 2 shows this algorithm. The inputs of the
algorithm are a numeric function f(X,Y), a domain {[Xb, Xe), [Yb, Ye)} for X and
Y, an accuracy $m_{in}$ of X and Y, and an acceptable approximation error εa.
The algorithm begins by computing an approximate polynomial g(X,Y). This
is an initial approximation. If its approximation error ε is larger than the given
acceptable error εa, then the domain is partitioned into four equal-sized square
segments. For each segment, an approximate polynomial is computed again. The
same process is recursively repeated until all segments have approximation errors
smaller than εa. Note that this algorithm creates segments of size
$w_i \times w_i$, where $w_i = 2^{h_i} \times 2^{-m_{in}}$ and $h_i$ is an integer.
That is, all the segmentation points $P_i$ and $Q_i$ are restricted to values whose
least significant $h_i$ bits are 0 (i.e.,
$P_i = (\ldots p_{-j+1}\, p_{-j}\, 0 0 \ldots 0)$, where $j = m_{in} - h_i$). This
restriction helps reduce the complexity of the segment index encoder.
Input: Numeric function f(X,Y), domain {[Xb, Xe), [Yb, Ye)} for X and Y, accuracy
$m_{in}$ of X and Y, and acceptable approximation error εa.
X and Y are represented in the same number of bits.
Output: Segments {[Xb, P0), [Yb, Q0)}, {[Xb, P0), [Q0, Q1)}, ...,
{[$P_{r-1}$, Xe), [$Q_{r-1}$, Ye)}, and correction values v0, v1, ..., $v_{k-1}$.
Step:
1. For {[Xb, Xe), [Yb, Ye)}, compute an approximate polynomial g(X,Y).
2. Compute the maximum positive error $max_{fg}$ = max{f(X,Y) − g(X,Y)}.
3. Compute the maximum negative error $min_{fg}$ = min{f(X,Y) − g(X,Y)}.
4. Compute the approximation error ε = ($max_{fg}$ − $min_{fg}$)/2 and the
correction value v = ($max_{fg}$ + $min_{fg}$)/2.
5. If ε < εa or (Xe − Xb) ≤ $2^{-m_{in}}$, then stop.
6. Else, partition {[Xb, Xe), [Yb, Ye)} into four segments {[Xb, P), [Yb, Q)},
{[Xb, P), [Q, Ye)}, {[P, Xe), [Yb, Q)}, and {[P, Xe), [Q, Ye)}, where
P = (Xb + Xe)/2 and Q = (Yb + Ye)/2.
7. Repeat Steps 1, 2, ..., 6 for each new segment recursively, until the maximum
approximation errors are smaller than εa in all segments.
Fig. 2 Recursive planar segmentation algorithm.
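The steps of Fig. 2 can be sketched in software as follows. This is a minimal illustration, not the authors' implementation: it assumes g(X,Y) is the bilinear interpolation through the four corners of a segment, and it assumes the maximum and minimum errors are estimated by sampling every representable input with $m_{in}$ fractional bits inside the segment. All function and variable names are hypothetical.

```python
def bilinear(f, bx, ex, by, ey):
    """Bilinear interpolation of f through the four corners of a segment."""
    fbb, fbe, feb, fee = f(bx, by), f(bx, ey), f(ex, by), f(ex, ey)
    area = (ex - bx) * (ey - by)

    def g(x, y):
        return (fbb * (ex - x) * (ey - y) + feb * (x - bx) * (ey - y)
                + fbe * (ex - x) * (y - by) + fee * (x - bx) * (y - by)) / area
    return g


def segment(f, xb, xe, yb, ye, m_in, eps_a):
    """Return a list of (xb, xe, yb, ye, v, eps), one tuple per segment."""
    step = 2.0 ** -m_in
    g = bilinear(f, xb, xe, yb, ye)                       # Step 1
    nx, ny = round((xe - xb) / step), round((ye - yb) / step)
    diffs = [f(xb + i * step, yb + j * step) - g(xb + i * step, yb + j * step)
             for i in range(nx) for j in range(ny)]       # sampled f - g
    max_fg, min_fg = max(diffs), min(diffs)               # Steps 2 and 3
    eps = (max_fg - min_fg) / 2                           # Step 4
    v = (max_fg + min_fg) / 2
    if eps < eps_a or (xe - xb) <= step:                  # Step 5
        return [(xb, xe, yb, ye, v, eps)]
    p, q = (xb + xe) / 2, (yb + ye) / 2                   # Step 6
    result = []
    for x0, x1 in ((xb, p), (p, xe)):                     # Step 7
        for y0, y1 in ((yb, q), (q, ye)):
            result += segment(f, x0, x1, y0, y1, m_in, eps_a)
    return result


# XY is itself bilinear, so the whole domain fits in one segment.
flat = segment(lambda x, y: x * y, 0.0, 1.0, 0.0, 1.0, 6, 2.0 ** -7)
# X^2 + Y^2 has constant curvature, so every segment ends up the same size.
curved = segment(lambda x, y: x * x + y * y, 0.0, 1.0, 0.0, 1.0, 6, 2.0 ** -7)
print(len(flat), len(curved))
```

Because every split halves both sides of a square, each segment boundary produced here has its low-order bits equal to 0, which is the property the text relies on to keep the segment index encoder small.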
Next, we present the planar uniform segmentation algorithm. Since the recursive
planar segmentation algorithm produces a non-uniform segmentation, a
segment index encoder is needed to compute a segment number from the values of
X and Y. However, in a uniform segmentation where the number of segments is
a power of 2, a segment index encoder is not necessary because the segment
number is obtained from the most significant bits of X and Y (see Fig. 3 (b)).
This eliminates the delay of the segment index encoder and produces fast NFGs. To
produce a uniform segmentation, we begin by finding the smallest square segment
needed to achieve the acceptable approximation error using the recursive
segmentation algorithm shown in Fig. 2. Then, we partition the given domain into
square segments all with the same size as the smallest segment.
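As a minimal sketch of why no encoder is needed in the uniform case, the code below splits n-bit unsigned inputs into a segment number (most significant bits) and offsets inside the segment (least significant bits). The packing order of the X and Y bits into one index, and all names, are assumptions made for illustration.

```python
def uniform_segment(x: int, y: int, n: int, s: int):
    """Split n-bit inputs for a uniform grid of 2^s by 2^s square segments.

    Returns (segment index, offset of x in the segment, offset of y).
    """
    low_bits = n - s
    mask = (1 << low_bits) - 1
    # Assumed packing: X's top s bits followed by Y's top s bits.
    index = ((x >> low_bits) << s) | (y >> low_bits)
    return index, x & mask, y & mask


# 8-bit inputs and a 4 x 4 grid (s = 2): the top two bits pick the segment.
print(uniform_segment(0b10110101, 0b01000011, 8, 2))
```

For these inputs the top bits are 10 and 01, so the index is 0b1001 = 9, and the offsets are the remaining six bits of each input, 53 and 3.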
3.3 Approximation Using Bilinear Interpolation Polynomials
For g(X,Y) in Fig. 2, we can use any approximating polynomial. In general,
higher-order polynomials require fewer segments. However, for multi-variable
functions, using higher-order polynomials is not always effective in reducing the
memory size of NFGs. This is because, for multi-variable polynomials, a higher
polynomial order requires many more polynomial coefficients. Also, higher-order
polynomials produce slower NFGs. Thus, for polynomial approximation methods,
reducing memory size with a small speed penalty is a key issue. To accomplish
this, we use bilinear interpolation polynomials 21).
Bilinear interpolation is an extension of linear interpolation. It interpolates
a two-variable function f(X,Y) using four points. In Fig. 2, to interpolate
f(X,Y) in each segment {[Bx, Ex), [By, Ey)}, we use the four corner points of
the segment: (Bx, By), (Bx, Ey), (Ex, By), and (Ex, Ey). Let
$f_{bb} = f(B_x, B_y)$, $f_{be} = f(B_x, E_y)$, $f_{eb} = f(E_x, B_y)$, and
$f_{ee} = f(E_x, E_y)$. Then, the bilinear interpolation g(X,Y) is given by:
$$g(X,Y) = \frac{f_{bb}(E_x - X)(E_y - Y) + f_{eb}(X - B_x)(E_y - Y)}{(E_x - B_x)(E_y - B_y)} + \frac{f_{be}(E_x - X)(Y - B_y) + f_{ee}(X - B_x)(Y - B_y)}{(E_x - B_x)(E_y - B_y)}.$$
By expanding and rearranging this, we obtain the following form:
$$g(X,Y) = C_{xy}XY + C_xX + C_yY + C_0,$$
where
$$C_{xy} = \frac{f_{bb} - f_{eb} - f_{be} + f_{ee}}{(E_x - B_x)(E_y - B_y)},$$
$$C_x = \frac{-f_{bb}E_y + f_{eb}E_y + f_{be}B_y - f_{ee}B_y}{(E_x - B_x)(E_y - B_y)},$$
$$C_y = \frac{-f_{bb}E_x + f_{eb}B_x + f_{be}E_x - f_{ee}B_x}{(E_x - B_x)(E_y - B_y)}, \text{ and}$$
$$C_0 = \frac{f_{bb}E_xE_y - f_{eb}B_xE_y - f_{be}E_xB_y + f_{ee}B_xB_y}{(E_x - B_x)(E_y - B_y)}.$$
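The four coefficient formulas can be checked numerically. The sketch below is an illustration only: it computes the coefficients from the corner values and applies them to a function that is already bilinear, for which the formulas should return the function's own coefficients. The test function and segment bounds are arbitrary assumptions.

```python
def bilinear_coefficients(fbb, fbe, feb, fee, bx, ex, by, ey):
    """Coefficients (Cxy, Cx, Cy, C0) of g(X,Y) = Cxy*X*Y + Cx*X + Cy*Y + C0."""
    d = (ex - bx) * (ey - by)
    cxy = (fbb - feb - fbe + fee) / d
    cx = (-fbb * ey + feb * ey + fbe * by - fee * by) / d
    cy = (-fbb * ex + feb * bx + fbe * ex - fee * bx) / d
    c0 = (fbb * ex * ey - feb * bx * ey - fbe * ex * by + fee * bx * by) / d
    return cxy, cx, cy, c0


def f(x, y):
    return 2 * x * y + 3 * x - y + 5      # already bilinear


bx, ex, by, ey = 0.25, 0.5, 0.5, 0.75
coeffs = bilinear_coefficients(f(bx, by), f(bx, ey), f(ex, by), f(ex, ey),
                               bx, ex, by, ey)
print(coeffs)
```

Up to rounding, the result is (2, 3, -1, 5), so the interpolation reproduces a bilinear function exactly and only the curvature of f contributes to the approximation error.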
To reduce the approximation error, the maximum positive error $max_{fg}$ and the
maximum negative error $min_{fg}$ are equalized by a vertical shift of g(X,Y) with
a correction value v = ($max_{fg}$ + $min_{fg}$)/2. Thus, the approximation error
is ($max_{fg}$ − $min_{fg}$)/2, and the approximating polynomial is g(X,Y) + v.
For each segment {[Bx, Ex), [By, Ey)}, since Bx ≤ X < Ex and By ≤ Y < Ey
hold, we can offset X and Y by Bx and By to compute the approximating
polynomial g(X,Y) + v. By using the offset inputs (X − Bx) and (Y − By) instead
of X and Y, we reduce the size of the multipliers needed to compute g(X,Y) + v.
By substituting X − Bx + Bx and Y − By + By for X and Y respectively, we
transform g(X,Y) + v as follows:
$$\begin{aligned}
g(X,Y) + v &= C_{xy}(X - B_x + B_x)(Y - B_y + B_y) + C_x(X - B_x + B_x) + C_y(Y - B_y + B_y) + C_0 + v \\
&= C_{xy}(X - B_x)(Y - B_y) + (C_x + C_{xy}B_y)(X - B_x) + (C_y + C_{xy}B_x)(Y - B_y) \\
&\quad + C_{xy}B_xB_y + C_xB_x + C_yB_y + C_0 + v \\
&= C_{xy}(X - B_x)(Y - B_y) + C'_x(X - B_x) + C'_y(Y - B_y) + C'_0, \qquad (1)
\end{aligned}$$
where $C'_x = C_x + C_{xy}B_y$, $C'_y = C_y + C_{xy}B_x$, and
$C'_0 = C_{xy}B_xB_y + C_xB_x + C_yB_y + C_0 + v$.
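The transformation to offset inputs can be verified with a few lines of arithmetic. In the sketch below, the coefficient values, the segment origin, and the correction value are arbitrary assumptions chosen only to exercise the formulas for $C'_x$, $C'_y$, and $C'_0$.

```python
def offset_coefficients(cxy, cx, cy, c0, bx, by, v=0.0):
    """Coefficients of form (1), which takes (X - Bx) and (Y - By) as inputs."""
    cx_p = cx + cxy * by
    cy_p = cy + cxy * bx
    c0_p = cxy * bx * by + cx * bx + cy * by + c0 + v
    return cxy, cx_p, cy_p, c0_p


def eval_offset(coeffs, dx, dy):
    """Evaluate form (1) at offsets dx = X - Bx and dy = Y - By."""
    cxy, cx_p, cy_p, c0_p = coeffs
    return cxy * dx * dy + cx_p * dx + cy_p * dy + c0_p


# Arbitrary example values (assumptions, not taken from the paper).
cxy, cx, cy, c0, v = 2.0, 3.0, -1.0, 5.0, 0.125
bx, by = 0.25, 0.5
x, y = 0.3, 0.6

direct = cxy * x * y + cx * x + cy * y + c0 + v
shifted = eval_offset(offset_coefficients(cxy, cx, cy, c0, bx, by, v),
                      x - bx, y - by)
print(abs(direct - shifted) < 1e-12)
```

Both evaluations agree, but the offset form multiplies by X − Bx and Y − By, which are no wider than the segment, so the multipliers see fewer significant bits.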
4. Programmable Architectures for Two-Variable NFGs
4.1 Architectures Based on Recursive and Uniform Segmentations
Figure 3 shows two architectures for two-variable NFGs realizing (1). Figure 3 (a)
and (b) show architectures based on recursive segmentation and uniform
segmentation, respectively. The segment index encoder converts the values of X and
Y into a segment number. This, in turn, is applied as the address input of the
coefficients memory. The coefficients are applied to adders and multipliers to form
the polynomial value g(X,Y) + v. Note that Fig. 3 (a) uses bitwise AND gates to
compute X − Bx and Y − By. In recursive segmentation, we can realize X − Bx
and Y − By using AND gates driven on one side by Bx and By, respectively 15).
Note that Fig. 3 (b) has neither a segment index encoder nor bitwise AND
gates. In uniform segmentation, the segment index encoder and bitwise AND
gates are not necessary. This is because the segment number is obtained from the
most significant bits of X and Y, and X − Bx and Y − By, which are realized
with bitwise AND gates in Fig. 3 (a), are obtained from the least significant bits.
(a) Architecture based on recursive segmentation. (b) Architecture based on
uniform segmentation.
Fig. 3 Architectures for two-variable NFGs using bilinear interpolation.
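A small sketch shows why a bitwise AND is enough to form the offsets when segment boundaries are aligned to powers of two. It does not reproduce the gate wiring of Fig. 3 (a), which is described in Ref. 15); the mask width h and the function name are assumptions for illustration.

```python
def offset_by_mask(x: int, h: int) -> int:
    """Keep the low h bits of x.

    If a segment starts at bx with its low h bits equal to 0 and has width
    2^h, then for every x in the segment this equals x - bx.
    """
    return x & ((1 << h) - 1)


# Segment [40, 48): bx = 40 = 0b101000 has its three low bits clear.
bx, h = 40, 3
print([offset_by_mask(x, h) == x - bx for x in range(bx, bx + (1 << h))])
```

No subtractor is needed because the upper bits of X inside the segment are identical to those of Bx, so clearing them leaves exactly X − Bx.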
4.2 Architecture and Design Method for Segment Index Encoder
The segment index encoder realizes the segment index function:
$\{0, 1\}^n \times \{0, 1\}^n \to \{0, 1, \ldots, k - 1\}$ shown in Fig. 4 (a),
where X and Y have n bits, and k denotes the number of segments. We realize this
function with the architecture shown in Fig. 4 (b). In this architecture, the
values of the interconnecting lines between adjacent LUT memories represent
sub-functions in the EVBDD (labeled rails), and the outputs from each LUT memory
to the adders tally the function value (labeled Arails). Consider the design of
the LUT cascade and adders in Fig. 4 (b), given the segmentation produced in
Fig. 2.
We begin by representing the segment index function using an MTBDD. Figure 5
illustrates the relationship between recursive segmentation and MTBDDs.
Then, we convert the MTBDD into an EVBDD. By decomposing the EVBDD,
as shown in Fig. 6, we obtain the architecture in Fig. 4 (b). In Fig. 6, the column
labeled 'ri' in the table of each LUT memory denotes the rails that represent
sub-functions in the EVBDD, and the column labeled 'ai' denotes the Arails
that represent the sum of weights of edges. In the EVBDD, "(ai, ri)" assigned
to edges that cut across the horizontal lines represents the sum of weights and
the sub-functions, respectively. For more detail on this architecture, see
Ref. 15).
(a) Segment index function: each segment, from {Xb ≤ X < P0, Yb ≤ Y < Q0}
(index 0) to {$P_{r-1}$ ≤ X < Xe, $Q_{r-1}$ ≤ Y < Ye} (index k − 1), is mapped to
its index. (b) LUT cascade and adders 15).
Fig. 4 Segment index encoder.
(a) Two segments. (b) Four segments. (c) Seven segments.
Fig. 5 Relationship between recursive segmentation and MTBDDs.
Fig. 6 Decomposition of the EVBDD.
In this architecture, the size of the LUT memories realizing the recursive
segmentation depends on the number of segments. Specifically,
Theorem 1 Let seg_func(X,Y) be a segment index function obtained by a
recursive planar segmentation. The segment index function can be realized by the
segment index encoder shown in Fig. 4 (b) with at most $\log_2 k$ rails and at most
$\log_2 k$ Arails, where k is the number of segments.
Proof: See Appendix.
The segment index encoder satisfying Theorem 1 is obtained when the variable
order for the EVBDD is $x_{l-1}, y_{l-1}, x_{l-2}, y_{l-2}, \ldots, x_{-m}, y_{-m}$
from the top to the bottom (see the proof in Appendix). We can also use the
optimization techniques for multi-valued decision diagrams presented in Ref. 13)
to optimize the variable order for the EVBDD.
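The interleaved variable order has a simple software analogue: reading one bit of X and one bit of Y at a time selects one of four quadrants, so a recursive segmentation can be walked like a quadtree. The sketch below is an illustration under assumptions: the nested-list tree, its seven-segment shape, and the index numbering are hypothetical and are not the LUT cascade of Fig. 4 (b).

```python
def segment_index(tree, x: int, y: int, n: int) -> int:
    """Look up the segment index of n-bit inputs x and y.

    tree is either an int (a segment index) or a list of four subtrees,
    ordered as (low X, low Y), (low X, high Y), (high X, low Y),
    (high X, high Y), matching the order of Step 6 in Fig. 2.
    """
    node = tree
    for i in range(n - 1, -1, -1):          # x_{n-1}, y_{n-1}, x_{n-2}, ...
        if isinstance(node, int):
            break
        x_bit, y_bit = (x >> i) & 1, (y >> i) & 1
        node = node[(x_bit << 1) | y_bit]
    if not isinstance(node, int):
        raise ValueError("tree is deeper than the number of input bits")
    return node


# Seven segments: three quadrants kept whole, one quadrant split in four.
TREE = [0, 1, 2, [3, 4, 5, 6]]
print(segment_index(TREE, 0b11, 0b11, 2), segment_index(TREE, 0b01, 0b00, 2))
```

Large segments are resolved after reading only the leading bit pair, while the split quadrant needs a second pair, which mirrors how paths of different lengths arise in the decision diagram.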
In our architectures, the coefficients memory and the LUT memories of the
segment index encoder are implemented by RAMs. Thus, by changing the data
for the coefficients memory and the LUT memories, a wide class of two-variable
functions can be realized by a single architecture.
5. Design Method for Symmetric Functions
Definition 8 A two-variable function f(X,Y ) is symmetric if f(X,Y ) =
f(Y,X).
Symmetric functions are commonly found in practical applications of NFGs. For
example, $\sqrt{X^2 + Y^2}$, which is used in converting from rectangular to polar
coordinates, is symmetric. This section presents an architecture and a design
method taking advantage of the function's symmetry.
Definition 9 A segmentation is symmetric if for every segment
{[$B_{x1}$, $E_{x1}$), [$B_{y1}$, $E_{y1}$)} such that $B_{x1} \neq B_{y1}$ or
$E_{x1} \neq E_{y1}$, there is another segment
{[$B_{x2}$, $E_{x2}$), [$B_{y2}$, $E_{y2}$)} such that $B_{x1} = B_{y2}$,
$E_{x1} = E_{y2}$, $B_{y1} = B_{x2}$, and $E_{y1} = E_{x2}$. Symmetric segments
are a pair of such segments.
Lemma 1 Let f(X,Y ) be a symmetric function, and let g1(X,Y ) and
g2(X,Y ) be bilinear interpolations of f(X,Y ) for symmetric segments. Then,
g1(X,Y ) = g2(Y,X).
Proof: See Appendix.
Theorem 2 The segmentation of a symmetric function produced by the
recursive planar segmentation algorithm is symmetric.
Proof: See Appendix.
From Lemma 1 and Theorem 2, we can use only one bilinear interpolation to
approximate the given symmetric function in symmetric segments. By assigning
the same segment index to symmetric segments, we can reduce the size of the
coefficients memory by nearly half.
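Lemma 1 can be illustrated numerically. The sketch below builds the bilinear interpolations on a segment and on its mirror image and compares g1(X,Y) with g2(Y,X) on a grid of sample points. The segment bounds, the sample count, and the use of math.hypot for $\sqrt{X^2 + Y^2}$ are assumptions made for the example; this is a spot check, not a proof.

```python
import math


def bilinear(f, bx, ex, by, ey):
    """Bilinear interpolation of f through the four corners of a segment."""
    fbb, fbe, feb, fee = f(bx, by), f(bx, ey), f(ex, by), f(ex, ey)
    area = (ex - bx) * (ey - by)

    def g(x, y):
        return (fbb * (ex - x) * (ey - y) + feb * (x - bx) * (ey - y)
                + fbe * (ex - x) * (y - by) + fee * (x - bx) * (y - by)) / area
    return g


def max_gap(f, bx, ex, by, ey, samples=8):
    """Largest |g1(X,Y) - g2(Y,X)| over a grid of points in the segment."""
    g1 = bilinear(f, bx, ex, by, ey)
    g2 = bilinear(f, by, ey, bx, ex)      # the mirrored (symmetric) segment
    gap = 0.0
    for i in range(samples):
        for j in range(samples):
            x = bx + (ex - bx) * i / samples
            y = by + (ey - by) * j / samples
            gap = max(gap, abs(g1(x, y) - g2(y, x)))
    return gap


print(max_gap(math.hypot, 0.0, 0.25, 0.5, 0.75))               # symmetric f
print(max_gap(lambda x, y: x + 2 * y, 0.0, 0.25, 0.5, 0.75))   # not symmetric
```

For the symmetric function the gap is zero up to rounding, so one stored set of coefficients can serve both segments; for the non-symmetric function the gap is clearly non-zero.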
Table 1 Number of segments for two segmentation methods.

Functions and domains:
No.  Function f(X,Y)              Domain of X  Domain of Y
f0   $\sin(\pi X)\sqrt{Y}$        [0, 1)       (0, 1)
f1   $\sin(\pi XY)$               [0, 1)       [0, 1)
f2   $X^4 Y^5$                    [0, 1)       [0, 1)
f3   $1/\sqrt{X^2 + Y^2}$         (0, 1)       (0, 1)
f4   $XY/\sqrt{X^2 + Y^2}$        (0, 1)       (0, 1)
f5   WaveRings                    [0, π]       [0, π]
f6   Sombrero                     (0, 8)       (0, 8)
f7   $\sqrt{X^2 + Y^2}$           (0, 1)       (0, 1)
f8   $\sqrt[3]{X^3 + Y^3}$        (0, 1)       (0, 1)

X and Y have 8-bit accuracy (acceptable approx. error: $2^{-10}$):
     Number of segments           Rs1   Rs2   Time [sec.]
No.  Uni.     Recur.   Sym.       [%]   [%]   Uni.    Recur.
f0   16,384   997      N/A        6     N/A   0.69    0.06
f1   1,024    508      263        50    26    0.07    0.03
f2   4,096    193      N/A        5     N/A   0.30    0.02
f3   65,025   2,344    1,195      4     2     0.02    0.07
f4   4,096    256      139        6     3     0.12    0.01
f5   10,201   949      490        9     5     0.85    0.04
f6   4,096    1,180    607        29    15    0.21    0.06
f7   4,096    226      121        6     3     0.17    0.01
f8   4,096    232      127        6     3     0.33    0.02

X and Y have 12-bit accuracy (acceptable approx. error: $2^{-14}$):
     Number of segments              Rs1   Rs2   Time [sec.]
No.  Uni.        Recur.    Sym.      [%]   [%]   Uni.    Recur.
f0   16,773,120  29,875    N/A       0.2   N/A   9.26    1.97
f1   16,384      8,389     4,232     51    26    1.00    0.42
f2   65,536      3,592     N/A       5     N/A   4.73    0.33
f3   16,769,025  103,046   51,687    0.6   0.3   6.10    3.91
f4   1,048,576   4,114     2,104     0.4   0.2   28.05   0.16
f5   646,416     16,278    8,202     3     1     24.51   0.76
f6   65,536      18,664    9,398     28    14    3.40    0.93
f7   1,048,576   4,093     2,083     0.4   0.2   40.58   0.22
f8   1,048,576   3,955     2,027     0.4   0.2   78.21   0.41

Uni.: Uniform segmentation. Recur.: Recursive segmentation.
Sym.: Symmetric segments are counted as one segment.
Rs1: (No. of segments in Recur.) / (No. of segments in Uni.) × 100 (%).
Rs2: (No. of segments in Sym.) / (No. of segments in Uni.) × 100 (%).
Experiment environment: Sun Blade 2500 (Silver), UltraSPARC-IIIi 1.6GHz, 6GB
memory, Solaris 9.
Fig. 7 Architecture for two-variable NFGs for symmetric functions.
Figure 7 shows an architecture for symmetric functions. Here, the coefficients
memory stores only data for segments such that X ≤ Y. For other segments,
approximated values are computed using Lemma 1. Since the comparator and
multiplexers operate in parallel with the segment index encoder, there is no speed
penalty due to these additional circuits.
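One way to read the comparator and multiplexers in software is as an operand swap: if X > Y, the inputs are exchanged so that only the half of the domain with X ≤ Y is ever looked up. The sketch below is an assumption consistent with Lemma 1, not a description of the exact wiring in Fig. 7; approx_lower stands in for the stored half of the NFG and is hypothetical.

```python
import math


def symmetric_eval(approx_lower, x, y):
    """Evaluate a symmetric function using data stored only for x <= y."""
    if x > y:              # comparator
        x, y = y, x        # multiplexers exchange the two operands
    return approx_lower(x, y)


def approx_lower(x, y):
    """Stand-in for the stored half; refuses inputs outside x <= y."""
    if x > y:
        raise ValueError("no data stored for x > y")
    return math.hypot(x, y)


print(symmetric_eval(approx_lower, 0.7, 0.2) ==
      symmetric_eval(approx_lower, 0.2, 0.7))
```

Both calls reach the same stored entry, which is why symmetric segments can share one segment index.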
6. Experimental Results
6.1 Number of Segments and Computation Time for Algorithms
Table 1 shows the number of segments produced by the two segmentation
algorithms presented in Section 3, and their computation time for various
functions 1). This table also shows the number of symmetric segments for symmetric
functions.
[Footnote residue: three expressions in $\sqrt{X^2 + Y^2}$; the rest of the
footnote is not recoverable.]
Table 1 shows that, for all functions except sin(πXY) and Sombrero, the
recursive segmentation algorithm produces many fewer segments than the uniform
segmentation algorithm. In particular, at higher accuracy, the number of segments
needed in recursive segmentation is only a few percent of the number of
segments needed in uniform segmentation. The number of symmetric segments
is even smaller. Thus, the recursive segmentation algorithm and the symmetric
technique significantly reduce the number of words in the coefficients memory.
For sin(πXY) and Sombrero, the number of additional segments needed in uniform
segmentation is not so large, even at higher accuracy. This means that, for these
functions, the uniform segmentation method also produces NFGs of reasonable
size.
In addition, Table 1 shows that both algorithms produce segments in a short
CPU time. Such quick segmentation is useful to reduce the design time of NFGs.

Table 2 Total memory sizes needed for the proposed NFGs.

8-bit accuracy NFGs:
No.  Uniform      Recursive   Sym.       Rm1   Rm2
f0   409,600      57,732      N/A        14    N/A
f1   37,888       34,580      19,417     91    51
f2   118,784      13,817      N/A        12    N/A
f3   1,040,400    175,760     97,236     17    9
f4   118,784      16,064      10,145     14    9
f5   397,839      71,981      39,986     18    10
f6   143,360      74,896      40,772     52    28
f7   118,784      14,908      9,334      13    8
f8   135,168      15,512      9,658      11    7

12-bit accuracy NFGs:
No.  Uniform       Recursive   Sym.        Rm1   Rm2
f0   201,277,440   2,167,788   N/A         1     N/A
f1   737,280       701,311     356,164     95    48
f2   2,621,440     226,644     N/A         9     N/A
f3   402,456,600   9,412,758   4,698,276   2     1
f4   34,603,008    293,330     153,176     0.8   0.4
f5   27,149,472    1,559,560   797,279     6     3
f6   2,818,048     1,487,068   757,238     53    27
f7   34,603,008    287,868     153,291     0.8   0.4
f8   38,797,312    294,328     154,309     0.8   0.4

Rm1: Recursive / Uniform × 100 (%). Rm2: Sym. / Uniform × 100 (%).
6.2 Memory Sizes Needed for Numeric Function Generators
Table 2 compares the total memory sizes needed for the three NFGs shown in
Fig. 3 and Fig. 7. Note that the NFGs based on recursive segmentation have two
kinds of memories: coefficients memory and LUT memory. The memory size
shown is the sum of the coefficients memory size and the LUT memory sizes.
Table 2 shows that, for all functions, NFGs based on recursive segmentation
require a smaller memory size than NFGs based on uniform segmentation, even
though NFGs based on recursive segmentation have a segment index encoder.
For example, for $f_4(X,Y) = XY/\sqrt{X^2 + Y^2}$, the 12-bit accuracy NFG using
recursive segmentation requires only 0.8% of the memory required by uniform
segmentation. Especially for symmetric functions, using the symmetric technique
shown in Section 5 reduces the memory size significantly.
To understand the relation between memory size and accuracy, we designed
NFGs for $XY/\sqrt{X^2 + Y^2}$ with various accuracies. Figure 8 plots the memory
sizes of the NFGs for 4- to 16-bit accuracies. There are four curves:
(1) a single look-up table in which the values assigned to X and Y form an
address and the content of that address is f(X,Y),
(2) NFGs using uniform segmentation,
(3) NFGs using recursive non-uniform segmentation, and
(4) NFGs using the symmetric technique.
Interestingly, for this function, the memory size of the NFGs using uniform
segmentation increases in the same way as the memory size of a single look-up
table. On the other hand, the memory sizes of the NFGs using recursive
segmentation and the NFGs using the symmetric technique increase much more slowly
than the other two. For 16-bit accuracy, the memory sizes of the NFG using
recursive segmentation and the NFG using the symmetric technique are only 0.06%
and 0.03% of that of the NFG using uniform segmentation, respectively.
Fig. 8 Memory size versus accuracy for $XY/\sqrt{X^2 + Y^2}$.
6.3 FPGA Implementation Results
To show the merits and demerits of the three proposed methods, we compare
the performance of the NFGs designed by the three methods. We implemented
12-bit accuracy NFGs using the Altera Stratix III FPGA. Since the FPGA has
adaptive look-up tables (ALUTs) that can realize fast adders, synchronous memory
blocks, and dedicated multipliers, our NFGs are efficiently implemented by
those hardware resources in the FPGA. Table 3 compares the FPGA implementation
results of the NFGs. In this table, the three columns labeled "Delay" show
the total delay time of each NFG from the input to the output, in nanoseconds.
Table 3 FPGA implementation of 12-bit accuracy NFGs.
FPGA device: Altera Stratix III (EP3SL340F1517C2). Logic synthesis tool: Altera QuartusII 9.0.

     |          Uniform segmentation           |         Recursive segmentation          |            Symmetric method
No.  |  #ALUTs   #DSPs   Freq. #stages   Delay |  #ALUTs   #DSPs   Freq. #stages   Delay |  #ALUTs   #DSPs   Freq. #stages   Delay
     |                   [MHz]            [ns] |                   [MHz]            [ns] |                   [MHz]            [ns]
f0   |       –       0       –       1       – |     440       6     230      15      65 |     N/A     N/A     N/A     N/A     N/A
f1   |      49       5     203       4      20 |     271      10     191       9      47 |     297       9     191      10      52
f2   |     206       0     306       4      13 |     266       8     187      10      53 |     N/A     N/A     N/A     N/A     N/A
f3   |       –       0       –       1       – |       –       7       –      18       – |     644       7     174      16      92
f4   |       –       0       –       4       – |     220       6     228      10      44 |     273       6     222      11      50
f5   |       –       3       –       4       – |     477      10     221      13      59 |     493      10     221      13      59
f6   |     153       4     230       4      17 |     336       8     192      11      57 |     293       8     191      11      57
f7   |       –       1       –       4       – |     237       6     231      10      43 |     279       6     231      11      48
f8   |       –       1       –       4       – |     236       8     199      10      50 |     255       8     199      10      50

–: NFGs cannot be mapped into the FPGA due to insufficient memory blocks.
#ALUTs: Number of ALUTs. #DSPs: Number of 18-bit × 18-bit DSP units.
Freq.: Operating frequency. #stages: Number of pipeline stages.

The NFGs based on uniform segmentation require fewer pipeline stages and
have shorter delay than those based on recursive segmentation because they have
no segment index encoder. However, for six functions, the memory needed for
NFGs based on uniform segmentation is so large that they could not be
implemented on the FPGA. Note that NFGs that have only one pipeline stage in
Table 3 are realized with a single look-up table due to the excessively many
segments. On the other hand, for all functions except for
$f_3(X,Y) = 1/\sqrt{X^2+Y^2}$, the NFGs based on recursive segmentation do not
require excessive memory size and can be implemented on the FPGA. Further, the
successful implementations achieve a high operating frequency. Since the
symmetric technique significantly reduces memory size, even function $f_3$ can
be implemented on the FPGA. However, the symmetric technique has some speed
penalty because it produces a slightly more complex segment index encoder.
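The relation between the columns of Table 3 can be checked with a short calculation. The sketch below assumes that the reported delay is the latency of a fully pipelined circuit, that is, the number of pipeline stages times the clock period. This relation is inferred from the table rather than stated in the text, and the function name pipeline_delay_ns is hypothetical.

```python
def pipeline_delay_ns(stages: int, freq_mhz: float) -> float:
    """Latency in ns, assuming delay = stages x clock period (an assumption)."""
    return stages * 1000.0 / freq_mhz

# (pipeline stages, frequency in MHz, delay in ns) copied from Table 3
rows = {
    "f1 uniform":   (4, 203, 20),
    "f0 recursive": (15, 230, 65),
    "f3 symmetric": (16, 174, 92),
    "f4 recursive": (10, 228, 44),
}

# Difference between the computed latency and the reported delay, per row
deviations = {
    name: abs(pipeline_delay_ns(stages, freq) - reported)
    for name, (stages, freq, reported) in rows.items()
}

for name, (stages, freq, reported) in rows.items():
    computed = pipeline_delay_ns(stages, freq)
    print(f"{name}: computed {computed:.1f} ns, reported {reported} ns")
```

For these rows the computed latency agrees with the reported delay to within 1 ns. Small differences are expected because the frequencies in the table are rounded.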
In this way, the three methods have different merits and demerits, and thus
the appropriate method can be chosen according to the application and the
numeric function.
Although the recursive segmentation and symmetric technique have some speed
penalty as shown in Table 3, the penalty is reasonable. To show that, we compare
our NFGs with an NFG designed in a conventional (trivial) manner that uses a
combination of one-variable NFGs and basic operations like addition and
multiplication. We implemented $f_4(X,Y) = XY/\sqrt{X^2+Y^2}$ on the same FPGA
using a one-variable NFG for $1/\sqrt{X}$, two squaring circuits, an adder, and
two multipliers. The one-variable NFG was realized by the method shown in 15),
which is based on linear approximation and non-uniform segmentation. Table 4
compares the results with our NFGs.

Table 4 FPGA implementation of various NFGs for $XY/\sqrt{X^2+Y^2}$.
FPGA device: Altera Stratix III (EP3SL340F1517C2)
Logic synthesis tool: Altera QuartusII 9.0

NFGs            Memory [bits]   #LEs   #DSPs   Freq. [MHz]   #stages   Delay [nsec.]
12-bit accuracy
One-variable          269,136    266      10           192        14              73
Uniform            34,603,008      –       0             –         4               –
Recursive             293,330    220       6           228        10              44
Symmetric             153,176    273       6           222        11              50
Our NFG based on recursive segmentation requires fewer ALUTs and DSPs
than the implementation using a one-variable NFG, and has a much shorter delay.
In particular, the NFG designed by the symmetric method achieves both a smaller
memory size and a shorter delay than the one-variable implementation. This
shows that the speed penalties caused by the recursive segmentation and the
symmetric method are acceptably small.
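The comparison can be made concrete by expressing the Table 4 entries relative to the one-variable implementation. The following is a minimal sketch that uses only the memory and delay values from Table 4. The dictionary layout and the helper ratio_percent are hypothetical names introduced for illustration.

```python
# Memory size [bits] and delay [ns] copied from Table 4
table4 = {
    "one_variable": {"memory_bits": 269_136, "delay_ns": 73},
    "recursive":    {"memory_bits": 293_330, "delay_ns": 44},
    "symmetric":    {"memory_bits": 153_176, "delay_ns": 50},
}

def ratio_percent(method: str, key: str, baseline: str = "one_variable") -> float:
    """Value of `key` for `method` as a percentage of the baseline value."""
    return 100.0 * table4[method][key] / table4[baseline][key]

for method in ("recursive", "symmetric"):
    mem = ratio_percent(method, "memory_bits")
    delay = ratio_percent(method, "delay_ns")
    print(f"{method}: memory {mem:.1f}%, delay {delay:.1f}% of one-variable NFG")
```

With these numbers, the symmetric method needs about 57% of the memory and about 68% of the delay of the one-variable design. The recursive method needs about 60% of the delay but slightly more memory (about 109%).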
From these results, we can see that by designing two-variable functions using
one-variable NFGs, the required memory size can be reduced signiﬁcantly. How-
ever, depending on the functions, it can produce a slow implementation because
of additional logic, such as multipliers. Also, complicated architectures using
one-variable NFGs make error analysis harder, and it is harder to guarantee out-
put accuracy. This increases design time. On the other hand, for a large class of
functions, we can automatically generate fast and compact NFGs whose output
accuracy is guaranteed.
7. Concluding Remarks
We have proposed programmable architectures and design methods for numeric
function generators of two-variable functions. To realize a two-variable function
in hardware, we partition the given domain of the function into segments, and
approximate the given function by a polynomial in each segment. In this paper,
we presented two planar segmentation algorithms that efficiently partition a
given domain of a two-variable function. We also presented a design method for
symmetric two-variable functions. To the best of our knowledge, these are the
ﬁrst systematic design methods based on piecewise polynomial approximation
for two-variable functions. Experimental results showed that for a complicated
function, our automatically generated NFG achieves higher performance than the
NFG that is manually designed in a conventional manner.
In the proposed architectures, the coefficients memory and the LUT memories
of the segment index encoder can be implemented by embedded RAMs in an
FPGA (e.g., M4Ks in Altera FPGAs). Thus, by changing the data for the
coefficients memory and the LUT memories, a wide class of two-variable functions
can be realized by a single architecture. Because only the RAM data change,
functions can be switched without reprogramming the FPGA.
The algorithms and architectures presented in this paper can be easily extended
to functions with three or more variables.
Acknowledgments This research is partly supported by the Grant in Aid
for Scientiﬁc Research of the Japan Society for the Promotion of Science (JSPS),
Knowledge Cluster Initiative (the second stage) of MEXT (Ministry of Education,
Culture, Sports, Science and Technology), a contract with the National Security
Agency, and the MEXT Grant-in-Aid for Young Scientists (B), 20700051, 2009.
References
1) Anton, H.: Multivariable Calculus, John Wiley & Sons, Inc. (1995).
2) Bryant, R.E.: Graph-based algorithms for boolean function manipulation, IEEE
Trans. Comput., Vol.C-35, No.8, pp.677–691 (1986).
3) Clarke, E.M., McMillan, K.L., Zhao, X., Fujita, M. and Yang, J.: Spectral trans-
forms for large Boolean functions with applications to technology mapping, Proc.
30th ACM/IEEE Design Automation Conference, pp.54–60 (June 1993).
4) De Caro, D. and Strollo, A.G.M.: High-performance direct digital frequency syn-
thesizers using piecewise-polynomial approximation, IEEE Trans. Circuit and Sys-
tems, Vol.52, No.2, pp.324–337 (2005).
5) Detrey, J. and de Dinechin, F.: Table-based polynomials for fast hardware function
evaluation, 16th IEEE Inter. Conf. Application-Specific Systems, Architectures, and
Processors (ASAP’05 ), pp.328–333 (2005).
6) Gutierrez, R. and Valls, J.: Implementation on FPGA of a LUT based atan(y/x)
operator suitable for synchronization algorithms, Proc. IEEE Conf. Field Pro-
grammable Logic and Applications, pp.472–475 (Aug. 2007).
7) Huang, Z. and Ercegovac, M.D.: FPGA implementation of pipelined on-line scheme
for 3-D vector normalization, Proc. 9th Annual IEEE Symp. Field-Programmable
Custom Computing Machines (FCCM’01 ), pp.61–70 (Apr. 2001).
8) Lai, Y-T. and Sastry, S.: Edge-valued binary decision diagrams for multi-level
hierarchical verification, Proc. 29th ACM/IEEE Design Automation Conference,
pp.608–613 (1992).
9) Lai, Y-T., Pedram, M. and Vrudhula, S.B.: EVBDD-based algorithms for linear
integer programming, spectral transformation and functional decomposition, IEEE
Trans. Comput.-Aided Des. Integr. Circuits Syst., Vol.13, No.8, pp.959–975 (1994).
10) Lee, D.-U., Luk, W., Villasenor, J. and Cheung, P.Y.K.: Hierarchical segmentation
schemes for function evaluation, Proc. IEEE Conf. Field-Programmable Technology,
Tokyo, Japan, pp.92–99 (Dec. 2003).
11) Meinel, C. and Theobald, T.: Algorithms and Data Structures in VLSI Design:
OBDD — Foundations and Applications, Springer (1998).
12) Muller, J.-M.: Elementary Functions: Algorithms and Implementation, Birkhäuser
Boston, Inc., New York, NY, second edition (2006).
13) Nagayama, S. and Sasao, T.: On the optimization of heterogeneous MDDs, IEEE
Trans. CAD, Vol.24, No.11, pp.1645–1659 (2005).
14) Nagayama, S., Sasao, T. and Butler, J.T.: Compact numerical function genera-
tors based on quadratic approximation: Architecture and synthesis method, IEICE
Trans. Fundamentals, Vol.E89-A, No.12, pp.3510–3518 (2006).
15) Nagayama, S., Sasao, T. and Butler, J.T.: Design method for numerical function
generators using recursive segmentation and EVBDDs, IEICE Trans. Fundamen-
tals, Vol.E90-A, No.12, pp.2752–2761 (2007).
16) Nagayama, S., Butler, J.T. and Sasao, T.: Programmable numerical function gen-
erators for two-variable functions, EUROMICRO Conference on Digital System
Design (DSD-2008 ), pp.891–898 (Sep. 2008).
17) Nagayama, S., Sasao, T. and Butler, J.T.: Numerical function generators using
bilinear interpolation, Proc. IEEE International Conference on Field Programmable
Logic and Applications, pp.463–466 (Sep. 2008).
18) Piñeiro, J.-A., Oberman, S.F., Muller, J.-M. and Bruguera, J.D.: High-speed
function approximation using a minimax quadratic interpolator, IEEE Trans. Comp.,
Vol.54, No.3, pp.304–318 (2005).
19) Sasao, T., Nagayama, S. and Butler, J.T.: Numerical function generators using
LUT cascades, IEEE Trans. Comput., Vol.56, No.6, pp.826–838 (2007).
20) Schulte, M.J. and Stine, J.E.: Approximating elementary functions with symmetric
bipartite tables, IEEE Trans. Comput., Vol.48, No.8, pp.842–847 (1999).
21) Späth, H.: Two Dimensional Spline Interpolation Algorithms, A K Peters, Ltd.,
Wellesley, MA (1995).
22) Takagi, N. and Kuwahara, S.: A VLSI algorithm for computing the Euclidean
norm of a 3D vector, IEEE Trans. Comput., Vol.49, No.10, pp.1074–1082 (2000).
Appendix
This appendix shows the proofs of Theorem 1, Lemma 1, and Theorem 2.
The proof of Theorem 1 is based on a theorem proven in Ref. 15). Specifically,
it was shown that
Theorem A Let $g(Z)$ be a $k$-valued monotone increasing function. The
function $g(Z)$ can be realized by the segment index encoder shown in Fig. 4 (b)
with at most $\lceil \log_2 k \rceil$ rails and $\lceil \log_2 k \rceil$ Arails 15).
Theorem 1 Let $seg\_func(X,Y)$ be a segment index function obtained by a
recursive planar segmentation. The segment index function can be realized by the
segment index encoder shown in Fig. 4 (b) with at most $\lceil \log_2 k \rceil$
rails and at most $\lceil \log_2 k \rceil$ Arails, where $k$ is the number of
segments.
Proof: By forming a variable
$Z = (x_{l-1}\, y_{l-1}\, x_{l-2}\, y_{l-2} \ldots x_{-m}\, y_{-m})$
from $X$ and $Y$, $seg\_func(X,Y)$ obtained by the recursive planar segmentation
algorithm can be converted into a $k$-valued monotone increasing function $g(Z)$.
Therefore, from Theorem A, we have this theorem.
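The construction of Z can be illustrated in software. The sketch below is an illustration only: it treats X and Y as unsigned n-bit integers (the fixed-point scaling is omitted), and the names interleave and block_z_values are hypothetical. It checks that an aligned square block of the (X, Y) plane maps to one contiguous range of Z values. This is one way to see why numbering the segments of a recursive segmentation in Z order can give a monotone increasing function g(Z).

```python
def interleave(x: int, y: int, nbits: int) -> int:
    """Form Z = (x_{n-1} y_{n-1} ... x_0 y_0) from two n-bit integers."""
    z = 0
    for i in reversed(range(nbits)):
        z = (z << 2) | (((x >> i) & 1) << 1) | ((y >> i) & 1)
    return z

def block_z_values(bx: int, by: int, side: int, nbits: int) -> list:
    """Sorted Z values of all points in the square [bx, bx+side) x [by, by+side)."""
    return sorted(
        interleave(x, y, nbits)
        for x in range(bx, bx + side)
        for y in range(by, by + side)
    )

# A 4 x 4 block aligned at (4, 8) inside a 16 x 16 domain (4-bit X and Y)
zs = block_z_values(4, 8, side=4, nbits=4)
start = interleave(4, 8, nbits=4)
is_contiguous = zs == list(range(start, start + 16))
print(start, is_contiguous)
```

In this example the upper bit pairs of Z are fixed by the block position and the lower bit pairs enumerate every point inside the block, so the block occupies the Z values from 96 through 111.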
Lemma 1 Let $f(X,Y)$ be a symmetric function, and let $g_1(X,Y)$ and
$g_2(X,Y)$ be bilinear interpolations of $f(X,Y)$ for symmetric segments. Then,
$g_1(X,Y) = g_2(Y,X)$.
Proof: Let $g_1(X,Y) = C_{xy1}XY + C_{x1}X + C_{y1}Y + C_{01}$ and
$g_2(X,Y) = C_{xy2}XY + C_{x2}X + C_{y2}Y + C_{02}$. From the definition of
bilinear interpolation, in symmetric segments, the following hold:
$C_{xy1} = C_{xy2}$, $C_{x1} = C_{y2}$, $C_{y1} = C_{x2}$,
and $C_{01} = C_{02}$. Therefore, we have the lemma.
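The coefficient relations in the proof can be checked numerically. The following sketch assumes that the bilinear interpolant passes through the function values at the four corners of a segment. This is a common definition and may differ in detail from the one used in the body of the paper. The helper names bilinear_coeffs and g are hypothetical, and the test function is $XY/\sqrt{X^2+Y^2}$.

```python
import math

def f(x: float, y: float) -> float:
    """Symmetric test function XY / sqrt(X^2 + Y^2)."""
    return x * y / math.sqrt(x * x + y * y)

def bilinear_coeffs(bx, ex, by, ey):
    """Return (Cxy, Cx, Cy, C0) of the interpolant through the four corners
    of the segment [bx, ex) x [by, ey) (corner-based definition assumed)."""
    f00, f10, f01, f11 = f(bx, by), f(ex, by), f(bx, ey), f(ex, ey)
    w, h = ex - bx, ey - by
    b = (f10 - f00) / w                     # slope along X at Y = by
    c = (f01 - f00) / h                     # slope along Y at X = bx
    d = (f11 - f10 - f01 + f00) / (w * h)   # cross term
    # Expand f00 + b(X-bx) + c(Y-by) + d(X-bx)(Y-by) into powers of X and Y
    return (d, b - d * by, c - d * bx, f00 - b * bx - c * by + d * bx * by)

def g(coeffs, x, y):
    cxy, cx, cy, c0 = coeffs
    return cxy * x * y + cx * x + cy * y + c0

# A segment and its mirror image about the line X = Y
seg1 = bilinear_coeffs(0.25, 0.5, 0.5, 0.75)
seg2 = bilinear_coeffs(0.5, 0.75, 0.25, 0.5)

def close(a, b):
    return math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)

cxy1, cx1, cy1, c01 = seg1
cxy2, cx2, cy2, c02 = seg2
relations_hold = (close(cxy1, cxy2) and close(cx1, cy2)
                  and close(cy1, cx2) and close(c01, c02))
print(relations_hold, close(g(seg1, 0.3, 0.6), g(seg2, 0.6, 0.3)))
```

Because the X and Y coefficients are simply exchanged between symmetric segments, one set of coefficients can serve both segments if the inputs are swapped.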
To prove Theorem 2, we define the following:
Definition A A diagonal segment $\{[B_x, E_x), [B_y, E_y)\}$ is a segment such
that $B_x = B_y$ and $E_x = E_y$.
Theorem 2 The segmentation of a symmetric function produced by the
recursive planar segmentation algorithm is symmetric.
Proof: Let $\{[B_{x1}, E_{x1}), [B_{y1}, E_{y1})\}$ and
$\{[B_{x2}, E_{x2}), [B_{y2}, E_{y2})\}$ be symmetric segments. Since
$(B_{x1} + E_{x1})/2 = (B_{y2} + E_{y2})/2$ and
$(B_{y1} + E_{y1})/2 = (B_{x2} + E_{x2})/2$, the segmentation of the symmetric
segments into four equal-sized square segments is symmetric. The segmentation
of a diagonal segment into four equal-sized square segments is also symmetric.
From Lemma 1, the maximum approximation errors caused in symmetric segments
are equal. Thus, if a segment is partitioned, then the segment symmetric to it
is also partitioned.
Therefore, the recursive planar segmentation algorithm produces a symmetric
segmentation for a symmetric function.
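Theorem 2 can be observed on a small example. The sketch below is not the algorithm of this paper. It is a simplified quadtree segmentation that splits a square segment into four equal squares whenever the error of a corner-based bilinear interpolant, measured on a small sample grid, exceeds a tolerance. The test function, the sampling-based error estimate, the tolerance, and all names are assumptions made for illustration. A rational symmetric function and exact Fraction arithmetic are used so that errors in mirrored segments compare exactly.

```python
from fractions import Fraction as F

def f(x, y):
    # Hypothetical symmetric test function; rational, so arithmetic is exact
    return x * y / (1 + x * x + y * y)

def max_error(bx, by, size, samples=4):
    """Largest |f - bilinear interpolant| on a (samples+1)^2 grid of the
    square segment with lower corner (bx, by) and side `size`."""
    f00, f10 = f(bx, by), f(bx + size, by)
    f01, f11 = f(bx, by + size), f(bx + size, by + size)
    worst = F(0)
    for i in range(samples + 1):
        for j in range(samples + 1):
            u, v = F(i, samples), F(j, samples)
            interp = (f00 * (1 - u) * (1 - v) + f10 * u * (1 - v)
                      + f01 * (1 - u) * v + f11 * u * v)
            exact = f(bx + u * size, by + v * size)
            worst = max(worst, abs(exact - interp))
    return worst

def segment(bx, by, size, tol, out):
    """Keep the segment if its error is within tol; otherwise split it into
    four equal-sized squares and recurse."""
    if max_error(bx, by, size) <= tol:
        out.append((bx, by, size))
        return
    half = size / 2
    for dx in (0, half):
        for dy in (0, half):
            segment(bx + dx, by + dy, half, tol, out)

segs = []
segment(F(0), F(0), F(1), F(1, 1000), segs)
seg_set = set(segs)

# Symmetric segmentation: the mirror image of every segment is also a segment
is_symmetric = all((by, bx, s) in seg_set for (bx, by, s) in seg_set)
print(len(segs), is_symmetric)
```

In this sketch a segment and its mirror image see the same sampled error, so they are either both kept or both split, which mirrors the argument in the proof.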
(Received May 7, 2009)
(Revised September 1, 2009)
(Accepted October 31, 2009)
(Released February 15, 2010)
(Recommended by Associate Editor: Yusuke Matsunaga)
Shinobu Nagayama received the B.S. and M.E. degrees from
Meiji University, Kanagawa, Japan, in 2000 and 2002, respectively,
and the Ph.D. degree in computer science from Kyushu Institute
of Technology, Iizuka, Japan, in 2004. He is now a lecturer at Hiro-
shima City University, Hiroshima, Japan. He received the Out-
standing Contribution Paper Award from the IEEE Computer So-
ciety Technical Committee on Multiple-Valued Logic (MVL-TC)
in 2005 for a paper presented at the International Symposium on Multiple-Valued
Logic in 2004, and the Excellent Paper Award from the Information Processing
Society of Japan (IPS) in 2006. His research interests include numeric function
generators, decision diagrams, software synthesis, and embedded systems.
Tsutomu Sasao received the B.E., M.E., and Ph.D. degrees in
electronics engineering from Osaka University, Osaka, Japan, in
1972, 1974, and 1977, respectively. He has held faculty/research
positions at Osaka University, Japan, the IBM T.J. Watson Re-
search Center, Yorktown Heights, New York, and the Naval Post-
graduate School, Monterey, California. He is now a Professor of
the Department of Computer Science and Electronics at Kyushu
Institute of Technology, Iizuka, Japan. His research areas include logic design and
switching theory, representations of logic functions, and multiple-valued logic. He
has published more than nine books on logic design, including Logic Synthesis
and Optimization, Representation of Discrete Functions, Switching Theory for
Logic Synthesis, and Logic Synthesis and Veriﬁcation, Kluwer Academic Pub-
lishers, 1993, 1996, 1999, and 2001, respectively. He has served as Program
Chairman for the IEEE International Symposium on Multiple-Valued Logic (IS-
MVL) many times. Also, he was the Symposium Chairman of the 28th ISMVL
held in Fukuoka, Japan, in 1998. He received the NIWA Memorial Award in
1979, Distinctive Contribution Awards from the IEEE Computer Society MVL-
TC for papers presented at ISMVLs in 1986, 1996, 2003 and 2004, and Takeda
Techno-Entrepreneurship Award in 2001. He has served as an Associate Editor
of the IEEE Transactions on Computers. He is a fellow of the IEEE.
Jon T. Butler received the B.E.E. and M.Engr. degrees from
Rensselaer Polytechnic Institute, Troy, New York, in 1966 and
1967, respectively. He received the Ph.D. degree from The Ohio
State University, Columbus, in 1973. Since 1987, he has been
a professor at the Naval Postgraduate School, Monterey, Cali-
fornia. From 1974 to 1987, he was at Northwestern University,
Evanston, Illinois. During that time, he served two periods of
leave at the Naval Postgraduate School, ﬁrst as a National Research Council
Senior Postdoctoral Associate (1980–1981) and second as the NAVALEX Chair
Professor (1985–1987). He served one period of leave as a foreign visiting pro-
fessor at Kyushu Institute of Technology, Iizuka, Japan. His research interests
include logic optimization and multiple-valued logic. He has served on the ed-
itorial boards of the IEEE Transactions on Computers, Computer, and IEEE
Computer Society Press. He has served as the editor-in-chief of Computer and
IEEE Computer Society Press. He received the Award of Excellence, the Out-
standing Contributed Paper Award, and a Distinctive Contributed Paper Award
for papers presented at the International Symposium on Multiple-Valued Logic.
He received the Distinguished Service Award, two Meritorious Awards, and nine
Certiﬁcates of Appreciation for service to the IEEE Computer Society. He is a
fellow of the IEEE.
