Abstract-Design reuse requires engineers to determine whether or not an existing block implements desired functionality. If a common high-level circuit model is used to represent components that are described at multiple levels of abstraction, comparisons between circuit specifications and a library of potential implementations can be performed accurately and quickly. A mechanism is presented for compactly specifying circuit functionality as polynomials at the word level. Polynomials can be used to represent circuits that are described at the bit level or arithmetically. Furthermore, in representing components as polynomials, differences in precision between potential implementations can be detected and quantified. We present a mechanism for constructing polynomial models for combinational and sequential circuits. Furthermore, we derive a means of approximating the functionality of nonpolynomial functions and determining a bound on the error of this approximation. These methods have been implemented in the POLYSYS synthesis tool and used to synthesize a JPEG encode block and infinite impulse response filter from a library of complex elements.
I. INTRODUCTION
THE increased complexity of integrated circuits has forced designers to reuse existing circuitry when constructing new systems. The proliferation of reusable blocks has promised opportunities to complete new designs more quickly and with fewer errors. Reuse of existing components requires those components to have suitable characteristics, including area, power consumption, performance, and testing features. However, it is most important that the component implement the functionality required by the system. Searching the space of existing implementations for functional validity is time consuming and fraught with pitfalls, as the suitability of existing blocks is determined by manual methods or verbal descriptions. This search promises to become more complex as the number of, and need for, reusable designs increase [1]. The models and methods presented in this paper enable automation of this search by generating circuit representations that are at a higher level of abstraction than those used in traditional library binding applications.
Component matching is the problem of allocating complex blocks given a system specification. This problem reduces to determining whether or not the functionality of a library element is the same as the functionality of part of a specification. For example, in designing the baseline JPEG encode block of Fig. 1, subblocks are required to perform a discrete cosine transform (DCT), quantization, dc (zero frequency) encoding, and ac (nonzero frequency) encoding. Given a library of existing blocks, a word level representation can be derived from the Boolean equations that describe the functionality of library elements. The Boolean equations that specify an existing block can be derived in a straightforward manner from commonly available component netlists. The JPEG system can then be synthesized by matching the arithmetic specification of each of these functions to the word-level representation of each library element. Component matching is closely related to verifying that a specification and an implementation match exactly, but presents important differences. In matching a component to a specification, it is valuable to detect components that implement functionality that is similar to, but not necessarily the same as, that of the specification. For example, in performing DCT operations, a specification may require computation of the . One possible implementation may be a function that does not implement exactly, but instead implements an approximation of the function to preserve area and increase performance. Furthermore, a specification may indicate that computation of can take up to three cycles; however, existing implementations may exist that require only two cycles. Thus, the specification and implementation are similar, but do not match exactly, allowing for tradeoffs in execution time, area, power consumption, precision, and other qualities.
The examples discussed above can be specified very efficiently with polynomial models. For example, can be approximated by This article presents methods for developing analogous word-level polynomial models for existing implementations given a bit-level description of the implementation. These methods are ideally suited for circuits that implement arithmetic functions and can be applied to combinational and sequential circuits.
1063-8210/01$10.00 © 2001 IEEE
This comparison often must be performed between arithmetic and bit-level abstractions of the functionality. Polynomial methods provide a means for generating word-level polynomial representations, given bit-level descriptions of an implementation. In generating a mathematical structure common to both levels of abstraction, allocation of complex components can be performed, closing the semantic gap between specifications such as those generated in MATLAB and implementations, such as those modeled with Boolean logic or hardware design languages (HDLs). This technique is used by the POLYSYS synthesis tool to map arithmetic specifications onto existing designs that are described by Boolean equations.
The techniques presented here are most effective for allocating blocks that are arithmetic intensive, but may contain significant control logic. Common application domains that fit this description include computer graphics and digital signal processing. To illustrate the application of the polynomial methods developed in this article, we map a JPEG encode specification to complex elements and compare the specification of a filter suitable for controlling the velocity of a tape through a tape drive to an existing filter. The arithmetic specifications for the JPEG encode block and the IIR filter are derived from MATLAB, while the existing implementations are described by Boolean equations.
II. RELATED WORK
Reusable blocks have traditionally been characterized by verbal or object-oriented descriptions [2] , [3] such as "ethernet core" or "rasterizer," combined with component-specific attributes, such as "floating point" or "integer," and waveforms. Precise descriptions of functionality are usually restricted to smaller blocks such as combinational logic gates or simple arithmetic operations (e.g., addition or multiplication). For example, in allocating a JPEG block, current techniques may require that the specification and implementation both be described by the keyword "JPEG." This description is imprecise, however, as potential JPEG implementations may implement different compression schemes, different levels of accuracy, or operations on data sets of different sizes.
Component matching has historically been restricted to matching bit-level circuit specifications to logic gates. Many structures, such as binary decision diagrams (BDDs) [4] , are ideal for mapping combinational logic onto a library of gates. The canonicity and ease of composition that BDDs provide make them ideal for matching small combinational circuits. However, for more complex functions, like multiplication, the potentially exponential size of BDD structures makes comparison of BDDs time consuming and memory intensive. When comparisons are sought between functions that are not described at the bit level, BDD structures are not sufficient to represent circuit functionality. Furthermore, BDDs can yield information on whether or not a specification and implementation match exactly, but offer no path for quantifying the degree to which the two differ. That is, two functions that have similar, but not equal, BDD structures may implement drastically different arithmetic functions, while two very different BDDs may implement the same mathematical operation with different degrees of precision.
Binary moment diagrams (BMDs) [5] have been developed to ease the memory and time required to manipulate complex structures by generating word-level representations. BMDs have been used to verify the functionality of linear circuits [6] and could be adapted to perform component matching for those circuits. However, BMDs are unsuitable for use with nonlinear functions because of the resulting exponential complexity. Hybrid decision diagrams [7] and multiterminal BDDs [8] suffer from similar restrictions. Power hybrid decision diagrams (PHDDs), developed in [9], are well suited to handling the nonlinearities associated with floating-point arithmetic, but can still require, in the worst case, an exponentially large data structure to represent nonlinear functions.
In order to raise the complexity of blocks for which a functional characterization can be generated, algorithms have been developed to reduce the size of circuit representations. This can be achieved by generating data structures that represent an approximation of circuit functionality. For example, in [10] , a compact circuit approximation is derived that minimizes the number of input assignments for which the approximation and the actual circuit differ. In contrast, our work generates a compact circuit approximation that minimizes the numerical distance between the functionality of the representation and the actual block. Similarly, the allocation mechanism presented in this paper determines the accuracy of a match by the numerical distance between a specification and a possible implementation.
Minato introduced a method for modeling and manipulating circuits that implement polynomial functions using zero-suppressed BDDs [11] . This structure provides an efficient representation for those circuits for which a polynomial description is specified, but becomes exponentially large if discontinuities exist in the function. The methods we will present here develop a mechanism for deriving the polynomial representation given a Boolean circuit description. In addition, we will present a mechanism for detecting circuit discontinuities and generating compact approximations for highly discontinuous circuits.
Efficient component matching requires data structures that are canonical, constructible in polynomial time, and allow for simple composition. This paper will demonstrate methods for determining polynomial representations for circuits that are described at the bit level. Furthermore, we will prove that a unique minimum-order polynomial representation exists for all circuitry without feedback. In representing hardware as polynomials, blocks can be efficiently compared with one another to determine if they implement the same functionality. In addition, polynomials are easily composable, allowing efficient determination of the functionality of hierarchical or partitioned blocks.
III. POLYNOMIAL REPRESENTATIONS
To map an arithmetic specification to a complex element that is described at the bit level by Boolean logic, a word-level polynomial that encapsulates the element's functionality is derived.
We consider only completely specified Boolean functions for the sake of simplicity. Generating a word-level polynomial representation for a Boolean function may appear to be an inconsistent problem because Boolean functions are inherently discontinuous. However, a Boolean function $y = F(x)$, where $x$ and $y$ are bit vectors of length $n$ and $m$, respectively, can be treated as a set of coordinates $(X, Y)$, where
$$X = \mathrm{Encode}(x), \quad x = \mathrm{Decode}(X), \quad Y = \mathrm{Encode}(y), \quad y = \mathrm{Decode}(Y).$$
Thus, "encode" is an integer interpretation of a Boolean vector, such as two's complement or sign magnitude and "decode" is the inverse transformation. For the sake of simplicity, this paper will focus on those components based on two's complement arithmetic. The following encoding examples will be referred to in succeeding sections:
A minimum-order polynomial can be determined that fits the set of coordinates $(X, Y)$. If the order of this polynomial is known to be $k$, then $k+1$ coordinates can be extracted from the function and a set of $k+1$ equations in $k+1$ variables (the coefficients of the polynomial) can be constructed and solved. Thus, the problem of generating a word-level polynomial representation for a Boolean function reduces to determining the order of the polynomial.
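As a minimal sketch of this fitting step, assuming a two's complement encoding and an order that is already known, the Python below treats a bit-level function as a set of integer coordinates and solves the resulting linear system exactly. The names `encode`, `decode`, `square_circuit`, and `fit_polynomial` are hypothetical illustrations and are not taken from POLYSYS.

```python
from fractions import Fraction

def encode(bits):
    """Two's complement integer value of a bit tuple (MSB first)."""
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value - (1 << len(bits)) if bits[0] else value

def decode(value, width):
    """Inverse of encode: integer -> two's complement bit tuple (MSB first)."""
    return tuple((value >> i) & 1 for i in reversed(range(width)))

def square_circuit(bits):
    """Hypothetical bit-level block (3-bit in, 8-bit out) standing in for a netlist."""
    return decode(encode(bits) ** 2, 8)

def fit_polynomial(points):
    """Solve the Vandermonde system for coefficients a_0..a_k (exact arithmetic)."""
    n = len(points)
    rows = [[Fraction(x) ** j for j in range(n)] + [Fraction(y)] for x, y in points]
    for col in range(n):
        # Distinct sample points guarantee that a nonzero pivot exists.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]

# Assumed order 2, so three coordinates are extracted from the circuit.
sample_inputs = [-1, 0, 1]
points = [(X, encode(square_circuit(decode(X, 3)))) for X in sample_inputs]
coefficients = fit_polynomial(points)  # a_0, a_1, a_2
```

Exact rational arithmetic is used here so that the recovered coefficients can be compared for equality, which is what a canonical representation needs.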
A. Existence and Uniqueness
The following theorem is the basis for determining the polynomial representation of circuits described at the bit level. This theorem, derived from the binomial distribution from traditional calculus, is proven for integers and used to prove the existence and uniqueness of polynomial representations of Boolean functions. ) that represents this circuit and would match a specification that requires the computation of the third power of .
B. Polynomial Computation
In the previous section, we have proven that any combinational circuit can be uniquely represented by a minimum-order polynomial. Once the order of this polynomial is determined, then the coefficients of the polynomial can be calculated by examining a finite number of circuit outputs. Thus, the problem of determining a canonical polynomial representation for a circuit can be reduced to finding the order of the polynomial that represents that circuit.
To begin deriving a method for determining the order of a Boolean function, recall from Theorem 3.2 that a polynomial representation , where , always exists for a Boolean function . Furthermore, from Theorem 3.1, we might deduce that the order of will be reduced by exactly one by computing . Therefore, the order of could be determined exactly by recursively performing until this difference is identically zero for all values of . In the algorithm discussed here, two's complement arithmetic is employed to compute this difference. The number of iterations required to set is the order of the unique, minimum-order polynomial that represents the circuit. In computing the order of a Boolean function, we assume that each output bit ( , , , ) of the function is represented as a BDD. While this does present an exponentially sized data structure for some functions, we will show a heuristic in Section IX that reduces this data structure to linear complexity with respect to the number of input bits. In Sections III-B1-B4, we derive in detail the steps required to compute and determine if . These sections provide the rationale for the order computation algorithm shown in Fig. 3.
To reduce the complexity of the negation, we transform the problem of recursively computing until to the problem of recursively computing until . This is the equivalent of computing in two's complement encoding. This computation reduces the order of by one on each iteration, but avoids the complexity introduced by incrementation. This is possible because, on successive computations of , the subtraction of one does not accumulate 1st iteration:
Thus, instead of computing to reduce the order of by one, we compute , which is a computationally simpler way to reduce the order of by one.
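The two's complement fact behind this transformation can be checked exhaustively for a small word length. The sketch below assumes that the elided expressions are the true difference and the sum with a bitwise-complemented operand; the function names are hypothetical.

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1

def difference(a, b):
    """True two's complement difference a - b, truncated to WIDTH bits."""
    return (a - b) & MASK

def complemented_sum(a, b):
    """a plus the bitwise complement of b: no increment, hence no extra carry chain."""
    return (a + (~b & MASK)) & MASK

words = range(1 << WIDTH)

# The complemented sum is always exactly one less than the true difference.
identity_holds = all(
    complemented_sum(a, b) == (difference(a, b) - 1) & MASK
    for a in words for b in words
)

# So "difference is zero" becomes "complemented sum is all ones" (i.e., -1).
zero_iff_all_ones = all(
    (difference(a, b) == 0) == (complemented_sum(a, b) == MASK)
    for a in words for b in words
)
```

Under this reading, the termination test changes from checking for zero to checking for the all-ones word, which is consistent with the tautology check described later in this section.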
3) Performing : Once and have been determined, the two functions are summed to produce the new reduced order . If this summation is performed in ripple carry fashion, the number of logic operations required is exponentially complex with respect to word length, due to the propagation of the carry. This is a result of the fact that for the th bit, the carry computation requires logic operations (note that complexity can be reduced by factoring the equation for ripple carry addition). To eliminate the additional complexity associated with ripple carry addition, a carry-save addition can be performed. Let us define where and are applied bitwise. Thus, is uniquely specified as Note that there are now two terms that must be complemented when recursively computing . These terms are and . Complementing both terms requires, according to two's complement arithmetic, a bit-wise inversion and an increment of each term. As in Section III-B2, in order to avoid these increments and their associated carry operations, order reduction can be performed by recursively computing until . The condition for terminating recursion has changed to because the equivalent computation in two's complement arithmetic is Since and are specified as the summation of a sum and carry term, their summation can be performed in two steps, as if two carry-save additions ( Fig. 2) were executed.
With these transformations, the order of is successively being reduced by one by recursively computing . This computation is of polynomial complexity with respect to the size of the BDD representation of .
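Carry-save addition itself is standard and can be illustrated on plain integers rather than BDDs. The sketch below is an assumption-laden illustration (hypothetical names, fixed 3-bit words) of how two operands that are each held as a sum/carry pair can be combined in two carry-save stages without propagating a carry.

```python
WIDTH = 3
MASK = (1 << WIDTH) - 1

def carry_save_add(a, b, c):
    """One carry-save stage: three words in, (sum, carry) out, all bitwise."""
    s = (a ^ b ^ c) & MASK
    carry = (((a & b) | (a & c) | (b & c)) << 1) & MASK
    return s, carry

def add_pairs(sum_a, carry_a, sum_b, carry_b):
    """Add two (sum, carry) pairs with two carry-save stages."""
    s1, c1 = carry_save_add(sum_a, carry_a, sum_b)
    return carry_save_add(s1, c1, carry_b)

words = range(1 << WIDTH)

# The resulting pair represents the same value (mod 2**WIDTH) as the four inputs.
pairs_are_correct = all(
    sum(add_pairs(sa, ca, sb, cb)) & MASK == (sa + ca + sb + cb) & MASK
    for sa in words for ca in words for sb in words for cb in words
)
```

Each output bit depends only on three input bits at the same or the next lower position, which is why the cost no longer grows with the length of a carry chain.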
4) Checking if : Using a two's complement encoding, the following transformations can be used to determine if the recursively computed , without performing a ripple carry addition. To avoid performing the ripple carry addition, a two-stage carry-save increment is performed at the end of each recursive step by performing the following logic operations ( ). Each bit of the resulting sum ( ) is checked for tautology and each bit of the resulting carry ( ) is checked to determine whether it is tautologically zero. We refer to this test as the tautology check and it is necessary and sufficient to guarantee as proven in Theorem 3.4. As a result, the ripple carry computation does not need to be performed.
After one recursion of order reduction with respect to an bit vector , the bounding function would be . After two iterations, the bounding function would be . If the input is out of range when incremented, i.e., , then the resulting is immaterial, since the input pattern cannot be applied. Thus, requires that if is not a tautology, the bounding function must be true. Similarly, if is not tautologically zero, the bounding function must be true if .
6) The Complete Algorithm: The complete algorithm for computing the order of a Boolean function , given its BDD representation, is shown in Fig. 4.
Step 1) Initialize the function to and the function to , an operation of linear complexity with respect to the size of the BDD representation of .
Step 2) Compute by complementing and , an operation of constant complexity with respect to BDD size.
Step 3) Compute the function by replacing with in the functions and , an operation of quadratic complexity with respect to BDD size.
Step 4) Reduce the order of by exactly one by computing the sum . This computation is performed by adding the results of Steps 2) and 3) with a two-stage carry-save addition, producing a new and . This step is of quadratic complexity with respect to BDD size.
Step 5) Compute the bounding function that restricts the domain over which the sum is evaluated, an operation that is of constant complexity relative to BDD size.
Step 6) Check the sum to see if each output bit is a tautology within the bounds specified by , an operation of constant complexity with respect to BDD size.
Step 7) If the tautology check is unsuccessful, set and to the result of Step 5) and initiate a new recursion, an operation of linear complexity with respect to BDD size. Otherwise, the order of the minimum-order polynomial representation is one less than the number of recursive computations that were performed.
The following steps are followed to determine the order of these input vectors.
1) :
2) :
3) (1st iteration)
4) Tautology Check fails.
5) (2nd iteration)
6) Tautology Check fails.
7) (3rd iteration)
8) Tautology Check succeeds.
Three iterations reduce to zero for all . Thus, is of order 2.
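The iteration count in this example can be reproduced at the integer level. The following sketch is not the BDD-based algorithm of the paper: it works on an explicit table of integer outputs and uses ordinary subtraction, so it only illustrates the recursion and the relation between iterations and order. The function name and the example circuit are hypothetical.

```python
def order_by_differences(values):
    """Return (order, iterations) for outputs sampled at consecutive integers.

    Each iteration replaces f(x) by f(x+1) - f(x); zip() drops the last
    point, which mirrors restricting the check to inputs whose increment
    is still in range. The order is one less than the number of iterations
    needed to reach the identically zero function.
    """
    current = list(values)
    iterations = 0
    while any(current):
        current = [b - a for a, b in zip(current, current[1:])]
        iterations += 1
    return iterations - 1, iterations

# Hypothetical 4-bit two's complement block computing x*x + 3.
outputs = [x * x + 3 for x in range(-8, 8)]
order, iterations = order_by_differences(outputs)
```

With this convention a nonzero constant has order zero, and the identically zero function is reported as order -1.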
Each step within the order computation algorithm is of polynomial complexity with respect to the number of nodes in the BDD representation of . However, the minimum-order polynomial representation may be of exponential order with respect to the number of bits in the input word . Thus, the number of recursions that are performed may be exponential. Sections IV and VII detail partitioning and approximation algorithms for efficiently generating polynomial representations for those circuits whose representations would otherwise be of exponential order.
Once the order of the function has been determined to be , is evaluated at , Decode . Solving the following set of linear equations for yields the polynomial representation of the Boolean function Encode Encode Encode Decode
C. Extension to Multivariable Functions
The techniques described above consider only univariable functions. However, multivariable polynomials exhibit the same features that allow order computation to be performed recursively; that is, recursively reduces the order of with respect to by one on each iteration if is held constant. Thus, the order of can be determined with respect to and with respect . However, the unique, minimum-order polynomial computation requires solving a set of simultaneous linear equations, where is the order with respect to and is the order with respect to .
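A small integer-level sketch of the multivariable case follows. It computes the order with respect to each variable while the other is held constant, and then counts the unknown coefficients. The count used below, the product of (order + 1) over the two variables, is an assumption made for illustration because the corresponding expression is not legible in the text above; the function names and the example are hypothetical.

```python
def order_1d(values):
    """Order of the minimum-order polynomial through consecutive samples."""
    current, iterations = list(values), 0
    while any(current):
        current = [b - a for a, b in zip(current, current[1:])]
        iterations += 1
    return iterations - 1

def order_wrt_first(f, xs, ys):
    """Largest order in x observed over every fixed value of y."""
    return max(order_1d([f(x, y) for x in xs]) for y in ys)

def order_wrt_second(f, xs, ys):
    """Largest order in y observed over every fixed value of x."""
    return max(order_1d([f(x, y) for y in ys]) for x in xs)

def example(x, y):
    """Hypothetical two-input block."""
    return x * x * y + y

domain = range(-4, 4)
order_x = order_wrt_first(example, domain, domain)
order_y = order_wrt_second(example, domain, domain)

# Assumed count: one unknown per monomial x**i * y**j with i <= order_x, j <= order_y.
unknowns = (order_x + 1) * (order_y + 1)
```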
IV. REPRESENTATION OF FUNCTIONS CONTAINING BRANCHES
To this point, the methods we have described allow computation of a polynomial representation for combinational circuits. As proven in Theorem 3.2, polynomial representations exist for all combinational circuits. For those circuits that implement arithmetic functions such as those generated by composing addition and multiplication operations, this representation is of very low order (e.g., one term to represent multiplication, two terms to represent addition). Consider, however, models of combinational circuits that contain branches, i.e., discontinuities.
For such circuits, polynomial representations, if computed using only the techniques described above, are usually of exponential order with respect to input word size. This is because a branch in the Boolean domain usually describes a set of coordinates in the integer domain that can only be fit to an exponentially-large polynomial. However, a high-order polynomial representation is an indicator that a branch exists within a circuit. This indicator can be used to partition circuit inputs into domains in which polynomial representations of low complexity exist. The boundaries of these domains are termed discontinuities. The encoder is performing an operation within each branch that is represented by polynomials of order zero. However, using the order computation methods described above, the discontinuities at the integer values cause the overall circuit to have a polynomial representation of order . To prevent an exponential number of order computation recursions from being performed on functions that contain branches, we use a heuristic based on a discontinuity threshold. Once the number of iterations has reached this threshold, the function is assumed to contain branches. The threshold is determined heuristically and enables efficient detection of discontinuities. Discontinuity detection, in turn, allows order computation to be performed on each branch of the circuit model.
Given a function , with order greater than the discontinuity threshold, discontinuities can be detected by performing order computation on for the case and the case . If the orders for each computation are different, and below the discontinuity threshold, a discontinuity has been detected and exists between and . If the order of , for or , is still above the threshold, then a discontinuity exists within the corresponding domain. Within that domain, an order computation is then performed on for the case and the case . Domain partitioning continues until the discontinuity is detected.
Similar to performing a binary search, detection of a single discontinuity is of linear complexity with respect to the number of input bits, not considering the complexity of the order computation. If we proceed blindly, computing the order of will generate an order of because of the discontinuity at . However, if we start with an initial discontinuity threshold of four, then after four order iterations, the uppermost bit of will be set to zero, then one, and the order computations will be performed for each case. The order computation for will result in an order of two. The order computation for will again reach the fourth iteration without passing the tautology check. The second most significant bit is set to zero, then one, and the order computation is performed again. Then order computation for will result in an order of 3 and the computation for will result in an order of two. Since both computations converged, but converged to different values, there is a discontinuity on the interval boundary. Thus, over the integer interval [0, 11] an order of two is determined and over the integer interval [12, 15] an order of three is determined.
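The partitioning procedure can be sketched on an explicit integer table. The code below is a simplified illustration, not the BDD-based method: the threshold is interpreted as the largest order accepted for a single domain (an assumption), halving an interval stands in for fixing the next most significant input bit, and adjacent domains are merged when their union still has an acceptable order. All names and the example function are hypothetical.

```python
def bounded_order(values, max_order):
    """Order of the samples, or None if it exceeds max_order."""
    current = list(values)
    for differences_taken in range(max_order + 2):
        if not any(current):
            return differences_taken - 1
        current = [b - a for a, b in zip(current, current[1:])]
    return None

def split_domains(f, lo, hi, max_order):
    """Halve [lo, hi] until every piece has an order within the threshold."""
    order = bounded_order([f(x) for x in range(lo, hi + 1)], max_order)
    if order is not None:
        return [(lo, hi, order)]
    mid = (lo + hi) // 2
    return (split_domains(f, lo, mid, max_order)
            + split_domains(f, mid + 1, hi, max_order))

def merge_domains(f, domains, max_order):
    """Join neighbors whose union is still described by one low-order polynomial."""
    merged = [domains[0]]
    for lo, hi, order in domains[1:]:
        prev_lo = merged[-1][0]
        joint = bounded_order([f(x) for x in range(prev_lo, hi + 1)], max_order)
        if joint is None:
            merged.append((lo, hi, order))
        else:
            merged[-1] = (prev_lo, hi, joint)
    return merged

def branching(x):
    """Hypothetical 4-bit function with a branch at x = 12."""
    return x * x if x < 12 else x ** 3

pieces = split_domains(branching, 0, 15, max_order=3)
domains = merge_domains(branching, pieces, max_order=3)
```

For this hypothetical function the result reproduces the kind of outcome described above: one domain of order two below the branch and one of order three above it.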
Every discontinuity detected introduces a new polynomial into the description of a component. If the number of discontinuities is large, the polynomial representation of a component will also become large. Such cases can be handled by implementing a heuristic based on a domain threshold. If the number of discontinuities is greater than this threshold, then the functionality of the component may be approximated by the polynomial representation. The approximation technique is described in Section VII.
V. SYNCHRONOUS ACYCLIC CIRCUITS
From Theorem 3.2, we have established that a polynomial representation, , exists for all combinational circuits. This is due to the fact that combinational circuits specify a finite number of input/output pairs ( ) with corresponding integer values ( ) that can be treated as coordinates to which a polynomial can be fit. Synchronous circuits pose an additional problem because circuit outputs are not only a function of the current inputs but also previous inputs. Thus, the polynomial representation of a synchronous circuit contains terms that are dependent on previous input values:
. The symbol indicates the value of that is delayed by cycles.
A. Determining Combinational Equivalents
A polynomial representation for synchronous acyclic circuits can be computed by computing the polynomial representation for the equivalent combinational circuit with delayed input values. Consider a synchronous circuit represented by a synchronous logic network, i.e., a directed acyclic graph whose vertices represent combinational logic functions, whose edges represent function dependencies, and whose edge weights represent synchronous delays introduced by registers. The sequential depth of the network, , is the weight of the longest path. A synchronous logic network can be transformed, as shown in Fig. 5 , into a combinational function of delayed input variables with delay less than or equal to .
Given a synchronous network with depth , the equivalent combinational function is . Note that is finite due to the restriction that the circuit does not have feedback. A polynomial representation for can now be determined from . The order of is determined with respect to each for as independent variables and the coefficients of the polynomial representation are determined. In the example of Fig. 5 , this would result in the polynomial representation .
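The unrolling idea can be illustrated with a toy simulation. The circuit below is hypothetical (it is not the circuit of Fig. 5): it has one register, so its sequential depth is one, and its combinational equivalent is obtained by feeding the delayed input followed by the current input and reading the last output. The order is then computed with respect to each delayed input as an independent variable.

```python
def simulate(inputs):
    """Hypothetical synchronous acyclic circuit with a single register."""
    register = 0
    outputs = []
    for x in inputs:
        outputs.append(3 * x + register * register)
        register = x  # the register holds the input delayed by one cycle
    return outputs

def combinational_equivalent(x_now, x_delayed):
    """Output as a function of the current input and the one-cycle-delayed input."""
    return simulate([x_delayed, x_now])[-1]

def order_1d(values):
    """Order of the minimum-order polynomial through consecutive samples."""
    current, iterations = list(values), 0
    while any(current):
        current = [b - a for a, b in zip(current, current[1:])]
        iterations += 1
    return iterations - 1

domain = range(-4, 4)
order_now = max(order_1d([combinational_equivalent(a, b) for a in domain])
                for b in domain)
order_delayed = max(order_1d([combinational_equivalent(a, b) for b in domain])
                    for a in domain)

# After the first cycle, the unrolled function reproduces the simulated trace.
trace_inputs = [2, -1, 4, 0, 3]
trace = simulate(trace_inputs)
matches_trace = all(
    trace[n] == combinational_equivalent(trace_inputs[n], trace_inputs[n - 1])
    for n in range(1, len(trace_inputs))
)
```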
VI. SYNCHRONOUS CYCLIC CIRCUITS
The method for determining polynomial representations for sequential acyclic circuits relied on the acyclic nature of the circuit to guarantee that a finite number of time-shifted inputs were required. However, by breaking the feedback path of a cyclic circuit , the previous techniques can be used to derive the order of the cyclic circuit. This is achieved by introducing an input , and determining the order of with respect to and .
A synchronous cyclic circuit can be modeled as a Mealy/Moore finite state machine (FSM) that may or may not have an initial state. For example, a rasterizer is a synchronous cyclic circuit with an initial state and an infinite impulse response filter is a synchronous cyclic circuit with no initial state. For the sake of this analysis, we consider three different topologies of synchronous cyclic circuits: 1) an FSM with no initial state; 2) an FSM with an initial state that does not reach a steady state; and 3) an FSM with an initial state that reaches a steady state after a finite number of cycles. As shown in Fig. 6 , we can represent each of these topologies as a function that may have up to three branches: a branch corresponding to an initialization state , a branch corresponding to the transient states , and a branch corresponding to a steady state (labeled ). The techniques described in Section IV enable automatic detection of each of these branches. However, this is beyond the scope of this article. The succeeding discussion assumes that the presence of each of these branches has been detected and the polynomial representation has been determined.
Using the techniques described previously, we can compute a polynomial representation for each branch. An initialization branch has a polynomial representation that contains no terms with the variable . A steady-state branch has the polynomial representation
. If the function contains no initialization branch or no steady state branch [topology 1) or 2)], then no polynomial representation exists. However, the circuit is uniquely represented by the polynomial . In the case of topology 1), is simply . In the case of topology 2), is comprised of two domains (corresponding to and in Fig. 6 ), and is within the first domain and within the second domain. Example 6.1 illustrates computation of a polynomial representation for FSM with topology 2).
Example 6.1: Consider the finite state machine with a one bit input (initialize) and a three bit output initialize enableA enableB enableC that provides round-robin access to memory for three clients. Breaking the feedback loops yields the function initialize . Performing order computation results in the detection of four branches, each of which is order zero (i.e., constant). For example, in the branch that is executed under the condition initialize , the output initialize enableA enableB enableC . Thus, the polynomial representation for this branch is . Coefficient computation for each branch yields the following order zero polynomial representations for as shown in Table I . An initialization branch exists, but no steady-state branch exists, thus uniquely represents the finite state machine (although other finite state machines exist that perform the same operation with different state encodings).
The remainder of this analysis focuses on circuits for which is not a unique representation, i.e., those circuits that contain both an initialization state and steady state [topology 3)].
A. Order Computation With Feedback
Assume function implements three branches, one initialization branch , one steady state branch, and one transient feedback branch . We assume that a signal controls the number of iterations through the transient feedback path. We can then evaluate the circuit based on the number of iterations of the transient feedback branch.
The order of with respect to , referred to as , can be determined using the techniques presented in earlier sections. As a result, the polynomial representation of this branch, , can be determined. Furthermore, if is treated as an input to , then the order of with respect to , referred to as , and with respect to , referred to as , can also be determined. As a result, the polynomial representation of this branch can be determined. After initialization, the order of is and after the first iteration of the nonsteady-state feedback branch, the order of is less than and greater than . In general, if the order of is after iterations, then the order of , after one more iteration of the nonsteady-state feedback branch is less than and greater than . Thus, the upper bound on the order of after iterations is:
To determine the order of there are three cases that follow, which need to be considered: 1) is known; 2) is not known, , and there is no term in ; 3) is not known, and [ or there is an term in ].
In case 1), the order of can be bounded according to the equation above. In case 2), since there is no term in , the order of does not increase on successive iterations and is simply the greater of and . For both of these cases, since the order of is bounded, a polynomial representation exists for . If the upper bound on the order is , this representation can be determined by extracting points from the circuit to create the system of linear equations that determines the polynomial coefficients. In case 3), the order of is dependent on and is therefore unbounded and has no polynomial representation. However, like the cyclic circuits with no initialization or steady-state branch, the polynomial representation uniquely specifies the functionality of the circuit, and can be used to perform matching as shown in Section VIII-C.
order of with respect to is the greater of and , both of which are one. Since the feedback polynomial is of order one with respect to and contains no term, case 2) is also satisfied for this polynomial and the order of with respect to is the greater of and , which are one and zero respectively. Thus, is of order one with respect to both inputs (i.e., and ), requiring points to be extracted from the circuit. The points can be extracted, yielding the following system of equations:
The solution to the system of equations yields the polynomial representation .
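Coefficient computation from extracted points can be sketched as follows. This is a minimal illustration, assuming the circuit is available as a callable and that its order with respect to each of two inputs is already known; the names solve_linear and fit_polynomial, the choice of a grid of sample points, and the sample circuit are assumptions, not part of POLYSYS.

```python
from fractions import Fraction
from itertools import product

def solve_linear(a, b):
    """Gauss-Jordan elimination over exact rationals (a must be nonsingular)."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n] for row in m]

def fit_polynomial(circuit, order_x, order_y):
    """Return {(i, j): coefficient of x**i * y**j} from sampled points."""
    exponents = list(product(range(order_x + 1), range(order_y + 1)))
    points = exponents  # sample on the grid {0..order_x} x {0..order_y}
    rows = [[x**i * y**j for (i, j) in exponents] for (x, y) in points]
    values = [circuit(x, y) for (x, y) in points]
    return dict(zip(exponents, solve_linear(rows, values)))

# Hypothetical circuit of order one in each input: four points are needed.
coefficients = fit_polynomial(lambda x, y: 2 * x * y + 3 * x + 1, 1, 1)
print(coefficients)
```

Exact rational arithmetic is used so that the recovered coefficients can be compared for equality without rounding concerns.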
VII. APPROXIMATIONS
Polynomial representations are an efficient way to encapsulate the functionality of arithmetic circuits. Furthermore, circuits that implement nonarithmetic operations can be modeled efficiently by determining subdomains over which the circuit implements functionality that has a low-order polynomial representation, as shown in Section IV. However, this representation becomes very complex when the number of subdomains is large. For example, circuits that approximate arithmetic functions frequently generate many subdomains.
Example 7.1: A circuit that implements , where is an bit word, requires subdomains (Fig. 7) in its polynomial representation. Rather than represent as a list of subdomains of and corresponding polynomials that describe exactly over those subdomains, it is much more efficient to represent as the polynomial and specify the maximum error between the continuous function and the exact polynomial representation .
Given a Boolean function , with corresponding integer values ( ), an approximate polynomial representation can be determined. The approximate polynomial representation is determined such that for all , where is a given accuracy. Approximation allows a low-order polynomial representation to be generated for a Boolean function that would otherwise have a polynomial representation of high order. Sections VII-A and VII-B derive in detail the approximate polynomial representation and the tolerance within which the approximation is accurate.
A. Computing Approximations
As proven in Theorem 3.1, the order of a function is reduced by one by computing the difference . The algorithms to this point have relied on the resulting fact that if the order of is , then recursively performing this difference times will reduce the function to zero. Now we relax the requirement that be exactly zero. If performing this difference times results in a function that is not zero, but is numerically close to zero, then the polynomial representation of can be approximated well by a polynomial of degree .
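The repeated-difference test can be sketched on a function given as a callable over the integers. This is a minimal illustration, assuming sampled evaluation rather than the BDD-based computation used in the paper; the names forward_difference and order_of and the tolerance parameter are assumptions.

```python
def forward_difference(f):
    """Return the function x -> f(x + 1) - f(x)."""
    return lambda x: f(x + 1) - f(x)

def order_of(f, samples, max_order=8, tolerance=0):
    """Smallest order k such that differencing f k + 1 times gives a result
    within `tolerance` of zero at every sample; None if no k <= max_order works."""
    g = forward_difference(f)
    for order in range(max_order + 1):
        if all(abs(g(x)) <= tolerance for x in samples):
            return order
        g = forward_difference(g)
    return None

samples = range(16)
print(order_of(lambda x: 3 * x * x + 2, samples))                 # exact order two
print(order_of(lambda x: 2 * x + (x & 1), samples))               # never exactly zero
print(order_of(lambda x: 2 * x + (x & 1), samples, tolerance=2))  # relaxed test
```

With tolerance set to zero the sketch performs the exact test; a nonzero tolerance corresponds to relaxing the requirement that the final difference be exactly zero.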
To translate this to approximating a Boolean function with a polynomial, again consider the function . If the most significant bits of are one, then for the two's complement integer encoding of , Encode , the inequality holds. Similarly, if the most significant bits of Decode (performed using two's complement arithmetic) are one, then the inequality holds. As a result, if is defined to be and is defined to be Decode , then the following statement holds: if the upper bits of the bitwise OR of and are one, then . The bound on allows us to derive an approximation of Fig. 7) . However, the first difference iteration reveals that the upper seven bits of are one, yielding the bound . Therefore, can be approximated by the first-order polynomial Encode .
B. Computing Approximation Error for the Linear Approximation
In this section, we will compute a bound on the accuracy of a linear approximation to the polynomial representation . The maximum value of the computation , where bits of are zero, yields the maximum error contributed by bit of the input. Thus, the sum of the maximum values of each of the above equations provides the maximum error contributed by all bits, which is a bound on the error of the approximation. Thus, a bound on the accuracy of the linear approximation is The error contributed by when is Encode . This is always negative because the most significant bit of is one when : Encode units. The error contributed by , when , is Encode . This is always positive because the most significant bit of is zero when : Encode units. Similarly, other differences contribute only positive error. Other differences contribute a total of 0.5 units of positive error, resulting in the error bound:
. Thus, the circuit implements the polynomial within 0.5 units. This approximate representation is far less complex than the 64 polynomials that would be required to represent the circuit exactly.
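The kind of bound derived above can be cross-checked by exhaustive evaluation when the input word is narrow. The sketch below is an illustration only: the right-shift circuit, the 7-bit input range, and the name max_abs_error are assumptions and do not reproduce the circuit of the example.

```python
from fractions import Fraction

def max_abs_error(circuit, approximation, domain):
    """Largest |circuit(x) - approximation(x)| over the domain, computed exactly."""
    return max(abs(Fraction(circuit(x)) - approximation(x)) for x in domain)

# Hypothetical circuit: a one-bit right shift of a 7-bit word, approximated
# by the first-order polynomial x / 2.
error = max_abs_error(lambda x: x >> 1, lambda x: Fraction(x, 2), range(128))
print(error)  # 1/2
```

Exhaustive evaluation scales exponentially with word length, so it serves only as a check on small cases, not as a substitute for the bit-level bound.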
C. Nonlinear Approximations
A function may implement a nonlinear operation [e.g., ] that is well approximated by a nonlinear polynomial representation [e.g., ]. In this case, the first iteration of may not satisfy the condition . If a suitable bound is found for the th iteration of , termed , instead of the first iteration, then a nonlinear approximation for can be computed, using , from Newton's forward difference interpolating formula
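Newton's forward difference interpolating formula in its standard form, p(x) = sum over k of C(x, k) times the k-th forward difference of f at zero, can be sketched as follows. The function names and the choice of zero as the base point are assumptions; the notation used in the paper for the bounded difference is not reproduced here.

```python
from fractions import Fraction

def forward_differences_at_zero(f, order):
    """Return [f(0), (Delta f)(0), ..., (Delta^order f)(0)]."""
    row = [Fraction(f(x)) for x in range(order + 1)]
    result = []
    while row:
        result.append(row[0])
        row = [b - a for a, b in zip(row, row[1:])]
    return result

def newton_forward(f, order):
    """Polynomial of the given order built from forward differences of f at 0."""
    deltas = forward_differences_at_zero(f, order)

    def polynomial(x):
        total = Fraction(0)
        binomial = Fraction(1)  # C(x, 0)
        for k, delta in enumerate(deltas):
            total += delta * binomial
            binomial = binomial * (x - k) / (k + 1)  # C(x, k) -> C(x, k + 1)
        return total

    return polynomial

f = lambda x: 3 * x * x + 2 * x + 1
p = newton_forward(f, 2)
print(all(p(x) == f(x) for x in range(-5, 20)))  # True
```

When all differences beyond the chosen order vanish, the truncated formula reproduces the function exactly; when they are only small, it yields an approximation whose quality depends on the bound found for the remaining difference.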
VIII. MATCHING
Consider a circuit specification that defines the functionality of a circuit. Given a library of existing components, where each component is described by a Boolean function , polynomial representations provide a means for quantifying the difference between the specification and a potential implementation . This can be achieved by computing the polynomial , where is the polynomial representation of within an accuracy of , and using traditional numerical methods to find the maximum value of . In quantifying the maximum error of an implementation and guaranteeing that is within a given tolerance, system traits such as performance, power and area can be optimized by selecting faster or smaller designs that implement less accurate arithmetic.
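A minimal sketch of this matching step, assuming the specification and each library component are already available as polynomial callables over a small integer domain. Exhaustive evaluation stands in for the numerical maximization mentioned above, and the library contents and the names max_error and match are hypothetical.

```python
from fractions import Fraction

def max_error(spec, impl, domain):
    """Largest |spec(x) - impl(x)| over the domain."""
    return max(abs(spec(x) - impl(x)) for x in domain)

def match(spec, library, domain, tolerance):
    """Return (name, error) pairs within tolerance, most accurate first."""
    errors = {name: max_error(spec, impl, domain) for name, impl in library.items()}
    within = [(name, err) for name, err in errors.items() if err <= tolerance]
    return sorted(within, key=lambda pair: pair[1])

spec = lambda x: 3 * x + 1
library = {
    "exact": lambda x: 3 * x + 1,
    "rounded": lambda x: 3 * x + Fraction(3, 2),  # constant offset of 1/2
    "wrong_gain": lambda x: 2 * x + 1,
}
print(match(spec, library, range(256), tolerance=1))
```

Ranking candidates by error makes the tradeoff explicit: a designer can accept the least accurate component that still meets the tolerance if it is smaller or faster.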
Example 8.1: Consider the specification for an 8-bit 3 × 3 sharpening filter used for processing grayscale images. Consider an implementation with the following approximate polynomial representation: For , grayscale units. This implementation yields a sharpening filter that produces an image of similar quality to that specified, but is likely smaller and faster than an exact implementation.
A. Transcendental Specifications
A means of approximating a specification for transcendental functions can be derived from the results of Taylor series approximation. Given a specification , with Taylor series , the difference between and is where . Thus, if the error in a Taylor series approximation to a function can be bounded, then the difference between an implementation that matches that approximation and the specification can be bounded.
Example 8.1.1: An implementation that is determined to be of order four and yields the polynomial representation matches the cosine function used in DCT with an error over the interval [0, 1].
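The Taylor-series argument can be checked numerically for the cosine function. The sketch below assumes the order-four Maclaurin polynomial 1 - x^2/2 + x^4/24 as the implementation's representation; that choice, and the names cos_taylor4 and remainder_bound, are assumptions, since the polynomial of the example is not reproduced here.

```python
import math

def cos_taylor4(x):
    """Order-four Maclaurin polynomial of cosine."""
    return 1 - x**2 / 2 + x**4 / 24

def remainder_bound(x_max):
    """Lagrange remainder bound on [0, x_max]. The x**5 coefficient of the
    cosine series is zero, so the first neglected term is x**6 / 6!, and
    every derivative of cosine is bounded by one in magnitude."""
    return x_max**6 / math.factorial(6)

xs = [i / 1000 for i in range(1001)]
measured = max(abs(math.cos(x) - cos_taylor4(x)) for x in xs)
bound = remainder_bound(1.0)
print(measured, bound)
```

The measured error over [0, 1] stays below the analytical bound of 1/720, so an implementation matching this polynomial also matches the cosine specification within that bound.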
B. Composition
The ease with which polynomials can be composed, using traditional algebraic manipulations, can allow seemingly inappropriate implementations to be combined to fulfill a specification. , and shifter exist in the implementation library, can be allocated and composed with the adder to approximate the The polynomial representation that results from this composition is and matches the specification derived in Example 8.2 within 1.3%.
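Composition of polynomial representations reduces to substituting one polynomial into another. The following is a minimal sketch for single-variable polynomials stored as coefficient lists (lowest order first); the helper names and the example components are hypothetical and do not correspond to the library elements discussed above.

```python
from fractions import Fraction

def poly_add(p, q):
    """Add two coefficient lists, padding the shorter one with zeros."""
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def poly_mul(p, q):
    """Multiply two coefficient lists by convolution."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def trim(p):
    """Drop trailing zero coefficients, keeping at least one entry."""
    while len(p) > 1 and p[-1] == 0:
        p = p[:-1]
    return p

def poly_compose(outer, inner):
    """Coefficients of outer(inner(x)), accumulated with Horner's rule."""
    result = [Fraction(0)]
    for c in reversed(outer):
        result = trim(poly_add(poly_mul(result, inner), [Fraction(c)]))
    return result

# Hypothetical components: a one-bit right shifter modeled as x / 2, feeding
# a block with representation 3*y**2 + 2*y + 1.
shifter = [0, Fraction(1, 2)]
block = [1, 2, 3]
print(poly_compose(block, shifter))  # coefficients of 1 + x + (3/4) x**2
```

The composed coefficient list can then be compared against a specification polynomial in the same way as a single library element.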
C. Cyclic Circuits
As discussed in Section VI, when the order of a circuit with feedback can be bounded, a polynomial representation for that circuit can be determined exactly and the matching techniques described above can be used. Given a specification with bounded order and a cyclic component with unbounded order, the inequality can be solved for [where is the order of the initialization branch , is the order of feedback branch with respect to , is the order of the feedback branch with respect to the feedback input, and is the number of times the feedback branch is executed]. and . An example of this is shown in Section X-B.
IX. COMPLEXITY ISSUES
The order computation techniques described above are of quadratic complexity with respect to the size of the BDD representation of and output word length. Solving the set of linear equations for polynomial coefficients is of cubic complexity with respect to the order of the polynomial and we assume this order is small (less than the discontinuity threshold). However, the underlying BDD data structure can be of exponential complexity for common functions. Thus, reducing the complexity of polynomial computation requires reducing the complexity of the order computation, which, in turn, requires reduction of the complexity of the BDD.
Assume a function has an BDD with intermediate nodes, where is an bit word. If is partitioned into two words ( ) and ( ), the BDDs that describe each partition will require no more than two sets of intermediate nodes. Similarly, partitioning into words will result in a worst-case total node count of . Minimizing with respect to yields Partitioning into words of length will minimize BDD complexity. This will result in overall BDD complexity of
. Furthermore, such a partitioning will guarantee that the order of the polynomial representation for a component is less 2 . For those circuits implementing functions of order greater than 2 , a polynomial representation will be determined through domain partitioning and approximation, as explained in Sections IV and VII. In practice, very few circuits implement functions of order greater than 2 .
X. APPLICATIONS
To illustrate the application of polynomial methods, two applications are synthesized. A JPEG Encode block is first synthesized to demonstrate order computation and discontinuity detection. An IIR filter is then mapped to an existing filter to demonstrate synthesis with synchronous library elements and approximation. 
A. JPEG Encode Application
Generating polynomial descriptions allows a specification and implementation to be compared by computing the numerical difference between the polynomials. Consider the dc path for the JPEG encode system described in Fig. 1 and specified in more detail in Fig. 8 . The inputs describe grayscale values for an pixel block and output dc represents the encoded dc value for that pixel block. Specifications for four system blocks are described: 1) DCT; 2) quantize; 3) coefficient coding; and 4) dc coding. Three library elements were generated by synthesizing the Verilog code shown in Fig. 9 . Polynomial representations were computed from the resulting netlists. The first component requires that an order computation be performed for each input. The order of element with respect to each input is determined to be one and, after coefficient computation, the polynomial representation is
The order of element block is similarly determined to be one with respect to and and the resulting polynomial representation is Order computation for element yields an order greater than the discontinuity threshold of four. As a result, the upper bits of the inputs to each block are successively set to zero and one, as described in Section IV, and the following partitions and corresponding polynomial representations are determined as shown in Table III .
Performing a numerical comparison between the specification for DCT and , the specification for quantization and and the specification for coding and reveals an exact match for each ( ). Thus, the specification can be implemented by composing the complex components that exist in the library.
B. IIR Filter Application
Many embedded applications require digital filters to control mechanical operations. Common examples include altitude control systems for satellites, yaw dampers in airplanes, and fuel injection controllers in automobiles. We will apply polynomial methods to determine an existing filter from a library of filters suitable for reuse in a tape drive controller (Fig. 10). The velocity of the tape with the tape drive is controlled by a voltage applied to the reel motor. This voltage is a function of past velocities and, therefore, past voltages, as well as the displacement required to position the tape properly.
Fig. 10. Digital filter used as a compensator for controlling the move of a tape through a tape drive.
Fig. 11. Circuit description for library element to be compared to tape controller specification.
An existing circuit implementation within the library of filters is shown in Fig. 11, with combinational blocks already described by polynomials. The challenge is to determine if the circuit can be allocated to implement the following specification, generated from MATLAB:
The first step in generating a polynomial representation for the circuit described in Fig. 11 is to break the feedback paths. This results in replacing in the list of equations and being added to the list of inputs. The next step in generating a polynomial representation requires generation of the equivalent combinational circuit. Progressing down the directed acyclic graph that represents , the first rooted subgraph represents the assignment . This subgraph is duplicated, generating an additional circuit input , and the original subgraph is removed. Subsequently, the rooted subgraph ending with is duplicated, generating an additional circuit input and the original subgraph corresponding to the assignment to is removed. Continuing this process, the equivalent combinational circuit is generated, resulting in a circuit with the following inputs:
, .
The nodes in the original graph that represented assignments to each of were removed as they have been replaced by . The complete set of resulting equations is At this point, the circuit description has no feedback paths and no registers.
Order computation with respect to each of results in an order of one for each input. However, the order of the circuit with respect to each of is very large, indicating that a representation of an approximation of this circuit will be more efficient. Computation of reveals that . A similar result is determined for . Thus, the term that each of contributes to the polynomial representation of the circuit can be represented by an approximation of order one, of the form Encode . Following the error quantification steps outlined in Section VII, the bound on the error contributed by approximating each term of the polynomial that contains one of is . After performing coefficient computation, the following polynomial representation for the circuit is determined After closing the loop by setting , the specification and implementation can be compared by comparing their representative polynomials. The coefficients of and do not match exactly due to the approximation of , but are the same within 10 . Thus, the existing component can be allocated to implement the specification if the circuit tolerance of 10 is acceptable.
XI. EXPERIMENTAL RESULTS
To quantify the performance of order computation, a combinational multiplier with input lengths ranging from 4 to 64 bits was constructed out of combinational 4-bit multipliers, and its polynomial representation was determined. Multiplier logic was synthesized from Verilog to construct the Boolean equations that implement the Synopsys DesignWare multiplier. These equations were then ported to the Cal-2.0 BDD package, which was used to perform BDD operations. Experiments were performed on a 200 MHz R4400 Indy Workstation with 256 MB of memory. The time required to determine the order of this circuit is shown in Fig. 12(a) and, for the 64-bit multiplier, the order was computed in under 80 s. Note that by using the complexity reduction methods from Section IX, order computation was performed on successive 4-bit chunks of each input word. This yielded a maximum BDD size of 61 nodes, which fit completely in the 16 KB cache.
As expected, execution time varied with the square of the size of the input word. This is due to the function being of order one with respect to each input and having two inputs. Note that a similar computation for a function with polynomial representation would have been of linear complexity with respect to the size of and a more complex function such as that with polynomial representation would have varied with the fourth power of the size of the input word.
To quantify the performance of polynomial methods for synchronous circuits, experiments were conducted to gauge the relationship between the execution time required to generate equivalent combinational circuits and the number of registers [Fig. 12(c)]. The circuits on which this was performed were 16-bit accumulators with between one and five register stages [i.e., ]. Execution time varied quadratically with the number of registers. Note that the register removal tool is written in Perl and the execution times in Fig. 12(c) can be reduced greatly using compiled code.
Further experiments were conducted to determine the execution time of circuit approximation relative to input bit width. Polynomial approximations were computed for the circuit that implements the function for input bit widths ranging from 4 to 128 bits [Fig. 12(d)]. While of high-order complexity, approximations completed quickly, even for the widest datapaths. The accuracy of circuit approximation was determined for several circuits of bit width 16 [Fig. 12(e)], all of which resulted in an error of less than two units over the integer range [0, ]. These experiments were performed with compiled code.
XII. SUMMARY
In performing high-level synthesis with complex components, automating component matching requires a means for quickly determining whether an existing block performs the function outlined in the specification. Current methods for completing this task become prohibitively memory intensive or time consuming for circuits that implement complex functions. We have demonstrated an algorithm for performing component matching with complex library elements by constructing word-level polynomial representations for combinational and sequential circuits.
Circuit specifications can be efficiently matched to existing implementations by generating the unique minimum-order polynomial functions for the specification and the implementation and comparing those polynomials. These functions can be generated with quadratic complexity with respect to the number of input bits to each function. Discontinuities in the specification or implementation can be detected, allowing polynomial representations to be computed for intervals between discontinuities. For sequential circuits, the equivalent combinational circuit can be derived, from which a polynomial representation can be computed. Furthermore, an approximate polynomial representation can be derived for those circuits that contain many discontinuities and the error of that approximation can be quantified. Applications of these techniques were demonstrated in mapping the specification of a JPEG Encode block and an IIR filter to existing complex blocks.
Using polynomial representations, differences between a specification and implementation can be quantified, allowing tradeoffs between precision and speed. In addition, the ease with which polynomials can be composed can allow such differences to be compensated for by combining multiple existing blocks or constructing logic around a single block.
The methods presented in this paper are well suited to matching blocks that have compact arithmetic representations, such as those found in DSP, computer graphics, and ALUs. Furthermore, these methods provide a means for separating control operations, such as branches, from arithmetic operations and for detecting blocks that contain many discontinuities, such as controllers, based on the order of the polynomial representation.
