Programmable Logic Array (PLA) adders are described which perform an addition in one cycle with a single pass through a PLA and require a reasonable number of product terms for an 8-, 16-, or
Introduction
Programmable Logic Arrays, PLAs [1, 2], have been successfully applied to the design of control logic and simple functions such as counters and small adders. Large adders have usually been implemented on standard PLAs iteratively, a few bits per cycle. With previous methodology, the implementation of a wide adder in one cycle with a single pass through a PLA has generally required too many product terms to be economical. The number of product terms in the AND array is a measure of one of the dimensions of a PLA and is directly related to the silicon area on a chip as well as to the signal delay through the PLA. This paper describes one-cycle adder designs for standard PLAs as well as for PLAs dedicated to adders. The standard PLA adder is an improved version of one described elsewhere by the author [3]. These designs reduce the number of product terms to acceptable levels even for 16- and 32-bit adders. Two features of standard PLA designs are particularly useful in reducing the number of logical product terms. These are two-bit input decoders and Exclusive-OR outputs. Adder equations are adapted to these features using carry-look-ahead methods, and a procedure is developed to optimize string sizes to further reduce the number of product terms.
The AND array, which contains the unique product terms, can be further reduced through the sharing of a row of the array by two product terms. Similarly, the OR array, which generates logical sums of product terms, can also be reduced through the sharing of a column of the array by two sums of products. The split rows and columns are particularly effective in dedicated PLA adders, although they may also be useful for other logical functions using PLAs.
ARNOLD WEINBERGER
Standard PLAs
A PLA consists basically of an AND array and an OR array in series, as shown in Fig. 1. The array names, AND-OR, describe the generic logic levels of the SEARCH-READ arrays of an associative table [4]. The two arrays may be implemented with types of logic other than AND-OR; a widely used logic, implemented with MOS technology, is NOR-NOR.
The generic AND (SEARCH) array produces an array of product terms of the inputs to the PLA. Each product term is the AND of functions of the individual inputs, A, B, C, ..., as in Eq. (1):
$$\text{Product term} = f_1(A) \cdot f_2(B) \cdots \tag{1}$$
Each input enters the AND in one of three states: true, complement, or don't care. The true and complement lines of each input intersect the AND array at two connections which are personalized for one of the three states. The personalization (illustrated for the input A) is of the form shown in Fig. 2(a) when the generic AND (SEARCH word) is implemented with a real AND, and is of the form shown in Fig. 2(b) when implemented with a NOR. It should be noted that at most one connection is made at the intersections of the AND array with the true and complement lines of an input. For the don't care state, no connections are made. To personalize the two connections it is sufficient to provide a single switching device to which either the true line, the complement line, or neither line is connected.
The generic OR (READ) array produces a generic OR of selected product terms on each array output. The array is personalized with a single bit at each intersection of a product term with an output line. A 1 selects the product term, a 0 does not. Each array output is the real OR of selected product terms if the array consists of real ORs, as in Fig. 3. If the array consists of NORs, each output is the NOR of selected product terms.
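The behavior of the two arrays can be modeled in a few lines. The following is a minimal sketch and is not taken from the paper: the function name `eval_pla`, the dictionary encoding of a personalized row, and the one-bit full-adder personalization are illustrative assumptions. Each AND-array row records, per input, whether the true line (1) or the complement line (0) is connected; an input that is absent from the row is a don't care.

```python
from itertools import product

def eval_pla(and_rows, or_cols, inputs):
    """Evaluate a generic AND-OR PLA.

    and_rows: one dict per product term, {input_name: 1 or 0};
              1 = true line connected, 0 = complement line connected,
              a missing input is a don't care (no connection).
    or_cols:  one list per output, holding the selected product-term indices.
    inputs:   {input_name: 0 or 1}.
    """
    terms = [all(inputs[name] == want for name, want in row.items())
             for row in and_rows]
    return [int(any(terms[i] for i in col)) for col in or_cols]

# Hypothetical personalization: a one-bit full adder on inputs A, B, C.
AND_ROWS = [
    {"A": 1, "B": 1},           # A.B
    {"A": 1, "C": 1},           # A.C
    {"B": 1, "C": 1},           # B.C
    {"A": 1, "B": 0, "C": 0},   # minterms with an odd number of ones
    {"A": 0, "B": 1, "C": 0},
    {"A": 0, "B": 0, "C": 1},
    {"A": 1, "B": 1, "C": 1},
]
OR_COLS = [[0, 1, 2], [3, 4, 5, 6]]   # outputs: carry, sum

for a, b, c in product((0, 1), repeat=3):
    carry, s = eval_pla(AND_ROWS, OR_COLS, {"A": a, "B": b, "C": c})
    print(a, b, c, "->", carry, s)
```

Even this single bit position needs seven product terms with one-bit decoders and plain OR outputs, which is the growth the rest of the paper sets out to contain.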
The personalized two-bit cell at the intersection of a product term with the one-bit decoder outputs corresponds to selecting the subset of minterms (maxterms) to comprise the desired function. Figures 4(a) and (b) illustrate the four possible functions of input A, of which three are used. Figure 4(a) shows the possible products of maxterms, each maxterm included or not according to the function to be personalized. A 1 is ORed with the maxterm if it is not included in the function, while a 0 is ORed if it is included. Similarly, Fig. 4(b) shows the possible sums of minterms, each minterm included according to the function to be personalized. A 1 is ANDed with the minterm if included, a 0 if not.
The number of product terms can be significantly reduced by substituting two-bit decoders for pairs of one-bit decoders [5]. The total number of decoder outputs remains the same. The product term now represents the AND of functions of pairs of inputs, as in Eq. (2):
$$\text{Product term} = f_1(A, B) \cdot f_2(C, D) \cdots \tag{2}$$
Two-input decoders have already been applied to a standard PLA [2] and will be shown to be particularly useful for adders.
Another economizing PLA feature is the use of XOR outputs [7], where pairs of OR array outputs are XORed to produce a single PLA output. Figure 6 shows the PLA expanded to include two-input decoders and XOR outputs.
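The saving from these two features can be seen on a single sum bit. The sketch below is an illustration under stated assumptions, not the paper's personalization: a decoder cell is modeled as the set of input pairs for which the cell is true, and the names `H`, `CARRY_IN`, `product_term`, and `sum_bit` are hypothetical.

```python
from itertools import product

# A cell of a two-input decoder is modeled as the set of (x, y) input pairs
# for which the personalized function is true (any of the 16 functions of x, y).
H = {(0, 1), (1, 0)}            # A xor B, selected in a single cell
CARRY_IN = {(1, 0), (1, 1)}     # decoder for (C, unused): true when C = 1

def product_term(cells, pairs):
    """AND of one function per two-input decoder; None marks a don't care."""
    return all(cell is None or pair in cell for cell, pair in zip(cells, pairs))

def sum_bit(a, b, c):
    pairs = [(a, b), (c, 0)]                        # second decoder has one input unused
    left = product_term([H, None], pairs)           # first OR-array output: one term
    right = product_term([None, CARRY_IN], pairs)   # second OR-array output: one term
    return int(left != right)                       # XOR of the two OR-array outputs

for a, b, c in product((0, 1), repeat=3):
    print(a, b, c, "->", sum_bit(a, b, c))
```

Two product terms suffice here, whereas the same sum bit expressed as a plain sum of products over one-bit decoders needs four.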
Adders
A typical adder adds two n-bit numbers, $A\,(A_0, \ldots, A_{n-1})$ and $B\,(B_0, \ldots, B_{n-1})$, together with an input carry $C_{in}$, to produce a sum $S\,(S_0, \ldots, S_{n-1})$ and an output carry $C_{out}$ ($C_0$). Using the single-bit-position functions,
$$G_i = A_i \cdot B_i, \qquad P_i = A_i + B_i, \qquad H_i = A_i \oplus B_i, \tag{3}$$
a carry $C_i$ from any bit position i can be expressed directly in terms of these functions and $C_{in}$, as in Eqs. (4) and (5), where $\Sigma$ and $\Pi$ are symbols for OR and AND, respectively, $H^{*}$ means either H or P may be used, $H^{**}$ means either H or $\bar{G}$ may be used, $G_n = C_{in} = C_n$, and $P_n = \ldots$
Also, a sum bit can be expressed as a function of the output carry from the preceding bit position and expanded into an XOR of two entities, one of which includes a distant carry, as in Eqs. (6) and (7), where $G_{i+1}^{j}$ = carry-generate condition for bit group i + 1 through j (high-to-low order, i ≤ …).
In a similar fashion, the output carry can be expressed as an XOR of two entities, one of which includes a distant carry, as shown in Eqs. (8) and (9):
$$C_{out} = G_0^{j} + H_0^{j} \cdot C_{j+1} = G_0^{j} \oplus H_0^{j} \cdot C_{j+1}$$
Equations (6) through (9) can also be expressed as functions of the distant carry of opposite polarity. The selected forms of the equations provide more opportunities for sharing product terms.
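The carry and sum relations in this section can be checked exhaustively on a small adder. The sketch below is not the paper's Eqs. (4) through (9), which are not legible in this copy; it uses the standard carry-look-ahead forms as an assumption, with bit 0 as the high-order position, the carry into the low-order position equal to the input carry, G = A AND B, P = A OR B, and H = A XOR B. All function names are hypothetical.

```python
from itertools import product

def ripple_carries(a, b, c_in):
    """Reference carries: c[i] is the carry out of bit i; c[n] is the input carry."""
    n = len(a)
    c = [0] * n + [c_in]
    for i in range(n - 1, -1, -1):          # bit n-1 is the low-order position
        c[i] = (a[i] & b[i]) | ((a[i] ^ b[i]) & c[i + 1])
    return c

def carry(a, b, c_in, i, propagate="H"):
    """C_i as a sum of products: OR over j >= i of G_j AND propagate(i..j-1)."""
    n = len(a)
    gen = [x & y for x, y in zip(a, b)] + [c_in]
    prop = [(x ^ y) if propagate == "H" else (x | y) for x, y in zip(a, b)]
    return int(any(gen[j] and all(prop[i:j]) for j in range(i, n + 1)))

def carry_bar(a, b, c_in, i, propagate="H"):
    """Complement carry: OR over j >= i of P-bar_j AND propagate(i..j-1)."""
    n = len(a)
    kill = [1 - (x | y) for x, y in zip(a, b)] + [1 - c_in]
    prop = [(x ^ y) if propagate == "H" else 1 - (x & y) for x, y in zip(a, b)]
    return int(any(kill[j] and all(prop[i:j]) for j in range(i, n + 1)))

def check(n=4):
    """Return True if every identity holds for all inputs of an n-bit adder."""
    for bits in product((0, 1), repeat=2 * n + 1):
        a, b, c_in = bits[:n], bits[n:2 * n], bits[-1]
        c = ripple_carries(a, b, c_in)
        total = int("".join(map(str, a)), 2) + int("".join(map(str, b)), 2) + c_in
        for i in range(n):
            g, h = a[i] & b[i], a[i] ^ b[i]
            ok = (carry(a, b, c_in, i, "H") == c[i]
                  and carry(a, b, c_in, i, "P") == c[i]          # H or P both work
                  and carry_bar(a, b, c_in, i, "H") == 1 - c[i]
                  and carry_bar(a, b, c_in, i, "G-bar") == 1 - c[i]  # H or G-bar both work
                  and (g | (h & c[i + 1])) == (g ^ (h & c[i + 1]))   # OR equals XOR here
                  and (h ^ c[i + 1]) == (total >> (n - 1 - i)) & 1)  # sum bit
            if not ok:
                return False
        if c[0] != total >> n:                                   # output carry
            return False
    return True

print(check(4))
```

The two `propagate` options correspond to the interchangeability that the single and double asterisks express, and the OR-equals-XOR line is what allows a carry to be fed to an XOR output instead of an OR.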
PLA adder designs
The adder equations can now be applied to the PLA of Fig. 6.
Addend and augend of the same bit position, $A_i$ and $B_i$, enter a common decoder, so that the intersection of an AND with the decoder outputs can produce one of the six useful adder functions of $A_i$ and $B_i$, i.e., $G_i$, $P_i$, $H_i$, or their complements. The input carry $C_{in}$ enters as the sole input to a decoder. (For uniformity, a two-input decoder is provided for $C_{in}$, with one input unused.)
A string of K contiguous sum bits is generated as a function of a common carry into the string, using Eqs. (6) and (7). Positive and negative strings of sum bits are shown in Eqs. (10) and (11), respectively.
Note that $\bar{H}_{i+1}^{j} = \bar{H}_{i+1} + \cdots + \bar{H}_{j}$ of the bracket to the right of the Exclusive-OR is actually implemented with product terms already present in the left brackets of the string of sums. The reader can verify that the different representations of Eq. (12) are equivalent.
The common carry shared by the sum bits of a string is expressed as a sum of product terms according to Eq. (4) or (5) and is generated in the AND array. Clearly, if the sum bits are grouped into few but large strings, few such common carries, and hence few product terms for these carries, would be needed. On the other hand, the number of product terms needed for a sum bit in a string increases with the distance of the sum bit from the common carry. Therefore, the total number of product terms needed for the adder is minimized by choosing an optimal grouping of sum bits into strings.
Three string types are identified: low-order, intermediate, and high-order.
A low-order string includes a product term representing the input carry $C_{in}$ or $\bar{C}_{in}$, the low-order sum bits implemented according to Eq. (10) or (11), and the product terms representing the output carry of the string according to Eq. … Therefore, it is advantageous to use the same polarity output carry from the string as the sum bits. Since the sum bits are a function of the opposite polarity input carry to the string, it is also advantageous to alternate polarities of strings. It should also be noted that when sharing product terms between $S_i$ and $C_i$ (or $\bar{S}_i$ and $\bar{C}_i$), the common factor $H_i$ must be used and $P_i$ (or $\bar{G}_i$) cannot be substituted for it, i.e., $H_i^{*}$ (or $H_i^{**}$) does not apply.
The number of unique product terms needed for a low-order string of K sum bits and its output carry is: 1 for the input carry, 1 + 2 + 4 + ... + 2(K - 1) for the sum bits (noting that some product terms are shared, e.g., $\bar{H}_j$), and 2 for the additional unique (non-shared) product terms contained in the output carry of the string. Equation (13) expresses $T_{low}$, the number of unique product terms of the low-order string:
$$T_{low} = \begin{cases} 3 & \text{for } K = 1, \\ K^2 - K + 4 & \text{for } K > 1. \end{cases} \tag{13}$$
For K = 1, the low-order sum is generated more efficiently according to Eq. (14) or (15), together with the opposite polarity output carry of this string, $\bar{C}_{n-1} = \bar{P}_{n-1} + H_{n-1} \cdot \bar{C}_{in}$ or $C_{n-1} = G_{n-1} + H_{n-1} \cdot C_{in}$, respectively. The two product terms of $S_{n-1}$ (or $\bar{S}_{n-1}$) and the additional unique product term for $\bar{C}_{n-1}$ (or $C_{n-1}$) add up to three unique product terms for a low-order string of one. If a low-order string of one is used, the next string is of the same polarity as the low-order sum in order to make use of the opposite polarity output carry of the low-order string.
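The count of three product terms for a low-order string of one can be made concrete. The sketch below assumes that the positive-sum form is the two-term expansion of H XOR C_in, namely H AND (NOT C_in) OR (NOT H) AND C_in; this is an assumption, since Eqs. (14) and (15) are not legible in this copy. The function name and the term labels `t1`, `t2`, `t3` are hypothetical.

```python
from itertools import product

def low_order_string_of_one(a, b, c_in):
    """Positive sum and complement carry of the low-order bit from three product terms."""
    h = a ^ b                      # strict propagate of the bit
    p_bar = 1 - (a | b)            # complement of the inclusive propagate
    t1 = h & (1 - c_in)            # shared between the sum and the complement carry
    t2 = (1 - h) & c_in            # second product term of the sum
    t3 = p_bar                     # the one additional term needed by the carry
    s = t1 | t2
    c_bar = t3 | t1
    return s, c_bar

for a, b, c_in in product((0, 1), repeat=3):
    print(a, b, c_in, "->", low_order_string_of_one(a, b, c_in))
```

The shared term `t1` is the reason the sum is paired with the carry of opposite polarity: the positive sum and the complement carry both contain H AND (NOT C_in).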
An intermediate string uses the product terms of the output carry of the preceding string to generate the sum bits according to Eq. (10) or (11). It also generates the output carry of the string according to Eq. (4) or (5), respectively.
The number of unique product terms for an intermediate string, $T_{int}$, of size K > 1 is one less than for a low-order string because the input carry to the intermediate string has already been counted as part of the preceding string. For K = 1 they are equal. However, the output carry of the string has additional product terms equal to L, the number of bit positions of lower order than the string.
A high-order string generates the high-order sum bits as for an intermediate string. However, the output carry of the string, $C_0$, is needed only as an output of the adder, $C_{out}$, so that it can be generated according to Eq. … The number of unique product terms for the high-order string, $T_{high}$, is L + 1 less than for an intermediate string, since the output carry is a function of the input carry to the string.
Table 2 Illustration of procedure for optimal string assignment. + above numbers marks strings to be increased by one.
Figure 7(a) illustrates an eight-bit adder that generates the outputs in a one-cycle pass through the PLA. The output sum bits are divided into three strings of 3, 3, and 2 bits, high-to-low order. The strings have been optimized to further reduce the total number of product terms to 25. An entry in the AND array is noted with a function of the decoder inputs, i.e., $G_i = A_i \cdot B_i$, etc. These functions can be readily converted to personalized four-bit cells by means of Fig. 5. Figure 7(b) expresses the eight-bit adder in equation form to correspond to the PLA format used.
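As a worked check, the total of 25 can be reproduced from the counting rules stated for the three string types. The arithmetic below is a reading of those rules, not a formula copied from the paper: a low-order string costs 1 + (1 + 2 + ... + 2(K - 1)) + 2 terms, an intermediate string costs one less plus L, and a high-order string costs L + 1 less than an intermediate string.

```latex
\begin{align*}
T_{low}(K{=}2)          &= 1 + (1 + 2) + 2 = 6 \\
T_{int}(K{=}3,\ L{=}2)  &= \bigl[\,1 + (1 + 2 + 4) + 2\,\bigr] - 1 + L = 9 + 2 = 11 \\
T_{high}(K{=}3,\ L{=}5) &= \bigl[\,9 + L\,\bigr] - (L + 1) = 14 - 6 = 8 \\
T                        &= 6 + 11 + 8 = 25
\end{align*}
```

The strings are taken low-to-high as 2, 3, and 3 bits, so L is 2 for the intermediate string and 5 for the high-order string.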
Optimization
An optimum string size is determined by minimizing the total number of product terms (T) averaged over the string size (K). We begin with the low-order string and proceed toward the higher-order strings.
An optimum low-order string is either one or two bits long, since $T_{low}/K = 3$ for both $K = 1$ and $K = 2$ and is larger for every $K > 2$.
For an intermediate string, the minimum number of product terms averaged over the string size, $T_{int}/K$, is a function of L, the number of bits of lower order than the string. Successive (higher-order) intermediate strings should therefore be increasing monotonically. We determine the transition value of L, $L_t$, for which string sizes K and K + 1 are equally efficient, i.e.,
$$\frac{T_{int}(K, L_t)}{K} = \frac{T_{int}(K + 1, L_t)}{K + 1}.$$
For K = 1, $L_t$ is negative, which means that an intermediate string size of two is always more efficient than a string of one. Table 1 shows that a pair of equal intermediate string sizes (two K - 1 sizes) are followed by a pair of next larger size (two K sizes) for optimum assignment of intermediate string sizes. In other words, after a low-order string of one is arbitrarily selected and followed by an intermediate string of two, pairs of next higher string sizes follow (pairs of threes, pairs of fours, etc.).
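The transition value can be worked out explicitly under one assumption: that the counting rules for an intermediate string amount to $T_{int}(K, L) = K^2 - K + 3 + L$. This closed form is a reconstruction from the verbal description, not an equation taken from the paper.

```latex
\begin{align*}
\frac{T_{int}(K, L)}{K} &= K - 1 + \frac{3 + L}{K} \\
K - 1 + \frac{3 + L_t}{K} &= K + \frac{3 + L_t}{K + 1} \\
(3 + L_t)\left(\frac{1}{K} - \frac{1}{K + 1}\right) &= 1 \\
L_t &= K^2 + K - 3
\end{align*}
```

This gives $L_t = -1, 3, 9, 17$ for $K = 1, 2, 3, 4$, which agrees with the statement that $L_t$ is negative for K = 1. Starting from strings of one and two bits, the following strings begin at L = 3, 6, 9, 13, 17, ..., so each size K is used for two strings before the transition value for K + 1 is reached, consistent with the pairing described above.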
An optimum high-order string is determined in relation to the other strings. First we note that if the high-order string is greater than (or smaller than) the adjacent intermediate string by two or more, the combined number of product terms for the two strings can be reduced by reducing (or increasing) the high-order string by one and increasing (or reducing) the adjacent string by one. This leads to the following empirical procedure for assigning string sizes: We begin with a low-order string of one (the smaller of the two optimal sizes), followed by a single string of two and pairs of strings of three, four, etc. If the bit positions of the adder are exhausted when the high-order string is equal to or one greater than the adjacent string, the first-pass string assignment is final. If the high-order string is less than the adjacent string, the latter becomes the new high-order string and the former high-order string is deemed a remainder to be absorbed by the intermediate strings as follows: First, the low-order string of one is increased to two, the next string of two is increased to three, the higher-order of the two strings of three is increased to four, the higher-order of the next pair of intermediate strings is increased by one, etc., until the remainder is exhausted.
Table 2 illustrates the above procedure for assigning strings to achieve a minimum number of product terms. The assignment is not necessarily unique. For some adder sizes a different assignment can achieve the same minimum. For example, the eight-bit adder of Fig. 7 can also be implemented with 25 product terms using string sizes 2, 3, 2, and 1, high-to-low order.
Table 3 Number of product terms for (a) eight-bit adder, (b) 16-bit adder, and (c) 32-bit adder, using a conventional PLA. K = string size, L = number of lower-order bit positions, and T = number of product terms.
Table 3 illustrates the relevant parameters for eight-bit, 16-bit, and 32-bit adders, using 25, 68, and 195 product terms, respectively.
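The assignment procedure and the resulting totals can be sketched in code. This is an interpretation of the verbal procedure, not the author's program: the function names are hypothetical, the closed-form counts are reconstructed from the counting rules for the three string types, and strings are listed low-order first (the reverse of the paper's high-to-low listing).

```python
def product_terms(sizes):
    """Unique product terms for string sizes listed low-order first (two or more strings)."""
    total, lower = 0, 0
    last = len(sizes) - 1
    for idx, k in enumerate(sizes):
        base = k * k - k                  # 2 + 4 + ... + 2(K - 1)
        if idx == 0:
            total += 3 if k == 1 else base + 4   # low-order string
        elif idx < last:
            total += base + 3 + lower            # intermediate string, L = lower
        else:
            total += base + 2                    # high-order string
        lower += k
    return total

def assign_strings(n):
    """First pass 1, 2, 3, 3, 4, 4, ...; a short high-order remainder is absorbed."""
    sizes, planned, used = [], 1, 0
    while n - used >= planned:
        sizes.append(planned)
        used += planned
        # Sizes 1 and 2 appear once; every size from 3 up appears twice.
        if planned < 3 or sizes[-2] == planned:
            planned += 1
    remainder = n - used
    if remainder == 0:
        return sizes
    if remainder >= sizes[-1]:            # high-order string equal to its neighbor
        return sizes + [remainder]
    # Absorb: low-order string, the string of two, then the higher-order one of each pair.
    order = [i for i in [0, 1] + list(range(3, len(sizes), 2)) if i < len(sizes)]
    for idx in order[:remainder]:
        sizes[idx] += 1
    assert sum(sizes) == n, "remainder larger than this sketch can absorb"
    return sizes

for width in (8, 16, 32):
    strings = assign_strings(width)
    print(width, strings, product_terms(strings))
```

For widths 8, 16, and 32 this reproduces the totals of 25, 68, and 195 quoted in the text, and `product_terms([1, 2, 3, 2])` gives 25 for the alternative eight-bit assignment. Small widths with only one string are outside what the sketch handles.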

Decoders with more than two inputs
Additional reduction in the number of product terms for an adder may be obtained using four-input or higher-input decoders while preserving the generality of use of the PLA. A product term may now be defined as the AND of functions of input groups, with an input group comprising the inputs of a decoder. With standard decoders, however, this results in a wider AND array and more costly decoding. For example, a four-input decoder replacing two two-input decoders doubles the number of decoder outputs from 8 to 16, an eight-input decoder replacing four two-input decoders increases the number of decoder outputs from 16 to 256, etc. In the limit, a single decoder accepting all adder inputs becomes a conventional ROM decoder, while each product term can represent any function of the decoder inputs without the need of an OR array.
For adders, this growth can be avoided with special decoders. A four-input special decoder accepts two adjacent pairs of adder inputs of relative weights 2, 2, 1, and 1, and generates seven elementary symmetric functions representing the combined input values ranging from 0 to 6. Any adder function of the four inputs can be generated from a combination of the seven decoder outputs. By contrast, a conventional four-input decoder assumes relative input weights of 8, 4, 2, and 1, requiring 16 outputs ranging in value from 0 to 15.
Figure 8 compares two-input and four-input special decoders showing the generated outputs. It is noted that replacing a pair of two-input special decoders with one four-input special decoder increases the number of decoder outputs from six to seven. By contrast, with conventional decoders, the number of outputs doubles, from eight to 16. The width of the AND array can be further reduced by customizing each decoder to produce only those functions that the product terms require, particularly for decoders with a large number of inputs. For example, an eight-input special decoder which accepts four adjacent pairs of adder inputs of relative weights 8, 8, 4, 4, 2, 2, 1, and 1 produces 31 elementary symmetric functions representing weights 0 through 30.
However, the number of different functions of these inputs actually needed by the product terms of a 32-bit adder varies from six to ten. In other words, the width of the AND array is actually less for eight-input custom decoders than for decoders with fewer inputs. At the same time, the number of product terms is also reduced. The reduction of the AND array in both dimensions results in a set of more complex functions produced by the custom decoders. A custom decoder is particularly useful for the low-order inputs with which the input carry may be combined in one decoder.
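The output counts quoted for special and conventional decoders follow from counting distinct weighted input sums. The sketch below only illustrates that counting argument; the function names are hypothetical and nothing here models the decoder circuits themselves.

```python
from itertools import product

def special_decoder_values(weights):
    """Distinct weighted sums that a special decoder for adder inputs must distinguish."""
    return sorted({sum(w * x for w, x in zip(weights, bits))
                   for bits in product((0, 1), repeat=len(weights))})

def conventional_outputs(num_inputs):
    """A conventional decoder has one output per input combination."""
    return 2 ** num_inputs

for weights in [(1, 1), (2, 2, 1, 1), (8, 8, 4, 4, 2, 2, 1, 1)]:
    values = special_decoder_values(weights)
    print(len(weights), "inputs:", len(values), "special outputs,",
          conventional_outputs(len(weights)), "conventional outputs")
```

A two-input special decoder distinguishes the values 0 through 2, so a pair of them has six outputs against seven for one four-input special decoder, while the conventional counts go from 8 to 16.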
Adders using four- and five-input decoders
The 16-bit adder defined in Figs. 9(a) and (b) will be used to demonstrate the effect of using four- and five-input decoders for adder designs. A five-input custom decoder is used for the inputs comprising the input carry, $C_{in}$, and the two pairs of inputs to the low-order bit positions 14 and 15. The remaining decoders accept four inputs each, comprising pairs of inputs of adjacent bit positions.
Figure 10 Personalization of AND array functions controlled by a four-input symmetric function generator.
Adder outputs are again grouped in strings of contiguous sum bits. The low-order string includes the two positive low-order sum bits, $S_{15}$ and $S_{14}$. They exit directly from the custom five-input decoder, together with the output carry from the string, $C_{14}$, which enters the AND array to help generate the carries $C_{12}$, $C_{8}$, and $C_{4}$. The general equations for the sums and the carries can be derived in a manner similar to those for the adder using a PLA with two-input decoders. A few differences are noted:
1. A product term is expressed as the AND of functions of the new decoders. An entry in the AND array is a function of the respective decoder inputs. The four-input decoders may still be conventional, with an entry in the AND array readily converted to a personalized 16-bit cell. The conversion follows from an extension of Fig. 5 to a four-bit decoder. However, when special decoders are used, the conversion of an entry in the AND array to a personalized cell of fewer than 16 bits requires different rules.
2. The double asterisk attached to the strict propagate function, $H_i^{i+1\,**}$, means that the generate conditions may be used as don't-care conditions; e.g., $\bar{G}_i^{i+1}$ may be substituted for $H_i^{i+1}$. This simplifies personalization and may also reduce decoder outputs, as will be subsequently demonstrated. This principle was applied earlier in simpler form to single-bit propagate functions, where P or $\bar{G}$ was substituted for H, and is extendable to multi-bit propagate functions.
3. It can be noted in Fig. 9(b) that the left bracket of an equation for a high-order sum of a string, e.g., $S_i$, cannot share product terms with the carry from the string, either $\bar{C}_i$ or $C_i$. Therefore, $C_i$ is arbitrarily selected to produce successive sum outputs of the same polarity. This is in contrast to Fig. 7(a), where such product term sharing requires alternating polarities of strings.
To enable this kind of product term sharing, the left bracket of the high-order sum of a string, such as $S_i$, would be expressed as (H …
An empirical procedure for optimally assigning string sizes, similar to one described earlier, results in the following number of product terms (and string sizes): for an 8-bit adder, 13 product terms (string sizes 4, 2, and 2); for a 16-bit adder, 35 product terms (string sizes 4, 4, 4, 2, and 2); and for a 32-bit adder, 99 product terms (string sizes 6, 6, 6, 4, 4, 4, and 2).
Figure 10 illustrates the bit personalization for the various functions of a four-bit special decoder. The decoder is an elementary symmetric function generator producing positive outputs and driving an AND array consisting of NORs. Note that a maximum of only six switching devices needs to be provided for personalizing a function because the function requiring all seven columns to be connected is never used. It is assumed that a switching device is located between two adjacent columns and can be shared between the two columns. Therefore, six devices can be shared by the seven columns, with each device connected to its left column (connection pointing left), its right column (connection pointing right), or neither column (no connection shown). No devices need be provided between adjacent sets of columns. Also note that an elementary symmetric function is not connected if it is included in the desired function, corresponding to the rule for a conventional decoder with positive outputs driving an AND array consisting of NORs. If the AND array is implemented with ANDs, the decoder should produce complement outputs.
Other expressions may be substituted for some of those in the AND array of Fig. 9 to reduce the number of device connections. For example, the complement of the inclusive two-bit propagate function $G_i^{i+1}$ may be substituted for the strict propagate function $H_i^{i+1}$ without affecting the outputs of the adder. The substitution reduces the maximum number of connections in Fig. 10 from six to four. Rearranging the outputs of the decoder permits reducing the number of devices that need to be provided, even assuming that a device can be shared only between its two adjacent columns. As shown more explicitly in Fig. 11, only four devices are needed for bit positions 10 and 11, and five devices for bit positions 8 and 9, to personalize the respective functions.
The five-input custom decoder for inputs $A_{14}$, $B_{14}$, $A_{15}$, $B_{15}$, and $C_{in}$ produces the two low-order sum bits directly, as well as the carry $C_{14}$ driving the AND array, as shown in Fig. 12. The positive $C_{14}$ is intended for the NOR implementation of the AND array in Fig. 9, where $\bar{C}_{14}$ is needed for several product terms. If the AND array is implemented with ANDs, the custom decoder should generate $\bar{C}_{14}$.
The width of the AND array reduces to 50 columns using the special decoders, consisting of seven elementary symmetric function generators with seven columns each and the custom decoder with one column for the AND array. If custom decoders replace the elementary symmetric function generators, the width of the AND array is further reduced. Moreover, still fewer devices are needed and only one device connection is made at the intersection of an AND array row with the outputs of a custom decoder. For example, Fig. 13 shows the custom decoder outputs for bit positions 12 and 13 of the 16-bit adder of Fig. 9, as well as the AND array personalization for the five unique functions the decoder must provide. Again, the decoder generates complement functions to drive a NOR implementation of the AND array. Based on the number of unique functions needed, the total width of the AND array of the 16-bit adder is reduced to 37.
The 16-bit dedicated PLA adder can be further compressed horizontally and vertically using schemes which eliminate array sections of unconnected devices [8]. It should be noted in Fig. 9(a) that the arrays are rather sparsely populated with entries (representing connected devices). For example, the first row contains entries only in the columns of the low-order decoder, in the AND array, and of two of the sum bits, in the OR array. A compressed 16-bit adder is illustrated in Fig. 14. First, the OR array is split into a left and a right part to permit an AND array row to be shared by two product terms. The left and right product terms sharing a row are shown separated with a heavy vertical line.
Second, OR array columns are also shared between pairs of outputs, the split in …
Adders using decoders with larger number of inputs
Using custom decoders, it is possible to continue the trade-off between decoder complexity and array size. For example, with four adder bit position inputs to a decoder, custom decoders of eight and nine inputs may be used. The nine-input decoder would be assigned to the low-order four bit positions plus the input carry $C_{in}$. The decoder would generate the low-order four sum bits directly as well as the signal representing the carry out of the decoder inputs to drive the AND array.
When optimum string sizes are used, the number of product terms (and string sizes) needed for an eight-bit, 16-bit, and 32-bit adder is six (string sizes 4 and 4), 19 (string sizes 4, 4, 4, and 4), and 54 (string sizes 8, 8, 4, 4, 4, and 4), respectively.
If carried to the limit in which all inputs to the adder enter a single custom decoder, the "decoder" becomes a custom designed adder without the need of arrays.
Summary and conclusions
It has been demonstrated that one-cycle addition of a wide data path can be effectively implemented with one pass through a PLA. Effectiveness is measured in the number of product terms needed, since that number relates to the chip area required by a PLA as well as to the delay through the PLA AND and OR arrays. The adder is designed to take advantage of two-bit input decoders and Exclusive-OR outputs, two features which can presently be incorporated in a standard PLA.
Adder equations with carry-look-ahead have been adapted to the PLA features to use product terms sparingly and to maximize sharing of product terms among different functions of product terms. For example, a string of contiguous sum bits is expressed using a common carry of one polarity so that the product terms representing the carry are shared by the several sum bits. The development of a procedure that determines the optimum string sizes into which the adder sum bits are grouped to minimize the total number of product terms has also been demonstrated. A standard PLA will normally implement a number of functions, one of which may be an adder. With LSI, PLAs will increasingly be used as macros on a chip, tailored to specific functional needs. If a PLA is dedicated to an adder, further efficiencies can be gained. Input decoders with more than two inputs can further reduce the number of product terms needed. At the same time, the width of the AND array of a PLA, the dimension which measures the number of decoder outputs, can be reduced by substituting special decoders to produce functions relevant to addition. As a result, both the height and the width of a PLA adder can be significantly reduced.
A dedicated PLA adder can be further compressed in size by splitting the OR array of the PLA into two parts with the single AND array between them. Many of the AND array rows, which normally contain a single product term, can thus be shared between two product terms. Also, an OR array column can be split to contain two sums of product terms, instead of one, by providing distinct outputs at the top and bottom of the column.
