We compare the layout implementations in terms of area and power, as well as provide comparisons to first-order area estimates done in the logic design phase. The results show that the new hybrid array multiplier can be significantly more efficient, with close to 30% power savings.
I. INTRODUCTION
Multiplier modules are common to many DSP applications. The fastest types of multipliers are parallel multipliers, and among these the Wallace multiplier [13] is one of the fastest. However, it does not have as regular a structure as the conventional array [8] or Booth [9] multipliers. Hence, when layout regularity, high performance and low power are primary concerns, Booth multipliers tend to be the primary choice [2], [5], [7], [9], [11]. In this paper, we present layout implementations for both the Modified Booth multiplier and the new array multiplier using the hybrid code proposed in [4]. We synthesize the multipliers using an automatic synthesis tool named Tropic [10]. To compare the Modified Booth and the hybrid array architectures, both using radix-4, we used ELDO, a SPICE-like simulator that is part of the Mentor Graphics environment. The results show that the new array multiplier is significantly more efficient, saving more than 30% in power consumption. This power reduction is mainly due to the lower logic depth, which has a large impact on the amount of glitching in the circuit. We should stress further that, in contrast to the architecture presented in [4], increasing the radix of the Booth architecture is a difficult task, so it cannot benefit from the potential savings of higher radices.

This paper is organized as follows. The next section briefly describes the radix-4 hybrid code, the hybrid array multiplier and the modified Booth multiplier. After this, we describe the design methodology, the fully automated layout synthesis and how area and power results were obtained. Comparisons between the radix-4 hybrid array multiplier and the Modified Booth multiplier, at both the switch level and the electrical level, are presented next. Finally, we conclude the paper, discussing the main contributions and future work.
II. 2's COMPLEMENT MULTIPLIERS
In this section, we summarize the methodology of [4] for the generation of regular structures for arithmetic operators using signed radix-$2^m$ representation and hybrid encoding.
A. Hybrid Code
The idea of the hybrid code is to split the operands into groups of m bits, encode each group using the Gray code, and use the Binary approach to propagate the carry between the groups. In this way, a compromise between the minimum Hamming distance between consecutive values of the Gray code and the minimum bit dependency of the Binary code is achieved [3]. Table I shows the 2's complement Hybrid encoding for 4-bit numbers and m=2. The radix-$2^m$ operation in 2's complement representation is illustrated in Fig. 1. For the case of radix-4, the input values are first converted from hybrid code to binary code. The bits are then processed in binary encoding. Finally, at the end of the multiplication, the final value is converted to hybrid encoding. Only one Type III module is required for any type of multiplier, whereas $(2W/m - 2)$ Type II modules and $(W/m - 1)^2$ Type I modules are needed. We present a concrete example for W=8 bit wide operands using radix-4 (m=2) in Fig. 2.
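The group-wise encoding described above can be illustrated with a minimal Python sketch. It assumes that each m-bit group of the 2's complement bit pattern is Gray-coded independently, which is our reading of the description; Table I remains the authoritative definition. The function names (`gray`, `gray_inverse`, `hybrid_encode`, `hybrid_decode`) are introduced here for illustration only.

```python
def gray(v: int) -> int:
    """Binary-reflected Gray code of a non-negative integer."""
    return v ^ (v >> 1)


def gray_inverse(g: int) -> int:
    """Recover the binary value from its binary-reflected Gray code."""
    v = 0
    while g:
        v ^= g
        g >>= 1
    return v


def hybrid_encode(value: int, width: int = 4, m: int = 2) -> int:
    """Gray-code each m-bit group of the width-bit 2's complement pattern.

    Assumption: groups are encoded independently of one another.
    """
    if width % m:
        raise ValueError("width must be a multiple of m")
    bits = value & ((1 << width) - 1)  # 2's complement bit pattern
    mask = (1 << m) - 1
    code = 0
    for shift in range(0, width, m):
        group = (bits >> shift) & mask
        code |= gray(group) << shift
    return code


def hybrid_decode(code: int, width: int = 4, m: int = 2) -> int:
    """Inverse of hybrid_encode, returning a signed integer."""
    if width % m:
        raise ValueError("width must be a multiple of m")
    mask = (1 << m) - 1
    bits = 0
    for shift in range(0, width, m):
        group = (code >> shift) & mask
        bits |= gray_inverse(group) << shift
    if bits & (1 << (width - 1)):  # sign bit set: negative value
        bits -= 1 << width
    return bits


if __name__ == "__main__":
    # Print the 4-bit, m=2 encoding for every representable value.
    for value in range(-8, 8):
        print(f"{value:3d}  {hybrid_encode(value):04b}")
```

Within a group, consecutive values differ in a single bit, while the groups themselves keep binary weights, which is the compromise the hybrid code aims for.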
C. Modified Booth
The radix-4 modified Booth algorithm has been presented in [9] . In this architecture it is possible to reduce the number of partial products by encoding the two's complement multiplier. In the circuit the control signals (0, +Y, +2Y, -Y and -2Y) are generated from the multiplier operand Y for each 3-bit group. A multiplexer produces the partial product according to the encoded control signal.
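As an illustration of the encoding step, the following Python sketch applies the standard radix-4 Booth recoding, in which each overlapping 3-bit group of the multiplier Y selects one digit in {-2, -1, 0, +1, +2}. It is a behavioral model only, not the circuit of [9]; the function names `booth_radix4_digits` and `booth_multiply` are hypothetical.

```python
def booth_radix4_digits(y: int, width: int = 8) -> list:
    """Recode a width-bit 2's complement multiplier into radix-4 Booth digits.

    Digit i is y[2i] + y[2i-1] - 2*y[2i+1], with y[-1] taken as 0.
    The least significant digit comes first.
    """
    if width % 2:
        raise ValueError("width must be even")
    bits = y & ((1 << width) - 1)  # 2's complement bit pattern
    digits = []
    prev = 0  # implicit bit y[-1]
    for i in range(0, width, 2):
        b0 = (bits >> i) & 1
        b1 = (bits >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(x: int, y: int, width: int = 8) -> int:
    """Sum the partial products selected by the Booth digits of y."""
    digits = booth_radix4_digits(y, width)
    return sum(d * x * 4 ** i for i, d in enumerate(digits))


if __name__ == "__main__":
    print(booth_radix4_digits(7))    # digits for Y = 7
    print(booth_multiply(-13, 7))    # product of -13 and 7
```

For 8-bit operands the recoding yields four digits, so only four partial products are summed instead of eight.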
One of the main problems in the modified Booth multiplier is the generation of the 2's complement of the multiplicand term. This is calculated by each multiplexer, as shown in the example of Fig. 4 for 8-bit wide operands.
Fig. 4. Example of an 8-bit multiplication using the Modified Booth algorithm.

Common to both architectures is that, at each step of the algorithm, two bits are processed. However, the basic Booth cells are not simple adders as in the array multiplier, but must perform addition, subtraction or no operation, as well as a controlled left shift of the bits of the multiplicand. Besides taking more area, this complexity also makes it more difficult to increase the radix value in the Booth architecture. As can be observed in Fig. 5, for an 8-bit architecture, 4 encoders and 4 multiplexers are necessary in order to calculate the partial product terms. The multiplexers produce the multiplicand term according to the 3 bits that are generated in the encoder circuit. The partial product terms are shifted by the adders, which are also used to produce the final result.
III. DESIGN METHODOLOGY
In this section, we present the design flow for the layout implementation of the multipliers, based on an automatic layout generation tool called TROPIC [10]. Fig. 6 shows the design flow used for the logic synthesis [3], in gray, and for the layout synthesis of the multipliers. The figure presents both our methodology, in black, and the methodology used in [4], in gray, for the analysis of the multipliers.
A. Design Flow
The multiplier circuits were described in the Berkeley Logic Interchange Format (BLIF). BLIF files are therefore the input to the design flow, as can be observed in Fig. 6. In [3], where the multipliers were synthesized at the logic level, the SIS tool [12] is used to estimate the area of the multipliers. In the methodology of [3], the power consumption of the multipliers was estimated using the switch-level simulator SLS [6], which required a converter from BLIF to SLS format. For the power estimation, different types of vectors, in SLS format, were applied to the primary inputs of the multipliers.

For the synthesis of the multipliers implemented in this work, the Tropic tool is used. This tool takes the sim format as input and performs an automatic synthesis of the circuit layout. Thus, a converter from BLIF to sim format has been developed in this work. In the Tropic environment, the total area and the number of transistors of the circuits are also estimated. Since the Tropic tool generates the widely used CIF format, the resulting circuit layout can be visualized with the Mentor Graphics IC Station tool (Mentor Graphics Environment).

Extracted SPICE netlist files were simulated in the Mentor Graphics environment, using the ELDO electrical simulator, for power consumption estimation at the back-annotated electrical level. Power simulation depends on the set of input vectors. In this paper we compare results for two sets of vectors: random vectors and vectors representing a sinusoidal signal. These vectors were generated in the binary digit format of the SLS tool and converted into a set of input vectors in SPICE format for transient analysis, also using a tool made especially for this project (Gerabits Converter).
IV. PERFORMANCE COMPARISONS
In this section, we present area and power results for the multiplier architectures after layout, for both W=8 and W=16 bit wide operands. The modified Booth multiplier and the hybrid array multiplier proposed in [4] are compared. The design uses an HCMOS 0.25 µm technology. We set the number of cell bands generated by TROPIC such that a width/height aspect ratio close to 1 was obtained.
Area results were obtained using the TROPIC tool and are presented both in terms of total area and in terms of number of transistors. Automatic layout compaction is not used in this design. Total average power was obtained from ELDO simulations. Real trace input signals and a random pattern signal with 500 input vectors were used. The real trace signal represents two sinusoidal signals with a 90 degree phase difference. In these simulations we used Vdd=2.5 V and an operating frequency of f=50 MHz. We have also compared our results with those reported in [4], considering 16-bit architectures and random pattern input vectors.
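The two kinds of input stimuli can be sketched in Python as follows. This is an illustration only, not the Gerabits Converter or the vector set actually used: the number of sinusoid periods, the amplitude, the use of 500 samples for the sinusoids, the random seeds and all function names are assumptions made here. The sketch also counts input bit toggles between consecutive vectors, a rough indicator of how correlated the samples are.

```python
import math
import random


def sinusoid_vectors(n: int, width: int, periods: float = 1.0, phase: float = 0.0) -> list:
    """Quantize a full-scale sinusoid into n width-bit 2's complement words."""
    amplitude = (1 << (width - 1)) - 1
    mask = (1 << width) - 1
    return [
        round(amplitude * math.sin(2 * math.pi * periods * k / n + phase)) & mask
        for k in range(n)
    ]


def random_vectors(n: int, width: int, seed: int = 0) -> list:
    """Generate n uniformly random width-bit words (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.getrandbits(width) for _ in range(n)]


def toggle_count(vectors: list) -> int:
    """Total number of bit flips between consecutive input words."""
    return sum(bin(a ^ b).count("1") for a, b in zip(vectors, vectors[1:]))


if __name__ == "__main__":
    n, width = 500, 8
    # Two sinusoidal operands with a 90 degree phase difference.
    sin_a = sinusoid_vectors(n, width)
    sin_b = sinusoid_vectors(n, width, phase=math.pi / 2)
    rnd_a = random_vectors(n, width, seed=1)
    rnd_b = random_vectors(n, width, seed=2)
    print("sinusoidal toggles:", toggle_count(sin_a) + toggle_count(sin_b))
    print("random toggles:    ", toggle_count(rnd_a) + toggle_count(rnd_b))
```

With slowly varying sinusoids, consecutive samples differ by small amounts, so fewer input bits change per vector than with a random pattern.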
A. Area Results
In the Booth multiplier, an encoding scheme is used to reduce the number of stages in the multiplication. Two bits of the multiplier are processed at once, and thus the multiplier requires half the stages. In the hybrid array multiplier proposed in [4], the number of stages can be reduced by more than half while the regularity is kept as in the pure array multiplier.

As can be observed in Table II, the hybrid array multiplier presents the highest area and number of transistors. The same can be observed in Table III for the 16-bit multipliers. This occurs because the partial product lines operate on groups of m bits and the basic multiplier elements, which compose the modules for the product terms, are slightly more complex. As Table II and Table III show, a larger area results for both the 8-bit and 16-bit array multipliers we proposed.

As observed in [1], the major sources of power dissipation in multipliers are spurious transitions and logic races that flow through the circuit. Thus, the significantly smaller number of spurious transitions in the new array multiplier justifies the gain in power when compared against the Booth multiplier, as observed in Table IV and Table V for the 8-bit and 16-bit circuits, respectively. Also noticeable in these tables is that both the array and the Booth multipliers consume less power when a sinusoidal waveform is used as input. This occurs due to the larger degree of correlation between input samples, which leads to less abrupt variations and hence to a smaller number of input bits toggling. As observed in [4], the new hybrid array multiplier has a lower logic depth, due to the more balanced paths to the basic blocks that compose the array architecture. This contributes to the power reduction because fewer useless transitions are generated.

results obtained in our work in terms of way of simulation.
In [4], the power consumption of the multipliers was estimated using the SLS switch-level simulator, where glitching activity is taken into account. Moreover, at the logic level, power results were obtained using a random pattern input signal with 10,000 input vectors. Area was estimated in the SIS [12] environment and the results were presented in terms of number of literals, which is approximately half of the number of transistors. Table VI shows the area and power percentage changes between the new hybrid array and modified Booth multipliers. The estimates at the logic level and after layout correlate extremely well for both area and power. The area estimate at the logic level is just the number of literals coming from logic synthesis. The relative power estimations are fairly close. The larger number of glitches generated in the modified Booth multiplier makes this architecture more power consuming, which is captured by the SLS simulator. This validates the results reported in [4] for the gate level design.
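The percentage changes between the two architectures can be read as relative differences with the modified Booth multiplier as the baseline. The sign convention below is our assumption, and the symbols $X_{\text{hybrid}}$ and $X_{\text{Booth}}$ are introduced here for illustration only; they stand for either area or total average power:

$$\Delta X\,(\%) = 100 \cdot \frac{X_{\text{hybrid}} - X_{\text{Booth}}}{X_{\text{Booth}}}$$

Under this convention, a negative value means the hybrid array multiplier is smaller or consumes less than the modified Booth multiplier, and a positive value means the opposite.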
V. CONCLUSIONS
We have described the layout implementation of a new hybrid array multiplier that operates on 2's complement numbers using radix-$2^m$ encoding. The signed radix-$2^m$ hybrid array multiplier we proposed exhibits much lower power consumption than the modified Booth multiplier. The power savings reached up to 32% for 16-bit architectures and a random pattern input vector. Logic level and electrical level estimates were compared, and they support the conclusion that the modified Booth multiplier consumes more power when glitching activity is correctly simulated.
We combined both academic and commercial tools in this design and power estimation. Delay and silicon results are being experimentally tested and will be shown in future work.
