In general, the arithmetic logic unit (ALU) of a DSP core is composed of an adder, a multiplier, and a shifter. To obtain a high-performance 32-bit ALU, this paper proposes an adaptive leaf-cell based layout technique, together with novel architectures for a 64-bit adder, a 32 × 32-bit multiplier, and a 32-bit shifter. First, the proposed 64-bit adder is based on conditional select addition with regular adaptive multiplexers. Secondly, novel optimized data compressors with a compound logic are proposed for the 32 × 32-bit multiplier. Finally, a shift algorithm with a pre-mask decoder is proposed for the 32-bit barrel shifter. The circuits have been fabricated in a 0.25 μm 1-poly 5-metal CMOS process, and the desired experimental results have been obtained.
INTRODUCTION
Recently, high-performance digital signal processors have appeared on the ASIC market because of the increasing demand for multimedia data processing. In particular, the DSP core is a key component in telecommunication, voice, video, three-dimensional graphics, and similar applications. As shown in Fig. 1, a DSP core is composed of an arithmetic logic unit (ALU), a memory unit, a controller unit, and an I/O unit.
Further, the ALU of a DSP core is generally composed of an adder, a multiplier, and a shifter. The ALU therefore plays a major role in determining the characteristics of the DSP core. This paper describes a design methodology for a 32-bit ALU for a DSP core with low power consumption and high-speed operation. To obtain these characteristics, novel architectures of the adder, multiplier, and shifter are proposed together with an adaptive leaf-cell based layout technique. In general, a leaf-cell based layout means a strongly regular architecture built from a basic cell library [1–3]. Its drawbacks are that the chip area is somewhat larger and the power consumption is higher than those of a full-custom layout. The full-custom layout technique, in contrast, gives a small chip area and low power consumption, but it requires a long design time. Thus we propose an adaptive leaf-cell based layout that combines the advantages of the conventional leaf-cell layout and the full-custom layout. The contents of the paper are as follows.
In the second section, a 64-bit conditional select adder with adaptive regular multiplexers is discussed. The optimised data compressors and a novel compound logic for the design of a 32 × 32-bit multiplier are described in the third section. In the fourth section, a 32-bit barrel shifter with a pre-mask decoder is discussed. The adaptive leaf-cell based layout generation process and the experimental results are described in the fifth section. Finally, the conclusions are summarized in the sixth section.
CONDITIONAL SELECT ADDER WITH ADAPTIVE REGULAR MULTIPLEXERS
The proposed architecture of a 64-bit conditional select adder is shown in Fig. 2 .
To obtain low-power and high-speed operation, it combines a carry look-ahead adder, a carry select adder, and a conditional sum adder [4–6]. Further, the carry generation block (CGB) is separated from the sum generation block (SGB) to raise the operating speed. The circuit diagram of each adaptive multiplexer is shown in Fig. 5. The most suitable multiplexer is chosen and adopted according to the role it plays. Low power consumption is also obtained, because the multiplexers operate as simple switches.
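The select principle that this adder builds on can be illustrated with a small behavioural model. The sketch below is not the proposed circuit: it is a generic block-wise select adder in which every block precomputes its sum and carry for both possible incoming carries, and a 2:1 multiplexer then picks one of them. The function names and the 8-bit block size are assumptions made only for this illustration.

```python
def block_add(a, b, carry_in, width):
    """Add two width-bit blocks and return (sum, carry_out)."""
    total = a + b + carry_in
    return total & ((1 << width) - 1), total >> width


def conditional_select_add(a, b, width=64, block=8):
    """Behavioural model of block-wise select addition (illustrative only).

    Each block computes its result for carry-in 0 and carry-in 1;
    the actual block carry then selects one of the two, as a 2:1
    multiplexer would. The block size of 8 bits is an assumption.
    """
    mask = (1 << block) - 1
    result, carry = 0, 0
    for pos in range(0, width, block):
        a_blk = (a >> pos) & mask
        b_blk = (b >> pos) & mask
        sum0, carry0 = block_add(a_blk, b_blk, 0, block)  # assumes carry-in = 0
        sum1, carry1 = block_add(a_blk, b_blk, 1, block)  # assumes carry-in = 1
        # Multiplexer: the incoming block carry selects the precomputed pair.
        blk_sum, carry = (sum1, carry1) if carry else (sum0, carry0)
        result |= blk_sum << pos
    return result, carry
```

In hardware the two candidate results of all blocks are computed in parallel, so only the multiplexer chain lies on the carry path; the loop above models the selection order, not the timing.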
In each CSAB, the carry is generated by the CGB, while the sum is generated at the same time by the SGB. The initial condition of the SGB is decided by the output of the separate block carry generation block (BCGB), which employs the same method as the CGB. The j-bit carry is decided in the BCGB after it is compared with the (j − 1)-bit carry. This carry then determines the carry value of the next two bits. In the same way, the carry is transferred to the final stage with low power consumption and high speed. At the last stage, the carry in the BCGB returns to the CSAB and selects the final sums. Figure 6 shows the conceptual adding procedure of the proposed adder with the adaptive leaf-cell based layout technique. The architecture of the proposed adder has a regular form, and its total delay time is shorter than that of the others [4–6]. Thus the proposed conditional select adder is a low-power and high-speed architecture.
32 × 32-BIT MULTIPLIER WITH OPTIMISED DATA COMPRESSORS
Figure 7(a) shows the proposed architecture of the 32 × 32-bit multiplier [7], which employs the modified Booth's algorithm, a Wallace tree, and the adder discussed in the "Conditional select adder with adaptive regular multiplexers" section. To reduce the multiplication time and power consumption, novel data compressors based on a full-adder are proposed. The detailed architecture with the data compression blocks is shown in Fig. 7(b).
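As background for the multiplier, the standard radix-4 (modified) Booth recoding can be sketched as follows. This is the textbook algorithm rather than the encoder circuit of Fig. 7; the function names are hypothetical, and Python's unbounded integers hide the sign-extension and one-bit adding issues that the hardware has to handle explicitly.

```python
def booth_radix4_digits(multiplier, width=32):
    """Recode a signed width-bit multiplier into width/2 digits in {-2..2}.

    Digit k is -2*y[2k+1] + y[2k] + y[2k-1], with y[-1] = 0.
    """
    bits = (multiplier & ((1 << width) - 1)) << 1  # append the implicit y[-1] = 0
    digits = []
    for i in range(0, width, 2):
        triplet = (bits >> i) & 0b111
        y_prev = triplet & 1          # y[2k-1]
        y_low = (triplet >> 1) & 1    # y[2k]
        y_high = (triplet >> 2) & 1   # y[2k+1]
        digits.append(y_prev + y_low - 2 * y_high)
    return digits


def booth_multiply(x, y, width=32):
    """Sum the Booth partial products d_k * x * 4**k (behavioural model)."""
    digits = booth_radix4_digits(y, width)
    return sum((d * x) << (2 * k) for k, d in enumerate(digits))
```

For a 32-bit multiplier the recoding yields 16 partial products instead of 32, which is what makes the subsequent compression tree shallower.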
To solve the sign extension problem in the Booth encoder and the one-bit adding problem in the partial products, 4-2 compressors and a 9-2 compressor are proposed. After Booth encoding, the large bundle of partial-product data has to be compressed into two rows. In conventional designs, only the 4-2 compressor has been used because of its layout regularity [7, 8]. In the proposed architecture, mixed data compressors are used to reduce power consumption and raise operating speed. Figure 8 shows the interconnection and critical delay path of the data compression block, and Fig. 9 shows the block diagram of the optimised data compressors.
With the proposed data compressors, the critical delay path is about eight equivalent full-adder delays, while that of the conventional one is nine equivalent full-adder delays [7]. Figure 10 shows a circuit diagram of the proposed full-adder based on the novel compound logic. The logic is a combination of conventional CMOS logic and pass-transistor logic [9, 10]. The compound logic takes both the driving capability of CMOS logic and the simple switch connection of pass-transistor logic. Thus the full-adder has low power consumption and high-speed operation.
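The function of a 4-2 compressor can be checked with a bit-level model. The sketch below uses the common construction from two cascaded full-adders; it is an assumption made for illustration and does not reproduce the optimised compressors of Fig. 9 or the 9-2 compressor, whose internal structure is not modelled here.

```python
def full_adder(a, b, c):
    """One-bit full adder: returns (sum, carry)."""
    s = a ^ b ^ c
    carry = (a & b) | (b & c) | (a & c)
    return s, carry


def compressor_4_2(x1, x2, x3, x4, cin):
    """4-2 compressor built from two full adders (textbook construction).

    Invariant: x1 + x2 + x3 + x4 + cin == s + 2 * (carry + cout).
    cout depends only on x1..x3, so it never ripples along a row.
    """
    s1, cout = full_adder(x1, x2, x3)
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout
```

Because `cout` does not depend on `cin`, a row of such cells compresses four partial-product rows into two without a carry chain along the row.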
32-BIT BARREL SHIFTER WITH A PRE-MASK DECODER
An ordinary architecture of a 32-bit barrel shifter is shown in Fig. 11, and Fig. 12 illustrates an example of 3-bit barrel shifting [11]. According to the input data, the data are shifted in steps of 1, 2, and 4 bits. In Fig. 11, some parts of the output data are erased at the mask generator after the desired shifting has ended. This is because it is necessary to support random rotation.
Figure 13 shows the proposed architecture of a 32-bit barrel shifter. The proposed shifter consists of a pre-mask decoder, a shift array (left shift only), control units (a scale factor decoder and an option decoder), and a mask generator. To reduce the number of internal wires in the shift array, we adopt a shift array that shifts left only. While the conventional shifter has a post-mask generator, the proposed shifter has a pre-mask decoder and a pre-mask generator, so that useless shifting is removed by the pre-mask decoder. Further, the power consumption is drastically reduced, because only the left shifting algorithm is adopted in the shift array. Random left/right shifting, rotation, and filling of the empty bit positions according to a control signal are still available in the proposed shifter. Figure 14 shows the circuit diagram of the proposed pre-mask decoder. It achieves high-performance operation, because it is composed of pass-transistor logic.
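One way to see how a left-only array can still serve both shift directions is the behavioural sketch below. It is an assumption-laden illustration, not the circuit of Figs. 13 and 14: the array is modelled as a logarithmic left rotator with stages of 1, 2, 4, 8, and 16 bits, and the pre-mask is modelled as clearing, before the rotation, the input bits that would otherwise wrap around. All function names are hypothetical, and shift amounts are assumed to lie between 0 and 31.

```python
WIDTH = 32
MASK = (1 << WIDTH) - 1


def rotate_left(value, amount):
    """Logarithmic left-rotate array with stages of 1, 2, 4, 8 and 16 bits."""
    value &= MASK
    for stage in range(5):
        if (amount >> stage) & 1:
            step = 1 << stage
            value = ((value << step) | (value >> (WIDTH - step))) & MASK
    return value


def shift_left(value, amount):
    """Logical left shift: pre-mask the top bits, then rotate left."""
    keep = MASK >> amount               # clears the `amount` highest bits
    return rotate_left(value & keep, amount)


def shift_right(value, amount):
    """Logical right shift using the same left-only array.

    The low `amount` bits are pre-masked, then the word is rotated
    left by WIDTH - amount, which equals a right rotation by `amount`.
    """
    keep = (MASK << amount) & MASK      # clears the `amount` lowest bits
    return rotate_left(value & keep, (WIDTH - amount) % WIDTH)
```

In this model the masking happens on the input side, so bits that would be discarded never travel through the array; the sketch fills vacated positions with zeros only and does not cover the fill options mentioned in the text.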
EXPERIMENTAL RESULTS
To combine the advantages of both the bottom-up and the top-down design methodologies, the ALU has been implemented with the proposed adaptive leaf-cell based layout technique. Further, we have adopted a hardware and software partitioning approach [1–3]. Figure 15 shows the adaptive leaf-cell based layout generation program, which includes a schematic netlist, wiring information, cell location information, and cell combination information. Using this procedure, the layout of the ALU has been generated and a prototype chip has been fabricated. Figure 16 shows the microphotograph of the proposed ALU in a 0.25 μm one-poly five-metal n-well standard CMOS technology. Figures 17 and 18 show the experimental results of the proposed adder and multiplier. Their delay times are 2.3 and 3.3 ns, respectively. The delay time of the shifter is 2.1 ns.
The measured results are summarized in Table I, and the comparison with the conventional designs is given in Table II.
CONCLUSIONS
In this paper, a design methodology for the ALU of a DSP core was discussed. To obtain high-performance operation of the ALU, a novel adaptive leaf-cell based layout technique was proposed. The prototype chip was implemented in a 0.25 μm 1-poly 5-metal CMOS technology. The 64-bit conditional select adder with adaptive regular multiplexers had a delay of 2.3 ns and consumed 0.096 mW at 100 MHz. The 32 × 32-bit multiplier with the optimized data compressors had a delay of 3.3 ns and consumed 1.052 mW at 100 MHz. Finally, the 32-bit barrel shifter with the proposed pre-mask decoder had a delay of 2.1 ns and consumed 0.051 mW at 100 MHz. In comparison with the conventional designs, the desired experimental results were obtained.
