Abstract-This paper presents a low-power ASIC design for cell search in the wideband code-division multiple-access (W-CDMA) system. A low-complexity algorithm that works satisfactorily under large frequency and clock errors is designed first. Then, a set of low-power measures is employed in the design of the hardware architecture and circuits. Finally, through power analysis, critical blocks are identified and redesigned to further reduce the power consumption. The final design shows that the power is reduced by 51%, from 133.6 mW in the original design to 65.49 mW, and that its core area is reduced by 31.9%.
I. INTRODUCTION
In a code-division multiple-access (CDMA) cellular system, the procedure employed by a mobile station to search for the best cell site and to achieve code, time, and frequency synchronization with it is referred to as cell search. Fast cell search is particularly important for the wideband CDMA (W-CDMA) system because of the use of nonsynchronous base stations in the system [1], [2].
A three-stage search procedure has been designed in the W-CDMA specifications to facilitate fast cell search. It comprises slot synchronization, joint frame synchronization and code-group identification, and scrambling-code detection [2]. Slot synchronization (stage 1) is achieved by detecting the primary synchronization channel (PSCH). Joint frame synchronization and code-group identification (stage 2) is achieved by detecting the secondary synchronization channel (SSCH). After the code group is identified, the scrambling code can be determined easily by using the common pilot channel (CPICH) (stage 3).
A great deal of research has been devoted to the design of cell search algorithms [1], [3]-[5]. In [1], a pipelined process was proposed to achieve faster cell search than the serial one at the cost of higher complexity. Partial symbol de-spreading with noncoherent combining was proposed in [1], [3] to mitigate the large frequency error caused by the imperfect oscillator of a mobile station. This imperfection also incurs sampling error (clock error) at the analog-to-digital converter (ADC). It was shown in [4] that the clock error may exceed one timing period during the course of the three-stage search, which results in search failure. In [5], a search scheme with multiple "code time" candidates was proposed to reduce the search time in the clock-drifting environment.
In this paper, a low-power ASIC is designed for cell search in the W-CDMA system under the effect of large frequency and clock errors. A set of low-power design practices, from the algorithm down to the hardware architecture and circuits, is applied to reduce the chip's power consumption. By using power-efficient algorithm, architecture, and circuit designs, the power consumption of the design is reduced by 51%. The design is implemented and verified in a 3.3-V 0.35-µm CMOS technology with a clock rate of 15.36 MHz.
The rest of this paper is organized as follows. Section II presents a low-complexity cell search algorithm under large frequency and clock errors. Section III describes low-power architecture and circuits, power analysis, and redesign of critical blocks. Section IV summarizes implementation and testing results. Finally, the paper is concluded in Section V.
II. LOW-COMPLEXITY CELL SEARCH ALGORITHM
The three stages of cell search can be performed in either a serial or a pipelined fashion [1]. In the pipelined search, all three stages are performed concurrently, which results in a faster search. In this paper, a low-complexity pipelined search algorithm is adopted for the low-power ASIC design.
Different methods have been proposed to counteract the effects of frequency and clock errors on cell search performance. Generally, two methods can be used to mitigate the effect of large frequency error: frequency offset compensation (FOC) and partial symbol de-spreading (PSD). FOC has superior performance but needs multiple stage-1 detectors [1], [4]. To counteract the clock error, the simplest method is random sampling per frame (RSPF), proposed in [4], which was shown to work satisfactorily under 4 ppm of clock error. For large clock error, the method of multiple timing candidates (MTC) could be employed, but multiple stage-2 and stage-3 detectors are needed [5].
Here, a simple method called sample-point reordering (SPR) is proposed to counteract clock errors of up to 10 ppm. The basic idea is as follows. First, the range of clock error is divided into "bins," with each bin denoting a presumed clock error. Then, within a bin, a controller drops one sample point from, or stuffs one sample point into, the incoming sampled sequence whenever the clock error accumulates to one sample interval, as shown in Fig. 1.
Table I compares the complexity of different cell search algorithms in terms of million operations per second (MOPS) [6]. Among them, the algorithm combining PSD, SPR, and RSPF does not need any multiplier as used in the FOC function and has low complexity for the setting considered, which is specified by the number of stage-1 detectors when using FOC and SPR and by the numbers of candidates associated with MTC for stages 2 and 3. Fig. 2 shows the search time performance. The algorithm is able to reach a 0.9 probability of search success in 600 ms. It is good enough for practical applications and is adopted as the low-complexity algorithm for this low-power ASIC design.
Fig. 3 illustrates the block diagram of the cell search ASIC along with internal bitwidths [9].
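The drop-or-stuff behavior of sample-point reordering described above can be illustrated with a minimal Python sketch. It is not the paper's controller: the function name `reorder_samples`, the integer ppm accumulator, and the sign convention (a positive presumed error drops a sample, a negative one stuffs a repeated sample) are assumptions made only for illustration. One such controller would run per clock-error bin.

```python
PPM_SCALE = 1_000_000  # one sample interval expressed in ppm units


def reorder_samples(samples, presumed_ppm):
    """Drop or stuff one sample whenever the presumed clock error
    accumulates to one sample interval.

    Assumed convention (not from the paper): presumed_ppm > 0 drops a
    sample, presumed_ppm < 0 stuffs a copy of the current sample.
    presumed_ppm is an integer so the accumulator stays exact.
    """
    acc = 0  # accumulated error, in millionths of a sample interval
    out = []
    for x in samples:
        acc += abs(presumed_ppm)
        if acc >= PPM_SCALE:
            acc -= PPM_SCALE
            if presumed_ppm > 0:
                continue        # drop this sample point
            out.append(x)       # stuff: emit the sample one extra time
        out.append(x)
    return out


def demo():
    samples = list(range(300_000))
    dropped = reorder_samples(samples, 10)    # 10 ppm: 1 drop per 100 000 samples
    stuffed = reorder_samples(samples, -10)   # -10 ppm: 1 stuff per 100 000 samples
    return len(dropped), len(stuffed)
```

At a presumed error of 10 ppm the accumulator reaches one sample interval once every 100 000 samples, so the correction events are rare and the controller needs only an adder and a comparator, with no multiplier.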
The ASIC consists of four parts: the preprocessing module and the stage-1, stage-2, and stage-3 detectors. The preprocessing module includes RSPF and SPR. The sample-point reordering block consists of a pair of tapped-delay lines along with multiplexers and a reordering controller. The initial selection of the multiplexers is at the "0" position. Whenever the reordering controller accumulates a sampling error of one sampling interval, the multiplexer selection is moved away from the "0" position to drop one sampling point from, or stuff one sampling point into, the incoming sampled sequence. Then, the RSPF selects one of the sampling points within a one-chip duration as input data to the three stages and changes its selection randomly for every new frame. In stage 1, the primary synchronization code (PSC) detector is designed as a hybrid de-spreader combining an efficient Golay correlator (EGC) and a hierarchical matched filter (MF) [7], [8]. As shown in Fig. 4, only 18 additions are needed to match the PSC in four segments of 64-chip partial symbols. Similarly, a hybrid secondary synchronization code (SSC) detector is designed by combining an EGC and an active correlator in stage 2 [9]. In stage 3, eight complex-valued active de-spreaders are employed to de-spread all scrambling codes in the identified group.
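To see why de-spreading in four 64-chip partial symbols with noncoherent combining tolerates a large frequency error, consider the following noise-free Python sketch. It does not model the EGC or the hierarchical matched filter: the random ±1 code stands in for the real PSC, and the 20-kHz frequency error and all function names are hypothetical choices made for illustration. Only the 3.84-Mcps W-CDMA chip rate is a standard value.

```python
import cmath
import random

CHIP_RATE_HZ = 3.84e6  # W-CDMA chip rate


def make_code(length, seed=0):
    """Hypothetical +/-1 spreading code (a stand-in, not the real PSC)."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(length)]


def apply_frequency_error(code, freq_error_hz):
    """Rotate each chip by the phase accumulated from a frequency error."""
    return [c * cmath.exp(2j * cmath.pi * freq_error_hz * n / CHIP_RATE_HZ)
            for n, c in enumerate(code)]


def despread_metric(rx, code, num_segments):
    """Coherently de-spread each segment, then add the segment
    magnitudes noncoherently. num_segments=1 is full-symbol de-spreading."""
    seg_len = len(code) // num_segments
    total = 0.0
    for s in range(num_segments):
        start = s * seg_len
        acc = sum(rx[n] * code[n] for n in range(start, start + seg_len))
        total += abs(acc)
    return total


def compare(freq_error_hz=20e3):
    code = make_code(256)
    rx = apply_frequency_error(code, freq_error_hz)
    return despread_metric(rx, code, 1), despread_metric(rx, code, 4)
```

With no frequency error both metrics equal the full correlation peak of 256. With the assumed 20-kHz error, the phase rotates by more than one full turn across 256 chips, so the full-symbol sum largely cancels, while each 64-chip segment sees only about a third of a turn and keeps most of its energy.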
III. LOW-POWER HARDWARE DESIGN
After the first-phase design, a complete power analysis is performed through post-layout simulations in a 3.3-V 0.35-µm CMOS technology using EPIC's Powermill tool. The driving vector is generated from the system simulation model in [4], and its length is over 600 ms. Table II tabulates the details of power consumption and layout areas of all blocks.
As shown in the table, the de-spreaders and (non)coherent combiners are the power-critical blocks in cell search processing. In the de-spreaders, many shift registers are used in the EGCs and MFs of the PSC and SSC detectors. These shift registers serve as delay elements only, but all of their fields change value on every clock cycle. This value (level) changing in CMOS circuits results in unwanted power dissipation. To overcome this deficiency, a pointer-based FIFO buffer is used to replace the shift registers [10].
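The difference between the two delay-element styles can be sketched behaviorally in Python. This is not the circuit from [10]: the class names and the `writes` counter, which serves as a rough proxy for how many storage cells are rewritten per clock, are assumptions for illustration. Both classes delay the input by `depth` samples.

```python
class ShiftRegisterDelay:
    """Delay line in which every cell takes a new value on each clock."""

    def __init__(self, depth):
        self.cells = [0] * depth
        self.writes = 0  # proxy for switching activity

    def push(self, x):
        out = self.cells[-1]
        self.cells = [x] + self.cells[:-1]   # all cells shift by one
        self.writes += len(self.cells)
        return out


class PointerFifoDelay:
    """Delay line in which data stay in place and only a pointer moves."""

    def __init__(self, depth):
        self.cells = [0] * depth
        self.ptr = 0
        self.writes = 0  # proxy for switching activity

    def push(self, x):
        out = self.cells[self.ptr]           # oldest sample sits at the pointer
        self.cells[self.ptr] = x             # overwrite it with the newest one
        self.ptr = (self.ptr + 1) % len(self.cells)
        self.writes += 1
        return out


def compare_delays(depth=8, num_samples=40):
    sr, fifo = ShiftRegisterDelay(depth), PointerFifoDelay(depth)
    data = list(range(1, num_samples + 1))
    same = [sr.push(x) for x in data] == [fifo.push(x) for x in data]
    return same, sr.writes, fifo.writes
```

In this model the two lines produce identical outputs, but the shift register rewrites `depth` cells per clock while the pointer-based buffer rewrites one, which is the intuition behind the power saving. The pointer logic itself consumes some power that this sketch does not account for.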
In the (non)coherent combiners, the main operations are the multiplications or squarings used to calculate the true magnitude of signals. In cell search, however, the magnitude values are only compared with each other to find the largest one. Therefore, a new low-complexity approximation is proposed to reduce the power consumption. That is, the true magnitude is approximated by a weighted expression (1) whose weights take a form that requires only shift operations. Fig. 5 shows the curve plane of the proposed approximation, which is apparently much closer to that of the true magnitude than those of other approximations. Fig. 6 shows the simulation result obtained with this approximation. It is clear that the approximation induces almost no loss in search performance.
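A shift-only magnitude estimate of this general kind can be sketched in Python as follows. The exact weights of (1) are not reproduced here. The sketch instead uses the well-known "alpha-max plus beta-min" form with illustrative weights of 1 and 3/8 (3/8 = 1/4 + 1/8), which is an assumption and may differ from the paper's choice. The function name `approx_magnitude` is also hypothetical.

```python
import math


def approx_magnitude(i, q):
    """Approximate sqrt(i*i + q*q) for integers using only compares,
    shifts, and adds: max + min/4 + min/8 (illustrative weights,
    not necessarily those of the paper)."""
    big = max(abs(i), abs(q))
    small = min(abs(i), abs(q))
    return big + (small >> 2) + (small >> 3)


def relative_error(i, q):
    """Signed relative error of the approximation against the true magnitude."""
    true_mag = math.hypot(i, q)
    return (approx_magnitude(i, q) - true_mag) / true_mag
```

For example, `approx_magnitude(400, 300)` gives 400 + 75 + 37 = 512 against a true magnitude of 500. Because the detector only ranks candidates by magnitude, an estimate with a small, smoothly varying error of this kind can be sufficient, which is consistent with the negligible performance loss reported above.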
After applying the pointer-based FIFO buffer and the new magnitude calculator to the cell search ASIC, the power is reduced by 51%, from 133.6 mW in the original design to 65.49 mW. In addition, the chip area is reduced by 31.9%. Fig. 7 shows the power reduction of the redesigned blocks.
IV. IMPLEMENTATION RESULTS
The low-power cell search ASIC is implemented into a real chip through a top-down cell-based ASIC design approach 
V. CONCLUSION
A low-power cell search ASIC for the W-CDMA system has been designed and implemented. First, a low-complexity search algorithm that combines sample-point reordering, random sampling per frame, and partial symbol de-spreading is devised to counteract the effects of large frequency and sampling errors. The algorithm is then implemented down to layout with careful consideration for power consumption. Furthermore, the power consumption of the whole chip is analyzed, and critical blocks such as those for de-spreading and magnitude calculation are redesigned using pointer-based FIFO buffers and new weighted magnitude calculators. The power and area reductions of the chip are 51% and 31.9%, respectively. The low-power cell search ASIC consumes 65.49 mW at 15.36 MHz.
