Abstract-The third generation (3G) of cellular communications standards is based on wideband CDMA. The wideband signal experiences frequency-selective fading due to multipath propagation. To mitigate this effect, a RAKE receiver is typically used to coherently combine the signal energy received on different multipaths. An effective multipath searcher is, therefore, required to identify the delayed versions of the transmitted signal with low probability of false alarm and misdetection. This paper presents an efficient and novel WCDMA multipath searcher design and VLSI architecture that provides a good compromise between complexity, performance, and power consumption. Novel multipath searcher algorithms such as time-domain interleaving and peak detection are also presented. The proposed searcher was implemented in 0.18-μm CMOS technology and requires only 150 k gates for a total area of 1.5 mm², consuming 6.6 mW at 100 MHz. The functionality and performance of the searcher were verified under realistic conditions using a channel emulator.
the incoming signal with the spreading sequence yields energy peaks at the multipath locations. Furthermore, multipaths with a time separation larger than the chip period of the spreading sequence (for example, in WCDMA, the chip period is 260 ns [2]) can be independently resolved by the receiver. That is, correlating the incoming signal at these multipath locations generates independent information about the transmitted data.
CDMA receivers typically employ a RAKE architecture to demodulate the received signal [1]. In a RAKE architecture, a separate finger (a correlator receiver) is assigned to each detected multipath. The outputs of the fingers are then compensated for delay and phase and combined into one symbol. Correctly identifying the multipath profile at any given time is critical to the performance of the system. Using additional correct multipaths provides more signal energy to the RAKE receiver, while combining invalid multipaths increases the noise level. Furthermore, the multipath profile changes in time due to the dynamic nature of the wireless channel.
This paper describes a novel system and VLSI architecture for a multipath searcher in a WCDMA mobile receiver. The searcher is responsible for continuously monitoring the channel and determining the multipath profile. A microprocessor uses the profile data to allocate the available RAKE fingers to the different multipaths. The searcher addressed in this paper can be used in a multiantenna receiver, where each antenna is treated as an independent source of multipaths. The multipath searcher algorithm must simultaneously optimize the following three parameters.
1) Maximize the probability of detection of a valid multipath. Equivalently, minimize the probability of misdetection.
2) Minimize the probability of incorrectly declaring a valid multipath where none exists. This is known as the false-alarm probability.
3) Make sure that the multipath profile is current. This is critical when the receiver is in a high-Doppler environment (i.e., when it is moving) and in a birth-death multipath environment. This constraint implies that the profile estimation must be performed quickly and that the dwell time must be short [3].
We believe that the CDMA multipath searcher described in this paper presents a compromise between processing power, hardware complexity, and system performance. A number of algorithmic and architectural innovations and extensive simulation results are provided in this paper, which is organized as follows. Section II provides the system description and the basics of the proposed multipath searcher algorithm along with some simulation results. The implementation details, including novel techniques for offset interleaving and peak detection, are discussed in Section III. Section IV discusses the overall system on a chip (SOC) of which the searcher is a module. Finally, Section V presents the laboratory setup and measurement results. Conclusions are given last.
0018-9545/$20.00 © 2005 IEEE
II. SYSTEM DESIGN

A. CDMA Signal Structure
The WCDMA signal is generated by multiplying the data stream with a high-rate pseudonoise (PN) spreading code. Each data symbol is thus converted into a number of chips at a rate of 3.84 Mcps. The scrambling code is designed to have low autocorrelation for offsets greater than one chip. Thus, multipaths separated by at least one chip can be treated as separate data streams and correlated separately. The WCDMA standard provides a global common pilot channel to assist with determining the multipath delay and energy profiles.
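The low-autocorrelation property can be illustrated with a small sketch. It is not the WCDMA scrambling code: a seeded random ±1 sequence stands in for the PN code, the chips are real-valued, the channel is noiseless, and the helper names (`make_pn`, `multipath_channel`, `correlate`) are hypothetical. Under these assumptions, correlating the received signal against the PN sequence produces energy peaks at the two multipath delays.

```python
import random

def make_pn(length, seed=1):
    """Random +/-1 sequence used as a stand-in for a PN code (assumption)."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(length)]

def multipath_channel(tx, taps):
    """Sum delayed, scaled copies of tx. taps maps delay (chips) -> gain."""
    rx = [0.0] * (len(tx) + max(taps))
    for delay, gain in taps.items():
        for i, chip in enumerate(tx):
            rx[i + delay] += gain * chip
    return rx

def correlate(rx, pn, offset):
    """Correlate rx against the PN sequence starting at the given chip offset."""
    return sum(rx[offset + i] * pn[i] for i in range(len(pn)))

pn = make_pn(512)
# Two paths: full strength at 0 chips, half amplitude at 25 chips.
rx = multipath_channel(pn, {0: 1.0, 25: 0.5}) + [0.0] * 64
energies = [correlate(rx, pn, k) ** 2 for k in range(40)]
strongest = sorted(range(40), key=lambda k: energies[k], reverse=True)[:2]
```

The remaining offsets carry only the residual autocorrelation and cross terms of the sequence, which is why a threshold is needed to separate true multipaths from spurious peaks.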
B. Principle of Operation
The signal transmitted from a specific base station is initially detected using a cell searcher algorithm, which then provides the multipath searcher with the correct PN sequence aligned with the strongest multipath [4], [5]. The multipath searcher uses this PN sequence for correlation with the primary common pilot channel (P-CPICH) symbols in order to detect other multipaths. Once a multipath is detected, the multipath searcher provides the RAKE receiver with the offset of the multipath relative to a global time reference in order to correctly configure the newly assigned correlator finger. Since the wireless channel characteristics vary over time [6], the multipath searcher must operate continuously. According to the delay profiles used in the WCDMA standard as well as in commonly used channel models for outdoor propagation, the excess delay one can anticipate is around 20 μs ( chips). Therefore, the searcher should look for multipaths in a window of 20 μs before and after the current strongest multipath. The multipaths should be detected with an accuracy of at least one chip. This enables the timing tracking loop inside a RAKE finger to identify the correct sampling point where the maximum energy of the multipath is present [8]. Detecting multipaths with the precision of one chip assumes that only one sample per chip is used in the detection process. However, this single sampling point might not correspond to the peak energy of the searched multipath, introducing up to 4-dB degradation in the detected multipath power. This reduces the probability of detecting weak multipaths. Therefore, it is desirable to test a few samples per chip for each PN offset. However, increasing the resolution of multipath detection requires a proportional increase in either hardware or processing time. In our multipath searcher implementation, we have chosen to use two samples per chip, which provides half-chip resolution and less than 1-dB degradation.
This results in possible offsets that need to be scanned by the multipath searcher. In the design, we further used a 25% margin factor and due to implementation considerations (see Section III-A), the chosen multipath searcher can scan a window of 408 offsets.
The searcher must operate under a wide range of conditions. Possible scenarios include birth and death channel conditions, where one or more multipaths with a large fraction of the total received power suddenly disappear and reappear at a new offset due to shadow fading and corner effects. The searcher must find the multipath at the new location as soon as possible to avoid dramatic performance degradation. Therefore, there are scenarios where the searcher has to respond very quickly, within a few milliseconds. There are also situations when the searcher has found all the strong multipaths but should continue to search for weaker multipaths in order to improve the signal-to-noise ratio (SNR). In other conditions, the power level of different multipaths can change very dynamically in the presence of Doppler [10]. In this environment, the multipath searcher should not track the variations of the multipath power caused by fast fading. On the other hand, in the case of a slow fading (i.e., Hz) channel, multipaths can remain in a fade for a long period (i.e., at least 10 frames or 100 ms). The multipath searcher algorithm should be able to detect other, stronger multipaths and remove the fading multipath. This requires the searcher to finish the detection of all multipaths in less than 10 frames. These different scenarios require an algorithm for multipath management. The multipath management algorithm allocates RAKE fingers to multipaths found by the searcher based on the power of the multipaths, the availability of RAKE fingers, the current bit error rate (BER), the frame error rate (FER), and the SNR. This paper is focused on the multipath searcher itself and does not consider the multipath management algorithm. However, in order to implement an efficient multipath management algorithm, the multipath searcher block should offer a high degree of flexibility and reconfigurability to meet the various conditions imposed by the challenging wireless environment.
C. Multipath Searcher Algorithms
The problem of multipath searching in CDMA systems has been studied in [11]-[18]. However, most of the work reported in these papers focuses only on the system-level algorithms and does not discuss implementation issues or field trial performance. As a contribution of this paper to the multipath searching problem, all three levels (system algorithms, ASIC implementation, and field trial measurements) are addressed jointly.
A two-stage search algorithm consisting of an initial search and a verification stage has been proposed in [12] and has been adopted by others. The searcher described in this paper is based on this dual-dwell algorithm; however, major modifications have been introduced on the algorithmic level to improve the results and make the implementation more hardware efficient. To summarize, this paper makes the following contributions on the algorithmic and implementation levels.
• A third stage, referred to as the "detection stage," is added to the two-stage scheme proposed in [12]. This third stage helps in detecting relatively weak multipaths and hence reduces the probability of misdetection. Furthermore, detecting such weak multipaths enables the RAKE combiner to utilize more fingers and provide a higher SNR, which is especially useful in the case of a high data rate link. The RAKE architecture used in our design allows for utilizing as many as 10 multipaths per antenna if desired.
• The concept of dual threshold values is introduced to allow strong multipaths to be detected immediately, without going through the verification and detection stages.
• The birth and death channel condition, where a strong multipath appears or disappears suddenly due to channel shadowing, is considered and addressed in all the search stages as an important case. This case is particularly emphasized in the 3G standard channel models.
• An adaptive scheme is proposed to calculate all the threshold levels based on an estimated noise level. To better estimate the noise level, a leaky accumulation scheme is introduced.
• The proposed searching algorithm tests the delays within four consecutive samples at different times and hence takes advantage of the time diversity provided by this interleaving. The proposed interleaving improves searcher performance after the initial stage by almost 2 dB, with no penalty in hardware complexity or power consumption, as will be shown later.
• A peak detection stage is designed to avoid misdetection in the presence of overlapping multipaths less than a chip apart. The overlapping case is emphasized in the 3G standard channel models.
• To perform the coherent averaging against the PN code, we use a programmable and flexible bank of correlators as opposed to a group of matched filters. This approach provides a more power- and area-efficient implementation and a higher level of dynamic programmability of the available resources by the microprocessor (see [5] for more details on the advantages of correlator banks over matched filters).
The implementation and trial of the algorithms enable us to validate many of the design parameter choices (such as the different threshold values and the coherent and noncoherent correlation periods) in real-time scenarios. In previous work, these parameters were derived and validated only through computer simulations. In the following sections, we present a discussion of the algorithms and design parameters of the searcher architecture.
The proposed algorithm has three stages: initial search, verification of the candidates, and a detection stage where the decision on multipath offsets and powers is made. Fig. 1 depicts the conceptual block diagram of the multipath searcher algorithm. The three stages are pipelined, so that the detection time is reduced and the detection probability is maximized. All the inputs are processed during one frame, and the outputs are available for the next stage or for use at the end of the frame.
During the initial multipath search, the instantaneous power level for each PN offset (out of 408 possible offsets) is computed by correlation over 512 chips with half-chip resolution. To illustrate the output of the first stage, let us assume that there are two multipaths with relative power levels of 0 and dB at offsets 0 and 25 chips. The output of the initial multipath search for this scenario is depicted in Fig. 2. As we can observe from the figure, there are two high autocorrelation peaks that correspond to the two multipaths. However, there are other cross-correlation peaks of the PN codes present at the output. A major factor that directly affects the performance of the multipath searcher algorithm is the threshold level, which is required to detect multipaths and discard false peaks. In low-SNR or fast fading channels, this decision is not reliable. Only in the case of strong multipaths can we set a high threshold and detect a multipath with an acceptable false-alarm probability. In these conditions, the detection probability is high only for strong multipaths ( dB). However, a provision for detecting strong multipaths during the initial multipath search is invaluable, since these multipaths are available immediately and can be used to mitigate the adverse effects of birth and death channel conditions.
Weaker multipaths are detected using the subsequent multipath searcher stages. However, to reduce the processing requirements and improve the performance, a dual-dwell search approach is used, and among all the 408 possible offsets scanned during the initial search only a subset of offsets is selected for further observation in the verification stage. This subset must be chosen such that the detection probability is maximized. The first-stage false-alarm probability is therefore high, but it must be maintained at a reasonable level in order to limit the size of the subset. Therefore, two thresholds are used during the initial multipath searcher stage, as indicated in Fig. 2. The first threshold is relatively high and is used to select a high-power multipath. The high-power multipath should be used as soon as possible and will be referred to as the "urgent multipath." Note that for a specific frame, the first stage reports at most one urgent multipath. The urgent multipath offset is also among the candidates, so that if a false alarm occurs, it can be corrected after the verification stage. The second threshold is relatively low and is used to select the candidate set for verification in the subsequent multipath searcher stage. The design parameters for the initial stage are: the number of correlators required for screening 408 possible offsets, the threshold used for selection of urgent multipaths, and the threshold for selection of candidates.
The multipath searcher verification stage provides an accurate power estimate for all the multipath offset candidates provided by the initial multipath stage. The frame is divided into uniformly distributed averaging periods, as illustrated in Fig. 3. A 512-chip correlation is computed for each candidate offset in each averaging period. The power of each multipath candidate is given by
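The expression itself is not legible in this copy. As a hedged sketch only, one form consistent with the description of averaging the correlation power at each offset across the frame is shown below. The symbols are assumptions introduced here: $N_{\mathrm{avg}}$ is the number of averaging periods, and $c_{k,i}$ is the complex 512-chip correlation result for candidate $k$ in averaging period $i$.

$$P_k = \frac{1}{N_{\mathrm{avg}}} \sum_{i=1}^{N_{\mathrm{avg}}} \left| c_{k,i} \right|^2$$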
The design parameters for the verification stage are: the number of correlators required for the averaging, the maximum number of multipath candidates, and the number of averaging periods.
After the verification stage, an accurate estimate of the power for all candidates is available. Results from the verification stage can be used to detect additional multipaths. Candidates that exceed a given threshold are declared valid. The threshold should be selected such that the false-alarm probability is small and the detection probability is high for relatively strong multipaths ( dB). These detected multipaths are available for the multipath management algorithm two frames after the start of the multipath search. However, weaker multipaths ( dB dB) still cannot be detected with a high detection probability and a low false-alarm probability, due to the low signal-to-noise ratio on these multipaths and to cross-correlation peaks that are still of the same order of magnitude.
Detection of the weak multipaths requires postprocessing that is performed in the third stage. This stage is performed in the microprocessor using the results from the verification stage. At the end of each frame, the initial and verification multipath searcher stages output candidate offsets and their power values, respectively. The candidates are then monitored over several frames so that weak multipaths can be identified with a low false-alarm probability and a high detection probability. Essentially, weak multipaths have a relatively high probability of misdetection during the initial search. Therefore, we need to accumulate power for the different candidates over several frames. We also need to count how many times each multipath appeared among the candidates. On the other hand, falsely detected candidates after the initial stage are going to have a low power reported by the verification stage and are unlikely to be detected more than once by the initial stage. The final decision on the presence of a multipath at a specific offset is made in the detection stage based on the power reported for the candidates by the verification stage and on the offset reporting frequency. However, this procedure introduces a delay in the detection process. Fig. 4 illustrates the delay of five frames from the initial search to the detection of the multipath. Here, we assume that the multipath candidates are monitored over three frames in the detection stage. There are two design parameters in the third-stage detection process: the number of frames over which to monitor the candidates, and the threshold for the final multipath detection.
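The accumulate-and-count idea can be sketched as follows. This is an illustration under stated assumptions, not the decision rule used in the paper: the function name, the `min_hits` parameter, and the rule "mean power above a threshold and reported at least `min_hits` times" are all hypothetical. Each frame is represented as a mapping from candidate offset to the power reported by the verification stage.

```python
from collections import defaultdict

def detect_weak_multipaths(frames, power_threshold, min_hits=2):
    """Accumulate power and count appearances of each offset over several frames.

    frames: list of dicts {offset: verified_power}, one dict per frame.
    Returns the sorted offsets whose mean power exceeds power_threshold and
    that were reported in at least min_hits frames (hypothetical rule).
    """
    total_power = defaultdict(float)
    hits = defaultdict(int)
    for frame in frames:
        for offset, power in frame.items():
            total_power[offset] += power
            hits[offset] += 1
    return sorted(
        offset
        for offset in total_power
        if hits[offset] >= min_hits
        and total_power[offset] / hits[offset] > power_threshold
    )

# Three monitored frames: offset 10 recurs, offsets 50 and 77 appear only once.
frames = [{10: 2.0, 50: 3.0}, {10: 2.2}, {10: 1.8, 77: 0.5}]
detected = detect_weak_multipaths(frames, power_threshold=1.5)
```

In this example the one-off candidate at offset 50 is rejected despite its high power, which mirrors the observation that false candidates are unlikely to be reported more than once.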
III. IMPLEMENTATION
A. Initial Search
As discussed in Section II-B, the required multipath searcher window length, with a 25% design margin factor, is 400 offsets. Let $N$ represent the number of correlators required in the initial stage to compute the 512-chip correlations at the 400 different offsets during one frame of 38 400 chips. The number of required correlators is determined by
$$N \ge \frac{400 \times 512}{38\,400} \approx 5.33$$
therefore, $N = 6$. Note that using six correlators, the search window is now 408 candidate offsets at half-chip resolution, and 3584 chips are available for overhead operation in the initial stage implementation. If the searcher is used in a dual-antenna receiver, 12 correlators will be required for the initial stage implementation. Each of the 408 magnitudes must be compared with two thresholds to determine whether the offset should be considered as a candidate multipath and verified in the second stage, reported directly to the microprocessor as an urgent multipath, or simply discarded. The thresholds are defined as a function of the noise estimate through the urgent detection factor (UDF) and the signal detection factor (SDF), which are design parameters.
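The numbers quoted above (six correlators, 408 offsets, 3584 spare chips) can be checked with a few lines of arithmetic. The sketch assumes a 38 400-chip frame (10 ms at 3.84 Mcps) and assumes that offsets are processed in blocks of 24 (six correlators over four 512-chip slots, as in the interleaving scheme of Section III-A-3); reading the 408-offset window as 17 such blocks is an interpretation, not a statement from the text.

```python
import math

FRAME_CHIPS = 38400      # one 10-ms frame at 3.84 Mcps
CORR_LEN = 512           # chips per correlation
SLOTS_PER_BLOCK = 4      # assumed: each block of offsets uses four 512-chip slots

def initial_stage_budget(required_offsets=400):
    """Return (correlators, window size in offsets, spare chips per frame)."""
    n_corr = math.ceil(required_offsets * CORR_LEN / FRAME_CHIPS)
    block = n_corr * SLOTS_PER_BLOCK              # offsets covered per block
    n_blocks = math.ceil(required_offsets / block)
    window = n_blocks * block
    busy_chips = n_blocks * SLOTS_PER_BLOCK * CORR_LEN
    return n_corr, window, FRAME_CHIPS - busy_chips

budget = initial_stage_budget()
```

Under these assumptions the result is six correlators, a 408-offset window, and 3584 chips left over, in agreement with the figures in the text.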
1) Adaptive Noise Estimation:
Correctly estimating the noise level is the determining factor for the performance of the initial multipath searcher stage. Initial simulations showed that using a noise estimate based on the mean of the correlation energies from all 408 offsets gives good results in AWGN and slow fading environments. However, in fast fading environments ( Hz), both the multipath detection and false-alarm rates were at unacceptable levels. Therefore, we introduce a novel method for continuously computing the noise estimate based on leaky averaging of the correlation energies of previous offsets (see [19]). The block diagram of the adaptive noise estimation is given in Fig. 5; its parameters are the leaky coefficient and the noise detection factor (NDF). NDF is used to make sure that only nonmultipath correlations contribute to the noise estimate (i.e., if the correlation energy for a specific offset is greater than NDF times the current noise estimate, the correlation energy is not considered as noise and the noise estimate is not updated). Extensive simulations for both AWGN and fading channel conditions were conducted to assess the performance of the proposed adaptive noise estimation algorithm and to determine reasonable values for the searcher parameters. Note that all the parameters are fully programmable in the VLSI implementation.
The relative values of NDF, SDF, and UDF have a considerable effect on the searcher performance. Based on the simulations, we determined that the best value for NDF is 4, which discards any correlation result that is more than 6 dB above the computed noise floor. We then explored the space for SDF and UDF. For higher SDF, fewer candidates exceed the candidate threshold and the misdetection probability increases. On the other hand, too many candidates are passed to the second stage for lower SDF, making the second stage very hardware intensive. The optimal value appears to be . For that combination of parameters, fewer than 40 candidates are generated by the first stage in 95% of the simulated channels.
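A minimal sketch of how the adaptive noise estimate and the two first-stage thresholds might interact is given below. Only NDF = 4 comes from the text. The first-order leaky update, the multiplicative form of both thresholds, and the values of `alpha`, `sdf`, and `udf` are assumptions for illustration; the actual structure is the one in Fig. 5.

```python
def initial_stage_classify(energies, noise_init, alpha=0.125,
                           ndf=4.0, sdf=6.0, udf=20.0):
    """Classify correlation energies against noise-relative thresholds.

    Returns (urgent_offset or None, list of candidate offsets, final noise estimate).
    alpha, sdf and udf are hypothetical values; ndf = 4 follows the text.
    """
    noise = noise_init
    urgent = None
    candidates = []
    for offset, energy in enumerate(energies):
        if energy > udf * noise:
            # At most one urgent multipath per frame: keep the strongest.
            if urgent is None or energy > energies[urgent]:
                urgent = offset
            candidates.append(offset)   # urgent offsets are still verified later
        elif energy > sdf * noise:
            candidates.append(offset)
        # Leaky update, skipped when the energy looks like a multipath.
        if energy <= ndf * noise:
            noise = (1.0 - alpha) * noise + alpha * energy
    return urgent, candidates, noise

energies = [1.0] * 10 + [50.0] + [1.0] * 5 + [8.0] + [1.0] * 5
urgent, candidates, noise = initial_stage_classify(energies, noise_init=1.0)
```

In this toy input the strong path at offset 10 is flagged as urgent and also kept as a candidate, the weaker path at offset 16 becomes a candidate only, and neither one contaminates the noise estimate.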
2) Computing TH1: The UDF parameter must be selected to minimize the false-alarm probability, since any candidate that exceeds TH1 is passed to the microprocessor without further verification. Fig. 6 shows the simulation results for the misdetection and false-alarm probabilities for the "urgent" multipath in a fading channel with a Doppler of 100 Hz and two equal-strength multipaths with a separation of 25 chips. From Fig. 6 we can observe that the misdetection probability is less sensitive to the UDF factor than the probability of false alarm. It is interesting to note that the false-alarm probability increases as the SNR increases. This phenomenon is a consequence of basing the threshold on the noise estimate. Essentially, when the SNR is high, the noise estimate becomes a small number. Since the cross-correlation peaks do not depend on the noise level, when the threshold becomes smaller the large cross-correlation peaks are more likely to exceed the threshold. This effectively increases the probability of false alarm.
TABLE I: INTERLEAVED OFFSET CORRELATION
3) Offset Interleaving: The multipath searcher must be robust to a variety of interference sources, both noise and multipath interference. In the proposed implementation of the initial multipath searcher stage, six correlators simultaneously correlate the incoming signal to test the different offsets for multipaths. In order to improve the performance of the multipath searcher, we introduce the principle of interleaved offset searching in the initial stage. The principle of interleaving data to make the system more robust to time-varying, bursty noise sources is well known. The idea is that if multiple decisions can be used interchangeably, it is beneficial to make these decisions at different times. Our proposed searcher tests the delays within four samples, as shown in Table I, at different times and hence takes advantage of the time diversity provided by interleaving. With six correlators, it takes four time slots (each 512 chips long) to perform the correlations. This results in a correlation cycle that repeats every 24 correlations (6 correlators × 4 correlations each). In the straightforward approach, consecutive offsets are processed at the same time and all experience the same interference and PN cross-correlation effects. In the proposed interleaving scheme, however, correlations for offsets 0, 4, 8, 12, 16, and 20 are performed in the first slot, then offsets 1, 5, 9, 13, 17, and 21 in the second slot, and so on. By doing so, consecutive offsets are now processed one time slot (512 chips) apart, and the characteristics of the PN cross-correlation effects and interference change significantly over 512 chips. The interleaving improves searcher performance after the initial stage by almost 2 dB, as shown in Fig. 7. The block of 24 correlation results is postprocessed as discussed later.
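The slot assignment described above can be generated programmatically. The sketch below reproduces the offsets listed in the text for one block of 24 offsets; the function name and the `block_start` parameter are hypothetical.

```python
def interleaved_schedule(block_start=0, n_correlators=6, n_slots=4):
    """Return, for each 512-chip slot, the offsets tested by the correlators.

    Correlator c tests offset block_start + slot + n_slots * c in a given slot,
    so consecutive offsets are tested in different slots.
    """
    return [
        [block_start + slot + n_slots * c for c in range(n_correlators)]
        for slot in range(n_slots)
    ]

schedule = interleaved_schedule()
# schedule[0] -> [0, 4, 8, 12, 16, 20]; schedule[1] -> [1, 5, 9, 13, 17, 21]
```

Each of the 24 offsets in the block appears exactly once, and any two neighboring offsets fall in different slots, which is the source of the time diversity.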
4) Peak Detection:
The time resolution of a WCDMA system is limited to the duration of one chip. Indoor wireless channels frequently have multipaths that are separated by a fraction of a chip and therefore cannot be resolved. Scenarios with closely spaced multipaths present a special challenge to the searcher. Furthermore, the pulse shaping filter (raised-cosine filter with a roll-off factor of 0.22 for WCDMA) spills energy around the actual multipath location. Consider the scenario shown in Fig. 8(a), where two multipaths are separated by one chip. The power in the lobes of the two multipaths is combined at offset 3 and results in a peak that exceeds the set threshold. However, there are clearly only two multipaths in this scenario. A peak detection algorithm is therefore employed to prune the spurious peaks and reduce the number of candidates that have to be verified in the second stage. The basic idea of the algorithm is to declare a peak only if it is larger than its two immediate neighbors (at ±1/2 chip offset). The peak detection can be performed on any block of consecutive offsets. In our implementation, the block size is 24. Fig. 8(b)-(d) shows how the peak detection algorithm proceeds at different steps.
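The basic local-maximum rule can be sketched as follows. This covers only the idea stated in the text (a peak must exceed the threshold and both immediate neighbors); it does not reproduce the stepwise procedure of Fig. 8(b)-(d). Treating missing neighbors at the block edges as negative infinity is an assumption, as are the function name and the sample energies.

```python
def detect_peaks(energies, threshold):
    """Return offsets that exceed the threshold and both immediate neighbors."""
    peaks = []
    last = len(energies) - 1
    for k, energy in enumerate(energies):
        if energy <= threshold:
            continue
        # Assumed edge handling: a missing neighbor never blocks a peak.
        left = energies[k - 1] if k > 0 else float("-inf")
        right = energies[k + 1] if k < last else float("-inf")
        if energy > left and energy > right:
            peaks.append(k)
    return peaks

# Two paths one chip apart (offsets 2 and 4 at half-chip sampling); their
# side lobes add up at offset 3 and exceed the threshold there.
energies = [0.1, 0.6, 1.0, 0.7, 1.0, 0.6, 0.1]
peaks = detect_peaks(energies, threshold=0.5)
```

Offsets 1, 3, and 5 exceed the threshold but are not local maxima, so only the two true multipath offsets are kept as candidates.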
5) VLSI Implementation:
The overall block diagram of the first multipath searcher stage for one antenna is shown in Fig. 9 . As discussed in Section III-A, six correlators are used to compute 512-chip correlations of the input signal. Only one PN generator is used to generate the PN codes used in the initial stage. A delay line is used to generate the required offsets for all six parallel correlators within one symbol. A two-chip delay implements the interleaving by 4 discussed in Section III-A-3) since each chip is made up of 2 samples. The PN generator continuously generates PN codes for 2 symbols (1024 chips) since the first two symbols are in the same chip. The correlators automatically select the chip offset depending on the current symbol number. The PN generator then holds for one chip since the next two symbols are delayed by one chip. After 1024 chips the PN generator halts for a duration of 12 cycles. The reason for this delay is that the next peak detection block will start at sample number 24 which is offset by 12 chips from the current location of the PN generator. The timing diagram is shown in Fig. 10 .
Each block of 24 correlation results is passed to the peak detection block as discussed in Section III-A-4). Finally up to 36 candidates are stored in a double buffer to be used by the second stage in the verification phase. 
B. Verification Stage
The candidates generated in the first stage are verified in the second stage. The verification stage gives greater confidence in the candidates by providing an accurate measurement of the energy at each candidate offset, obtained by averaging the power at each offset several times across the frame. The following formula shows the relationship between the number of correlators and the number of averaging periods:
$$\#\,\text{correlators} = \frac{\#\,\text{candidates} \times \#\,\text{averaging periods} \times 512}{38\,400}$$
For example, if we have 36 candidates and we wish to average each offset 10 times over 512 chips, then the formula above results in 4.8 as the number of correlators. However, we need some guard time at the end of the frame to finish the processing, report the results back to the microcontroller, and prepare the correlators for the next frame. Therefore, we decided to use six correlators instead. By doing so, the number of chips required to finish the processing is 30 720 (using the same formula, solved for the number of chips), which gives us an extra 7680 chips at the end of the frame to handle the operation overhead.
At the end of a frame, the searcher computes and stores the average correlation powers for every candidate identified in the first stage. The tuples containing the offset and its power are passed to the microprocessor. The microprocessor compares each power with a threshold and identifies the strong multipaths. A multipath is considered to be strong if dB for the multipath. Offsets with powers that exceed the threshold are passed directly to the multipath management routine, while the others are saved for processing in the detection stage.
The threshold must ensure a low probability of false alarm regardless of the SNR and the Doppler fading rate while providing an acceptable probability of detection for the strong multipath. The threshold can no longer be based on the adaptive scheme described in Section III-A-1), since correlation results from across the frame are averaged. Since the initial search runs in parallel with the second stage, we use the average energy of the correlations for the 408 offsets in the window to provide an estimate of the overall noise and cross-correlation in the current frame. We will refer to this estimate as SS. Therefore, we can use this interference estimate SS as the basis for the threshold calculation in the second stage. The threshold is defined through the strong multipath threshold (SMT) factor . Extensive simulations under both AWGN and fading conditions show that provides acceptable performance for and . Representative simulation results are shown in Fig. 11.
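The worked example above (36 candidates, 10 averaging periods) can be checked numerically. The sketch assumes a 38 400-chip frame; the function names are hypothetical.

```python
FRAME_CHIPS = 38400   # one 10-ms frame at 3.84 Mcps
CORR_LEN = 512        # chips per correlation

def correlators_needed(n_candidates, n_periods):
    """Fractional number of correlators needed to fill exactly one frame."""
    return n_candidates * n_periods * CORR_LEN / FRAME_CHIPS

def chips_needed(n_candidates, n_periods, n_correlators):
    """Chips needed to finish all correlations with a given correlator count."""
    return n_candidates * n_periods * CORR_LEN // n_correlators

needed = correlators_needed(36, 10)   # 4.8 correlators
busy = chips_needed(36, 10, 6)        # 30 720 chips with six correlators
spare = FRAME_CHIPS - busy            # chips left for overhead
```

With six correlators, the processing occupies 30 720 chips and leaves 7680 chips of guard time per frame.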
The false-alarm probability has an unexpected behavior at higher signal-to-noise ratios: it actually increases when there is less noise. The explanation for this behavior is that when the SNR increases, the noise estimate becomes relatively small. Therefore, the cross-correlation peaks are more likely to exceed the threshold, similarly to what was observed in the first stage. This phenomenon can be prevented by using a different threshold when the noise estimate becomes small. The new threshold is defined by where is a programmable parameter. At low SNR, the original definition is used, but at high SNR the threshold is defined by a ratio of the largest correlation value and the second largest candidate. For example, assuming a channel profile where it is expected that candidate multipaths are 6 dB less than the strongest multipath, the correct value to use for is .
1) VLSI Implementation:
As discussed above, the verification stage computes 512-chip correlations at up to 36 different offsets determined by the first stage. Each of the offsets is evaluated at 10 different points in the frame, and the average correlation power is presented to the microprocessor. The correlations are done using six correlators in parallel. The 36 possible offsets are split into six groups of six offsets, and each correlator handles one group. The offsets in each group might be spaced far apart. This consideration requires a dedicated PN code generator for each correlator, as shown in Fig. 12. The algorithm and timing diagram for the verification stage are nontrivial and are described next. It is important to note that the offsets generated by the first stage are sequentially ordered and are spaced at least one chip apart. This constraint allows us to reduce the number of times the PN code generator needs to be initialized. Consider the first correlator working on offsets . The timing diagram is shown in Fig. 13, and the algorithm is summarized here.
1) Read the first offset from the stage 1 buffer. Note that the offsets are in the range .
2) Wait until the "virtual start of frame" that corresponds to the center of the multipath search window. This signal is common to the first and second searcher stages.
3) Initialize the PN code generator to chip 0 and start incrementing every chip.
4) Wait until chip number . The symbol correlations must always start on symbol boundaries. The pilot channel being correlated, CPCH, does not have distinct symbol boundaries since it is simply a stream of "1"s. However, other WCDMA channels are transmitted at the same time and would increase interference unless symbol boundaries on those channels are observed.
5) Start the correlation and correlate over 512 chips. The result of this correlation corresponds to the first component of the average power for the first multipath candidate . Save in an accumulator for the first multipath candidate.
6) Pause the PN generator and wait for chips. The new state of the PN generator corresponds to a multipath candidate at offset .
7) Continue from step 5 except considering the first component of the second multipath candidate .
8) Repeat steps 5-7 six times to compute the first components of all the multipath candidates. This completes the first averaging period. Note that the averaging period is 7 × 512 chips long to allow enough time for the 6 correlation periods and the wait times between successive correlations.
9) Wait until chip 7 × 512.
10) Continue from step 3 except the PN generator is initialized to 7 × 512 and the second averaging component of all the multipaths is being considered.
11) Repeat these steps 10 times to accumulate the multipath components over 10 averaging periods.
12) Pass the accumulated results, given by , to the .
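The timing of the steps above can be illustrated with a minimal Python sketch for a single correlator. It is not the authors' implementation. It assumes that offsets are whole chips, sorted, and no larger than 512 chips, and that the initial wait in step 4 equals the first offset; none of these values are stated in the text reproduced here. The function name `schedule_correlator` and the 38 400-chip frame length (one 10 ms frame at 3.84 Mcps) are introduced only for illustration.

```python
CORR_LEN = 512               # chips per correlation (from the text)
PERIOD_LEN = 7 * CORR_LEN    # chips per averaging period (from the text)
N_PERIODS = 10               # averaging periods accumulated (from the text)
FRAME_LEN = 38400            # chips in one 10 ms WCDMA frame at 3.84 Mcps


def schedule_correlator(offsets):
    """Return (period, candidate, start_chip, pn_phase) tuples for one correlator.

    Assumptions (hypothetical, not from the paper): offsets are integer
    chips, sorted in increasing order, and lie in 0..CORR_LEN so that all
    waits of one period fit into the spare 512 chips of that period.
    """
    offsets = list(offsets)
    if offsets != sorted(offsets) or not all(0 <= d <= CORR_LEN for d in offsets):
        raise ValueError("offsets must be sorted and lie in 0..CORR_LEN")

    events = []
    for period in range(N_PERIODS):
        t = period * PERIOD_LEN         # elapsed chips since virtual start of frame
        pn_phase = period * PERIOD_LEN  # PN generator is re-initialized here
        previous = 0
        for k, d in enumerate(offsets):
            t += d - previous           # PN paused: time advances, phase does not
            previous = d
            events.append((period, k, t, pn_phase))
            t += CORR_LEN               # correlate over 512 chips
            pn_phase += CORR_LEN
        # The six correlations and their waits must fit in one averaging period.
        assert t <= (period + 1) * PERIOD_LEN
    return events


events = schedule_correlator([3, 10, 40, 41, 200, 500])
```

Under these assumptions every correlation starts at a PN phase that is a multiple of 512 chips, and the ten averaging periods occupy 10 × 7 × 512 = 35 840 chips, which fits within a single 38 400-chip frame.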
C. Detection Stage
After the verification stage, we have an accurate estimate of the power for all multipath candidates. Although strong multipaths can be detected reliably based on the verification stage, weaker multipaths that need to be used in the RAKE receiver might still not be consistently detected among cross-correlation peaks in the presence of noise. Therefore, we need to further process the results from the verification stage in the multipath searcher detection stage.
As discussed earlier, in the second stage there is a tradeoff between misdetection and false alarm probability as we change the threshold . Therefore, it is impossible to improve the misdetection probability while keeping the false alarm probability constant or below a certain limit. The detection stage is designed to overcome this problem by monitoring the candidates reported by the initial and verification stages over consecutive frames.
Pulse shaping filtering spills some energy around the actual multipath offset. Therefore, for example, if a multipath is offset by of a chip with the samples used for the multipath searcher, the initial stage will detect energy at the offsets that are of a chip around the multipath. In other cases where the noise level is large compared to the multipath energy, due to the interleaving process we can detect a multipath at the samples at of a chip around the multipath even if the multipath is perfectly aligned with the sample used by the initial stage. Therefore, a multipath will be assumed to exist at position for a specific frame if the initial stage reports a candidate at position , or . The energy for the multipath at position is the average energy of the reported candidates at position , or . The detection works as follows.
1) Monitor results from the initial and verification stages for consecutive frames.
2) Average the reported energy for each offset in the search window over the consecutive frames (note that we average the energy only over the frames where the offset is reported as a candidate).
3) If a particular offset is reported fewer than times over consecutive frames, set its average energy to zero.
4) If the averaged energy for a particular offset is below , where is the noise computed over the search window in the initial stage averaged over the frames, set its average energy to zero.
5) Perform the peak detection (see Section III-A-4).
6) Offsets for which the average energy is nonzero are the final multipath candidates.
Therefore, by using a majority vote, the detection stage is able to reduce the probability of false alarm. This in turn allows the designer to use a smaller threshold SMT and thus improve the probability of detection for weaker multipaths. Furthermore, by averaging the energy over several frames, the is also improved.
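The majority vote and energy averaging of the detection stage can be sketched in Python as follows. This is a simplified illustration, not the authors' implementation: it covers steps 1-4 and 6 only, omitting the merging of neighboring sample positions and the peak detection of step 5. The names `detection_stage`, `min_reports`, and `energy_floor` are hypothetical, and the noise-derived limit of step 4 is passed in directly as `energy_floor` because its exact form is not reproduced here.

```python
from collections import defaultdict


def detection_stage(frames, min_reports, energy_floor):
    """Return {offset: average energy} for the surviving candidates.

    frames       -- one dict {offset: energy} per monitored frame, holding
                    the candidates reported for that frame
    min_reports  -- minimum number of frames in which an offset must be
                    reported (hypothetical name for the vote limit)
    energy_floor -- noise-derived limit of step 4, supplied by the caller
    """
    total = defaultdict(float)
    count = defaultdict(int)

    # Steps 1-2: accumulate energy only over frames that report the offset.
    for reported in frames:
        for offset, energy in reported.items():
            total[offset] += energy
            count[offset] += 1

    survivors = {}
    for offset in total:
        average = total[offset] / count[offset]
        if count[offset] < min_reports:   # step 3: too few reports
            continue
        if average < energy_floor:        # step 4: too close to the noise
            continue
        survivors[offset] = average       # step 6: nonzero average energy
    return survivors


frames = [{10: 4.0, 20: 1.0}, {10: 6.0, 20: 1.0}, {10: 5.0, 30: 9.0}]
result = detection_stage(frames, min_reports=2, energy_floor=2.0)
```

In this made-up example, offset 20 is reported often enough but falls below the energy floor, while offset 30 is strong but reported in only one frame, so only offset 10 survives.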
The larger the , the better the decision that can be made in the detection stage, since more correlation results from different frames are available. However, the delay due to large values of can be a problem if the channel profile actually changes over the frames (the multipath searching algorithm has to complete a multipath detection before the channel profile changes). Using frames, and taking into consideration the time used by the initial and verification stages, makes the entire searching process equal to 5 frames or 50 ms. Figs. 14 and 15 depict the simulation results for and show acceptable performance at both very low and very high Doppler frequencies in a reasonable search time of five frames or 50 ms. In both figures, two multipaths are present, with the power of the first multipath set to 0 dB and the power of the second multipath varied with respect to the first.
IV. SYSTEM ON A CHIP IMPLEMENTATION
The searcher presented in this paper is part of a complete modem implementing a dual antenna WCDMA transceiver. The overall structure of the entire system on a chip (SoC), including the searcher as part of the modem, is presented in Fig. 16. The SoC integrates two ARM cores operating at up to 110 MHz.
The physical layer ARM (PHY) is responsible for controlling and configuring the baseband modem, channel [de]coding, interfacing to the RF front end via SPI, and interfacing to the PRO ARM. The protocol (PRO) ARM microcontroller is purely for higher layer operations (protocol stack) as well as interfacing with the host through a cardbus interface. An external SDRAM and FLASH are used to store both data and code for both processors.
The modem section encompasses all digital signal processing for the physical layer excluding decoding and logical channel handling. The modem is implemented in a flexible hard-coded datapath. The hard-coded datapath approach allows the use of low power techniques such as canonic-signed-digit multiplier implementation and optimized signal precision. The PHY has full control over the resources in the modem and can turn them off with fine granularity to reduce power consumption. ADC inputs from up to two antennas are processed to obtain the soft symbols. Once initial synchronization to within one chip period is established, fine frequency and timing loops are activated to lock on and track the detected multipaths. The searcher unit performs a scan for new multipath components and updates the microcontroller accordingly. RAKE fingers are assigned to each detected multipath component via the microcontroller interface. The ASIC described can simultaneously track, process, and combine up to 20 multipaths from the two antennas. This requires on the order of 6 Giga operations per second (Gops). The modem is capable of handling 2 Mbps on the receive path and 768 Kbps on the transmit path, thus conforming to the highest class of communication per the WCDMA standard. Fig. 17 shows the die photo of the SoC, while Table II presents the overall SoC implementation statistics. As observed from the table, the total power consumption at 384 Kbps is 550 mW. The searcher unit was implemented in a total of 150 k gates in 1.5 mm² consuming 6.6 mW at 100 MHz.
V. LABORATORY TESTING
The testing of a complex SoC such as the one described in this paper is a major task. A testbed platform was developed specifically to validate the functionality of the ASIC and to measure the real-time performance. The test setup is shown in Fig. 18. The ASIC was tested under a variety of conditions using both digital baseband and RF signals generated by a WCDMA transmitter. A WCDMA signal generator is used to emulate a base station. The generated signals are then passed through a wireless channel emulator that can be programmed to emulate different wireless conditions. The emulator then generates RF signals that are passed to an analog RF front end that downconverts the signal back to baseband and presents it to the SoC. The transmitter section of the SoC is also connected to a transmitter tester to measure the error vector magnitude (EVM) of the transmitted symbols.
A. Test Results
The modem was tested with the channels provided by the standard. The test results presented illustrate the case of single versus dual antenna performance. The criterion used to validate the performance of the searcher was meeting the 3GPP mandated block error rate (BLER) of 0.01 for both single and dual antenna cases for a 12.2-Kbps dedicated data channel. In all test cases, the searcher presented in this paper performed satisfactorily in locating and tracking the multipaths while maintaining the BLER mandated by the standard. We present performance results for three different kinds of test cases that directly test the searcher under different conditions.
1) Case 1: Static test cases in which the multipaths have fixed values and offsets that do not change with time. This scenario has four different multipaths at respective delays of 0, 1040, 2084, and 3124 ns with power levels at 0, , , and dB respectively. The speed is set to 120 km/h. Performance curves are presented in Fig. 19.
2) Case 2: The second is a time varying multipath scenario in which one multipath is fixed while the other multipath changes its location (relative to the fixed multipath) with time according to the following equation where s, s and s . The two multipaths are of equal strength. Performance curves are presented in Fig. 20.
3) Case 3: The final scenario is a birth and death scenario in which multipaths periodically appear and disappear abruptly. Two equal strength paths, Path1 and Path2, are chosen randomly from the offset distribution of s. The path profile is updated every 191 ms. Performance curves are presented in Fig. 21.
VI. CONCLUSION
A complete multipath searcher suitable for a dual antenna WCDMA (3G) receiver was described. Both the algorithms and the VLSI implementation were presented and justified. The searcher exemplifies a good software/hardware partitioning by using the μP only for control intensive and long-duration tasks. The searcher was implemented as part of a complete WCDMA transceiver and its operation was verified in laboratory trials. The transceiver was implemented in 0.18 μm CMOS technology with the searcher using only 150 k gates and a total area of 1.5 mm².
Eugene Grayver (S'00-M'01) received the B.S. degree in electrical engineering from the California Institute of Technology, Pasadena, and the Ph.D. degree from the University of California, Los Angeles, in 2000.
He was one of the founding team members of Innovics Wireless, Los Angeles, a fabless semiconductor company working on low-power ASICs for multiantenna 3G mobile receivers. In 2003, he joined the Aerospace Corporation, El Segundo, CA, where he is working on flexible communications platforms. His research interests include reconfigurable implementations of digital signal processing algorithms, adaptive computing, low-power VLSI circuits for communications, and system design of wireless data communication systems. He has seven journal publications and more than 12 conference papers.
Jean-François Frigon
