Abstract: Two low power flexible Rake receiver architectures are presented. The first architecture exploits the statistical distribution of multipath delays in wireless channels to reduce power dissipation. The second Rake architecture is based on a tradeoff between algorithm accuracy and circuit complexity. By introducing a negligible performance degradation, the SRAM memory for the input sample buffer is eliminated, achieving low power consumption and small silicon area. Both Rake architectures are targeted for third generation WCDMA mobile terminals (downlink receivers), but the circuits can also be applied to base station (uplink) receivers. The architectures have been synthesized in a 0.18 µm standard cell CMOS technology using Cadence BuildGates. The proposed architectures achieve significant area and power savings as compared to previous circuits described in the literature.
I. INTRODUCTION
A Rake receiver is an integral part of a wireless CDMA communication system, combining the received signals from several multipaths into a common decision statistic with significantly improved signal-to-noise ratio (SNR) and robustness to channel fading. A FlexRake architecture is proposed in [1] among multiple channel decoders. A low power alternative to the original FlexRake receiver, employing parallel correlation engines for several data channels, is presented in [2]. A common drawback of the architectures proposed in [1] and [2] is the significant power and area required for the input stream buffer. The efficient memory organization of the sample buffer is the primary focus of this paper.
The signals traveling along different multipaths from a transmitter to a wireless receiver can be assumed to have a uniform (or Poisson) delay distribution, such that L multipath signals arrive with equal probability during a time period [0, T max]. The delay spread T max is discretely represented by N sample = T max / T sample sample moments. Each chip of the received direct sequence CDMA signal is represented by N spc samples (N spc is typically between two and six) produced by an analog-to-digital converter (ADC). Each of the multipath signals is locked on one of N spc samples within each chip and all data chips are despread by the appropriate pseudonoise (PN) code phase to form data bit decision statistics. In a conventional flexible Rake receiver all received input samples are stored in the SRAM memory buffer [1, 2]. The multipath delay information from the channel estimation and signal acquisition and tracking blocks is used to address the individual multipaths in the buffer.
An approach for reducing power consumption based on the statistical distribution of the multipath delays is described in section II. A second flexible Rake architecture is proposed in section III, where the SRAM memory block of the input buffer is eliminated, albeit with a minor degradation in algorithm accuracy. Results from the logic synthesis of both architectures are presented in section IV and compared to previous reports.
II. REDUCTION OF INPUT BUFFER WRITE ACCESSES
The number of SRAM write accesses can be reduced significantly because the delays of the detected multipath signals are statistically distributed and the probability that all stored samples are used is low. Only a fraction of the stored samples is therefore used in the decoding process. For all cases where L < N spc, there is redundancy in the recorded input samples since no more than L sample slots are needed at a time. For L equally distributed multipaths, the probability that exactly N x of all N spc sample slots are occupied is equal to the difference between the probability that each multipath delay falls within N x sample slots and the probability that all delays fall within
(1)
Based on this observation, a tag buffer of the used sample slots is created. Only the samples from these sample slots are written into the SRAM stream buffer. An architecture incorporating these changes is shown in Fig. 2. Blocking excessive sample writes significantly reduces the dissipated power. A conventional FlexRake receiver architecture requires N spc * F chip write accesses per second [1], where F chip is the chip frequency of the pseudonoise code (3.84 Mchips/sec for the WCDMA standard [3]). Alternatively, the number of memory write accesses in the proposed receiver is a probabilistic function with a mean value,
where P L-Nn is the probability that exactly N n of all N spc sample slots are occupied. The distribution of N write_new for different N n and L parameters is shown in Fig. 3 . Example: For L = 3 multipaths and N spc = 4 samples per chip, the average number of SRAM write accesses performed by the circuit shown in This result demonstrates a significant reduction (by more than 70%) in the number of write accesses as compared to a conventional FlexRake architecture [1, 2] where
The probability distribution for L multipaths to occupy the N spc sample slots is plotted in Fig. 3 for N spc = 4. Based on this distribution, the average power savings from reducing the number of write accesses (as compared to the conventional FlexRake receiver [1]) are shown in Fig. 4 for N spc = 4 and a varying number of multipath signals. For cases where N spc > L, there are always unoccupied sample slots, so a significant number of redundant samples are not written into the SRAM buffer. Alternatively, when N spc ≤ L, the savings originate from the probabilistic nature of the wireless multipath signals and up to 10% of the write accesses are avoided.
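The slot occupancy argument of this section can be illustrated with a short calculation. The following is a minimal sketch, assuming that each of the L multipaths locks independently and uniformly onto one of the N spc sample slots. It uses a textbook inclusion-exclusion count rather than the expression in (1), so its values need not reproduce those quoted in the text or plotted in Figs. 3 and 4. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb

F_CHIP = 3_840_000  # WCDMA chip rate in chips/s, as quoted in the text


def occupancy_distribution(L, n_spc):
    """Return [P(exactly n slots occupied) for n = 0..n_spc].

    Assumption: L paths fall independently and uniformly on n_spc slots.
    """
    dist = []
    for n in range(n_spc + 1):
        # Ways to map L paths ONTO n given slots (inclusion-exclusion).
        onto = sum((-1) ** j * comb(n, j) * (n - j) ** L for j in range(n + 1))
        # Choose which n slots are occupied, divide by all n_spc**L mappings.
        dist.append(Fraction(comb(n_spc, n) * onto, n_spc ** L))
    return dist


def mean_write_rate(L, n_spc, f_chip=F_CHIP):
    """Mean SRAM writes per second when only occupied slots are written."""
    dist = occupancy_distribution(L, n_spc)
    mean_slots = sum(n * p for n, p in enumerate(dist))
    return mean_slots * f_chip


def write_savings(L, n_spc):
    """Fraction of writes avoided relative to n_spc writes per chip."""
    return 1 - mean_write_rate(L, n_spc) / (n_spc * F_CHIP)


if __name__ == "__main__":
    for L in range(1, 7):
        print(L, float(write_savings(L, 4)))
```

Exact rational arithmetic is used so that the distribution sums to one without rounding error. A different delay model, such as the one behind (1), would change the numbers but not the structure of the calculation.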
III. ELIMINATION OF SRAM MEMORY BLOCK
The input stream buffer, implemented as a single-port SRAM memory, is the most power and area consuming block of the flexible Rake receivers proposed in [1], [2], and described in section II. All of the samples of the received signal are stored in this buffer. Individual multipath signals are selected by a memory-addressing scheme based on the estimated delay of the signal. The FlexRake receivers described in [1, 2] are algorithmically identical to the classical Rake receiver algorithm [4]. It is typical in a wireless environment that the delay of the L strongest multipath signals remains unchanged for relatively long periods of time, while the amplitude of the received signals fluctuates. For such environments, detecting new multipath signals is a rare event. A new architecture, illustrated in Fig. 5, addresses this property. The key idea in this architecture is that the sample buffer is reduced to N spc memory cells (registers) and the multipath signals are identified by selecting an appropriate phase of the decoding pseudonoise code.
Eliminating the SRAM memory block is achieved at the price of increased power consumed to switch the phases of the code generators (according to the delays of the multipaths being decoded). The primary architectural tradeoff is between the area and power of accessing an SRAM memory versus the resources required to change the phases of the decoding code generators (pseudonoise and orthogonal). The OVSF (Orthogonal Variable Spreading Factor) code phase is easily changed by loading a new value into a 10-bit counter [5] . The pseudonoise (PN) code generator, however, is composed of two 25-bit registers [3] . Changing the values in the two registers is less power efficient than operating several generators in parallel. This approach also provides flexibility to power down unused generators when fewer multipath signals are decoded. The most efficient approach is to combine the architecture shown in Fig. 5 with the parallel correlation engines described in [2] , as shown in Fig. 6 . The appropriate sample slots are selected for each despreading block based on multipath tracking information. Several code generators and arithmetic units operate in parallel. Significant power is saved by reducing the frequency of operation and eliminating the frequent changes in the PN code phase. The optimal amount of parallelism depends on the expected wireless channel conditions and the number of orthogonal data channels.
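The selection of a sample slot and a code phase per finger can be sketched in software. The following is an illustrative model only, not the synthesized circuit: it assumes a noise-free real-valued channel, a short repeating spreading code in place of the WCDMA PN and OVSF codes, and known path delays and gains. All names (make_received, rake_register_buffer, SF) are hypothetical.

```python
import random

N_SPC = 4  # samples per chip, as used in the synthesis section
SF = 16    # spreading factor, chosen small for illustration (assumption)


def make_received(bits, code, paths, n_spc=N_SPC):
    """Toy channel: sum of delayed, scaled copies of the spread signal.

    paths is a list of (delay_in_samples, gain) pairs.
    """
    chips = [b * c for b in bits for c in code]
    tx = [chip for chip in chips for _ in range(n_spc)]  # hold each chip
    pad_chips = max(d for d, _ in paths) // n_spc + 1
    rx = [0.0] * (len(tx) + pad_chips * n_spc)
    for delay, gain in paths:
        for m, s in enumerate(tx):
            rx[m + delay] += gain * s
    return rx


def rake_register_buffer(rx, code, paths, n_spc=N_SPC):
    """Despread with only n_spc buffered samples per chip.

    Each finger reads one register (its sample slot) and runs its own
    code phase, delayed by the whole-chip part of the path delay.
    """
    sf = len(code)
    fingers = [
        {"slot": d % n_spc, "offset": d // n_spc, "gain": g, "acc": 0.0, "out": []}
        for d, g in paths
    ]
    for n in range(len(rx) // n_spc):
        regs = rx[n * n_spc:(n + 1) * n_spc]  # the only stored samples
        for f in fingers:
            k = n - f["offset"]  # chip index of this finger's code generator
            if k < 0:
                continue  # path has not arrived yet
            f["acc"] += regs[f["slot"]] * code[k % sf]
            if k % sf == sf - 1:  # end of symbol: dump the correlator
                f["out"].append(f["acc"])
                f["acc"] = 0.0
    n_sym = min(len(f["out"]) for f in fingers)
    # Combine finger outputs weighted by the (assumed known) path gains.
    combined = [sum(f["gain"] * f["out"][j] for f in fingers) for j in range(n_sym)]
    return [1 if y >= 0 else -1 for y in combined]


if __name__ == "__main__":
    rng = random.Random(0)
    code = [rng.choice((-1, 1)) for _ in range(SF)]
    bits = [1, -1, -1, 1, 1, -1]
    paths = [(2, 1.0), (7, 0.25), (13, 0.2)]
    print(rake_register_buffer(make_received(bits, code, paths), code, paths) == bits)
```

In this model a finger never addresses older samples. A path delay of d samples is split into a slot index d mod N spc and a code phase offset of d div N spc chips, which mirrors the tradeoff described above between buffering samples and shifting the code generators.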
From an algorithmic perspective, a delay is introduced when decoding a new multipath signal since there are only N spc buffered samples. The maximum decoding delay is equal to the delay spread T max and depends upon when the multipath detection occurs. During this period, the data bit decision statistic is formed from the detected signals of the remaining L-1 multipaths. Typical WCDMA systems perform channel estimation on a slot-by-slot basis, where a slot consists of ten control bits [3] . Wireless channels are modeled as quasi-stationary, assuming constant characteristics over short time periods. For such typical environments, the degradation in the detection performance, measured in terms of the bit error rate (BER) and frame error rate (FER), is negligible. The architecture shown in Fig. 6 can also be considered as returning to the conventional Rake receiver with parallel fingers [4] . It is complemented by the concept of sample slots and flexible multipath allocation by dynamic changes of the code phases. An analysis of the efficiency of this approach reveals considerable advantages as compared to FlexRake receivers based on an SRAM buffer [1, 2] .
IV. LOGIC SYNTHESIS AND SIMULATION
The following parameters are used for synthesizing the proposed architectures:
Bit width of received samples: 4 bits (8-bit word for the I/Q samples)
N spc = 4 samples per chip
Delay spread: T max = 33 µs, covered by 512 samples
The flexible Rake receiver architectures proposed in Figs. 2 and 6 are synthesized in 0.18 µm standard cell CMOS technology [6, 7] using Cadence BuildGates. The primary focus of this section is on the resources required for the input stream buffer since the realization of this circuit is the primary difference among the proposed architectures and the original FlexRake receivers [1, 2]. Synthesis results for the memory block are summarized in Table 1 for the most common cases of L = 3 and L = 4 multipath signals (the number of orthogonal code data channels can be higher). As listed in Table 1, the amount of power dissipated by the SRAM block is significant as compared to the 1.55 mW [1] and 0.44 mW [2] dissipated in the logic portion of the FlexRake receiver.
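The relation N sample = T max / T sample can be checked against the synthesis parameters with a few lines of arithmetic. The sketch below uses the values stated in this section and the WCDMA chip rate of 3.84 Mchips/sec. Rounding the buffer depth up to a power of two is an assumption made here to arrive at 512 entries, and the function name is hypothetical.

```python
import math

F_CHIP = 3.84e6  # chips per second (WCDMA)
N_SPC = 4        # samples per chip
T_MAX = 33e-6    # delay spread in seconds
WORD_BITS = 8    # one I/Q sample: 4 bits I plus 4 bits Q


def buffer_dimensions(f_chip=F_CHIP, n_spc=N_SPC, t_max=T_MAX, word_bits=WORD_BITS):
    """Size the conventional stream buffer from the channel parameters."""
    sample_rate = n_spc * f_chip               # samples (and writes) per second
    n_sample = math.ceil(t_max * sample_rate)  # sample moments within T max
    # Assumption: depth is rounded up to the next power of two.
    depth = 1 << (n_sample - 1).bit_length()
    return {
        "sample_rate": sample_rate,
        "n_sample": n_sample,
        "depth": depth,
        "sram_bits": depth * word_bits,
    }


if __name__ == "__main__":
    for name, value in buffer_dimensions().items():
        print(name, value)
```

The sample rate returned here is also the conventional write rate N spc * F chip discussed in section II.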
The first proposed architecture dissipates less power since the (average) number of write accesses is reduced. The power dissipation of the stream buffer can be reduced by about 38% when the number of multipaths L is lower than the number of samples per chip N spc. Alternatively, a smaller power reduction is possible by tracking the unused sample slots when L ≥ N spc.
In the second approach, proposed in section III, the sample buffer is implemented as three 8-bit registers (as shown in Fig. 6), significantly reducing the area and power as well as simplifying the control logic. The area of the small register file is considerably smaller than the SRAM memory required for the stream buffer in the previous architectures. The power savings analysis should also consider the correlation engine. As discussed in section III, it is more power efficient to operate several (typically, the expected number of multipaths L) parallel correlators with independent code generators as compared to a single correlation engine operating at a very high frequency. Both the area and power dissipated by the sample registers are significantly smaller than the resources required for the SRAM memory. Depending on the expected wireless channel conditions, different configurations for a partially or completely parallel correlation engine are more efficient. The architectures presented in section III are particularly appropriate for CDMA communications with infrequent changes in the multipath delay structure. A small delay spread T max is also desirable to minimize the average period of degraded decoding performance based on fewer multipath signals.
V. CONCLUSIONS
Two architectures for low power flexible Rake receivers are presented based on alternative memory organizations of the input sample buffer block. Both architectures provide power savings as compared to a conventional FlexRake receiver. In the second approach, the SRAM block is eliminated at the cost of a minor degradation in algorithmic accuracy and the additional power required for code phase changes in the code generators. Different approaches are preferable for different wireless channel conditions.
