A flexible and reconfigurable receiver architecture for WCDMA high data rate connections is presented. The proposed architecture consists of a single computational unit that demodulates one channel path and suppresses one term of the inter-path interference (IPI), with minimal configuration logic and routing. This unit is used serially to perform the complete channel demodulation and IPI suppression. It is controlled by the supervisor, an intelligent architectural element, which optimises system performance under a computational power constraint.
Introduction: High data rate connections can be supported in WCDMA systems by employing low spreading factors while keeping the bandwidth fixed. If extremely low spreading factors (e.g. 2 or 4) are applied, it is likely that only one high data rate user is present in the system [1]. The other users employ significantly higher spreading factors and are therefore received at lower power. Hence the multiple access interference (MAI) can be considered negligible, and inter-path interference (IPI) becomes the main cause of performance degradation for the conventional RAKE receiver.
A significant performance gain can be obtained with an interference cancellation approach that does not treat IPI as unstructured white noise but exploits the structured nature of the interference. The multi-stage inter-path interference canceller (IC) introduced in our previous work [2, 3] showed a considerable performance improvement. This cancellation scheme suppresses IPI using a process close to RAKE demodulation, which makes it well suited to a reconfigurable and flexible implementation.
In this Letter we present a novel architecture for WCDMA mobile terminals that implements this scheme. The proposed architecture combines hardware efficiency, flexibility and controllability.
Flexible and reconfigurable architecture: Instead of using a number of dedicated fingers to perform channel demodulation and a separate functional block to mitigate IPI at the correlator output, the proposed architecture performs demodulation and multi-stage IPI cancellation sequentially on a single, well-controlled computational unit. A detailed block diagram of the proposed architecture is shown in Fig. 1. It contains three main units: the stream memory, the computational unit and the supervisor (SPV).
A. The stream memory: The multi-stage interference cancellation is a block algorithm (BLE) and operates on data blocks. In WCDMA, data transmission is organised in time slots. Thus we can take one block to be one slot, and its maximum available processing time equals the slot duration. The stream memory consists of two two-port SRAMs. The first, SRAM chip, stores the slot currently being received, and the second, SRAM symbol, serves as the data source for processing the previous slot. Fig. 2 shows the relative timing and overlap of the operations (SRAM chip WR/RD and SRAM symbol WR/RD) that avoid data hazards. The input slot arrives in the SRAM chip as I/Q sample pairs from the pulse-shaping filter. Since the read access frequency is higher than the write access frequency, the necessary size of the SRAM chip can be calculated with the following formula
where N_SRAMchip is the size of the SRAM chip in bits, T_slot is the slot duration, T_spread is the delay spread, T_c is the chip duration, R is the oversampling ratio and N_bits is the word length of both the I and Q samples.
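A minimal sketch of this sizing rule follows, assuming that the SRAM chip must hold one slot plus the delay-spread tail, i.e. (T_slot + T_spread)/T_c chips, sampled R times per chip, with one N_bits word each for I and Q. The exact form of the formula, the helper name `sram_chip_bits`, the standard WCDMA chip rate of 3.84 Mcps and the 6-chip delay spread (borrowed from the channel profile used for Fig. 3) are assumptions for illustration only.

```python
from fractions import Fraction
import math


def sram_chip_bits(t_slot, t_spread, t_c, r, n_bits):
    """Estimated SRAM chip size in bits (hypothetical helper).

    Assumes the memory holds (T_slot + T_spread)/T_c chips, R samples per
    chip, and one n_bits word for each of I and Q.
    """
    chips = math.ceil((t_slot + t_spread) / t_c)  # slot plus delay-spread tail
    return chips * r * 2 * n_bits


T_C = Fraction(1, 3_840_000)   # assumed WCDMA chip rate of 3.84 Mcps
T_SLOT = Fraction(2, 3) / 1000  # 2/3 ms slot, i.e. 2560 chips

if __name__ == "__main__":
    # 10-bit I/Q samples and R = 4; the 6-chip delay spread is an assumption
    print(sram_chip_bits(T_SLOT, 6 * T_C, T_C, r=4, n_bits=10))  # -> 205280
```

Exact rational arithmetic (`Fraction`) is used so that T_slot/T_c evaluates to exactly 2560 chips rather than a floating-point approximation.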
The SRAM symbol is used as the data input for the IC algorithm. It stores the RAKE output and the output of the previous IC stage. Thus its necessary size can be calculated as
where N_SRAMsymbol is the size of the SRAM symbol in bits, SF is the spreading factor and N'_bits is the word length of both the I and Q soft decisions of the decision device.
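The SRAM symbol sizing can be sketched in the same spirit, assuming it holds two slots' worth of symbols (one for the RAKE output, one for the previous IC stage output), T_slot/(SF·T_c) symbols each, with one N'_bits word for each of I and Q. The two-buffer form, the helper name `sram_symbol_bits`, the 3.84 Mcps chip rate and the 8-bit soft-decision width in the example are assumptions, not values from the Letter.

```python
from fractions import Fraction
import math


def sram_symbol_bits(t_slot, t_c, sf, n_bits_soft, buffers=2):
    """Estimated SRAM symbol size in bits (hypothetical helper).

    buffers=2 assumes one slot of RAKE outputs plus one slot of outputs
    from the previous IC stage; each symbol stores I and Q soft decisions.
    """
    symbols_per_slot = math.ceil(t_slot / (sf * t_c))
    return buffers * symbols_per_slot * 2 * n_bits_soft


T_C = Fraction(1, 3_840_000)  # assumed 3.84 Mcps chip rate
T_SLOT = Fraction(2, 3000)    # 2/3 ms slot

if __name__ == "__main__":
    for sf in (2, 4):  # the extremely low spreading factors discussed above
        print(sf, sram_symbol_bits(T_SLOT, T_C, sf, n_bits_soft=8))
```

Because the symbol count scales as 1/SF, the lowest spreading factor dictates the worst-case size.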
With 10-bit I/Q samples, R = 4, a slot duration of 2/3 ms, a chip rate f
B. Computational unit: The proposed unit contains all the computational, storage and configuration resources needed for the demodulation of one channel path and the suppression of one IPI term. Because it is time-multiplexed between the multipath components and the discrete IPI terms, its operating frequency must be high enough for all operations to complete within the processing cycle defined by the time slot. Its identification is a key contribution of this work. It contains two trivial multipliers, two integrators, one complex multiplier and three multiplexers that implement the reconfiguration.
In RAKE mode, the data coming from the SRAM chip are chip samples. According to the channel estimate and the symbol and path being processed, one chip sample arrives at the computational unit in each clock cycle. The first trivial multiplier is switched off and the second performs the despreading: in each clock cycle it multiplies the input chip sample by the appropriate spreading and scrambling value. The resulting signals are integrated over a period equal to the spreading factor. Partial integration results are stored in an integration register, which is initialised to zero for each processed path and symbol. The integrator output is multiplied by the complex conjugate of the channel coefficient in the complex multiplier. Finally, a second integrator performs maximal ratio combining (MRC). Here the integration period equals the number of tracked paths, and the register is again initialised to zero for every processed symbol. The resulting symbol soft decisions are stored in the SRAM symbol for further processing.
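The RAKE-mode data path can be modelled behaviourally in a few lines. This is a sketch under simplifying assumptions that are not taken from the Letter: chip-rate sampling (R = 1), real ±1 chips standing in for the combined spreading and scrambling code, perfect channel estimates, no noise, and hypothetical function names (`transmit`, `rake_demodulate`). The loop order mirrors the serial use of the single unit: one chip per clock, a first integrator reset per path and symbol, the conjugate-channel multiplication, and a second integrator performing MRC over the tracked paths.

```python
import random


def transmit(symbols, code, paths, sf):
    """Noise-free chip-rate multipath signal (R = 1 assumed).

    paths: list of (delay in chips, complex gain) from channel estimation.
    """
    n_chips = len(symbols) * sf
    rx = [0j] * (n_chips + max(d for d, _ in paths))
    for m in range(n_chips):
        chip = symbols[m // sf] * code[m]  # spreading/scrambling
        for delay, gain in paths:
            rx[m + delay] += gain * chip
    return rx


def rake_demodulate(rx, code, paths, sf, n_sym):
    """Serial RAKE: one (symbol, path) pair per pass through the single unit."""
    soft = []
    for n in range(n_sym):
        mrc = 0j                       # 2nd integrator: reset per symbol
        for delay, gain in paths:      # period = number of tracked paths
            acc = 0j                   # 1st integrator: reset per path and symbol
            for k in range(sf):        # one chip sample per clock cycle
                m = n * sf + k
                # 2nd trivial multiplier: despread (real chips, so no conjugate)
                acc += rx[m + delay] * code[m]
            mrc += acc * gain.conjugate()  # complex multiplier: conjugate channel
        soft.append(mrc)
    return soft


if __name__ == "__main__":
    random.seed(1)
    sf, n_sym = 4, 8
    qpsk = [complex(i, q) for i in (-1, 1) for q in (-1, 1)]
    symbols = [random.choice(qpsk) for _ in range(n_sym)]
    code = [random.choice((-1.0, 1.0)) for _ in range(n_sym * sf)]  # real +/-1 chips
    paths = [(0, 0.8 + 0.2j), (3, -0.4 + 0.5j), (6, 0.3 - 0.6j)]   # delays [0 3 6] T_c
    soft = rake_demodulate(transmit(symbols, code, paths, sf), code, paths, sf, n_sym)
    for s, y in zip(symbols, soft):
        print(s, y)
```

With a single path the output is exactly SF·|h|²·b for every symbol; with the three-path profile the output also contains the cross-path terms, which are the IPI that the cancellation stages target.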
In interference cancellation mode, for the initial stage, the data coming from the SRAM symbol are the symbol soft decisions of the RAKE combiner. In each clock cycle, one symbol is passed through the decision device to produce an estimate of the corresponding transmitted symbol. In our proposal the decision function is a hybrid of hard and soft decision. The estimated symbol is fed into the first trivial multiplier, which implements the spreading. Each chip produced is then fed into the second multiplier, which implements the despreading. The spreading/scrambling codes used have different phases, corresponding to the IPI term being generated. The resulting signals are integrated over a period equal to the spreading factor. The integrator output is multiplied by the combination of the two channel paths involved, and its two's complement (that is, its negative) is taken. Finally, a second integrator subtracts the generated IPI terms from the RAKE decision: the initial value of its register is the RAKE decision of the symbol being processed, and its integration period equals the number of IPI terms considered. The resulting decision symbols are stored in the SRAM symbol. For the subsequent stages of the IC algorithm, the input symbol soft decisions are the stored outputs of the previous IC stage.
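One IC stage can be sketched behaviourally as below, under simplifying assumptions that are not part of the Letter: R = 1, real ±1 chips, perfect channel knowledge, a noise-free signal and hypothetical function names. For each symbol, every ordered pair of distinct paths (i, j) yields one IPI term: the re-spread symbol estimates are despread with the code phase shifted by the delay difference, integrated over SF chips, weighted by conj(h_i)·h_j, negated and accumulated onto the RAKE output. A plain hard QPSK decision replaces the hybrid hard/soft decision device, whose exact rule is not specified here. The transmitter and RAKE helpers are included so that the block runs on its own.

```python
import random


def transmit(symbols, code, paths, sf):
    """Noise-free chip-rate multipath signal (R = 1 assumed)."""
    n_chips = len(symbols) * sf
    rx = [0j] * (n_chips + max(d for d, _ in paths))
    for m in range(n_chips):
        chip = symbols[m // sf] * code[m]
        for delay, gain in paths:
            rx[m + delay] += gain * chip
    return rx


def rake_demodulate(rx, code, paths, sf, n_sym):
    """Despread each path, weight by the conjugate channel, combine (MRC)."""
    return [sum(gain.conjugate() * sum(rx[n * sf + k + delay] * code[n * sf + k]
                                       for k in range(sf))
                for delay, gain in paths)
            for n in range(n_sym)]


def hard_decision(z):
    """Hard QPSK decision: stand-in for the hybrid hard/soft decision device."""
    return complex(1.0 if z.real >= 0 else -1.0, 1.0 if z.imag >= 0 else -1.0)


def ic_stage(rake_soft, decisions, code, paths, sf):
    """Subtract the K(K-1) reconstructed IPI terms from each RAKE output."""
    n_sym = len(rake_soft)
    n_chips = n_sym * sf
    # 1st trivial multiplier: re-spread the estimated symbols
    s_hat = [decisions[m // sf] * code[m] for m in range(n_chips)]
    out = []
    for n in range(n_sym):
        acc2 = rake_soft[n]                # 2nd integrator starts at the RAKE decision
        for i, (d_i, h_i) in enumerate(paths):
            for j, (d_j, h_j) in enumerate(paths):
                if i == j:
                    continue               # only cross-path (IPI) terms
                acc1 = 0j                  # 1st integrator over SF chips
                for k in range(sf):
                    m = n * sf + k
                    src = m + d_i - d_j    # code phase of the (i, j) IPI term
                    if 0 <= src < n_chips:  # chips outside the block are zero
                        acc1 += s_hat[src] * code[m]  # 2nd trivial multiplier
                # negate (two's complement) and accumulate
                acc2 -= acc1 * h_i.conjugate() * h_j
        out.append(acc2)
    return out


def multistage_ic(rake_soft, code, paths, sf, stages):
    """Each stage re-decides on the previous output; the RAKE output stays the seed."""
    soft = rake_soft
    for _ in range(stages):
        decisions = [hard_decision(z) for z in soft]
        soft = ic_stage(rake_soft, decisions, code, paths, sf)
    return soft


if __name__ == "__main__":
    random.seed(1)
    sf, n_sym = 4, 16
    qpsk = [complex(i, q) for i in (-1, 1) for q in (-1, 1)]
    symbols = [random.choice(qpsk) for _ in range(n_sym)]
    code = [random.choice((-1.0, 1.0)) for _ in range(n_sym * sf)]
    paths = [(0, 0.8 + 0.2j), (3, -0.4 + 0.5j), (6, 0.3 - 0.6j)]  # delays [0 3 6] T_c
    rake = rake_demodulate(transmit(symbols, code, paths, sf), code, paths, sf, n_sym)
    ic = multistage_ic(rake, code, paths, sf, stages=3)
    errors = sum(hard_decision(z) != s for z, s in zip(ic, symbols))
    print("symbol errors after 3 IC stages:", errors)
```

With correct decisions the cancelled output reduces exactly to SF·Σ|h_l|²·b[n]. In the multi-stage loop only the symbol estimates are refreshed, while the second integrator is always seeded with the RAKE output, as described in the text.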
C. Supervisor: The SPV is an intelligent architectural block that controls and synchronises the operations of the computational unit and the stream memory. It translates the time delay line (TDL) given by the channel estimation into memory addresses and provides the appropriate read/write addresses for tracking a particular channel path or reconstructing an IPI term. For the computational unit, the SPV provides the appropriate reconfiguration and input signals to its structural blocks and synchronises their operation.
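The TDL-to-address translation can be pictured with a small sketch. It assumes a hypothetical memory layout that the Letter does not specify: the SRAM chip is a circular buffer of `depth` words, one I/Q pair per word, written in arrival order at R samples per chip, and the channel estimator reports path delays in chips. Both function names are illustrative.

```python
def tdl_to_sample_offsets(delays_chips, r):
    """Convert estimated path delays (in chips) to sample offsets at R samples/chip."""
    return [round(d * r) for d in delays_chips]


def chip_read_addresses(symbol, offset, sf, r, depth, base=0):
    """SRAM chip addresses read for one symbol on one path, one per clock cycle.

    Chip k of symbol n on a path with sample offset `offset` is assumed to sit
    at base + (n*SF + k)*R + offset, wrapped around the circular buffer.
    """
    first = base + symbol * sf * r + offset
    return [(first + k * r) % depth for k in range(sf)]


if __name__ == "__main__":
    offsets = tdl_to_sample_offsets([0, 3, 6], r=4)  # delay profile used for Fig. 3
    for path, off in enumerate(offsets):
        print(path, chip_read_addresses(symbol=1, offset=off, sf=4, r=4, depth=10_264))
```

Stepping by R words reads one sample per chip; a fractional-chip delay simply selects a different sample phase within each group of R words.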
An important task of the SPV is to optimise system performance under a well-defined computational power constraint: the maximum number of clock cycles that can be used to process one symbol. This number can be calculated as f_clk/f_sym = a·SF,
where f_clk is the clock frequency, f_sym = f_c/SF is the symbol frequency and a is defined as the ratio f_clk/f_c. This constraint limits the number of serial iterations of the computational unit. With a pipelined implementation of the computational unit, the throughput is one process per SF cycles. Thus, if L is the number of resolvable channel paths, the above constraint can be written as
where K(K − 1) is the number of IPI terms under consideration, with 0 ≤ K ≤ L, and V is the number of stages. At run time, the SPV selects these parameters, taking K as large as possible. With L = 3 and a = 32, we can use K = 3 and V ≈ 5. Fig. 3 shows the BER performance of the proposed reception scheme for a WCDMA-FDD high data rate downlink connection. The radio channel has three independent Rayleigh-fading paths of equal average power, P = [0 0 0] dB, with delays τ = [0 3 6]T_c. The proposed algorithms achieve better performance than the conventional RAKE, block minimum mean square error (BLE-MMSE) and block zero-forcing (BLE-ZF) [4] receivers.
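The run-time parameter choice can be illustrated with a short sketch. It assumes that each RAKE path and each IPI term costs one pipelined process of SF cycles per symbol, so the budget of a·SF cycles reduces to L + V·K(K − 1) ≤ a; this exact form, and the policy of fixing the largest K before maximising V, are assumptions of the sketch, and the function names are hypothetical. For L = 3 and a = 32 it gives V ≤ 29/6 ≈ 4.8, in line with the V ≈ 5 quoted above, while a strict integer budget admits four full stages.

```python
def cycles_per_symbol(num_paths, k, v, sf):
    """Cycles used per symbol: L RAKE passes plus V*K(K-1) IPI passes, SF cycles each."""
    return (num_paths + v * k * (k - 1)) * sf


def max_stages(num_paths, a, k):
    """Largest V with L + V*K(K-1) <= a (assumed form of the budget)."""
    ipi_terms = k * (k - 1)
    spare = a - num_paths  # passes left after the RAKE
    if ipi_terms == 0 or spare < 0:
        return 0
    return spare // ipi_terms


def select_parameters(num_paths, a):
    """Hypothetical SPV policy: largest K (2 <= K <= L) allowing one stage, then largest V."""
    for k in range(num_paths, 1, -1):
        v = max_stages(num_paths, a, k)
        if v >= 1:
            return k, v
    return 0, 0  # budget only covers the RAKE (no IPI terms cancelled)


if __name__ == "__main__":
    sf, a, num_paths = 4, 32, 3
    k, v = select_parameters(num_paths, a)
    print(k, v, cycles_per_symbol(num_paths, k, v, sf), a * sf)  # -> 3 4 108 128
```

Under this assumed budget a fifth full stage would need 132 cycles per symbol against 128 available, which is why the integer result falls just short of the approximate V ≈ 5.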
