The integration of multimedia services over wireless channels calls for the provision of variable quality of service (QoS) requirements. While radio resource management algorithms (such as power control and call admission control) can provide certain levels of variability in QoS, an alternate approach is to use reconfigurable radio architectures to provide diverse QoS guarantees. In this paper, we outline a novel reconfigurable architecture for linear multiuser detection, thereby providing a wide range of bit-error-rate requirements amongst the constituent receivers of the reconfigurable architecture. Specifically, we focus on achieving this dynamic reconfiguration via a software radio implementation of linear multiuser receivers. Using a unified framework for achieving this reconfiguration, we partition functionality into two core technologies (field programmable gate arrays (FPGA) and digital signal processor (DSP) devices) based on processing speed requirements. We present experimental results on the performance and reconfigurability of the software radio architecture as well as the impact of fixed point arithmetic (due to hardware constraints).
1 Introduction
Future wideband CDMA systems will be required to support multimedia traffic with varied characteristics and quality of service (QoS) requirements. The integration of diverse services in the same system motivates the need for increased capacity as well as dynamic QoS control. An important type of spread spectrum is direct sequence code division multiple access (DS-CDMA), used for communication in a multiple access environment. In a conventional CDMA system, all users interfere with each other. Potentially significant capacity increases and near/far resistance can theoretically be achieved if the negative effect that each user has on others can be canceled. A more fundamental view of this is multiuser detection [1], in which all users are considered as signals for each other. Then, instead of interfering with each other, the users are all being used for their mutual benefit by joint detection. The drawback of optimal multiuser detection is its complexity, so suboptimal approaches are being sought. A wide range of performance/complexity combinations is possible (see [1, 2, 3] and the references therein). Much of the present research is aimed at finding appropriate tradeoffs between complexity and performance. The theory of multiuser detection has thus far outpaced its applicability. This paper aims at advancing the theory in directions which will facilitate its practical application. Specifically, we focus on developing software radio architectures for implementing linear multiuser detectors for wideband CDMA communications.
In the past, radio systems have migrated from analog to digital in several respects, from system control to source and channel coding to hardware technology. Recent progress in radio prototyping has extended the horizons of radio technology by removing the chronic dependence on hard-wired characteristics, including frequency band, channel bandwidth and channel coding. This effort has led to the feasibility of dynamically programmable receivers, or software radios [4]. The evolution of programmable hardware and increased flexibility via increased programmability has been accomplished by a combination of techniques. These include multiband antennas and RF conversion, wideband analog to digital (A/D) and digital to analog (D/A) conversion and the implementation of IF, baseband and bit-stream processing functions in general purpose programmable processors. In general, a canonical software radio architecture can be thought of as the comprehensive, consistent set of functions, components and design rules according to which systems of interest may be organized, designed and constructed. For a specific architecture, the key design issue is the partitioning of functions and components such that functions are assigned to components and interfaces among components correspond to interfaces among functions. In an advanced application, a software radio does not just perform signal reception, but also entails several other specialized functions. These include characterizing energy distributions in adjacent channels, recognizing the mode of incoming transmission, adaptive interference rejection, adaptive channel estimation and equalization as well as forward error correction. Further, software radios also support incremental service enhancements through a wide range of software tools. These tools may be used in analyzing the radio communications channels, improving the design for optimum performance, and testing and prototyping.
An example of such a software radio implementation for CDMA systems is presented in [5], where a simple successive interference cancelation scheme [6] was implemented. The novelty of the design in [5] included the appropriate partitioning of functionality and design to allow for dynamic reconfiguration via programmable processors.
In this paper, we address the issue of dynamically reconfiguring linear multiuser architectures/detectors with a view to providing a dynamic QoS in an integrated multiple service wireless environment such as those in future wideband CDMA systems. The approach in this work is based on dynamically reconfiguring architectures (via software, i.e., a software radio) amongst different linear multiuser detectors to achieve a range of bit-error-rate (BER) requirements. Specifically, we consider four different linear receivers, namely, the matched filter receiver, the approximate decorrelator [7], the exact decorrelator [8] and the minimum mean-squared-error detector [9]. We first develop a unified architecture that describes all the above linear detectors. Then, based on this unified approach, we show how dynamic reconfigurability amongst these four receiver structures can be achieved via a software radio implementation. Our implementation is based on partitioning the functionality of the architecture between two core technologies (FPGA and DSP) according to processing speed requirements.
The paper is organized as follows. In section 2, we present a unified framework that describes all the above linear detectors as equivalent single-user receivers. We then present the software radio architecture for linear multiuser detection in section 3. In section 4, we present the experimental results on the performance of the software radio for linear multiuser detection, and we conclude in section 5.
2 A Unified Architecture for Linear Multiuser Detection and Dynamic Reconfigurability
To overcome the complexity issue of optimum multiuser detection, several suboptimal schemes have been proposed (see [1, 2, 3]). These suboptimal schemes comprise both linear and nonlinear detectors. Our emphasis in this work is on linear multiuser detection schemes since they are relatively easier to implement. Specifically, we focus on the following linear detectors: the matched filter (MF) detector (which can be thought of as a degenerate multiuser receiver), the decorrelator (DC) [8], the approximate decorrelator (AD) [7] and the minimum mean-squared error (MMSE) detector [9], and develop a unified framework for describing these linear receivers. For simplicity of illustration, we will consider the case of synchronous reception of $K$ users in a system. In this case, the output of the matched filter for the $k$th user can be written as
$$y_k = \sum_{j=1}^{K} A_j b_j \rho_{jk} + n_k, \quad k = 1, \ldots, K, \qquad (1)$$
where $A_k$ and $b_k \in \{-1, 1\}$ are the amplitude and the bit of the $k$th user respectively, and $n_k = \int_0^T n(t) s_k(t)\,dt$, with $s_k(t)$ being the signature waveform of the $k$th user, which is assumed to have unit energy. The crosscorrelation between the signature waveforms is defined as
$$\rho_{jk} = \int_0^T s_j(t) s_k(t)\,dt. \qquad (2)$$
The matched filter outputs for the users in the system can be expressed in vector form as $\mathbf{y} = \mathbf{R}\mathbf{A}\mathbf{b} + \mathbf{n}$, where $\mathbf{R}$ is the matrix of crosscorrelations. [...] For the conventional matched filter, $h_k(t) = s_k(t)$. Thus it is seen that any of the linear multiuser detectors described above can be realized by appropriately choosing the filter taps of the "modified" matched filter $h_k(t)$. This forms the basis of the software radio architecture we develop in this paper for linear multiuser detection.
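As a rough illustration of this unified view, the sketch below computes the taps of the "modified" matched filter for one user as a linear combination of the users' signature sequences, $h_k = \sum_j L_{kj} s_j$. The combining matrices used here are the standard textbook forms (identity for the MF, the first-order approximation $2\mathbf{I} - \mathbf{R}$ for the AD, $\mathbf{R}^{-1}$ for the DC, and $(\mathbf{R} + \sigma^2 \mathbf{A}^{-2})^{-1}$ for the MMSE detector); they are assumptions made for illustration and may differ in detail from equations (9), (10) and (11). All function names and the toy signatures are hypothetical.

```python
def crosscorrelations(S):
    """R[j][k] = sum_n S[j][n] * S[k][n] for unit-energy chip sequences."""
    return [[sum(a * b for a, b in zip(sj, sk)) for sk in S] for sj in S]


def inverse(M):
    """Gauss-Jordan inverse with partial pivoting (adequate for small K)."""
    n = len(M)
    aug = [list(map(float, row)) + [float(i == j) for j in range(n)]
           for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        if abs(aug[p][c]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [x / pivot for x in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def combining_matrix(detector, R, amplitudes=None, noise_var=0.0):
    """Assumed textbook combining matrix L for each linear detector."""
    K = len(R)
    eye = [[float(i == j) for j in range(K)] for i in range(K)]
    if detector == "MF":
        return eye
    if detector == "AD":    # first-order approximation of R^-1
        return [[2.0 * eye[i][j] - R[i][j] for j in range(K)] for i in range(K)]
    if detector == "DC":    # exact inverse of the crosscorrelation matrix
        return inverse(R)
    if detector == "MMSE":  # (R + sigma^2 A^-2)^-1, needs amplitude estimates
        M = [[R[i][j] + (noise_var / amplitudes[i] ** 2 if i == j else 0.0)
              for j in range(K)] for i in range(K)]
        return inverse(M)
    raise ValueError(f"unknown detector: {detector}")


def modified_taps(detector, S, k, amplitudes=None, noise_var=0.0):
    """Taps of h_k = sum_j L[k][j] * s_j for user k."""
    L = combining_matrix(detector, crosscorrelations(S), amplitudes, noise_var)
    return [sum(L[k][j] * S[j][n] for j in range(len(S)))
            for n in range(len(S[0]))]


# Toy example: two users, processing gain 4, crosscorrelation 0.5.
S = [[0.5, 0.5, 0.5, 0.5],
     [0.5, 0.5, 0.5, -0.5]]
h_mf = modified_taps("MF", S, 0)   # identical to the user's own signature
h_dc = modified_taps("DC", S, 0)   # orthogonal to the interferer's signature
```

Only the choice of combining matrix changes between detectors; the correlation against the resulting taps is the same operation in every case, which is what makes a single variable-tap receiver sufficient.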
Further, under some assumptions¹ on the signature sequences used in the system, it can be shown that these linear multiuser detectors yield a variable QoS that covers a wide range of values². Specifically, when the measure of quality is the BER achieved by a user, the relative performance of these receivers may be classified as³
$$\mathrm{BER}_{MF} \geq \mathrm{BER}_{AD} \geq \mathrm{BER}_{DC} \geq \mathrm{BER}_{MMSE}. \qquad (12)$$
Thus reconfiguring the detectors among the matched filter and the structures in (9), (10), and (11) allows the option of a variable QoS from moderate (for a matched filter) to very high (for the MMSE detector). As an example, a user may switch to a matched filter configuration when carrying voice traffic, while the MMSE mode may be preferred for data traffic. Further, we can also account for different data rates by deriving appropriate single-user linear filters for the multirate schemes outlined in [12, 13].
¹ If the crosscorrelations among all signature sequences are very low, then all the detectors achieve similar performance.
² The MMSE detector achieves the same performance (or worse) as the decorrelator at very high signal-to-noise ratios, and is better at lower signal-to-noise ratios [10].
³ In the absence of perfect power control and under the special case of two users, there are counterexamples to the general relationship regarding the relative performance of linear multiuser receivers (see [11] for details and also other similar conjectures).
3 Software Radio Architecture for Linear Multiuser Detection
Figure 1 shows the canonical software radio architecture for linear multiuser detection. The architecture includes channel processing (such as translation from IF to baseband), environment processing (e.g., estimation of signal and interference parameters and correlations), matched filtering and information bit-stream processing (e.g., FEC or convolutional decoding, soft decisions etc.). These functionalities are partitioned into two core technologies based on processing speed requirements: field programmable gate arrays (FPGA) and digital signal processor (DSP) devices. The key idea behind the software architecture is to dynamically reconfigure the matched filter according to the desired QoS (corresponding to one of the linear detectors). In the following sections of the paper, we present a detailed description of the software radio architecture that we use to allow variable QoS for users in a CDMA system. Specifically, we focus on the partitioning of functionality and the design of building blocks that allow this architecture to dynamically reconfigure between the various linear detection schemes.
Logical Partitioning of the Architecture
We begin by describing the logical partitioning of the functionality required by a given user under the reconfigurable architecture for linear multiuser detection. In Figure 2, we show the operations required by each type of linear detector among the MF, AD, DC and MMSE receivers. Even though the different types of linear receivers are shown as separate entities in Figure 2, each user's radio could essentially have only one variable filter-tap receiver implemented using Xilinx FPGAs⁴. All classes of the above receivers require two common generic operations, namely, estimation of the path delays of the users and generation of the PN sequences $\{s_k(t)\}$ of the users in the system. The MF is probably the simplest in that, for any user $k$, it just uses the information from these two generic operations in determining the timing offset of the PN sequence $s_k(t)$ for the specific user. For the AD, the complexity is slightly higher than that of the MF in that the "modified" matched filter taps are adjusted according to the formulation in equation (10). As seen in Figure 2, the additional functionality required here is the crosscorrelation values $\{\rho_{kj}\}$ and also the signature sequences of all the users $\{s_k(t)\}$. From (9), we see that the functional operations required for the DC are additionally the computation of the inverse matrix of crosscorrelations $\mathbf{R}^{-1}$ and also the column vector corresponding to the $k$th user, i.e., $R^{+}_{kj}$. The MMSE receiver (see equation (11)) incurs additional complexity over the DC in that it also requires estimates of the received signal powers of the users in the system. In this paper, we do not explicitly address the issue of estimating the path delays or the received signal powers of the users in the system. We assume that the software radio architecture proposed here has the functionality required to perform both estimation operations, similar to that used in conventional radio designs.
However, the software radio architecture presented here is versatile in that it can easily allow a variety of signal processing algorithms to be implemented for accomplishing the required estimation. 
Practical Implementation
In this section, we discuss the practical implementation aspects of the software radio architecture for linear multiuser detection. Specifically, we focus on the testbed for linear multiuser detection followed by the partitioning of resources between FPGA and DSP devices, and finally, the details on the implementation of the correlation elements constituting the reconfigurable radio receiver.
Testbed
The core of the hardware part of the testbed (see Figure 3) [...] Figure 1. The channelizer performs frequency down-conversion, low-pass filtering and decimation of the sampled baseband signal. It is used for selection of the service bandwidth (i.e., the tuning band) from among those available in the sampled signal. The Multiband Digital Receiver used as a channelizer has two narrowband receivers with a dynamic range of 1 kHz to 1 MHz and one wideband receiver with a dynamic range of 2 MHz to 35 MHz. Thus it is capable of supporting a wide range of output sample rates.
A SUN Workstation accesses the VMEbus through a Bit3 SUN-Sbus to VMEbus adapter and is used as the primary data stream source as well as the development host. Development tools are centered around the Signal Processing Worksystem (SPW). SPW [20] is a computer-aided design (CAD) tool that allows for the simulation and design of complex communication systems based on block diagrams. It has a rich library of common communication [...] The part of the design that is targeted for DSP implementation is prepared by SPW's Code Generation System (CGS) (or MultiProx in the case of partitioning into multiple DSPs [22]) and is directly downloadable into the Quad TMS320C40 floating-point DSP board.
SPW's Hardware Design System (HDS) is used to model the behavior of the fixed-point part of the design. Again, SPW's Signal Flow Simulator is used to verify the functionality of the fixed-point model. After the design is verified, the corresponding Hardware Description Language (HDL) code is automatically generated. This code is then synthesized (and/or simulated by an event driven HDL simulator like Synopsys VSS [23]) by an appropriate set of tools (e.g., Synopsys Design Compiler [24]). The resulting design is further processed by the XILINX XACT tool [25] in order to generate the FPGA chip layout and routing⁵, thereby producing the configuration bit-stream [25]. This configuration bit-stream⁶ defines the combinatorial circuitry, flip-flops, interconnect structure, and the I/O buffers inside a particular FPGA device. APTIX tools [14] are used to interconnect the FPGAs, to connect the FPGAs and DSPs through a parallel I/O board, and to route debugging signals to the control probes of the logic analyzer.
⁵ At this stage the hardware portion of the design can be further re-partitioned into multiple FPGA components if it does not fit into a single FPGA.
⁶ This is not to be confused with the information bit-stream.
Resource Partitioning of the Architecture
As mentioned earlier, the software radio architecture for linear multiuser detection is partitioned into two core technologies, namely field programmable gate arrays (FPGA) and digital signal processor (DSP) devices. This partitioning is usually driven by the required functionality of the radio device and also by the processing speed requirements. The algorithmic complexity of the linear multiuser receivers increases with increasing performance. Specifically, the complexity of the signal processing algorithms corresponding to the MF, AD, DC and MMSE receivers can be classified as
$$C_{MF} < C_{AD} < C_{DC} < C_{MMSE}, \qquad (13)$$
where $C_{\mathrm{receiver}}$ denotes the complexity of the particular receiver. In general, for a system with $K$ users, processing gain $N$ and oversampling factor $O_s$, the floating point complexity in terms of the number of multiply-accumulate (MAC) operations is given for each of the above receivers as (see also [10] for similar complexity trade-offs)
[...] $(K)^3 + C(\mathrm{amp})$, where $C(\mathrm{amp})$ denotes the complexity due to amplitude estimation that is incurred in the MMSE receiver (which is not explicitly considered in this paper). Figure 4(a) shows the number of MAC operations for each receiver as a function of the number of users. Alternately, we can evaluate the number of users that can be supported as a function of the achievable information data rate (for the different receiver structures), where the active constraint is the limited processing speed of the DSP device.
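The processing-speed constraint can be made concrete with a small budget calculation. The sketch below is illustrative only: it assumes that correlating one bit interval against a set of filter taps costs one MAC per received sample (that is, $N O_s$ MACs per bit), and it treats the sustained MAC rate of the DSP as a free parameter. The value used for that rate is a placeholder, not a datasheet figure, and the per-receiver complexity expressions of this section are not reproduced here.

```python
def correlation_macs_per_bit(processing_gain, oversampling):
    """Assumed cost of one correlation: one MAC per sample, N * O_s per bit."""
    return processing_gain * oversampling


def max_bit_rate(dsp_macs_per_second, macs_per_bit):
    """Highest bit rate the DSP can sustain for a given per-bit MAC cost."""
    return dsp_macs_per_second / macs_per_bit


ASSUMED_MAC_RATE = 25e6  # hypothetical sustained MACs per second

base_cost = correlation_macs_per_bit(processing_gain=64, oversampling=4)
for extra_factor in (1, 4, 16):  # hypothetical overhead of heavier receivers
    rate = max_bit_rate(ASSUMED_MAC_RATE, base_cost * extra_factor)
    print(f"cost x{extra_factor}: {rate:.0f} bit/s")
```

Whatever the exact cost model, the achievable bit rate falls in inverse proportion to the per-bit MAC count, which is why the sample level processing is a natural candidate for the FPGA segment.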
Figure 4(b) shows a simple illustration for a 50 MHz floating point TMS320C40 DSP. We see that an all-DSP implementation imposes serious constraints on the achievable data rates as the number of users increases, especially when we operate with the more complex receivers. In this work, the software radio architecture for reconfigurable linear multiuser detection is implemented by partitioning the resources between FPGAs and DSPs as shown in Figure 5. The FPGA segment of the architecture includes the PN sequence generators, two reconfigurable blocks (that determine the filter taps of the appropriate linear receivers), an on-line estimator module and the correlator. The motivation for this particular selection of constituents is that the functionality provided by each of them can be easily handled by the processing speeds of the FPGA hardware. The two reconfigurable blocks contain the core of the sample level processing for each of the multiuser detectors and are implemented in separate FPGA components to facilitate on-the-fly reconfigurability⁷. This partitioning enables us to reconfigure one of the blocks while the other is running and therefore, by oversizing the hardware, to avoid loss of data during switchover to a different receiver structure. The rest of the sample level processing logic (PN sequence generators, control multiplexer, and correlator) is implemented in a separate FPGA component since it does not require reprogrammability. The on-line estimation block is used to perform timing estimates of the incoming signals, which are then fed along with the reference PN sequences to compute the appropriate filter taps. Further, there is also a provision for refining the on-line estimates by using more sophisticated off-line algorithms. Additionally, an FPGA component is used as an interface between the DSP segment and the APTIX board [14].
The more algorithmically complex operations required for the linear multiuser receivers are partitioned into the DSP segment of the software radio architecture shown in Figure 5. This segment is implemented using a quad TMS320C40 DSP board. The operations performed in this segment include off-line estimation procedures, information bit-stream processing as well as control and reconfiguration management. The off-line estimation block includes estimation of the powers of the received signals (for the MMSE detector), and computation of the inverse of the matrix of crosscorrelations (in the case of both the decorrelator and the MMSE detector). Further, it can also refine the estimates of the on-line estimation block in the FPGA segment. The information bit-stream processing functions performed vary from error control techniques, such as FEC or convolutional decoding, to soft decision decoding. The control and reconfiguration management block in the DSP segment of the architecture determines the variable QoS that can be achieved by dynamic reconfiguration of the receiver structures in the FPGA segment. The reconfiguration of the blocks in the FPGA segment is directly controlled by the control and reconfiguration management block. The subsequent switching amongst receivers is achieved by a control multiplexer M (shown in Figure 5) that is also controlled by the control and reconfiguration management block. Thus the DSP segment of the device acts to achieve the variable QoS requirements of the specific type of service. Implicit in the control and reconfiguration management block is also the capability to interact with higher layer protocols/stacks to facilitate the QoS demand of a specific data stream. At the link layer, the actions of the DSP segment of the device also encompass environment processing such as sensing interference levels.
The reader should note that the DSP devices have access to the configuration bit-streams for each of the receiver structures and can download the particular configuration bit-stream as and when required.
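The switchover logic described above (two reconfigurable blocks behind a control multiplexer, with the DSP downloading a configuration bit-stream into whichever block is idle) can be summarized by the following toy model. It is a behavioral sketch under simplifying assumptions: the class and attribute names are hypothetical, configuration is treated as instantaneous, and the bit-streams are stand-in byte strings.

```python
class ReconfigurationManager:
    """Toy model of two reconfigurable blocks behind a control multiplexer."""

    def __init__(self, bitstreams, initial="MF"):
        self.bitstreams = bitstreams   # detector name -> configuration bit-stream
        self.blocks = [initial, None]  # detector currently loaded in each block
        self.mux = 0                   # index of the block feeding the correlator

    @property
    def active_detector(self):
        return self.blocks[self.mux]

    def switch_to(self, detector):
        """Load the idle block, then flip the multiplexer to it."""
        if detector not in self.bitstreams:
            raise KeyError(detector)
        if detector == self.active_detector:
            return
        idle = 1 - self.mux
        # The active block keeps running while the idle one is reconfigured,
        # so no samples are lost during the download.
        self.blocks[idle] = detector
        self.mux = idle


manager = ReconfigurationManager({name: b"" for name in ("MF", "AD", "DC", "MMSE")})
manager.switch_to("MMSE")  # e.g. a data session requests a lower BER
```

Keeping the previously active configuration loaded in the other block also means that switching straight back to it only requires flipping the multiplexer, although this sketch does not exploit that shortcut.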
While we do not give the detailed schematics of each block in Figure 5, we present the schematic for the correlator block in Figure 6. This block is used in all four receiver implementations and is based on a binary multiplier as opposed to the traditional implementation with a sign bit inverter. This is because the "modified" matched filters presented in equations (9), (10) and (11) result in filter taps (input Ref PN in Figure 6) that are not necessarily binary (except in the case of the conventional matched filter receiver).
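The difference between the two correlator styles can be seen in a few lines. The sketch below is a software analogue only (the function names are hypothetical and the integer taps stand in for quantized filter coefficients): a sign-inverter correlator suffices when the reference is a ±1 PN sequence, whereas multi-valued taps require a true multiply-accumulate.

```python
def correlate_sign_inverter(samples, pn_chips):
    """Conventional correlator: chips are +/-1, so each product is a sign flip."""
    return sum(x if c > 0 else -x for x, c in zip(samples, pn_chips))


def correlate_multiplier(samples, taps):
    """General correlator: taps may take any (quantized) value."""
    return sum(x * t for x, t in zip(samples, taps))


samples = [3, -1, 2, 5]
binary_taps = [1, -1, 1, -1]   # conventional matched filter reference
general_taps = [1, -2, 0, 1]   # stand-in for non-binary "modified" taps

# For binary taps both implementations agree.
assert correlate_sign_inverter(samples, binary_taps) == correlate_multiplier(samples, binary_taps)
# Non-binary taps can only be handled by the multiplier-based correlator.
general_output = correlate_multiplier(samples, general_taps)
```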
In Table 1, we show the relative hardware complexity of the different linear multiuser detection schemes in terms of the required number of configurable logic blocks (CLB). A CLB, in the case of the XILINX 4000 series FPGA, comprises a pair of flip-flops and two independent (Boolean) four-input logical function generators [25]. As can be seen, the complexity of each detector increases with both an increasing number of users $K$ and increasing precision in quantization ($Q_r$ denotes the number of bits used in quantization). As expected, the matched filter complexity remains invariant to the number of users in the system and depends only on the precision of quantization. Even though the decorrelator (and the MMSE detector) achieves better QoS, it comes at the expense of an exponential increase in complexity (with an increasing number of users) over the AD detector. This directly maps to an increase in the processing power requirements of the FPGA segment of the software radio architecture. This again motivates switching to lower order receivers when the QoS requirements are moderate. The operations required to be performed in the DSP segment of the architecture involve matrix inversion, off-line estimation of amplitudes, etc.
4 Experimental Results and Discussion
We begin the presentation of the experimental results by first considering floating point implementations of the linear multiuser receivers. In all our experiments, the transmitter powers of the users are controlled perfectly so that they are all received at the same power level. Further, the floating point results presented here agree very well with the analytical results on the performance of these receivers from [1, 7, 8, 9].
To show the flexibility of the software radio architecture in providing variable QoS measured as the bit-error performance (see equation (12)), we consider two sets of experiments. In the first set of experiments, we fix the number of users in the system at $K = 15$. We then show the experimental bit-error-rate achievable for SNRs varying from 0 dB to 10 dB for each of the detectors, namely MF, AD, DC and MMSE, for systems with processing gains $N = 64$ and $N = 128$. In Figure 7, it is seen that at low SNRs the range of QoS achievable is quite limited. However, at higher SNRs, the DC and MMSE provide a BER gain of up to three orders of magnitude compared to either the MF or the AD. As expected, the AD is again slightly better than the MF for the set of operating points considered here. For the case when $N = 128$, it is seen that the dynamic range of QoS achievable among the detectors is greater at high SNRs. Specifically, the AD provides up to an order of magnitude better performance than the MF.
A variation of the above experiments is presented in Figure 8, where we show the variations in QoS achieved for different numbers of users in the system for a fixed SNR of 10 dB. This shows that when the number of users in the system increases, the users can still operate at the same SNR, but can switch to a higher complexity detector to maintain the same QoS. It is seen that the dynamic range of switching from the MF to the AD is up to a load factor (defined as the ratio $K/N$) of $1/3$. This again conforms to the theoretical prediction on the load factor given in [7]. As an example, it is seen that when $N = 64$, if users in a system desire a BER of $10^{-3}$, they can operate on an MF receiver with up to 5 users, on an AD receiver with up to 10 users and on the decorrelator receiver with up to 30 users without changing their received power levels. Additionally, if some of the users require more stringent BERs such as $10^{-5}$, they could operate entirely on a DC realization of the "modified" matched filter. Further, when the processing gain is $N = 128$, it is seen that there is a greater range in achievable capacity (that is, the number of users that can be supported) by switching to the DC and MMSE receivers.
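The example above suggests a simple selection rule for the control and reconfiguration management block: pick the lowest-complexity detector that still supports the current number of users at the target BER. The sketch below encodes only the operating points quoted in the text for $N = 64$, a target BER of $10^{-3}$ and an SNR of 10 dB with equal received powers; the table layout and function name are hypothetical, and other operating points would need their own measured limits.

```python
# Largest number of users supported at BER 1e-3 (N = 64, SNR = 10 dB),
# ordered from lowest to highest detector complexity.
MAX_USERS_BER_1E3_N64 = (("MF", 5), ("AD", 10), ("DC", 30))


def select_detector(num_users, limits=MAX_USERS_BER_1E3_N64):
    """Return the lowest-complexity detector supporting num_users, else None."""
    for name, max_users in limits:
        if num_users <= max_users:
            return name
    return None


for k in (4, 8, 20, 40):
    print(k, select_detector(k))
```

A return value of `None` signals that no detector in the table meets the target at that load, in which case the received power levels or the admitted load would have to change instead.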
While the results presented until now have been based on floating point arithmetic, we now focus on the effects of quantization on the performance of the above linear detectors. We consider fixed point arithmetic operations for the DSP processors in the software radio architecture and evaluate the average probability of error performance of the detectors, namely the MF, AD and DC. Specifically, we consider fixed point operations using 2-bit, 3-bit, and 4-bit quantization. The experimental results for the matched filter are shown in Figure 9, for the approximate decorrelator in Figure 10, and for the decorrelator in Figure 11. The performance results using fixed-point implementations are also compared with the floating point results. It is seen that all the detectors experience a degradation in performance, with the matched filter being the least sensitive and the decorrelator being the most sensitive to quantization effects. However, the encouraging note is that the use of even a 6-bit precision quantizer seems to pull performance close to that of the floating point reference results.
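To give a feel for how the word length enters, the sketch below applies a uniform mid-rise quantizer to values in $[-1, 1]$ and reports the worst-case rounding error for several word lengths. The quantizer type, the full-scale range and the function names are assumptions made for illustration; the quantizer actually used in the experiments is not specified here.

```python
import math


def quantize(x, bits, full_scale=1.0):
    """Uniform mid-rise quantizer with 2**bits levels over [-full_scale, full_scale]."""
    levels = 2 ** bits
    step = 2.0 * full_scale / levels
    index = math.floor((x + full_scale) / step)
    index = min(max(index, 0), levels - 1)  # clip to the available levels
    return -full_scale + (index + 0.5) * step


def max_error(bits, points=1001):
    """Worst-case quantization error over an evenly spaced grid on [-1, 1]."""
    grid = [-1.0 + 2.0 * i / (points - 1) for i in range(points)]
    return max(abs(x - quantize(x, bits)) for x in grid)


for bits in (2, 3, 4, 6):
    print(bits, "bits: max error", max_error(bits))
```

Each additional bit halves the step size, so the rounding error applied to the filter taps shrinks geometrically with word length. How that error maps to BER depends on the detector, since the decorrelator taps rely on cancellation between terms and are therefore more sensitive to it.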
Another interesting impact of such fixed point operations in realizing detectors is the effect on the "near-far" resistance of these detectors. In Figure 12, we consider the average bit-error rate achieved by the MF, AD and DC receivers for the case of 4-bit quantization as a function of the interference powers. In this experiment, the processing gain is taken to be $N = 64$, and the number of users is $K = 5$. The desired user's power is fixed, while the interfering powers are increased. As expected, the MF and AD are not near-far resistant, and show degradations in performance as the interfering powers increase. Further, the AD shows a more graceful degradation in performance relative to the matched filter. The DC, which is theoretically near-far resistant [8], fails to maintain this property in the presence of quantization effects. However, its BER performance is still superior to that of the other two detectors.
5 Conclusion
In this paper, we have presented a reconfigurable software radio architecture for linear multiuser detection in CDMA systems. The reconfigurability of linear multiuser receivers allows for the integration of multimedia services over wireless channels with variable quality of service (QoS) requirements. While radio resource management algorithms can provide certain levels of variability in QoS guarantees, we have shown that reconfigurable radio architectures can also provide diverse QoS guarantees ranging over several orders of magnitude in terms of BER requirements. We have shown the feasibility of achieving this dynamic reconfiguration via a software radio implementation of linear multiuser receivers. We first presented a unified framework for achieving this reconfiguration, and then partitioned functionality into two core technologies (field programmable gate arrays (FPGA) and digital signal processor (DSP) devices) based on processing speed requirements. Our experimental results on the performance and reconfigurability of the software radio architecture show that diverse services such as voice and data traffic can be simultaneously supported using such a software radio architecture for linear multiuser detection. Our study of the impact of fixed point arithmetic, which often arises due to hardware constraints, shows that the degradation in receiver performance can be compensated by using as low as 4-bit quantization even for complex multiuser detection algorithms. Future work includes studying the effect of on-line estimation algorithms and also the impact of processing delays on QoS guarantees.
