Abstract: Contention-based multiple access is a crucial component of many wireless systems. It is known that using interference cancellation techniques to receive and decode multiple packets that arrive simultaneously can improve the efficiency of multiple access. However, the multi-packet reception (MPR) schemes proposed in the literature require complex receivers capable of performing advanced signal processing over significant amounts of soft undecodable information received over multiple contention steps. In this paper, we show that local channel knowledge and elementary received signal strength measurements, which many receivers already make today, can actively facilitate multi-packet reception and even simplify the design of the interference-canceling receiver. We introduce a simple algorithm called Turbo-Dual Power Multiple Access (Turbo-DPMA) that uses local channel knowledge to limit the receive power levels to two discrete values that are carefully chosen to facilitate successive interference cancellation. As we shall see, limiting the receive power in this manner not only facilitates the simultaneous reception of up to two packets, but also enables the receiver to derive additional useful information about the contending users from its received signal strength indicator. The resulting receiver structure is markedly simpler, as it needs to process only the immediate received signal, without having to store and process signals received previously. Even more remarkably, Turbo-DPMA is stable for packet arrival rates as high as 0.793 packets/slot, which is significantly better than all contention algorithms known to date.
I. INTRODUCTION
Multiple access (MA) of nodes contending for a shared medium such as a wireless channel is a fundamental problem in wireless communications [1], [2]. The first, and best known, contention-based algorithm is the ALOHA protocol, in which the nodes transmit packets independently. In ALOHA, the transmission is successful if no packet collisions occur, i.e., if only one packet is received by the destination at any time.
Multiple access algorithms that use multiple packet reception (MPR), in which multiple packets, from single or multiple transmission attempts, are received and successfully separated by the receiver, are provably more efficient than ALOHA [3]-[5]. However, MPR often requires receivers that are capable of advanced signal processing. For example, by means of a polynomial phase-modulating sequence, the cyclostationarity of different received packets was used to color-code packets from multiple transmissions [4]. Signal separation was achieved in [6] using a rotational invariance technique. In Network-assisted Diversity Multiple Access (NDMA) [7], when k packets collide in a time slot, the network makes the transmitters retransmit another k − 1 times. So long as the channel changes sufficiently from one slot to another, these k consecutive transmissions allow the receiver to invert the channel matrix and recover all k collided packets. However, such channel variation can be difficult to ensure in low-Doppler regimes. As can be seen, these algorithms also require receivers that can store and process significant amounts of soft information about signals received over multiple transmissions.
† A. F. Molisch is also at Lund University, Sweden.
A more direct MPR approach uses successive interference cancellation (SIC) to improve the throughput of multiple access [8]. In SIC, a successfully decoded packet is remodulated and removed from the received signal in order to better decode other packets that are received over the same channel at the same time [9]. For example, the SIC Tree Algorithm (SICTA) [8] stores soft information about the undecodable received signal whenever the receiver detects the presence of a message but cannot decode it successfully. This soft information is combined with subsequently received signals to improve the chances of decoding all the signals received thus far. When the receiver eventually decodes a packet, it subtracts the packet's contribution from all previously stored received signals and then attempts to decode them again. The SICTA protocol is stable for arrival rates up to 0.693 packets/slot. This is substantially better than the first-come first-serve (FCFS) binary tree algorithm, which becomes unstable when the packet arrival rate exceeds 0.487 packets/slot [10], [11]. However, like all other MPR schemes, SICTA requires the receiver to store soft information about the received signals of all previously undecodable messages. Moreover, successively decoding the possibly many packets that have collided over time can lead to long delays.
Another important consideration is the feedback message size. While 2-bit "idle (0)", "success (1)", and "collision (e)" messages are fed back in most protocols, the set of messages in SICTA includes "0", "e", and, in addition, the number of packets that were finally resolved in the previous time slot. This number can be arbitrarily large and therefore requires more bits for feedback signaling.
In this paper, we propose a new and simple multiple access paradigm that uses local channel state information (CSI) at the transmitter to control the power received at the destination from each node (or, equivalently, the node's transmit power) so as to actively facilitate MPR. This local CSI can be easily obtained using channel reciprocity in time division duplex systems [2], and has been exploited in other multiple access schemes [12]-[14]. While the receiver still uses SIC, a key advantage is that it does not need to store signals from previous transmissions, which significantly reduces its memory and processing requirements. Instead, the receiver effectively utilizes elementary information about the total received signal strength/power (RSSI), a capability that many commercial receivers already have [15], [16]. As we show, not only is this paradigm more efficient than the best multiple access schemes known to date, but its receiver is also significantly simpler than the advanced ones required by other MPR algorithms.
In particular, we propose the Turbo Dual Power Multiple Access (Turbo-DPMA) algorithm, in which the nodes transmit such that their receive power takes on one of two power levels. The key lies in setting the two power levels carefully so as to enable MPR using SIC at the receiver. As mentioned, DPMA does not require the receiver to store soft information about any undecodable signals over time: MPR is achieved simply by successive interference cancellation of packets received in the same time slot. Using four possible feedback messages, Turbo-DPMA is stable for arrival rates up to 0.793 packets/slot. This is better than all algorithms proposed in the literature to date.
As mentioned, the use of local CSI to improve multiple access has been studied previously. For example, in channel-aware ALOHA [12], each user transmits only if its channel gain exceeds a system-determined threshold. The Opportunistic ALOHA (O-ALOHA) protocol [13] sets the probability of transmission as a function of local channel knowledge. In [14], the time required for identifying the user with the highest priority through multiple access was substantially reduced by ensuring that the receive power levels were discrete. However, a key difference is that all the above algorithms assume single packet reception, in which at most one packet is successfully decoded at any time and no packet is decoded when multiple nodes transmit simultaneously. To the best of our knowledge, DPMA is the first algorithm to use local CSI and RSSI to actively facilitate MPR and simplify receiver design.
The remainder of the paper is organized as follows. The system model is described in Sec. II. The Turbo-DPMA algorithm and its analysis are presented in Sec. III. Section IV describes simulation results. Our conclusions follow in Sec. V.
II. SYSTEM MODEL
We consider a wireless network consisting of a number of packet-generating nodes that need to transmit packets to a message sink. The packets of each node are assumed to arrive at unique times. The packets are transmitted from the nodes in a time-slotted manner, and all packets are assumed to have the same size. Without loss of generality (wlog), the duration of a slot is set to unity. The channel power gain between transmitting node i and the message sink is denoted by hᵢ and is assumed to be known at the transmitter. This assumption is similar to the one made in channel-aware ALOHA [13], [17]. To facilitate analysis, we assume a Poisson packet arrival process with a mean arrival rate (over all users) of λ. We also make the standard assumption that each new packet is generated at a unique node [8], [10], [11].
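The arrival model can be mimicked in a few lines. The sketch below draws exponential inter-arrival gaps, so the time stamps form a Poisson process of rate λ packets/slot; in line with the unique-node assumption, each time stamp stands for a distinct node. The function name `poisson_arrivals` and its parameters are illustrative and not part of the paper.

```python
import random


def poisson_arrivals(rate, horizon, seed=None):
    """Arrival time stamps of a Poisson process on [0, horizon).

    rate is the mean number of packets per slot (lambda). Inter-arrival gaps
    are i.i.d. exponential with mean 1/rate, so the stamps come out sorted and,
    with probability one, distinct; each stamp is treated as a new node.
    """
    rng = random.Random(seed)
    stamps = []
    t = rng.expovariate(rate)
    while t < horizon:
        stamps.append(t)
        t += rng.expovariate(rate)
    return stamps


stamps = poisson_arrivals(rate=0.7, horizon=10_000, seed=1)
print(len(stamps) / 10_000)  # empirical rate, close to 0.7
```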
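Let Pᵢ denote the power received from node i (we shall henceforth call it the 'receive power'). The sink can decode the packet from node i successfully if its received signal to interference and noise ratio (SINR) exceeds a threshold, where the interference is the total power received from all other nodes j ≠ i transmitting in the same slot:
$$\frac{P_i}{\sum_{j \neq i} P_j + \sigma^2} \ge \gamma, \tag{1}$$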
where σ² is the noise power and γ ≥ 1 is a threshold that depends on the modulation and coding used for the packet transmission [18]. Thus, a sufficiently strong packet can be decoded successfully even when two or more users transmit simultaneously.
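Consider now the specific case where every node i, which has local CSI, adjusts its transmit power so that its receive power, Pᵢ, is either q₀ or q₁ (wlog, let q₁ > q₀). When two nodes each transmit a packet, one with receive power q₀ and the other with q₁, both packets can be decoded successfully using SIC if
$$\frac{q_1}{q_0 + \sigma^2} \ge \gamma \quad \text{and} \quad \frac{q_0}{\sigma^2} \ge \gamma. \tag{2}$$
The first condition lets the receiver decode the stronger packet despite interference from the weaker one; once that packet has been canceled, the second condition lets the receiver decode the weaker packet against noise alone.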
A checksum field in the packet enables the receiver to determine whether a packet has been decoded successfully.
The conditions in (2) can be generalized to handle simultaneous transmissions by more than two users. Note that, since γ ≥ 1, no packet can be decoded successfully if more than one user's receive power is q₁. However, if only one user's receive power is q₁, and if the power levels are set as follows:
$$\frac{q_1}{a\,q_0 + \sigma^2} \ge \gamma \quad \text{and} \quad \frac{q_0}{\sigma^2} \ge \gamma, \tag{3}$$
then the packet with receive power q₁ can be decoded so long as at most a users (i.e., at most ⌊a⌋ users when a is not an integer) have receive power q₀. The parameter a is called the adversary order. The assignment of receive power levels to nodes is determined by the Turbo-DPMA algorithm, which is developed in the next section.
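As a numerical check of (2) and (3), the sketch below picks the smallest levels that satisfy (3) with equality, namely q₀ = γσ² and q₁ = γ(aq₀ + σ²), in linear units. Choosing equality is our assumption (any larger q₁ also works, at the cost of dynamic range); with a = 1 the check reduces to (2). The function names are illustrative.

```python
def power_levels(gamma, noise, a):
    """Smallest q_0, q_1 (linear units) meeting (3) with equality."""
    q0 = gamma * noise                 # weak packet just beats noise after SIC
    q1 = gamma * (a * q0 + noise)      # strong packet survives a interferers at q_0
    return q0, q1


def meets_condition_3(q0, q1, gamma, noise, a, tol=1e-9):
    # tol absorbs floating-point rounding when a condition holds with equality.
    return (q1 / (a * q0 + noise) >= gamma * (1 - tol)
            and q0 / noise >= gamma * (1 - tol))


gamma, noise = 10.0, 1.0               # 10 dB threshold, noise normalized to 1
q0, q1 = power_levels(gamma, noise, a=1)
print(q0, q1)                          # 10.0 110.0
```

Levels designed for a = 1 do not meet (3) for a = 2, which is why the adversary order has to be fixed before the levels are chosen.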
A. Controlling Receive Power Levels
It is the local channel gain that enables the transmitting node to control the receive power level. Each node can easily acquire this knowledge if the message sink broadcasts a (predefined) pilot sequence, from which each node locally computes its channel gain to the sink. For a target receive power P and an estimated channel power gain h, a node transmits its message at power P/h. This technique is analogous to the power control that is ubiquitous in second- and third-generation CDMA-based cellular systems. The mechanisms for enforcing discrete receive powers are the same in our case, though the motivation is subtly different. In power-controlled second-generation CDMA, it is essential that the arrival powers from all users are identical. In third-generation systems, several discrete receive power levels are foreseen (because users with higher data rates need higher power). In our system, the different power levels are used for data sent at the same transmission rate.
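The channel-inversion rule P/h can be written out in linear units and in the dB domain, where division by h becomes subtraction. The −90 dB link gain below is a hypothetical value, and the sketch assumes the node's estimate of h is exact.

```python
def transmit_power(target_rx_power, channel_gain):
    """Transmit power (linear units) so that the sink receives target_rx_power.

    channel_gain is the power gain h estimated from the sink's pilot; it is
    assumed exact here, so the received power is (P / h) * h = P.
    """
    if channel_gain <= 0:
        raise ValueError("channel gain must be positive")
    return target_rx_power / channel_gain


def transmit_power_dbm(target_rx_dbm, channel_gain_db):
    # In the dB domain, dividing by h becomes subtracting its value in dB.
    return target_rx_dbm - channel_gain_db


# Hypothetical link with a -90 dB power gain, targeting q_0 = -90 dBm.
print(transmit_power_dbm(-90.0, -90.0))   # 0.0 (dBm), i.e. 1 mW
print(transmit_power(1e-12, 1e-9))        # about 1e-3 W
```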
B. Using Two Discrete Receive Power Levels
The discussion above in (2) and (3) used two power levels. Two such levels can be easily accommodated by receivers of existing systems. For example, if the minimum SINR for successful decoding is γ = 10 dB, it follows from (1) that the transmitter and receiver dynamic range should be at least 10 dB if the packet with the higher receive power is to be received successfully. In existing systems, the mobile station transmit power dynamic range is 35 dB in GSM systems [19] and 74 dB in third-generation WCDMA systems [15]. After accounting for variations in received signal strength due to fading and the near-far problem, one can reasonably assume that the receiver has about 20 dB of dynamic range.
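The dynamic-range figures follow from simple dB arithmetic. Assuming, as before, that (3) is met with equality, q₀ = γσ² and q₁/q₀ = γa + 1, so the required dynamic range depends only on γ and the adversary order a.

```python
import math


def dynamic_range_db(gamma_db, a):
    """q_1 / q_0 in dB when (3) holds with equality.

    With q_0 = gamma * sigma^2 and q_1 = gamma * (a * q_0 + sigma^2),
    the ratio q_1 / q_0 = gamma * a + 1 does not depend on the noise power.
    """
    gamma = 10 ** (gamma_db / 10)
    return 10 * math.log10(gamma * a + 1)


for a in (1, 2, 4, 5):
    print(a, round(dynamic_range_db(10.0, a), 2))
# 1 10.41
# 2 13.22
# 4 16.13
# 5 17.08
```

For γ = 10 dB, a = 1 already needs slightly more than 10 dB, and any a < 5 stays below about 17.1 dB, in line with the roughly 17 dB quoted in Sec. IV.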
In this paper, we develop the algorithm for two discrete receive power levels. As we shall see, even this results in a substantial performance improvement. While the proposed scheme can also be generalized to handle more power levels to deliver even better performance, this comes at the expense of a larger dynamic range requirement and a greater feedback overhead.
C. Relevant SIC Receiver Properties
For the case of the two receive power levels specified in (3), an SIC receiver that processes only the signal received in the current time slot exhibits the following properties:
• If only two packets are received, one with power q₀ and the other with power q₁, then both can be decoded.
• If only one packet is received with power q₀, then it can be decoded.
• If one packet is received with power q₁, then it can be decoded so long as no other packet is received with power q₁ and the number of packets with receive power q₀ does not exceed the adversary order a.
• Otherwise, none of the received packets can be decoded.
D. Exploiting Received Signal Strength Information (RSSI)
The total receive power, given by the received signal strength information (RSSI) at the receiver, is the sum of the receive powers of all packets received in a time slot. Since the receive power of each packet takes only two values, q₀ and q₁, the receiver can extract useful side information from the RSSI about the number of packets received at each of the two power levels. We use this side information in the development of the Turbo-DPMA algorithm.
We also define a quantity called the Residual Receive Power (RRP), which can be derived from the RSSI after the receiver successively performs SIC. The RRP is the power of the received signal that remains after all decodable messages have been canceled from it. For example, if the receiver gets two packets, one at power q₁ and the other at power q₀, the RRP is on the order of the noise power σ², as both packets are successively decoded and canceled from the received signal. Consider another case in which the receiver gets three packets, one at power q₁ and two at power q₀, with a ≥ 2. Then, it decodes the packet at q₁ successfully and fails to decode the remaining two packets at q₀. Therefore, the RRP is now 2q₀ + σ². Finally, when no packet is received, the RRP is on the order of σ².
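The SIC properties of Sec. II-C and the RRP examples above can be checked with a direct model of single-slot SIC under the SINR rule (1). The sketch works in linear units with noise normalized to 1 and with levels set by (3) with equality (our assumption); a = 2 matches the second RRP example. It always attempts the strongest remaining packet; if that fails, every weaker packet has an even lower SINR, so the decoder can stop. The function name `sic_decode` is illustrative.

```python
def sic_decode(rx_powers, gamma, noise, tol=1e-9):
    """Single-slot SIC: return (decoded receive powers, residual receive power).

    Only the current slot's signal is used; nothing is stored across slots.
    tol absorbs floating-point rounding when the SINR equals gamma exactly.
    """
    remaining = sorted(rx_powers, reverse=True)
    decoded = []
    while remaining:
        strongest, others = remaining[0], remaining[1:]
        sinr = strongest / (sum(others) + noise)
        if sinr < gamma * (1 - tol):
            break                      # weaker packets cannot do better
        decoded.append(strongest)      # decode, remodulate and cancel
        remaining = others
    rrp = sum(remaining) + noise       # residual receive power (RRP)
    return decoded, rrp


gamma, noise, a = 10.0, 1.0, 2
q0 = gamma * noise                     # 10.0
q1 = gamma * (a * q0 + noise)          # 210.0
print(sic_decode([q1, q0], gamma, noise))       # ([210.0, 10.0], 1.0)
print(sic_decode([q1, q0, q0], gamma, noise))   # ([210.0], 21.0)
print(sic_decode([q1, q1], gamma, noise))       # ([], 421.0)
```

The three calls reproduce, in turn, the case with one packet at each level (RRP equal to σ²), the three-packet example (RRP equal to 2q₀ + σ²), and two packets at q₁ that block each other.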
III. TURBO-DPMA

A. Feedback Messages of Turbo-DPMA
As mentioned, the possible values of the RRP shed useful light on the contention process. The following four scenarios provide a complete characterization of all the possible RRP values, and of the information that can be derived from them and fed back by the receiver:
This implies that all transmitted packets, if any, have been resolved. The receiver therefore feeds back a Resolved-All (RA) message.
This implies that a packet with a receive power of q₁, if present, was decoded successfully, and at least two packets had a receive power of q₀ and could not be decoded. The receiver therefore feeds back a Resolved-High (RH) message.
3) RRP ∈ {mq₁ + σ² : m ≥ 2, m ∈ ℤ}: This implies that no packet is received at or near q₀, and the receiver cannot resolve the packets received at power q₁. The receiver therefore feeds back a Resolved-Low (RL) message.
4) RRP > q₁ + σ² and RRP ∉ {mq₁ + σ² : m ∈ ℤ}: This implies that at least one message was received with power q₁ and the receiver could not decode any of the messages. The receiver therefore feeds back a Resolved-None (RN) message.
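The four scenarios can be condensed into a function of the numbers of packets received at each level. This sketch assumes, as a simplification, that the receiver infers these counts exactly from the RRP, and it ignores the corner case in which many packets at q₀ alone push the RRP above q₁ + σ². The 2-bit codes assigned to the messages are hypothetical; the text only states that two bits suffice.

```python
from enum import IntEnum


class Feedback(IntEnum):
    # Hypothetical 2-bit codes for the four messages.
    RA = 0  # Resolved-All
    RH = 1  # Resolved-High
    RL = 2  # Resolved-Low
    RN = 3  # Resolved-None


def feedback(n_high, n_low, a):
    """Feedback for n_high packets received at q_1 and n_low at q_0 (adversary order a)."""
    if n_high == 0:
        # A lone q_0 packet is decodable; two or more q_0 packets are not.
        return Feedback.RA if n_low <= 1 else Feedback.RH
    if n_high == 1:
        if n_low <= 1:
            return Feedback.RA          # SIC decodes q_1, then q_0
        # The q_1 packet survives at most a interferers at q_0.
        return Feedback.RH if n_low <= a else Feedback.RN
    # Two or more q_1 packets block each other.
    return Feedback.RL if n_low == 0 else Feedback.RN


print(feedback(3, 2, a=1).name)   # RN
```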
B. Feedback Overhead
Two bits of feedback are required by the algorithm to send the above four messages. Existing multiple access protocols that use 0/1/e feedback also require the same number of bits.
C. Queuing, Gating and Contention Resolution Interval
When a new packet arrives, the system may be in the process of resolving the contention due to previously transmitted packets. In this case, the new packet is stored in its local queue with its arrival time stamp, and it awaits the completion of the current contention process. Consider the time slot in which the system clears the (k − 1)-th contention. The k-th contention resolution interval (CRI) begins at this time. Let bₖ denote the number of time slots with unresolved packets at this time.
The system uses a time-limited gated access strategy [11], which allows packets in a maximum interval of t₀ time slots to participate in the k-th CRI. That is, if bₖ is smaller than t₀, then all unresolved packets (in the queue) participate in the k-th CRI. Otherwise, only the packets with time stamps in the first t₀ time slots participate in the k-th CRI. The other packets remain in the queue until a future CRI. Adopting the terminology of the part-and-try algorithm [11], we refer to t₀ as the gating interval. Such a gating mechanism is well-suited to a multiple packet reception protocol such as ours; the parameter t₀ will play an important role in optimizing the protocol's performance.
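The gating rule amounts to a single min(). In the sketch below, tau is the current slot and d the latest time stamp already admitted to a CRI, following the bookkeeping b = τ − d of Sec. III-D; the function name `gate` is illustrative.

```python
def gate(tau, d, t0):
    """Time-limited gated access.

    Returns the arrival-time interval [start, end) admitted to the next CRI and
    the updated d. At most the oldest t0 slots of backlog are admitted.
    """
    backlog = tau - d                 # b: number of back-logged slots
    end = d + min(backlog, t0)
    return (d, end), end


print(gate(tau=10, d=3, t0=2.5))     # ((3, 5.5), 5.5)
print(gate(tau=1, d=0, t0=2.5))      # ((0, 1), 1)
```

When the backlog exceeds t₀, the newer packets simply wait for a later CRI, which bounds the expected number of packets entering any single CRI.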
D. Formal Definition of Turbo-DPMA Algorithm
We first provide a formal definition of the Turbo-DPMA algorithm and then explain the reasoning behind it. An example is also provided to illustrate its various possible steps.
To specify the algorithm, we first define the following terminology. Let X = [x_min, x_max) denote a contiguous time interval. Let U be a stack of unresolved contiguous time intervals. The operation U.push(X) pushes the interval X onto the stack. The operation U.pop returns the interval most recently pushed onto the stack and removes it from the stack. We define the functions H(X) and L(X) to split the interval X into two equal-sized 'higher' and 'lower' intervals, respectively, as follows:
$$H(X) = \left[x_{\min}, \tfrac{x_{\min} + x_{\max}}{2}\right), \qquad L(X) = \left[\tfrac{x_{\min} + x_{\max}}{2}, x_{\max}\right).$$
Let τ denote the current time slot number, and d denote the latest time stamp that was included in a CRI. At system initialization, we set τ = 1 and d = 0, so that the packets with arrival time stamps in [0, 1) have not entered any CRI.
At the beginning of each CRI, the algorithm computes the number of back-logged time slots b = τ − d. As per the gating mechanism, the algorithm sets U = {[d, d + min(b, t₀))}, so that all packets that arrived within an interval of at most t₀ slots participate in the CRI. Thereafter, d is updated to d + min(b, t₀). At each time step of the CRI, all the transmitting nodes and the receiver (sink) implement the Turbo-DPMA algorithm as follows. (Which part of the algorithm is implemented by whom is clear from context.)
• Transmission rule: Let W = U.pop. Every node with a packet arrival time stamp in the interval H(W ) transmits so that its receive power is q 1 , and every node with a packet arrival time stamp in L(W ) transmits so that its receive power is q 0 .
• Feedback generation: The receiver determines its feedback as per Sec. III-A, and broadcasts it to all nodes.
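• Response to feedback:
1) If feedback = RA and U ≠ ∅, then continue.
2) If feedback = RA and U = ∅, then terminate the current CRI.
3) If feedback = RH, then U.push(L(W)).
4) If feedback = RL, then U.push(H(W)).
5) If feedback = RN, then U.push(L(W)) and then U.push(H(W)).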
• At the end of a CRI: The current time τ is updated to be the next time slot (which is also the slot in which the next CRI begins).
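The transmission rule, feedback generation and stack updates can be combined into a small simulator of a single CRI. This is a sketch under several assumptions: H(X) is the earlier half of X (matching the example in Sec. III-F); the stack updates follow the explanation in Sec. III-E (RH pushes L(W), RL pushes H(W), RN pushes L(W) and then H(W)); the receiver infers the packet counts at each level exactly from the RRP; and all time stamps are distinct, since identical stamps could never be split. The function names are illustrative.

```python
def halves(interval):
    """Split [lo, hi) into H (earlier half, sent at q_1) and L (later half, sent at q_0)."""
    lo, hi = interval
    mid = (lo + hi) / 2
    return (lo, mid), (mid, hi)


def classify(n_high, n_low, a):
    """Feedback from the packet counts at q_1 and q_0 (scenarios of Sec. III-A)."""
    if n_high == 0:
        return "RA" if n_low <= 1 else "RH"
    if n_high == 1:
        if n_low <= 1:
            return "RA"
        return "RH" if n_low <= a else "RN"
    return "RL" if n_low == 0 else "RN"


def run_cri(stamps, interval, a):
    """Resolve the packets whose time stamps lie in `interval`; return per-slot feedback."""
    stack = [interval]                  # U, with the top at the end of the list
    pending = {s for s in stamps if interval[0] <= s < interval[1]}
    trace = []
    while stack:                        # an empty stack after RA ends the CRI
        w = stack.pop()                 # W = U.pop
        high, low = halves(w)
        tx_high = {s for s in pending if high[0] <= s < high[1]}
        tx_low = {s for s in pending if low[0] <= s < low[1]}
        fb = classify(len(tx_high), len(tx_low), a)
        trace.append(fb)
        if fb == "RA":                  # everything in W decoded
            pending -= tx_high | tx_low
        elif fb == "RH":                # the q_1 packet (if any) decoded
            pending -= tx_high
            stack.append(low)
        elif fb == "RL":                # only q_1 packets, none decoded
            stack.append(high)
        else:                           # RN: nothing decoded
            stack.append(low)
            stack.append(high)
    return trace


# Time stamps of nodes A to E in the example of Sec. III-F, with a = 1.
print(run_cri([0.2, 0.3, 0.4, 0.55, 0.6], (0.0, 1.0), a=1))
# ['RN', 'RN', 'RA', 'RA', 'RL', 'RL', 'RA']
```

For the example's time stamps, all five packets are resolved in seven slots; because D and E both fall in the earlier half of [0.5, 0.75), two consecutive RL slots are needed before they are separated.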
E. Explanation
Turbo-DPMA is basically a splitting algorithm. Once packets collide in a slot, the algorithm splits the arrival time interval in half and resolves the nodes in the two halves in different time slots. Specifically,
• A feedback of RA implies that every packet that was transmitted has been successfully resolved. Therefore, no packets remain in the interval W being handled in the current slot. Hence, the algorithm proceeds to resolve packets in the arrival time intervals that remain in the stack. If the stack is empty, then all packets in the current CRI have been resolved, and the next CRI begins.
• A feedback of RH implies that at least two packets were received at power q₀ (and all the packets that arrived in H(W) have been resolved). Hence, in the next slot, the nodes with packets in L(W) transmit with receive powers of either q₀ or q₁ as per the transmission rule.
• A feedback of RL implies that at least two packets were received at power q₁ and none at q₀ (which means that no unresolved packets remain in L(W)). Hence, in the next slot, two receive power levels will be assigned to the packets that are currently received at power q₁.
• Finally, a feedback of RN implies that packets were received at both powers q₀ and q₁ and none were resolved. Pushing L(W) and then H(W) leads to the packets in L(W) being resolved after, and separately from, the packets in H(W).
F. Illustrative Example
We now demonstrate how the algorithm works by means of an example, the parameters of which are artificially chosen to exercise the many scenarios defined in the algorithm. We consider a specific scenario consisting of 5 nodes contending in a CRI, and an adversary order a = 1. Wlog, assume that their time stamps initially lie between 0 and 1. Say, the arrival time stamps of these nodes, labeled A, B, C, D, and E, are 0.2, 0.3, 0.4, 0.55, and 0.6 respectively.
In the first slot, packets from nodes with time stamps in [0, 0.5), namely, A, B, and C, arrive with receive power q₁, and packets from the remaining nodes, whose time stamps lie in [0.5, 1), namely, D and E, arrive with receive power q₀. This results in an RRP of 3q₁ + 2q₀ + σ², which exceeds q₁ + σ² and is not of the form mq₁ + σ². Thus, the receiver feeds back the Resolved-None (RN) message to all nodes.
In slot 2, only the high power nodes of slot 1 transmit. Now A has a receive power of q₁ (its time stamp lies in [0, 0.25)), and B and C have a receive power of q₀ (their time stamps lie in [0.25, 0.5)). Since a = 1, A cannot be decoded successfully in this slot, and the receiver feeds back RN again.
In slot 3, only one node, A (the high power node of slot 2), transmits, and since its time stamp lies in [0.125, 0.25), it is received at power q₀. The receiver can now decode A's packet successfully, the RRP is less than q₀, and the receiver feeds back Resolved-All (RA). In slot 4, both low power nodes of slot 2, B and C, are resolved simultaneously, as they are received at powers q₁ and q₀, respectively (their time stamps lie in [0.25, 0.375) and [0.375, 0.5), respectively). The RRP is again less than q₀, and another RA is fed back.
In slot 5, the low power nodes of slot 1 (D and E) transmit such that their receive power is q₁ (their time stamps lie in [0.5, 0.75)), and no packet gets decoded. As the RRP does not have any q₀ component, the receiver feeds back a Resolved-Low (RL) message. (This also implies that the remaining nodes are in H(W).) In slot 6, D and E again both lie in the higher half, [0.5, 0.625), so both are again received at q₁ and RL is fed back once more. Finally, in slot 7, D (time stamp in [0.5, 0.5625)) is received at q₁ and E (time stamp in [0.5625, 0.625)) at q₀, both packets are decoded successfully, and RA is fed back. This also empties the stack, which terminates this CRI. A new CRI commences in the next slot.
G. Analysis
In this section we briefly outline the throughput analysis and give the final results; a more detailed derivation is in [20] .
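We first consider the expected number of slots, Lₙ, required to resolve a simultaneous transmission by n nodes. Clearly, when zero or one packet is received in a slot, it takes exactly one slot to resolve it. Thus, L₀ = L₁ = 1. When two packets are received in a slot, the system needs one slot for transmitting with the current power levels and possibly (depending on the RRP) additional slots to resolve collisions. Each packet falls in H(W) or L(W) with probability 1/2. With probability 1/2, one packet is received at q₁ and the other at q₀, and both are decoded. With probability 1/4 each, both packets are received at q₀ (RH) or both at q₁ (RL); in either case, the same two packets must be resolved afresh within one half of the interval. Hence,
$$L_2 = 1 + \tfrac{1}{4} L_2 + \tfrac{1}{4} L_2,$$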
which, when solved, gives L₂ = 2. Similarly, for n ≥ 3 transmitted packets, Lₙ satisfies a recursion obtained by conditioning on how many of the n packets fall in H(W) and how many in L(W); see [20] for the detailed derivation.
For a Poisson packet arrival process with mean arrival rate λ, the number of packets in an interval of t slots is Poisson distributed with mean λt. Hence, when an interval of t slots is included in a CRI, the expected number of slots required to resolve the CRI is
$$\bar{L}(t) = \sum_{n=0}^{\infty} L_n \, \frac{(\lambda t)^n}{n!} \, e^{-\lambda t}.$$
The following lemma characterizes the stability region of the multiple access protocol.
Lemma 1: The necessary and sufficient condition for stability is
$$\bar{L}(t_0) < t_0. \tag{7}$$
Proof: Let the backlog bₖ be defined as the number of slots with unresolved packets in the system at the beginning of the k-th CRI. The process bₖ is a Markov chain, since bₖ₊₁ depends only on bₖ. Due to the time-limited gated access design, all packets in the interval of bₖ slots enter the CRI when bₖ < t₀; otherwise, only the packets in an interval of t₀ slots enter the CRI. Hence, the expected number of back-logged slots at the beginning of the next CRI is
$$E[b_{k+1} \mid b_k] = b_k - \min(b_k, t_0) + \bar{L}\big(\min(b_k, t_0)\big).$$
Proving stability is equivalent to showing that bₖ is a supermartingale whenever bₖ ≥ t₀, which leads to (7). For a given a, we can numerically evaluate the stability region of the Turbo-DPMA algorithm in terms of t₀ and λ. Figure 1 shows the maximum arrival rate as a function of t₀ for different values of the adversary order. As a increases, the maximum stable value of λ also increases, from 0.743 when 1 < a < 2, to 0.782 when 2 < a < 3, 0.791 when 3 < a < 4, and 0.793 when 4 < a < 5. This increase is expected, since higher values of a imply a larger gap between the two power levels, so more packets need to be received at q₀ before the SINR condition for the packet received at q₁ is violated. Finally, the optimum gating interval t₀, which leads to the highest stable arrival rate, also increases from 2.37 to 2.5 as a increases.
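The stability region can be explored numerically. The sketch below computes Lₙ by conditioning on the number k of the n packets that fall in H(W), which is Binomial(n, 1/2) under Poisson arrivals, and applying the feedback and stack rules of Secs. III-A and III-E. This recursion is our own reading of the algorithm, not an equation quoted from the paper or from [20], and it again assumes that the receiver infers the packet counts exactly from the RRP. The code then averages Lₙ over a Poisson number of packets and, for each t₀ on a grid, bisects on λ using the criterion that the expected CRI length must stay below t₀. All function names are illustrative.

```python
import math


def resolution_times(n_max, a):
    """Expected number of slots L_n to resolve n simultaneous packets, n = 0..n_max."""
    L = [1.0, 1.0]                              # L_0 = L_1 = 1
    for n in range(2, n_max + 1):
        acc = 1.0                               # the current slot
        for k in range(1, n):                   # k packets in H(W), m = n - k in L(W)
            m = n - k
            p = math.comb(n, k) / 2 ** n
            if k == 1 and m == 1:
                cost = 0.0                      # RA: SIC decodes both
            elif k == 1 and m <= a:
                cost = L[m]                     # RH: q_1 decoded, L(W) re-split
            else:
                cost = L[k] + L[m]              # RN: resolve H(W), then L(W)
            acc += p * cost
        # k = 0 (RH) and k = n (RL) re-split the same n packets: solve for L_n.
        L.append(acc / (1 - 2 / 2 ** n))
    return L


def expected_cri_length(lam, t, L):
    """Average of L_n over n ~ Poisson(lam * t), truncated at n = len(L) - 1."""
    x = lam * t
    p = math.exp(-x)                            # P(n = 0)
    total = 0.0
    for n, ln in enumerate(L):
        total += ln * p
        p *= x / (n + 1)                        # P(n + 1) from P(n)
    return total


def max_stable_rate(a, t0_grid, n_max=60, iters=40):
    """Largest lambda (and its t0) with expected CRI length below t0."""
    L = resolution_times(n_max, a)
    best = (0.0, 0.0)
    for t0 in t0_grid:
        lo, hi = 0.0, 1.0
        for _ in range(iters):                  # bisection on lambda
            mid = (lo + hi) / 2
            if expected_cri_length(mid, t0, L) < t0:
                lo = mid
            else:
                hi = mid
        best = max(best, (lo, t0))
    return best


grid = [1.5 + 0.02 * i for i in range(101)]     # t0 from 1.5 to 3.5
for a in (1.3, 4.3):
    lam, t0 = max_stable_rate(a, grid)
    print(a, round(lam, 3), round(t0, 2))
```

Under these assumptions, the sketch gives L₂ = 2 and maximum stable rates close to the values of 0.743 (a = 1.3) and 0.793 (a = 4.3) reported above.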
IV. SIMULATIONS
We confirm our analysis using Monte Carlo simulations over 3 × 10⁵ consecutive packets. Our simulation uses an infinite-node assumption, in which a new node is introduced for each new packet arriving at the system. The receiver noise power is assumed to be −100 dBm and the decoding threshold γ = 10 dB; hence, q₀ = −90 dBm. We set a < 5, which means that the receiver dynamic range need not exceed 17 dB. Fig. 2 shows the average delay of Turbo-DPMA. The simulations use a = 1.3 and a = 4.3 (which sets the value of q₁), together with the corresponding optimal gating interval t₀. As expected, the delay increases rapidly as the packet arrival rate approaches the maximum value for stability, which is 0.743 packets/slot for a = 1.3 and 0.793 packets/slot for a = 4.3. This result is significantly better than the best known 0.693 packets/slot of the SICTA protocol [8], which uses a more complicated receiver structure.
This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the WCNC 2008 proceedings.
Finally, in Fig. 3, we examine the sensitivity of the Turbo-DPMA algorithm to the gating interval, t₀, for different arrival rates, when a = 4.3. The average delay is sensitive to t₀ only for arrival rates close to the stability limit.
V. CONCLUSIONS
We showed that exploiting local channel knowledge to limit the range of receive power actively facilitates multi-packet reception and also simplifies the receiver design. In particular, we proposed a multiple access algorithm called Turbo-DPMA that employs just two discrete receive power levels, which are suitably chosen to enable immediate successive interference cancellation and the reception of up to two simultaneously transmitted packets. Using four feedback messages and a simple received signal strength measurement, Turbo-DPMA achieves a stable throughput of 0.793 packets/slot, which is higher than that of all previously known contention algorithms. Unlike other MPR-based algorithms, this is achieved without the receiver having to store and process soft information from previous time slots.
Given the fundamental importance of multiple access, the algorithm is widely applicable in wireless networks. The encouraging results motivate future work on generalizing the algorithm to handle inaccuracies in channel knowledge and on further exploiting the capabilities of wider-dynamic-range receivers that can support more than two power levels.
