Abstract: Contention-based multiple access is a crucial component of many wireless systems. Multiple-packet reception (MPR) schemes that use interference cancellation techniques to receive and decode multiple packets that arrive simultaneously are known to be very efficient. However, the MPR schemes proposed in the literature require complex receivers capable of performing advanced signal processing over significant amounts of soft undecodable information received over multiple contention steps. In this paper, we show that local channel knowledge and elementary received signal strength measurements, which are available to many receivers today, can actively facilitate multipacket reception and even simplify the interference canceling receiver's design. We introduce two variants of a simple algorithm called Dual Power Multiple Access (DPMA) that use local channel knowledge to limit the receive power levels to two values that facilitate successive interference cancellation. The resulting receiver structure is markedly simpler, as it needs to process only the immediate received signal without having to store and process signals received previously. Remarkably, using a set of three feedback messages, the first variant, DPMA-Lite, achieves a stable throughput of 0.6865 packets per slot. Using four possible feedback messages, the second variant, Turbo-DPMA, achieves a stable throughput of 0.793 packets per slot, which is better than all contention algorithms known to date.
separated by the receiver, are provably more efficient than ALOHA [7] [8] [9] [10] . However, MPR often requires receivers that are capable of advanced signal processing. For example, by means of a polynomial phase-modulating sequence, the cyclostationarity of different received packets was used to color-code packets from multiple transmissions [9] . Signal separation was achieved in [11] using a rotational invariance technique. In Network-assisted Diversity Multiple Access (NDMA) [12] , when k packets collide in a time slot, the network forces the transmitters to retransmit another k − 1 times. So long as the channel changes sufficiently from one slot to another, these k consecutive transmissions allow the receiver to invert the channel matrix and recover all k collided packets. However, such channel variation can be difficult to ensure in low Doppler regimes. As can be seen, these algorithms also require receivers that can store and process significant amounts of soft information about signals received over multiple transmissions.
A more direct MPR approach uses successive interference cancellation (SIC) [13] to improve the throughput of multiple access [14]. For example, the SIC Tree Algorithm (SICTA) [14] stores soft information about the undecodable received signal whenever the receiver detects the presence of a message but cannot decode it successfully. This soft information improves the chances of decoding all the signals received thus far. When the receiver does eventually decode a packet, it subtracts its contribution from all previously stored received signals, and then attempts to decode them again. The SICTA protocol is stable for arrival rates up to 0.693 packets/slot. This is substantially better than the first-come-first-serve (FCFS) binary tree algorithm, which becomes unstable when the packet arrival rate exceeds 0.487 packets/slot [4], [5]. However, like all other MPR schemes, SICTA requires the receiver to store soft information of the received signal of all previously undecodable messages. This also implies that successively decoding the possibly many packets that have collided over time can lead to long delays. Another important consideration is the feedback message size. Most protocols use a set of 2-bit feedback messages: "idle (0)", "success (1)", and "collision (e)" messages. Instead, SICTA's set of feedback messages consists of "0", "e", and, in addition, the number of packets that were finally resolved in the previous time slot. This number can be large, and requires allocation of more bits for feedback signaling.
In this paper, we propose a new and simple multiple access paradigm that uses local channel state information (CSI) at the transmitter to control the power received at the destination from each node (or, equivalently, the node's transmit power) so as to actively facilitate MPR. This local CSI can be easily obtained using channel reciprocity in time division duplex systems [2], and has been exploited in other multiple access schemes [15] [16] [17]. While the receiver still uses SIC, a key advantage of our approach is that it does not need to store signals from previous transmissions, which significantly reduces its memory and processing requirements. Instead, the receiver effectively utilizes elementary information from the total received signal strength (power) indication (RSSI), a capability that is present in many commercial receivers already [18], [19]. As we show, not only is this paradigm more efficient than the best multiple access schemes known to date, but its receiver is also significantly simpler.
1536-1276/09$25.00 © 2009 IEEE
In particular, we propose a Dual Power Multiple Access (DPMA) algorithm in which the nodes transmit such that their received power takes one of two power levels. The key lies in setting the two power levels carefully so as to enable MPR using SIC at the receiver. As mentioned, DPMA does not require the receiver to store soft information of any of the undecodable signals over time; MPR is achieved simply by the use of successive interference cancellation of packets received at the same time. We introduce two versions of the DPMA algorithm, both of which operate over a 2-bit feedback channel. The first version, called DPMA-Lite, uses three feedback messages, as used in many contention algorithms, and an RSSI-capable receiver. Stability and delay analyses of both algorithms are developed, and verified using simulations. As we shall see, depending on the dynamic range of the receiver, DPMA-Lite is stable for arrival rates up to 0.686 packets/slot; this is quite close to that of SICTA, which requires a more sophisticated receiver. We also introduce a more aggressive version called Turbo-DPMA that instead uses a set of four feedback messages, and is stable for arrival rates up to 0.793 packets/slot. This is better than all the algorithms proposed in the literature to date. The proposed scheme works in systems with a single information sink. It has applications in uplink communication in cellular systems and in initial and periodic 'ranging' required for system entry and handover in WiMAX [20]. Another instance of a sink is the cluster head in a cluster of a wireless ad hoc or sensor network.
As mentioned, the use of local CSI to improve multiple access has been looked into previously. For example, in channel-aware ALOHA [15] , each user transmits only if its channel gain exceeds a system-determined threshold. The Opportunistic ALOHA (O-ALOHA) protocol [16] sets the probability of transmission as a function of local channel knowledge. In [17] , the time required for identifying the user with the highest priority through multiple access was substantially reduced by ensuring that the receive power levels were discrete. Splitting algorithms for capture were developed in [21] . However, all the above algorithms assume single packet reception, in which no packet is decoded when multiple nodes transmit simultaneously. While the use of multiple receive power levels has been considered in [22] [23] [24] , the power levels were selected at random in each contention step and without adapting to feedback. SIC along with multiple power levels for multiple access has been considered in [25] . However, the different power levels were only used for simultaneous transmission of users with different bit error rate requirements. No feedback was assumed, and no collision resolution mechanism or RSSI measurement was used. To the best of our knowledge, DPMA is the first algorithm to use local CSI and RSSI to actively facilitate MPR and simplify receiver design.
The remainder of the paper is organized as follows. The system model is described in Sec. II. The DPMA-Lite and Turbo-DPMA algorithms are developed in Secs. III and IV, respectively. Section V contains the stable throughput and packet delay analyses of the algorithms, and is followed by simulations in Sec. VI and conclusions in Sec. VII.
II. SYSTEM MODEL
We consider a wireless network consisting of a number of packet generating nodes that need to transmit packets to a message sink. The packets of each node are assumed to arrive for transmission at unique times. The packets are transmitted from the nodes in a time-slotted manner; it is assumed that all packets have the same size. Without loss of generality (wlog), the duration of a slot is set to unity. The channel power gain between transmitting node i and the message sink is denoted by h i , and is assumed to be known at the transmitter (and nowhere else). This assumption is similar to the one made in channel-aware ALOHA [16] , [26] . To facilitate analysis, we make the standard assumptions of a Poisson packet arrival process with a mean arrival rate (over all users) of λ, and that each new packet is generated at a unique node [4] , [5] , [14] .
Let P i denote the power received at the sink from node i. (We shall henceforth call it 'receive power'.) The sink can decode the packet from node i successfully if its received signal to interference and noise ratio (SINR) exceeds a threshold:

$$\frac{P_i}{\sum_{j \neq i} P_j + \sigma^2} \geq \gamma, \tag{1}$$

where σ 2 is the noise power and γ ≥ 1 is a threshold that depends on the modulation and coding used for the packet transmission [27]. Thus, a packet can be decoded successfully even when two or more users transmit simultaneously.
Consider now the specific case where every node i, which has local CSI, adjusts its transmit power so that its receive power, P i , is either q 0 or q 1 (wlog, let q 1 > q 0 ). When two nodes each transmit a packet, one with receive power q 0 and another with q 1 , both packets can be decoded successfully using SIC if

$$\frac{q_1}{q_0 + \sigma^2} \geq \gamma \quad \text{and} \quad \frac{q_0}{\sigma^2} \geq \gamma. \tag{2}$$

A checksum field in the packet enables the receiver to determine whether a packet has been decoded successfully. The power level settings in (2) can be generalized to handle simultaneous transmissions by more than two contending users. Note that no packet can be decoded successfully if more than one user's receive power is q 1 . However, if only one user's receive power is q 1 , and if the power levels are set as follows:

$$q_0 = \gamma \sigma^2, \qquad q_1 = \gamma \left( a q_0 + \sigma^2 \right), \tag{3}$$
then the packet with receive power q 1 can be decoded so long as there are at most a users with receive power at q 0 . We shall refer to a as the adversary order, as it ensures that a signal with receive power q 1 can overcome interference from up to a users with receive power at q 0 . Note that a can take any real value [17] . The assignment of receive power levels to contending nodes is described in the next section.
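As a quick numerical check of these power-level settings, the following Python sketch computes the two levels and tests the SINR condition for a high-power packet facing two and three low-power interferers. It assumes the settings q0 = γσ² and q1 = γ(a·q0 + σ²), which agree with the relation q1 = q0(aγ + 1) quoted later in the text; the function names `power_levels` and `sinr`, and the numerical values, are illustrative only.

```python
import math

def power_levels(gamma, sigma2, a):
    """Return (q0, q1) for a linear SINR threshold gamma, noise power sigma2,
    and adversary order a. Assumed settings: q0 = gamma*sigma2 and
    q1 = gamma*(a*q0 + sigma2)."""
    q0 = gamma * sigma2                  # a lone low-power packet just meets the threshold
    q1 = gamma * (a * q0 + sigma2)       # tolerates up to `a` low-power interferers
    return q0, q1

def sinr(power, interferers, sigma2):
    """SINR of one packet given the receive powers of the other packets."""
    return power / (sum(interferers) + sigma2)

gamma = 10.0    # 10 dB expressed as a linear ratio (illustrative value)
sigma2 = 1.0    # normalized noise power (illustrative value)
a = 2.1         # adversary order used in the example of Sec. II-C

q0, q1 = power_levels(gamma, sigma2, a)
print("q0 =", q0, "q1 =", q1)
print("decodable with 2 low-power interferers:", sinr(q1, [q0, q0], sigma2) >= gamma)
print("decodable with 3 low-power interferers:", sinr(q1, [q0, q0, q0], sigma2) >= gamma)
print("required dynamic range (dB):", 10 * math.log10(q1 / q0))
```

The last line reports the ratio q1/q0 in dB, which is the dynamic range the two levels demand of the transmitter and receiver under these assumptions.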
A. Controlling Receive Power Levels and Exploiting RSSI
It is the local channel knowledge that enables the transmitting node to control the receive power level. Each node can easily and locally compute its channel gain to the message sink by listening to a (predefined) pilot sequence that is periodically broadcast by the sink. For a target receive power P and an estimated channel gain h, a node transmits its message at power P/|h|^2 . This technique is analogous to power control that is ubiquitous in second- and third-generation CDMA-based cellular systems. The mechanisms for enforcing discrete receive powers are the same in our case, though the motivation is subtly different. In power-controlled second-generation CDMA, it is essential that the received powers from all users are identical. In third-generation systems, several discrete receive power levels are foreseen (related to the fact that users with higher data rates need higher power). In our DPMA schemes, the different power levels are used for data sent at the same transmission rate.
The total receive power, specified by RSSI at the receiver, is the sum of receive powers of all received packets in a time slot. This measurement is made by many wireless receivers today. In our case, the receiver can extract useful side information from RSSI regarding the number of packets received at each of the two power levels since the receive power of each packet takes a limited set of values (q 0 and q 1 ). We will use this side information in the development of the DPMA algorithm in the following sections.
For clarity, we also define a quantity called the Residual Receive Power (RRP), which can be derived from the RSSI after the receiver successively performs SIC. RRP is defined as the power of the received signal that remains after all decodable messages have been canceled from it. For example, if the receiver gets two packets, one at power q 1 and the other at power q 0 , the RRP is on the order of the noise power, σ 2 , as both packets will be successively decoded and canceled from the received signal. Consider another case in which the receiver gets three packets, one at power q 1 and two at power q 0 , for a ≥ 2. Then, it decodes the packet at q 1 successfully, and it fails to decode the remaining two packets at q 0 . Therefore, the RRP is now 2q 0 + σ 2 . Finally, when no packet is received, the RRP is on the order of σ 2 .
B. Practical Feasibility of Two Discrete Receive Power Levels
The discussion above in (2) and (3) used two power levels. Two such levels can be easily accommodated by receivers of existing systems. For example, if the minimum SINR threshold for successful decoding is γ = 10 dB, it follows from (1) that the transmitter and receiver dynamic range should be at least 10 dB if the packet of higher received power is to be received successfully. In existing systems, the mobile station transmit power dynamic range is 35 dB in GSM systems [28] and 74 dB in third-generation Wideband CDMA systems [18].
After accounting for variations in receive signal strength due to fading and near-far problem, one can reasonably assume that the receiver has about 20 dB of dynamic range. Thus modern wireless transmitters are easily capable of providing the dynamic range required to achieve the two power levels, as long as the adversary order a is not extraordinarily large.
While the proposed scheme can also be generalized to handle more than two power levels to deliver even better performance, this comes at the expense of a larger dynamic range requirement and a greater feedback overhead.
C. Relevant SIC Receiver Properties
For the case of the two receive power levels specified in (3), a SIC receiver that processes only the signal received in the current time slot exhibits the following properties, which shall be important in the algorithm development that follows.
• If only two packets are received, one with power q 0 and the other with power q 1 , then both can be decoded.
• If only one packet is received with power q 0 , then it can be decoded.
• If one packet is received with power q 1 , then it can be decoded so long as no other packet is received with power q 1 and the number of packets with receive power q 0 does not exceed a . For example, if a = 2.1 (that is, q 1 = q 0 (2.1γ + 1)), and a packet A is received with power q 1 and two other packets are received with power q 0 , then only packet A is received successfully.
• Otherwise, none of the received packets can be decoded.
A practical SIC receiver may cancel only a fraction 1 − ε of the interference power, where 0 < ε < 1. This can be handled by increasing the power levels to

$$q_0 = \frac{\epsilon \sigma^2 \gamma^2 + \sigma^2 \gamma}{1 - \epsilon \gamma^2 a} \quad \text{and} \quad q_1 = \gamma \left( a q_0 + \sigma^2 \right).$$

This leads to a more stringent requirement on the dynamic range of the transmitter. It also caps the largest allowed value of a at $1/(\epsilon \gamma^2)$. However, we will omit this non-ideality in the rest of this paper.
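The receiver properties listed above, together with the RRP examples of Sec. II-A, can be illustrated with a small Python model of an ideal single-slot SIC receiver. This is only a sketch under stated assumptions: cancellation is perfect (ε = 0), the function name `sic_receive` and the tolerance `TOL` are hypothetical, and the numerical values correspond to γ = 10 (linear), σ² = 1, and a = 2.1.

```python
TOL = 1e-9  # relative slack so exact-threshold cases survive floating-point rounding

def sic_receive(n_high, n_low, q0, q1, sigma2, gamma):
    """Ideal SIC over the current slot only.
    n_high / n_low: number of packets received at q1 / q0.
    Returns (decoded_high, decoded_low, rrp)."""
    decoded_high = decoded_low = 0
    # Step 1: a q1 packet is decodable only if it is the sole q1 packet and
    # its SINR against all q0 packets plus noise meets the threshold.
    if n_high == 1 and q1 / (n_low * q0 + sigma2) >= gamma * (1 - TOL):
        decoded_high = 1
    # Step 2: once no undecoded q1 power remains, a lone q0 packet sees only noise.
    if n_high == decoded_high and n_low == 1 and q0 / sigma2 >= gamma * (1 - TOL):
        decoded_low = 1
    # Residual receive power: whatever could not be decoded and cancelled, plus noise.
    rrp = (n_high - decoded_high) * q1 + (n_low - decoded_low) * q0 + sigma2
    return decoded_high, decoded_low, rrp

q0, q1, sigma2, gamma = 10.0, 220.0, 1.0, 10.0   # illustrative levels for a = 2.1

for n_high, n_low in [(0, 0), (0, 1), (1, 1), (1, 2), (1, 3), (2, 0)]:
    print((n_high, n_low), sic_receive(n_high, n_low, q0, q1, sigma2, gamma))
```

In this model the RRP equals the noise power whenever everything is decoded, and equals 2q0 + σ² when one packet at q1 and two at q0 arrive, which matches the examples given for the RRP.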
III. DPMA-LITE
The discussion so far has brought out the following two things: (i) controlling the receive power level of each user to two discrete values enables the receiver to decode up to two packets simultaneously, and (ii) valuable information can be derived from the RRP, and reflected in the feedback messages to better control the next stage of the contention process. Specifically, the RRP determines the receiver behavior as follows:
1) 0 < RRP < q 0 + σ 2 : This implies that the packets transmitted in the slot have been resolved. Hence, the receiver broadcasts the Resolved-All (RA) feedback message.
2) q 0 + σ 2 ≤ RRP ≤ q 1 + σ 2 : This implies that the packet with receive power at q 1 (if it was transmitted) was decoded successfully, and at least two packets were received at power q 0 . Hence, the receiver broadcasts the Resolved-High (RH) feedback message.
3) RRP > q 1 + σ 2 : Finally, this case implies that the receiver could not decode any of the transmitted messages. Hence, it feeds back the Resolved-None (RN) message.
In Table I, we list the scenarios that may occur at the receiver and the corresponding feedback. As can be seen, 2 feedback bits are required to send one of the three feedback messages (RA, RN, and RH).
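The mapping from RRP to the three DPMA-Lite feedback messages can be sketched in Python as follows. The thresholds q0 + σ² and q1 + σ² are the ones used in the worked example of Sec. III-D; treating the middle range as inclusive at both ends is an assumption, as are the function name `dpma_lite_feedback` and the noise-free comparison (a practical receiver would need some margin around each threshold).

```python
def dpma_lite_feedback(rrp, q0, q1, sigma2):
    """Idealized DPMA-Lite feedback decision from the residual receive power."""
    if rrp < q0 + sigma2:
        return "RA"   # nothing undecoded remains: all packets in the slot resolved
    if rrp <= q1 + sigma2:
        return "RH"   # only low-power (q0) packets remain undecoded
    return "RN"       # too much residual power: nothing could be decoded

q0, q1, sigma2 = 10.0, 220.0, 1.0   # illustrative levels (gamma = 10, a = 2.1)

print(dpma_lite_feedback(sigma2, q0, q1, sigma2))            # empty or fully decoded slot
print(dpma_lite_feedback(2 * q0 + sigma2, q0, q1, sigma2))   # two q0 packets left over
print(dpma_lite_feedback(2 * q1 + sigma2, q0, q1, sigma2))   # two q1 packets collided
```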
Given this feedback, we now describe how each node behaves in subsequent time slots under DPMA-Lite. Briefly, DPMA-Lite makes the packets that have arrived in the past contend with each other (using different power levels as described below) and allows newer packets to contend only after having successfully resolved all the packets that contend. It also uses a time-limited gated access strategy [5] , which limits the number of packets that contend.
A. Queuing, Gating and Contention Resolution Interval
When a new packet arrives at a user, it is stored in the user's local queue if the system is in the process of resolving the contention due to previously transmitted packets. The new packet is stored in the queue with its arrival time stamp. Consider the time slot in which the system clears the (k−1)-th contention. The k-th contention resolution interval (CRI) then begins at this time. Let b k denote the number of backlogged time slots with unresolved packets at this time.
DPMA-Lite uses a time-limited gated access strategy [5], which allows only packets in a maximum interval of t 0 time slots to participate in the k-th CRI. That is, if b k is smaller than t 0 , then all unresolved packets (in the queues of all nodes) participate in the k-th CRI. Otherwise, only the packets with time stamps in the earliest t 0 time slots participate in the k-th CRI. The other packets remain in the queue until a future CRI. We refer to t 0 as the gating interval [5]. It will play an important role in optimizing the protocol's performance.
B. Formal Definition of DPMA-Lite Algorithm
For clarity, we first provide a formal definition of the DPMA-Lite algorithm and then explain the reasoning behind it. An example is also provided to illustrate its various possible steps.
To specify the algorithm, we first define the following terminology. Let X = [x min , x max ) denote a contiguous time interval. We define the functions H(X) and L(X) to split the interval X into two equal-sized 'higher' and 'lower' intervals, respectively, as follows:

$$H(X) = \left[ x_{\min}, \frac{x_{\min} + x_{\max}}{2} \right), \qquad L(X) = \left[ \frac{x_{\min} + x_{\max}}{2}, x_{\max} \right).$$

Let U be a stack of unresolved contiguous time intervals.
Let τ denote the current time slot number, and d denote the latest time stamp that was included in the previous CRI. At system initialization, we set τ = 1 and d = 0, so that the packets with arrival time stamps in [0, 1) have not entered any CRI.
At the beginning of each CRI, the algorithm computes the number of back-logged time slots b = τ − d. As per the gating mechanism, the algorithm starts with [d, d + min(b, t 0 )) in stack U , so that all packets that arrived within this interval (of a duration of at most t 0 slots) participate in the CRI. Thereafter, we update d to d + min(b, t 0 ). At each time step of the CRI, all the transmitting nodes and the receiver (sink) implement the DPMA-Lite algorithm as follows. (Which part of the algorithm is implemented by whom will be clear from context.) Figure 1 demonstrates the operations of DPMA-Lite at each node except the initialization and termination steps.
• At the end of a CRI: The current time τ is updated to be the next time slot (which is also the slot in which the next CRI begins).
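The gating step and the interval-splitting functions can be sketched in Python as below. This covers only the CRI initialization and the H(·)/L(·) helpers, not the per-slot steps of the algorithm. It assumes that H(X) is the earlier half of X, which is consistent with the example in which time stamps in [0.5, 0.75) form the high-power part of [0.5, 1); the function names are hypothetical.

```python
def split_high(interval):
    """H(X): the half of X whose packets transmit at the high level (assumed: earlier half)."""
    x_min, x_max = interval
    return (x_min, (x_min + x_max) / 2)

def split_low(interval):
    """L(X): the half of X whose packets transmit at the low level (assumed: later half)."""
    x_min, x_max = interval
    return ((x_min + x_max) / 2, x_max)

def start_cri(tau, d, t0):
    """Gating at the beginning of a CRI.
    tau: current slot number, d: latest time stamp already served, t0: gating interval.
    Returns (initial contention interval, updated d)."""
    b = tau - d                    # number of backlogged slots
    width = min(b, t0)             # admit at most t0 slots' worth of arrivals
    return (d, d + width), d + width

print(start_cri(tau=1, d=0, t0=2.5))     # light backlog: the whole backlog enters the CRI
print(start_cri(tau=10, d=0, t0=2.5))    # heavy backlog: only the earliest t0 slots enter
print(split_high((0.5, 1.0)), split_low((0.5, 1.0)))
```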
C. Explanation
DPMA-Lite is essentially a splitting algorithm that optimally gates (controls) the number of packets that can be transmitted in a slot and, once a number of packets collide in a slot, splits the space of contending users into two parts that contend separately. Specifically, the response to different feedback messages can be explained as follows:
• A feedback of RA implies that every packet that was transmitted in the slot has been successfully resolved. 
D. Example
We now illustrate how DPMA-Lite proceeds by means of an example, the parameters of which are artificially chosen to exercise the many scenarios defined in the algorithm. In Table II, [0.375, 0.5), respectively). As the RRP is again less than q 0 + σ 2 , another RA is fed back. In slot 5, the low power nodes of slot 1 (D and E) transmit such that their receive power is q 1 (time stamps lie in [0.5, 0.75)), and no packet gets decoded. As the RRP is larger than q 1 + σ 2 , the receiver feeds back a RN message. In slot 6, the high power nodes of slot 5 transmit. Now D and E are resolved simultaneously as they are received at power q 1 
IV. TURBO-DPMA
While DPMA-Lite exploits RSSI, it does not do so fully. This can be seen from the example shown in Table II. In slot 5, both D and E are received at q 1 , and no packet is received at q 0 . DPMA-Lite hence feeds back RN since RRP > q 1 . This choice eventually leads to an empty slot in slot 7, which effectively lowers the maximum throughput that can be supported by the system. Being able to discriminate such a scenario can help increase the efficiency of the algorithm.
This can be done by checking whether the RRP less noise power is an integer multiple of q 1 . If yes, then it is highly likely that no packet was received at q 0 because of the considerable gap that typically exists between the two power levels. For example, for γ = 10 dB and a = 1, the odds that a sufficient number (10) of nodes transmitted at q 0 so as to cause the RRP to be an integral multiple of q 1 are of the order of $10^{-10}$, which is small. We now develop the Turbo-DPMA algorithm that actively exploits this fact. Before we do so, we discuss the implications of the unlikely event in which Turbo-DPMA is mistaken, i.e., it assumes that no q 0 receive power packet was received when several q 0 receive power packets were indeed received (such that the RRP less noise power is still an integral multiple of q 1 ). As we shall see, this scenario causes these packets to drop out of the current CRI. This has negligible impact since these packets recontend in the next CRI.
Turbo-DPMA therefore uses the feedback message Resolved-Low (RL) in addition to the three used by DPMA-Lite. To be specific, it generates its feedback messages from the RRP values as follows:
1) 0 < RRP < q 0 + σ 2 : As in DPMA-Lite, the receiver therefore feeds back RA.
2) q 0 + σ 2 ≤ RRP ≤ q 1 + σ 2 : As in DPMA-Lite, the receiver therefore feeds back RH.
3) RRP ∈ {mq 1 + σ 2 : m ≥ 2, m ∈ Z}: This implies that no packet is received at/near q 0 , and the receiver cannot resolve the packets received at power q 1 . The receiver therefore feeds back RL.
4) RRP > q 1 + σ 2 and RRP ∉ {mq 1 + σ 2 : m ∈ Z}: This implies that at least one message was received with power q 1 and the receiver could not decode any of the messages. The receiver therefore feeds back RN.
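The four-way Turbo-DPMA decision can be sketched in Python as follows. The integer-multiple test uses a hypothetical tolerance `tol` (the text does not specify how closely the RRP must match a multiple of q1), the inclusive bounds of the RH range are an assumption, and the function name is illustrative.

```python
def turbo_dpma_feedback(rrp, q0, q1, sigma2, tol=1e-6):
    """Idealized Turbo-DPMA feedback decision from the residual receive power."""
    if rrp < q0 + sigma2:
        return "RA"                      # everything in the slot was resolved
    if rrp <= q1 + sigma2:
        return "RH"                      # only q0 packets remain undecoded
    m = (rrp - sigma2) / q1              # residual power in units of q1
    if round(m) >= 2 and abs(m - round(m)) <= tol:
        return "RL"                      # looks like m packets at q1 and none at q0
    return "RN"                          # mixed collision: nothing decoded

q0, q1, sigma2 = 10.0, 220.0, 1.0        # illustrative levels (gamma = 10, a = 2.1)

print(turbo_dpma_feedback(2 * q1 + sigma2, q0, q1, sigma2))        # two q1 packets
print(turbo_dpma_feedback(q1 + 3 * q0 + sigma2, q0, q1, sigma2))   # one q1, three q0
print(turbo_dpma_feedback(44 * q0 + sigma2, q0, q1, sigma2))       # 44 q0 packets mimic 2*q1
```

The last call illustrates the rare mistaken case discussed in the text: with these levels, 44 packets at q0 produce the same residual power as two packets at q1, so the sketch reports RL even though no q1 packet was sent.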
A. Formal Definition of Turbo-DPMA Algorithm
The formal definition of the Turbo-DPMA algorithm is similar to that of DPMA-Lite in Sec. III-B, with the following one additional detail pertaining to the feedback message RL:
5) If feedback = RL, then push H(W ) into stack U and continue.
As in DPMA-Lite, in each time step of the CRI, all the transmitting nodes and the receiver (sink) implement the Turbo-DPMA algorithm. The full formal description is not repeated here due to space constraints. (As mentioned, in the unlikely event that a packet with time stamp τ is omitted in a CRI, it joins the next CRI by updating its time stamp to a uniformly chosen random value in the new interval.)
Table III exhaustively lists all the different scenarios that can be experienced at the Turbo-DPMA receiver, and the corresponding RRP and feedback message. Cases d, h, i and j are different from DPMA-Lite. In case i, Turbo-DPMA, unlike DPMA-Lite, can discern that no packet is received at q 0 . In cases d1, h1 and j1, no packet is decoded successfully, the resulting RRP exceeds q 1 , and it is not an integer multiple of q 1 after subtracting noise power. For these cases, the algorithm will feed back RN. In cases d2, h2 and j2, the RRP less noise power is an integer multiple of q 1 . Therefore, Turbo-DPMA here incorrectly feeds back RL (i.e., it makes the wrong assumption that no packet is received at power q 0 ). As discussed above, the probability of these cases is extremely low.
For the example in Table II , the first four slots proceed in exactly the same manner as DPMA-Lite. However, in slot 5, Turbo-DPMA sends RL to the transmitters. This causes the collision resolution interval to end after slot 6 itself (unlike slot 7 for DPMA-Lite).
V. STABLE THROUGHPUT AND DELAY ANALYSES OF DPMA-LITE AND TURBO-DPMA

A. DPMA-Lite Stable Throughput Analysis
The analysis assumes Poisson packet arrival processes, which imply that packets are uniformly distributed in the time interval. Consider the expected number of slots, L n , required to resolve a collision involving n nodes. Clearly, when only zero or one packet is received in a slot, it takes exactly one slot to resolve the packet. Thus, L 0 = L 1 = 1.
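When two packets are received in a slot, each packet can be received at power q 0 or q 1 . Each of the four possible power assignments occurs with probability 1/4, which leads to the following cases:
• Both packets are received at q 1 : This case occurs with probability $(1/2)^2 \binom{2}{0} = 1/4$. The receiver feeds back RN. In subsequent slots, the high nodes and the low nodes contend separately and are thus resolved separately. Resolving the high nodes takes a duration L 2 because there are two packets to be resolved. Resolving the low nodes takes duration L 0 , because there are no packets to be transmitted by low nodes.
• Both packets are received at different power levels: This case occurs with probability $(1/2)^2 \binom{2}{1} = 1/2$. Both packets are decoded in the same slot, and the receiver feeds back RA.
• Both packets are received at q 0 : This case occurs with probability $(1/2)^2 \binom{2}{2} = 1/4$, and the receiver feeds back RH. The low nodes are then resolved, which takes a duration L 2 .
Hence,

$$L_2 = \frac{1}{4} \left( 1 + L_2 + L_0 \right) + \frac{1}{2} (1) + \frac{1}{4} \left( 1 + L_2 \right),$$

which gives L 2 = 2.5.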
The general case n ≥ 3 consists of the following mutually exclusive cases:
• n − i nodes at level q 1 and i nodes at level q 0 , with 0 ≤ i < n − 1: This case occurs with probability $(1/2)^n \binom{n}{i}$. It requires an average of L n−i slots to resolve the packets in the high nodes and L i to resolve the low nodes.
• 1 node at level q 1 , and n − 1 nodes at level q 0 : This occurs with probability $(1/2)^n \binom{n}{n-1}$. If the SINR for the high node is below the threshold γ (in other words, if a < n − 1), then the receiver feeds back an RN message; in the subsequent time slot, the high node is the only one allowed to transmit, and its packet is resolved. Otherwise, the high node is immediately resolved, and an RH message is fed back. In either case, the n − 1 low nodes are resolved subsequently, which requires L n−1 slots.
• n nodes at level q 0 : This occurs with probability $(1/2)^n$. If the total received power exceeds q 1 + σ 2 (i.e., if aγ + 1 < n), then the receiver feeds back an RN message. In the subsequent slot meant for high nodes to contend, no transmission occurs since there are no high power nodes with messages, and the receiver feeds back RA. Subsequently, the low nodes are resolved, which takes duration L n slots. Otherwise, the system immediately starts to resolve the low nodes.
Hence, for n ≥ 3,

$$L_n = 1 + \frac{1}{2^n} \left[ \sum_{i=0}^{n-2} \binom{n}{i} \left( L_{n-i} + L_i \right) + n \left( L_{n-1} + I_{\{a < n-1\}} \right) + L_n + I_{\{a\gamma + 1 < n\}} \right],$$

where I {x} is the indicator function that is equal to one if x is true, and zero if x is false. After rearranging the equation, we arrive at the following recursion:

$$L_n = \frac{1 + 2^{-n} \left[ L_0 + \sum_{i=1}^{n-2} \binom{n}{i} \left( L_{n-i} + L_i \right) + n \left( L_{n-1} + I_{\{a < n-1\}} \right) + I_{\{a\gamma + 1 < n\}} \right]}{1 - 2^{1-n}}.$$
When the packet arrival follows a Poisson process with mean arrival rate λ, and when the time interval (in units of the number of slots) to be included in a CRI is t, then the expected number of slots required to resolve a CRI is

$$R(\lambda t) = \sum_{n=0}^{\infty} L_n \frac{(\lambda t)^n}{n!} e^{-\lambda t}. \tag{6}$$
The following theorem describes the stability region of DPMA-Lite.
Theorem 1: The necessary and sufficient condition for stability is

$$R(\lambda t_0) < t_0. \tag{7}$$

Proof: Let the backlog b k be defined as the number of slots with unresolved packets in the system at the beginning of the k-th CRI. It is clear that b k is a Markov process as b k depends only on b k−1 . Due to the time-limited gated access design, all packets in the interval b k enter the CRI when b k < t 0 ; otherwise, only the packets in the first t 0 slots of the backlog enter the CRI. Hence, the expected number of backlogged slots in the next CRI, conditioned on b k , is

$$E\left[ b_{k+1} \mid b_k \right] = b_k - \min(b_k, t_0) + R\left( \lambda \min(b_k, t_0) \right).$$

For stability, we note that if b k is a super-martingale whenever b k ≥ t 0 , then the backlog is finite with probability one [29]. This holds true when the drift satisfies

$$E\left[ b_{k+1} - b_k \mid b_k \geq t_0 \right] = R(\lambda t_0) - t_0 < 0,$$
which is equivalent to the condition in (7). Equivalently, (7) can be restated as λ < R −1 (t 0 )/t 0 . However, doing so is not very useful because R −1 (·) is hard to obtain. (7) can be intuitively understood as follows. λt 0 is the expected number of packets entering a CRI when the maximum gating interval t 0 is used, and R(λt 0 ) is the expected number of time slots required to resolve λt 0 packets. Hence, λt 0 /R(λt 0 ) is the rate at which packets are successfully decoded at the receiver when there is a significant backlog. Therefore, Theorem 1 states that stability is ensured when the arrival rate of packets into the system is less than the expected rate at which packets are decoded successfully by the receiver.
For a given adversary order a, we can numerically evaluate the stability region of the DPMA-Lite algorithm, in terms of t 0 and λ. Figure 2 shows the stability region, and shows the value of t 0 that gives the maximum value for λ. As a increases from 1 to 5, the maximum stable value of λ also increases, as expected, from 0.6517 when 1 < a < 2, to 0.6791 when 2 < a < 3, to 0.6854 when 3 < a < 4, and to 0.6865 when 4 < a < 5. The value of t 0 that leads to the maximum stable arrival rate changes from 2.476 to 2.551, 2.607, and 2.628 as a increases. This is expected since increasing a allows the receiver to resolve more cases; thus, the algorithm can become more aggressive in making more users contend. Using DPMA-Lite, we see that the maximum stable arrival rate is 0.6865 when a ≥ 4. For an SINR threshold ofγ = 10, this implies that q 1 ≥ 41q 0 , i.e., q 1 is 16 dB above q 0 . Such a dynamic range can be readily supported by many receivers today. Even though the arrival rate of 0.6865 is marginally below the 0.693 result using the SICTA algorithm, DPMA-Lite is superior from an implementation complexity point of view since it does not require the receiver to store soft information of the received undecodable packets. Also, it only uses three feedback messages.
B. Turbo-DPMA Stable Throughput Analysis
The throughput analysis for Turbo-DPMA is very similar to that for DPMA-Lite above. Hereafter, we do not consider cases d2, h2, and j2, in which Turbo-DPMA incorrectly assumes that no packets were received at q 0 , given that they are extremely unlikely events.
Fig. 3. The boundary of the stability region of Turbo-DPMA for different values of adversary order a. (Axis: maximum stable value of λ; curves for 1 < a < 2, 2 < a < 3, 3 < a < 4, and 4 < a < 5.)
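When only zero or one packet is received in a slot, it takes exactly one slot to resolve the packet. Thus, L 0 = L 1 = 1. When two packets are received in a slot, the system behaves in the same way as for DPMA-Lite, except in the case that both packets are received at q 1 . In that case the receiver feeds back RL (instead of RN). Since the system knows that only high nodes have packets to transmit, a duration L 2 (instead of L 2 + L 0 for DPMA-Lite) is required to transmit the packets. Hence, the total resolution time is

$$L_2 = \frac{1}{4} \left( 1 + L_2 \right) + \frac{1}{2} (1) + \frac{1}{4} \left( 1 + L_2 \right),$$

which, when solved, leads to L 2 = 2 (which is lower than L 2 = 2.5 slots of DPMA-Lite). Similarly, if n ≥ 3 packets are transmitted, the only difference from DPMA-Lite occurs for the case in which all packets are received at level q 1 . Here, RL is fed back instead of RN. Thus,

$$L_n = 1 + \frac{1}{2^n} \left[ L_n + \sum_{i=1}^{n-2} \binom{n}{i} \left( L_{n-i} + L_i \right) + n \left( L_{n-1} + I_{\{a < n-1\}} \right) + L_n + I_{\{a\gamma + 1 < n\}} \right].$$

Rearranging the equation gives

$$L_n = \frac{1 + 2^{-n} \left[ \sum_{i=1}^{n-2} \binom{n}{i} \left( L_{n-i} + L_i \right) + n \left( L_{n-1} + I_{\{a < n-1\}} \right) + I_{\{a\gamma + 1 < n\}} \right]}{1 - 2^{1-n}}.$$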
The expected number of slots required to resolve a CRI given a collision resolution window of size t is given by (6) , and the stability condition is the same as in Theorem 1.
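As a quick check of the two-packet case, the recursion for L 2 can be solved exactly. The sketch assumes that each of the two packets is received at q 0 or q 1 with equal probability, so the first slot resolves both packets with probability 1/2 and each same-level outcome (probability 1/4) makes the two packets contend again; in DPMA-Lite, one of the same-level outcomes costs an additional L 0 = 1 slot. The helper name is hypothetical.

```python
from fractions import Fraction


def expected_slots_two_packets(extra_slots_on_retry: Fraction) -> Fraction:
    """Solve L2 = 1 + (1/4) * L2 + (1/4) * (L2 + extra) for L2.

    The first slot is always spent. Each same-level outcome has probability
    1/4 and restarts the resolution; one of them may cost `extra` more slots.
    """
    p_same = Fraction(1, 4)
    # Collect the L2 terms: L2 * (1 - 2 * p_same) = 1 + p_same * extra
    return (1 + p_same * extra_slots_on_retry) / (1 - 2 * p_same)


L0 = Fraction(1)
turbo = expected_slots_two_packets(Fraction(0))  # both retries cost L2
lite = expected_slots_two_packets(L0)            # one retry costs L2 + L0
print(turbo, lite)
```

With these assumptions the two results agree with the values quoted in the text, L 2 = 2 for Turbo-DPMA and L 2 = 2.5 for DPMA-Lite.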
For a given a, we can numerically evaluate the stability region of the Turbo-DPMA algorithm in terms of t 0 and λ. Figure 3 shows the stability region boundary and the value of t 0 that achieves it. As the adversary order, a, increases from 1 to 5, the maximum stable value of λ also increases, as expected, from 0.743 when 1 < a < 2 to 0.793 when 4 < a < 5. The corresponding value of t 0 that leads to the maximum stable arrival rate also increases from 2.37 to 2.50.
C. Delay Analysis
Having analyzed the stable throughput region of the protocols, we now analyze the average delay observed by a packet that arrives in an arbitrary user's local queue. For brevity, we only show the derivation for Turbo-DPMA, with the derivation for DPMA-Lite being similar.
Consider a packet that arrives between the (k − 1)-th and k-th CRI. Figure 4 shows the two time components that contribute to the packet delay: (i) the backlog delay W BL , which is the time that a packet waits in the backlog before the CRI in which it is resolved begins, and (ii) the collision resolution delay W CR , which is the time during the CRI before the receiver decodes the packet successfully. We analyze these two components separately below.
1) Backlog Delay: As before, let b k denote the number of slots backlogged at the start of the k-th CRI. In general, for an arbitrary t 0 , the backlog b k can be any positive value in the set B = {n + mt 0 : n, m ∈ Z}. For example, for t 0 = 2.5, a typical value for the gating interval, B is the set of all integer multiples of 1/2. Since b k is a super-martingale, b k forms an ergodic Markov process, and we can find its steady-state distribution. To do so, we first compute below the transition probability p(b k+1 |b k ). Let p_i^{(n)} denote the probability of resolving n packets in exactly i slots, and let Q n (z) denote its probability generating function. Hence, Q_n(z) = \sum_{i \ge 0} p_i^{(n)} z^i, with p_0^{(n)} = 0 for all n ≥ 0. Since it takes exactly one slot to decode zero or one packet, we have p_1^{(0)} = p_1^{(1)} = 1, i.e., Q_0(z) = Q_1(z) = z.
For n = 2, the probability of resolving two packets in the first slot is 1/2 (when the two users transmit at different power levels). Otherwise, both packets need to be resolved all over again in future slots (since either RL or RH is fed back). Hence, Q_2(z) = z/2 + (z/2) Q_2(z), i.e., Q_2(z) = z/(2 − z).
For n ≥ 3, we arrive at the following expression, which is similar to (11):
This leads to the following
, for n ≥ 3. Therefore, from the properties of a Poisson packet arrival process, the probability generating function of the backlog transition probabilities p(b k+1 |b k ) can be written in terms of Q n (z) as:
(14) For b k < t 0 , all the backlogged packets enter the k-th CRI, which implies that the start time of the next backlog interval coincides with the end time of the current backlog interval. In time b k , the probability that n packets arrive is e^{−λ b_k} (λ b_k)^n / n! (this follows from the Poisson arrival assumption). When b k ≥ t 0 , the backlogged packets of the (k + 1)-th CRI are the ones that were not allowed to contend in the k-th CRI (i.e., they arrived during the b k − t 0 slots that precede the k-th CRI) together with those that arrive over the duration of the k-th CRI. The duration of the k-th CRI depends on the number of packets that arrived over t 0 slots.
Recall that the coefficient of
The steady-state probability of the backlog duration l, denoted by P b (l), l ∈ B, can now be evaluated from p(b k+1 = i|b k ) using the global balance equation since b k is an ergodic Markov process [29] (see footnote 5). Given the steady-state backlog duration probabilities P b (l), the expression for the expected backlog delay W BL follows from the following theorem.
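The global balance step can be illustrated on a finite, truncated chain. The sketch below is not the authors' code: it uses plain power iteration to find a vector π with π = πP, and it renormalizes each row after truncation, which is one common way to handle the lost probability mass and is an assumption here. The 3-state matrix is a toy example with no connection to the DPMA transition probabilities.

```python
def stationary_distribution(P, tol=1e-12, max_iter=100_000):
    """Power iteration for pi = pi P on a finite (truncated) Markov chain.

    P is a list of rows. Rows are renormalized to sum to 1 so that the
    probability mass cut off by truncation is redistributed proportionally.
    """
    n = len(P)
    rows = [[p / sum(row) for p in row] for row in P]
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(max_iter):
        nxt = [sum(pi[i] * rows[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Toy 3-state transition matrix, purely illustrative.
P_toy = [
    [0.5, 0.4, 0.1],
    [0.3, 0.5, 0.2],
    [0.2, 0.3, 0.5],
]
pi = stationary_distribution(P_toy)
print(pi)
```

In practice the truncation size would be increased until the resulting P b (l) values stop changing at the precision of interest.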
Theorem 2: The average backlog delay is
whereP b (l) is the probability of a packet arriving in the system when the backlog is l, and equals
Proof: From the definition of W BL , it follows that we do not need to consider packets that arrive after the first t 0 time slots of a backlog since these packets will be serviced only by subsequent CRIs. Therefore, for a backlog of length l, the arrival time, t, of the packet lies within [0, min{l, t 0 }]. Given the Poisson arrival assumption, it follows that t is uniformly distributed in [0, min{l, t 0 }]. The packet will have to wait for a time l − t before the CRI in which it contends and is resolved begins. The average backlog delay is then
Footnote 5: Since the backlogs are unbounded, the transition matrix is infinite-dimensional. In practice, we truncate the transition matrix to a finite size to compute P b .
2) Contention Resolution Delay W CR : In general, the expected collision resolution delay depends on when the packet arrives in a particular collision resolution window, which is analytically difficult to compute exactly. However, this can be upper bounded by assuming that the packet of interest is the last to be resolved in a CRI. This depends only on the size of the collision resolution window. Hence,
Both the analytically evaluated and simulated values of the delay are shown in Fig. 5 , and match each other well.
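The conditional wait used in the proof of Theorem 2 follows from a one-line expectation. For a fixed backlog l, with the arrival time t uniform on [0, min{l, t 0 }] as stated in the proof, the mean time a packet waits before its CRI begins is:

```latex
\mathbb{E}\left[\, l - t \mid l \,\right]
  = l - \frac{1}{\min\{l, t_0\}} \int_0^{\min\{l, t_0\}} t \, dt
  = l - \frac{\min\{l, t_0\}}{2}.
```

For l ≤ t 0 this reduces to l/2. Averaging this quantity over the backlog seen by an arriving packet gives W BL ; that weighting is not reproduced here.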
VI. SIMULATIONS
We confirm our analyses of the two DPMA algorithms through simulations. As mentioned, our simulation uses the infinite nodes assumption [4], where a new node is introduced for each new packet arriving at the system. The packet arrival follows a Poisson process with mean arrival rate λ, which is a simulation parameter. We assume perfect CSI at each transmitter so that the receive power of any packet is either exactly q 0 or q 1 . The receiver noise power is assumed to be −100 dBm, and the decoding threshold is γ = 10 dB. Hence, q 0 = −90 dBm. Figure 6 shows the average delay of DPMA-Lite for a simulation consisting of 3 × 10^5 consecutive packets. The simulations use adversary orders of a = 1.3 and a = 4.3 (which set the values of q 1 ), and the respective optimal gating intervals of 2.476 and 2.628. In both cases, we see that the average delay remains low until the packet arrival rate, λ, approaches the maximum value for stability, which is 0.6517 for a = 1.3, and 0.6865 for a = 4.3.
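The arrival model in this setup is simple to reproduce. The following sketch covers only the Poisson arrivals under the infinite nodes assumption, not the DPMA protocol or the authors' simulator; the function name, the seed, and the example values λ = 0.6 and t 0 = 2.5 are chosen for illustration.

```python
import random


def arrivals_in_window(lam: float, t0: float, rng: random.Random) -> int:
    """Count Poisson arrivals of rate `lam` in a window of `t0` slots.

    Uses exponential inter-arrival times with mean 1/lam. Every arrival is
    treated as a packet from a fresh node (infinite nodes assumption).
    """
    count = 0
    t = rng.expovariate(lam)
    while t < t0:
        count += 1
        t += rng.expovariate(lam)
    return count


rng = random.Random(1)  # fixed seed for repeatability
lam, t0 = 0.6, 2.5      # example arrival rate and gating interval
samples = [arrivals_in_window(lam, t0, rng) for _ in range(20_000)]
mean_count = sum(samples) / len(samples)  # should be close to lam * t0
print(mean_count)
```

The sample mean should settle near λ t 0 , the expected number of packets admitted to a CRI when the backlog is at least one gating interval long.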
Using the same simulation parameters, the average delay for Turbo-DPMA is shown in Fig. 5. Once again, the delay increases rapidly as the packet arrival rate approaches the maximum value for stability, which is 0.743 for a = 1.3, and 0.793 for a = 4.3. The figure also plots the average delay values calculated using the analytical results in Sec. V-C. We see a good match between the two. However, the match is not perfect for the following two reasons. First, the probability generating function W b k (z) needs to be truncated to numerically evaluate the backlog transition probability. Second, as the arrival rate increases, the renewal time (when all the packets in the queue enter a particular CRI) becomes larger; thus, the simulation needs to cover significantly longer time intervals in order to obtain the correct average delay.
Finally, in Fig. 7, we examine the sensitivity of the Turbo-DPMA algorithm to the maximum gating interval t 0 at different arrival rates when a = 4.3. We see that when the network load is light, the average delay is fairly insensitive to the gating interval. Even when λ = 0.6, which is higher than the stable throughput of most contention algorithms, Turbo-DPMA achieves an average delay of about 4.2 for a wide range of gating intervals. We also see that the average delay is sensitive to t 0 only for packet arrival rates close to the stability region boundary.
VII. CONCLUSIONS
In this paper, we introduced the concept of Active Multiple-Packet Reception (Active-MPR) for multiple access of several transmitters to a single common receiver that is capable of MPR. In Active-MPR, the transmitters use local channel state information to help improve the multiple-access performance, e.g., the stable throughput. We proposed the Dual Power Multiple Access (DPMA) algorithm, which employs two discrete receive power levels to enable successive interference cancellation and, thus, successful MPR. Unlike other systems that employ MPR, DPMA does not need to store soft information at the receiver across multiple time slots.
We proposed two versions of DPMA. The more conservative variant of the algorithm, DPMA-Lite, guarantees that all packets are received within a collision resolution interval. Using three feedback messages (the same number as in commonly known contention algorithms, but with different meanings), DPMA-Lite achieves a stable throughput of 0.6865 packets per slot for typical receiver dynamic ranges. We also proposed a more aggressive version of the algorithm, Turbo-DPMA, that uses four feedback messages. It achieves a stable throughput of 0.793 packets per slot, which is better than all previously known contention algorithms.
The algorithms have wide applicability for wireless networks, especially as more and more wireless receivers start using interference cancellation. Depending on the dynamic range of the receivers, we also envision generalizations of the algorithm to the case in which three or more packets can be resolved simultaneously. While this would slightly increase the feedback overhead and the required dynamic range of the receiver, it would further increase the stable throughput.
