Abstract-In this brief, we introduce a priority encoder that uses a novel priority lookahead (PL) scheme to reduce delays associated with priority propagation. Two priority encoder approaches are presented, one without and the other with a PL scheme. For an N-bit encoder, the circuit with the PL scheme requires about 10% more transistors than the circuit without the scheme. However, a 32-bit very large scale integration (VLSI) encoder with the PL scheme is about 2.5 times faster than the other encoder. The worst case operation delay is 4.4 ns for this lookahead encoder using a 1-µm scalable complementary metal-oxide-semiconductor (CMOS) technology.
I. INTRODUCTION
Priority encoders are used in a number of computer systems as well as other applications. When several processes, modules, or units request a single hardware (or software) resource, a decision has to be made to allow a single request to use such a resource. The priority encoder implements a fixed selection function where the resource is granted to the request with the highest priority. Examples of subsystems that use encoder functions include buses [1], [2], fixed and floating point units [3], I/O [4], data comparators [5], and interconnection network routers [6]. The number of inputs in these subsystems ranges from 16 to 64.
Out of a number of requests, a priority encoder selects only one to be served. This selection is based on a static priority that can only be changed by explicitly changing the wiring of the encoder. The proposed priority encoder accepts N input request lines and sets only the one output that corresponds to the request with the highest priority. The critical path must be studied to ensure fast execution. In this brief, we propose a novel lookahead approach in order to minimize the propagation delay of the priority status during the worst case operation, hence increasing the speed of the circuit.
In this study, the encoded priority (EP) for bit position $i$ is expressed as

$$EP_i = M_i \cdot P_i \tag{1}$$

where $M_i$ is the priority encoder's input request and $P_i$ is the priority status for this bit position. $M_i$ and $P_i$ are both one-bit values. The operator ($\cdot$) in (1) and in the subsequent equations represents a logic AND, and an overbar denotes the logic complement; hence the encoded priority is, by definition, the result of ANDing the input with the priority status. $P_i$ depends on the previous priority ($P_{i-1}$) due to the linear priority requirement. Thus,

$$P_i = \overline{M}_{i-1} \cdot P_{i-1}. \tag{2}$$
Manuscript received April 9, 1998; revised March 7, 2000. This work was supported in part by the National Science Foundation under award CCR-9900643. This paper was recommended by Associate Editor M. Glesner.
J.G. Delgado-Frias is with the Department of Electrical Engineering, University of Virginia, Charlottesville, VA 22904-4743 USA.
J. Nyathi is with the Department of Electrical Engineering, State University of New York, Binghamton, NY 13902-6000 USA.
Publisher Item Identifier S 1057-7122(00)07067-7.
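This means that if the previous request $M_{i-1}$ is "0," the priority $P_{i-1}$ is passed on to the next cell's priority ($P_i$). If we apply (2) recursively, then (1) becomes

$$EP_i = M_i \cdot \overline{M}_{i-1} \cdot \overline{M}_{i-2} \cdots \overline{M}_0 \cdot P_0. \tag{3}$$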
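The recursion can be checked with a small behavioral model. The following is a minimal Python sketch, not the circuit itself: it reads (1) as EP_i = M_i AND P_i and (2) as P_i = (NOT M_{i-1}) AND P_{i-1}. The function name `ripple_priority_encode` and the default P_0 = 1 are assumptions made for illustration.

```python
from typing import List


def ripple_priority_encode(requests: List[int], p0: int = 1) -> List[int]:
    """Behavioral model of (1)-(2); index 0 has the highest priority."""
    encoded = []
    priority = p0  # P_0: priority status entering the first cell (assumed to be 1)
    for m in requests:
        encoded.append(m & priority)   # (1): EP_i = M_i AND P_i
        priority = (1 - m) & priority  # (2): P_{i+1} = (NOT M_i) AND P_i
    return encoded


print(ripple_priority_encode([0, 1, 0, 1]))  # -> [0, 1, 0, 0]
```

Only the first asserted request is granted; every later request sees a priority status of "0," which is the behavior (3) expresses in closed form.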
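Equations (1) and (2) describe the encoding scheme of the first approach. We propose a priority lookahead scheme, which shares some common features with carry lookahead adders [7]. We develop expressions for a priority lookahead (PL) status. In our approach, an $N$-bit priority encoder is partitioned into 4-bit lookahead segments. This number of bits represents a good compromise between fan-out and delay. Thus, the expressions used to describe the 4-bit PL scheme are

$$\begin{aligned}
PL_0 &= \overline{M}_0 \cdot \overline{M}_1 \cdot \overline{M}_2 \cdot \overline{M}_3 \\
PL_1 &= \overline{M}_4 \cdot \overline{M}_5 \cdot \overline{M}_6 \cdot \overline{M}_7 \\
&\;\;\vdots \\
PL_j &= \overline{M}_{4j} \cdot \overline{M}_{4j+1} \cdot \overline{M}_{4j+2} \cdot \overline{M}_{4j+3}.
\end{aligned} \tag{4}$$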
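Using the lookahead scheme, the encoded priority becomes

$$EP_i = M_i \cdot \overline{M}_{i-1} \cdots \overline{M}_{4j} \cdot PL_{j-1} \cdots PL_0 \cdot P_0, \qquad j = \lfloor i/4 \rfloor. \tag{5}$$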
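As an illustration of the lookahead expressions, the sketch below models a segmented encoder in Python and compares it with a plain "first request wins" reference. It assumes that PL_j is "1" only when all four requests of segment j are "0," which is how the precharged lookahead line is described in Section II; the function names and the default P_0 = 1 are hypothetical.

```python
from typing import List

SEGMENT = 4  # lookahead segment width used in the text


def pl_status(requests: List[int]) -> List[int]:
    """PL_j is 1 only if every request in 4-bit segment j is 0 (assumed reading)."""
    return [int(not any(requests[k:k + SEGMENT]))
            for k in range(0, len(requests), SEGMENT)]


def lookahead_encode(requests: List[int], p0: int = 1) -> List[int]:
    """EP_i = M_i AND (no earlier request in segment j) AND PL_{j-1}..PL_0 AND P_0."""
    pl = pl_status(requests)
    encoded = []
    for i, m in enumerate(requests):
        j = i // SEGMENT
        upstream = p0 & int(all(pl[:j]))               # PL_{j-1} ... PL_0
        local = int(not any(requests[SEGMENT * j:i]))  # NOT M_{i-1} ... NOT M_{4j}
        encoded.append(m & local & upstream)
    return encoded


def first_request_wins(requests: List[int]) -> List[int]:
    """Reference behavior: only the lowest-index request is granted."""
    out = [0] * len(requests)
    if 1 in requests:
        out[requests.index(1)] = 1
    return out


# Case with the first and last requests set and all others clear.
worst = [1] + [0] * 30 + [1]
print(lookahead_encode(worst) == first_request_wins(worst))  # -> True
```

The segmented form produces the same grants as the linear chain; what changes is that a request in a later segment is blocked by a single PL term per earlier segment rather than by every earlier request individually.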
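A second-level priority lookahead ($P2L$) can be used to reduce $EP_i$ delays when $i$ is larger than 16. Having a four-input $P2L$ will yield $P2L_0 = PL_3 \cdot PL_2 \cdot PL_1 \cdot PL_0$; thus, for $16 \le i < 32$, we have

$$EP_i = M_i \cdot \overline{M}_{i-1} \cdots \overline{M}_{4j} \cdot PL_{j-1} \cdots PL_4 \cdot P2L_0 \cdot P_0, \qquad j = \lfloor i/4 \rfloor, \tag{6}$$

where the product of PL terms is empty when $j = 4$.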
For the sake of simplicity, we do not explore this option any further in this paper.
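To give a rough feel for what each level of lookahead buys, the following Python sketch counts how many signals are ANDed together to form EP_i under the three schemes. The term count is only a crude proxy for delay, not a circuit-level estimate, and the helper `and_terms` is hypothetical. It assumes 4-bit PL segments and one P2L term per group of four segments (16 inputs), and counts M_i and P_0 in every case.

```python
def and_terms(i: int, scheme: str) -> int:
    """Signals ANDed together to form EP_i, counting M_i and P_0."""
    local = i % 4                      # complemented requests inside the 4-bit segment
    if scheme == "ripple":             # linear chain: every earlier request appears
        return i + 2
    if scheme == "pl":                 # one PL term per earlier segment
        return local + i // 4 + 2
    if scheme == "p2l":                # one P2L term per earlier group of four segments
        return local + (i // 4) % 4 + i // 16 + 2
    raise ValueError(f"unknown scheme: {scheme}")


for i in (3, 15, 17, 31):
    print(i, [and_terms(i, s) for s in ("ripple", "pl", "p2l")])
```

Under these assumptions the second level only differs from the single-level scheme for positions at or beyond 16, which matches the remark that P2L helps for large i.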
The very large scale integration (VLSI) circuit design of the two approaches is presented in Section II. Performance of the schemes is evaluated by means of simulations; in Section III these results are presented and studied. Some concluding remarks appear in Section IV.
II. PRIORITY ENCODER VLSI CIRCUITS
In our complementary metal-oxide-semiconductor (CMOS) design, we use pass transistor logic and dynamic circuitry to achieve high performance and minimize silicon real estate [8]. In this section, we first introduce a CMOS priority encoder with no lookahead. Circuitry is then added to this design, with a small impact on the overall number of transistors, to produce a much faster encoder that uses the PL scheme.

A. Basic Priority Encoder Cell

In the basic cell (Fig. 1), the signal $M'_i$ drives the gate of transistor $T_{pdp}$, which sets $T_{pdp}$ on or off. This transistor is used to pull down the priority; thus, at precharge, $T_{pdp}$ is set off. The priority status is normally precharged to "1" to avoid delays in propagating $P_i = 1$. Once the precharging of $M'_i$ and $P_i$ has been completed, the request ($M_i$) is passed when the clock signal is "1." Transistor $T_{lch}$ allows the request to be dynamically latched on the gates of the inverter and transistor $T_{pp}$. If the current request has the highest priority and $M_i = 1$, the priority status $P_{i+1}$, which has been precharged, must be set to "0." Fig. 1 shows that this is accomplished by using the value of $M'_i$ to turn on the pull-down transistor ($T_{pdp}$) and hence discharge $P_{i+1}$. It should be observed that when $M_i = 1$, transistor $T_{pp}$ is turned off by the signal $M'_i$ to prevent a discharge of priority $P_i$. If the current request is "1" and its priority status has not been discharged, then the corresponding encoded priority ($EP_i$) is set to "1."

Fig. 2 shows a four-input encoder unit; the priority status is conditionally propagated to the lower cells. If $M_0$ is "0," then $P_0$ gets propagated to the next cell; to minimize the delay of propagating a "1," $P_1$ has been precharged. If request $M_0$ is a "1," $P_0$ is not propagated; instead, $P_1$ is discharged to "0." The priority encoder unit shown in Fig. 2 can be extended to accommodate a larger number of inputs. One buffer for every four-input encoder unit is needed to reduce delays in the priority chain.
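The precharge and evaluate behavior of the cell chain can be mimicked with a small behavioral model. The Python sketch below is an abstraction under stated assumptions, not a switch-level or timing-accurate simulation: the class `DynamicEncoderChain` and its method names are hypothetical, each priority node is treated as an ideal 0/1 value, and cells are resolved in order from the highest priority downward.

```python
from typing import List


class DynamicEncoderChain:
    """Idealized model of n cascaded basic cells (index 0 = highest priority)."""

    def __init__(self, n: int) -> None:
        self.n = n
        self.priority = [1] * (n + 1)  # P_0 .. P_n
        self.encoded = [0] * n         # EP_0 .. EP_{n-1}

    def precharge(self) -> None:
        """Precharge phase: every priority node is set to 1, outputs are cleared."""
        self.priority = [1] * (self.n + 1)
        self.encoded = [0] * self.n

    def evaluate(self, requests: List[int]) -> List[int]:
        """Evaluate phase: apply the requests and resolve the chain."""
        for i, m in enumerate(requests):
            if m:
                # Pull-down (T_pdp) on: P_{i+1} is discharged.
                # Pass transistor (T_pp) off: P_i is left undisturbed.
                self.priority[i + 1] = 0
            else:
                # Pass transistor (T_pp) on: P_i propagates to P_{i+1}.
                self.priority[i + 1] = self.priority[i]
            self.encoded[i] = m & self.priority[i]
        return list(self.encoded)


chain = DynamicEncoderChain(4)
chain.precharge()
print(chain.evaluate([0, 1, 1, 0]))  # -> [0, 1, 0, 0]
```

Because every node starts at "1," a cell whose request is "0" and whose incoming priority is "1" has nothing to change; only a "0" has to travel down the chain, which is the delay the lookahead scheme targets.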
B. Priority Encoder Cell With Lookahead
In order to reduce priority status propagation delays, we propose a circuit that implements the PL in a simple but effective manner. Fig. 3 shows the basic cell for the PL scheme. In this figure, the unit cell from Fig. 1 has been extended to include a lookahead line and a pull-down transistor ($T_{pdl}$) whose gate is driven by the request. At precharge time, the lookahead line is set to "1." When the clock goes to "1" and $M_i = 1$, transistors $T_{pdl}$ and $T_{pdp}$ conduct, discharging $PL_j$ and $P_{i+1}$, respectively. Cascading several basic cells of Fig. 3 results in a chain of pass transistors ($T_{pp}$) and a delay increase in propagating $P_i = 0$. The PL scheme provides a fast path for $P_i$ to propagate to other cells through the lookahead line ($PL_j$). The signal on the PL line propagates to the lower cells much faster than the priority signal that propagates through the pass transistors ($T_{pp}$).

Fig. 4 shows four cells with the PL line put together to form a 4-bit PL encoder. The PL line is precharged to "1"; this is done through transistor $T_{pchl}$. An n-type transistor precharges the PL line in order to reduce the time required to pull down $PL_j$ if a single input request within the four-input unit is at "1." In order to determine the status of the PL line, all the requests $M_0$ through $M_3$ are considered. The logic value of the PL line in this case is $PL_0 = \overline{M}_0 \cdot \overline{M}_1 \cdot \overline{M}_2 \cdot \overline{M}_3$. If request $M_0$ has the highest priority and the rest of the requests ($M_1$ through $M_3$) are at "0," then the PL line gets discharged by the transistor $T_{pdl}$ corresponding to the first entry, and a "0" gets propagated to the next four entries.
The signal on the PL line gets inverted, turning transistor $T_{pdpc}$ on, which results in a "0" being propagated upwards through the chain of series pass transistors. At the same time, the priority status, $P_1 = 0$, is passed downwards through the same chain of pass transistors. This approach significantly reduces the delay of propagating a "0" within the four-input encoder unit.
The 4-bit priority encoder can be cascaded to produce an $N$-input encoder. Additional circuitry needs to be included between the modules; this intermodule circuitry appears in the dotted box in Fig. 4. The intermodule circuitry consists of three n-type transistors, two of which are in series, with transistor $T_{dl1}$ driven by the inverse of the priority lookahead signal and transistor $T_{dl2}$ driven by the precharge signal. Transistor $T_{dl1}$ alone could have been used as the pull-down transistor; however, transistor $T_{dl2}$ is required to accommodate delays. When several modules are put together, the precharge cycle of $PL_{j+1}$ could begin while transistor $T_{dl1}$ is still conducting. To allow the $PL_{j+1}$ line to be charged properly, transistor $T_{dl2}$ is set off. The transistor labeled $T_{pdpc}$ serves to discharge the priority status of the next module as well as the previous module if a request with the highest priority has been encountered in any of the entries above.
III. PERFORMANCE RESULTS
The circuits described in the previous section have been simulated for functionality and performance using SPICE. A 32-bit encoder using a 1-µm scalable CMOS technology has been used as a test case. The priority encoder could also be implemented using cascaded standard logic gates. However, this would require many more transistors without much gain in performance [8]. Thus, this standard logic gate approach is not considered any further in this study.
We have sized the transistors to achieve better performance. The $T_{pp}$ transistors that pass the priority from cell to cell have a gate ratio (width : length) of 4 : 1 to reduce delays in a transistor chain. When a high-priority request is found, the inverter on the PL line must switch its output from "0" to "1." The inverter's p-type transistor must accomplish this switching, so its gain factor needs to be large enough to provide a good switching speed; its gate ratio is therefore increased to 6 : 1. When an encoded priority ($EP_i$) in the upper entries has been set to "1," priority status $P_{i+1}$ is set to "0" and propagated to the lower entries to ensure that the rest of the encoded priorities are set to "0," since a request input ($M_i$) with the highest priority has been encoded. The critical operation of the proposed priority encoder occurs when the first and last input requests are both at "1" while the rest of the input requests are at "0." Priority status $P_{n-1}$ must be set to "0" in order to prevent $EP_{n-1}$ from being set to "1." In this critical case, $P_1 = 0$ needs to be propagated to the last entry.
This, in turn, may impose restrictions on the system clock, since the clock should allow enough time to process any case (including the worst case). This critical path (or worst case) is simulated and reported below. We measure the propagation delay for the worst case operation as the time it takes for priority status $P_{n-1}$ to reach 10% of $V_{dd}$ once the clock reaches 90% of $V_{dd}$. The SPICE simulation results show that the circuit with the PL scheme performs much better than the circuit without the scheme. Fig. 5 depicts the SPICE simulation results of the circuit without the PL, while Fig. 6 displays the results of the circuit that uses the lookahead scheme. It takes 11.4 ns to propagate the priority status if the first design approach is considered, and 4.4 ns when the PL approach is used. These results show a significant improvement in performance of one implementation over the other. This improvement can be assessed by computing the ratio of the delays:

$$\frac{t_{\text{without PL}}}{t_{\text{with PL}}} = \frac{11.4\ \text{ns}}{4.4\ \text{ns}} \approx 2.59. \tag{7}$$

This ratio shows that the encoder with the PL scheme is about 2.6 times faster than the encoder without it.
Table I provides a summary of the transistor count for the two approaches. Based on the expressions for the transistor counts shown in Table I, we can compute the ratio of the number of transistors for an $N$-bit encoder (8). The increase in transistor count is just 9.4%; this small increase yields a 159% performance improvement.
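The 159% figure and the "about 2.5 times faster" statement describe the same measurement in two ways. A short Python check, using only the delays and the transistor increase quoted above (the variable names are illustrative), makes the relation explicit:

```python
# Figures quoted in the text.
delay_without_pl_ns = 11.4   # worst-case delay, encoder without lookahead
delay_with_pl_ns = 4.4       # worst-case delay, encoder with PL scheme
transistor_increase = 0.094  # 9.4% more transistors with the PL scheme

speedup = delay_without_pl_ns / delay_with_pl_ns  # ratio of the delays
improvement_pct = (speedup - 1.0) * 100.0         # same result as a percentage gain

print(f"speedup: {speedup:.2f}x")
print(f"improvement: {improvement_pct:.0f}%")
print(f"transistor ratio: {1.0 + transistor_increase:.3f}")
```

A delay ratio of roughly 2.59 corresponds to a 159% improvement, obtained for a transistor ratio of about 1.094.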
IV. CONCLUDING REMARKS
In this brief, a priority encoder scheme and its CMOS design are presented. We have developed and designed two dynamic circuits that implement the priority encoder, one of which uses a lookahead scheme. The circuit for the priority encoder that uses the lookahead scheme requires an additional transistor and a lookahead line as additions to the basic cell of the encoder without lookahead. The lookahead line permits the priority status ($P_i$) to be propagated to the cells below much faster than in the circuit that does not use the lookahead scheme.
The lookahead cell circuit has less than 10% more transistors than the circuit without the lookahead scheme. This increase in transistor count is greatly offset by the improvement in performance. A 32-bit priority encoder and a 1-µm SCMOS technology have been used to show the potential of the proposed encoder scheme. We used SPICE to simulate the performance of the circuit. The worst case delay results (which have to be used to set the maximum frequency of the encoder clock) show that the design with the PL scheme outperforms the design without lookahead by 159%.
