Analysis and Design of 8-Bit CMOS Priority Encoders by Wang, Xiaoyu & Feng, Yukang
Analysis and Design of 8-Bit CMOS Priority Encoders
Xiaoyu Wang and Yukang Feng
University of Virginia
{xw5ce, yf4rs}@virginia.edu
ABSTRACT
A comprehensive review and fair comparison of priority
encoder (PE) designs from the past one and a half decades
is presented using a 45 nm technology. Further, potential
limitations of existing PEs are identified, based on which
we propose a robust PE design. The new PE eliminates the
race condition and charge sharing problems from which almost
all previous designs suffer. In addition, the proposed PE
can be used to build higher order PEs by incorporating a
carefully designed look-ahead structure. Simulation results
demonstrate that our design achieves one of the best power
and delay performances among the compared PEs and is free
from these potential risks.
Keywords
Priority Encoder, Fair Comparison, Race Condition, Charge
Sharing, Higher Order Priority Encoder
1. INTRODUCTION
A priority encoder (PE) is a basic but critical unit in
digital systems, and it is widely used in applications such
as fixed and floating point units, comparators,
incrementer/decrementer circuits, and the sequential address
encoders of content addressable memories. In a multi-bit PE,
each bit is assigned a priority according to its position.
A logic-1 priority token is initially and temporarily given
to all bits. When a bit with a higher priority receives a
logic-1 input, it passes a logic-0 signal to the
lower-priority bits, updating their priority tokens and
disabling their priority. Likewise, any bit that receives a
logic-0 input loses its priority by definition. For each
input pattern, only the bit that keeps the logic-1 token
generates a logic-1 output, while all the other bits produce
logic-0 outputs.
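The token-passing behaviour described above can be sketched as a short behavioural model. This is an illustrative Python sketch, not one of the circuits reviewed in this paper; the function name `priority_encode` and the convention that index 0 holds the highest priority are assumptions made for the example.

```python
from typing import List


def priority_encode(inputs: List[int]) -> List[int]:
    """Behavioural model of an n-bit priority encoder (one-hot output).

    Assumption for this sketch: inputs[0] has the highest priority.
    """
    token = 1  # logic-1 priority token offered to the highest-priority bit
    outputs = []
    for bit in inputs:
        # A bit outputs logic 1 only if it still holds the token and its input is 1.
        outputs.append(token & bit)
        # A logic-1 input passes logic 0 onward, disabling all lower-priority bits.
        token &= 1 - bit
    return outputs


if __name__ == "__main__":
    print(priority_encode([0, 1, 1, 0, 1, 0, 0, 0]))
```

In this model the second and fifth requests in the example are suppressed because the request at index 1 consumes the token first.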
Since the late 1990s, more than eight different PE designs
have been proposed to achieve lower power dissipation,
shorter delay and lower complexity. However, critical issues
such as possible charge sharing and race conditions, which
might lead to fatal malfunction, were rarely addressed.
Based on these observations, this paper first presents a
fair comparison of different PEs using an up-to-date 45 nm
process technology in order to shed some light on how to
choose a suitable PE for a given application. Next, we
identify the major limitations of each existing PE and
verify our analysis through simulations. Furthermore, a
robust PE architecture suitable for serial cascading is
proposed that removes potential charge sharing and race
problems while maintaining decent delay and power
performance.
2. PREVIOUS WORKS AND LIMITATIONS
A series of PE designs were proposed during the past decade.
In this section, a brief review of the history of PE
development is presented and some previously neglected
limitations are identified.
2.1 Related Works
Delgado-Frias and Nyathi [1] designed a priority encoder
that passes the priority token sequentially from the
highest-priority primary input to the lowest-priority input.
The disadvantage of this design is that the sequential
passage of the priority token incurs a delay of O(n), where
n is the total number of primary inputs or outputs. To
alleviate the linear increase in delay, Wang and Huang [7]
put forward two 8-bit priority encoder designs, each
comprising two 4-bit encoder blocks with an internal
lookahead signal; one design makes extensive use of pMOS
transistors while the other relies mainly on nMOS
transistors.
Kun et al. [4] proposed an 8-bit priority encoder module
that eliminates the need for sub-modules and internal
lookahead signaling. For higher order priority encoders,
Huang et al. [2] proposed a serial cascading architecture,
in which the lookahead output of an 8-bit encoder module
serves as the lookahead input of the succeeding encoder
block, whereas Kun et al. [4] proposed a parallel
priority-based cascading topology. Mohanraj et al. [5]
presented a new 8-bit priority encoder design, which is in
fact a refinement of Kun et al.'s encoder that exploits
shared logic to reduce the number of devices needed for
physical realization.
Huang and Chang [3] introduced a new NOR-based priority
encoder. During the precharge phase of the clock, all the
outputs are driven to the logic-high state. In the
evaluation phase, based on the input request(s), the
requesting input with the highest priority is enabled and
its output is retained at logic high, while the other
primary outputs are pulled to logic low. In this respect,
Huang and Chang's design is similar to Huang et al.'s
pMOS-based priority encoder design. Panchal et al. [6]
modified Huang and Chang's work and came up with a similar
PE based on active-low logic, which means that an input has
to be logic low to activate the PE and produce the desired
logic-high output.
arXiv:1806.01443v1 [eess.SP] 5 Jun 2018
2.2 Limitations of Previous Works
The PE designs described above were proposed over a period
of more than one and a half decades and were thus
implemented in different process technologies, including 90
nm, 250 nm and 900 nm, most of which are outdated.
Therefore, the reported metrics might not be representative
enough to reflect the performance differences between them.
To the best of the authors' knowledge, there has not been
any thorough and fair comparison of all the major PE designs
of the past decade, i.e., one that uses the same up-to-date
process technology. For this reason, we implement these PEs
and measure the main design metrics using the same 45 nm
process technology, aiming to provide some insight for
choosing among PEs under different application
circumstances.
Based on an extensive study of previous works, we notice
that almost all the designs focused on improving three
metrics, i.e., power dissipation, worst-case delay and
number of transistors, which are vital aspects of PE design.
What they failed to consider, however, is the robustness of
these PEs, i.e., whether they still function correctly in
extreme or atypical scenarios. For instance, with certain
input combinations, PEs designed without considering charge
sharing will generate outputs with a voltage lower than
logic high, or the outputs may even flip. Another example is
that, due to an unexpected delay of the look-ahead signal, a
race condition might occur in which a stage that has lost
the priority still outputs logic-high signal(s). Both cases
severely compromise a PE's robustness and thus limit its
applicability or require more complexity in the design of
other parts of a system. In addition, since a typical PE
cell has only 8 input bits and higher order PEs are widely
used in various systems, the ability to build higher order
PEs from 8-bit PEs through serial cascading is desired in
most cases. However, some of the previous designs are not
suitable for realizing higher order PEs, even though they
include a look-ahead signal in their circuits.
Table 1 summarizes the major limitations of previous PE
designs in terms of race condition, charge sharing and
unsuitability for cascading; a checkmark indicates that the
corresponding design suffers from that limitation. From
Table 1 we notice that every previous PE design fails to fix
at least one of these problems except the high-speed PE
proposed in [7], which, on the other hand, has the
disadvantage that its power dissipation is much higher than
that of the other PEs. Given these observations, we propose
a robust PE architecture which is free of race conditions
and charge sharing, as well as suitable for realizing higher
order PEs.
3. PROPOSED ROBUST PE
In a multi-bit PE, the output of the i-th bit is
$OP_i = IP_i \cdot P_i$, where $IP_i$ is the corresponding
input and $P_i$ stands for the priority token passed on to
this bit. When the input of the preceding bit (bit i−1) is
0, the priority token is passed on to the next bit, i.e.,
$P_i = \overline{IP_{i-1}} \cdot P_{i-1}$. The general
expression of the outputs $OP_i$ can be written as
$$OP_i = IP_i \cdot \overline{IP_{i-1}} \cdot \overline{IP_{i-2}} \cdot \overline{IP_{i-3}} \cdots \overline{IP_1} \cdot \overline{IP_0} \tag{1}$$
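As a short worked example of how Equation (1) follows from the token recurrence, consider bit 2. The sketch assumes that the highest-priority bit starts with the token, $P_0 = 1$, which the text implies but does not state explicitly; overbars denote logical complement.

```latex
\begin{align*}
P_1  &= \overline{IP_0}\cdot P_0 = \overline{IP_0},\\
P_2  &= \overline{IP_1}\cdot P_1 = \overline{IP_1}\cdot\overline{IP_0},\\
OP_2 &= IP_2\cdot P_2 = IP_2\cdot\overline{IP_1}\cdot\overline{IP_0}.
\end{align*}
```

Repeating the substitution for larger i adds one complemented input per step, which gives the product form of Equation (1).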
Table 1: Limitations of different PEs
Limitations (columns): Race condition | Charge sharing | Unsuitable for cascade
Designs (rows): Wang and Huang 1 [7]; Wang and Huang 2 [7];
Kun et al. [4]; Huang & Chang [3] (flipping);
Mohanraj et al. [5]; Panchal et al. [6] (flipping)

For the proposed 8-bit PE with a three-level look-ahead
structure shown in Figure 1, the fundamental equations
governing the PE are given as follows
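$$
\begin{aligned}
OP_0 &= \overline{LA} \cdot IP_0 \\
OP_1 &= \overline{LA} \cdot \overline{IP_0} \cdot IP_1 \\
OP_2 &= \overline{LA} \cdot \overline{IP_0} \cdot \overline{IP_1} \cdot IP_2 \\
OP_3 &= \overline{LA} \cdot \overline{IP_0} \cdot \overline{IP_1} \cdot \overline{IP_2} \cdot IP_3 \\
LA_{inter} &= LA + IP_0 + IP_1 + IP_2 + IP_3 \\
OP_4 &= \overline{LA_{inter}} \cdot IP_4 \\
OP_5 &= \overline{LA_{inter}} \cdot \overline{IP_4} \cdot IP_5 \\
OP_6 &= \overline{LA_{inter}} \cdot \overline{IP_4} \cdot \overline{IP_5} \cdot IP_6 \\
OP_7 &= \overline{LA_{inter}} \cdot \overline{IP_4} \cdot \overline{IP_5} \cdot \overline{IP_6} \cdot IP_7
\end{aligned}
\tag{2}
$$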
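To make Equation (2) concrete, the following Python sketch evaluates the nine expressions and compares them exhaustively with a flat reference encoder. It is a logic-level illustration only, not a model of the dynamic circuit; the function names are hypothetical, and the complements on LA, LAinter and the lower-index inputs are taken from the active-low look-ahead convention described in the text.

```python
from itertools import product
from typing import List, Sequence, Tuple


def pe8(ip: Sequence[int], la: int) -> Tuple[List[int], int]:
    """Evaluate Equation (2) for one 8-bit cell.

    ip[0] has the highest priority; la is the active-low look-ahead input
    (la == 1 means a higher-weighted stage already owns the priority).
    Returns (outputs OP0..OP7, internal look-ahead LAinter).
    """
    n = [1 - b for b in ip]  # complemented inputs
    en_lo = 1 - la  # bits 0..3 are enabled when LA is 0
    la_inter = la | ip[0] | ip[1] | ip[2] | ip[3]
    en_hi = 1 - la_inter  # bits 4..7 are enabled when LAinter is 0
    op = [
        en_lo & ip[0],
        en_lo & n[0] & ip[1],
        en_lo & n[0] & n[1] & ip[2],
        en_lo & n[0] & n[1] & n[2] & ip[3],
        en_hi & ip[4],
        en_hi & n[4] & ip[5],
        en_hi & n[4] & n[5] & ip[6],
        en_hi & n[4] & n[5] & n[6] & ip[7],
    ]
    return op, la_inter


def reference(ip: Sequence[int], la: int) -> List[int]:
    """Flat reference: only the first logic-1 input wins, and only if la == 0."""
    out = [0] * 8
    if la == 0 and 1 in ip:
        out[list(ip).index(1)] = 1
    return out


def check_all() -> int:
    """Compare both models over all 2 * 256 combinations of LA and inputs."""
    checked = 0
    for la in (0, 1):
        for ip in product((0, 1), repeat=8):
            assert pe8(ip, la)[0] == reference(ip, la)
            checked += 1
    return checked


if __name__ == "__main__":
    print(check_all())
```

The exhaustive loop mirrors the idea of applying every input combination, here at the Boolean level rather than in a transistor-level simulation.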
When Clock becomes 0, the circuit is in the pre-discharge
phase: LAinter is 0 and all outputs are pre-discharged to 0.
When Clock becomes 1, the circuit enters the evaluation
phase. In the circuit, the p-type dynamic gates for OP0 ∼
OP3 realize the first-level look-ahead functions, with la0 ∼
la2 acting as the look-ahead signals. Owing to the
first-level look-ahead structure, the four outputs OP0 ∼ OP3
evaluate at the same time.
LAinter is used to realize the second-level look-ahead
function between the higher-priority and lower-priority
4-bit cells, and LA is used to realize the third-level
look-ahead function, which decides whether the current 8-bit
macro cell owns the priority. Note that the new design uses
active-low look-ahead signals, which means that another
stage with a higher weighting owns the priority when LA is
logic 1. In that case, OP0 ∼ OP7 are set to logic 0 during
the evaluation phase. If LA is logic 0, passing the priority
to the current macro cell, OP0 ∼ OP3 are decided by IP0 ∼
IP3 directly, while OP4 ∼ OP7 are decided by both IP4 ∼ IP7
and the second-level look-ahead signal LAinter.
Figure 1: Proposed 8-bit robust priority encoder.
(Schematic; labeled signals: Clock, LA, LAinter, IP0 ∼ IP7,
OP0 ∼ OP7, la0 ∼ la2, pd0 ∼ pd7, rs0 ∼ rs7.)

There are a number of advantages of the new 8-bit PE cell
over the conventional ones. First, the PE cell is designed
to be race-free by using rs0 ∼ rs7. At the beginning of the
evaluation phase, each output bit is evaluated immediately
according to the input signals, and at most one output bit
is charged from 0 to 1. However, these outputs may be
incorrect. When the correct look-ahead signal arrives
slightly later than the rising edge of the clock signal,
there are two cases. If the current stage owns the priority,
both LA and LAinter remain at 0 and the previously evaluated
outputs are already correct. Otherwise, if the current stage
loses its priority, both LA and LAinter become 1 and turn on
rs0 ∼ rs7, forcing all the outputs of the current stage to
logic low. Second, because the circuit uses the three-level
look-ahead structure, it is fast. Third, the PE design does
not suffer from charge sharing, since there are only two
parallel NMOS transistors between each output and ground.
Fourth, due to the series-type circuit structure, all
outputs evaluate in the evaluation phase but only one output
is charged after the pre-discharge phase, and only that high
output is discharged in the next pre-discharge phase. This
significantly reduces the switching activity and the
corresponding switching power. Last but not least, given the
carefully designed look-ahead signal, the new PE can also be
used as a macro cell for building higher order PEs with the
parallel priority look-ahead architecture of Kun et al. [4].
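As a rough illustration of how an 8-bit cell with an active-low LA input could act as a macro cell in a wider encoder, the Python sketch below builds a 64-bit PE from eight behavioural cells. It does not reproduce the look-ahead circuitry of [4]; the names `cell8` and `cascade` are hypothetical, and the sketch simply assumes that each cell's LA is the OR of all inputs belonging to higher-priority cells.

```python
from typing import List, Sequence


def cell8(ip: Sequence[int], la: int) -> List[int]:
    """Behavioural 8-bit cell: one-hot output for the first logic-1 input,
    forced to all zeros when the active-low look-ahead la is 1."""
    out = [0] * 8
    if la == 0:
        for i, bit in enumerate(ip):
            if bit:
                out[i] = 1
                break
    return out


def cascade(inputs: Sequence[int]) -> List[int]:
    """Build a wider encoder from 8-bit cells; inputs[0] has the highest priority.

    Assumption: each cell's la is the OR of every input belonging to a
    higher-priority cell, so all cells can be evaluated independently.
    """
    if len(inputs) % 8 != 0:
        raise ValueError("input width must be a multiple of 8")
    outputs: List[int] = []
    for start in range(0, len(inputs), 8):
        la = int(any(inputs[:start]))  # 1 if a higher-priority cell has a request
        outputs.extend(cell8(inputs[start:start + 8], la))
    return outputs


if __name__ == "__main__":
    request = [0] * 64
    request[19] = request[40] = 1
    print(cascade(request).index(1))
```

In the example, the request at index 19 wins and every cell after the third one is disabled by its LA input, so the request at index 40 produces no output.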
4. PERFORMANCE EVALUATION AND EXPERIMENTAL RESULTS
4.1 Fair Performance Comparison of PEs
Seven 8-bit dynamic CMOS PEs, including the proposed design,
have been implemented at the transistor level and simulated
using Cadence based on a 45 nm CMOS process design kit from
NCSU, i.e., FreePDK, with a supply voltage of 1.1 V. All
possible input combinations are applied at a clock frequency
of 50 MHz to verify the functionality of the different PEs,
as well as to estimate the average power dissipation. The
total average power dissipation and critical path delay of
the different 8-bit PEs are given in Table 2, along with the
device count required for physical design. The device count,
in terms of the number of transistors needed, is assumed to
be representative of the area occupied by the circuit.

Figure 2: Race condition in the power-optimized PE.
(Timing diagram; traces: Clock, LA, IP0, IP1, OP0, OP1.)
From Table 2 we notice that the first PE of Wang and Huang
[7] is the fastest design (0.177 ns), but its power
dissipation (79.11 µW) and PDP (14.002 fJ) are much larger
than those of the other designs and its transistor count
(102) is the second highest, which makes it much less
desirable in applications. In terms of the four metrics
considered here, the power, delay and area optimized PE
proposed in [5] achieves the best overall performance, but
this comes at the cost of possible race conditions and
charge sharing, as discussed in Section 2. The proposed
robust PE has a balanced performance in these four aspects:
with 79 transistors, it ranks third in delay and PDP and
fourth in power among the seven designs.
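The PDP row of Table 2 can be reproduced from the power and delay rows, since a power in µW multiplied by a delay in ns gives an energy in fJ. The following Python sketch recomputes the PDP and ranks the designs by it; the power and delay values are copied from Table 2, while the dictionary layout and function names are only illustrative.

```python
# Power (uW) and delay (ns) copied from Table 2; uW * ns = fJ.
TABLE2 = {
    "Wang and Huang 1 [7]": (79.11, 0.177),
    "Wang and Huang 2 [7]": (6.119, 0.346),
    "Kun et al. [4]": (9.422, 0.292),
    "Huang & Chang [3]": (6.544, 0.281),
    "Mohanraj et al. [5]": (3.189, 0.274),
    "Panchal et al. [6]": (7.100, 1.018),
    "New design": (6.879, 0.278),
}


def pdp_fj(power_uw: float, delay_ns: float) -> float:
    """Power-delay product in femtojoules."""
    return power_uw * delay_ns


def rank_by_pdp() -> list:
    """Design names ordered from the lowest to the highest PDP."""
    return sorted(TABLE2, key=lambda name: pdp_fj(*TABLE2[name]))


if __name__ == "__main__":
    for name in rank_by_pdp():
        print(f"{name}: {pdp_fj(*TABLE2[name]):.3f} fJ")
```

Ranked this way, the design of [5] has the lowest PDP and the new design comes third, in agreement with the PDP row of Table 2.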
4.2 Potential Failure of Previous PE Designs
Given the potential limitations identified in previous PE
designs, this section provides simulation results that
confirm our analysis in Section 2. If a PE design fails to
consider a potential race condition, i.e., the look-ahead
signal that disables the current stage arrives after the
clock edge that starts an evaluation phase, the outputs
might not be disabled immediately, leading to unexpected
results. Here the power-optimized PE [4] is used as an
example, and the corresponding timing diagram is given in
Figure 2. For this design with an active-high look-ahead
signal, when the look-ahead signal is logic 0, all the
outputs of the current stage should be disabled no matter
what the input values are. However, as shown in Figure 2, LA
is still high when the rising edge of Clock arrives, so the
second bit, which owns the priority, outputs logic 1. When
the correct LA value arrives later, OP1 is not disabled,
which makes it possible for more than one output to be logic
1 in a higher order PE.
Table 2: Comparison of design parameters of different 8-bit dynamic CMOS PEs
Design                 Power (µW)  Delay (ns)  PDP (fJ)  Number of transistors
Wang and Huang 1 [7]   79.11       0.177       14.002    102
Wang and Huang 2 [7]   6.119       0.346       2.117     103
Kun et al. [4]         9.422       0.292       2.751     62
Huang & Chang [3]      6.544       0.281       1.839     76
Mohanraj et al. [5]    3.189       0.274       0.874     55
Panchal et al. [6]     7.100       1.018       7.228     60
New design             6.879       0.278       1.912     79

Next, we consider another fatal problem in previous works:
charge sharing. Figure 3 presents a possible output flipping
of the power, delay and area optimized PE proposed in [5].
Under normal conditions, if an input with higher priority is
logic 1, all the outputs of lower-priority bits should be
logic 0, as shown in the green circle in Figure 3. However,
for some special input combinations, some outputs are
flipped, leading to severe malfunction of the PE. The output
flipping due to charge sharing is marked with the red circle
in Figure 3, where the logic 1 at OP7 is unexpected.

Figure 3: Output flipping due to charge sharing in the
power, delay and area optimized PE. (Timing diagram; traces:
Clock, LA, IP0, IP7, OP0, OP7.)
4.3 Robustness of the Proposed PE
The advantages of the proposed PE are mainly reflected in
three aspects: freedom from race conditions, freedom from
charge sharing, and suitability for building higher order
PEs. The latter two advantages can be verified
straightforwardly, given that the outputs are connected to
the lowest part of the circuit and that the third-level
look-ahead signal LA is adopted to realize cascading in
higher order PEs. Here we only present the simulation result
in Figure 4 to validate the race-condition-free property of
the new PE.
Consider the scenario marked by the left green circle in
Figure 4. At the beginning of the evaluation phase, the
current stage appears to own the priority (LA is logic 0)
and OP3 is charged. The logic-1 LA then arrives late and
discharges OP3 immediately, so the correct result is output.
Conversely, when the current stage changes from disabled to
enabled due to the arrival of a logic-0 LA, OP1, which
corresponds to the bit that owns the priority, immediately
turns to logic 1 without waiting for the next evaluation
phase. Owing to the careful design of the look-ahead
structure, any erroneous arrival time of the look-ahead
signal caused by problematic timing design in other parts of
a system is corrected within the PE, without passing faulty
outputs to the following levels.
Figure 4: Robustness of the proposed PE against race
condition. (Timing diagram; traces: Clock, active-low LA,
IP0, IP1, IP2, OP0, OP1, OP2.)
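The two scenarios discussed for Figure 4 can be mimicked with a very coarse event model. The Python sketch below is an idealisation, not a circuit simulation: it treats the evaluation phase as a sequence of LA values, lets the robust cell follow LA at every step, and contrasts it with a hypothetical race-prone cell that honours LA only at the clock edge. All function names are assumptions made for the example.

```python
from typing import List, Sequence


def first_request(ip: Sequence[int]) -> List[int]:
    """One-hot vector marking the first logic-1 input (index 0 = highest priority)."""
    out = [0] * len(ip)
    for i, bit in enumerate(ip):
        if bit:
            out[i] = 1
            break
    return out


def robust_outputs(ip: Sequence[int], la_trace: Sequence[int]) -> List[List[int]]:
    """Outputs of an idealised race-free cell during one evaluation phase.

    la_trace lists successive values of the active-low LA signal after the
    rising clock edge. The outputs follow LA at every step: LA = 1 forces all
    outputs low, LA = 0 lets the priority-owning bit drive logic 1.
    """
    return [first_request(ip) if la == 0 else [0] * len(ip) for la in la_trace]


def edge_sampled_outputs(ip: Sequence[int], la_trace: Sequence[int]) -> List[List[int]]:
    """Hypothetical race-prone cell that only honours LA at the clock edge."""
    at_edge = first_request(ip) if la_trace[0] == 0 else [0] * len(ip)
    return [at_edge for _ in la_trace]


if __name__ == "__main__":
    ip = [0, 0, 0, 1, 0, 0, 0, 0]
    late_disable = [0, 1]  # stale LA = 0 at the clock edge, LA = 1 arrives late
    print(robust_outputs(ip, late_disable))
    print(edge_sampled_outputs(ip, late_disable))
```

With a late-arriving logic-1 LA, the robust model clears OP3 in the second step, whereas the edge-sampled model keeps the stale logic 1, which is the kind of faulty output the proposed cell is designed to avoid.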
5. CONCLUSIONS
A comprehensive review of existing PEs and a fair comparison
of their performance are presented using a 45 nm technology
in terms of three design metrics: power, delay and number of
transistors. Moreover, we analyze these designs closely to
identify their potential limitations, including possible
race and charge sharing problems and unsuitability for
building higher order PEs. Our analysis shows that almost
all the existing PEs suffer from one or more of these
disadvantages. To obtain a PE that overcomes these
shortcomings, a robust PE structure is proposed, which is
validated to be free of charge sharing and race conditions,
suitable for cascading, and to have one of the best power
and delay performances.
6. REFERENCES
[1] J. Delgado-Frias and J. Nyathi. A VLSI
high-performance encoder with priority lookahead. In
Proceedings of the 8th Great Lakes Symposium on
VLSI, pages 59–64. IEEE, Feb. 1998.
[2] C.-H. Huang, J.-S. Wang, and Y.-C. Huang. Design of
high-performance CMOS priority encoders and
incrementer/decrementers using multilevel lookahead
and multilevel folding techniques. IEEE Journal of
Solid-State Circuits, 37(1):63–76, 2002.
[3] S.-W. Huang and Y.-J. Chang. A full parallel priority
encoder design used in comparator. In Proceedings of
the 53rd IEEE International Midwest Symposium on
Circuits and Systems (MWSCAS), pages 877–880.
IEEE, 2010.
[4] C. Kun, S. Quan, and A. Mason. A power-optimized
64-bit priority encoder utilizing parallel priority
look-ahead. In Proceedings of the 2004 International
Symposium on Circuits and Systems (ISCAS'04),
volume 2, pages II-753–II-756. IEEE, 2004.
[5] J. Mohanraj, P. Balasubramanian, and K. Prasad.
Power, delay and area optimized 8-bit CMOS priority
encoder for embedded applications. In Proceedings of
the 10th International Conference on Embedded Systems
and Applications, pages 111–113, 2012.
[6] P. Panchal, C. Vinitha, R. Srivastava,
P. Balasubramanian, and N. Mastorakis. Design of
8-bit dynamic CMOS priority resolvers based on
active-high and active-low logic. Communication
Systems, pages 82–85, 2013.
[7] J.-S. Wang and C.-H. Huang. High-speed and
low-power CMOS priority encoders. IEEE Journal of
Solid-State Circuits, 35(10):1511–1514, 2000.
