Abstract-This short report first identifies a design flaw in the contention resolver unit proposed in [1] and then proposes an improved design that is simpler and faster.
INTRODUCTION
A contention resolver is required in a parallel or distributed system to resolve conflicting requests for system resources. A complete logic design for the contention resolver in a TCAM-based parallel search engine is proposed in [1]. We describe the design in [1] and its flaw, and then propose an improved design whose logic is simpler and faster.
Consider a generic parallel system that has $M$ selectors and $N$ search engines. Each search engine is preceded by a contention resolver (CR). After receiving a search key, each selector applies some selection criterion to decide which search engine the received key will be sent to for processing. If selector $x$ selects search engine $i$, the one-bit request signal $Req^i_x$ is set and the other request signals $Req^j_x$ for $j \neq i$ are reset. Each selector is given a priority called the hold priority (HP), an $(M-1)$-bit value consisting of a continuous string of $k$ 0s followed by a continuous string of $M-1-k$ 1s. The key and HP signals have no superscript $i$ because all of the CRs receive the same keys and hold priorities. A CR allows only the contending request with the highest priority to proceed and puts the other requesting selectors on hold. A selector that is put on hold does not receive a new search key in the next cycle, but repeats the request for the same key. To avoid starvation, the on-hold selectors must adjust their priorities as follows: Initially, a selector sets its HP value to zero, i.e., a continuous string of $M-1$ 0s. When a selector receives a Hold signal, its HP value is shifted left by one bit and a "1" is placed in its least significant bit (LSB). The function of the $i$th contention resolver (CR$_i$) is implemented by the equations in Table 1.
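The HP update rule described above can be illustrated with a minimal Python sketch. The function names `on_hold` and `hp_bits` are hypothetical, and truncating the shifted value to $M-1$ bits is an assumption made here so that the HP value keeps its stated width; the report itself does not spell out this detail.

```python
def on_hold(hp: int, m: int) -> int:
    """Update a hold priority after a Hold signal.

    The HP value is shifted left by one bit and a 1 is placed in the LSB.
    Assumption: the result is truncated to M-1 bits, the stated HP width.
    """
    mask = (1 << (m - 1)) - 1
    return ((hp << 1) | 1) & mask


def hp_bits(hp: int, m: int) -> str:
    """Render an HP value as an (M-1)-bit string, MSB first."""
    return format(hp, f"0{m - 1}b")


# With M = 4 selectors, a selector held repeatedly moves through
# 000 -> 001 -> 011 -> 111: k leading 0s followed by M-1-k 1s.
M = 4
history = [0]
for _ in range(M - 1):
    history.append(on_hold(history[-1], M))
hp_strings = [hp_bits(h, M) for h in history]
```

Each additional hold strictly increases the HP value until it saturates at all 1s, so longer-waiting selectors always compare as higher priority.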
The highest HP value among all the requesting selectors for CR$_i$ is computed in (1). In (3), we set $B^i_x = 1$ if selector $x$ is a candidate for selection and $B^i_x = 0$ otherwise. Since only the requesting selectors need to be taken into consideration when computing $B^i_x$, (2) needs to be corrected as shown in (2)'.
Equations (4)-(7) determine the final winner (say, selector $x$) among all of the candidate selectors by setting its $S^i_x$ to 1. Table 2 shows the incorrect result of $S^i_i = 1$ at cycle 1, while the correct one should be $S^i_4 = 1$. This incorrect result occurs once every three cycles and, thus, CR$_i$ processes only two requests every three cycles.
PROPOSED CONTENTION RESOLVER DESIGN
The main design flaw of the contention resolver in [1] is that the converted priority (i.e., $\overrightarrow{H^i_x}$) cannot be used to distinguish whether the request signal $Req^i_x$ of a selector is set or not. As a result, a nonrequesting selector may be selected, as shown in the example of Table 2. This design flaw also slows down the contention resolver because the special case in which all requesting selectors have a priority of zero must be handled by additional equations (i.e., (5)-(7)). To solve this design flaw, we [...] Table 1 will be set to 1. Therefore, we use (11) to select the one with the lowest ID among all of the selectors whose $B^i_x$ is one. Because a selector that has been on hold for $M-1$ cycles will be the only one having the highest HP of $1 \ldots 1$ ($M$ 1s), it will be selected within $M$ cycles and thus will not starve. The proposed contention resolver clearly uses fewer logic equations than that in [1]. The example in Table 2 shows that CR$_i$ correctly processes one request every cycle.
To show the speed and cost advantage of the proposed contention resolver over the one in [1], we conduct simulations using Verilog HDL synthesized in Synopsys Design Vision with the standard cells from the Artisan TSMC 0.18 $\mu$m cell library. With the number of selectors ($M$) varying from 4 to 15, our results show that the proposed contention resolver needs only 53-71 percent of the chip area and 67-77 percent of the critical path delay needed in [1].
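The intended behavior of a contention resolver, as described in this report, can be summarized by a small behavioral model in Python. This is only a sketch under stated assumptions, not the logic equations of Table 1 or the synthesized Verilog: the names `resolve` and `step` are hypothetical, selectors are indexed from 0, and resetting the winner's HP to zero after a grant is an assumption.

```python
from typing import List, Optional, Tuple


def resolve(requests: List[bool], hps: List[int]) -> Optional[int]:
    """Pick the winner for one CR in one cycle.

    Only requesting selectors are considered. Among them, those holding
    the highest HP are the candidates, and the lowest ID wins.
    """
    requesting = [x for x, req in enumerate(requests) if req]
    if not requesting:
        return None
    best = max(hps[x] for x in requesting)
    return min(x for x in requesting if hps[x] == best)


def step(requests: List[bool], hps: List[int], m: int) -> Tuple[Optional[int], List[int]]:
    """Resolve one cycle and update the HP values of the requesters."""
    winner = resolve(requests, hps)
    mask = (1 << (m - 1)) - 1
    new_hps = list(hps)
    for x, req in enumerate(requests):
        if not req:
            continue
        if x == winner:
            new_hps[x] = 0  # assumption: a granted selector restarts at zero
        else:
            new_hps[x] = ((hps[x] << 1) | 1) & mask  # held: shift left, LSB = 1
    return winner, new_hps


# Worst case for M = 4: every selector requests the same CR in every cycle.
M = 4
hps = [0] * M
winners = []
for _ in range(2 * M):
    winner, hps = step([True] * M, hps, M)
    winners.append(winner)
```

In this worst case the model grants one request per cycle and no selector waits more than $M-1$ cycles, which matches the starvation bound argued above. A nonrequesting selector is never chosen, regardless of its HP value, because it is filtered out before the priorities are compared.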
