ABSTRACT
INTRODUCTION
Fuzzy logic [1] has been successfully employed in many complex applications. With the progress made in real-time applications, the required inference speed may reach the range of mega fuzzy logic inferences per second (MFLIPS). Several hardware architectures [2] [3] [4] [5] [6] have been proposed to support general-purpose fuzzy inference execution at higher speed. However, these architectures share the following property: every fuzzy inference execution processes all the rules in the knowledge base. As a result, the inference speed is limited by the total number of rules in the knowledge base.
In fact, many fuzzy applications have the following property: in an individual fuzzy inference execution, the active rules are only a small part of the total rules. For example, in the fuzzy systems described in [7] [8] [9] [10], the maximum number of active rules over all combinations of input values is only four. Since the non-active rules contribute nothing to a fuzzy inference execution, they can be ignored. Based on this observation, some hardware architectures [11, 12] have been proposed that discard the non-active rules in an early stage of the pipeline. However, these architectures [11, 12] still need to compute the weight of each rule during the fuzzy inference execution; otherwise, they cannot determine the set of active rules.
In this paper, we propose a new architecture for fuzzy applications in which only a few rules are active in each fuzzy inference execution. Note that, once the input values are given, the active rules in a fuzzy inference execution are determined. Based on this property, our approach extracts only the active rules according to the input values. Compared with previous hardware architectures [11, 12], the main advantage of our approach is as follows: previous hardware architectures ignore the non-active rules during the fuzzy inference execution, whereas our approach ignores the non-active rules before the fuzzy inference execution.
Following the same specification as in [12], each membership function is assumed to be trapezoid-shaped. The new features of this paper are as follows:
(1) The rules in the knowledge base are sorted in the sequence of their antecedent membership functions. Therefore, we can use a binary search strategy to extract the active rules with respect to the input values.
(2) The set of active rules depends on the input values. Therefore, to handle this dynamic condition, we design a scheduling unit that arranges 4 active rules to enter the fuzzy inference execution per pipeline stage cycle.
MOTIVATION
If the weight of a rule is zero, the rule contributes nothing to the fuzzy inference. Therefore, when the input values are given, we say that a rule is active if and only if its weight is positive. Note that, in each fuzzy inference execution, only a part of the total rules is active.
Consider the following fuzzy system as an example. In this example, we observe that, for each input variable, the number of membership functions that overlap with a fuzzified input is only 2. Therefore, in each fuzzy inference execution, the number of active rules is only 4 (i.e., 2*2). In particular, as the total number of rules increases, the percentage of active rules becomes very low. The active rules are thus only a small part of the total rules. If we can ignore the non-active rules before the fuzzy inference execution, the performance can be significantly improved.
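To make the ratio concrete, the following minimal sketch computes the fraction of rules that can be active in a single inference when each fuzzified input overlaps at most two membership functions. The function name, its parameters, and the assumption of a complete rule base (one rule per combination of antecedent membership functions) are illustrative and are not taken from the design itself.

```python
def active_rule_fraction(mfs_per_input: int,
                         overlap_per_input: int = 2,
                         inputs: int = 2) -> float:
    """Fraction of rules that can be active in one inference.

    Assumption (hypothetical): the rule base is complete, i.e. it holds one
    rule per combination of antecedent membership functions.
    """
    total_rules = mfs_per_input ** inputs
    active_rules = overlap_per_input ** inputs  # 2 * 2 = 4 for two inputs
    return active_rules / total_rules


if __name__ == "__main__":
    # The fraction shrinks quickly as the rule base grows.
    for n in (5, 7, 32):
        print(f"{n} membership functions per input: "
              f"{active_rule_fraction(n):.4%} of rules active")
```

With 32 membership functions per input, the fraction is 4/1024, which is below 0.4% of the rule base.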
THE PROPOSED ARCHITECTURE
The proposed fuzzy inference processor has two inputs and one output. For each input, the maximum number of sets of membership functions is 32. For each output, the maximum number of sets of membership functions is 16. The maximum number of fuzzy rules is 1024. To speed up the fuzzy inference process, we design the fuzzy inference processor with a parallel, pipelined structure.
Our architecture is based on [12]. We divide the fuzzy inference processor into 8 pipeline stages: Fuzzifier (fuzzy rule database), Detection, Scheduling Unit, Fuzzy Decoder, Access Rule, Fuzzy Decision, Maximum Unit, and Accumulator and Divisor. Each pipeline stage needs 8 clock cycles to complete its process. Note that the last six pipeline stages of our fuzzy inference processor are the same as those in [12]. Owing to page limitations, this paper introduces only the first two pipeline stages of our fuzzy inference processor.
FUZZY RULE DATABASE
To match the structure of our fuzzy inference processor, the membership functions and the fuzzy rule database are designed in fixed forms. The design of the membership functions is subject to some constraints. As shown in Figure 2, if i is smaller than j, then A_ia must be smaller than or equal to A_ja, and A_id must be smaller than or equal to A_jd (A_ib is not necessarily smaller than or equal to A_jb, and A_ic is not necessarily smaller than or equal to A_jc).
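The ordering constraint can be checked mechanically before a rule base is loaded. The sketch below is one possible check, assuming that each membership function is stored as a tuple (a, b, c, d) and that the subscripts a to d denote the four corner points of the trapezoid; the type alias, the function name, and the sample values are illustrative.

```python
from typing import List, Tuple

# Assumed representation: the four trapezoid corner points (a, b, c, d).
Trapezoid = Tuple[float, float, float, float]


def satisfies_ordering(mfs: List[Trapezoid]) -> bool:
    """Return True if both a and d are non-decreasing in the index.

    Comparing adjacent pairs is enough because <= is transitive.
    The inner corners b and c are deliberately left unconstrained.
    """
    return all(
        prev[0] <= cur[0] and prev[3] <= cur[3]
        for prev, cur in zip(mfs, mfs[1:])
    )


if __name__ == "__main__":
    ordered = [(0, 0, 5, 10), (5, 10, 15, 20), (10, 15, 20, 25)]
    broken = [(0, 0, 5, 30), (5, 10, 15, 20)]  # d decreases from 30 to 20
    print(satisfies_ordering(ordered), satisfies_ordering(broken))
```

Keeping a and d sorted is what makes "lies entirely before the input" and "lies entirely after the input" monotone in the index, which is the property a binary search needs.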
DETECTION
The detection unit finds the intersections between the membership functions and the input variables. Checking these intersections sequentially is time-consuming. Since our fuzzy inference processor has at most 32 sets of membership functions in the antecedent part, we use the binary search strategy shown in Figure 3. The detection unit finds: (1) the address of the starting membership function (StartA) and the address of the end membership function (EndA) that intersect with input variable X; (2) the address of the starting membership function (StartB) and the address of the end membership function (EndB) that intersect with input variable Y.
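Once the four addresses are known, the candidate active rules are the combinations of one membership function from the range StartA..EndA with one from the range StartB..EndB. The sketch below illustrates this under two assumptions that the text does not confirm: the rule base holds one rule per antecedent pair, and the rule address is computed as i * 32 + j. The helper names are hypothetical, and the grouping into batches of four only mirrors the rate at which the scheduling unit issues active rules.

```python
from typing import List


def candidate_rules(start_a: int, end_a: int,
                    start_b: int, end_b: int,
                    mfs_per_input: int = 32) -> List[int]:
    """Addresses of rules whose two antecedents both intersect the inputs.

    Assumption (hypothetical layout): rule address = i * mfs_per_input + j,
    where i indexes the membership functions of X and j those of Y.
    """
    return [
        i * mfs_per_input + j
        for i in range(start_a, end_a + 1)
        for j in range(start_b, end_b + 1)
    ]


def batches(addresses: List[int], size: int = 4) -> List[List[int]]:
    """Split the candidate rules into groups of at most `size` rules."""
    return [addresses[k:k + size] for k in range(0, len(addresses), size)]


if __name__ == "__main__":
    rules = candidate_rules(start_a=1, end_a=2, start_b=2, end_b=3)
    print(rules)           # four candidate rule addresses
    print(batches(rules))  # a single batch of four
```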
In Figure 3(a), we use two registers to store the offset between the present memory address and the next memory address. The Control Signal directs the Add/Sub unit to perform addition or subtraction. Figure 3(b) illustrates how the two registers in Figure 3(a) change, where Clk1 denotes the first clock cycle, Clk2 denotes the second clock cycle, and so on. Figure 4 gives the circuit used in the detection unit to perform the binary search for the address of the starting membership function that intersects with input variable X (StartA). In Figure 4, when the Rnew signal is 0, the circuit uses the given rule data and rule address to update the corresponding membership function; when the Rnew signal is 1, it accesses the membership function at the starting address obtained from the binary search.
Consider an input variable X that has five membership functions: A0, A1, A2, A3 and A4. Suppose that the elements of A0, A1, A2, A3 and A4 are (0, 0, 5, 10), (5, 10, 15, 20), (10, 15, 20, 25), (25, 30, 35, 40), and (35, 40, 45, 50), respectively. When the elements of input variable X are (13, 14, 15, 16), we can use the following binary search strategy to find the address of the starting membership function (StartA) that intersects with input variable X:
1. At the beginning, the initial values of the memory address and the offset are both 2. Because the membership function A2 intersects with input variable X, StartA has to be smaller than or equal to 2. The Control Signal therefore sends out the Sub signal. Consequently, the memory address is 0 and the offset is 1 at the next global clock cycle.
2. Because the intersection between membership function A0 and input variable X is empty and A0 lies before input variable X, StartA has to be greater than 0. The Control Signal therefore sends out the Add signal. Consequently, the memory address and the offset are updated to 1 and 0, respectively, at the next global clock cycle.
3. StartA is updated to 1, since A1 intersects with input variable X.
4. We can obtain StartB using similar steps. StartB is updated to 2.
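The search in the steps above can be mirrored in software. The sketch below is a conventional binary search and not a model of the address/offset registers in Figure 3. It assumes that each membership function and the fuzzified input are stored as (a, b, c, d) corner tuples, that a and d are non-decreasing in the index, and that two trapezoids intersect when their closed supports [a, d] overlap. The function name is hypothetical.

```python
from typing import List, Optional, Tuple

Trapezoid = Tuple[int, int, int, int]  # assumed corner points (a, b, c, d)


def find_active_range(mfs: List[Trapezoid],
                      x: Trapezoid) -> Optional[Tuple[int, int]]:
    """Return (start, end) addresses of the functions intersecting x, or None."""
    xa, xd = x[0], x[3]

    # Start: first index whose right corner d reaches the left corner of x.
    lo, hi = 0, len(mfs)
    while lo < hi:
        mid = (lo + hi) // 2
        if mfs[mid][3] < xa:   # mfs[mid] lies entirely before x
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # End: last index whose left corner a does not pass the right corner of x.
    lo, hi = 0, len(mfs)
    while lo < hi:
        mid = (lo + hi) // 2
        if mfs[mid][0] <= xd:  # mfs[mid] does not lie entirely after x
            lo = mid + 1
        else:
            hi = mid
    end = lo - 1

    return (start, end) if start <= end else None


if __name__ == "__main__":
    mfs = [(0, 0, 5, 10), (5, 10, 15, 20), (10, 15, 20, 25),
           (25, 30, 35, 40), (35, 40, 45, 50)]
    print(find_active_range(mfs, (13, 14, 15, 16)))  # (1, 2)
```

For the five membership functions of the worked example, the start address agrees with the value StartA = 1 obtained in step 3. Each of the two searches needs at most about log2(32) + 1 comparisons for 32 membership functions, compared with up to 32 for a sequential scan.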
IMPLEMENTATION RESULTS
The proposed fuzzy inference processor has been implemented using a 0.35 μm cell library. Verification and timing analysis show that the global clock can run at up to 190 MHz. Because a pipeline stage takes 8 global clock cycles, the maximum performance of the proposed architecture is 23.75 MFLIPS. Table 2 lists the maximum performance of the fuzzy inference processor for different numbers of active rules. Table 3 compares our approach with other hardware architectures, including [2], [4], [5], [6], [11], and [12]. Note that, owing to the parallel processing, the number of inputs (outputs) has almost no influence on the circuit performance. Therefore, even though these architectures do not have the same number of inputs, we can still compare them. Furthermore, although these architectures are implemented in different process technologies, we find that the improvement of our approach is very significant. Therefore, among the compared architectures, our approach is the fastest hardware implementation.
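The peak figure follows directly from the clock rate and the stage length: a full pipeline completes one inference per pipeline stage, that is, once every 8 global clock cycles. The short sketch below reproduces this arithmetic; the function name is illustrative.

```python
def peak_mflips(clock_mhz: float, cycles_per_stage: int) -> float:
    """Peak throughput in mega fuzzy logic inferences per second.

    A full pipeline delivers one inference per stage time, so the
    throughput is the clock rate divided by the cycles per stage.
    """
    return clock_mhz / cycles_per_stage


if __name__ == "__main__":
    print(peak_mflips(190, 8))  # 23.75
```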
We use the proposed fuzzy processor to implement two control systems: a backing-up control system and a cart-pole balancing system. Figure 5 gives the control surface of the backing-up control system. Figure 6 gives the control surface of the cart-pole balancing system.
CONCLUSION
In this paper, we present a high-speed VLSI fuzzy inference processor with rule analysis. The maximum clock frequency reaches 190 MHz, and the maximum performance reaches up to 23.75 MFLIPS (Mega Fuzzy Logic Inferences Per Second). Compared with the existing hardware implementations considered here, our approach is the fastest.
REFERENCE
[
