Optimization methods for the logic circuits of Moore finite-state machines are proposed. These methods are based on the existence of pseudoequivalent states of a Moore finite-state machine, a wide fan-in of PAL macrocells and free resources of embedded memory blocks. The methods are oriented to hypothetical VLSI microcircuits based on the CPLD technology and containing PAL macrocells and embedded memory blocks. The conditions of effective application of each proposed method are shown. An algorithm to choose the best model of a finite-state machine for given conditions is proposed. Examples of the application of the proposed methods are given. The effectiveness of the proposed methods is also investigated.
Introduction
A control unit is a very important block of any digital system (De Micheli, 1994). A model of a Moore finite-state machine (FSM) is very often used to represent the control unit (Baranov, 1994). One of the most important steps in the design of FSM logic circuits is the encoding of its internal states. This step is known as the state assignment problem (De Micheli, 1994). In this step binary codes are assigned to FSM internal states. The quality of the resulting combinational part of the FSM (cost/area, power consumption, maximum frequency) depends heavily on the outcome of this step. Because of their importance, state assignment methods are continually being developed. There are effective state assignment methods based on symbolic minimization (Devadas et al., 1988; Kam et al., 1998; Villa et al., 1990; 1998). Genetic algorithms (Chattopadhyay, 2005; Micheli et al., 1985; Xia and Almaini, 2002) and other heuristics (Barkalov, 1998; 2005; Kania, 2004) are used to solve this problem, too. Let us point out that there is no universal state assignment algorithm that is effective for every kind of control algorithm to be interpreted and every kind of logic elements used to implement FSM logic circuits. This means that the peculiarities of components such as an FSM model, a control algorithm and logic elements should be taken into account to optimize the main characteristics of FSM circuits. Rapid evolution in semiconductor technology has resulted in the appearance of sophisticated VLSI circuits such as complex programmable logic devices (CPLDs) and field-programmable gate arrays (FPGAs) (Maxfield, 2004; Altera, 2007; Xilinx, 2007; Latticesemi, 2007). Such devices have enough resources to implement a complex digital system using only a single chip (Maxfield, 2004). One of the issues of the day in this area is a decrease in the hardware amount in FSM logic circuits (Adamski and Barkalov, 2006; Barkalov and Węgrzyn, 2006).
The solution to this problem would decrease the chip area occupied by an FSM circuit and make it possible to increase the number of digital system functions implemented within a single chip. In this article we discuss methods of Moore FSM design using CPLDs, which are popular for implementing complex controllers (Barkalov and Węgrzyn, 2006; Kania, 2004). Unfortunately, in contrast to FPGAs, modern CPLDs have no embedded memory blocks, which could be used to implement the system of data-path microoperations. Therefore, in this article we deal with hypothetical CPLD chips, where programmable array logic (PAL) macrocells are used to implement the systems of Boolean functions and embedded memory blocks are used to implement the table functions of the digital system (Barkalov and Węgrzyn, 2006). The peculiarities of PAL macrocells are a wide fan-in and a very limited number of conjunctions (terms) per cell (Kania, 2004). A peculiarity of the known embedded memory blocks is their configurability (Maxfield, 2004). For example, an embedded memory block of FLEX 10K can be configured as a memory block with the following characteristics: 256 × 8, 512 × 4, 1024 × 2, 2048 × 1 (Xilinx, 2007). This means that the number of embedded memory block outputs belongs to the set {1, 2, 4, 8}. The peculiarities of the Moore FSM are the existence of pseudoequivalent states (Barkalov, 1998) and the regular character of the system of output functions (microoperations) that makes its effective implementation possible using embedded memory blocks (Barkalov and Węgrzyn, 2006). In this article, we propose methods to optimize the number of PAL macrocells in the logic circuit of the Moore FSM based on the above-mentioned peculiarities.
Background of Moore FSM Design
Let the control algorithm of a digital system be specified by a graph scheme of algorithm (Baranov, 1994 ) Γ = (B, E), where B = {b 0 , b E } ∪ E 1 ∪ E 2 is a set of the vertices and E is a set of edges. Here b 0 is an initial vertex, b E is a final vertex, E 1 is a set of operational vertices, and E 2 is a set of conditional vertices. The vertex b q ∈ E 1 contains a collection of microoperations Y (b q ) ⊆ Y , where Y = {y 1 , . . . , y N } is a set of microoperations of the digital system data-path (De Micheli, 1994) . The vertex b q ∈ E 2 contains some logic condition x e ∈ X, where X = {x 1 , . . . , x L } is a set of logic conditions (flags) (Adamski, 2006) . The initial and final vertices of the graph scheme of algorithm correspond to an initial state a 1 ∈ A, where A = {a 1 , . . . , a M }is a set of internal states of a Moore FSM. Each operational vertex b q ∈ E 1 corresponds to a unique state a m ∈ A. The logic circuit of the Moore FSM U 1 is represented by the following systems of Boolean functions:
$$\Phi = \Phi(T, X), \qquad (1)$$
$$Y = Y(T), \qquad (2)$$
where T = {T 1 , . . . , T R } is the set of internal variables used to encode the states a m ∈ A, R = ] log 2 M [, and Φ = {D 1 , . . . , D R } is the set of input functions of the register. Here the combinational circuit (CC) forms the functions (1) and the circuit of formation of microoperations (CFMO) forms the functions (2). The register (RG) keeps the code K(a m ). The pulse "Start" is used to load the code of the initial state a 1 ∈ A into the register. The pulse "Clock" is used to change the content of the register. In this article we discuss the case when the CPLD technology is used in some SoPC. In this case the combinational circuit is implemented using PAL macrocells and the circuit of formation of microoperations is implemented using embedded memory blocks.
As a rule, the number of transitions H 1 (Γ) exceeds the number of transitions H 0 (Γ) of the equivalent Mealy FSM (Barkalov and Węgrzyn, 2006) . It leads to an increase in the number of PAL macrocells in the circuit of the Moore FSM compared with the equivalent Mealy FSM. The value H 1 (Γ) can be decreased taking into account the pseudoequivalent states of the Moore FSM (Barkalov, 1998) . The states a m , a s ∈ A are pseudoequivalent states if identical inputs result in identical next states for both a m , a s ∈ A. This is possible if the outputs of the operational vertices marked by these states are connected with the input of the same vertex of the graph scheme of algorithm Γ. Let Π A = {B 1 , · · · , B I } be a partition of the set A by the classes of pseudoequivalent states (I ≤ M ). There are two main methods of Moore FSM optimization based on pseudoequivalent states (Barkalov, 1998; Barkalov and Węgrzyn, 2006) :
• optimal encoding of the states;
• transformation of the codes of states into the codes of classes of pseudoequivalent states.
In the first case, the states a m ∈ A are encoded so that the codes of the states a m ∈ B i (i = 1, . . . , I) belong to a single generalized interval of the R-dimensional Boolean space. This leads to a Moore FSM U 2 that has the same structure as the Moore FSM U 1 . The algorithm from (De Micheli, 1994) can be used for such an encoding. In (Barkalov, 1998) it is shown that the number of transitions H 2 (Γ) of U 2 is decreased to H 0 (Γ). But such an encoding is not always possible (Adamski and Barkalov, 2006). In the second case, the classes B i ∈ Π A are encoded by the binary codes K(B i ) with R 1 = ] log 2 I[ bits. The variables τ r ∈ τ are used for such an encoding, where |τ | = R 1 . Let us point out that I = M 0 , where M 0 is the number of the states of the equivalent Mealy FSM. This approach leads to a Moore FSM U 3 with a code transformer (TC) (Fig. 2), where the combinational circuit implements the functions
$$\Phi = \Phi(\tau, X), \qquad (3)$$
and the code transformer implements the functions
$$\tau = \tau(T). \qquad (4)$$
The number of transitions of the Moore FSM U 3 is equal to H 0 (Γ). The drawback of U 3 is the existence of a block of the code transformer that consumes additional resources of embedded memory blocks (in comparison with U 1 ).
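The construction of the partition Π A can be illustrated by a minimal sketch. It assumes that the FSM is given as a dictionary from a state to its transitions; the four-state machine below and the helper names `pseudoequivalent_classes` and `code_width` are hypothetical and are not taken from the graph scheme of algorithm Γ.

```python
from collections import defaultdict


def pseudoequivalent_classes(transitions):
    """Group states whose transitions are identical.

    `transitions` maps a state name to a dict {input_condition: next_state}.
    Two states fall into one class when these dicts are equal, i.e. when
    identical inputs lead to identical next states.
    """
    groups = defaultdict(list)
    for state, moves in transitions.items():
        groups[frozenset(moves.items())].append(state)
    # Order the classes by their first member for readability.
    return sorted(groups.values(), key=lambda cls: cls[0])


def code_width(count):
    """Return ]log2(count)[, the number of bits needed for `count` codes."""
    return (count - 1).bit_length() if count > 0 else 0


# Hypothetical machine: a2 and a3 share the same transitions.
transitions = {
    "a1": {"1": "a2"},
    "a2": {"x1": "a4", "~x1": "a1"},
    "a3": {"x1": "a4", "~x1": "a1"},
    "a4": {"1": "a1"},
}
classes = pseudoequivalent_classes(transitions)
r1 = code_width(len(classes))  # R 1 for I = len(classes)
```

Here the states a2 and a3 form one class, so I = 3 and two variables τ r are enough to encode the classes.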
In our article we propose to combine the application of an optimal encoding of the states and the transformation of the states codes. In this case the block of the code transformer can be even eliminated if some condition holds. The proposed method is based on the following features of the hypothetical CPLD in use:
• the fan-in of PAL macrocells exceeds significantly the maximal possible number of literals in terms of the system (1),
• the number of the outputs of the embedded memory block can be chosen from some restricted area.
The first feature permits us to use more than one source to represent the code of the current state a m ∈ A. The second feature permits us to use some bits of the embedded memory block to represent the codes of the classes of pseudoequivalent states.
Main Ideas of the Proposed Method
Let the embedded memory block have q words if the number of its outputs t F = 1. If q ≥ M, then the embedded memory block should be configured in such a manner that it has
$$t_{\max} = \lfloor q / M \rfloor \qquad (5)$$
outputs. The final value of the number of the outputs t F is chosen from the set S p that contains the possible fixed numbers of outputs, as the largest element that does not exceed t max . For example, if t max = 6 and S p = {1, 2, 4, 8}, then t F = 4.
The total number of the outputs t s of all embedded memory blocks of the circuit of formation of microoperations is determined as
$$t_s = \,]N / t_F[\; \cdot\; t_F. \qquad (6)$$
In this case,
$$\Delta_t = t_s - N \qquad (7)$$
outputs are free and they can be used to represent the codes of the classes of pseudoequivalent states. If
$$\Delta_t \ge R_1, \qquad (8)$$
then the graph scheme of algorithm Γ can be interpreted by a Moore FSM U 4 (Fig. 3), in which the combinational circuit forms the functions (3), and the circuit of formation of microoperations and the codes of the classes (CMOC) implements both the systems (2) and (4). In this case the block of the code transformer is eliminated and the FSM states can be encoded in an arbitrary manner.
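The calculation of the free outputs can be sketched as follows. The sketch assumes that (5) takes the form t max = ⌊q/M⌋, (6) the form t s = ]N/t F [ · t F and (7) the form Δ t = t s − N; these forms agree with the numbers of the examples discussed later (N = 13 is inferred from those examples). The function name `memory_configuration` is hypothetical.

```python
def memory_configuration(q, M, N, S_p=(1, 2, 4, 8)):
    """Return (t_max, t_F, t_s, delta_t) for the embedded memory blocks.

    q   - number of words of a block configured with a single output
    M   - number of FSM states (one word per state is needed)
    N   - number of microoperations
    S_p - allowed numbers of outputs per block (assumed to contain 1)
    """
    if q < M:
        raise ValueError("a block cannot hold one word per state")
    t_max = q // M                            # assumed form of (5)
    t_F = max(t for t in S_p if t <= t_max)   # largest allowed width
    blocks = -(-N // t_F)                     # ]N / t_F[ blocks
    t_s = blocks * t_F                        # assumed form of (6)
    delta_t = t_s - N                         # assumed form of (7)
    return t_max, t_F, t_s, delta_t
```

For instance, `memory_configuration(64, 16, 13)` gives t F = 4, t s = 16 and Δ t = 3, whereas `memory_configuration(32, 16, 13)` gives t F = 2, t s = 14 and Δ t = 1.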
If (8) is violated, then we propose the following approach. Let us represent the set Π A as Π A = Π B ∪ Π C , where B i ∈ Π B if |B i | > 1, and B i ∈ Π C otherwise.
It is clear that the circuit of the code transformer should generate only the codes K(B i ), where B i ∈ Π B . Let us encode the states a m ∈ A in an optimal way (Barkalov, 1998), and let us represent the set Π B as Π B = Π D ∪ Π E . Here B i ∈ Π D if the codes of the states a m ∈ B i belong to a single generalized interval of the Boolean space. Now only the codes of the states a m ∈ A (Π E ) should be transformed, where A (Π j ) is the set of the states a m ∈ B i such that B i ∈ Π j (j = B, C, D, E). It is enough to take
$$R_2 = \,]\log_2(|\Pi_E| + 1)[ \qquad (9)$$
binary variables to encode the classes B i ∈ Π E . Let these variables form a set Z, where |Z| = R 2 . If
$$\Delta_t \ge R_2, \qquad (10)$$
then the graph scheme of algorithm Γ can be interpreted by a Moore FSM U 5 (Fig. 4).
A. Barkalov et al.
Here the combinational circuit forms the functions
$$\Phi = \Phi(T, Z, X), \qquad (11)$$
the CMOC forms both functions (2) and the functions
$$Z = Z(T). \qquad (12)$$
In the FSM U 5 the block of the code transformer is missing and the variables T r ∈ T represent both the states a m ∈ A(Π C ) and the classes B i ∈ Π D . The classes B i ∈ Π E are represented by the CMOC. In this case the number of inputs in the PAL macrocells grows to L + R + R 2 , but this does not increase the hardware amount in the CC in comparison with the FSM U 3 . The cycle times of U 1 and U 5 are the same in the worst case. In the best case, the combinational circuit of U 5 has fewer levels than the combinational circuit of U 1 . This means that the cycle time of U 5 can be less than that of U 1 . Therefore, the proposed approach permits us to decrease the hardware amount without a decrease in the performance of the digital system. Let us point out that the cycle times of U 2 , U 3 , U 4 , U 5 are the same.
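The test that decides whether a class belongs to Π D can be sketched as follows. The sketch assumes that the state codes are given as bit strings and treats a class as having its own generalized interval when no state of another class falls inside the smallest interval covering its codes. The function names are hypothetical, and the codes 0110 and 1110 are an assumption chosen to agree with an interval of the form *110.

```python
def covering_interval(codes):
    """Smallest generalized interval containing all given codes.

    Positions in which the codes disagree are replaced by '*'.
    """
    return "".join(
        bits[0] if len(set(bits)) == 1 else "*" for bits in zip(*codes)
    )


def in_interval(code, interval):
    """True when `code` matches `interval` in every position that is not '*'."""
    return all(i == "*" or i == c for c, i in zip(code, interval))


def has_private_interval(class_codes, other_codes):
    """True when no code of another class lies in the covering interval."""
    interval = covering_interval(class_codes)
    return not any(in_interval(code, interval) for code in other_codes)


interval = covering_interval(["0110", "1110"])
```

Unused codes may lie inside the interval, because they can be treated as "don't care" input assignments; only the codes of states from other classes are checked.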
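If (8) and (10) are violated, then we propose to represent the set Π E as Π E = Π F ∪ Π G , where the set Π F includes
$$n_F = 2^{\Delta_t} - 1 \qquad (13)$$
classes. The codes of the classes B i ∈ Π F are kept in the CMOC and the variables z r ∈ Z are used for their representation, where |Z| = Δ t . The set Π G includes
$$n_G = |\Pi_E| - n_F \qquad (14)$$
classes. These classes can be encoded using the variables τ r ∈ τ , where |τ | = R 3 and
$$R_3 = \,]\log_2(n_G + 1)[. \qquad (15)$$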
In this case we propose to interpret the graph scheme of algorithm Γ by a Moore FSM U 6 (Fig. 5). Here the combinational circuit forms the functions
$$\Phi = \Phi(T, Z, \tau, X), \qquad (16)$$
the CMOC forms both the functions (2) and (12), and the circuit of the code transformer forms the functions (4). In the FSM U 6 the number of the inputs of the PAL macrocells is equal to L + R + Δ t + R 3 , but the combinational circuit has the same hardware amount as in the case of the FSM U 3 . The block of the code transformer of U 6 has less hardware than that of U 3 . The Moore FSM U 6 has the most complex structure and its design method includes the largest number of steps in comparison with the FSMs U 1 − U 5 . In our article we propose the design method of the FSM U 6 including the following steps:
1. Construction of a marked graph scheme of algorithm Γ and the construction of the set of internal states A.
2. Construction of the partition Π A .
3. Optimal encoding of the states and the construction of the sets Π D and Π E .
4. Calculation of Δ t and the construction of the sets Π F and Π G .
5. Encoding of the classes B i ∈ Π F ∪ Π G .
6. Construction of the table of the CMOC.
7. Construction of the modified structure table of the FSM.
8. Construction of the table of the code transformer.
9. Implementation of the FSM logic circuit.
The choice of a particular model depends on some conditions. In this article we propose the algorithm given in (Fig. 6) .
If the condition (8) holds, then the model U 4 should be chosen. Otherwise the optimal encoding of the states should be executed. If all classes B i ∈ Π A are represented by unique generalized intervals of the Boolean space (Π E = ∅), then the model U 5 should be chosen. Otherwise the model U 5 is chosen if the condition (10) holds, and the model U 6 is chosen if it is violated. In this way the algorithm determines the optimal model of the Moore FSM for the interpretation of the graph scheme of algorithm Γ using the hardware of an SoPC with the CPLD technology.
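The choice described above can be sketched in a few lines. The sketch assumes that the condition (8) has the form Δ t ≥ R 1 and the condition (10) the form Δ t ≥ R 2 ; the function name `choose_model` and its parameters are hypothetical. The number of classes in Π E is known only after the optimal encoding of the states, so it is passed in as an argument.

```python
def choose_model(delta_t, num_classes, num_pi_e):
    """Pick the FSM model from the free outputs and the class counts.

    delta_t     - number of free outputs of the embedded memory blocks
    num_classes - I, the number of classes of pseudoequivalent states
    num_pi_e    - |Pi_E|, classes without a generalized interval of their own
    """
    r1 = (num_classes - 1).bit_length()  # R_1 = ]log2 I[
    if delta_t >= r1:                    # assumed form of condition (8)
        return "U4"
    r2 = num_pi_e.bit_length()           # R_2 = ]log2(|Pi_E| + 1)[
    if delta_t >= r2:                    # assumed form of condition (10)
        return "U5"                      # always taken when Pi_E is empty
    return "U6"
```

When Π E = ∅ we have R 2 = 0, so the second test always succeeds and the sketch returns U 5 , in agreement with the text.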
Application Examples of the Proposed Methods
Let us discuss some examples in the case when the control algorithm is represented by the marked graph scheme of algorithm Γ 1 (Fig. 7). The design method will be found from Fig. 6 using the parameter q of the embedded memory block in use. We can get the following characteristics of the control unit from Fig. 7: A = {a 1 , . . . , a 16 } and M = 16. Let us form the formulas of transition (Baranov, 1994) for the states a m ∈ A. If the outputs of the vertices marked by a i , a j ∈ A are connected with the input of the same vertex of the graph scheme of algorithm Γ, then we will combine the transition formulas for these states into a single formula of transition. In the case of the graph scheme of algorithm Γ 1 , we can form the following system:
It is clear that the states from the left-hand side of each transition formula are pseudoequivalent states. Thus, in the case of the FSM U 1 (Γ 1 ) we can form the partition Π A = {B 1 , . . . , B 7 }, where B 1 = {a 1 }, B 2 = {a 2 , a 3 , a 4 }, B 3 = {a 5 , a 6 , a 7 }, B 4 = {a 8 , a 9 , a 10 }, B 5 = {a 11 , a 12 , a 13 }, B 6 = {a 14 , a 15 }, B 7 = {a 16 } and I = 7. Let |B i | = n i and H i be the number of the terms in the transition formula for the class B i ∈ Π A . The number H 1 (Γ) of the lines in the structure table of the Moore FSM U 1 (Γ) can be found as
$$H_1(\Gamma) = \sum_{i=1}^{I} n_i H_i. \qquad (18)$$
In the case of the FSM U 1 (Γ 1 ) we can get H 1 (Γ 1 ) = 45. This means that the structure table of the Moore FSM U 1 (Γ 1 ) has 45 lines. Some part of this table is shown in Table 1 .
This table is a basis to form the system (1). For example, from Table 1 we can get part of the Boolean equation for the function D 4 ∈ Φ :
Let us discuss the case when the system (2) is implemented using embedded memory blocks with q = 64 if t F = 1, and S p = {1, 2, 4, 8}. From (5) we can get t max = 4 and t max = t F , because t max ∈ S p . This means that the circuit of formation of microoperations of the Moore FSM can be implemented using ]N /t F [= 4 embedded memory blocks. From (6) we have t s = 16 and from (7) we have Δ t = 3. In the case of the FSM U 1 (Γ 1 ) we have I = 7. This means that R 1 = 3 and τ = {τ 1 , τ 2 , τ 3 }. The condition (8) holds, and according to the choice algorithm (Fig. 6) we should use the model U 4 for the interpretation of the graph scheme of algorithm Γ 1 .
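Let us encode the classes B i ∈ Π A in a trivial way: K (B 1 ) = 000, K (B 2 ) = 001, . . . , K (B 7 ) = 110. The CMOC of the Moore FSM U 4 (Γ 1 ) is represented by Table 2.
[Table 2: Karnaugh map of the CMOC over the state variables T 1 T 2 and T 3 T 4 . Each cell lists the microoperations y n of a state together with the variables z r equal to 1 in the code of its class. Recoverable cells: y 3 y 9 y 11 z 2 z 3 ; y 2 y 3 z 2 z 3 ; y 3 y 5 y 7 z 1 ; y 3 y 9 y 11 z 1 ; y 1 y 9 y 10 z 1 ; y 9 y 12 z 1 z 3 ; y 3 y 13 z 1 z 3 ; y 4 z 1 z 2 .]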
For example, the cell 0111 corresponds to the state a 8 with Y (a 8 ) = (y 6 , y 7 , y 8 ). Because a 8 ∈ B 4 with K (B 4 ) = 011, then the cell 0111 contains y 6 , y 7 , y 8 , z 2 and z 3 . The other cells from Table 2 are filled in the same manner.
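The filling of one CMOC cell can be sketched as follows. The sketch assumes the trivial encoding in which the class B i receives the binary representation of i − 1, which agrees with K (B 4 ) = 011; the function names `trivial_class_code` and `cmoc_word` are hypothetical.

```python
def trivial_class_code(i, width):
    """Code of the class B_i under the assumed trivial encoding (B_1 -> 0...0)."""
    return format(i - 1, f"0{width}b")


def cmoc_word(microoperations, class_code):
    """List the outputs that are equal to 1 in one CMOC word.

    microoperations - indices n of the microoperations y_n of the state
    class_code      - code K(B_i) of the class of the state, as a bit string
    """
    outputs = [f"y{n}" for n in sorted(microoperations)]
    outputs += [
        f"z{r}" for r, bit in enumerate(class_code, start=1) if bit == "1"
    ]
    return outputs


# State a8: Y(a8) = (y6, y7, y8), a8 belongs to B4, three class variables.
word_a8 = cmoc_word({6, 7, 8}, trivial_class_code(4, 3))
```

For the state a 8 the sketch yields y 6 , y 7 , y 8 , z 2 and z 3 , which is the content of the cell 0111.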
To form a modified structure table of the Moore FSM U 4 (Γ 1 ), replace the states a m ∈ B i on the left-hand side of each transition formula by the corresponding class B i ∈ Π A . This leads to the system
The modified structure table corresponds to a system similar to (19) and it has the columns B i , K (B i ), a s , K (a s ), X h , Φ h and h. Moreover, it has H 4 (Γ 1 ) = 18 lines.
The implementation of the logic circuit of the FSM U 4 is reduced to the implementation of the system (3) using PAL macrocells and the implementation of the systems (2) and (4) using embedded memory blocks. There are effective methods for such an implementation (Barkalov and Węgrzyn, 2006). We therefore exclude this step from our deliberations.
Let H i (D r ) be the number of the terms in the function D r (r = 1, . . . , R) for the FSM U i (i = 1, . . . , 6 ) . An analysis of the complete structure table of the
. An analysis of the complete modified structure table of the FSM U 4 (Γ 1 ) shows that
Let Q i (D r , S) be the number of PAL macrocells with S terms to implement the function D r ∈ Φ for the FSM U i (i = 1, . . . , 6). Using the results from (Barkalov and Węgrzyn, 2006), the value of Q i (D r , S) can be calculated as
If, e.g., S = 6, then Q 1 (D r , 6) = 5 and Q 4 (D r , 6) = 2 (r = 1, . . . , 4). This means that the combinational circuit of U 1 (Γ 1 ) includes Q 1 (Γ 1 ) = 20 PAL macrocells and the combinational circuit of U 4 (Γ 1 ) includes Q 4 (Γ 1 ) = 8 PAL macrocells. Therefore, in this case the hardware amount in the combinational circuit is decreased by 60%. The numbers of embedded memory blocks in both the CMOC of U 4 (Γ 1 ) and the circuit of formation of microoperations of U 1 (Γ 1 ) are the same. The cycle times of both U 1 (Γ 1 ) and U 4 (Γ 1 ) are the same. Let us point out that in the case of the graph scheme of algorithm Γ 1 we have
Now let us discuss the case when q = 32, if t F = 1, and S p = {1, 2, 4, 8} . From (5) we can get t max = t F = 2. This means that the circuit of formation of microoperations of the Moore FSM U 1 (Γ 1 ) is implemented using ]N /t F [ = 7 embedded memory blocks.
From (6) we have t s = 14 and from (7) we have Δ t = 1. This means that the condition (8) is violated and an optimal encoding of the states should be applied. Using an algorithm from (De Micheli, 1994) we can get the following result regarding the optimal encoding of states of the FSM U 1 (Γ 1 ) (Table 4). From the Karnaugh map of Tab. 4 we get Π C = {B 1 , B 7 }, Π D = {B 6 } and Π E = {B 2 , B 3 , B 4 , B 5 }. From (9) we have R 2 = 3 and Δ t < R 2 . This means that the condition (10) is violated and the Moore FSM U 6 should be applied to interpret the graph scheme of algorithm Γ 1 . From (13) we get n F = 1, which implies n G = 3. Now we have the following sets of classes: Π F = {B 2 } and Π G = {B 3 , B 4 , B 5 }. From the Karnaugh map (Tab. 4) we get the following codes: K (B 1 ) = K (a 1 ) = 0000, K (B 6 ) = * 110, K (B 7 ) = K (a 16 ) = 1010. Since Δ t = 1, we have Z = {z 1 }. Let K (B 2 ) = 1, and let z 1 = 0 mean that the codes of the classes B i ∈ Π F are not used to form the current transition of the FSM. The number of variables in the set τ can be determined using (15). In our example we have R 3 = 2 and τ = {τ 1 , τ 2 }. Let us encode the classes B i ∈ Π G in the following manner: K (B 3 ) = 01, K (B 4 ) = 10, K (B 5 ) = 11. The input assignment τ 1 = τ 2 = 0 means that the codes of the classes B i ∈ Π G are not used to form the current FSM transition.
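The numbers of this example can be reproduced by a short sketch. It assumes that (13) has the form n F = 2^Δt − 1, that n G = |Π E | − n F and that (15) has the form R 3 = ]log 2 (n G + 1)[, with the all-zero code reserved for "not used"; the choice of the first classes of Π E for Π F and the function name `split_and_encode` are also assumptions.

```python
def split_and_encode(pi_e, delta_t):
    """Split Pi_E into Pi_F and Pi_G and give every class a non-zero code.

    pi_e    - ordered list of the classes of Pi_E
    delta_t - number of free CMOC outputs (width of the codes for Pi_F)
    """
    n_f = min(2 ** delta_t - 1, len(pi_e))   # assumed form of (13)
    pi_f, pi_g = pi_e[:n_f], pi_e[n_f:]
    r3 = len(pi_g).bit_length()              # ]log2(n_G + 1)[
    # Codes start from 1 because the all-zero code means "class not used".
    codes_f = {b: format(k, f"0{delta_t}b") for k, b in enumerate(pi_f, 1)}
    codes_g = {b: format(k, f"0{r3}b") for k, b in enumerate(pi_g, 1)}
    return pi_f, pi_g, r3, codes_f, codes_g


pi_f, pi_g, r3, codes_f, codes_g = split_and_encode(["B2", "B3", "B4", "B5"], 1)
```

With Δ t = 1 the sketch places B 2 in Π F with the code 1 and encodes B 3 , B 4 , B 5 by 01, 10 and 11, as in the example above.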
The CMOC of the Moore FSM U 6 (Γ 1 ) is represented by Tab. 5. 
The modified structure table of the Moore FSM U 6 is constructed based on a modified system of the formulae of transitions. In the case of the FSM U 6 (Γ 1 ) this system is represented by (19). This table has the same columns as the modified structure table of the Moore FSM U 4 . The column K (B i ) contains the code
j is the code of the class B i ∈ Π j (j = C, D, F, G) , ' * signifies concatenation. The number of lines H 6 (Γ) is determined as H 4 (Γ) . In the case of the FSM U 6 (Γ 1 ) we have H 6 (Γ 1 ) = 18. The transitions for the classes B 1 , B 2 , B 3 ∈ Π A are shown in Table 3 .
In this case the code of a m ∈ A is ignored and it is represented by the signs ' * in the column K (B i ) . This table is a basis to form the system (16). From Table  3 we can get, e.g.,
The table of the circuit of the code transformer contains the columns
In the case of the FSM U 6 (Γ 1 ) this table includes 6 lines (Table 6). If some line of this table includes more than one state, then the column K (a m ) contains the generalized interval corresponding to the codes of these states. The table of the code transformer is a basis to form the functions (4). The codes of the states a m ∉ A (Π G ) can be treated as "don't care" input assignments (McCluskey, 1986) and they can be used to minimize the functions (4). The Karnaugh map for the function τ 1 ∈ τ is shown in Tab. 8.
From this map we can get τ 1 = T 1 . Using the same approach, we can get τ 2 =T 1 ∨T 2 . Implementation of the logic circuit of the finite-state machine U 6 is reduced to the implementation of systems (4) and (16) using PAL macrocells and to the implementation of the systems (2) and (12) using embedded memory blocks.
In the case of the Moore FSM U 6 (Γ 1 ), from (20) we get Q 6 (Γ 1 ) = 8. To implement the circuit of the code transformer of the FSM U 6 (Γ 1 ), it is enough to take only T C 6 (Γ 1 ) = 1 macrocell. Here T C i (Γ j ) means the amount of hardware to implement the circuit of the code transformer of the FSM U i that interprets the graph scheme of algorithm Γ j . Thus, only Q 6 (Γ 1 ) + T C 6 (Γ 1 ) = 9 macrocells should be used to implement an arbitrary logic of the FSM U 6 (Γ 1 ). Therefore, in this case the number of PAL macrocells is decreased by 55% in comparison with the FSM U 1 (Γ 1 ). The other characteristics of both U 1 (Γ 1 ) and U 6 (Γ 1 ) are the same (the cycle time and the number of embedded memory blocks).
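The two percentages quoted for the graph scheme of algorithm Γ 1 can be checked by direct arithmetic. The lines below only restate the macrocell counts given in the text (20, 8 and 8 + 1) and read each percentage as the relative saving with respect to the FSM U 1 (Γ 1 ):

```latex
\frac{Q_1(\Gamma_1) - Q_4(\Gamma_1)}{Q_1(\Gamma_1)}
  = \frac{20 - 8}{20} = 0.60,
\qquad
\frac{Q_1(\Gamma_1) - \bigl(Q_6(\Gamma_1) + TC_6(\Gamma_1)\bigr)}{Q_1(\Gamma_1)}
  = \frac{20 - 9}{20} = 0.55 .
```

In other words, the FSM U 4 (Γ 1 ) keeps 40% and the FSM U 6 (Γ 1 ) keeps 45% of the PAL macrocells required by the FSM U 1 (Γ 1 ).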
Analysis of the Proposed Method
Let us find an area where the FSM U i (i = 4, 5, 6) has less hardware amount than the FSM U j (j = 1, 2, 3). Let us use the probabilistic approach described in (Barkalov and Barkalov, 2005) . There are three key points in such an approach:
1. The use of the class of graph schemes of algorithm instead of a particular graph scheme of algorithm Γ. Each class is characterized by the parameters
It is clear that
where K(Γ) = |B| . Therefore p 1 (resp. p 2 ) can be treated as the probability of the event that a particular vertex of the graph scheme of algorithm Γ is an operational (resp. conditional) one.
2. The use of the matrix realization of the FSM circuit (Baranov, 1994) instead of the implementation using some standard VLSI. In this case we can find a hardware amount as the area of the matrices for a given structure of the logic circuit of the finite-state machine.
3. The study of the relations S(U i )/S(U j ), where S(U i ) and S(U j ) are the areas of the matrices for the FSMs U i and U j , respectively. In (Barkalov and Węgrzyn, 2006) it is proved that such relations for the cases of the matrix realization are the same as for circuits implemented with standard programmable logic devices, such as PAL, PLA or PROM.
Reduction in the number of PAL macrocells in the circuit of a Moore FSM
A matrix realization of the Moore FSM U 1 is shown in Fig. 8. Here M 1 is a conjunctive matrix that implements the system F of the terms of the system (1). M 2 is a disjunctive matrix that implements the functions of the system (1). M 3 is a conjunctive matrix that implements the system A 0 , where each conjunction A m (m = 1, . . . , M) corresponds to the code K(a m ) of the state a m ∈ A; M 4 is a disjunctive matrix that implements the functions (2). It is clear that the matrices M 1 and M 2 represent the combinational circuit, and the matrices M 3 and M 4 represent the circuit of formation of microoperations. The complexity of these circuits can be expressed as
A matrix realization of the finite-state machine U 4 is shown in Fig. 9. Here the set F includes H 0 (Γ) elements and the set τ includes R 0 elements, where R 0 is the number of internal variables of the equivalent Mealy finite-state machine. This means that the complexity of the combinational circuit can be calculated as
It is clear from the method of design of the finite-state machine U 4 that
To find the range of effective application of the Moore finite-state machine U 4 we should examine the functions:
The function f 1 shows the decrease in the total area occupied by the matrices M 1 and M 2 due to the application of the model U 4 instead of the model U 1 . The function f 2 shows the total decrease in the hardware amount in this case.
To reduce the number of variables in the expressions (26)- (31) we can use the results of (Barkalov and Wegrzyn, 2006) , where the parameters L, R 0 , R, H 0 (Γ), H 1 (Γ) are expressed as functions of K(Γ) and some coefficients:
It follows from Fig. 10 that the application of the proposed method always gives a smaller amount of hardware than the known methods. This gain increases with a decrease in the number of vertices of the graph-scheme of algorithm Γ and with an increase in the number of its operational vertices (an increase in the parameter p 1 ). The average gain for graph-schemes of algorithm with K (Γ) = 500 is equal to 39%. It follows from Fig. 11 that the Moore FSM with the proposed structure always requires less hardware than the known models of finite-state machines. This gain increases with a decrease in the number of microoperations N. The average gain for graph-schemes of algorithm with K (Γ) = 500 is near 32%. The gain grows with a decrease in the number of vertices of the initial graph-scheme of algorithm (a decrease in the parameter K(Γ)) and with a decrease in the length of the codes of the sets of microoperations in the initial graph-scheme of algorithm (an increase in the parameter P 3 ). The maximal gain is achieved for graph-schemes of algorithm with the number of vertices 100
The correctness of these results was checked in the following way for the case of an industrial CPLD with PAL macrocells. Software was written for the design of all FSM models discussed in this article. This software uses the standard package WebPack of Xilinx (www.xilinx.com) and VHDL models of Moore finite-state machines. A separate program is used to set up the main parameters of embedded memory blocks, to estimate their number and to choose a particular FSM model. Our software permits the estimation of the number of PAL macrocells in the combinational part of the FSM. Experiments conducted with the use of the software confirm the correctness of the tendencies shown in Figs. 10 and 11. However, the total average gain was a bit less than the one following from these theoretical curves: it was about 28% on average.
Similar results were obtained for the comparison of the base models U 1 −U 3 and the proposed models U 4 −U 6 .
Conclusion
The proposed methods of the implementation of the Moore finite-state machine using PAL macrocells and embedded memory blocks allow decreasing the cost of the logic circuit of the control unit in comparison with the known methods of Moore finite-state-machine design. In this article the proposed methods are based on the following peculiarities of both the Moore finite-state machine and CPLD:
1. Existence of pseudoequivalent states (P 1 ).
2. Wide fan-in of PAL macrocells (P 2 ).
3. Existence of the set of fixed numbers for the outputs of the embedded memory block (P 3 ). Let us recall that such blocks exist only in our hypothetical CPLD.
The following structures of the logic circuit of the Moore finite-state machine are proposed in this article:
1. Moore finite-state machine U 4 based on the properties P 1 and P 3 .
2. Moore finite-state machine U 5 based on the optimal encoding of the pseudoequivalent states and properties P 2 and P 3 .
3. Moore finite-state machine U 6 based on the optimal encoding of the pseudoequivalent states, the properties P 2 and P 3 and the use of the code transformer.
Each of the proposed methods can be applied only if some conditions hold, which are different for different methods.
The choice of a particular method is supported by a special algorithm proposed in this article. Let us point out that these methods cannot be applied in the case of the Mealy finite-state machine, because it has no pseudoequivalent states. Our analysis of the effectiveness of the proposed methods showed that the method optimal in the given conditions always permits a decrease in the hardware amount in comparison with earlier known methods of Moore finite-state machine design. This decrease in hardware does not lead to a decrease in the performance of the control unit. There are some special cases such as Δ t = 0 or Π i = ∅ (i = B, C, . . . , G), where some other models of the Moore finite-state machine are more effective. These cases are the subject of our further research. The proposed methods can be modified for real CPLDs, where embedded memory blocks are absent. In this case the system of microoperations is implemented using PAL macrocells, too. The effectiveness of the proposed methods should also be tested both for FPGAs with embedded memory blocks and for the CoolRunner CPLD (www.xilinx.com) based on the PLA technology. Of course, the proposed methods should be modified to meet the specific requirements of these chips.
