Abstract. This paper addresses the problem of state encoding for low power in FPGA-based synchronous finite state machines (FSMs). Four encoding schemes have been studied: first, the usual binary encoding and the One-Hot approach suggested by the FPGA vendor; then, a code that minimizes the output logic; finally, the so-called Two-Hot code strategy. FSMs from the MCNC and PREP benchmark suites have been analyzed. The main results show that binary state encoding fits small machines well (up to 8 states), whereas One-Hot is better for large FSMs (over 16 states). A power saving of up to 57% can be achieved by selecting the appropriate encoding. An area-power correlation has been observed regardless of the circuit or encoding scheme. Thus, FSMs that use fewer resources are good candidates to consume less power.
Introduction
Low-power design is nowadays a central concern in the construction of integrated systems. It allows expensive packaging to be avoided, chip reliability to be increased, cooling to be simplified, and battery autonomy to be extended (or battery weight to be reduced). The dynamic power dissipated in a CMOS circuit can be expressed by the well-known formula:
$$P = \sum_{n} c_n \, f_n \, V_{DD}^{2}$$
where c_n is the load capacitance at the output of node n, f_n is the switching frequency of that node, and V_DD is the supply voltage. The dominant source of power dissipation in CMOS circuits is the dynamic power: the energy required in each cycle to charge and discharge each node capacitance. It is also referred to as the capacitive power dissipation.
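As a small numerical illustration, the sketch below evaluates the dynamic power in its common form, the sum over all nodes of c_n · f_n · V_DD². The node capacitances, switching frequencies, and the function name `dynamic_power` are hypothetical and are not taken from the measurements in this paper.

```python
def dynamic_power(nodes, vdd):
    """Return the sum of c_n * f_n * vdd**2 in watts.

    nodes: iterable of (capacitance in farads, switching frequency in hertz).
    vdd:   supply voltage in volts.
    """
    return sum(c * f * vdd ** 2 for c, f in nodes)


# Hypothetical nodes: 10 pF switching at 1 MHz and 5 pF switching at 2 MHz.
nodes = [(10e-12, 1e6), (5e-12, 2e6)]
power_w = dynamic_power(nodes, vdd=5.0)
print(f"{power_w * 1e3:.3f} mW")
```

Because each term is proportional to both capacitance and switching frequency, an encoding that lowers switching activity but adds decoding logic (more capacitance) does not necessarily lower the sum.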
The main idea in the design of low-power FSMs is to minimize the Hamming distance of the most probable state transitions. However, this solution usually increases the logic required to decode the next state, so a tradeoff exists between switching reduction and extra capacitance. This paper addresses the state encoding problem in LUT-based programmable logic, using Xilinx 4K-series FPGAs as the technological framework. In Section II, the basic definitions are summarized, and a review of the traditional approaches is presented. In the next section, the characteristics of the benchmark circuits are highlighted. Finally, the main experimental results are summarized.
Preliminaries
A finite state machine is defined by a 6-tuple M = (Σ, σ, Q, q_0, δ, λ), where Σ is a finite set of input symbols, σ ≠ ∅ is a finite set of output symbols, Q ≠ ∅ is a finite set of states, q_0 ∈ Q is the "reset" state, δ(q, a) : Q × Σ → Q is the transition function, and λ(q, a) : Q × Σ → σ is the output function.
The 6-tuple M can be described by a state transition graph (STG), where nodes represent the states, and directed edges, labeled with the input and output values, describe the transition relation between states. In hardware implementations, each state corresponds to a binary vector stored in the state register. From the current state and input values, the combinational logic computes the next state and the output function.
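The 6-tuple can be written down directly as a data structure. The following is a minimal sketch, assuming that δ and λ are stored as dictionaries keyed by (state, input); the class name `MealyFSM` and the two-state example machine are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MealyFSM:
    inputs: frozenset   # input symbols
    outputs: frozenset  # output symbols
    states: frozenset   # Q
    reset: str          # q_0
    delta: dict         # (state, input) -> next state
    lam: dict           # (state, input) -> output

    def run(self, symbols):
        """Apply a sequence of inputs from the reset state.

        Returns the final state and the list of outputs produced.
        """
        state, trace = self.reset, []
        for a in symbols:
            trace.append(self.lam[(state, a)])
            state = self.delta[(state, a)]
        return state, trace


# Hypothetical machine: outputs "1" when a "1" arrives while already in B.
fsm = MealyFSM(
    inputs=frozenset("01"),
    outputs=frozenset("01"),
    states=frozenset("AB"),
    reset="A",
    delta={("A", "0"): "A", ("A", "1"): "B", ("B", "0"): "A", ("B", "1"): "B"},
    lam={("A", "0"): "0", ("A", "1"): "0", ("B", "0"): "0", ("B", "1"): "1"},
)
final_state, trace = fsm.run("0110")
```

The symbolic state names ("A", "B") are exactly what the state encoding step later replaces with binary vectors.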
The binary values of the inputs and outputs of the FSM are usually fixed by the particular application, while the state encoding can be defined by the designer.
Traditional Approaches for State Encoding
The traditional methods used to generate state machines result in highly encoded states. This type of machine typically has a minimum number of flip-flops but requires wide combinatorial functions to be implemented.
Early research on FSM state encoding aimed at minimizing area or delay. For example, the NOVA tool implements an optimal two-level state encoding [3], while the MUSTANG state assignment system [4] is targeted to multilevel networks. The JEDI tool [5] is a general symbolic encoding program (i.e., for encoding inputs, outputs, and states) targeted to multilevel implementations. This tool is included in the SIS system [6].
Approaches for Low Power State Encoding
Most work on low-power FSMs first computes the switching activity and the transition probabilities [7]. The key idea is to reduce the average activity by minimizing the bit changes during state transitions. In [8], a probabilistic description of the state machine is used. Then, the state assignment minimizes the Hamming distance between states with high transition probability. To obtain the probabilistic behavior of a general FSM, the STG is modeled as a Markov chain, and the state assignment problem is solved using log2 n bits, where n is the number of states. A spanning-tree-based state encoding algorithm is implemented in [9]. Its most important characteristic is that the representation is not limited to log2 n bits: the resulting encoding can range from log2 n to n bits. Other interesting contributions are in [2], [21], [22].
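The cost that these low-power methods try to reduce can be sketched as the expected number of state-register bit flips per clock cycle. The code below is only an illustration under stated assumptions: the transition matrix `P` is taken as given, the stationary distribution is obtained by plain power iteration from a uniform start (which assumes convergence), and the four-state ring machine and all function names are hypothetical. It does not reproduce the algorithms of [8] or [9].

```python
def steady_state(P, iters=1000):
    """Approximate the stationary distribution of a Markov chain.

    P[i][j] is the probability of moving from state i to state j.
    Uses power iteration starting from the uniform distribution.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


def expected_switching(P, codes):
    """Average number of state bits that toggle per clock cycle."""
    pi = steady_state(P)
    n = len(P)
    return sum(
        pi[i] * P[i][j] * hamming(codes[i], codes[j])
        for i in range(n)
        for j in range(n)
    )


# Hypothetical 4-state ring: state i always moves to state (i + 1) mod 4.
ring = [[1.0 if j == (i + 1) % 4 else 0.0 for j in range(4)] for i in range(4)]
binary_cost = expected_switching(ring, [0b00, 0b01, 0b10, 0b11])
gray_cost = expected_switching(ring, [0b00, 0b01, 0b11, 0b10])
```

For this ring, the Gray-ordered assignment toggles one bit per cycle while the plain binary count toggles 1.5 bits on average. This measures register switching only, not the capacitance of the extra decoding logic.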
FPGA State Encoding
The line of research described above was targeted to gate arrays or cell-based integrated circuits. FPGA manufacturers and synthesis tools use One-Hot as the default state encoding [10], [11]. This assignment allows the designer to create state machine implementations that are more efficient for FPGA architectures in terms of area and logic depth (speed). FPGAs have plenty of registers, but their LUTs are only a few bits wide. One-Hot increases the flip-flop usage (one per state) and decreases the width of the combinatorial logic. In addition, the Hamming distance of One-Hot encoding is always two regardless of the machine size. This makes the next state easy to decode, which is attractive for large FSMs. However, a better implementation of small machines can be obtained using binary encoding.
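The difference between the two encodings can be checked with a short sketch. The helper names below are hypothetical; the code generates binary and One-Hot code words for n states and verifies that any two distinct One-Hot codes differ in exactly two bits.

```python
from itertools import combinations


def binary_codes(n):
    """Minimum-width binary codes for n states, as bit strings."""
    width = max(1, (n - 1).bit_length())
    return [format(i, f"0{width}b") for i in range(n)]


def one_hot_codes(n):
    """One flip-flop per state: code i has only bit i set."""
    return ["".join("1" if b == i else "0" for b in range(n)) for i in range(n)]


def hamming(a, b):
    """Number of positions in which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


n = 5
bin_codes = binary_codes(n)       # 3 flip-flops
hot_codes = one_hot_codes(n)      # 5 flip-flops
one_hot_distances = {hamming(a, b) for a, b in combinations(hot_codes, 2)}
binary_distances = {hamming(a, b) for a, b in combinations(bin_codes, 2)}
```

For five states, `one_hot_distances` contains only the value 2, whereas the binary codes show distances from 1 to 3.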
Experiments
In this paper, each circuit was encoded in four ways: binary, One-Hot, Two-Hot, and a style proposed by JEDI [5], named "out-oriented" in this paper. This last algorithm uses a binary state encoding that minimizes the output logic. Two-Hot reduces flip-flop usage while maintaining the easy-decoding characteristic of One-Hot. Binary and "out-oriented" are highly encoded techniques, whereas One-Hot and Two-Hot can be considered sparse encodings.
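A minimal sketch of a Two-Hot code generator follows, assuming the common definition in which every code word has exactly two bits set, so that k flip-flops can encode up to k(k-1)/2 states. The function name and the order in which code words are assigned to states are hypothetical; the paper does not specify the assignment order used in the experiments.

```python
from itertools import combinations
from math import comb


def two_hot_codes(n):
    """Return n distinct bit strings with exactly two bits set each.

    Uses the smallest width k such that C(k, 2) >= n.
    """
    k = 2
    while comb(k, 2) < n:
        k += 1
    codes = []
    for i, j in combinations(range(k), 2):
        codes.append("".join("1" if b in (i, j) else "0" for b in range(k)))
        if len(codes) == n:
            break
    return codes


codes = two_hot_codes(16)
width = len(codes[0])
```

For 16 states this yields 7 flip-flops, compared with 16 for One-Hot and 4 for binary, which illustrates the intermediate position of Two-Hot between sparse and highly encoded styles.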
All the experiments use the MCNC91 benchmark set [12] together with two FSMs extracted from the former PREP consortium [13]. The original MCNC FSMs are defined using the KISS2 format [6]. Thus, the first step was to write a KISS-to-VHDL translator. It takes the KISS file, infers a Mealy or Moore machine, and finally writes the corresponding code. The program generates a file containing an entity with the machine, and another with top-level VHDL code that places tri-state buffers at the pads in order to measure the off-chip current separately.
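The front end of such a translator can be sketched as follows. This is not the translator used in this work: it is a minimal reader that assumes the usual KISS2 layout (directive lines starting with a dot, then one "input current-state next-state output" row per transition), and it classifies a machine as Moore when every transition leaving a given state carries the same output, which is only one possible inference rule. Don't-care outputs are compared literally, and the example machine is hypothetical.

```python
def parse_kiss2(text):
    """Split KISS2 text into a directive dictionary and transition rows."""
    header, rows = {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("."):
            parts = line.split()
            if parts[0] == ".e":          # end-of-description marker
                break
            header[parts[0][1:]] = parts[1] if len(parts) > 1 else ""
        else:
            inp, cur, nxt, out = line.split()
            rows.append((inp, cur, nxt, out))
    return header, rows


def is_moore(rows):
    """True if the output depends only on the current state."""
    seen = {}
    for _, cur, _, out in rows:
        if seen.setdefault(cur, out) != out:
            return False
    return True


# Hypothetical two-state machine in KISS2 form.
example = """
.i 1
.o 1
.p 4
.s 2
.r A
0 A A 0
1 A B 0
0 B A 1
1 B B 1
.e
"""
header, rows = parse_kiss2(example)
moore = is_moore(rows)
```

A VHDL writer would then emit either a single process computing outputs from state and inputs (Mealy) or outputs decoded from the state alone (Moore).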
The benchmark FSMs were first minimized with STAMINA [14]. The number of inputs, outputs, next-state rules and states (for both the original circuit and the minimized one) are presented in Table 1. Then, each description was translated into VHDL. The resulting code was compiled using FPGA Express [15] and Xilinx Foundation tools [16] into an XC4010EPC84-1 FPGA sample. All circuits have been implemented and tested under identical conditions. That is, all the electrical measurements are related to the same FPGA sample, output pins, tool settings, printed circuit board, input vectors, clock frequency, and logic analyzer probes. Random vectors were used to stimulate the circuit. At the output, each pad supported the load of the logic analyzer, lower than 3 pF [17]. The circuits were measured at 100 Hz, 2 MHz, and 4 MHz to extrapolate the static power. All prototypes include a tri-state buffer at the output pads to measure the off-chip power [18]. Other alternatives to measure power are reviewed in [19], [20]. Table 2 shows the area, delay and power obtained for each benchmark circuit. Area is expressed in CLBs, but the number of flip-flops (FF) used is also indicated. The delay, expressed in ns, corresponds to the critical path. Finally, the dynamic power is shown in mW/MHz.
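One plausible way to separate static and dynamic power from measurements at several clock frequencies is a straight-line fit of power against frequency, taking the intercept as the static power and the slope as the dynamic power in mW/MHz. The paper does not detail the extrapolation procedure, so the sketch below is an assumption, and the three power readings are invented for illustration.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Clock frequencies used in the experiments, in MHz (100 Hz = 0.0001 MHz).
freqs_mhz = [0.0001, 2.0, 4.0]
# Hypothetical total power readings in mW (not measured data).
power_mw = [10.0003, 16.0, 22.0]

dynamic_mw_per_mhz, static_mw = linear_fit(freqs_mhz, power_mw)
```

With these invented readings the fit returns about 3 mW/MHz of dynamic power and about 10 mW of static power.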
Experimental Results
Power Saving: Fig. 1 compares the power saving of (a) OH (One-Hot) vs. binary encoding and (b) OH vs. "out-oriented". Positive values indicate a power reduction obtained using OH encoding. The x axis represents the number of states of the FSM. The figure can be divided into three zones. For machines with up to eight states, binary encoding should be used to reduce power. For machines with more than 16 states, OH is always the best choice. Finally, between 8 and 16 states there is no clear relation, but "out-oriented" is better than pure binary. On the other hand, TH (Two-Hot) encoding consumes more than OH in almost all cases, but it is better than "out-oriented" and binary for big FSMs.
Table 2. Area, Time and Power for the benchmark set.
States-Power relationship:
For any state encoding, the power is linearly correlated with the number of states. The coefficient R² of the different regression analyses is over 0.85 (Fig. 2). Power is even more correlated (R² ≅ 0.87) with n+i (number of states plus number of inputs).
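For reference, the coefficient R² of a simple linear regression equals the squared Pearson correlation between the two variables. The sketch below computes it in that way; the function name is hypothetical and the (states, power) pairs are invented values, not entries of Table 2.

```python
def r_squared(xs, ys):
    """Coefficient of determination of the least-squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return (sxy * sxy) / (sxx * syy)


# Hypothetical data: number of states and dynamic power in mW/MHz.
states = [4, 8, 16, 32]
power = [1.0, 2.1, 3.9, 8.2]
r2 = r_squared(states, power)
```

The same function applies unchanged when the x values are replaced by n+i or by the area in CLBs.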
States-Area relationship:
In this case, the correlation is similar to the previous analysis, with R² ≅ 0.80.
Area-Power relationship:
Area and power are also correlated (R² ≅ 0.91), and this relationship can be used as a first criterion for choosing a state assignment. Fig. 3 represents this distribution. A comparison between area and power shows that, in 77% of the benchmark circuits, the smaller circuit consumes less power.
Other correlations, such as States-Delay, are not visible (R² lower than 0.6). The correlations of area, time and power with the other FSM parameters (inputs, outputs, rules), and with combinations of these parameters, do not produce significant results either.
Conclusion
This paper has presented an analysis of the state encoding alternatives for FSMs. The main conclusion is that in small state machines (up to 8 states), area, delay and power are minimized using binary state encoding. In contrast, One-Hot state encoding is better for large machines (over 16 states). A comparison between 26 test circuits shows important differences in power consumption: depending on the state encoding, a power saving of up to 57% can be obtained. The Two-Hot approach does not offer advantages over One-Hot; nevertheless, it is better than binary for big FSMs. The "out-oriented" style is a binary encoding that minimizes the decoding logic, and it is on average better than pure binary. Finally, a clear area-power relationship exists. It can be used to estimate power during the design cycle from the information provided by the synthesis tool.
