Abstract-To reduce dynamic power dissipation in digital circuits, a dependency graph (DG) is derived for a sequential circuit to accomplish verification and synthesis of clock-gated circuits. The DG is used recursively to derive sufficient conditions for a given bank of flops (flip-flops) to be legally clock gated (disabled). These conditions are expressed as linear temporal logic (LTL)/past LTL (PLTL) properties, which can be used to create hardware monitors and proved by hardware model checkers. For sequential equivalence checking (SEC), LTL/PLTL properties are formulated to be proved on a clock-gated circuit (R) derived from a "golden" circuit (G). If these sufficient conditions can be proved on R, the clock-gating structures are proved redundant and can be removed. This creates a simplified circuit (R') and makes the SEC task easier. Experiments were performed on a set of benchmarks. Because the properties are expressed only in terms of the control signals that appear in the DG, they were observed to be quite easy to prove on R: the DG abstracts away complicated arithmetic logic. Similarly, the miter between G and R' is usually proved easily by model-checking methods, because G and R' are much more similar in sequential behavior than G and R. The proposed formulation is extended to provide a systematic and automatic method for sequential clock-gating synthesis. Experiments showed that the DG-based framework for synthesis gave encouraging results.
To reduce the frequency of updating FFs, and hence dynamic power consumption, clock-gating synthesis disables the clock to a bank of FFs under appropriate conditions. It is highly effective because FF switching is a major source of power usage. Usually the change from the golden circuit (G) to a synthesized clock-gated version (R) is small. However, this transformation is sequential, i.e., the next-state functions of FFs are different, which makes SEC hard. Since the I/O behavior of the two circuits must be equivalent, the extra signals created by clock-gating synthesis must be redundant. If this redundancy can be identified, proved, and removed, SEC can be made much easier.
We focus on clock-gating done on data paths and identify clock-gating constructs by building a dependency graph (DG) for a circuit with conditional dependencies controlled by clock enabling signals. The DG abstracts away combinational operators like arithmetic components, which are usually the source of most of the difficulties in SEC. The DG is used recursively to formulate linear temporal logic (LTL)/past LTL (PLTL) properties that are sufficient for a targeted enabling signal to be redundant, i.e., its removal preserves I/O equivalence. If these redundancies are removed one by one, then a revised optimized circuit R' is created, which is more similar to G. A final step in SEC is to create a miter between G and R' and use model checking to prove equivalence.
For synthesis, clock gating reverses the above process: instead of removing redundancies, we add them. A DG is constructed for G, and for each bank of FFs identified as a candidate for clock gating, an LTL/PLTL property sufficient for legal clock gating is formulated. An enabling signal is synthesized by translating this formula into a monitor using the methods of [7]. Since the property is sufficient, the output of the monitor is a legal clock-gating signal. This enabling signal may be stronger than the one identified by the designer, because the formulation explores the DG recursively and arbitrarily deeply.
II. PRELIMINARY

A. Sequential Clock-Gating
Clock gating is usually done by replacing the clock input to an FF with the AND of the clock and an enabling signal (en) (see Fig. 1). When en = 1, the clock input to the flop causes new data to be captured; otherwise, the data in the FF is unchanged. To represent this in a circuit used for synthesis or verification, the gated clock behavior must be modeled, because the clock input to the FF is usually not an explicit signal. Hence an equivalent model of the behavior, shown on the right side of the figure, is used: a multiplexer (MUX) controlled by en in front of the FF, with a feedback loop from the FF output to one input of the MUX. Clock-gating synthesis can be based on either satisfiability or observability of signals. Satisfiability clock-gating turns off the clock of FFs when the input data is identical to the value stored in the previous time frame. Hence a satisfiability clock-gating condition depends on circuit behavior from the past up to the current clock cycle. For instance, the algorithm proposed in [2] finds satisfiability clock-gating conditions with satisfiability solvers and three-valued abstractions.
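The equivalence between the gated-clock view and the MUX-with-feedback model can be illustrated with a small cycle-level simulation. This is a minimal sketch assuming an ideal, glitch-free gated clock and a single-bit FF; the function names are hypothetical and the code only shows that both models produce the same FF output trace.

```python
import random


def gated_clock_ff(data, en, init=0):
    """Clock-gating view: the FF captures data only in cycles where clk AND en fires."""
    q, trace = init, []
    for d, e in zip(data, en):
        trace.append(q)          # FF output visible during this cycle
        if e:                    # a gated clock edge occurs only when en = 1
            q = d
    return trace


def mux_feedback_ff(data, en, init=0):
    """Verification model: a 2-to-1 MUX selects data when en = 1 and the FF's own
    output otherwise; the FF itself is clocked in every cycle."""
    q, trace = init, []
    for d, e in zip(data, en):
        trace.append(q)
        q = d if e else q        # MUX with feedback loop, ungated clock
    return trace


if __name__ == "__main__":
    rng = random.Random(0)
    data = [rng.randint(0, 1) for _ in range(20)]
    en = [rng.randint(0, 1) for _ in range(20)]
    assert gated_clock_ff(data, en) == mux_feedback_ff(data, en)
    print("models agree on", len(data), "cycles")
```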
On the other hand, observability clock-gating is used to disable updating FFs when these updates are not observable at any POs. In other words, the differences (updating or not updating) of gated FFs cannot be propagated to any of the POs. Hence an observability clock-gating condition relates to the circuit behaviors from now to the future.
After clock-gating synthesis with appropriately synthesized enabling signals for various banks of FFs, verification requires that the circuits before and after synthesis be sequentially equivalent [3]. SEC for general circuits is typically formulated as a model checking problem on the miter of the two sequential circuits to be compared (the PI pairs are merged, and the PO pairs are XORed to form the outputs of the miter). Sequential model checking techniques, including induction [16], bounded model checking [8], and property-directed reachability (PDR) [12], can then be applied to check whether the outputs of the two circuits are identical under all possible input sequences. If any pair of POs ever evaluates to different values under the same input sequence, sequential equivalence is violated; in this case, model checking can provide an input sequence leading to the violation. Otherwise, the two circuits are proved sequentially equivalent, unless the problem is too hard for the model checker to resolve.
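The miter check described above can be sketched for tiny circuits by explicit-state breadth-first search over the product of the two machines. This is only a stand-in for the induction, BMC, and PDR engines cited in the text, under the assumption that each circuit is given as a Mealy machine `(init_state, step)`; `check_sequential_equivalence` and the example machines are hypothetical.

```python
from collections import deque


def check_sequential_equivalence(g, r, inputs):
    """Explicit-state check of the miter between two Mealy machines.

    g, r: (init_state, step) with step(state, inp) -> (next_state, output).
    inputs: input alphabet shared by both circuits (PI pairs are merged).
    Returns None if the outputs agree in every reachable product state,
    otherwise a shortest input sequence that makes the miter output 1.
    """
    (g0, g_step), (r0, r_step) = g, r
    start = (g0, r0)
    parent = {start: None}           # product state -> (previous state, input)
    queue = deque([start])
    while queue:
        state = queue.popleft()
        gs, rs = state
        for inp in inputs:
            g_next, g_out = g_step(gs, inp)
            r_next, r_out = r_step(rs, inp)
            if g_out != r_out:       # XOR of the PO pair is 1: equivalence violated
                trace, s = [inp], state
                while parent[s] is not None:
                    s, prev_inp = parent[s]
                    trace.append(prev_inp)
                return trace[::-1]
            nxt = (g_next, r_next)
            if nxt not in parent:
                parent[nxt] = (state, inp)
                queue.append(nxt)
    return None


if __name__ == "__main__":
    # Hypothetical 1-bit example: the output is the FF value, the FF input is the PI.
    golden = (0, lambda q, d: (d, q))
    # Satisfiability-style gating: load only when the input differs from the FF.
    gated_ok = (0, lambda q, d: (d if d != q else q, q))
    # Illegal gating: load only when the input is 1.
    gated_bad = (0, lambda q, d: (d if d == 1 else q, q))
    print(check_sequential_equivalence(golden, gated_ok, [0, 1]))   # None
    print(check_sequential_equivalence(golden, gated_bad, [0, 1]))  # [1, 0, 0]
```

Breadth-first order makes the returned counterexample a shortest one, mirroring the input sequence a model checker reports on a violation.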
Due to the PSPACE complexity of general sequential model checking, some SEC problems may be too hard. Savoj et al. [14], [15] proposed a combinational approach to SEC for clock-gating synthesis, aimed at circuits synthesized using satisfiability and observability don't cares [13]. Moreover, Dai et al. [11] constructed characteristic graphs (CGs) to bypass irrelevant combinational logic and formulate sufficient conditions for legal clock-gating, based only on the clock-gating structures found. However, the CG method used in [11] can fail to prove some legal clock-gating conditions because CGs can lack essential information.
B. Transparent Logic and Data Dependency
To include control logic and data dependency information for clock-gating conditions while possibly excluding irrelevant combinational logic in the analysis, a DG for a circuit is constructed based on all recognized transparent logic. 
Definition 1:
A transparent word is a set of signals, $\{w_k\}$, with supports, $\{S_k\}$, where under some evaluation of …
For example, an m-bit word formed by a set of 2-to-1 MUXes controlled by the same selector signal s comprises a transparent word C, where
$$c_i = (s \wedge a_i) \vee (\neg s \wedge b_i), \qquad 0 \le i < m.$$
In this case, word C is transparent from word A or word B, depending on the value assigned to s. This transparent block (words A and B to word C) is called completely specified because each minterm of its control signal s makes the output word equivalent to one of the input words. An incompletely specified transparent block introduces more complicated data dependencies. Consider the functional block from words A, B, and C to I in Fig. 2. I is equivalent (transparent) to one of the input words only when one-hot assignments are applied to the control signals, e.g., (s 1, s 2, s 3) = (1, 0, 0) makes I equivalent to A. When an assignment that is not one-hot is applied to the control signals, such as (1, 1, 0), I depends on both A and B. Thus, to establish all transparencies of an incompletely specified transparent block, all minterms of its control signals must be examined.
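The minterm enumeration required for incompletely specified blocks can be sketched as follows. The sketch assumes, hypothetically, that one bit slice of the block from A, B, and C to I in Fig. 2 is the AND-OR function I = (s1 ∧ A) ∨ (s2 ∧ B) ∨ (s3 ∧ C), which agrees with the behavior described in the text; the figure's actual gates are not given. `analyze_minterms` is an illustrative name and is not the detection method of [10].

```python
from itertools import product


def and_or_block(s, a, b, c):
    """Hypothetical bit slice of the block from A, B, C to I: I = s1&A | s2&B | s3&C."""
    s1, s2, s3 = s
    return (s1 & a) | (s2 & b) | (s3 & c)


def analyze_minterms(block, n_controls=3, n_data=3, names=("A", "B", "C")):
    """For every control minterm, report which data inputs the output depends on
    and whether the block is transparent from one input or reduces to a constant."""
    report = {}
    data_space = list(product((0, 1), repeat=n_data))
    for s in product((0, 1), repeat=n_controls):
        deps = set()
        for i in range(n_data):
            for d in data_space:
                flipped = list(d)
                flipped[i] ^= 1                      # toggle input i only
                if block(s, *d) != block(s, *flipped):
                    deps.add(names[i])               # output is sensitive to input i
                    break
        if not deps:
            kind = f"constant {block(s, *([0] * n_data))}"
        elif len(deps) == 1:
            (only,) = deps
            idx = names.index(only)
            same = all(block(s, *d) == d[idx] for d in data_space)
            kind = f"transparent from {only}" if same else f"depends on {only}"
        else:
            kind = "depends on " + ", ".join(sorted(deps))
        report[s] = kind
    return report


if __name__ == "__main__":
    for minterm, kind in analyze_minterms(and_or_block).items():
        print(minterm, kind)
```

The all-zero minterm shows up as a constant, which is why extra constant vertices can be needed to represent the full functionality of such a block.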
Recognizing transparent logic provides more insight into the data dependencies in circuits, where the control signals are used to regulate data flows. Methods for detecting transparent logic and its control conditions were defined and implemented in [10]. Considering transparent blocks of logic can strengthen clock-disabling conditions for some FFs, and hence save more dynamic power. For satisfiability clock-gating, more detailed information about data dependency can help identify more conditions under which data in a flop need not be updated. For observability clock-gating, paths to observable outputs can be blocked by transparent logic, which can allow the clock to be disabled more often.
Example 1: Fig. 2 contains a set of gated FFs, F 2, whose outputs are processed by a word-level square root operation (√) and then fed into a transparent block (words A, B, C to I). I depends on A (through F 2) only when F 1 = s 1 = 1. Hence, F 2 need not be updated when s 1 = 0 in the next time frame; since s 1 can be 1 in the next time frame only if s 0 = 1 now, the clock to F 2 can be disabled whenever s 0 = 0. This example of legal observability clock-gating can be identified only when transparent logic is considered. Note that transparent logic is not simply a set of MUXes controlling a data path; it can be an incompletely specified transparent block.
To formulate legal clock-gating conditions on circuits that may have transparent blocks, we propose to construct a DG and then formulate a set of properties sufficient for legal clock-gating.
C. Formulation and Proof of LTL and Past LTL Properties
In this paper, to formulate and verify legal clock-gating conditions on sequential circuits, LTL and PLTL operators are used to represent sequential properties.
1) LTL and Past LTL Properties:
To address observability clock-gating conditions, the following LTL operators are used in this paper.
1) X a ("next"): a holds in the next cycle.
2) [a U b] ("until"): b becomes True at the current or a future time frame, and a remains True at least until then.
Temporal formulas for satisfiability clock-gating conditions are expressed in PLTL. Only the operators used to construct clock-gating properties in this paper are explained; detailed formal semantics for PLTL can be found in [4].
The following PLTL operators are used.
1) Y a ("yesterday"): a was True in the previous time frame; Y a is False in the first time frame.
2) Z a ("invariant yesterday"): a was True in the previous time frame; Z a is True in the first time frame.
3) [a S b] ("since"): a) b was True at least once in a past (or the current) time frame and b) a has been True in every time frame after the one in which b was last True, up to the current one.
Note that Y and Z are past duals of X (next), while S is the past dual of U (until) in common LTL.
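The operators above can be made concrete by evaluating them over a finite trace, one boolean per time frame. This is a minimal sketch under an assumption the paper does not make: traces are finite, and X and U are strong at the end of the trace (X is False in the last frame, U requires b to actually occur). The paper's semantics over infinite runs is in [4].

```python
def X(a):
    """next: a holds in the following cycle (False at the last cycle of a finite trace)."""
    return a[1:] + [False] if a else []


def U(a, b):
    """[a U b]: b holds now or later, and a holds at every cycle before that."""
    out, nxt = [False] * len(a), False
    for i in reversed(range(len(a))):
        nxt = b[i] or (a[i] and nxt)
        out[i] = nxt
    return out


def Y(a):
    """yesterday: a held in the previous cycle; False in the first cycle."""
    return [False] + a[:-1] if a else []


def Z(a):
    """invariant yesterday: a held in the previous cycle; True in the first cycle."""
    return [True] + a[:-1] if a else []


def S(a, b):
    """[a S b]: b held at some cycle j <= now, and a held at every cycle after j."""
    out, prev = [], False
    for ai, bi in zip(a, b):
        prev = bi or (ai and prev)
        out.append(prev)
    return out


def G(a):
    """globally: a holds at every cycle of the trace."""
    return all(a)
```

The recurrences show why past operators are cheap in hardware: Y, Z, and S each need only the value from the previous cycle.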
2) Proving Properties on Circuits: Claessen et al. [7] provided methods for expressing LTL/PLTL formulae as circuits, including G(z ⇒ {op}a) and G(z ⇒ [a{op}b]), where {op} is an arbitrary LTL or PLTL operator. They constructed a hardware monitor circuit to represent each target property to be proved.
In Section IV, after all signals required by (2)-(4) are constructed, the algorithm proposed in [7] is used to verify clock-gating conditions. For further details, please refer to [7].
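The idea of turning a property into a monitor can be illustrated in software. The sketch below is not the construction of [7]; it only shows the general principle that each past-time operator (Y, Z, S) needs one register holding last cycle's value, so a PLTL safety property G(z ⇒ φ) becomes a small sequential circuit. The tuple encoding of formulas and the names `PastMonitor` and `first_violation` are hypothetical.

```python
class PastMonitor:
    """Online evaluator of a past-time formula with one register per Y, Z, or S node.

    Formulas are nested tuples: ("var", name), ("not", f), ("and", f, g),
    ("or", f, g), ("Y", f), ("Z", f), ("S", f, g).
    """

    def __init__(self, formula):
        self.formula = formula
        self.regs = {}                       # id(node) -> value latched last cycle

    def _eval(self, f, env, latch):
        op = f[0]
        if op == "var":
            return bool(env[f[1]])
        if op == "not":
            return not self._eval(f[1], env, latch)
        if op in ("and", "or"):
            # evaluate both sides so registers inside each subformula are updated
            left = self._eval(f[1], env, latch)
            right = self._eval(f[2], env, latch)
            return (left and right) if op == "and" else (left or right)
        if op in ("Y", "Z"):
            prev = self.regs.get(id(f), op == "Z")   # Y starts False, Z starts True
            latch[id(f)] = self._eval(f[1], env, latch)
            return prev
        if op == "S":
            a = self._eval(f[1], env, latch)
            b = self._eval(f[2], env, latch)
            now = b or (a and self.regs.get(id(f), False))
            latch[id(f)] = now
            return now
        raise ValueError(f"unknown operator {op!r}")

    def step(self, env):
        """Advance one clock cycle and return the formula's value in this cycle."""
        latch = {}
        value = self._eval(self.formula, env, latch)
        self.regs.update(latch)              # registers update at the clock edge
        return value


def first_violation(z, formula, trace):
    """Monitor G(z => formula): return the first cycle where it fails, or None."""
    mon = PastMonitor(formula)
    for cycle, env in enumerate(trace):
        value = mon.step(env)                # step every cycle to keep registers current
        if env[z] and not value:
            return cycle
    return None
```

For example, `("Y", ("S", ("not", ("var", "en")), ("and", ("var", "u"), ("not", ("var", "en")))))` encodes the term Y([¬en] S [u ∧ ¬en]) that appears later in the gated-FF update condition.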
III. DEPENDENCY GRAPH
A DG is an abstraction of a circuit. The DG addresses clock-gating conditions related to data dependencies at a high level, providing essential information for synthesis and verification.
A. Dependency Graph
A DG = (V, E) is a directed graph, where each vertex associates a set of signals with a certain subcircuit in the corresponding sequential circuit. Each directed edge represents a data dependency.
Eight types of vertices are defined: 1) primary inputs; 2) constants; 3) primary outputs; 4) standard flip-flops; 5) transparent blocks; 6) combinational clouds; 7) signal branches; and 8) gated FFs, where the MUXes forming a clock-gating structure in front of a set of FFs are collapsed into a single gated FF vertex with one enable signal. More details on such MUX collapsing are given in Section III-B. Note that for incompletely specified transparent blocks, such as words A, B, and C to I in Fig. 2, the assignments for transparency as well as all other possible assignments of the selector signals should be represented in the DG. This is because: 1) there is no guarantee that only the control assignments valid for transparency can occur and 2) other assignments can still cause the output to update, which must be considered for satisfiability clock-gating conditions.
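A DG with the eight vertex types listed above might be represented with a few small data structures. This is a minimal sketch; the class and field names are hypothetical and only mirror the information the text attaches to vertices (covered signals, an enable for gated FFs, and the control assignments of transparent blocks).

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional


class VertexType(Enum):
    PRIMARY_INPUT = auto()
    CONSTANT = auto()
    PRIMARY_OUTPUT = auto()
    STANDARD_FF = auto()
    TRANSPARENT_BLOCK = auto()
    COMBINATIONAL_CLOUD = auto()
    SIGNAL_BRANCH = auto()
    GATED_FF = auto()


@dataclass
class Vertex:
    name: str
    kind: VertexType
    signals: list = field(default_factory=list)   # circuit signals covered by this vertex
    enable: Optional[str] = None                  # only used by GATED_FF vertices
    # only used by TRANSPARENT_BLOCK: control minterm -> input words the output depends on
    selections: dict = field(default_factory=dict)


@dataclass
class DependencyGraph:
    vertices: dict = field(default_factory=dict)  # name -> Vertex
    edges: set = field(default_factory=set)       # (source, sink): sink depends on source

    def add(self, vertex):
        self.vertices[vertex.name] = vertex
        return vertex

    def connect(self, source, sink):
        if source not in self.vertices or sink not in self.vertices:
            raise KeyError("both endpoints must be added before connecting")
        self.edges.add((source, sink))

    def fanins(self, name):
        return sorted(s for s, t in self.edges if t == name)

    def fanouts(self, name):
        return sorted(t for s, t in self.edges if s == name)
```

A hypothetical fragment loosely following Fig. 2 would chain F 2, the square-root cloud, the transparent block I, and a PO with `connect` calls.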
Example 1: Fig. 4 demonstrates the DG constructed for the circuit in Fig. 2.
B. Construction of Dependency Graph
A DG is constructed from a sequential circuit and its identified transparent blocks as follows: 1) recognize gated FF vertices; 2) complete incomplete transparent block vertices; 3) create other vertices, including combinational clouds; and 4) build dependencies.
According to the algorithm for identifying transparent blocks in [10] , initially a set of bits is grouped together because they have identical control signals and assignments. Then for each group, if some bits are supported by different sources, or some of them drive different transparent words in their fanout cones, the group is decomposed into subgroups as transparent words. Thus, the identified transparent blocks are examined and decomposed to ensure that: 1) all bits of any one word are supported by identical input words and 2) all bits support the same set of words in their fanout cones. Hence for each DG, a vertex depends on all bits of each vertex supporting it, so there are no redundancies where a bit is connected to a vertex but cannot influence the vertex value.
1) Recognizing Gated FF Vertices: Given a set of recognized transparent blocks, MUXes that drive FFs directly are investigated first. When a fully specified transparent block forms a clock-gating structure (see Fig. 1 ), where the FF output drives one data input of a corresponding 2-to-1 MUX (a MUX feedback loop), then the other input word is examined. If it comes from another transparent word, and each bit of this is supported by the FF in its fanout cone under the same control condition, then this transparent block also contributes to the behavior of a set of clock-gated FFs. The procedure continues until no such MUXes connecting to the target FFs can be found. Then all selector signals of the collected MUXes are combined into a single enable signal for this clock-gating vertex.
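The backtracking step that collects MUXes into one gated FF vertex can be sketched as below. Assumptions, all hypothetical: a MUX word is encoded as `("mux", sel, in0, in1)` with `word = in1 if sel else in0`, enable literals are combined by conjunction, and the bit-level check that each bit is supported by the FF under the same control condition is omitted. `collect_gating` is illustrative, not the paper's collectGating(...).

```python
def collect_gating(netlist, ff, d_input):
    """Walk back from an FF's D input through 2-to-1 MUXes that have a feedback
    input from the FF. Returns (enable literals, data word of the gated FF vertex).

    netlist maps a word name to ("mux", sel, in0, in1), meaning word = in1 if sel else in0.
    """
    literals, word = [], d_input
    while word in netlist and netlist[word][0] == "mux":
        _, sel, in0, in1 = netlist[word]
        if in0 == ff:              # sel = 0 recirculates the FF, so it loads when sel = 1
            literals.append(sel)
            word = in1
        elif in1 == ff:            # sel = 1 recirculates the FF, so it loads when sel = 0
            literals.append("!" + sel)
            word = in0
        else:                      # no feedback from this FF: stop; word is the data input
            break
    return literals, word


if __name__ == "__main__":
    # Hypothetical structure loosely following Fig. 5(a); polarities are invented.
    netlist = {
        "m4": ("mux", "s4", "F", "m3"),
        "m3": ("mux", "s3", "D", "F"),
        "D": ("mux", "s2", "A", "B"),   # no feedback from F, so backtracking stops at D
    }
    lits, data = collect_gating(netlist, "F", "m4")
    print("en =", " & ".join(lits), "; data input =", data)
```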
Example 2: In Fig. 5(a) , the MUXes (controlled by s 4 ) directly driving FFs (F) are examined first. The process examines the other input side (not driven by F) and collects the MUXes controlled by s 3 . Continuing, when it reaches transparent word D, this process terminates because D is driven by two input words (C and D), where none of them is F. In this paper, a gated FF vertex can switch only between one input data and itself, so the block from A, B, and F to D cannot be included in the gated FF vertex. Finally, s 3 and s 4 are combined to form the enable signal of a gated FF vertex, with data input D.
Note that although the MUXes controlled by s 0 are supported by F, they are excluded from the gated FF vertex F because there exists another transparent block (outside the clock-gating part) between its output C and F.
2) Complete Transparent Block Vertices: After creating all gated FF vertices, the remaining transparent blocks are analyzed to create corresponding vertices. For an incompletely specified transparent block, some assignments of controls do not result in transparency, but dependencies between inputs and outputs still exist. Sometimes the transparent logic may be used to reset some FFs to constants and then make the output value different from the previous time frame. Hence, it is necessary to analyze all possible control assignments and create extra constant vertices to represent the entire functionality of the block.
Algorithm 1: DG construction for a sequential circuit Cir with identified transparent blocks
…
3:   if block forms clock-gating structure then
4:     en = collectGating(block)
5:     …
6: for each block in {TransparentBlocks − V} do
7:   ConstVs = analyzeTransparency(block)
8:   …
9: for each signal in PO or FF or PI do
10:   if signal is not covered by any v in V then
11:     …
…
14: for each v in V do
15:   if v drives multiple fanouts then
16:     …
17:   for each input vertex supporting v do
18:     …
… vertices are created to cover these. Other words, consisting of internal signals (outputs of some combinational clouds), like A in Fig. 2, are labeled temporarily as words. Then vertices for the remaining FFs, PIs, and POs are created one by one, where each vertex covers only one signal. Note that if an FF or PI only controls transparent blocks (not supporting other signals), like F 1 in Fig. 2, no vertex is needed for it, because it can influence paths to POs only through controlling the transparent block. This dependency has been addressed in the transparent block vertex.
The next task is to create combinational cloud vertices. First, each word or vertex accumulates a list of support vertices (words) by following the fanin cone of this word (vertex) until reaching another vertex (word). One combinational cloud is created for each set of vertices that have identical sets of support words. Different sets of support words lead to different combinational vertices.
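The grouping step that creates combinational clouds can be sketched as follows: walk back through each word's fanin cone until other vertices or words are reached, then create one cloud per distinct support set. The gate-level `fanins` map and the function names are hypothetical.

```python
from collections import defaultdict


def support_of(word, fanins, boundary):
    """Walk back through the fanin cone of word, stopping at boundary names
    (existing vertices or words); return the set of boundary names reached."""
    support, seen, stack = set(), set(), list(fanins.get(word, []))
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node in boundary:
            support.add(node)                    # another vertex or word: stop here
        else:
            stack.extend(fanins.get(node, []))   # internal gate: keep walking back
    return support


def create_clouds(supports):
    """One combinational cloud per distinct support set.

    supports: word -> set of supporting vertices/words.
    Returns frozenset(support set) -> sorted list of words driven by that cloud.
    """
    clouds = defaultdict(list)
    for word, support in supports.items():
        clouds[frozenset(support)].append(word)
    return {sup: sorted(words) for sup, words in clouds.items()}
```

Words with identical support lists end up in the same cloud, while a word with a different support list gets its own combinational vertex.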
The algorithm for constructing the DG G = (V, E) for a sequential circuit, Cir, with its set of identified transparent blocks, is shown in Algorithm 1.
In lines 2-5, gated FF vertices are constructed, where collectGating(...) performs backtracking to find clock-gating conditions for each set of gated FFs. Then the remaining transparent blocks are processed by lines 6-8, where analyzeTransparency(...) examines all minterms of control signals for incompletely specified blocks to ensure all data dependencies are represented. During the two phases above, the vertices for some input data words are created if one word is a set of PIs or a set of FFs.
Other vertices, except signal branches and combinational clouds, are created by createVertex(...). Then createClouds(...) explores the fanin cone of each word and collects the vertices or words supporting it. Finally this function creates one combinational cloud vertex for each set of words with the same support list.
After all existing vertices are topologically sorted, lines 14-18 check each vertex and create a signal branch if the vertex drives multiple vertices. For combinational clouds, signal branches are created for all output words with multiple fanouts. Then each vertex is connected to all vertices supporting it. This cannot be done without the topological sort, because a branch vertex must be created before a signal is connected to its fanouts.
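The final phase can be sketched with Python's standard `graphlib` module. Two assumptions are not from the text: edges into FF vertices are ignored when ordering, so that the sequential loops of the circuit do not make the topological sort fail, and branches are created in a first pass over the sorted order before any edge is added in a second pass. `build_dg_edges` and the `branch(...)` naming are hypothetical.

```python
from graphlib import TopologicalSorter


def build_dg_edges(supports, ffs=frozenset()):
    """Create signal-branch vertices and DG edges from support lists.

    supports: vertex -> set of vertices that support (drive) it.
    ffs: FF vertices; their incoming edges are ignored for ordering only.
    Returns (order, edges) where edges are (source, sink) pairs.
    """
    order_graph = {}
    for v, srcs in supports.items():
        order_graph.setdefault(v, set())
        for s in srcs:
            order_graph.setdefault(s, set())
            if v not in ffs:
                order_graph[v].add(s)
    order = list(TopologicalSorter(order_graph).static_order())

    fanouts = {}
    for v, srcs in supports.items():
        for s in srcs:
            fanouts.setdefault(s, []).append(v)

    driver, edges = {}, []
    for v in order:      # pass 1: create a branch for every vertex with several fanouts
        if len(fanouts.get(v, [])) > 1:
            driver[v] = f"branch({v})"
            edges.append((v, driver[v]))
        else:
            driver[v] = v
    for v in order:      # pass 2: connect each vertex to all vertices supporting it
        for s in sorted(supports.get(v, ())):
            edges.append((driver[s], v))
    return order, edges
```

Because every branch exists before any edge is drawn, each fanout is attached to the branch vertex rather than directly to the multi-fanout signal.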
Once the DG is constructed, it can be used to formulate properties of legal clock-gating conditions as detailed in the next section.
IV. LEGAL CLOCK-GATING CONDITIONS
The goal of clock-gating synthesis is to create extra control logic to reduce the frequency of updating FFs. To verify that an added clock-gating condition is legal, i.e., the revised circuit is sequentially equivalent to the original, properties of the circuit are formulated that are sufficient for the extra control to be legal.
In this section, we derive and prove sufficient conditions for legal satisfiability and observability clock-gating on sequential circuits and formulate properties for FFs that are identified for clock-gating. DGs are used to formulate problems and to derive properties. Then these properties are proved on the original sequential circuit. It is important to note that the properties formulated are relatively easy to prove because they are derived using the DG, which is independent of complicated combinational logic in the circuit.
A. Problem Formulation Using DGs
For a sequential circuit, clock-gating synthesis on a set of FFs can be effected by adding MUXes with feedback loops. This has several representations in a DG. Given a set of target FFs, which are covered by a single DG vertex (a standard FF or a gated FF vertex), two possible differences between a golden DG and its revised DG can indicate that clock-gating synthesis has been performed.
1) There has been a change in the DGs from a standard FF to a gated FF vertex. This is done when a standard FF, updated to the input data at each time frame, now has a proposed control condition en; when en = 0, those FFs are kept at the same values as saved. 2) There has been a change in the en signal of an existing gated FF vertex. The FFs were already gated by en old , but clock-gating synthesis proposed en extra , which is combined with en old to build en new = en old ∧ en extra ; thus, (en new = 1) ⇒ (en old = 1). Note that it is assumed that en old is legal because it is given as part of the golden model and in general we may not have enough information to test this for legality. Both of the above cases can be viewed as changing an enable signal from en old to en new , where in the first case, en old = constant-1, meaning the set of FFs always keeps updating to its input. To verify if the synthesis is legal, the following algorithms only need to check the legality of en old ∧ (¬en new ), because that is where the enabling signal has been changed from 1 to 0 in the revised circuit. Given a sequential circuit with a set of FFs gated by en old , the goal of this section is to verify if clock-gating these FFs with en new (proposed by some clock-gating synthesis methods) is legal.
Definition 2: A proposed clock-gating synthesis condition, where the enable signal of a set of FFs is modified from en old to en new, is legal if and only if the circuits before and after this change are sequentially equivalent.
Given a set of mapped FFs existing in both G and R, and assuming clock-gating synthesis has been applied to this set of FFs, the enable signal of the corresponding DG node is modified from en old to en new. Observability clock-gating then requires checking that the inputs of the target FFs are not observable whenever en old ∧ (¬en new) holds. Because the target FF can be out-of-date (it keeps its current value even when its input has changed in a later time frame), and this saved old value can propagate to POs in future time frames, it is not sufficient to check the observability of the FF outputs in the next time frame only.
Theorem 1: For the target FF with input in, where the enable signal is modified from en old to en new, the proposed synthesis is legal if the "observable" property
$$G\big((en_{old} \wedge \neg en_{new}) \Rightarrow \neg O_c(in)\big) \qquad (2)$$
holds.
Proof: The proposed clock-gating synthesis is legal if the circuit with en old and the one with en new are sequentially equivalent, meaning that no input sequence results in differing outputs. If (2) holds, then whenever en old ∧ ¬en new prevents the revised FF from capturing a value that the original FF captures, that value is not observable, so the difference cannot reach any PO. The observable condition must look beyond the next time frame, for the following reason. Consider a gated FF where en = 1 at the nth time frame, so that the FF captures the input value in(n) at the end of this cycle. If the FF output is observable at the (n+1)th time frame, the input data from the previous time frame, in(n), is observable. If en is 0 at the (n+1)th time frame and remains 0 through the (n+k-1)th time frame, the FF keeps the same value, in(n), until the end of the (n+k)th time frame. If O c (out) is True at some (n+l)th time frame (n < n+l ≤ n+k), the saved value in(n) is observable. One extreme case is when en = 1 and O c (out) = True happen together only at the (n+k)th time frame, i.e., en = 0 holds until the point where O c (out) holds. The input at the nth time frame is then observable at the (n+k)th time frame, because the FF has stored it until this point.
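The argument in the proof can be checked mechanically on finite traces. Reading the proof as the formula en ∧ X[¬en U O c (out)] (our reading of the text, not an equation quoted from it), the sketch compares that formula with a direct simulation of when the stored value in(n) can be seen. O c (out) is supplied as a given trace, and U is evaluated with strong finite-trace semantics; all function names are hypothetical.

```python
import random


def until(a, b):
    """[a U b] on a finite trace: b occurs now or later and a holds until then."""
    out, nxt = [False] * len(a), False
    for i in reversed(range(len(a))):
        nxt = bool(b[i]) or (bool(a[i]) and nxt)
        out[i] = nxt
    return out


def obs_by_formula(en, obs_out):
    """en & X[!en U O_c(out)], evaluated at every cycle n."""
    u = until([not x for x in en], obs_out)
    return [bool(en[n]) and n + 1 < len(en) and u[n + 1] for n in range(len(en))]


def obs_by_simulation(en, obs_out):
    """in(n) is observable if it is loaded at cycle n and the FF output, which holds
    in(n) from cycle n+1 through the next cycle with en = 1 (inclusive), is observed
    somewhere in that window."""
    result = []
    for n in range(len(en)):
        seen = False
        if en[n]:
            for t in range(n + 1, len(en)):
                if obs_out[t]:
                    seen = True        # the stored in(n) reaches an observation point
                    break
                if en[t]:
                    break              # overwritten at the end of cycle t
        result.append(seen)
    return result


def agree_on_random_traces(trials=500, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, 10)
        en = [bool(rng.randint(0, 1)) for _ in range(n)]
        obs = [bool(rng.randint(0, 1)) for _ in range(n)]
        if obs_by_formula(en, obs) != obs_by_simulation(en, obs):
            return False
    return True
```

The simulation checks the observation before the reload, which captures the extreme case in the proof where en = 1 and O c (out) = True occur in the same time frame.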
When verifying the clock-gating condition on a target gated FF vertex in which the enable signal has been changed from en old to en new, the observable condition of the input of the target gated FF is …
2) Property Formulation: Given a target set of FFs, Algorithm 2 constructs an observable condition for the target's input, O c (in). Its negation is a sufficient non-observability condition. It is constructed to cover en old ∧ ¬en new, i.e., (en old ∧ ¬en new) ⇒ ¬O c (in). This can be used to verify that en new turns off the clock only when the input of the target FF is not observable.
The function getObs(...) in Algorithm 2 recursively constructs an observable condition for a target signal at a certain time frame by exploring the extended fanout cone until reaching either the depth limit (depth), POs, or some visited vertices.
The parameter depth bounds how deeply the recursive algorithm can recur. When depth = 0, the algorithm crosses only one time frame, from an FF vertex input to its output, and returns True immediately, i.e., the FF is interpreted as a primary output. Otherwise, the recursion terminates after exploring the number of time frames specified by the initial value of depth. If the algorithm starts with depth < 0, it keeps recursively exploring fanout cones until reaching primary outputs. However, some paths may never reach a primary output, and revisiting already visited vertices would then cause infinite recursion. Hence the function analyzeLoop(...) is used to determine how to handle FFs in loops. Infinite recursion is discussed further in [9]; the details are omitted here due to space limits.
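A recursive construction in the spirit of getObs(...) can be sketched over a toy DG, producing formulas as strings. Everything here is an assumption rather than Algorithm 2 itself: the DG encoding, the rules per vertex type (POs are observable; clouds and branches pass observability through; a transparent-block edge adds its select condition; a standard FF adds X; a gated FF with enable en contributes en ∧ X[¬en U ·], following the proof of Theorem 1), and the loop handling, which simply returns True conservatively instead of calling analyzeLoop(...).

```python
def conj(a, b):
    """Conjunction of two formula strings with trivial simplification."""
    if a == "True":
        return b
    if b == "True":
        return a
    if "False" in (a, b):
        return "False"
    return f"({a} & {b})"


def disj(terms):
    """Disjunction of formula strings with trivial simplification."""
    if "True" in terms:
        return "True"
    terms = [t for t in terms if t != "False"]
    if not terms:
        return "False"
    return terms[0] if len(terms) == 1 else "(" + " | ".join(terms) + ")"


def get_obs(dg, v, depth, path=frozenset()):
    """Observable condition (formula string) of the output of DG vertex v.

    dg maps a vertex name to {"type": ..., "fanouts": [(sink, cond)], "en": ...};
    cond is the select condition of a transparent-block sink for this input.
    depth < 0 means unbounded; each FF crossed consumes one unit of depth.
    """
    if v in path:
        return "True"                   # loop: conservatively treat as observable
    path = path | {v}
    terms = []
    for sink, cond in dg[v]["fanouts"]:
        kind = dg[sink]["type"]
        if kind == "po":
            terms.append("True")
        elif kind in ("cloud", "branch"):
            terms.append(get_obs(dg, sink, depth, path))
        elif kind == "transparent":
            terms.append(conj(cond, get_obs(dg, sink, depth, path)))
        elif kind in ("ff", "gated_ff"):
            if depth == 0:
                terms.append("True")    # depth exhausted: FF treated as a primary output
                continue
            out = get_obs(dg, sink, depth - 1, path)
            if kind == "ff":
                terms.append(out if out in ("True", "False") else f"X({out})")
            else:
                en = dg[sink]["en"]
                if out in ("True", "False"):
                    terms.append(en if out == "True" else "False")
                else:
                    terms.append(conj(en, f"X([!{en} U {out}])"))
        else:
            raise ValueError(f"unknown vertex type {kind!r}")
    return disj(terms)
```

On a hypothetical reading of the Fig. 4 fragment (input A0 feeding a standard FF F 2, the square-root cloud, and the transparent block I selected by s 1), `get_obs(dg, "A0", -1)` yields `X(s1)`, the term that appears in Example 4.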
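The observable condition formulated here is tight in the sense that it captures all possible data dependencies based on analyzing control logic only. It over-approximates true observability, however, because data flow can also be blocked by combinational logic, which the DG abstracts away; its negation is therefore a sufficient, but not necessary, condition for non-observability.
Example 4: In the DG in Fig. 4, assume en old for F 2 is constant-1 and en new = s 0. We need to check whether property (2) holds for the input in of F 2, i.e., whether
$$G\big((en_{old} \wedge \neg en_{new}) \Rightarrow \neg O_c(in)\big)$$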
where en old = True, and O c (
is False we have
Since G(Xs 1 ⇒ s 0 ) holds (regardless of the initial condition of s 1 ), then G(O c (A 0 ) ⇒ s 0 ) holds and hence s 0 is a legal clock-gating condition for F 2 .
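The last step of Example 4, proving G(X s 1 ⇒ s 0), can be sketched as an exhaustive reachability check on a finite transition system. The model is an assumption suggested by the phrase "regardless of the initial condition of s 1": s 1 is taken to be the output of F 1, which is loaded from s 0 in every cycle, and both initial values of F 1 are allowed. `check_g_next_implies` is a hypothetical helper.

```python
def check_g_next_implies(init_states, inputs, step, p, q):
    """Check G(X p => q) on a finite transition system by exhaustive reachability.

    For every reachable state and input, p(next state) must imply q(state, input).
    Returns None if the property holds, else a violating (state, input) pair.
    """
    reached, frontier = set(init_states), list(init_states)
    while frontier:
        state = frontier.pop()
        for inp in inputs:
            nxt = step(state, inp)
            if p(nxt) and not q(state, inp):
                return state, inp
            if nxt not in reached:
                reached.add(nxt)
                frontier.append(nxt)
    return None


if __name__ == "__main__":
    # Assumed model: state = F1 (= s1), input = s0, and F1 loads s0 every cycle.
    result = check_g_next_implies(
        init_states={0, 1},                 # initial value of F1 is unconstrained
        inputs=[0, 1],
        step=lambda f1, s0: s0,
        p=lambda f1: f1 == 1,               # X s1
        q=lambda f1, s0: s0 == 1,           # s0
    )
    print("G(X s1 => s0) holds" if result is None else f"violated at {result}")
```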
C. Satisfiability Clock-Gating Condition
Satisfiability clock-gating aims at turning off the clock of FFs when their input data is the same as the data already stored in the FFs. This relates to control conditions and data dependencies of fanin cones from the past up to now. Because this requires reasoning about the past, it is expressed more naturally with past-time (PLTL) operators than with the common future-time LTL operators.
1) Update Condition: For a target FF vertex whose enable signal is modified from en old to en new, satisfiability clock-gating requires checking two properties: up-to-date and satisfiability. Both properties use the notion of an update condition.
A set of signals is said to update when the signal values are different from those in the previous time frame. This is denoted by U c (signal).
Definition 5: The update condition U c (s) for a signal s is a sequential function of time. When U c (s) = 1 at a certain time frame, the value of s can be different from its value in the previous time frame. If U c (s) = 0 at a certain time, s must be identical to the value it had at the previous time frame.
For each vertex of a DG, the update condition of each output depends on the update conditions of the vertex inputs and on related control signals. For each type of vertex, the third column of Table I lists the update condition of its outputs in the current time frame, U c (out). Note that this formulation ignores cases where signals are "updated" to the same values due to combinational logic; i.e., only control logic is used to determine the update condition.
As shown in Table I : 1) PIs update in each cycle, independent of the actual input patterns; 2) a constant is always identical to its previous cycle (no update), but for the first time frame, it updates from unknown to a certain value. Hence we use "invariant yesterday" Z to make sure it is True in the first time frame; 3) there is no output signal for PO vertices, so U c (out) is not applicable; 4) a standard FF updates in the current time frame depending on the update condition of its data input in the previous time frame. If the input data updates at cycle n [U c (in) = True at cycle = n], then the output of the FF will update at cycle n+1, independent of the actual values. Z is used here because the initial conditions of FFs are not constrained in this formulation (the same idea will be applied to gated FFs); 5) the output of a transparent block updates when either: a) at least one of the support data inputs updates (the output can depend on more than one input data) or b) the selection of input sources is different from that in the previous cycle. Here "yesterday" Y is used because the update conditions for other vertices (PIs, constants, and FFs) guarantee U c (in i ) must be True in the first time frame; this also holds for the output of each transparent block; 6) the update condition for each output of a combinational cloud is the union of the update conditions for each input to the combinational cloud, U c (in i ); when any of these inputs update, the output must update, independent of the combinational logic. A signal branch vertex behaves like a combinational vertex; 7) a gated FF can receive and update to a new value only when en = 1 in the previous time frame. Thus, either: a) the input data must update (U c (in) = True) or b) the FF has not updated since its input was last updated, i.e.,
$$Y\big([\neg en] \; S \; [U_c(in) \wedge \neg en]\big)$$
is valid in the previous time frame. To explain the condition for a gated FF: the output of the gated FF is updated at the nth time frame only when en = 1 at the (n − 1)st time frame. There are then two cases allowing the FF to receive a new value: 1) at the (n − 1)st time frame, the input data updates and differs from its previous value, or 2) the input data updated at an earlier (n − k)th time frame while en = 0 at that time. In case 2), the FF is out-of-date from that point on. Since en remains 0 until the (n − 1)st time frame, the Since formula holds. The FF then receives the input value because en = 1 at the (n − 1)st time frame, and this value can differ from the value kept in the FF since the (n − k)th time frame. Hence the output of this FF updates at the nth time frame.
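The update-condition rules described above can be evaluated on finite traces to see how they behave. Table I itself is not reproduced here; this sketch assembles the formulas from items 1, 2, 4, 6, and 7 of the prose, reading the gated-FF rule as U c (out) = Z(en ∧ (U c (in) ∨ Y([¬en] S [U c (in) ∧ ¬en]))). Transparent blocks are omitted. The helper `gated_ff_is_sound` is hypothetical: it simulates random gated FFs and checks that every real output change is flagged (the update condition is allowed to over-approximate).

```python
import random


def Y(a):
    """yesterday: value of a in the previous cycle, False in the first cycle."""
    return [False] + a[:-1] if a else []


def Z(a):
    """invariant yesterday: like Y, but True in the first cycle."""
    return [True] + a[:-1] if a else []


def S(a, b):
    """[a S b]: b held at some cycle j <= now and a held at every cycle after j."""
    out, prev = [], False
    for ai, bi in zip(a, b):
        prev = bi or (ai and prev)
        out.append(prev)
    return out


def uc_pi(n):
    return [True] * n                            # item 1: a PI may change in every cycle


def uc_const(n):
    return Z([False] * n)                        # item 2: Z(False), True only in cycle 0


def uc_ff(u_in):
    return Z(u_in)                               # item 4: Z(U_c(in))


def uc_cloud(*u_ins):
    return [any(bits) for bits in zip(*u_ins)]   # item 6: union of input updates


def uc_gated_ff(u_in, en):
    """Item 7: Z(en & (U_c(in) | Y([!en] S [U_c(in) & !en])))."""
    not_en = [not e for e in en]
    pending = Y(S(not_en, [u and ne for u, ne in zip(u_in, not_en)]))
    return Z([bool(e) and (u or p) for e, u, p in zip(en, u_in, pending)])


def gated_ff_is_sound(trials=300, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, 12)
        data = [rng.randint(0, 2) for _ in range(n)]
        en = [bool(rng.randint(0, 1)) for _ in range(n)]
        u_in = [True] + [data[i] != data[i - 1] for i in range(1, n)]
        q, out = rng.randint(0, 2), []
        for d, e in zip(data, en):
            out.append(q)                        # FF output during this cycle
            if e:
                q = d                            # load at the end of the cycle
        changed = [True] + [out[i] != out[i - 1] for i in range(1, n)]
        uc = uc_gated_ff(u_in, en)
        if any(c and not u for c, u in zip(changed, uc)):
            return False
    return True
```

In the trace en = (0, 0, 1, 0) with the input changing only in the first frame, the change is held pending by the Since term and reported when the FF finally loads, one frame after en = 1.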
When en old is not constant-1, the target FF can be out-of-date in the golden model. Therefore, the target FF must be examined for being up-to-date using the following "up-to-date" property:
$$G\big(U_c(in) \Rightarrow en_{old}\big) \qquad (3)$$
It states that whenever in updates, the FF receives and stores the new value; otherwise, the FF can be out-of-date. If (3) holds, en new can be checked to ensure that it only additionally turns off the clock when the input data remains the same.
Theorem 2: If the "satisfiability" property
$$G\big((en_{old} \wedge U_c(in)) \Rightarrow en_{new}\big) \qquad (4)$$
holds (given that (3) holds), then the change from en old to en new is legal for satisfiability clock-gating.
Proof: The proposed clock-gating synthesis (from en old to en new) is legal if the circuit maintains its sequential behavior after this change. When (3) holds, the target FF updates to its input whenever the input changes. Property (4) then guarantees that the case (en old ∧ ¬en new) ∧ U c (in) never happens. In other words, en new must cover en old ∧ U c (in), which is equivalent to U c (in) due to the up-to-date property. That is, the FF with en new still updates to its input whenever the input changes. Thus the change to this FF cannot be propagated to the POs, the sequential behavior is unchanged, and the proposed clock-gating condition is legal.
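Theorem 2 can be exercised on a small hypothetical circuit: an upstream gated FF A (data d from a PI, enable e) drives a target FF T with en old = 1, so the up-to-date property (3) holds trivially. Using the gated-FF update rule with U c (d) = True for a PI, U c (in) of T reduces to Z(e). Choosing en new = Z(e) satisfies (4), read here as G((en old ∧ U c (in)) ⇒ en new), and the simulation confirms that T's outputs are unchanged. The circuit and the function names are assumptions for illustration.

```python
import random


def Z(a):
    """invariant yesterday: previous-cycle value, True in the first cycle."""
    return [True] + a[:-1] if a else []


def simulate_target(d, e, en_target, init_a=0, init_t=0):
    """Upstream gated FF A (data d, enable e) drives target FF T (enable en_target).
    Returns the output of T in every cycle."""
    a, t, out = init_a, init_t, []
    for dn, en_a, en_t in zip(d, e, en_target):
        out.append(t)
        a_now = a                 # A's output during this cycle
        if en_a:
            a = dn                # A loads d at the end of the cycle
        if en_t:
            t = a_now             # T loads A's current output at the end of the cycle
    return out


def property_4_holds(en_old, u_in, en_new):
    """G((en_old & U_c(in)) => en_new), checked on one finite trace."""
    return all(new or not (old and u) for old, u, new in zip(en_old, u_in, en_new))


def theorem_2_demo(trials=200, seed=0):
    """With en_old = 1 and en_new = U_c(in) = Z(e), (4) holds and T is unchanged."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, 15)
        d = [rng.randint(0, 3) for _ in range(n)]
        e = [bool(rng.randint(0, 1)) for _ in range(n)]
        en_old, u_in = [True] * n, Z(e)
        en_new = u_in
        if not property_4_holds(en_old, u_in, en_new):
            return False
        if simulate_target(d, e, en_old) != simulate_target(d, e, en_new):
            return False
    return True
```

Using the current e instead of Z(e) as en new violates (4) in a frame right after A loads, and the simulated outputs of T then differ from the golden ones.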
As mentioned, before a clock-gating synthesis (from en old to en new) on a target FF is verified, en old needs to be examined. There are two cases.
1) G(U c (in) ⇒ en old) Holds: In this case, if the satisfiability property (4) also holds, then by Theorem 2 the change from en old to en new is legal for satisfiability clock-gating. Also, the target FF is always up-to-date.
2) G(U c (in) ⇒ en old) Fails: Even when en new is proposed by a satisfiability clock-gating condition, it is necessary to verify whether en new results in extra observable out-of-date conditions (i.e., when en new = 0, the FF can be out-of-date). Hence, for a target FF that fails the up-to-date property, it must be verified that the proposed en new satisfies (2), the observable property. Additional discussion and examples about the issues of combining satisfiability and observability conditions can be found in [9].
Algorithm 3: getUpd(DG, target, depth)
  …
  case Constant:
    return Z(False)
  case Standard FF:
    U_in = getUpd(DG, inputV(target), depth-1)
    return Z(U_in)
  case Transparent Block:
    for all inputV_i of target do
      …
  case Combinational Cloud or Signal Branch:
    for all inputV_i of target do
      …
  case Gated FF:
    U_in = getUpd(DG, inputV(target), depth-1)
    …
2) Property Formulation:
The proposed algorithm for creating the update condition of a target vertex (output) is shown in Algorithm 3. The function getUpd(...) constructs the update condition for a vertex output by recursively exploring the update conditions of all its support vertices. Each vertex type is handled by a dedicated case.
We also use the parameter depth to control how deep the algorithm can recurse. When depth < 0, the algorithm terminates only when it reaches primary inputs, constants, or already visited vertices (loops). When working on a loop, which could otherwise cause infinite recursion, the algorithm returns True immediately. Reasons and examples for stopping immediately can be found in [9].
Calling getUpd(...) on an FF vertex means that the exploration crosses one time frame, so depth is reduced by 1 on the next recursive call. When depth = 1, the algorithm explores combinational logic only and returns True when reaching FFs, meaning those FFs are treated as free primary inputs. Generally, a larger depth can yield more restrictive (better) update conditions, which save more power. However, a larger depth may also require more effort to formulate and prove the property. The parameter depth can therefore be used to trade off power savings against overall verification effort.
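The recursion and depth handling described above can be sketched in Python. This is a minimal illustration, not the paper's implementation: the `Vertex` class, the string encoding of PLTL formulas, and the vertex kinds are hypothetical, and the Gated FF case uses an assumed conservative rule (`Z(en)`) because the paper's exact Gated FF case is not reproduced here. Transparent blocks are omitted; combinational clouds take the OR of their inputs' update conditions.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(eq=False)  # identity-based hashing, so vertices can be stored in sets
class Vertex:
    name: str
    kind: str  # "PI", "Const", "FF", "GatedFF" or "Comb" (hypothetical labels)
    inputs: List["Vertex"] = field(default_factory=list)
    enable: str = ""  # enable signal name, used only by "GatedFF"


def get_upd(v: Vertex, depth: int, visited: Optional[Set[Vertex]] = None) -> str:
    """Build an update-condition formula (PLTL, as a string) for the output of v."""
    visited = set() if visited is None else visited
    if depth == 0 or v in visited:
        return "True"  # out of time frames, or a loop: stop immediately
    visited = visited | {v}  # path-local copy, so only true loops are cut
    if v.kind == "PI":
        return "True"  # free primary input: may change at any time
    if v.kind == "Const":
        return "Z(False)"
    if v.kind == "FF":
        u_in = get_upd(v.inputs[0], depth - 1, visited)  # crosses one time frame
        return "True" if u_in == "True" else f"Z({u_in})"  # Z(True) is True
    if v.kind == "GatedFF":
        if depth == 1:
            return "True"  # combinational-only exploration: FFs act as free inputs
        # Hypothetical conservative rule: the output can only change in a cycle
        # right after its enable was high.
        return f"Z({v.enable})"
    if v.kind == "Comb":
        parts = [get_upd(u, depth, visited) for u in v.inputs]
        return "True" if "True" in parts else "(" + " | ".join(parts) + ")"
    raise ValueError(f"unknown vertex kind: {v.kind}")
```

With a chain `a (PI) -> g (gated by en) -> f (FF)`, `get_upd(f, 3)` yields `Z(Z(en))`, while `get_upd(f, 2)` stops at `g` and yields `True`, illustrating how a larger depth gives a more restrictive condition.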
D. Verification Flow for Clock-Gating
A valid flow for verifying an arbitrary clock-gating synthesis following the discussion in the previous section is given in Fig. 6 .
For a gated FF vertex with input in and original enable signal en old, to verify changing en old to en new, the first step is to check whether the set of FFs is always up-to-date, i.e., property (3). If this passes, the proposed clock-gating may be based only on the update condition of in (satisfiability clock-gating), so it can be verified with (4). If the target FF with en new satisfies the satisfiability property, the clock-gating synthesis is legal.
If the target FF vertex can be out-of-date with en old, or it violates the satisfiability property, verification with the observability condition (2) is required. If (2) holds, the clock-gating synthesis is valid. If it fails, the proposed synthesis may still be valid, but it cannot be justified with only control signals and data dependencies; information about the combinational logic is required.
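The decision order of this verification flow can be sketched as follows. The three callables are hypothetical stand-ins for model-checking queries on properties (3), (4) and (2); the sketch only captures the order in which they are tried and the two possible outcomes, and short-circuit evaluation ensures (4) is checked only after (3) passes.

```python
from enum import Enum
from typing import Callable


class Verdict(Enum):
    LEGAL = "legal clock-gating synthesis"
    UNRESOLVED = "needs information about combinational logic"


def verify_gating(prove_up_to_date: Callable[[], bool],
                  prove_satisfiability: Callable[[], bool],
                  prove_observability: Callable[[], bool]) -> Verdict:
    """Apply the checks in the order of the verification flow; each callable runs one proof."""
    # Property (3) first; property (4) is only checked if (3) holds.
    if prove_up_to_date() and prove_satisfiability():
        return Verdict.LEGAL
    # FF can be out-of-date, or (4) failed: fall back to the observability property (2).
    if prove_observability():
        return Verdict.LEGAL
    return Verdict.UNRESOLVED
```

A failed check of (2) does not make the gating illegal; it only means the DG-level properties cannot justify it.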
The formulated sufficient properties for legal clock-gating, which fully consider the functionalities of control signals, must be proved on the sequential circuit under investigation. The LTL properties can be recast as new hardware property-outputs to be proved using hardware model checkers.
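Once U c (in) has been built as a signal in the circuit (assumed here to be available as `upd`), properties (3) and (4) both have the form G(φ) with φ a propositional formula over current-cycle values, so each becomes a single combinational "bad" property-output that a safety model checker proves is never 1. The sketch below evaluates those outputs on a finite trace; the signal names are hypothetical, and unlike a model checker it only examines the traces it is given.

```python
from typing import Dict, List

Cycle = Dict[str, bool]  # sampled values of upd (= U_c(in)), en_old and en_new


def property_outputs(cycle: Cycle) -> Dict[str, bool]:
    """Monitor outputs for one cycle; a model checker must prove each is never True."""
    upd, en_old, en_new = cycle["upd"], cycle["en_old"], cycle["en_new"]
    return {
        "bad_up_to_date": upd and not en_old,               # violates (3)
        "bad_satisfiability": en_old and not en_new and upd,  # violates (4)
    }


def first_violation(trace: List[Cycle], output: str) -> int:
    """Index of the first cycle where the given property-output fires, or -1."""
    for t, cycle in enumerate(trace):
        if property_outputs(cycle)[output]:
            return t
    return -1
```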
V. SEQUENTIAL EQUIVALENCE CHECKING OF CLOCK-GATED CIRCUITS
For two sequential circuits, golden and revised (G and R), with a mapping between their PIs and POs, SEC can be done in a way similar to the CG method in [11] .
1) Identify the additional clock-gating conditions on R.
2) Verify if they are legal.
3) If so, remove them from R by connecting the enable signals to constants (because they are redundant). After the extra clock-gating signals are removed, the revised design R' is more similar to G, and hence SEC between G and R' is generally easier.
Note that G may already be clock-gated, possibly using external satisfiability or observability conditions that are unknown to us. Those clock-gating structures may change the sequential behavior relative to a version of G with no clock-gating, but they must be assumed legal given the external (unknown) logic. Hence there is no need to verify every clock-gating structure in these designs. Here it is assumed that the clock-gating synthesis from G to R only adds structures, so in the proposed SEC flow G remains unchanged, while R can be simplified to R'.
A. Identifying Clock-Gating Conditions
To identify candidate FFs that have additional clock-gating signals in R, the DGs for G and R are constructed as D G and D R . Because the correspondence of FFs between G and R is given, each standard or gated FF vertex in D G has a corresponding vertex in D R . According to Section IV-A, there are two types of differences in the DGs that indicate clock-gating synthesis was done. These are used for comparison between each FF vertex pair to detect any new clock-gating constructs in D R .
Given a candidate gated FF vertex in D R, its enable signal is denoted as en new. If the corresponding vertex in D G is a standard FF, en old = constant-1; if it is gated and controlled by en G, then en old = en G and the difference between en new and en old needs to be detected. Since it is assumed that clock-gating synthesis only adds extra logic, i.e., MUXes with feedback loops, the signal en extra in R can be found by checking all controls covered by the collapsed clock-gating condition en new = en old ∧ en extra. Then the flow described in Section IV-D can be followed to verify whether the new gating condition en new is legal. If it is legal, then en extra is sequentially redundant and can be replaced with a constant in R, creating R'. When working on the next candidate FF, D R and R' are used.
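A minimal sketch of extracting en extra, under the simplifying assumption that every enable is a pure conjunction of already-matched control signals (e.g., via controlPairs). An enable is modeled as a `frozenset` of signal names, with the empty set standing for constant-1. The names `find_extra`, `remove_if_legal` and `is_legal` are hypothetical; `is_legal` stands in for the verification flow of Section IV-D.

```python
from typing import Callable, FrozenSet, Optional

Enable = FrozenSet[str]  # AND of control signals; frozenset() stands for constant-1


def find_extra(en_new: Enable, en_old: Enable) -> Optional[Enable]:
    """Return en_extra such that en_new = en_old AND en_extra, or None if en_new does not extend en_old."""
    if not en_old <= en_new:
        return None  # not an additive clock-gating change at this FF
    return en_new - en_old


def remove_if_legal(en_new: Enable, en_old: Enable,
                    is_legal: Callable[[Enable, Enable], bool]) -> Enable:
    """Tie en_extra to constant-1 (drop it) when the new gating is proved legal."""
    extra = find_extra(en_new, en_old)
    if extra and is_legal(en_new, en_old):
        return en_old  # enable in R' collapses back to en_old
    return en_new
```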
B. Algorithm Flow
Algorithm 4 outlines the proposed SEC flow based on DGs; it reports whether the input circuits are sequentially equivalent or returns a counterexample trace. The inputs are the golden and revised circuits, G and R, and depth, which limits the number of time frames explored when formulating update and observable conditions. Function dependGraph(...) executes Algorithm 1 to construct the DG for each input circuit. At line 4, findMappedControls(...) merges the paired PIs of G and R and then performs SAT-sweeping on the merged circuit to identify signals that are combinationally equivalent to each other. For each enable signal of a set of gated FFs in G, there is a corresponding control signal in R. The matchings between these controls are returned as controlPairs.
The loop between Line 5 and Line 13 verifies each candidate and revises D R and R' based on the proved candidates one by one.

Algorithm 4 SEC Flow Based on DGs
Require: G and R: two circuits with mapped PIs, POs and FFs.
Ensure: EQ or NON-EQ
1: …
6: en new = getEnable(target)
7: en old = analyze(en new, mapFF, controlPairs)
8: …
9: testCir = buildCircuit(P, R')
10: proof = multiProve(testCir)
…

Based on en new, en old and depth, defPropty(...) formulates the update and observable conditions for the input of target separately, and builds property monitors as in Section IV-D. Then, based on R', buildCircuit(...) constructs a corresponding circuit with multiple outputs (subproperties) for checking the up-to-date condition and the satisfiability and observability clock-gating conditions. The model checker multiProve(...) at line 10 then model-checks all properties and returns lists of proved and disproved outputs.
The clock-gating synthesis on target is legal if it reaches the "legal clock-gating synthesis" terminal in Fig. 6 . If it is legal, then revise(...) modifies the enable signal of target to en old , and simplify(...) replaces other extra controls with constants and simplifies R'.
Finally, SEC(G, R') is invoked to check if G and the completely revised R' are sequentially equivalent. If it fails, a counter-example trace is available. A running example with detailed explanations can be found in [9] .
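The overall shape of Algorithm 4 (simplify R into R' one candidate FF at a time, then run a single final SEC) can be sketched at the level of enable signals only. This is a hypothetical abstraction: each design is reduced to a map from FF name to enable condition, `is_legal` stands in for defPropty/buildCircuit/multiProve plus the decision in Fig. 6, and `sec` stands in for the final sequential equivalence check (e.g., super_prove on the miter).

```python
from typing import Callable, Dict, FrozenSet, Tuple

Enable = FrozenSet[str]     # AND of control signals; frozenset() is constant-1
Design = Dict[str, Enable]  # FF name -> enable condition (hypothetical abstraction)


def sec_flow(g: Design, r: Design,
             is_legal: Callable[[str, Enable, Enable, Design], bool],
             sec: Callable[[Design], bool]) -> Tuple[bool, Design]:
    """Simplify R into R' candidate by candidate, then run the final SEC(G, R')."""
    r_prime = dict(r)  # R' starts as a copy of R; R itself is not modified
    for ff, en_new in r.items():
        en_old = g.get(ff, frozenset())  # FF without an entry in G: standard FF (constant-1)
        if en_new == en_old or not en_old <= en_new:
            continue  # nothing added, or not an additive clock-gating change
        if is_legal(ff, en_old, en_new, r_prime):  # later candidates see the revised R'
            r_prime[ff] = en_old  # extra controls tied to constant-1
    return sec(r_prime), r_prime
```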
C. Experimental Results
Experiments were performed on a 16-core 2.60-GHz Intel Xeon CPU. The example circuits were clock-gated manually at the RTL and then synthesized into AIGs to create R. We compare the DG method against the CG method introduced in [11]. Both methods are divided into a simplification part (Simplify) and a final SEC part (SEC). For the DG method, the simplification part includes transparent logic recognition, DG construction, property formulation and proving, and simplification of R. The DG method, including the simplification part, is implemented in ABC [5]. In the following experiments, the liveness properties for observability clock-gating are simplified into weaker safety properties, as explained in Section VI. The function multiProve(...) used here is the pdr command in ABC.
We also apply super_prove [6] (a general purpose gate-level model-checker) to the sequential miter between G and R' for the final SEC step in Algorithm 4. Along with the five cases used in [11] , we added three industrial cases to demonstrate that the DG method is more effective than the CG method. Also, a case from OpenCores, Md5Core, which is gated by the synthesis presented in Section VI, is listed as Md5Core_Syn.
The statistics of those cases and the runtimes for both methods are shown in Table II .
The last column in Table II gives the total number of FFs whose enable signals have been modified by the simplification step. For the cases used in [11], the reduced numbers are the same for both the DG and CG methods. For the three industrial cases, the CG method is unable to prove and remove any clock-gating conditions. Also, for Md5Core_Syn, the numbers of FFs in G and R are different, so the CG method cannot simplify it. Hence, for these four cases, the runtimes under the CG − SEC label refer to the time super_prove spends on the miters between the original G and R.
As shown in Table II , for some cases, the proposed DG method needs more time for the steps before the final SEC, due to its more sophisticated preprocessing steps. After the simplification based on DGs, the final SEC problem is easier than the original SEC problem and can be solved efficiently.
Constructing DGs also provides more insight into the input circuits. For example, the second case in Table II, Md5Core, is a pipeline circuit with 64 stages, where each stage stores four 32-bit data words and one 512-bit control word in FFs. Hence the total number of FFs is about (32 × 4 + 512) × 64 ≈ 40k. In this case, only the control words (one 512-bit word in each stage) can be gated, due to the data dependencies in G. Also, the revised circuit (R) used in the experiment was gated for only one stage, so the number of resynthesized FFs (those whose enable conditions are reduced by the DG method) in Table II is 512 (one control word). However, the control words in the other stages can be gated as well, as shown in the next section on synthesis.
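The FF counts quoted for Md5Core, and the share of them that is gateable, work out as follows (using only the stage layout stated above):

$$
\begin{aligned}
N_{\text{FF}} &= (32 \times 4 + 512) \times 64 = 640 \times 64 = 40{,}960\\
N_{\text{control}} &= 512 \times 64 = 32{,}768 = 0.8\,N_{\text{FF}}\\
N_{\text{gated in } R} &= 512 = 0.0125\,N_{\text{FF}}
\end{aligned}
$$

So the revised circuit exercises about 1.25% of the FFs, while up to 80% of them (all control words) are candidates for gating.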
VI. CLOCK-GATING SYNTHESIS
The proposed DG concepts, including update and observable conditions, also can be used for clock-gating synthesis.
A. Synthesis With Update Conditions
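Given a set of up-to-date target FFs with input in, satisfiability clock-gating aims at building an enable signal en new = en old ∧ en extra that satisfies the satisfiability property (4):
$$\mathbf{G}\,\neg\big((en_{old} \wedge \neg en_{new}) \wedge U_c(in)\big).$$
Since $en_{old} \wedge \neg(en_{old} \wedge en_{extra}) = en_{old} \wedge \neg en_{extra}$, this can be rewritten as
$$\mathbf{G}\big((en_{old} \wedge U_c(in)) \Rightarrow en_{extra}\big).$$
Because the FFs are up-to-date at this point in the flow, U c (in) ⇒ en old, so the requirement reduces to $\mathbf{G}\big(U_c(in) \Rightarrow en_{extra}\big)$.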
Since en new = en extra ∧ en old, the strongest enabling signal is en extra ≡ U c (in). As mentioned before, U c (in) can be constructed recursively as an extra signal added to the original circuit. Note also that the required update condition U c (in) has already been constructed for checking the up-to-date condition, i.e., the precondition of synthesis. Hence satisfiability clock-gating synthesis is relatively straightforward, because U c (in) can be reused directly.
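The effect of choosing en extra ≡ U c (in) can be checked by simulation. This sketch assumes en old = constant-1 (a standard, hence up-to-date, FF) and uses the exact value-change condition of in as U c (in), true at t = 0 in the spirit of the weak previous operator Z. This is an idealization: the DG-derived U c (in) is built from control signals and generally over-approximates value changes, which still satisfies G(U c (in) ⇒ en extra) but gates less often. Function names are hypothetical.

```python
from typing import List


def run_ff(d: List[int], enable: List[bool], init: int = 0) -> List[int]:
    """Outputs q[0..n] of a FF modeled as a MUX with feedback: q[t+1] = d[t] if enable[t] else q[t]."""
    q = [init]
    for t, value in enumerate(d):
        q.append(value if enable[t] else q[-1])
    return q


def update_condition(d: List[int]) -> List[bool]:
    """Idealized U_c(d): d changes value; true at t = 0."""
    return [t == 0 or d[t] != d[t - 1] for t in range(len(d))]


d = [5, 5, 7, 7, 7, 2]
en_old = [True] * len(d)  # standard FF: constant-1
en_new = [o and e for o, e in zip(en_old, update_condition(d))]
```

Here the FF is clocked in only 3 of the 6 cycles, yet `run_ff(d, en_new)` and `run_ff(d, en_old)` produce identical output sequences.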
B. Synthesis With Observable Condition
For a set of target FFs (out) with input in and en old , observability clock-gating synthesis requires the construction of an enable signal en extra that satisfies
where en new = en old ∧ en extra and O c (in) comes with the Until component as explained before. However, there is no straightforward way to construct such en extra using the liveness concept.
To apply the following synthesis method, the property (condition), G(X(O c (out)) ⇒ en old ), must be proved first. The proposed en new = X(O c (out)) definitely satisfies the target property because first, en new 
After substituting en new = O c (in) = X(O c (out)) into the target property, it is clear that the property always holds. Therefore, en extra ≡ X(O c (out)) is a legal clock-gating condition.
It is possible that the desired observable clock-gating condition cannot be synthesized because it is related to events in the future, which might not be predictable in the current time frame.
1) Synthesis With the Next Property:
To synthesize a signal en new as X(O c (out)), a new signal O out needs to be constructed in the original circuit. The target signal X(O out) then depends on the fanin cone of O out, shifted back by one time frame.
If O out has only FFs in its support, X(O out ) can be constructed by adding a one time-frame fanin cone of O out to the previous time frame, i.e., skipping all FFs between the two time frames. An example is shown in Fig. 7(a) .
In Fig. 7, O out is evaluated by a combinational block A, which is supported only by FFs. Those FFs capture, as their next-state values, certain outputs of another block B. The value of O out in the next time frame is therefore determined by feeding the current outputs of B to A. Hence X(O out) can be built by duplicating A and driving the copy directly with the outputs of B.
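The construction of X(O out) in Fig. 7 can be illustrated with a toy circuit. Blocks `block_a` and `block_b`, the two-FF state, and their functions are all hypothetical. A depends only on FF outputs, so a copy of A driven by B's current outputs computes, in the current frame, the value O out will have in the next frame. If A also read a primary input, `next_o_out` would need the next cycle's input value, which is the failure case described below.

```python
from typing import List, Tuple

State = Tuple[int, int]  # values held by the two FFs between blocks B and A


def block_b(pi: int, state: State) -> State:
    """Hypothetical block B: computes the next-state values captured by the FFs."""
    return (state[1], (state[0] + pi) % 4)


def block_a(state: State) -> bool:
    """Hypothetical block A: computes O_out from FF outputs only."""
    return state[0] == state[1]


def next_o_out(pi: int, state: State) -> bool:
    """X(O_out) built in the current frame: a duplicate of A fed by B's current outputs."""
    return block_a(block_b(pi, state))


def simulate(pis: List[int], init: State) -> Tuple[List[bool], List[bool]]:
    state, o_out, x_o_out = init, [], []
    for pi in pis:
        o_out.append(block_a(state))
        x_o_out.append(next_o_out(pi, state))
        state = block_b(pi, state)  # the FFs capture B's outputs
    return o_out, x_o_out
```

For any input sequence, `x_o_out[t]` equals `o_out[t + 1]`, which is exactly the meaning of the next-time operator X.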
If the fanin cone of O out (A) is supported by a primary input, it cannot be built because such inputs are free and unpredictable. Hence, in this case clock-gating synthesis fails.
Moreover, when constructing X(O c (out)) recursively with a depth limit, it is possible to reach gated FFs, which can introduce until (U) properties into the formulation.
2) Synthesis With the Until Property: To synthesize the observable condition O c (in) for the input of a set of gated FFs that are not the target FFs, note that their enable signal is unchanged, i.e., en old = en new = en. Therefore
where O c (out) is the observable condition of the output. Notice that when constructing observable clock-gating conditions, the combinational blocks computing control signals (e.g., clouds A and B in Fig. 7) are used to construct enable signals for clock-gating. If those combinational blocks are complicated, the proposed synthesis can incur timing and area overheads while yielding only a limited overall reduction in power consumption. Hence, whether to apply a proposed clock-gating condition should be decided based on a comprehensive understanding of the changes in area, timing (critical path) and power consumption.
C. Synthesis Flow
Given a sequential circuit and a set of target FFs that have been gated by en old (en old = constant-1 refers to standard FFs), the flow in Fig. 8 is proposed for clock-gating synthesis: a new enable condition en extra is synthesized, and the target FFs are gated by en new = en extra ∧ en old.
In Fig. 8, the first step checks whether the target FFs are up-to-date under en old. This can be done by formulating the update condition for the FFs' input and proving (3) with a hardware model checker. If the property holds, the target FFs can be clock-gated using update conditions (satisfiability clock-gating synthesis) to create a legal enabling signal en extra without considering observable conditions. If en extra ≠ constant-1, a legal clock-gating synthesis has been found. If 1) the target FFs can be out-of-date and 2) the property G(X(O c (out)) ⇒ en old) is violated, the proposed flow terminates. If the required condition G(X(O c (out)) ⇒ en old) holds, the target FFs can be clock-gated using observable conditions (observability clock-gating synthesis) to build an enabling signal en extra. The proposed en new must satisfy the property in (2). If no such en new exists, the flow stops without modifying the circuit.
Finally, if an additional enable condition en extra is proposed, on the corresponding DG, the enable signal of the target FFs is modified to en old ∧ en extra . On the original circuit, it can be represented as inserting a set of MUXes controlled by en extra between the target FFs and their corresponding input signals, where there are feedback loops from the FFs to MUX inputs. In practice, as in Fig. 1 , it can be achieved by ANDing the clock with en old ∧ en extra .
Based on the per-FF flow shown in Fig. 8, a synthesis algorithm for an input circuit Cir and an input parameter depth is outlined in Algorithm 5. The functions synUPD(...) and synOBS(...) are used 1) to formulate update or observable conditions on the DG based on Algorithms 2 and 3, and 2) to construct the corresponding signals on revCir as described in the previous sections. Note that property formulation can terminate early when it reaches already analyzed FFs. In addition, enable signals constructed by clock-gating synthesis can be reused to build other clock-gating conditions.
After constructing the DG for Cir, all standard and gated FF vertices are sorted in a topological order. From lines 4 to 7, each FF vertex is examined and satisfiability clock-gating is performed one by one. At line 5, the update condition for the vertex input is formulated on the DG, and then the corresponding signal U in is constructed on the circuit. Based on U in and the old enable signal, isUpToDate(...) uses a hardware model checker to verify if the target is always up-to-date. If so, the clock-gating condition of target is revised with U in , both on the DG and the circuit.
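The two passes of Algorithm 5 (satisfiability gating in topological order, then observability gating in reverse topological order) can be sketched with Python's standard `graphlib`. Everything else is hypothetical: `fanin` summarizes the DG as FF-to-FF dependencies, and the callables stand in for synUPD(...), isUpToDate(...), the check of G(X(O c (out)) ⇒ en old), and synOBS(...). Following the per-FF flow of Fig. 8, this sketch tries observability gating only on FFs that received no satisfiability term; that exclusivity is an assumption.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Optional, Set, Tuple


def synthesize(fanin: Dict[str, Set[str]],
               syn_upd: Callable[[str], str],
               is_up_to_date: Callable[[str, str], bool],
               obs_ok: Callable[[str], bool],
               syn_obs: Callable[[str], Optional[str]]
               ) -> Tuple[List[str], Dict[str, Optional[str]]]:
    """Return the FF order and, per FF, the en_extra term ANDed onto en_old (None if unchanged)."""
    order = list(TopologicalSorter(fanin).static_order())  # fanin FFs come first
    extra: Dict[str, Optional[str]] = {ff: None for ff in order}
    for ff in order:  # forward pass: satisfiability clock-gating
        u_in = syn_upd(ff)
        if is_up_to_date(ff, u_in):
            extra[ff] = u_in
    for ff in reversed(order):  # backward pass: observability clock-gating
        if extra[ff] is None and obs_ok(ff):  # G(X(O_c(out)) => en_old) proved
            extra[ff] = syn_obs(ff)
    return order, extra
```

Processing fanout FFs first in the backward pass lets the observable condition of an FF's output reuse conditions already synthesized downstream.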
Then only FFs which satisfy the required property are considered for observability clock-gating. As discussed, observability clock-gating synthesis should be done in a reverse topological order. For each FF vertex, the observable condition for its output is formulated, and the algorithm also builds the corresponding signal O in = X(O out ) on the circuit. Note that 
D. Experimental Results
The proposed synthesis flow was implemented in ABC and applied to the golden/revised circuits in the benchmark pairs that were used in the previous verification experiments. In Table III , the third column indicates the number of gated FFs proposed by a reference manual analysis (which models what a designer might do). Note that the exact number of FFs gated in the industrial cases is unknown(*). The fourth column shows the number of gated FFs proposed by the DG method. The fifth column gives the total runtime for synthesis.
As shown in Table III, the proposed method can synthesize clock-gating conditions efficiently. For some cases, such as the pipeline circuit Md5Core, the DG method proposes more clock-gating conditions than the reference synthesis does, offering more opportunities to reduce power consumption.
Compared with the (unknown) synthesis applied to the industrial cases, the proposed synthesis flow identifies approximately the same sets of candidate FFs and performs clock-gating synthesis on them. However, some legal clock-gating conditions might be overlooked by the proposed algorithm (e.g., in Industry_1), because it is conservative in formulating observable conditions. It is also possible that the candidate FFs are gated by less strict conditions, e.g., because some internal transparent blocks were missed. Moreover, in the industrial cases, words might be split into single bits due to transparent blocks, so the synthesis flow might take more time to finish.
For most of the cases in the experiment, the extra logic synthesized for additional enable conditions is no more than five AND gates, while the pipeline circuit Md5Core requires 63 extra FFs for the satisfiability clock-gating conditions used. In a modern very large-scale integration (VLSI) design flow, a legal clock-gating condition may be proposed but not actually used, depending on the overall improvement in power consumption. In general, the automatic synthesis framework proposed in this paper can make the whole flow less dependent on manual effort.
VII. CONCLUSION
To address both verification and synthesis of clock-gated circuits, an SEC flow and a synthesis flow were proposed for sequential circuits. These are based on the fact that most practical sequential clock-gating synthesis only inserts sequential redundancies into the targeted circuits. Reverse engineering is used to identify transparent logic blocks, and this information is used in constructing DGs to capture strengthened properties for clock-gating. Legal clock-gating conditions are formulated with LTL and PLTL operators on DGs. Those properties can be converted into equivalent hardware monitors, allowing both safety and liveness hardware model checkers to be used for verification or for synthesizing enabling signals in clock-gating synthesis. Experimental results showed that the proposed methodologies are effective and efficient both for verifying proposed clock-gating conditions and for synthesizing legal clock-gating conditions that reduce how often FFs are updated.
