Abstract: We devise a tool-supported framework for achieving power-efficiency of hardware chips from high-level designs described using the popular hardware description language Verilog. We consider digital circuits as hierarchical compositions of sub-circuits, and achieve power-efficiency by switching-off the clock of each sub-circuit according to some clock-gating logic. We encode the computation of the latter as several small symbolic discrete controller synthesis problems, and use the resulting controllers to derive power-efficient versions from original circuit designs. We detail and illustrate our approach using a running example, and validate it experimentally by deriving a low-power version of an actual Reed-Solomon decoder.
INTRODUCTION
Power efficiency of digital circuits is nowadays of paramount importance for constructing embedded electronic devices, and various mechanisms can be used to reduce power consumption in hardware chips. At the technological level, these include clock-gating, multi-supply voltage, and power-gating, for instance (Hariharan and Jaya Kumar, 2012). In synchronous circuits in particular, clock-gating is used to selectively cut off the clock of components with the aim of reducing the power dissipation induced by the switching activity it incurs. This technique mostly consists in computing the Clock-Gating Logic (CGL) for sub-circuits, and then translating each CGL itself into a piece of circuit whose output wires can be used to switch off (to gate) the clocks that drive the sub-circuits. In this paper, we observe that CGL computation is actually a feedback control problem, where the sub-circuit constitutes the system to control, and the objective is to switch off its clock whenever possible. This new perspective constitutes a first step towards employing other control techniques for producing self-adaptive power-efficient digital circuits, e.g., circuits that would automatically adapt to the remaining capacity of some battery according to an objective power/performance trade-off.
Discrete Controller Synthesis (DCS)
The control theory of Discrete Event Systems (DES; Ramadge and Wonham, 1989; Cassandras and Lafortune, 2007) allows the use of constructive methods ensuring, a priori and by means of control, required properties on a system's behavior. Usually, the starting point of these theories is: given a model for the system and control objectives, a controller must be derived by various means such that the resulting behavior of the closed-loop system meets the control objectives. A typical example is the safety control problem for symbolic systems (i.e., described using state and input variables with associated dynamics), where the desired objective is the enforcement of some invariant a priori not satisfied by the system: a controller is to be computed that restricts the admissible values for a subset of the input variables (referred to as controllable variables) so that the resulting controlled system satisfies the invariant. Finding DCS algorithms computing such controllers in the case of finite-state symbolic systems (i.e., where the state and input variables are Booleans only) has been the objective of several studies, and led to several implementations, e.g., by Marchand and Samaan (2000). Berthier and Marchand (2014) later extended these studies to infinite-state systems, featuring numerical state and input variables. Meanwhile, other studies considered optimization objectives, given partial order relations (Marchand and Le Borgne, 1998; Marchand and Samaan, 2000), or cost functions (Dumitrescu et al., 2010). Most of these solutions adapt Bellman's algorithm for the computation of optimal strategies using dynamic programming (Bellman, 1957). In this work, we exploit DCS principles through the use of the BZR environment (Delaval et al., 2013), which integrates symbolic DCS within the reactive data-flow language Heptagon; we give more background on Heptagon in Section 2.2.
Contributions. We present a framework involving symbolic DCS to achieve energy efficiency of hardware chips. We consider circuits described using Hardware Description Languages (HDLs, e.g., Verilog) at the Register-transfer Level (RTL) abstraction. RTL descriptions are high-level hierarchical compositions of components, registers, and logical operators, linked using wires. They can be converted into equivalent digital chip designs for Application-specific Integrated Circuits (ASICs) or Field-programmable Gate Arrays (FPGAs). The framework broadly comprises the following steps: First, the RTL description of the original circuit is translated into a set of synchronous models where controllable variables represent output wires of CGLs. These models are associated with control objectives whose enforcement guarantees the correct behavior of the CGLs. Second, the latter are obtained using a symbolic DCS algorithm. Last, the CGLs are translated into pieces of circuits that are then integrated into a new, clock-gated circuit design. Our translation algorithm is parametrized with a set of variables to be picked from the HDL description, and that is used to abstract away most of the circuit in order to: (i) focus on the portion of its sequential logic that is relevant for expressing the CGLs; and thus (ii) restrict the size of the DCS problems. Our algorithm automatically generates interpreted non-controllable inputs called oracles to model the non-determinism introduced by the abstractions and allow the computation of deterministic, hence implementable, CGLs.

Fig. 1. Mealy machine (states idle and busy) symbolically encoded by register state in Verilog module m of List. 1, decorated with operations on cnt and m. The initial value of cnt is 0.
We give a running example along with a description of the Verilog HDL and BZR in Section 2, and use it to describe and illustrate the framework in Section 3. We exercise our technique on a realistic case study in Section 4, discuss related work in Section 5, and conclude in Section 6.
BACKGROUND & RUNNING EXAMPLE
We now introduce the Verilog HDL using a running example, and then describe the fragment of the Heptagon language that we use for the modeling and computation of CGL for RTL circuits.
The Verilog Hardware Description Language
Verilog is an HDL dedicated to the design of electronic systems. In particular, it can be used to specify synchronous circuits. The description of such a circuit in Verilog consists of a main module made of an assemblage of registers, wires, gates and/or submodules. Each of the latter components features an interface that comprises input or output wires or registers. Verilog provides several constructs to program modules, such as conditional and case statements, wire/register declarations and assignments, and event detection (e.g., positive edge detection, triggered when the value carried by a wire transitions from 0 to 1). One input wire, usually called clk, carries a clock that is used to trigger changes in the values of registers. We give in List. 1 an example specification of a module "m". It starts with the declaration of its interface, here comprising basic wires (e.g., clk, r, and e) or wire arrays (e.g., i and o, here used to carry data). A declaration of constants used internally (idle and busy) follows, along with internal registers. Assignments at lines 7 and 8 describe the logical values taken by the corresponding wires at any instant by means of logical expressions. The code following "always @(posedge clk)" consists in conditional "clock-triggered" assignments to internal or output registers, denoted by "<=". For instance, the statement "state <= busy" at line 11 states that, at every instant t where a positive edge of clk occurs, the value memorized by the register state takes the value busy if the condition r holds (i.e., r carries the value 1 at the very instant where clk becomes 1); notice that the value of state would actually become busy at a subsequent instant t + ε. The internal registers in m symbolically encode a finite-state automaton, which we represent in the form of a Mealy machine with variables in Fig. 1.

[...] m1 nor m2 is currently computing (r1 & ~error), cfg takes the current value of the input wire mode.
The selection of values for output data o and end signal e, along with the triggering of computations by sub-module instance m2, depend on cfg: in mode LQ an input data i is only processed by m1, whereas in mode HQ this data is serially processed by m1 and then m2.
In the remainder of the paper, we consider Verilog circuits given as directed acyclic graphs whose nodes are modules, where arcs describe the "instantiates" relation, and with a single source node representing the module that describes the whole circuit. In turn, every Verilog module M is considered as a tuple
$M = \langle I_M, O_M, R_M, Sub_M \rangle$
where: $I_M$ denotes input wires; $O_M$ denotes output wires; $R_M$ are internal or output registers; $Sub_M$ is a set of sub-module instantiations. A Verilog module with a non-empty set of sub-module instantiations is called a super-module.
Clock-gating in Verilog Circuit Specifications. Consider a module instance mi. We say that $\eta_{mi}$ is a Clock-inhibition Predicate (CIP) for mi if, upon an edge of the clock of mi (e.g., clk), $\eta_{mi}$ holds if the values of every register and output wire of mi are strictly equivalent before and after the edge of the clock. If translated into a CGL, $\eta_{mi}$ can then be used to save dynamic power: gating the clock of mi prevents flip-flop switches. Considering our example Verilog module main again, a piece of circuit encoding the CGL for sub-module instances m1 and m2 would typically output two wires, say $\eta_{m1}$ and $\eta_{m2}$, used to filter each of their respective input clocks clk. The extract of Verilog code instantiating the clock-gated instance of m1 (replacing the beginning of line 8 in List. 2) would then be "m m1 (.clk(clk & ~η_m1),".
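The intended use of a CIP can be illustrated with a small executable check. The following Python sketch is not part of the framework: it assumes a simplified reading of module m (registers state and cnt only, data registers and output wires omitted, and the bound 100 replaced by a small constant CNT_MAX), and tests by exhaustive enumeration whether a candidate predicate only holds when a clock edge would leave these registers unchanged. The names step and is_cip are hypothetical.

```python
from itertools import product

CNT_MAX = 3  # small stand-in for the bound 100 of the running example (assumption)

def step(state, cnt, r):
    """Next values of registers (state, cnt) after one clock edge (simplified m)."""
    if state == "idle":
        return ("busy", cnt) if r else ("idle", cnt)
    if cnt != CNT_MAX:
        return ("busy", cnt + 1)
    return ("busy", 0) if r else ("idle", 0)

def is_cip(eta):
    """True if, whenever eta holds, a clock edge leaves every modeled register unchanged."""
    for state, cnt, r in product(("idle", "busy"), range(CNT_MAX + 1), (False, True)):
        if eta(state, cnt, r) and step(state, cnt, r) != (state, cnt):
            return False
    return True

# Candidate 1: gate the clock when m is idle and no request arrives.
print(is_cip(lambda state, cnt, r: state == "idle" and not r))
# Candidate 2: ignoring the request wire r is too permissive.
print(is_cip(lambda state, cnt, r: state == "idle"))
```

Under these assumptions the first candidate is accepted and the second one is rejected, since a request arriving in state idle must be allowed to change the register state.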
A Fragment of Heptagon
Heptagon (Delaval et al., 2013 ) is a reactive data-flow language where programs are built as parallel and hierarchical compositions of data-flow nodes, each having input, local, and output flows. The body of a node describes how input flows are transformed into output flows, in the form of a set of equations. These equations define the values of outputs (and possible local flows), using the current values of inputs, and the current state of the node: the latter is made of memorized values expressed by "last" values of flows. New values for input flows are given at each execution step, where equations are then evaluated all together, and values of output flows are updated accordingly. We give an example Heptagon node in List. 3; this node symbolically encodes the Mealy machine given in Fig. 1 by using "last" flows to memorize its state. One can compose Heptagon nodes using instantiations; e.g., like "(e1, o1) = inlined m (r1, i1);" for m.
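To make the execution model concrete, here is a minimal Python sketch of a data-flow node whose state is carried by "last" values and updated once per execution step. It is only an analogy for the Heptagon semantics described above, not Heptagon code: the class NodeM, its attribute names, and the small bound LIMIT (standing in for 100) are assumptions, and the transitions follow one plausible reading of the Mealy machine of Fig. 1.

```python
LIMIT = 2  # small stand-in for the bound 100 used in the running example (assumption)

class NodeM:
    """Data-flow node with two memorized flows: last_state and last_cnt."""

    def __init__(self):
        self.last_state = "idle"
        self.last_cnt = 0  # the initial value of cnt is 0

    def step(self, r):
        """One execution step: read input r, compute output e, update the 'last' flows."""
        done = self.last_state == "busy" and self.last_cnt == LIMIT
        e = done  # end signal, emitted on the transition taken when cnt reaches the bound
        if self.last_state == "idle":
            state, cnt = ("busy" if r else "idle"), self.last_cnt
        elif done:
            state, cnt = ("busy" if r else "idle"), 0
        else:
            state, cnt = "busy", self.last_cnt + 1
        # All equations are evaluated on the previous values; memories change last.
        self.last_state, self.last_cnt = state, cnt
        return e

node = NodeM()
trace = [node.step(r) for r in (True, False, False, False, False)]
print(trace)
```

Every equation reads the memorized values of the previous step, and the memories are only overwritten once all outputs have been computed, which mirrors the simultaneous evaluation of equations in a synchronous step.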
An invariant and controllable flows (each taking its value in the Boolean domain bool = {false, true}) can be specified for Heptagon nodes using contracts. When it encounters a node featuring a contract, the BZR compiler involves a symbolic DCS algorithm to automatically produce a controller constraining the values of the controllable flows so as to guarantee that the resulting controlled node satisfies the invariant. The controllers produced take the form of as many predicates as controllable flows, that implement the following behavior: considering every controllable flow c in turn (according to their order of declaration), the controller tries to assign c to true unless this could lead to a potential violation of the desired invariant in subsequent execution steps. Given a Boolean output o, a contract enforcing that o holds using controllable flows c1 and c2 for a node is declared as "contract enforce o with (c1, c2: bool)".
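The behavior of such controllers can be sketched on an explicit finite-state system. The Python code below is an illustration under strong simplifications, not the algorithm implemented in BZR: it enumerates states instead of manipulating symbolic representations, it expresses the invariant as a predicate on states, and the function synthesize together with the toy system are hypothetical. It computes the largest set of states from which the invariant can be maintained whatever the non-controllable input, then picks controllable values by trying true first, in declaration order.

```python
from itertools import product

def synthesize(states, inputs, n_ctrl, step, safe):
    """Return (winning states, controller) for the invariant 'safe'."""
    # Tuples are enumerated with True first, so the first admissible tuple is the
    # one obtained by greedily setting each controllable flow to True in turn.
    choices = list(product((True, False), repeat=n_ctrl))
    win = {s for s in states if safe(s)}
    changed = True
    while changed:  # greatest fixpoint: drop states that some input can force out
        changed = False
        for s in list(win):
            if any(all(step(s, u, c) not in win for c in choices) for u in inputs):
                win.discard(s)
                changed = True

    def controller(s, u):
        for c in choices:
            if step(s, u, c) in win:
                return c
        raise ValueError("state outside the winning region")

    return win, controller

# Toy system: a counter that must stay below 3; u is non-controllable,
# c[0] and c[1] are controllable flows that may each increment the counter.
toy_step = lambda s, u, c: min(3, s + (c[0] and u) + c[1])
win, controller = synthesize(range(4), (False, True), 2, toy_step, lambda s: s < 3)
print(sorted(win), controller(1, True), controller(2, True))
```

In this toy system the controller lets both flows be true in state 0, keeps only the first one in state 1, and forces both to false in state 2 when the input u holds.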
Variables & Further Notations
In Verilog terms, a set of variables V represents wires and outputs of registers; equivalently in Heptagon terms, variables in V represent flows, including state ones ("last" flows). $P_V$ is the set of propositional predicates expressed using variables in V. Given an instantiation Mi of a Verilog module M and a set of variables $V_M$ pertaining to M, we denote by $V_{Mi}$ the set of variables substituted according to the instantiation Mi. By extension, we write $V_{Sub_M}$ to denote $\biguplus_{mi \in Sub_M} V_{mi}$, where $\uplus$ is the disjoint union.

Fig. 2. Example Verilog module instantiation graph, associated models, and resulting CIPs. Arrows represent Verilog sub-module and Heptagon node instantiation relations, as well as modeling and symbolic DCS steps.
COMPUTING CGLS USING SYMBOLIC DCS
We now describe our technique for computing CGLs. We give an overview of the modeling principles and how we eventually integrate the resulting CGLs into the original circuit. We then detail the models and results we obtain from our example.
Overview of the Modeling Technique
Our translation algorithm produces two families of models (represented using Heptagon nodes) that fit two distinct purposes:
SM: the first family of models, called Sub-module Models, aims at representing generic sub-modules (i.e., not yet instantiated). They model their behavior using one internal state flow per marked register, and abstract away any sub-module instantiation;
CM: the second kind of models, referred to as Composed Models, is derived from the Sub-module Models of every Verilog super-module M. CMs instantiate Sub-module Models, and encode the computation of CIPs as symbolic DCS problems.
Our algorithm for computing CIPs works by visiting every Verilog module in the instantiation graph according to an inverse topological order. Every module M is first translated into a Sub-module Model $SM_M$, accompanied with an idleness predicate $idle_M$ that expresses a condition under which the registers' values of any instance of M do not change. Then, every node $SM_M$ of a super-module M is further transformed into a Composed Model $CM_M$ that instantiates Sub-module Models. Each node $CM_M$ features a contract that involves control objectives (i.e., at least an invariant involving idleness predicates of sub-module instances) and controllable flows that represent the CIPs of each sub-module instantiated by M: the enforcement of this contract by using a symbolic DCS algorithm results in correct CIPs. We sketch this process using an example circuit specification in Fig. 2. In this example, three symbolic DCS problems are solved, leading to as many sets of CIPs. Note that $SM_{m1}$ and $idle_{m1}$ are never instantiated: $SM_{m1}$ is only used to derive $CM_{m1}$. During the modeling process, one CGL-enabled Verilog module mi' is derived from each original one mi. Each $CIP_{mi}$ is eventually translated into Verilog code, and then integrated into mi' to derive a clock-gated Verilog design.
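The bottom-up traversal described above can be sketched as follows. This Python fragment only plans the modeling steps and does not build any model: the function plan and the textual step labels are hypothetical, and the instantiation graph is the one of the running example (main instantiates m). It relies on graphlib.TopologicalSorter from the standard library (Python 3.9 or later), which lists a node after all the nodes it depends on.

```python
from graphlib import TopologicalSorter

def plan(instantiates):
    """List modeling steps, visiting instantiated modules before their super-modules.

    'instantiates' maps each module name to the set of modules it instantiates.
    """
    steps = []
    for mod in TopologicalSorter(instantiates).static_order():
        # Every module gets a Sub-module Model and an idleness predicate.
        steps.append(f"build SM_{mod} and idle_{mod}")
        if instantiates.get(mod):
            # Super-modules additionally get a Composed Model and a DCS problem.
            steps.append(f"build CM_{mod} and solve its DCS problem")
    return steps

for line in plan({"main": {"m"}, "m": set()}):
    print(line)
```

For the running example, the plan builds the models of m first, and only main yields a Composed Model with an associated DCS problem.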
Tackling Complexity Issues. Consider a Verilog module M with sub-module instantiations $Sub_M$, and assume a perfect knowledge of the values of all its input wires $I_M$, its registers $R_M$, and the registers of all its direct and indirect sub-module instances. The optimal CIP for each of its sub-module instances $mi \in Sub_M$ is $\eta^{optim}_{mi} \in P_{I_M \cup R_M \cup R_{Sub^*_M}}$, where $Sub^*_M$ denotes every direct and indirect sub-module instance within M. In principle, one can then build the CGL computing a value for $\eta^{optim}_{mi}$ at each clock cycle, and use it to inhibit the clock of mi within M. However, the size of today's circuit designs makes the exact computation of optimal CIPs generally intractable. To tackle this problem, we compute under-approximations of CIPs by: (i) using a layered approach, where CIPs are computed separately for each super-module and only their direct sub-module instances are taken into account; and (ii) devising a parametrized abstraction technique.
Marking Variables
To drive the abstractions, we parametrize our algorithm with a set of variables to be taken into account when modeling the circuits. This key aspect of our approach allows designers to exploit the knowledge they have on their designs. In particular, the usual distinction between command parts and operational parts of hardware circuits permits a quick identification of registers and wires that are relevant for the computation of CIPs that would otherwise be hard to compute. Referring to our example Verilog module m in List. 1, one can observe that the computations on the input data given using wire array i and output using wire array o are driven by the values held in registers state and cnt, plus input wire r. The output wire e is also relevant w.r.t. the behaviors of any circuit instantiating m as it indicates the termination of its computations. Regarding module main of List. 2, relevant wires and registers include r, mode, e, error, cfg, wait_m1, and wait_m2. Marked wires and registers shall be specified as a union S of sets $S_M$ of variables pertaining to each Verilog module M instantiated in the circuit. Note that our modeling algorithm is sound w.r.t. the set of marked variables, meaning that, although some sets S give better results than others (e.g., in terms of dynamic power savings), it always produces functionally equivalent results. In the worst case ($S_M = \emptyset$), the resulting model for M symbolically describes a single-state automaton, and it is only likely to result in a module that is never considered idle. Abstracting behaviors of the Verilog modules leads to potentially non-deterministic models. Consider for instance an explicit automaton with an input wire i and two transitions from the same source and distinct destinations, respectively guarded with i and ¬i; abstracting away i would lead to a non-deterministic automaton.
To still construct Heptagon nodes (that are deterministic by definition), we automatically generate oracles to replace subexpressions whose values are abstracted away, thereby explicitly modeling the non-determinism.
Introducing Oracles. Given an expression e on any set of variables, the oracle $\omega_e$ is an interpreted input that proxies e. In particular, $\omega_e$ takes its values in the domain of e (e.g., the Booleans if e is a predicate), and can thus be used to model behaviors where e itself is abstracted away and can non-deterministically take any value in its domain. Not all knowledge about the modeled behaviors is lost, however. Indeed, assuming that e and e' admit the same canonical representation $\bar{e}$, every occurrence of e and e' can be replaced with the same oracle $\omega_{\bar{e}}$, and the equality of valuations for e and e' can still be represented. For instance, given the expression x + y, where x and y are Integer variables, the oracle $\omega_{x+y}$ can non-deterministically take any Integer value: the addition operation, x, and y are abstracted away, replaced by some undetermined Integer. Additionally, if y + 1 + x − 1 admits the same canonical representation as x + y, then it can also be modeled with $\omega_{x+y}$. When constructing $SM_M$ or $CM_M$ from a Verilog module M, we introduce a set of oracles $\Omega_M$ to handle expressions within M that involve non-marked wires and registers (not belonging to $S_M$): i.e., the actual values of these expressions are abstracted away in the resulting models. Yet, our goal is to actually generate circuits that encode CGLs, and that can thus be used to inhibit the clock of instances of M: the actual values of marked registers and abstracted expressions computed within M are hence required when translating the resulting CIPs into Verilog code. As a result, while oracles $\Omega_M$ are inputs to the models of M, we also build a CGL-enabled version M' that features one additional output wire per oracle in $\Omega_M$ that does not represent expressions only involving inputs of M; these additional wires carry the actual values of the corresponding expressions, and are thus used to feed the CGL of super-modules. Additional output wires of M' also carry the value of marked registers belonging to $S_M$.
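The sharing of oracles between expressions that admit the same canonical representation can be sketched in a few lines. The Python code below is an illustration limited to linear Integer expressions, normalized as coefficient maps; it is not the decision-diagram-based representation suggested for the framework, and the helpers var, const, add, canon, and the class Oracles are hypothetical names.

```python
from collections import Counter

def var(name):
    return Counter({name: 1})

def const(k):
    return Counter({"1": k})  # the key "1" holds the constant term

def add(*terms):
    total = Counter()
    for t in terms:
        total.update(t)  # Counter.update sums coefficients, negative ones included
    return total

def canon(expr):
    """Canonical form: sorted (symbol, coefficient) pairs with non-zero coefficients."""
    return tuple(sorted((k, v) for k, v in expr.items() if v != 0))

class Oracles:
    """Assign one oracle per canonical expression that involves a non-marked variable."""

    def __init__(self, marked):
        self.marked = set(marked)
        self.table = {}

    def abstract(self, expr):
        key = canon(expr)
        if all(name == "1" or name in self.marked for name, _ in key):
            return None  # fully determined by marked variables: no oracle needed
        return self.table.setdefault(key, f"omega_{len(self.table)}")

oracles = Oracles(marked={"state"})
a = oracles.abstract(add(var("x"), var("y")))
b = oracles.abstract(add(var("y"), const(1), var("x"), const(-1)))
print(a, b)
```

Both expressions x + y and y + 1 + x − 1 normalize to the same coefficient map, hence they share a single oracle, whereas an expression over marked variables only is left untouched.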
Bottom-up Clock-inhibition Allowance. Of course, the clock of any instance of M also drives sub-module instances $mi \in Sub_M$. As a result, the clock of M should not be inhibited whenever any sub-module instance mi must not be inhibited. However, $SM_M$ does not model the behavior of any of its sub-module instances. Hence, we choose to add a bottom-up clock-inhibition allowance output wire $allow\eta_M$ to M', built as the conjunction of the CIPs of every sub-module instantiated by M, or 1 if there is none.
Resulting CGLs
Eventually, the CGL to be integrated within a Verilog super-module M consists of one CIP $\tilde\eta_{mi} \in P_{S_M \cup \Omega_M \cup S_{Sub_M} \cup \Omega_{Sub_M} \cup \bigcup_{mi \in Sub_M} \{allow\eta_{mi}\}}$ per sub-module instance $mi \in Sub_M$. $\tilde\eta_{mi}$ under-approximates the condition upon which the clock of sub-module instance mi can be inhibited: i.e., it is such that $\hat\eta_{mi} \Rightarrow \eta^{optim}_{mi}$, $\hat\eta_{mi}$ being $\tilde\eta_{mi}$ where every oracle $\omega_e$ is substituted by e. We rely on a symbolic DCS algorithm to compute such CIPs.
Building Sub-module Models & Idleness Predicates
We outline in Fig. 3 the interface of a Sub-module Model $SM_M$ for a Verilog module M. Its inputs include (i) an enable flow τ; (ii) flows mirroring marked input wires selected for this module ($S_M \cap I_M$); and (iii) a set of input oracles $\Omega_M$ that are used to model undetermined behaviors of instances of M. The outputs of $SM_M$ comprise flows mirroring the marked registers and output wires selected for M ($S_M \cap (R_M \cup O_M)$). We further associate each Sub-module Model $SM_M$ with an idleness predicate $idle_M \in P_{S_M \cup \Omega_M}$ that under-approximates the condition under which the registers' values of any instance of M do not change; i.e., given values for marked input wires, marked internal registers, and for the oracles, if $idle_M$ holds then any assignment to any (both marked and non-marked) internal and output registers of M would not change the value it memorizes. One can easily build the Heptagon node $SM_M$ and the associated condition $idle_M$ from a Verilog module M where every internal wire is substituted with the expression it is assigned to. A traversal of clock-triggered assignments to marked registers clocked using clk allows the construction of cascading conditional statements for the assignments to flows encoding the state within $SM_M$ ("last" flows). A similar traversal can be used to construct $idle_M$ as the conjunction of the negation of every guard leading to the assignment of a register. An efficient introduction of oracles can be performed by building a canonical representation of every expression using (basic and/or multi-terminal) binary decision diagrams for instance. Then, every canonical expression e that involves a non-marked variable becomes an oracle $\omega_e \in \Omega_M$. Note that the constructions above do not necessarily traverse every expression, register or wire of a module declaration, hence only a limited number of oracles might be required even for large modules. This claim is supported by the application of our technique on a realistic case study detailed in Section 4. When applied to the Verilog module m of List. 1 with marked variables $S_m$ = {state, r, e}, our construction technique for Sub-module Models builds the Heptagon node of List. 4. The assignment to e on line 9 corresponds to the assignment on line 8 in List. 1: the value of "cnt == 100" is abstracted away using an oracle as cnt does not belong to $S_m$. In turn, the predicate that describes the idleness condition of m is $idle_m$ = (state = Idle & not r). Finally, we show in List. 5 the additions to module m that are necessary to construct the corresponding CGL-enabled Verilog module m'. m' features three additional output wires (one for as many oracles in $\Omega_m$, one per variable in $S_m \cap R_m$, plus $allow\eta_m$). The bottom-up clock-inhibition allowance output $allow\eta_m$ is assigned to 1 as no sub-module instantiation exists within m to prevent the clock of m from being inhibited.

List. 6. Heptagon node $CM_{main}$ obtained from the Verilog module main of List. 2 using marked variables $S_{main}$ = {cfg, wait_m1, wait_m2} and $S_m$ as used in List. 4.
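The construction of an idleness predicate as the conjunction of negated guards can be checked on the running example. The Python sketch below assumes a particular set of guards for the clock-triggered assignments of module m once cnt is abstracted away (the list GUARDS is a reading of the example, not a transcription of List. 1), with the Boolean omega standing for the oracle that proxies "cnt == 100".

```python
from itertools import product

# Guards under which some register of m is assigned on a clock edge (assumed).
GUARDS = [
    lambda v: v["state"] == "idle" and v["r"],          # state <= busy, data latched
    lambda v: v["state"] == "busy" and v["omega"],      # cnt <= 0, state updated
    lambda v: v["state"] == "busy" and not v["omega"],  # cnt <= cnt + 1
]

def idle(v):
    """Conjunction of the negation of every guard leading to a register assignment."""
    return all(not guard(v) for guard in GUARDS)

def valuations():
    """Every valuation of the marked variables and of the oracle."""
    for state, r, omega in product(("idle", "busy"), (False, True), (False, True)):
        yield {"state": state, "r": r, "omega": omega}

# The derived predicate coincides with (state = Idle & not r) on every valuation.
print(all(idle(v) == (v["state"] == "idle" and not v["r"]) for v in valuations()))
```

Since the two guards of state busy cover both values of the oracle, the module is never considered idle while busy, whatever value the abstracted expression takes.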
Building Composed Module Models
A Composed Model $CM_M$ is derived from $SM_M$ by taking sub-module instantiations mi into account, and formulating the computation of the $\tilde\eta_{mi}$'s as a symbolic DCS problem. Basically, the instantiation of a sub-module m by M translates within $CM_M$ into the instantiation (say, $SM_{mi}$) of the Heptagon node $SM_m$. The input τ of each Heptagon node instantiation $SM_{mi}$ is assigned to the negation of the corresponding CIP, "not $\tilde\eta_{mi}$". CIPs, in turn, are the controllable flows as they represent the CGL outputs. (The input flow τ modeling the clocks in Sub-module Models is no longer required, and can be substituted with true everywhere else in $CM_M$.) Further, we build $\Omega_{mi}$ and $idle_{mi}$ according to the appropriate renaming in $\Omega_m$ and substitutions in $idle_m$. Note that, with the additional output flows from Sub-module Models, some oracles in $\Omega_M$ may represent expressions that are now fully determined. A substitution of such oracles with their respective expressions is thus necessary in $CM_M$ so that marked outputs of sub-modules are taken into account. Let $\Omega'_M$ be $\Omega_M$ pruned of the latter oracles. At last, the invariant $\varphi_M$ to enforce by control states that a CIP $\tilde\eta_{mi}$ for a sub-module instance mi should not hold unless $idle_{mi}$ holds:
$\varphi_M = \bigwedge_{mi \in Sub_M} (\tilde\eta_{mi} \Rightarrow idle_{mi})$
We sketch the interface of a Heptagon node $CM_M$ in Fig. 4; note that it also admits as inputs the oracles of every instantiated sub-module. We give in List. 6 the result we obtain for $CM_{main}$.
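The shape of this control problem can be illustrated on a single execution step. The Python sketch below is a simplification, not the symbolic DCS computation: it only considers the invariant stating that a CIP may hold only if the corresponding idleness predicate holds, assumes that choosing false for a CIP is always admissible, and uses hypothetical names (phi, choose_cips, allow). Controllable values are tried with true first, in declaration order.

```python
from itertools import product

def phi(etas, idles):
    """Invariant: for every sub-module instance, eta_mi implies idle_mi."""
    return all((not eta) or idl for eta, idl in zip(etas, idles))

def choose_cips(idles):
    """Pick CIP values, preferring True, such that the invariant holds."""
    for etas in product((True, False), repeat=len(idles)):
        if phi(etas, idles):
            return etas
    raise ValueError("unreachable: the all-False tuple always satisfies phi")

def allow(etas):
    """Bottom-up allowance: conjunction of the CIPs, True if there is none."""
    return all(etas)

# m1 is idle while m2 is still computing: only the clock of m1 may be gated.
etas = choose_cips((True, False))
print(etas, allow(etas))
```

With this one-step invariant the preferred choice simply gates exactly the idle instances, and the allowance wire of the super-module only holds when every sub-module instance is gated.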
Computing & Integrating the CGLs
As stated in Section 2.2, the compilation of a Heptagon node that features a contract (as Composed Models do) involves a symbolic DCS computation step that produces a controller made of one predicate per controllable flow (i.e., CIPs). By virtue of the semantics assigned to such flows by the Heptagon compiler (i.e., assigning them to true whenever possible), one can eventually translate the controller into some Verilog code encoding a CGL that inhibits the clock of sub-module instances whenever possible. We show in List. 7 excerpts of the end result that we obtain for our running example. The assignments to registers holding $\tilde\eta_{m1}$ and $\tilde\eta_{m2}$ are clocked using clk: their respective input value consists in the conjunction between their respective bottom-up clock-inhibition allowance ($allow\eta_{m1}$ and $allow\eta_{m2}$), and their respective CIPs as computed by using symbolic DCS. The clocks of sub-module instances m1 and m2 are now filtered according to $\tilde\eta_{m1}$ and $\tilde\eta_{m2}$. As a side note, remark that $\omega_{e1}$ and $\omega_{e2}$ are output wires of main' since these outputs of sub-module instances are required to construct $SM_{main}$ and $idle_{main}$. However, $\omega_r$ and $\omega_{mode}$ are not part of these outputs as they represent input wires only.

    ...
           .state'(m1_state), .ω_cnt==100(ω_m1_cnt==100));
    wire allowη_m2, m2_state, ω_m2_cnt==100;
    m' m2 (.clk(clk & ~η_m2), .allowη_m(allowη_m2), ...
           .state'(m2_state), .ω_cnt==100(ω_m2_cnt==100));
    ...
    // assignments to outputs to be CGL-enabled:
    assign wait_m1' = wait_m1, wait_m2' = wait_m2;
    assign cfg' = cfg;
    assign ω_e1 = e1, ω_e2 = e2;
    assign allowη_main = η_m1 & η_m2; // <- clock-inhibition allowance
    endmodule

[...] (Reed and Solomon, 1960). Basically, this decoder takes coded words of 204 bytes as inputs, and outputs decoded words of 188 bytes. The original decoder is made of 23 modules that build up a circuit with around 52,000 gates and 3,000 registers (flip-flops). Among them, 6 modules drive the operations to be performed on the data: they feature two easily identifiable families of wires and registers that we marked for our modeling: (i) register arrays named state or step, which take their values in discrete domains made of a few constants (similarly to state in List. 1); these registers are typically used to encode some command automaton that drives operations on data; and (ii) input and output wires named *_ready or *_done, which signal the end of computations. We then produced a "CGL-enabled" circuit including CIPs produced using symbolic DCS.
To experimentally assess the functional correctness and compare the respective dynamic power dissipation of each of the designs at hand (original and CGL-enabled), we first performed logic synthesis on both of them using the Altera Quartus synthesizer. We then used the Altera ModelSim simulation tool to perform functional simulations using the same benchmark (provided with the source code of the original decoder) for the two circuits, and checked that the resulting traces were strictly equivalent. To assess actual dynamic power savings, we have carried out estimations of mean power dissipation on simulations of the benchmarks, for various target technologies and main clock frequencies, as these factors have a great impact on dynamic power. The Altera PowerPlay Power Analyzer tool offers several pre-configured target technologies, among which we chose the Cyclone IV (dedicated to low-power FPGA designs), the Stratix III (for high-performance FPGA designs), and the HardCopy IV (ASICs) families. We also set the main clock frequencies to be either 100 MHz or 1 GHz. We show the resulting estimations of power dissipation and respective dynamic power savings in Table 1. We consider that these results are promising when put in perspective with the relative simplicity of our approach. Indeed, we generated effective CIPs by imposing invariants only, as our technique does not even incorporate control techniques towards any sort of optimization yet.

2 https://opencores.org/project,reed_solomon_decoder.
RELATED WORKS
Low-power Chip Design. Several families of design methods permit the (semi-)automated use of power-saving technologies: they can be integrated into high-level (aka system-level) or RTL descriptions, or further down the implementation process, during "synthesis" (i.e., translation of RTL descriptions into a network of gates and wires), or placement and routing steps. Nonetheless, Dale (2008) found that considering higher levels of abstraction generally leads to more power savings. Designers most commonly rely on the RTL code itself to implement clock-gating, yet a few approaches automatically generate RTL code with integrated clock gating from higher-level descriptions. Among them, Agarwal and Dimopoulos (2008) developed an environment for high-level design with their own procedural language. Ahuja et al. (2010) also provide a solution to design circuits directly using the C language. In these approaches, designers are responsible for the selection of gated components. One can distinguish three classes of RTL clock-gating algorithms based on the hierarchical level at which they consider the circuit: combinatorial or sequential ones focus on individual registers (Sudhakar et al., 2015; Liu et al., 2015), while system- or module-level ones (Bhutada and Manoli, 2007) focus on clock-gating whole modules or blocks of clock-triggered assignments. We rely on the latter level for the layered abstractions that it allows. Several commercial and academic tools already target automated clock-gating from RTL code. Raghavan et al. (1999) developed an algorithm that automatically inserts CGL into RTL descriptions of circuits. They focus on the exact computation of idleness conditions for individual registers within a single module, hence their solution suffers from scalability issues. To partially overcome these issues, Babighian et al. (2005) suggest an algorithm that automatically tries to approximate idleness conditions. Later, Chang et al. (2007) used a control-based adaptive clock-gating algorithm to shut down IP cores based on given explicit finite-state models. In an approach that targets the conditional activation of individual hardware components using their "enable" signal (an approach similar to clock-gating), Benini et al. (1994) tried to detect idleness conditions by using explicit finite-state machines. At last, Raghavan et al. (1999) exploited conditional statements and case structures within blocks of clock-triggered assignments in HDL languages to determine such conditions. Our approach draws from the latter ones in the sense that we also operate at the HDL level, and build symbolic finite-state machines from conditional clock-triggered assignments. We additionally bring layered and semi-automated abstractions for the sake of scalability.
Applying DCS for Low-power Hardware Design. Few hardware design techniques rely on DCS for saving power. Quadri et al. (2010) present a high-level design flow for reconfigurable FPGA-based System-on-Chip (SoC); they model potential reconfiguration behaviors and manually derive a "controller" that automatically takes reconfiguration decisions. Later, Guillet et al. (2012) and An et al. (2013, 2016) were among the first to apply DCS algorithms for reconfiguration management in SoC design. Doing so, they could automate the generation of controllers, thereby exploiting the formal correctness and guarantees that DCS techniques provide. In particular, An et al. (2013, 2016) model the applications' behaviors and the needed resources (area in hardware, i.e., regions of the FPGA) using explicit automata; they then use a symbolic DCS algorithm to automatically compute a reconfiguration manager for the system.
CONCLUSIONS & FUTURE WORKS
In this paper, we have described a systematic approach for computing the CGL of synchronous circuits described using the Verilog hardware description language. This approach exercises symbolic DCS algorithms by means of a semi-automated modeling in Heptagon of each individual Verilog module. We have demonstrated its principles using an example, and have reported on its manual application on a realistic case study.
The next steps involve a formalization of our modeling algorithm to validate its correctness, along with the development of an implementation in a tool. Our approach can also be extended to compute CGLs for individual registers within modules. Although we exercised our technique to implement clock-gating as it currently offers the best trade-off between extra occupied circuit area and power savings (Kathuria et al., 2011), it is also applicable to other low-power design mechanisms such as power-gating (especially for computation-intensive modules that can be shut down for long periods of time). Also, the abstractions induced by our modeling approach make it a good candidate for constructing models suitable for the application of control algorithms that do not scale up to exact whole-circuit models. In particular, the adaptation of our algorithm for the computation of "suspendability" predicates would allow suspending the computations of sub-module instances by control. Combined with the recent advances in control algorithms for symbolic infinite-state systems with applications to quantitative models (Berthier and Marchand, 2014; Berthier et al., 2015), our framework could permit the application of optimal control techniques towards the minimization of peak dynamic power or energy dissipation over several clock cycles. Similarly, incorporating stochastic models (e.g., inferred from simulation traces) would provide interesting cases for developing new optimal control algorithms targeting such goals. Also, automatically identifying "good" sets of marked variables constitutes an interesting challenge. At last, the support of black-box sub-modules with simple user-provided models can also be considered.
