Complementary synthesis automatically generates an encoder's decoder under the assumption that all of the encoder's input variables can always be uniquely determined by its output symbol sequence. However, to prevent a faster encoder from overwhelming a slower decoder, many encoders employ a flow control mechanism that violates this assumption. When their output symbol sequences are too fast for the decoders to process, such encoders stop transmitting data symbols and instead transmit idle symbols, which uniquely determine only a subset of the encoder's input variables. The decoder should recognize and discard these idle symbols. This mechanism violates the assumption of all existing complementary synthesis algorithms, because some input variables cannot be uniquely determined by the idle symbols.
INTRODUCTION
One of the most difficult jobs in designing communication and multimedia chips is to design and verify complex encoder and decoder pairs. The encoder maps its input variables i to its output variables o, while the decoder recovers i from o. Complementary synthesis [Shen et al. 2009, 2010, 2011, 2012; Liu et al. 2011, 2012; Tu and Jiang 2013] eases this job by automatically generating a decoder from an encoder, under the assumption that i can always be uniquely determined by a bounded sequence of o.
This work was funded by projects 61070132 and 61133007 supported by National Natural Science Foundation of China, the 863 Project of China under contract 2013AA014301, project 4345135127 supported by Program for New Century Excellent Talents in University. Authors' addresses: School of Computer, National University of Defense Technology. Correspondence email: yingqin@nudt.edu.cn. © 2015 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

However, the encoders of many high-speed communication systems employ a flow control mechanism [Abts and Kim 2011] that violates this assumption. Figure 1(a) shows such a communication system, which includes a faster transmitter and a slower receiver connected by an encoder and decoder pair. There are […] The flow control mechanism prevents the faster transmitter from overwhelming the slower receiver in the following way:
(1) When the receiver can keep up with the transmitter, the transmitter raises f to 1, which makes the encoder transmit […]

However, this mechanism violates the assumption of all complementary synthesis algorithms, because d cannot be uniquely determined by the idle symbol I. To resolve this issue, we need only consider the case f ≡ 1, in which d can be uniquely determined; in the other case, f ≡ 0, d isn't needed by the receiver. So we propose a novel complementary synthesis algorithm to handle the flow control mechanism:
(1) First, it applies the classical halting complementary synthesis algorithm [Shen et al. 2011] to identify all the input variables of the encoder that can be uniquely determined, and calls them the flow control variables f. The other input variables, which can't be uniquely determined, are called the data variables d.
(2) […]
The second step of this algorithm seems somewhat similar to that of Shen et al. [2012], in the sense that both algorithms infer predicates that enable d or i to be uniquely determined. But the essential difference between them is that the algorithm of Shen et al. [2012] infers a global assertion that must be enforced on all steps along the unrolled state transition sequence, while our algorithm infers a local predicate that is enforced only at the current step, when we need to recover the value of d. Thus, our algorithm can be seen as a generalization of Shen et al. [2012].
Experimental results indicate that, for several complex encoders (e.g., Ethernet [IEEE 2012] and PCI Express [PCI-SIG 2009]), our algorithms correctly identify the flow control variables, infer the predicates and generate the decoders. In addition, we conduct other experiments to compare our algorithms with state-of-the-art complementary synthesis algorithms. All these experimental results and programs can be downloaded from https://github.com/shengyushen/compsyn.
The remainder of this article is organized as follows: Section 2 introduces the background material; Section 3 identifies the flow control variables, while Section 4 infers the predicate that enables d to be uniquely determined by a bounded sequence of o; Section 5 minimizes the length of the unrolled state sequence, while Section 6 characterizes the decoder's Boolean function; Sections 7 and 8 present the experimental results and related work; finally, Section 9 concludes the article.
A Boolean formula F over a variable set V is constructed by connecting variables from V with symbols ¬, ∧, ∨ and ⇒, which stand for logical connectives negation, conjunction, disjunction, and implication, respectively.
The propositional satisfiability problem (abbreviated as SAT) for a Boolean formula F over a variable set V is to find a satisfying assignment A : V → B, with B = {0, 1}, such that F evaluates to 1. If such an A exists, F is satisfiable; otherwise, it is unsatisfiable.
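To make the definition concrete, here is a brute-force SAT check. It is only an illustration, assuming formulas are given as Python predicates over a dictionary of 0/1 values; a real solver such as MiniSat works on CNF clauses and does not enumerate all 2^|V| valuations.

```python
from itertools import product
from typing import Callable, Dict, Optional, Sequence

Assignment = Dict[str, int]


def brute_force_sat(variables: Sequence[str],
                    formula: Callable[[Assignment], bool]) -> Optional[Assignment]:
    """Return a satisfying assignment A: V -> {0, 1}, or None if F is unsatisfiable.

    All 2^|V| valuations are enumerated, so this only works for tiny formulas.
    """
    for values in product((0, 1), repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if formula(assignment):
            return assignment
    return None


# F1 = (x ∨ y) ∧ ¬x is satisfiable; F2 = x ∧ ¬x is unsatisfiable.
f1 = lambda a: bool((a["x"] or a["y"]) and not a["x"])
f2 = lambda a: bool(a["x"] and not a["x"])

print(brute_force_sat(["x", "y"], f1))  # {'x': 0, 'y': 1}
print(brute_force_sat(["x"], f2))       # None
```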
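According to Ganai et al. [2004], the positive and negative cofactors of a Boolean formula f with respect to a variable v are denoted f_{v≡1} and f_{v≡0}, respectively.
Cofactoring is the action of substituting 1 or 0 for v to obtain f_{v≡1} or f_{v≡0}.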
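A small sketch of cofactoring, assuming formulas are Python predicates over 0/1 dictionaries (our illustration, not code from Ganai et al. [2004]). It also checks the Shannon expansion f ≡ (v ∧ f_{v≡1}) ∨ (¬v ∧ f_{v≡0}), which shows that the two cofactors together preserve all of f's information.

```python
from itertools import product
from typing import Callable, Dict, Sequence

Assignment = Dict[str, int]
Formula = Callable[[Assignment], bool]


def cofactor(f: Formula, v: str, value: int) -> Formula:
    """Return f_{v≡value}: the formula obtained by substituting `value` for v."""
    return lambda a: f({**a, v: value})


def shannon(f: Formula, v: str) -> Formula:
    """Rebuild f from its cofactors: (v ∧ f_{v≡1}) ∨ (¬v ∧ f_{v≡0})."""
    pos, neg = cofactor(f, v, 1), cofactor(f, v, 0)
    return lambda a: bool((a[v] and pos(a)) or (not a[v] and neg(a)))


def equivalent(f: Formula, g: Formula, variables: Sequence[str]) -> bool:
    """Check f ≡ g by enumerating every valuation of `variables`."""
    return all(bool(f(a)) == bool(g(a))
               for a in (dict(zip(variables, vals))
                         for vals in product((0, 1), repeat=len(variables))))


# A 2-to-1 multiplexer: f = (v ∧ x) ∨ (¬v ∧ y).
mux = lambda a: bool((a["v"] and a["x"]) or (not a["v"] and a["y"]))
V = ["v", "x", "y"]
print(equivalent(cofactor(mux, "v", 1), lambda a: bool(a["x"]), V))  # True
print(equivalent(cofactor(mux, "v", 0), lambda a: bool(a["y"]), V))  # True
print(equivalent(shannon(mux, "v"), mux, V))                         # True
```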
Given two Boolean formulas φ_A and φ_B with φ_A ∧ φ_B unsatisfiable, there exists a formula φ_I referring only to the common variables of φ_A and φ_B such that φ_A ⇒ φ_I and φ_I ∧ φ_B is unsatisfiable. We call φ_I the interpolant [Craig 1957] of φ_A with respect to φ_B, and we use McMillan's algorithm [McMillan 2003] to generate it.
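The definition can be checked on a toy example by brute force. The sketch below does not implement McMillan's proof-based algorithm; it computes the strongest interpolant, ∃(variables local to φ_A). φ_A, by enumeration, which is only feasible for a handful of variables. The formulas φ_A and φ_B are hypothetical.

```python
from itertools import product
from typing import Dict, Iterator, List, Set, Tuple

Assignment = Dict[str, int]


def valuations(names: List[str]) -> Iterator[Assignment]:
    for vals in product((0, 1), repeat=len(names)):
        yield dict(zip(names, vals))


def strongest_interpolant(phi_a, a_local: List[str], shared: List[str]) -> Set[Tuple[int, ...]]:
    """Projection ∃a_local. φ_A, returned as the set of shared valuations it accepts."""
    return {tuple(s[n] for n in shared)
            for s in valuations(shared)
            if any(phi_a({**s, **x}) for x in valuations(a_local))}


def is_interpolant(itp, phi_a, phi_b, a_local, b_local, shared) -> bool:
    """Check φ_A ⇒ I and that I ∧ φ_B is unsatisfiable, by enumeration."""
    key = lambda m: tuple(m[n] for n in shared)
    a_implies_itp = all(key(m) in itp for m in valuations(a_local + shared) if phi_a(m))
    itp_and_b_unsat = not any(key(m) in itp for m in valuations(b_local + shared) if phi_b(m))
    return a_implies_itp and itp_and_b_unsat


# φ_A forces c1 = x = c2; φ_B demands c1 ≠ c2 (and y). Shared variables: c1, c2.
phi_a = lambda m: m["c1"] == m["x"] and m["c2"] == m["x"]
phi_b = lambda m: m["c1"] != m["c2"] and m["y"] == 1
itp = strongest_interpolant(phi_a, ["x"], ["c1", "c2"])
print(sorted(itp))                                                   # [(0, 0), (1, 1)]
print(is_interpolant(itp, phi_a, phi_b, ["x"], ["y"], ["c1", "c2"]))  # True
```

The resulting interpolant is c1 ⇔ c2: it mentions only the shared variables, follows from φ_A, and contradicts φ_B.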
MiniSat [Eén and Sörensson 2003] is used in this article to solve all formulas. It derives learned clauses from conflicts and records them to prevent the same conflict from arising again. It also provides an incremental SAT mechanism that shares learned clauses between related formulas so that they can be solved faster. This mechanism consists of two procedures: addClause(F), which adds a CNF formula F to the clause database, and solve(A), which solves the formulas in the clause database with a set of literals A as assumptions.
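The sketch below mimics the two incremental procedures with a brute-force toy solver. The class ToyIncrementalSolver and its method names are hypothetical, and this is not MiniSat's API (it also does no clause learning); it only illustrates the contract described above: clauses added with add_clause persist across calls, while the literals passed to solve act as temporary unit clauses for a single call.

```python
from itertools import product
from typing import Dict, List, Optional, Sequence


class ToyIncrementalSolver:
    """Brute-force stand-in for an incremental SAT solver (illustration only).

    Literals are non-zero ints in DIMACS style: v means variable v is 1, -v means 0.
    """

    def __init__(self, num_vars: int):
        self.num_vars = num_vars
        self.clauses: List[List[int]] = []
        self.model: Optional[Dict[int, int]] = None

    def add_clause(self, clause: Sequence[int]) -> None:
        """Permanently add one clause to the clause database."""
        self.clauses.append(list(clause))

    def solve(self, assumptions: Sequence[int] = ()) -> bool:
        """Solve the database, treating each assumption as a unit clause for this call only."""
        active = self.clauses + [[lit] for lit in assumptions]
        for bits in product((0, 1), repeat=self.num_vars):
            value = {v + 1: b for v, b in enumerate(bits)}
            if all(any(value[abs(lit)] == (lit > 0) for lit in c) for c in active):
                self.model = value
                return True
        self.model = None
        return False


solver = ToyIncrementalSolver(3)
solver.add_clause([1, 2])     # x1 ∨ x2
solver.add_clause([-1, 3])    # ¬x1 ∨ x3
print(solver.solve([-2]))      # True  (forces x1 = 1, hence x3 = 1)
print(solver.solve([-2, -3]))  # False (these assumptions conflict with the clauses)
print(solver.solve())          # True  (assumptions of earlier calls are gone)
```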
Finite State Machine
The encoder is modeled by a finite state machine (FSM) M […]. Its transition function T computes the next state vector and the output vector from the current state vector and input vector. The behavior of M can be reasoned about by unrolling its transition function for multiple steps. The state variable s ∈ s, input variable i ∈ i and output variable o ∈ o at the n-th step are denoted s_n, i_n and o_n, respectively. Furthermore, the state, input and output variable vectors at the n-th step are denoted s_n, i_n and o_n, respectively. A path is a state sequence <s_n, ..., s_m> […]
A loop is a path <s_n, ..., s_m> with s_n ≡ s_m.
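To illustrate unrolling and the loop definition, the following sketch unrolls a hypothetical 2-bit FSM (a modulo-4 counter of input 1s whose output is the low state bit) and reports the first pair n < m with s_n ≡ s_m.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Transition = Callable[[int, int], Tuple[int, int]]


def T(s: int, i: int) -> Tuple[int, int]:
    """Hypothetical FSM: (current state, input) -> (next state, output)."""
    return (s + i) % 4, s % 2


def unroll(trans: Transition, s0: int, inputs: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Unroll `trans` along `inputs`; return the path <s_0..s_n> and the outputs <o_0..o_{n-1}>."""
    states, outputs = [s0], []
    for i in inputs:
        s_next, o = trans(states[-1], i)
        states.append(s_next)
        outputs.append(o)
    return states, outputs


def find_loop(path: Sequence[int]) -> Optional[Tuple[int, int]]:
    """Return the first (n, m) with n < m and s_n ≡ s_m, i.e. a loop; None if loop-free."""
    first_seen = {}
    for m, s in enumerate(path):
        if s in first_seen:
            return first_seen[s], m
        first_seen[s] = m
    return None


states, outputs = unroll(T, 0, [1, 1, 0, 1, 1, 1])
print(states)             # [0, 1, 2, 2, 3, 0, 1]
print(outputs)            # [0, 1, 0, 0, 1, 0]
print(find_loop(states))  # (2, 3)
```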
The Halting Algorithm to Determine If an Input Variable Can Be Uniquely Determined by a Bounded Sequence of Output Variable Vectors
Shen et al. [2011] proposed the first halting algorithm, which iteratively unrolls the transition function. In each iteration, it uses two approximate approaches to determine the answer: an under-approximative one, presented in Section 2.3.1, and an over-approximative one, presented in Section 2.3.2. We show in Section 2.3.3 that these two approaches eventually converge to a conclusive answer.
2.3.1. The Under-Approximative Approach. As shown in Figure 2, on the unrolled transition functions, an input variable i ∈ i can be uniquely determined if there exist three integers p, l and r such that, for any particular valuation of the output sequence <o_p, ..., o_{p+l+r}>, i_{p+l} cannot be 0 and 1 at the same time. This is equivalent to the unsatisfiability of F_PC(p, l, r) in Eq. (1).
Here, p is the length of the prefix state transition sequence, while l and r are the lengths of the two output sequences <o_{p+1}, ..., o_{p+l}> and <o_{p+l+1}, ..., o_{p+l+r}> used to determine i_{p+l}. Line 1 of Eq. (1) corresponds to the left path in Figure 2, while Line 2 corresponds to the right path. These two paths have the same length.
Line 3 forces the output sequences of these two paths to be the same, while Line 4 forces their i_{p+l} and i'_{p+l} to be different. Lines 5 and 6 are the assertion predicates given by the user that constrain the valid valuations of i. PC in F_PC is the abbreviation of "parameterized complementary", which means that F_PC(p, l, r) is used to check whether the encoder's input can be uniquely determined with the three parameters p, l and r.
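For reference, one way to write the six lines of Eq. (1) described above is the following. This is our reconstruction from the prose, not a copy of the original equation; assertion(·) is a placeholder name for the user-supplied predicate, and T returns the next state and the output of one step.

```latex
\begin{aligned}
F_{PC}(p,l,r) :=\; & \bigwedge_{n=0}^{p+l+r} \left\{ (s_{n+1}, o_n) \equiv T(s_n, i_n) \right\} \\
\wedge\; & \bigwedge_{n=0}^{p+l+r} \left\{ (s'_{n+1}, o'_n) \equiv T(s'_n, i'_n) \right\} \\
\wedge\; & \bigwedge_{n=p}^{p+l+r} o_n \equiv o'_n \\
\wedge\; & i_{p+l} \not\equiv i'_{p+l} \\
\wedge\; & \bigwedge_{n=0}^{p+l+r} \mathit{assertion}(i_n) \\
\wedge\; & \bigwedge_{n=0}^{p+l+r} \mathit{assertion}(i'_n)
\end{aligned}
```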
According to Figure 2, the first three lines of Eq. (1) are two unrolled transition function sequences with the same output sequences. They can always be satisfied by the same input variable vectors and initial state vector. The last two lines are constraints on the input variable vectors, whose satisfiability we always check before running our algorithm. So the unsatisfiability of F_PC(p, l, r) […]. So, the bounded proof of F_PC(p, l, r) […] [Shen et al. 2011], while our algorithms always can. Second, ignoring initial states improves the decoder's reliability by making the decoder's output depend only on a bounded history of its input. Thus, any corrupted o can affect the decoder for only a finite number of steps.
Of course, ignoring initial states has one drawback: it is somewhat stronger than necessary. That is, it requires i to be uniquely determined on the larger set R_p of states reachable in p steps from any state, instead of on the smaller set R of states reachable from the initial states. So in some cases our algorithm may fail to handle properly designed encoders, although this has never happened on any of our benchmarks.
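The check on F_PC(p, l, r) can be reproduced by brute force on a tiny example. The sketch below uses a hypothetical flow-control encoder (not one of the benchmarks): its input is (f, d), its state is the previous input, and it emits the data symbol (1, d) when the stored f is 1 and the idle symbol (0, 0) otherwise. Like F_PC, it lets both paths start from any state and looks for two input sequences with identical outputs <o_p, ..., o_{p+l+r}> but different i_{p+l}; enumeration replaces the SAT solver, and no assertion is needed because every (f, d) combination is legal here.

```python
from itertools import product

# Hypothetical toy encoder with flow control: input i = (f, d); the state is
# the previous input; the output is (1, d) for data or (0, 0) for idle.
STATES = INPUTS = list(product((0, 1), repeat=2))


def T(s, i):
    """Transition function: (state, input) -> (next state, output)."""
    sf, sd = s
    return i, ((1, sd) if sf else (0, 0))


def outputs_of(s0, ins):
    s, outs = s0, []
    for i in ins:
        s, o = T(s, i)
        outs.append(o)
    return outs


def f_pc_satisfiable(var, p, l, r):
    """True iff F_PC(p, l, r) is satisfiable for input bit `var` (0 = f, 1 = d)."""
    values_seen = {}  # output window <o_p, ..., o_{p+l+r}> -> values of i_{p+l}
    for s0 in STATES:  # initial states are ignored: any state may start a path
        for ins in product(INPUTS, repeat=p + l + r + 1):
            window = tuple(outputs_of(s0, ins)[p:])
            values = values_seen.setdefault(window, set())
            values.add(ins[p + l][var])
            if len(values) == 2:  # same outputs, different i_{p+l}
                return True
    return False


print(f_pc_satisfiable(0, 1, 0, 0))  # True:  r = 0, so the window ends before f_{p+l} shows up
print(f_pc_satisfiable(0, 1, 0, 1))  # False: with r = 1, f is uniquely determined
print(f_pc_satisfiable(1, 1, 0, 1))  # True:  d is hidden whenever f = 0
```

Once r ≥ 1, f becomes uniquely determined, whereas d stays hidden behind the idle symbol however long the window is; that is the situation the loop-based check of the next subsection is designed to detect.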
2.3.2. The Over-Approximative Approach. For F_PC(p, l, r) presented in the last subsection, there are two possibilities:
(1) i_{p+l} can be uniquely determined by <o_p, ..., o_{p+l+r}> for some p, l and r;
(2) i_{p+l} cannot be uniquely determined by <o_p, ..., o_{p+l+r}> for any p, l and r.
In the first case, by iteratively increasing p, l and r, F_PC(p, l, r) eventually becomes unsatisfiable. But in the second case, this method never terminates.
So, to obtain a halting algorithm, we need to distinguish these two cases. One such solution is shown in Figure 3, which is similar to Figure 2 but adds three constraints that detect loops on the three state sequences <s_0, ..., s_p>, <s_{p+1}, ..., s_{p+l}> and <s_{p+l+1}, ..., s_{p+l+r}>. It is formally defined as F_LN(p, l, r) in Eq. (2), whose last three lines correspond to the three new loop-detection constraints. […] (1) On the one hand, if there exist such p, l and r, then let p := max(p, l, r), l := max(p, l, r) and r := max(p, l, r) […]. Both cases lead to the algorithm's termination.
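The loop constraints can also be illustrated by brute force on the same hypothetical flow-control encoder as before. This sketch follows our reading of Eq. (2): in addition to the F_PC constraints, each of the three segments must contain a loop of the paired state (s_x, s'_x), meaning the pair repeats within the segment. The encoder and all helper names are assumptions made for the example.

```python
from itertools import product

# Same hypothetical flow-control encoder: the state is the previous input (f, d);
# the output is (1, d) for a data symbol and (0, 0) for the idle symbol.
STATES = INPUTS = list(product((0, 1), repeat=2))


def T(s, i):
    sf, sd = s
    return i, ((1, sd) if sf else (0, 0))


def run(s0, ins):
    """Return the states s_0..s_{n-1} at which the inputs are applied, and the outputs."""
    states, outs, s = [], [], s0
    for i in ins:
        states.append(s)
        s, o = T(s, i)
        outs.append(o)
    return states, outs


def has_loop(pairs):
    """A loop exists iff some paired state (s_x, s'_x) repeats within the segment."""
    return len(set(pairs)) < len(pairs)


def f_ln_satisfiable(var, p, l, r):
    """True iff F_LN(p, l, r) is satisfiable for input bit `var` (0 = f, 1 = d)."""
    n = p + l + r + 1
    segments = [(0, p + 1), (p + 1, p + l + 1), (p + l + 1, n)]
    groups = {}  # output window -> (paths with i_{p+l} = 0, paths with i_{p+l} = 1)
    for s0 in STATES:
        for ins in product(INPUTS, repeat=n):
            states, outs = run(s0, ins)
            groups.setdefault(tuple(outs[p:]), ([], []))[ins[p + l][var]].append(states)
    for zeros, ones in groups.values():
        for a in zeros:
            for b in ones:
                pairs = list(zip(a, b))
                if all(has_loop(pairs[lo:hi]) for lo, hi in segments):
                    return True
    return False


print(f_ln_satisfiable(1, 1, 2, 2))  # True:  a looping witness exists for d
print(f_ln_satisfiable(0, 1, 2, 2))  # False: f is determined, so no witness exists
```

For d, the search finds two idle paths whose paired states repeat in every segment, which under our reading of F_LN classifies d as never uniquely determined; for f, no such pair exists.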
IDENTIFYING FLOW CONTROL VECTOR
We first introduce how to find the flow control vector f in Section 3.1, and then how to speed this up with incremental SAT in Section 3.2.
Finding Out Flow Control Vector f
To facilitate the presentation of our algorithm, we assume that the input variable vector i can be partitioned into the flow control vector f and the data vector d.
The flow control vector f indicates the validity of d. So, for a properly designed encoder, f should always be uniquely determined by a bounded sequence of the encoder's output vector o; otherwise, the decoder cannot recognize whether d is valid.
Thus, we propose Algorithm 2 to identify f. At Line 2, a while loop iterates over all i ∈ i. At Line 6, an input variable i that can be uniquely determined is added to the vector f. […] On the other hand, when i is very long, the runtime overhead of testing each i ∈ i one by one would also be very large. To speed up this testing procedure, when F_LN(p, l, r) is satisfiable at Line 8, every j ∈ i that has different values for j_{p+l} and j'_{p+l} in the satisfying assignment of F_LN(p, l, r) can also be moved to d at Line 10, because the same assignment also satisfies j's own F_LN(p, l, r).
In some particular cases, a data variable d ∈ d can be uniquely determined, just like a flow control variable f ∈ f. In this case, d may be identified as a flow control variable by Algorithm 2. But this doesn't harm our overall framework, because the decoder's Boolean function of d can still be correctly characterized in Section 6.
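Combining the two checks gives a brute-force sketch of Algorithm 2 on the hypothetical flow-control encoder used above, with p = l = r = k increased concurrently. It includes the speed-up just described: when F_LN is satisfiable, every undecided input bit whose two values differ at step p + l of the witness is moved to d. The enumeration stands in for SAT calls, and the cap max_k is an addition for this demo; in the real algorithm, termination follows from the halting argument of Section 2.3.

```python
from itertools import product

# Same hypothetical encoder: input i = (f, d), the state is the previous input,
# and the output is (1, d) for data or the idle symbol (0, 0).
STATES = INPUTS = list(product((0, 1), repeat=2))
NAMES = ("f", "d")


def T(s, i):
    sf, sd = s
    return i, ((1, sd) if sf else (0, 0))


def run(s0, ins):
    states, outs, s = [], [], s0
    for i in ins:
        states.append(s)
        s, o = T(s, i)
        outs.append(o)
    return states, outs


def has_loop(pairs):
    return len(set(pairs)) < len(pairs)


def witness(var, p, l, r, need_loops):
    """A satisfying pair of input sequences for F_PC (or F_LN if need_loops), else None."""
    n = p + l + r + 1
    segments = [(0, p + 1), (p + 1, p + l + 1), (p + l + 1, n)]
    groups = {}
    for s0 in STATES:
        for ins in product(INPUTS, repeat=n):
            states, outs = run(s0, ins)
            groups.setdefault(tuple(outs[p:]), ([], []))[ins[p + l][var]].append((ins, states))
    for zeros, ones in groups.values():
        for ins_a, st_a in zeros:
            for ins_b, st_b in ones:
                pairs = list(zip(st_a, st_b))
                if not need_loops or all(has_loop(pairs[lo:hi]) for lo, hi in segments):
                    return ins_a, ins_b
    return None


def identify_flow_control(max_k=4):
    """Sketch of Algorithm 2: split the input bits into flow control f and data d."""
    undecided, f_vars, d_vars = set(range(len(NAMES))), [], []
    for k in range(max_k + 1):              # p = l = r = k, increased concurrently
        for var in sorted(undecided):
            if var not in undecided:        # already moved to d by the speed-up
                continue
            if witness(var, k, k, k, need_loops=False) is None:
                f_vars.append(NAMES[var])   # F_PC unsatisfiable: uniquely determined
                undecided.discard(var)
                continue
            w = witness(var, k, k, k, need_loops=True)
            if w is not None:               # F_LN satisfiable: never determined
                ins_a, ins_b = w
                for j in sorted(undecided):  # speed-up: reuse the same assignment
                    if ins_a[2 * k][j] != ins_b[2 * k][j]:
                        d_vars.append(NAMES[j])
                        undecided.discard(j)
            # Otherwise both checks are inconclusive; retry with a larger k.
        if not undecided:
            break
    return f_vars, d_vars


print(identify_flow_control())  # (['f'], ['d'])
```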
Speeding Up with Incremental SAT
We can further speed up Algorithm 2 by employing MiniSat's incremental SAT mechanism mentioned in Section 2.1. F_PC(p, l, r) in Eq. (1) can be partitioned into the following two equations by moving its line that forces i_{p+l} and i'_{p+l} to differ into Eq. (4): […]
Similarly, we can partition F_LN(p, l, r) in Eq. (2) into two equations:
Obviously, C_PC and C_LN are independent of any particular i ∈ i, so they can be added to the clause database of the MiniSat solver by calling addClause(C_PC) or addClause(C_LN). At the same time, every clause in A_PC and A_LN contains only one literal, so these clauses can be used as the assumptions when calling the solve procedure of the MiniSat solver.
With these new equations, we can turn Algorithm 2 into Algorithm 3, which uses incremental SAT. The major changes are the two new addClause calls at Lines 4 and 9, and the two new solve calls at Lines 6 and 11. These are the procedures provided by MiniSat's incremental SAT mechanism mentioned in Section 2.1.
INFERRING PREDICATE THAT ENABLES THE ENCODER'S DATA VECTOR TO BE UNIQUELY DETERMINED
In Section 4.1, we propose an algorithm to characterize a Boolean function that makes a Boolean formula satisfiable. In Section 4.2, we apply this algorithm to infer valid(f), the predicate that enables d to be uniquely determined.
Characterizing a Function that Makes a Boolean Formula Satisfiable
For a particular Boolean relation R(a, b, t), we make the following two assumptions.
[…] In the remainder of this article, whenever we use the algorithm introduced in this section, we will show that these two assumptions are fulfilled.
We need to characterize a Boolean function FSAT_R(a) that covers all and only the valuations of a that can make R(a, b, 1) satisfiable. It is formally defined as:
FSAT_R(a) := ∃b. R(a, b, 1)    (7)
Thus, a naive algorithm for computing FSAT_R(a) is to enumerate all valuations of a and collect those that make R(a, b, 1) satisfiable. But the number of valuations to enumerate is 2^{|a|}, which prevents this algorithm from terminating within a reasonable time for a large a.
We can speed up this naive algorithm by expanding each valuation of a to a larger set with cofactoring [Ganai et al. 2004] and Craig interpolants [McMillan 2003]. Intuitively, assume that R(a, b, 1) is satisfiable with a satisfying assignment A : a ∪ b ∪ {t} → {0, 1}; then the following new formula can be constructed by cofactoring [Ganai et al. 2004]: […]
Based on the foregoing discussion, Algorithm 4 is proposed to characterize FSAT_R(a) in Eq. (7). Line 2 checks whether there is still some valuation of a that can make R(a, b, 1) satisfiable but hasn't been covered by FSAT_R(a). Lines 4 and 5 assign the value of b from the satisfying assignment to R(a, b, 1) and R(a, b, 0), respectively, so that b no longer appears as free variables in these two formulas.
Thus, φ_A ∧ φ_B in Line 6 is unsatisfiable, because b takes the same value in both formulas and t is uniquely determined by a and b; moreover, the common variables of φ_A and φ_B are a. So an interpolant ITP(a) can be generated with McMillan's algorithm [McMillan 2003]. ITP(a) is added to FSAT_R(a) at Line 7, so the valuations it covers are ruled out at Line 2 in later iterations.
Each iteration of the while loop in Algorithm 4 adds at least one new valuation of a to FSAT_R(a), so FSAT_R(a) covers a strictly increasing set of valuations of a, which is bounded by the 2^{|a|} possible valuations. So Algorithm 4 always halts.
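A minimal sketch of the loop in Algorithm 4, under our reading of Lines 2 to 7 and assuming, as the algorithm requires, that t is a function of a and b. Instead of McMillan's algorithm, it uses the weakest interpolant ¬φ_B, computed by enumeration, which covers every valuation of a that the fixed b makes satisfiable. The relation r_out is hypothetical.

```python
from itertools import product


def fsat(r_out, n_a, n_b):
    """Sketch of Algorithm 4 for a relation R(a, b, t) in which t = r_out(a, b).

    Returns the set of valuations of a covered by FSAT_R(a), i.e. those for which
    R(a, b, 1) is satisfiable, plus the number of loop iterations used.
    """
    all_a = list(product((0, 1), repeat=n_a))
    all_b = list(product((0, 1), repeat=n_b))
    covered, iterations = set(), 0
    while True:
        # Line 2: is there an uncovered a with some b making R(a, b, 1) true?
        hit = next(((a, b) for a in all_a if a not in covered
                    for b in all_b if r_out(a, b) == 1), None)
        if hit is None:
            return covered, iterations
        _, b_fixed = hit
        # Lines 4-6: phi_A = R(a, b_fixed, 1) and phi_B = R(a, b_fixed, 0) share only a,
        # and phi_A ∧ phi_B is UNSAT because t is a function of (a, b).
        # Stand-in for McMillan's algorithm: the weakest interpolant, ¬phi_B.
        itp = {a for a in all_a if r_out(a, b_fixed) != 0}
        covered |= itp  # Line 7: add ITP(a) to FSAT_R(a)
        iterations += 1


# Hypothetical relation: t = (a0 ∧ b0) ∨ (a1 ∧ ¬b0), so FSAT_R(a) = a0 ∨ a1.
r_out = lambda a, b: int((a[0] and b[0]) or (a[1] and not b[0]))
covered, iterations = fsat(r_out, 2, 1)
print(sorted(covered), iterations)  # [(0, 1), (1, 0), (1, 1)] 2
```

Here two iterations cover all three valuations in FSAT_R(a) = a0 ∨ a1, whereas the naive algorithm tests every one of the four valuations of a.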
Inferring valid( f ) that Enables d to be Uniquely Determined
As shown in Figure 4, we first define ¬FSAT_PC(p, l, r), a monotonically growing under-approximation of valid(f), in Section 4.2.1. Then we define ¬FSAT_LN(p, l, r), a monotonically shrinking over-approximation of valid(f), in Section 4.2.2. Finally, we show in Section 4.2.3 that they converge to valid(f). The correctness of this approach is proved in Section 4.3.
Computing Monotonically Growing Under-Approximation of valid(f)
[…] T_PC(p, l, r) by collecting the 3rd line of Eq. (9): […]
By substituting T_PC(p, l, r) […], we have a new formula: […]
Obviously, […]
Thus, a ∪ b is the vector that contains all the input variable vectors <i_0, ..., i_{p+l+r}> and <i'_0, ..., i'_{p+l+r}> at all steps of the two unrolled transition function sequences. It also contains the two initial states s_0 and s'_0. In addition, the transition function T in the first two lines of Eq. (11) computes the next state and the output variable vector from the current state and input variable vector. So a and b uniquely determine the value of t in F^d_PC(p, l, r, t). That is, with a defined in Eq. (12), b defined in Eq. (13) and F^d_PC(p, l, r, t) as R(a, b, t), Assumptions 1 and 2 in Section 4.1 are both fulfilled. Thus, for a particular combination of p, l and r, the Boolean function over f_{p+l} that makes F^d_PC(p, l, r, 1) […] (p, l, r, t), a and b defined previously: […]
So FSAT_PC(p, l, r) […]
[…] T_LN(p, l, r) by collecting the third line and the last three lines of Eq. (15):
By replacing the third line and the last three lines of Eq. (15) with T_LN(p, l, r), we get […]
Obviously […]. This is shown intuitively in Figure 4. […] FSAT_PC(p, l, r) and FSAT_LN(p, l, r) converge. In this case, ¬FSAT_PC(p, l, r) is returned as valid(f).
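For the hypothetical flow-control encoder used earlier, the under-approximation can be computed by enumeration for one fixed (p, l, r). This sketch only illustrates FSAT_PC, the set of f_{p+l} valuations under which d_{p+l} is not uniquely determined, and takes its complement as the candidate valid(f); it implements neither Algorithm 4 nor the FSAT_LN side.

```python
from itertools import product

# Same hypothetical encoder: input (f, d), state = previous input,
# output (1, d) for data or (0, 0) for idle.
STATES = INPUTS = list(product((0, 1), repeat=2))


def T(s, i):
    sf, sd = s
    return i, ((1, sd) if sf else (0, 0))


def outputs_of(s0, ins):
    s, outs = s0, []
    for i in ins:
        s, o = T(s, i)
        outs.append(o)
    return outs


def fsat_pc(p, l, r):
    """Valuations of f_{p+l} under which d_{p+l} is NOT uniquely determined.

    Keying on f_{p+l} assumes f was already shown to be uniquely determined for
    these p, l, r (Section 3), so both paths of a witness share f_{p+l}.
    """
    seen, bad = {}, set()
    for s0 in STATES:
        for ins in product(INPUTS, repeat=p + l + r + 1):
            f_val, d_val = ins[p + l]
            key = (tuple(outputs_of(s0, ins)[p:]), f_val)
            seen.setdefault(key, set()).add(d_val)
            if len(seen[key]) == 2:
                bad.add(f_val)
    return bad


bad = fsat_pc(1, 0, 1)
valid_f = {v for v in (0, 1) if v not in bad}  # ¬FSAT_PC as a set of valuations
print(bad, valid_f)  # {0} {1}
```

The result valid(f) = {f_{p+l} ≡ 1} is a local predicate: it only constrains step p + l, so idle steps with f ≡ 0 may still occur elsewhere in the sequence, unlike a global assertion in the style of Shen et al. [2012].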
The Algorithm to Compute valid(f)
The proofs of its termination and correctness are given in the next subsection.
Proofs of Termination and Correctness
First, we need to prove the following three lemmas. 
As shown in Figure 5, […] (p, l, r, 1) by aligning the (p' + l')-th step to the (p + l)-th step and discarding the two prefix and postfix state transition sequences. Formally, for each […] (p, l, r, 1). By restricting the domain of this assignment to f_{p+l}, we get the fourth satisfying assignment, over f_{p+l} → B. According to the mapping presented previously, it is identical to the assignment we started from. Thus, every assignment covered by FSAT_PC(p', l', r') is also covered by FSAT_PC(p, l, r).
Thus, FSAT_PC(p, l, r) monotonically decreases with respect to p, l and r. […] FSAT_LN(p, l, r).
As shown in Figure 6, we can map […] (p, l, r) that makes d NOT uniquely determined for some particular p', l' and r'. So the only-covering case is proved.
MINIMIZING THE LENGTH OF UNROLLED TRANSITION FUNCTION
We first explain why and how to remove the redundancy of l and r in Section 5.1, then present another possible structure of our algorithm in Section 5.2, and discuss why we chose the approach of Section 5.1 over that of Section 5.2.
Minimizing the Valuation of l and r
Because Algorithm 5 increases p, l and r concurrently, their values may contain some redundancy, leading to unnecessary overhead in the decoder's area and delay. According to Figure 2, r affects both the decoder's delay and area, l affects only the decoder's area, and p doesn't affect the decoder at all.
So, as shown in Algorithm 6, we choose to first minimize r and then l. To simplify the presentation, we only introduce the r case. At Line 2, when F_PC(p, l, r − 1) ∧ valid(f_{p+l}) is satisfiable, r is the smallest value that makes it unsatisfiable, so we return it directly. On the other hand, if r ≡ 0 still makes F_PC(p, l, r) ∧ valid(f_{p+l}) unsatisfiable at Line 4, we directly return 0.
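A sketch of this minimization, under stated assumptions: check(p, l, r) is a hypothetical callback that returns True when F_PC(p, l, r) ∧ valid(f_{p+l}) is unsatisfiable, and unsatisfiability is assumed to persist as r or l grows. The two early exits mirror Lines 2 and 4 described above; the binary search for the remaining case is our assumption, since the text does not say how Algorithm 6 handles it.

```python
from typing import Callable, Tuple

# check(p, l, r) is True iff F_PC(p, l, r) ∧ valid(f_{p+l}) is UNSATISFIABLE,
# i.e. d can be uniquely determined with these parameters (hypothetical callback).
Check = Callable[[int, int, int], bool]


def minimize_param(is_unsat: Callable[[int], bool], hi: int) -> int:
    """Return the smallest x in [0, hi] with is_unsat(x).

    Assumes is_unsat(hi) holds and that is_unsat stays True as x grows.
    """
    if hi == 0 or not is_unsat(hi - 1):  # like Line 2: hi - 1 is SAT, so hi is minimal
        return hi
    if is_unsat(0):                       # like Line 4: even 0 is enough
        return 0
    lo = 0                                # invariant: is_unsat(lo) False, is_unsat(hi) True
    while hi - lo > 1:                    # assumed binary search for the remaining case
        mid = (lo + hi) // 2
        if is_unsat(mid):
            hi = mid
        else:
            lo = mid
    return hi


def minimize_l_r(check: Check, p: int, l: int, r: int) -> Tuple[int, int, int]:
    """Minimize r first (it affects delay and area), then l (area only); keep p."""
    r = minimize_param(lambda x: check(p, l, x), r)
    l = minimize_param(lambda x: check(p, x, r), l)
    return p, l, r


# Hypothetical monotone check: d is determined once r >= 2 and l + r >= 3.
toy_check = lambda p, l, r: r >= 2 and l + r >= 3
print(minimize_l_r(toy_check, 5, 5, 5))  # (5, 1, 2)
```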
An Alternative Approach for Comparison
In the previous discussion, we first find the flow control vector by increasing p, l and r concurrently in Algorithm 3, and then minimize their values with Algorithm 6. This approach needs to call the SAT solver O(n) times, where n = max(p, l, r).
Another possible way to do this job in Algorithm 3 is to increase p, l and r one by one with three nested loops, instead of concurrently. This approach needs to call the SAT solver O(n^3) times. We will show in Section 7.6 that, although increasing them separately is faster on small benchmarks, increasing p, l and r concurrently and then minimizing them with Algorithm 6 is faster on larger ones, such as T2Eth.
CHARACTERIZING THE DECODER'S BOOLEAN FUNCTION
The encoder's input vector i has been partitioned into the flow control vector f and the data vector d. The algorithms to characterize their Boolean functions are different, so they are discussed separately in the following two subsections.
Characterizing the Decoder's Boolean Function that Computes f
Each variable f ∈ f can be uniquely determined by a bounded sequence of the encoder's output. So, for each particular valuation of the encoder's output sequence <o_p, ..., o_{p+l+r}>, f_{p+l} cannot be 0 and 1 at the same time. Thus, the decoder's Boolean function that computes f_{p+l} is given by the Craig interpolant of φ_A with respect to φ_B: […]
It is obvious that φ_A ∧ φ_B equals F_PC(p, l, r) […]. To speed up characterizing the decoder's Boolean functions for all f ∈ f, we can employ MiniSat's incremental SAT mechanism again by:
(1) removing f_{p+l} ≡ 1 from φ_A and f'_{p+l} ≡ 0 from φ_B;
(2) adding φ_A ∧ φ_B to MiniSat's clause database;
(3) solving with the assumptions f_{p+l} ≡ 1 and f'_{p+l} ≡ 0 for each f ∈ f, and generating the Craig interpolants separately.
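On the hypothetical flow-control encoder used earlier, the function that the interpolant represents can be tabulated by enumeration: every reachable output window maps to exactly one value of f_{p+l}, precisely because F_PC(p, l, r) is unsatisfiable. A Craig interpolant yields a compact circuit for the same mapping and also assigns values to unreachable windows; this table is only an illustration.

```python
from itertools import product

# Hypothetical flow-control encoder: input (f, d), state = previous input,
# output (1, d) for a data symbol or (0, 0) for the idle symbol.
STATES = INPUTS = list(product((0, 1), repeat=2))


def T(s, i):
    sf, sd = s
    return i, ((1, sd) if sf else (0, 0))


def outputs_of(s0, ins):
    s, outs = s0, []
    for i in ins:
        s, o = T(s, i)
        outs.append(o)
    return outs


def decoder_table(var, p, l, r):
    """Map each reachable window <o_p, ..., o_{p+l+r}> to the value of i_{p+l}[var]."""
    table = {}
    for s0 in STATES:
        for ins in product(INPUTS, repeat=p + l + r + 1):
            window = tuple(outputs_of(s0, ins)[p:])
            value = ins[p + l][var]
            if table.setdefault(window, value) != value:
                raise ValueError("not uniquely determined: F_PC(p, l, r) is satisfiable")
    return table


f_decoder = decoder_table(0, 1, 0, 1)  # f_{p+l} with p = 1, l = 0, r = 1
# Here f_{p+l} is simply the first bit of the last output o_{p+l+1} in the window.
print(all(f == window[-1][0] for window, f in f_decoder.items()))  # True
```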
Characterizing the Decoder's Boolean Function that Computes d
Assume that the predicate over f inferred by Algorithm 5 is valid(f). Let's define the following two formulas for each data variable d ∈ d: […] Furthermore, when valid(f_{p+l}) doesn't hold, the data variable d ∈ d_{p+l} cannot be uniquely determined, so no function can calculate its value. But this isn't a problem, because in this case the decoder is supposed to recover only the value of the flow control vector f and ignore the exact value of d.
Similarly, we can also use the incremental SAT approach in Section 6.1 to speed up characterizing the decoder's Boolean function for all d ∈ d.
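The same enumeration illustrates the data decoder: it tabulates d_{p+l} only for steps where valid(f_{p+l}) holds (assumed here to be f ≡ 1) and treats all other steps as don't-cares, matching the discussion above. The encoder and the default valid predicate are assumptions for the example.

```python
from itertools import product

# Hypothetical flow-control encoder: input (f, d), state = previous input,
# output (1, d) for a data symbol or (0, 0) for the idle symbol.
STATES = INPUTS = list(product((0, 1), repeat=2))


def T(s, i):
    sf, sd = s
    return i, ((1, sd) if sf else (0, 0))


def outputs_of(s0, ins):
    s, outs = s0, []
    for i in ins:
        s, o = T(s, i)
        outs.append(o)
    return outs


def data_decoder_table(p, l, r, valid=lambda f: f == 1):
    """Map windows <o_p, ..., o_{p+l+r}> to d_{p+l}, only where valid(f_{p+l}) holds."""
    table = {}
    for s0 in STATES:
        for ins in product(INPUTS, repeat=p + l + r + 1):
            f_val, d_val = ins[p + l]
            if not valid(f_val):
                continue  # d is a don't-care: the decoder only recovers f here
            window = tuple(outputs_of(s0, ins)[p:])
            if table.setdefault(window, d_val) != d_val:
                raise ValueError("d is not uniquely determined even when valid(f) holds")
    return table


d_decoder = data_decoder_table(1, 0, 1)
# When f_{p+l} = 1, the next output is the data symbol (1, d_{p+l}).
print(all(window[-1] == (1, d) for window, d in d_decoder.items()))  # True
```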
EXPERIMENTAL RESULTS
We have implemented these algorithms in OCaml and solved the generated CNF formulas with MiniSat 1.14 [Eén and Sörensson 2003]. All experiments were run on a server with 16 Intel Xeon E5648 processors at 2.67GHz, 192GB memory, and CentOS 5.4 Linux. Table I shows all benchmarks used in this article. They come from our previous article [Shen et al. 2012] and from Liu et al. [2012]. The columns of Table I show, respectively, the number of inputs, outputs, registers and gates of each benchmark. The area column shows the area of the encoder when mapped to the mcnc.genlib library by ABC [Berkeley Logic Synthesis and Verification Group 2008] with the script "strash; dsd; strash; dc2; dc2; dch; map". In the remainder of this article, all areas, gate numbers and delays are obtained in the same setting, so that we can compare them to those of Liu et al. [2012]. The last column of Table I shows where we present each benchmark's experimental results.
(1) By studying the five benchmarks used in our previous article [Shen et al. 2012], we found that three of them have built-in flow control mechanisms. This isn't a surprise, because these benchmarks all come from real industrial projects. These three benchmarks are presented in Sections 7.2, 7.3 and 7.4.
(2) For all other benchmarks in Table I, which have no flow control mechanism, if their input variables can always be uniquely determined, our algorithm recognizes all their input variables as flow control variables and directly generates their decoders' Boolean functions. Their experimental results are presented in Section 7.5.
In Section 7.6, we compare the runtime overhead of two different approaches:
(1) increasing p, l and r concurrently in Algorithm 5, and then minimizing them with Algorithm 6;
(2) increasing p, l and r separately with three nested loops in Algorithm 5.
In Section 7.7, we compare the runtimes, areas and delays of the two approaches that do and do not minimize l and r with Algorithm 6.
Finally, in Section 7.8, we compare the area and delay of the decoders generated by our algorithm with those of manually written decoders.
[…] Table II. According to the 8b/10b encoding scheme [Widmer and Franaszek 1983], when TXDATAK ≡ 0, TXDATA can be any value. But when TXDATAK ≡ 1, TXDATA can only be 1C, 3C, 5C, 7C, 9C, BC, DC, FC, F7, FB, FD or FE. So we write an assertion to rule out the combinations not in this scheme. Algorithm 2 identifies f := CNTL TXEnable P0 in 0.49 seconds. Algorithm 5 infers valid(f) := CNTL TXEnable P0 in 1.21 seconds. Algorithm 6 obtains the minimized p := 4, l := 0 and r := 2 in 0.68 seconds. Finally, generating the decoder's Boolean functions for CNTL TXEnable P0, TXDATA and TXDATAK takes 0.28 seconds. The decoder contains 156 gates and 0 registers, with area 366 and delay 7.6.
The major breakthrough of this article's algorithms is their ability to handle invalid data vectors. So it is interesting to show how the invalid data vector is mapped to the output variable vector o. By studying the source code of this encoder, we find that when and only when CNTL TXEnable P0 ≡ 0 holds, that is, when TXDATA and TXDATAK are invalid, the electrical idle output variable HSS TXELECIDLE becomes 1. So the decoder can use the output variable HSS TXELECIDLE to uniquely determine the value of the flow control variable CNTL TXEnable P0.
10G Ethernet Encoder XGXS
This encoder is compliant with clause 48 of IEEE 802.3 [IEEE 2012]. It has 214 lines of Verilog. Its input and output variables are shown in Table III. This encoder also employs an 8b/10b encoding scheme [Widmer and Franaszek 1983], with two inputs: the 8-bit encode data in to be encoded, and the 1-bit konstant indicating a control character. According to the coding table in Widmer and Franaszek [1983], when konstant ≡ 0, encode data in can be any value. But when konstant ≡ 1, encode data in can only be 1C, 3C, 5C, 7C, 9C, BC, DC, FC, F7, FB, FD or FE. So we write an assertion to exclude the combinations that aren't in this table.
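The input assertion described above can be written as a simple predicate. The sketch below is a hypothetical Python rendering (the article does not show the syntax of its assertions); it admits every 8-bit value when konstant ≡ 0 and only the twelve listed control codes when konstant ≡ 1.

```python
# The twelve 8b/10b control (K) codes listed in the text.
K_CODES = frozenset({0x1C, 0x3C, 0x5C, 0x7C, 0x9C, 0xBC, 0xDC, 0xFC,
                     0xF7, 0xFB, 0xFD, 0xFE})


def input_assertion(data: int, konstant: int) -> bool:
    """Assertion on one step's input: any 8-bit data when konstant == 0,
    only one of the K codes when konstant == 1."""
    if not 0 <= data <= 0xFF or konstant not in (0, 1):
        return False
    return konstant == 0 or data in K_CODES


# Admissible (data, konstant) combinations per step: 256 data values + 12 K codes.
admissible = sum(input_assertion(d, k) for d in range(256) for k in (0, 1))
print(admissible)  # 268
```

Conjoining this predicate over every unrolled step plays the role of the assertion lines in F_PC(p, l, r): it removes the 244 illegal control combinations per step from consideration.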
Algorithm 2 […]
Although this encoder uses the same coding mechanism as the PCI Express 2.0 encoder mentioned previously, it handles the invalid data vector differently. This encoder doesn't have a separate output variable to indicate the validity of the output data; instead, the validity and the exact value of all input variables are both encoded in encode data out. By studying this encoder's source code, we find that when and only when bad code ≡ 1, that is, when encode data in and konstant are invalid, the output variable encode data out becomes 0010111101. So the decoder can use the output variable encode data out to uniquely determine the flow control variable bad code.
UltraSPARC T2 Ethernet Encoder
This encoder comes from the UltraSPARC T2 open source processor. It is compliant with clause 36 of IEEE 802.3 standard [IEEE 2012] , and has 864 lines of verilog.
Its input and output variables are shown in Table IV. This encoder also employs an 8b/10b encoding scheme [Widmer and Franaszek 1983], but with yet another flow control mechanism that is significantly different from those of the previous two encoders. The data to be encoded is the 8-bit txd, but there is no standalone variable to indicate a control symbol. Instead, there is only a 4-bit tx enc ctrl sel that defines the action to be performed, as shown in Table V. Obviously, the functionalities of control symbol indication and flow control are combined in tx enc ctrl sel. The last four cases in Table V can never be uniquely determined, because they cannot be distinguished from the case 'PCS ENC DATA. So we write an assertion to rule them out.
Algorithm 2 […] Liu et al. [2012]. By comparing the sum of the second and third columns with the sixth column, we find that Liu et al. [2012] is much faster than our algorithm, especially with respect to the second column. We believe this is because our algorithm spends a lot of time checking whether each input variable i ∈ i can be uniquely determined, while Liu et al. [2012] can check all input variables with a single SAT call.
For most benchmarks, our decoders' area and delay are similar to those of Liu et al. [2012]. For HM(15,11), our decoder is much larger than that of Liu et al. [2012]. We attribute this to the immature implementation of our interpolant algorithm, which could be improved by porting similar code from other formal tools. For CC 4, our decoder is also much larger and slower than that of Liu et al. [2012]; we explain this issue in more detail in the following two sections.
Comparing Runtime Overhead of Increasing p, l and r Concurrently and Separately
Algorithm 3 increases p, l and r concurrently, and then reduces them with Algorithm 6; we call this case A1. Section 5.2 shows another possibility that increases p, l and r separately; we call it case A2. We compare these two cases in Table VII.
By comparing the total runtimes in columns 6 and 11, A2 is faster than A1 in most cases. But this doesn't mean that we should use A2 instead of A1.
According to Section 5.2, A1 needs to call the SAT solver O(n) times, where n = max(p, l, r), while A2 needs to call it O(n^3) times. For benchmarks with small n, the two approaches differ little in this overhead, but for benchmarks with larger n, such as T2Eth, the difference is significant: A2 beats A1 on smaller circuits, while A1 wins on larger ones. So we chose A1.
CC 4 is the only benchmark with different p, l and r in the second and seventh columns. This is caused by the different orders of expanding and reducing l and r. In the A1 case, its decoder contains 14 registers and 206 gates, with area 490 and delay 13.3. In the A2 case, its decoder contains 10 registers and 61 gates, with area 154 and delay 9.6, which is comparable to that of Liu et al. [2012] in Table VI. So here the A2 case is much better than A1. But this still doesn't mean that we need to increase p, l and r separately; we return to this point in the next section and explain why.
The experimental results are presented in Table VIII. When Algorithm 6 isn't used, the second to sixth columns give, respectively, the p, l and r values, the runtime to generate the decoder, the decoder area, the number of registers, and the maximal logic delay. When Algorithm 6 is used, these results are presented again in the last five columns, while the seventh column presents the time used to minimize l and r. Comparing columns 2-6 with columns 8-12 shows that the decoders' area and number of registers are significantly reduced, at the cost of the significant runtime overhead for reducing l and r shown in the seventh column.
Note the CC 4 benchmark again: from the fourth to sixth columns, we find that its area and delay are similar to the A2 case in the previous section, while its p, l and r values are similar to the A1 case. This means that the decoder of CC 4 has at least three significantly different implementations: area 154 in the A2 case of the previous section, area 365 in the tenth column of Table VIII, and area 490 in the A1 case of the previous section. The smallest one, with area 154 and delay 9.6, is comparable to that of Liu et al. [2012] shown in Table VI. Which decoder implementation is finally generated for CC 4 is determined by unstable implementation details of the SAT solver and the Craig interpolant algorithm. This resolves the confusion in the previous section: the circuit quality degradation in the A1 case isn't caused by increasing p, l and r concurrently. So we should still increase them concurrently and then minimize them with Algorithm 6.
Comparing Area and Delay for Our Generated Decoders and Manually Written Decoders
The experimental results are presented in Table IX. All our decoders are faster and smaller than (or at least comparable to) the manually written ones.
RELATED PUBLICATIONS

Complementary Synthesis
The first complementary synthesis algorithm was proposed by Shen et al. [2009]. It checks the decoder's existence by iteratively increasing the bound of the unrolled transition function sequence, and it generates the decoder's Boolean function by enumerating all satisfying assignments of the decoder's output. Its major shortcomings are that it may not halt and that it incurs a large runtime overhead in building the decoder.
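The bounded existence check described above can be illustrated on a toy encoder small enough to enumerate exhaustively. The sketch below replaces the SAT solver with brute-force enumeration, and the differential encoder `nrzi_encoder` and the helper names are hypothetical examples, not benchmarks from this article. The input at step t is recoverable exactly when no two runs, starting from arbitrary initial states, produce the same output window with different inputs at step t; the `table` built along the way plays the role of the decoder's Boolean function.

```python
from itertools import product
from typing import Callable, Dict, Optional, Tuple

Bits = Tuple[int, ...]
Encoder = Callable[[int, Bits], Bits]

def nrzi_encoder(state: int, inputs: Bits) -> Bits:
    """Toy differential encoder: o_t = i_t XOR o_{t-1}; the 1-bit state holds o_{t-1}."""
    outputs = []
    for bit in inputs:
        state = bit ^ state
        outputs.append(state)
    return tuple(outputs)

def input_determined(encoder: Encoder, n_steps: int, t: int,
                     states: Tuple[int, ...] = (0, 1)) -> bool:
    """True if, for runs of n_steps cycles from any initial state, the input at
    step t is a function of the output window alone."""
    table: Dict[Bits, int] = {}  # output window -> input at step t (decoder truth table)
    for state in states:
        for inputs in product((0, 1), repeat=n_steps):
            window = encoder(state, inputs)
            # Equal outputs but different inputs at step t: no decoder exists.
            if table.setdefault(window, inputs[t]) != inputs[t]:
                return False
    return True

def find_bound(encoder: Encoder, max_steps: int) -> Optional[Tuple[int, int]]:
    """Increase the unrolling bound until the input at some step t is determined."""
    for n_steps in range(1, max_steps + 1):
        for t in range(n_steps):
            if input_determined(encoder, n_steps, t):
                return n_steps, t
    return None
```

Here `find_bound(nrzi_encoder, 4)` returns (2, 1): with one cycle the unknown initial state hides the input, but two cycles suffice because i_1 = o_1 XOR o_0. The exhaustive loop grows exponentially with the bound, which mirrors the runtime problem noted above.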
The halting problem was independently tackled by Shen et al. [2011] and Liu et al. [2011] by searching for loops in the state sequence, while the runtime overhead problem was addressed by Shen et al. [2012] and Liu et al. [2011] with Craig interpolation [McMillan 2003]. Shen et al. [2012] automatically inferred an assertion on configuration pins under which the decoder exists. This can be seen as a special case of Algorithm 5, with the restriction that the inferred assertion must hold at all steps. Our Algorithm 5, in contrast, is the first algorithm that allows steps with and without the inferred assertion to be interleaved freely. Tu and Jiang [2013] take the encoder's initial states into consideration with property directed reachability analysis [Eén et al. 2011], so that the encoder's infinite history can be used to generate the decoder's output. Their algorithm can handle some encoders that cannot be handled by the other state-of-the-art algorithms.
Program Inversion
Program inversion derives a program $P^{-1}$ that undoes the computation of a given program $P$, so it is very similar to complementary synthesis.
The initial work on program inversion [Dijkstra 1979] used proof-based approaches, which could handle only very small programs and very simple syntactic structures. Glück and Kawabe [2005] inverted first-order functional programs by eliminating nondeterminism with LR-based parsing methods. However, the functional languages used in that work are incompatible with our complementary synthesis setting. Srivastava et al. [2011] assumed that an inverse program is related to the original program, so the space of possible inversions can be inferred by automatically mining the original program for expressions, predicates, and control flow. Their algorithm inductively rules out invalid paths that cannot fulfill the inversion requirement until only the valid ones remain. Consequently, it cannot guarantee the correctness of its solution if its assumptions do not hold.
Satisfying Assignments Enumeration
Some algorithms try to enumerate all satisfying assignments faster by enlarging each complete satisfying assignment. McMillan [2002] constructs an alternative implication graph in the SAT solver, which records the reasoning that leads to the assignment of a particular variable; all variables outside this graph can be removed from the complete assignment. In Ravi and Somenzi [2004] and Chauhan et al. [2004], the variables whose absence cannot make obj ≡ 0 satisfiable are removed one by one. In Shen et al. [2005] and Jin and Somenzi [2005], conflict-analysis-based approaches are used to remove multiple irrelevant variables in one SAT run. In Grumberg et al. [2004], the variable set is divided into an important subset and an unimportant subset, and variables in the important subset have higher decision priority than the unimportant ones. Thus, the important subset forms a search tree, with each leaf being another search tree for the unimportant subset. Ganai et al. [2004] quantify out unimportant variables by setting them to the constant values returned by the SAT solver.
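The enumerate-and-enlarge idea attributed above to Ravi and Somenzi [2004] and Chauhan et al. [2004] can be sketched as follows, with the SAT solver replaced by exhaustive enumeration so that the example stays self-contained. Each new satisfying assignment is enlarged into a cube by dropping variables one at a time whenever the remaining partial assignment still implies the target function f (assumed here to play the role of obj, so the check corresponds to "cube AND NOT f is unsatisfiable"), and later assignments already inside a cube are skipped, as blocking clauses would do. All names are illustrative assumptions.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Assignment = Tuple[int, ...]
Cube = Dict[int, int]  # variable index -> fixed Boolean value

def implies(cube: Cube, f: Callable[[Assignment], bool], n: int) -> bool:
    """True if every completion of the partial assignment satisfies f,
    i.e. cube AND NOT f is unsatisfiable."""
    free = [v for v in range(n) if v not in cube]
    for values in product((0, 1), repeat=len(free)):
        full = dict(cube)
        full.update(zip(free, values))
        if not f(tuple(full[v] for v in range(n))):
            return False
    return True

def covered(assignment: Assignment, cubes: List[Cube]) -> bool:
    """True if some previously found cube already contains this assignment."""
    return any(all(assignment[v] == b for v, b in c.items()) for c in cubes)

def enumerate_and_enlarge(f: Callable[[Assignment], bool], n: int) -> List[Cube]:
    cubes: List[Cube] = []
    # Exhaustive scan stands in for repeated SAT calls.
    for assignment in product((0, 1), repeat=n):
        if not f(assignment) or covered(assignment, cubes):
            continue
        cube: Cube = dict(enumerate(assignment))
        for v in range(n):  # try to drop variables one at a time
            trial = {u: b for u, b in cube.items() if u != v}
            if implies(trial, f, n):
                cube = trial
        cubes.append(cube)
    return cubes
```

For `f = lambda a: a[0] == 1 or (a[1] == 1 and a[2] == 1)` over three variables, `enumerate_and_enlarge(f, 3)` returns `[{1: 1, 2: 1}, {0: 1}]`: two cubes cover all five satisfying assignments, so only two enumeration steps are needed instead of five.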
Other algorithms construct interpolants to cover more satisfying assignments. Jiang et al. [2009] construct a first formula that contradicts a second one, from which an interpolant can be derived and used as an over-approximation of the first formula. Chockler et al. [2012] generate interpolants with a framework similar to the iterative enumerate-and-enlarge approaches mentioned previously, but with two enlarging steps for each enumerated assignment, in which the assignment is enlarged with respect to each of the two formulas involved in constructing the interpolant. Theirs is the first work that constructs interpolants without a proof.
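The properties these approaches rely on, namely that an interpolant over-approximates the first formula, contradicts the second, and mentions only their shared variables, can be checked exhaustively for small formulas. The sketch below is only a definition checker, assuming formulas are Python predicates over a tuple of bits; it does not reproduce the constructions of Jiang et al. [2009] or Chockler et al. [2012].

```python
from itertools import product
from typing import Callable, Set, Tuple

Formula = Callable[[Tuple[int, ...]], bool]

def is_interpolant(a: Formula, b: Formula, i: Formula, n: int, shared: Set[int]) -> bool:
    """Exhaustively check that i is a Craig interpolant for the pair (a, b)."""
    points = list(product((0, 1), repeat=n))
    for x in points:
        if a(x) and not i(x):
            return False  # a must imply i (i over-approximates a)
        if i(x) and b(x):
            return False  # i AND b must be unsatisfiable
    # i may depend only on the variables shared by a and b.
    for x in points:
        for v in range(n):
            if v not in shared:
                y = list(x)
                y[v] ^= 1
                if i(x) != i(tuple(y)):
                    return False
    return True
```

For a = x0 ∧ x1 and b = ¬x1 ∧ x2, whose only shared variable is x1, the formula x1 passes the check; a itself fails because it mentions x0, and the constant true fails because it is consistent with b.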
CONCLUSIONS
In this article, we propose, for the first time, a framework to handle the flow control mechanism in the complementary synthesis problem. Experimental results indicate that our framework can successfully handle many complex encoders from real industrial projects, such as PCI Express [PCI-SIG 2009] and Ethernet [IEEE 2012].
