Abstract-The rapid advancement of synthetic biology shows promising potential in biomedical and other applications. Recently, recombinases were proposed as a tool to engineer genetic logic circuits with long-term memory in living cells, including mammalian cells. The technology is under active development, and the complexity of engineered genetic circuits grows continuously. However, how to minimize a genetic circuit composed of recombinase-based logic gates remains largely open. In this paper, we formulate the problem as an assignment problem solvable in cubic time and solve it with a 0/1-ILP solver to minimize the DNA sequence length of genetic circuits. Experimental results show that our optimization method effectively reduces the sequence length, which may be crucial to enable practical realization of complex genetic circuits.
I. INTRODUCTION
Synthetic biology aims at engineering living organisms to behave in some desired manner [1]. Its rapid advancements have shown promising applications in biomedicine, bioenergy, and other areas of societal interest. Recombinases, a class of genetic recombination enzymes common in nature that control gene expression and modify genome structures in living organisms, have been exploited as a biotechnology for genetic engineering [2], [3]. A recombinase binds to a DNA strand at its specific target sites with respect to its unique recognition nucleic acid sequences. It may enable excision/insertion, inversion, translocation, and cassette exchange on DNA sequences. The inversion function of recombinases has been exploited to construct two-input logic gates in Escherichia coli cells with long-term memory [3]. It provides a fundamental tool for logic circuit synthesis using recombinases. In [4], two mechanisms, tyrosine recombinase DNA excision and serine recombinase DNA inversion, are exploited to implement simple arithmetic logic units and Boolean functions such as full-adder, subtractor, and decoder in mammalian cells. Moreover, with the help of genome editing tools such as CRISPR/Cas9 systems [5], complex genetic circuits have great potential to be implemented in the future. Automation of large-scale genetic circuit synthesis and optimization thus becomes indispensable.
In the prior work [6], a library consisting of 44 recombinase-based gates with up to three inputs is built, and existing logic synthesis tools are adopted to perform optimization and technology mapping for genetic circuit construction. There are several shortcomings in the prior approach [6]. Conventional technology mapping algorithms often assume that a logic gate has a single output. In fact, as we shall show later, it is natural to consider a genetic logic gate with multiple outputs. Technology mapping under the single-output assumption may yield sub-optimal results. In this work, we aim to overcome the above shortcomings. Our main results include formulating the gate merging problem targeting multi-output gate utilization, building a library of merge-friendly standard cells for logic synthesis of recombinase-based genetic circuits, showing the tractability (cubic time complexity) of gate merging, and solving it with 0/1-integer linear programming (0/1-ILP). Experimental results show promising reduction in the DNA sequence length and the number of levels of the synthesized genetic circuits. Note that minimizing the total DNA sequence length is important because a shorter DNA sequence is more likely to succeed in vector insertion into the host cell for the intended computation. On the other hand, minimizing the number of genes and the depth of protein-production cascade is important because protein production in the host cell causes metabolic burden and takes a long time due to the additional translation and induction steps. Our methods can be crucial to practical realization of complex genetic circuits for successful applications.
II. PRELIMINARIES
A (combinational) logic circuit is a directed acyclic graph $G(V, E)$ with a set $V$ of nodes and a set $E \subseteq V \times V$ of directed edges. A node $v \in V$ can be a primary input (PI), primary output (PO), or a logic gate. For each directed edge $(u, v) \in E$, $u$ is a fanin of $v$, and $v$ is a fanout of $u$. The fanin (resp. fanout) set of a node $v$ is denoted as $FI(v)$ (resp. $FO(v)$). A node with no fanin (resp. fanout) is a PI (resp. PO). A logic gate $v$ is associated with a Boolean function that determines the output value of $v$ depending on the values of its fanins.
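As a minimal sketch of these definitions, the fanin and fanout sets can be derived directly from the edge list. The function names and the example circuit below are hypothetical and serve only as an illustration.

```python
from collections import defaultdict

def fanin_fanout(edges):
    """Return (FI, FO) maps for a DAG given as (u, v) edge pairs."""
    fi, fo = defaultdict(set), defaultdict(set)
    for u, v in edges:
        fo[u].add(v)   # v is a fanout of u
        fi[v].add(u)   # u is a fanin of v
    return fi, fo

def classify(edges):
    """Return (PIs, POs): nodes with no fanin and nodes with no fanout."""
    fi, fo = fanin_fanout(edges)
    nodes = {n for edge in edges for n in edge}
    pis = {n for n in nodes if not fi[n]}
    pos = {n for n in nodes if not fo[n]}
    return pis, pos

# Hypothetical circuit: gate u = NAND2(a, b), gate v = AND2(u, c), output y.
EDGES = [("a", "u"), ("b", "u"), ("u", "v"), ("c", "v"), ("v", "y")]
```

Here `classify(EDGES)` identifies `a`, `b`, and `c` as PIs and `y` as the PO, leaving `u` and `v` as logic gates.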
A recognition site pair associated with a site-specific recombinase is a pair of special DNA sequences, including the attB (attachment site bacteria) and the attP (attachment site phage), targeted by the recombinase. When the recognition site pair is bound by the associated recombinase, an irreversible inversion occurs. The DNA subsequence sandwiched by the recognition site pair, including part of the recognition site sequences attB and attP, is inverted. As a result, the recognition site sequences attB and attP after the inversion are irreversibly altered into different products attR and attL, respectively. The inversion mechanism is illustrated in Fig. 1(a), taken from [6], where the triangle pair denotes the recognition site pair.
To see how recombinases can be exploited for logic circuit construction, consider the two-input AND gate shown in Fig. 1(b), where the right-turn arrow denotes a promoter, the two T letters denote terminators, and the green box represents the gene encoding the green fluorescent protein (GFP). (In this paper we read a DNA sequence from left to right assuming the 5'-to-3' direction of the coding strand.) Notice that the first (resp. second) terminator is sandwiched by the recognition site pair of recombinase Bxb1 (resp. phiC31), and recombinases Bxb1 and phiC31 are induced by molecules AHL and aTc, respectively. During transcription, the RNA polymerase (RNAP) binds to the promoter and traverses the DNA template strand until a terminator is encountered. Accordingly, the output GFP can be highly expressed only when both terminators are inverted (i.e., disabled) by their respective control recombinases Bxb1 and phiC31. Because the output GFP is expressed if and only if both input molecules AHL and aTc are of high concentration to induce the production of recombinases Bxb1 and phiC31, the DNA sequence effectively implements a two-input AND gate.
In this work, we consider DNA sequences built from specific DNA segments, referred to as DNA units, including promoters, terminators, genes, inverted promoters, inverted terminators, and inverted genes, denoted as $P$, $T$, $G$, $\bar{P}$, $\bar{T}$, and $\bar{G}$, respectively. Table I shows some simple gates and their associated Boolean functions and (abstracted) DNA sequences. Assuming that all DNA units have similar lengths, the cost of a logic gate is measured by the number of DNA units in its sequence. Although our subsequent discussion uses this simple cost metric, our methods remain valid for other, more accurate models. Note also that there may be multiple different DNA sequences associated with the same Boolean function. For example, both sequences $\bar{P}_a T_b G$ and $P T_a T_b G$ implement the AND2 gate, where a subscript names the inducer that controls the inversion of a unit. However, the former is preferable to the latter because of its lower cost.
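The abstraction can be made concrete with a small Python sketch that evaluates such a unit sequence. It rests on several assumptions that are not part of the paper's tooling: a unit is encoded as a hypothetical triple `(kind, inverted, inducer)`, a present inducer flips the orientation of the unit it controls, inverted units are treated as inert, and RNAP reads from left to right.

```python
from itertools import product

# A DNA unit is (kind, inverted, inducer): kind is "P", "T", or a gene name;
# inducer names the input that inverts the unit, or None if it is fixed.
def expressed(sequence, present):
    """Return the genes transcribed when the inducers in `present` are high."""
    genes, transcribing = set(), False
    for kind, inverted, inducer in sequence:
        if inducer in present:
            inverted = not inverted   # the recombinase inverts this unit
        if inverted:
            continue                  # assumption: inverted units are inert
        if kind == "P":
            transcribing = True       # RNAP binds and starts reading
        elif kind == "T":
            transcribing = False      # RNAP stops at an active terminator
        elif transcribing:
            genes.add(kind)           # RNAP reaches this gene
    return genes

def truth_table(sequence, inputs, gene):
    """List the 0/1 expression of `gene` over all input combinations."""
    rows = []
    for values in product([0, 1], repeat=len(inputs)):
        present = {name for name, value in zip(inputs, values) if value}
        rows.append(int(gene in expressed(sequence, present)))
    return rows

# The two AND2 sequences discussed above (costs 3 and 4 DNA units).
AND2_SHORT = [("P", True, "a"), ("T", False, "b"), ("G", False, None)]
AND2_LONG = [("P", False, None), ("T", False, "a"),
             ("T", False, "b"), ("G", False, None)]
```

Under these assumptions, `truth_table(AND2_SHORT, ["a", "b"], "G")` and `truth_table(AND2_LONG, ["a", "b"], "G")` agree on all four input combinations, while `len(AND2_SHORT)` is one unit smaller than `len(AND2_LONG)`.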
III. MULTI-LEVEL CIRCUIT SYNTHESIS
Multi-level implementation of genetic circuits can effectively avoid DNA length blow-up. The number of logic levels corresponds to the depth of protein-production cascade in a circuit. Under the multi-level implementation, the output gene of a logic gate may correspond to an inducer of its fanout gate. The multi-level structure allows effective logic sharing at the cost of computation delay due to the cascading of protein production.
A. Gate Merging for Circuit Optimization
To embed a genetic circuit in a DNA molecule for vector insertion into a living cell to conduct the desired computation, the well-formed sequences of the constituent logic gates should be concatenated. Because the transcription process of an RNAP would not stop without a terminator, a terminator should be added at the end of the well-formed sequence of each logic gate to avoid undesired interference between logic gates when their sequences are concatenated. In fact, as we discuss below, there are circumstances under which some of these blocking terminators and output genes can be safely removed.
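Consider two logic gates $u$ and $v$, where $u$ is a fanin of $v$, the output gene $X$ of $u$ encodes inducer $x$, and the first DNA unit of $v$ is an inverted promoter $\bar{P}_x$ controlled by $x$.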
When RNAP can traverse through gene $X$, the inducer $x$ is highly expressed and in turn triggers the inversion of $\bar{P}_x$ to a promoter unit $P$ for $v$. The effect is the same as letting the RNAP continue its traversal through the first DNA unit of $v$. On the other hand, when RNAP cannot traverse through gene $X$, the inducer $x$ is not expressed and the first DNA unit $\bar{P}_x$ of $v$ remains an inverted promoter $\bar{P}$. The effect is the same as ignoring the first DNA unit of $v$. Effectively, we can remove the blocking terminator of $u$ and the first DNA unit $\bar{P}_x$ of $v$. Furthermore, if $v$ is the sole fanout of $u$, that is, inducer $x$ does not control any other DNA unit except for $v$'s first DNA unit, then gene $X$ can be removed from the sequence of $u$. For example, let $u$ be a logic gate with output $X = \mathrm{NAND2}(a, b)$ and $v$ be a gate with output $Y = \mathrm{AND2}(x, c)$, where $X$ is the gene of inducer $x$. Let the DNA sequences of $u$ and $v$ be $P_a P_b X T$ and $\bar{P}_x T_c Y T$, respectively. Then, by the rules above, the combined DNA sequence $P_a P_b X T \bar{P}_x T_c Y T$ can be simplified to $P_a P_b X T_c Y T$, and further to $P_a P_b T_c Y T$ when $v$ is the sole fanout of $u$.
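The merge rule can be checked on this example with a short Python sketch. It reuses an assumed unit encoding `(kind, inverted, inducer)` and an assumed left-to-right transcription model in which inverted units are inert; the helper names `merge`, `cascade_y`, and `merged_y` are hypothetical.

```python
from itertools import product

def expressed(sequence, present):
    """Genes transcribed left to right; inverted units are assumed inert."""
    genes, transcribing = set(), False
    for kind, inverted, inducer in sequence:
        if inducer in present:
            inverted = not inverted      # inducer flips the controlled unit
        if inverted:
            continue
        if kind == "P":
            transcribing = True
        elif kind == "T":
            transcribing = False
        elif transcribing:
            genes.add(kind)
    return genes

def merge(u_seq, v_seq, gene, sole_fanout):
    """Drop u's blocking terminator and v's leading inverted promoter;
    also drop u's output gene when v is the sole fanout of u."""
    head = u_seq[:-1]
    if sole_fanout:
        head = [unit for unit in head if unit[0] != gene]
    return head + v_seq[1:]

U = [("P", False, "a"), ("P", False, "b"), ("X", False, None), ("T", False, None)]
V = [("P", True, "x"), ("T", False, "c"), ("Y", False, None), ("T", False, None)]

def cascade_y(a, b, c):
    """Unmerged two-level evaluation: u produces x, which then induces v."""
    present = {n for n, val in zip("abc", (a, b, c)) if val}
    if "X" in expressed(U, present):
        present.add("x")
    return "Y" in expressed(V, present)

def merged_y(sequence, a, b, c):
    """Single-level evaluation of a merged sequence."""
    present = {n for n, val in zip("abc", (a, b, c)) if val}
    return "Y" in expressed(sequence, present)

MERGED_SOLE = merge(U, V, "X", sole_fanout=True)     # 5 units
MERGED_SHARED = merge(U, V, "X", sole_fanout=False)  # 6 units, X kept
```

Under these assumptions both merged sequences produce the same output `Y` as the two-level cascade for every input combination, while shortening the 8-unit concatenation by three and two units, respectively.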
This simplification reduces the DNA sequence by two to three DNA units (depending on whether inducer x is used elsewhere) and reduces one logic level in the cascading of gate u to gate v. The above observation leads to the following optimization problem.
Problem Statement 1: Given a logic circuit $G(V, E)$, find a well-formed sequence implementation for each logic gate of $V$, and merge gates $u$ and $v$ with $(u, v) \in E$ such that the total DNA sequence length and the depth of protein-production cascade are minimized.
B. Gate-Merging Algorithm
We propose an algorithm to exactly solve a simplified version of Problem 1, assuming the DNA sequences of the logic gates are already given and focusing on sequence length minimization. As will be seen in the experiments, the objective of sequence length minimization also effectively reduces the depth of protein-production cascade. Consider a combinational circuit $G(V, E)$ with gates $V = \{g_1, g_2, \ldots, g_n\}$ and edges $E \subseteq V \times V$. (Here we exclude PIs and POs from $V$.) We generate a mergeability graph $G'(V, E')$ with $E' \subseteq E$, where an edge $(g_i, g_j) \in E$ belongs to $E'$ if and only if gate $g_i$ can be combined with gate $g_j$. The graph $G'(V, E')$ denotes the mergeability relation over the logic gates. A path $\{N_0, N_1, \ldots, N_k\}$ in the graph $G'(V, E')$ signifies the feasibility of merging $N_0$ with $N_1$, then merging with $N_2$, and so on, and finally merging with $N_k$. That is, the logic gates $N_0, N_1, \ldots, N_k$ are merged into a single multi-output genetic gate. Hence the optimization problem is equivalent to the weighted path covering problem [7], that is, finding a set of disjoint paths that covers all the nodes of $G'(V, E')$ while the total cost of the paths is minimized.
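The path-cover view can be illustrated with a simplified, unweighted Python sketch. It assumes every merge saves the same amount, so minimizing cost reduces to maximizing the number of merged edges, and it uses a plain augmenting-path bipartite matching rather than the weighted formulation of the paper. Function names and the example graph are hypothetical.

```python
def max_merges(gates, merge_edges):
    """Pick as many edges as possible so that each gate has at most one
    chosen fanout and at most one chosen fanin (a bipartite matching)."""
    succ = {g: [v for (u, v) in merge_edges if u == g] for g in gates}
    pred_of = {}                        # gate -> the fanin it is merged with

    def augment(u, seen):
        for v in succ[u]:
            if v in seen:
                continue
            seen.add(v)
            # take v if it is free, or if its current fanin can move elsewhere
            if v not in pred_of or augment(pred_of[v], seen):
                pred_of[v] = u
                return True
        return False

    merges = sum(augment(u, set()) for u in gates)
    return merges, pred_of

def paths_from(gates, pred_of):
    """Recover the disjoint paths, i.e. the merged multi-output gates.
    Assumes the mergeability graph is acyclic."""
    succ_of = {u: v for v, u in pred_of.items()}
    paths = []
    for head in (g for g in gates if g not in pred_of):
        path = [head]
        while path[-1] in succ_of:
            path.append(succ_of[path[-1]])
        paths.append(path)
    return paths

# Hypothetical mergeability graph on five gates.
GATES = ["g1", "g2", "g3", "g4", "g5"]
MERGE_EDGES = [("g1", "g3"), ("g2", "g3"), ("g3", "g4"), ("g3", "g5")]
```

With `merges, pred_of = max_merges(GATES, MERGE_EDGES)`, the number of paths returned by `paths_from(GATES, pred_of)` equals the number of gates minus `merges`, since each chosen edge joins two paths into one.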
The path covering problem can be transformed to the minimum assignment problem [8] and can be solved in cubic time using the Hungarian algorithm [9]. However, we reformulate it as a 0/1-ILP problem to take advantage of highly engineered modern ILP solvers. Let $x_i$ be a Boolean variable indicating whether a gate $g_i$ is not merged with any of its fanouts. The gate $g_i$ is not merged with any of its fanouts if and only if variable $x_i$ evaluates to 1. For each gate $g_i$, its cost $C(g_i)$ can be derived as follows by the discussion in Section III-A.
where $|FI(g_i)|$ (resp. $|FO(g_i)|$) denotes the number of fanins (resp. fanouts) of gate $g_i$ in the circuit. Also we use a Boolean variable $x_{i,j}$ to denote whether $g_i$ is merged with its fanout $g_j$. The gate $g_i$ is merged with its fanout $g_j$ if and only if variable $x_{i,j}$ evaluates to 1. We impose the condition that a gate can only be merged with at most one of its fanouts; also a gate can only be merged with at most one of its fanins. These conditions translate to the following 0/1-ILP formulation.
subject to
Under this formulation, there are $2n$ constraints and $n + \sum_{g_i \in V} |FO(g_i)|$ variables, where $n$ is the number of logic gates in the circuit. Notice that the simplicity of the constraints of Eqs. (3) and (4) allows very efficient solving, as will be evident in the experiments.
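For intuition, the 0/1 selection can be solved exhaustively on a toy circuit. The Python sketch below is not the CPLEX-based implementation; it enumerates the merge variables $x_{i,j}$ directly and uses an assumed saving model taken from Section III-A (two DNA units per merge, three when the merged fanout is the only fanout). All names and the example circuit are hypothetical.

```python
from itertools import product

def ilp_size(gates, fanouts):
    """Variable and constraint counts stated for the formulation:
    n + sum of |FO(g_i)| variables and 2n constraints."""
    n = len(gates)
    return n + sum(len(fanouts[g]) for g in gates), 2 * n

def best_merging(fanouts, mergeable):
    """Exhaustively choose merged edges maximizing the saved DNA units."""
    edges = sorted(mergeable)
    best_saving, best_choice = 0, ()
    for bits in product([0, 1], repeat=len(edges)):
        chosen = [edge for edge, bit in zip(edges, bits) if bit]
        sources = [u for u, _ in chosen]
        targets = [v for _, v in chosen]
        # a gate merges with at most one fanout and at most one fanin
        if len(set(sources)) < len(sources) or len(set(targets)) < len(targets):
            continue
        # assumed saving: terminator + inverted promoter, plus the output
        # gene when the source gate has a single fanout
        saving = sum(3 if len(fanouts[u]) == 1 else 2 for u, _ in chosen)
        if saving > best_saving:
            best_saving, best_choice = saving, tuple(chosen)
    return best_saving, best_choice

# Hypothetical five-gate circuit in which every gate-to-gate edge is mergeable.
GATES = ["g1", "g2", "g3", "g4", "g5"]
FANOUTS = {"g1": ["g3"], "g2": ["g3"], "g3": ["g4", "g5"], "g4": [], "g5": []}
MERGEABLE = {("g1", "g3"), ("g2", "g3"), ("g3", "g4"), ("g3", "g5")}
```

Exhaustive enumeration is exponential in the number of mergeable edges and is only meant to make the constraints tangible; the assignment and ILP formulations are what keep the problem tractable on real circuits.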
To illustrate, consider the circuit of Fig. 2(a). We are given the DNA sequences of the gates as follows. where $G_i$ and $g_i$ denote the output gene and its corresponding inducer, respectively, of gate $G_i$. The corresponding mergeability graph of the circuit is shown in Fig. 2(b). Before gate merging, the total cost is 29 DNA units. To perform gate merging, the following 0/1-ILP instance is formed.
Notice that in the above merging of gates $G_2$, $G_4$, and $G_7$, we exploit the fact that $G_4$ is input symmetric, and rewrite the sequence
C. Overall Synthesis Flow
The overall synthesis flow of recombinase-based genetic circuits is shown in Fig. 3. An input circuit first undergoes technology-independent optimization and then technology mapping. After technology mapping, the exact gate-merging algorithm presented in Section III-B is applied to minimize the total sequence length.
To maximize the chance of merging gates, we construct a library including CONST0, CONST1, BUF, NOT, AND2, AND3, AND4, AND5, OR2, OR3, OR4, OR5, IMPLY, and NOTIMPLY for technology mapping. The library cells mainly have functions that are associated with sequences starting with an inverted promoter $\bar{P}$ controlled by some inducer. Note that the sequence of the NOT gate does not start with $\bar{P}$ and cannot be combined with other gates. However, it is needed for the completeness of technology mapping. Note that IMPLY and NOTIMPLY gates are not input symmetric, i.e., their two inputs are order dependent. For IMPLY(a,b), only the fanin gate at input b can be combined with the IMPLY gate; for NOTIMPLY(a,b), only the fanin gate at input a can be combined with the NOTIMPLY gate. Note also that the costs in Table I differ from those in our library by 1 due to the inclusion of a blocking terminator at the end of each gate sequence, as discussed in Section III-A.
IV. EXPERIMENTAL EVALUATION
Our algorithm was implemented in the Berkeley synthesis and verification tool, ABC [10], while CPLEX [11] was adopted for 0/1-ILP solving. The experiments were conducted on a Linux machine with a Xeon 3.4 GHz CPU and 32 GB RAM. We compared the circuits synthesized by our method against those synthesized by prior work [6]. Notice that the library of [6], which consists of 44 standard cells with Boolean functions associated with 3-variable bwfs, omits the cost of output genes and blocking terminators (discussed in Section III-A). For a fair comparison, we recalculated the circuit cost of [6]. Essentially, an extra cost of 2 has to be added for each logic gate. Table II compares our synthesis method with prior work [6]. In the table, Column 1 lists the circuit name; Column 2 shows the gate count; Column 3 shows the numbers of PIs and POs; Columns 4 and 5 show the number of sequence units and the largest level of required protein-production cascade in the synthesized DNA sequence of [6]; Columns 6, 7, 8, 9, 10, and 11 show the number of sequence units after technology mapping and before the combining operation, the number of sequence units after the combining operation, the largest level of required protein-production cascade, the total runtime in CPU seconds, the number of ILP variables, and the number of ILP constraints, respectively, of our method. Note that the library we use in technology mapping is more constrained than the library of [6]. Our method tries to combine gates after technology mapping to further reduce the length of the DNA sequence and the delay of output protein production. Although the sequence lengths after technology mapping (Column 6) are worse than those of prior work (Column 4), the subsequent combining operation effectively reduces the lengths (Column 7) and achieves 11-21% DNA sequence length reduction compared to [6]. As a by-product, it also achieves 18-67% reduction in the level of protein-production cascade.
As for computation time, most of the runtime was spent on ILP solving, and the entire computation took less than a second for every benchmark circuit.
