This article presents the development of configurations for bio-inspired self-healing cellular arrays known as Embryonics (embryonic electronics). In the Embryonics design, the configurations define the functionality and connections of each cell. However, developing configurations of the Embryonics is a time-consuming and challenging task owing to the lack of effective tools. In this article, an approach is proposed to develop configurations using graphic mapping, which also optimizes the length of configurations for the Embryonics. Using metric embedding, the configuration problem is formulated exactly as a binary quadratic assignment problem and a routing problem subject to the constraints of the Embryonics architecture. Since the binary quadratic assignment problem is nondeterministic polynomial-time hard, a genetic algorithm is used to tackle it and achieve high-quality placement. Because communication bandwidth is limited, resolving congestion is also an important issue. An improved ant colony algorithm is presented to route the Embryonics based on the result of placement. Configurations of the Embryonics are then formed from the results of placement and routing. Experimental results on a 4 × 4 multiplier demonstrate that developing configurations for lookup table-based Embryonics using graphic mapping can lower the difficulty of the Embryonics design and optimize its placement and routing.
Introduction
The reliability of electronic systems has always been an issue in electronic engineering. Common ways to improve reliability, such as dual modular redundancy and triple modular redundancy, usually have high resource overhead and poor fault coverage. 1 Embryonics (embryonic electronics) is a promising alternative for designing highly robust integrated circuits. 2, 3 It aims to give digital hardware some of the features of biological organisms, such as self-replication and self-repair. However, designing an Embryonics system is hard because of the lack of computer-aided design (CAD) tools. At present, Embryonics is usually implemented on a field programmable gate array (FPGA), and the design is synthesized and simulated with FPGA design software such as Xilinx ISE Foundation. However, in the early stages of an Embryonics design, the function and connectivity of each cell have to be specified by hand. This process is also called development of configurations for the Embryonics. In many cases, designers have an approximate idea of the Embryonics architecture and cell structure but little information about the specific function of each cell and the interconnections between cells. This makes the development of configurations a challenging and time-consuming task, especially in complex applications. Moreover, configurations determined by hand tend to be longer, which increases hardware overhead. Therefore, a design tool is needed to help designers develop the configurations for the Embryonics.
In the early stage of the Embryonics, its implementation was based on the MUXTREE architecture. 3 The function of each cell is a multiplexer with two inputs and one output, and the connectivity between cells is limited. Thus, a digital circuit can be converted into a multiplexer network via ordered binary decision diagrams. A genetic algorithm has been used to define the functionality and connectivity of cells by mapping a multiplexer network into a MUXTREE Embryonics. 4 With the development of electronic technology, most implementations of the Embryonics are now based on a lookup table (LUT) architecture. 5, 6 In LUT-based Embryonics, each cell has a LUT with four or six inputs and one output, and the number of data lines connecting cells to each other is large. It is hard to develop configurations of LUT-based Embryonics using the method applied by Ortega-Sanchez et al. 4 In the design of a LUT-based Embryonics, a specific application can be represented as a task graph according to the function modules predefined by the Embryonics. Then, the functionality of cells is defined by mapping the task graph onto the architecture of the Embryonics using concepts from graph embedding and metric geometry. 7 The connectivity between cells can also be defined by allocating routing paths based on the placement. Thus, configurations of the Embryonics can be developed by means of placement and routing.
The placement of the Embryonics is similar to placement in FPGAs and networks-on-chip (NoC). Many placement approaches exist, such as simulated annealing, 8 StarPlace, 9 and HeAP. 10 Analytical placement using graph mapping scales significantly better to large designs than traditional simulated annealing-based placement methods. Convex assigned placement for regular integrated circuits (CAPRI) is used as a global optimization step that produces a good initial placement, using graph embedding concepts to accurately model routing architectures during FPGA placement. 11 An architecture-aware analytic mapping (A3MAP) algorithm has been proposed for NoC placement with homogeneous and heterogeneous cores on regular and irregular mesh or custom architectures. Its task mapping problem is solved by two effective heuristics: a successive relaxation algorithm that achieves short mapping runtime and a genetic algorithm that achieves high mapping quality. 12 However, in the Embryonics, some cells, such as input cells, have a specific structure to handle additional signals, and the input nodes in the task graph must be mapped to the input cells of the Embryonics. Therefore, an application mapping algorithm must meet these requirements during placement of the Embryonics.
Based on the placement, routing, which also exists in the FPGA design flow, determines the communication paths among cells. Because of the limited communication bandwidth in the Embryonics, resolving congestion is the most important issue. A simple iterative maze-running technique 13 improves the quality of wiring around congested areas by ripping up and rerouting every net in every iteration. A breadth-first search is used to perform routes in a random order. When routing fails, a penalty factor is introduced to select the routes that need to be ripped up. 14 A coarse graph expansion has been used to address the scarcity of routing resources by considering the side effects that the routing of one connection has on another. 15 If some connections cannot be completed, individual connections are removed and rerouted to resolve routing conflicts. In essence, ripping up and retrying is the key to avoiding congestion during routing. The success of a route depends not only on the choice of which nets to reroute but also on the order in which the rerouting is done. 16
In this article, an approach is presented to develop configurations that determine the function and connectivity of cells in the Embryonics by means of placement and routing. Our approach is divided into two steps. In the placement step, an interconnection matrix is first developed to model the application task graph according to function modules. Then, an analytic distance metric of the Embryonics architecture is constructed in terms of the shortest links between any two cells. Next, based on matrix projections, the application task graph is embedded into the metric space in order to minimize the total links on the Embryonics and ensure routability simultaneously. The objective of this step is to find a permutation matrix representing a valid placement that meets the communication bandwidth constraint of the Embryonics. In the routing step, a rip-up and retry scheme for congested areas is applied, and a heuristic algorithm with a penalty function implements the routing based on the placement. A routable net is obtained through multiple iterations of this algorithm.
The remainder of this article is organized as follows: section ''Embryonics architecture and design flow'' introduces the architecture and a typical design flow of the Embryonics; section ''Model of graphic mapping'' describes the model of graphic mapping based on the task graph and the architecture graph; section ''Problem formulation'' formulates placement as an assignment problem using metric embedding and defines the routing constraint; in section ''Implement issues,'' a genetic algorithm and an improved ant colony algorithm (IACA) are proposed as efficient solutions to the mapping problem; in section ''Experimental results,'' a 4 × 4 binary multiplier is implemented to demonstrate the feasibility of developing configurations of the Embryonics by graphic mapping; and, finally, the conclusion is presented in section ''Conclusions and future work.''
Embryonics architecture and design flow
Architecture of Embryonics
An application is performed by the cells of the array cooperating with each other. Therefore, each cell needs to carry out a certain function and transmit the result to other cells. In addition, each cell is able to detect errors by itself. When a failure in a cell is detected, the self-repair process is activated and the faulty cell is replaced by a spare cell, so that the overall function of the original array is preserved.
It is worth mentioning that, for ease of cell design, the cells in the first and last rows of the Embryonics usually have an additional multiplexer to select input signals. Thus, the functional unit in these cells is slightly different from that of other cells. More information about the Embryonics architecture can be found in Mange et al. 3 and Seffrin and Biedermann. 17

Design flow of an Embryonics
Similar to the design flow of a MUXTREE Embryonics, 4 the following steps are necessary to design a LUT-based Embryonics:
Step 1: Define the type of functions implemented by cells;
Step 2: Specify the application as a set of functions with communication between them;
Step 3: Construct a task graph from the description;
Step 4: Place the task graph into the Embryonics;
Step 5: Route the Embryonics based on the placement;
Step 6: Generate configurations of the Embryonics by determining the function and connectivity of each cell;
Step 7: Design the Embryonics according to configurations;
Step 8: Synthesize and simulate the design by FPGA design tools.
Steps 1-5 of the design have to be carried out by hand. Because of the communication bandwidth constraint, steps 4 and 5 are the most challenging and time-consuming. This article develops an automatic algorithm for steps 4 and 5.
Model of graphic mapping
Step 4 in the design of an Embryonics is implemented by graphic mapping, which can be viewed as embedding a graph that represents an application into a designed metric space that represents the architecture of the Embryonics. Figure 2 shows the system model of the graphic mapping.
Task graph for an application
According to the functions defined by the cells of the Embryonics, an application is specified as a collection of functions with communication between them. It is modeled as a directed graph, called the task graph $Q = (V, E)$, as shown in Figure 2(a). The graph starts from the input nodes and terminates when the output nodes issue the result. Each node $v_i \in V$ represents a function of the application and has a unique ID between 1 and $n$, marked at the upper right of the node. A directed edge $e_{i,j} \in E$ indicates that node $v_i$ receives an input from node $v_j$, and $\mathrm{vol}(e_{i,j})$ is the communication volume between $v_i$ and $v_j$.
Architecture graph for an Embryonics
A regular directed mesh graph, called the architecture graph $S = (C, L)$, is constructed to represent the Embryonics, as shown in Figure 2(b). Each square box (depicted in white) in the graph corresponds to a cell in the Embryonics with a unique coordinate. Each cell $c_i \in C$ is also uniquely identified by an integer $i$ (depicted in light gray), called the cell ID, which is in one-to-one correspondence with the cell coordinate. Each directed edge $l_{i,j} \in L$ represents a directed data line that transfers results from $c_i$ to $c_j$, and $\mathrm{bw}(l_{i,j})$, shown as a number on the directed edge, is the communication bandwidth between $c_i$ and $c_j$. The cells depicted as dotted boxes in the first and last rows handle input and output signals, respectively.
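As a minimal sketch, assuming a rectangular mesh in which every cell drives each of its four neighbors with the same hypothetical bandwidth `bw`, and cell IDs assigned row by row starting from 0 (Figure 2(b) numbers them from 1), the coordinate/ID correspondence and a bandwidth matrix $M_a$ for the architecture graph could be built as follows. The function names are illustrative only.

```python
import numpy as np

def cell_id(row, col, n_cols):
    """Map a 0-based (row, col) coordinate to a 0-based, row-major cell ID (assumed)."""
    return row * n_cols + col

def cell_coord(cid, n_cols):
    """Inverse of cell_id: recover (row, col) from a cell ID."""
    return divmod(cid, n_cols)

def architecture_matrix(n_rows, n_cols, bw):
    """M_a[i, j] = bandwidth of the directed data line c_i -> c_j.

    Assumes each cell has a line of bandwidth `bw` to each of its four
    neighbours; non-adjacent pairs get 0.
    """
    n = n_rows * n_cols
    m_a = np.zeros((n, n), dtype=int)
    for r in range(n_rows):
        for c in range(n_cols):
            i = cell_id(r, c, n_cols)
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    m_a[i, cell_id(rr, cc, n_cols)] = bw
    return m_a

# A 5 x 4 array, as in the multiplier example, with a hypothetical bandwidth of 2.
m_a = architecture_matrix(5, 4, bw=2)
print(m_a.shape, m_a.sum())  # (20, 20) 124
```

A 5 × 4 mesh has 31 undirected neighbor pairs, i.e. 62 directed lines, which gives the total of 124 at bandwidth 2.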
Graphic mapping model
The graphic mapping model is developed from the task graph and the architecture graph. For ease of design, we assume that (a) the number of nodes in the task graph equals the number of cells in the architecture graph and (b) the nodes handling input and output signals (e.g. nodes 1 and 13) are assigned to the first and last rows of the architecture graph, respectively. The model is constructed to obtain a placement matrix that minimizes the total connections of the embedding and ensures routability. In the graphic mapping model, an interconnection matrix is first developed to model the task graph. Then, following a graph-drawing technique 18 based on distance matrices, an analytic distance metric representing the architecture graph is constructed in terms of the minimal links between any two cells. Next, based on matrix projections, the interconnection matrix is embedded into the metric space of the architecture graph so as to minimize the total connections. To ensure routability, the graphic mapping must satisfy the communication bandwidth constraint. The result of the placement is a permutation matrix.
Problem formulation
The placement is formulated as an assignment problem. The quality of placement influences the quality of routing: the better the placement, the more likely it is that routing succeeds. The routing, which implements interconnections between cells using adjacent connections, is formulated to control routing congestion based on the placement. From the interconnection matrix of the task graph and the permutation matrix, a matrix representing the interconnections between cells after placement is obtained. This matrix contains both adjacent connections and long-distance connections. Routing then seeks an interconnection matrix that contains only adjacent connections and satisfies the communication bandwidth of the architecture graph.
Formulation of placement
According to the task graph, an $n \times n$ interconnection matrix $M_t$ is constructed, 5 where $M_t(i, j) = \mathrm{vol}(e_{i,j})$, as shown in Figure 3. Each row of $M_t$ represents the interconnections between one node and all other nodes in the task graph. $M_t$ thus encapsulates the interconnection relations of the entire task graph and represents the metric space of the task graph.
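A minimal sketch of this construction, assuming the task graph is given as a dictionary from 1-based node pairs $(i, j)$ to $\mathrm{vol}(e_{i,j})$ (node $v_i$ receives from $v_j$). The four-node graph below is hypothetical and is not the graph of Figure 3.

```python
import numpy as np

def interconnection_matrix(n, edges):
    """Build M_t with M_t[i, j] = vol(e_ij), stored with 0-based indices.

    `edges` maps 1-based (i, j) node pairs to communication volumes, where
    node v_i receives an input from node v_j.
    """
    m_t = np.zeros((n, n), dtype=int)
    for (i, j), vol in edges.items():
        m_t[i - 1, j - 1] = vol
    return m_t

# Hypothetical task graph: nodes 1 and 2 feed node 3, and node 3 feeds node 4.
edges = {(3, 1): 1, (3, 2): 2, (4, 3): 1}
m_t = interconnection_matrix(4, edges)
print(m_t)
```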
As stated previously, the placement problem is equivalent to determining the assignment of node IDs to cell IDs. This assignment can be represented mathematically by an $n \times n$ permutation matrix $P$, whose row and column indices represent the node ID and the cell ID, respectively. For example, if $P(k, i) = 1$, then node $k$ is assigned to cell $i$. According to hypothesis (a) of the graphic mapping model, exactly one element in each row and each column of $P$ is 1, and all others are 0. The action of $P$ on the task graph is $D_p = P^T M_t P$, which represents the interconnection of nodes on the architecture graph after placement. For instance, if $D_p(i, j) = 2$, then cell $i$ connects to cell $j$ with a communication volume of 2. In addition, according to hypothesis (b) of the mapping model, the nodes handling input or output signals, represented as set $G$ in the task graph, must be assigned to specific cells, represented as set $H$ in the architecture graph.
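The projection $D_p = P^T M_t P$ can be checked numerically. In this sketch (0-based indices; the four-node $M_t$ and the assignment are hypothetical), $D_p(i, j)$ ends up equal to $M_t$ evaluated at the nodes placed on cells $i$ and $j$.

```python
import numpy as np

def permutation_matrix(assignment, n):
    """P[k, i] = 1 iff node k is assigned to cell i (0-based)."""
    p = np.zeros((n, n), dtype=int)
    for node, cell in enumerate(assignment):
        p[node, cell] = 1
    return p

# Hypothetical task graph: nodes 0 and 1 feed node 2, node 2 feeds node 3.
m_t = np.array([[0, 0, 0, 0],
                [0, 0, 0, 0],
                [1, 2, 0, 0],
                [0, 0, 1, 0]])
assignment = [2, 0, 3, 1]            # node k -> cell assignment[k]
p = permutation_matrix(assignment, 4)
d_p = p.T @ m_t @ p                  # D_p = P^T M_t P, indexed by cell IDs
print(d_p[3, 0])                     # 2: node 2 sits on cell 3, node 1 on cell 0
```

Each row and column of `p` sums to 1, which is exactly the permutation constraint stated above.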
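According to the one-to-one correspondence between the cell ID and the cell coordinate, when no faulty cell exists, the minimal number of links between two cells can be calculated from their coordinates, as shown in equation (1)

$$D_d(i, j) = |x_i - x_j| + |y_i - y_j| \qquad (1)$$

where $(x_i, y_i)$ is the coordinate of cell $c_i$. Therefore, the placement problem is formulated mathematically as finding a permutation matrix $P$ that minimizes $f_{obj}$, as shown in the following equation

$$f_{obj} = \sum_{i=1}^{n} \sum_{j=1}^{n} D_p(i, j)\, D_d(i, j) \qquad (2)$$

To ensure the feasibility of routing, $D_p$ needs to satisfy $|P^T M_t P - M_a| \geq 0$, where $M_a$ is an interconnection matrix representing the connections of the architecture graph. Thus, the constraints of the placement problem are as follows

$$P(k, i) \in \{0, 1\}, \qquad \sum_{i=1}^{n} P(k, i) = 1, \qquad \sum_{k=1}^{n} P(k, i) = 1, \qquad \sum_{i \in H} P(k, i) = 1 \;\; \forall k \in G, \qquad |P^T M_t P - M_a| \geq 0$$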
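The following Python sketch evaluates equations (1) and (2) for a hypothetical 4-node task graph on a 2 × 2 mesh, assuming row-major 0-based cell IDs and taking the row index as $x$. The exhaustive search at the end only illustrates the objective; it enumerates all $n!$ assignments, which is exactly what becomes infeasible for realistic $n$ and motivates the heuristic below.

```python
import itertools
import numpy as np

def distance_matrix(n_rows, n_cols):
    """Equation (1): D_d[i, j] = |x_i - x_j| + |y_i - y_j| on a fault-free mesh."""
    coords = [divmod(c, n_cols) for c in range(n_rows * n_cols)]  # row-major IDs
    n = len(coords)
    d_d = np.zeros((n, n), dtype=int)
    for i, (xi, yi) in enumerate(coords):
        for j, (xj, yj) in enumerate(coords):
            d_d[i, j] = abs(xi - xj) + abs(yi - yj)
    return d_d

def f_obj(assignment, m_t, d_d):
    """Equation (2): sum of D_p(i, j) * D_d(i, j) with D_p = P^T M_t P."""
    n = len(assignment)
    p = np.zeros((n, n), dtype=int)
    p[np.arange(n), assignment] = 1          # node k -> cell assignment[k]
    d_p = p.T @ m_t @ p
    return int((d_p * d_d).sum())

# Hypothetical task graph: nodes 0 and 1 feed node 2, node 2 feeds node 3.
m_t = np.array([[0, 0, 0, 0],
                [0, 0, 0, 0],
                [1, 2, 0, 0],
                [0, 0, 1, 0]])
d_d = distance_matrix(2, 2)
print(f_obj([2, 0, 3, 1], m_t, d_d))         # 6

# Brute force over all 4! assignments (only viable for tiny n).
best = min(itertools.permutations(range(4)),
           key=lambda a: f_obj(list(a), m_t, d_d))
print(f_obj(list(best), m_t, d_d))           # 5
```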
The constraints on the elements of $P$ restrict the solution space of our formulation to a nonconvex set. Thus, convex optimization techniques such as gradient descent cannot be applied directly to this problem. 12 In fact, this type of formulation is known as the binary quadratic assignment problem (BQAP), which is nondeterministic polynomial-time (NP)-hard. 19 We address the BQAP with a genetic algorithm with staged research, described in the subsequent section.
Formulation of routing
The objective of routing is to find a legal set of paths that accomplishes the required interconnections between cells based on the placement. A small total number of links obtained during placement reduces the degree of routing congestion. A routing matrix that satisfies the communication bandwidth can be obtained from a valid placement result even if it is not optimal. Thus, a feasible set of connections is obtained by repeated iterations of a routing algorithm. To improve the quality of global wiring, a rip-up and retry scheme around congested areas is needed.
Our overall routing methodology is a two-phase approach. In phase 1, a path whose source cell and destination cell lie in one line (the same row or column) is routed directly to obtain the shortest path. Phase 2 routes long-distance connections. Since the long-distance connections in the permuted interconnection matrix $D_p$ cannot be routed directly on the architecture graph, they must be divided into multiple adjacent connections. Thus, in phase 2, all long-distance connections are decomposed into adjacent connections and routed step by step. When routing is complete, $D_p$ is transformed into a routing matrix $M_r$ that contains only adjacent connections. Similar to the interconnection matrix $M_t$ of a task graph, an $n \times n$ interconnection matrix $M_a$ is constructed for the architecture graph. $M_a$ contains the adjacent connections of the entire architecture graph and represents its communication bandwidth. The objective of routing is to obtain a routing matrix $M_r$ such that $M_c = |M_a - M_r| \geq 0$; in other words, every element $m_{i,j} \in M_c$ is no less than 0.
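A minimal sketch of how $M_r$ could be accumulated from routed paths and checked against $M_a$. It reads the condition element-wise as $M_a - M_r \geq 0$, i.e. no directed line carries more than its bandwidth, which is our interpretation of "no less than 0". The 2 × 2 mesh with bandwidth 1 and the paths are hypothetical.

```python
import numpy as np

def routing_matrix(n, routes):
    """Accumulate M_r from (path, volume) pairs.

    `path` is a sequence of 0-based cell IDs; each hop c_i -> c_j consumes
    `volume` data lines of the directed edge (i, j).
    """
    m_r = np.zeros((n, n), dtype=int)
    for path, vol in routes:
        for a, b in zip(path, path[1:]):
            m_r[a, b] += vol
    return m_r

def is_routable(m_a, m_r):
    """True if M_a - M_r >= 0 element-wise.

    A hop between non-adjacent cells also fails, because M_a is 0 there.
    """
    return bool(((m_a - m_r) >= 0).all())

# Hypothetical 2 x 2 mesh (row-major IDs) with bandwidth 1 on every line.
m_a = np.array([[0, 1, 1, 0],
                [1, 0, 0, 1],
                [1, 0, 0, 1],
                [0, 1, 1, 0]])
ok = [([0, 1, 3], 1), ([2, 3], 1)]
bad = ok + [([0, 1], 1)]             # line 0 -> 1 would carry 2 > 1
print(is_routable(m_a, routing_matrix(4, ok)), is_routable(m_a, routing_matrix(4, bad)))
```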
Implement issues
The process of our approach to placement and routing is shown in Figure 4. First, a genetic algorithm with staged research is applied to find a permutation matrix $P$ for the placement problem formulated in equation (2). Then, an IACA accomplishes the routing based on the placement. The procedure is repeated until there is no improvement for k consecutive iterations. Finally, a permutation matrix that provides optimized placement performance and a routing matrix are obtained.
Genetic algorithm for placement
Equation (2) is a combinatorial optimization problem, and genetic algorithms are well suited to such problems. 20 A genetic algorithm has been used to explore the design space efficiently for task assignment, mapping, and routing path allocation, 21 and a genetic algorithm has also been used to achieve high-quality mapping for NoC. 12 However, because of the constraints in equation (6), we need to select schemes that fit our BQAP formulation.
Given the binary constraints on the elements of the permutation matrix $P$, a new valid $P$ can be obtained by swapping multiple columns of a given valid solution. In addition, since some nodes in the task graph cannot be assigned to arbitrary cells in the architecture graph, the encoding, crossover, and mutation schemes of our genetic algorithm must swap the columns of $P$ only within the allowed scope. We use specific dividing and merging rules to create offspring. Algorithm 1 gives the pseudo-code of our genetic algorithm for the BQAP.
Encoding. $P$ is naturally coded as a string of integers called a chromosome. For example, the 6 × 6 permutation matrix shown in Figure 5 is coded as [365214]. In a chromosome, a number $q$ at site $i$ represents $P(q, i) = 1$. Encoding $P$ in this way makes the crossover and mutation operations simpler to implement.
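A short sketch of this encoding and its inverse, keeping the 1-based node numbers of the Figure 5 example: site $i$ of the chromosome stores the node $q$ with $P(q, i) = 1$. The function names are hypothetical.

```python
import numpy as np

def encode(p):
    """Chromosome: site i (cell) holds the 1-based node q with P(q, i) = 1."""
    return [int(np.argmax(p[:, i])) + 1 for i in range(p.shape[1])]

def decode(chrom):
    """Rebuild the permutation matrix P from a chromosome of 1-based node IDs."""
    n = len(chrom)
    p = np.zeros((n, n), dtype=int)
    for site, node in enumerate(chrom):
        p[node - 1, site] = 1
    return p

p = decode([3, 6, 5, 2, 1, 4])       # the chromosome [365214]
print(encode(p))                     # [3, 6, 5, 2, 1, 4]
```

Because every valid chromosome is a permutation of node IDs, swapping two sites in the string is the same as swapping two columns of $P$, which keeps the solution valid.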
Crossover. Crossover produces a new feasible solution that inherits the good characteristics of the parent chromosomes. Since some nodes in the task graph must be assigned to specific cells in the architecture graph, the chromosomes are divided into substrings for crossover, and new chromosomes are then constructed from the new substrings. Figure 6 shows an example of crossover with two substrings. First, parent 1 is divided into two substrings s1 and s2; similarly, parent 2 is divided into s3 and s4. Then, the alleles from the first site to a random cross point in s3 are copied to the front of s1. Next, a new substring o1 is produced by removing the repeated alleles. o2, o3, and o4 are produced in the same way. Finally, o1 and o2 constitute child 1, and o3 and o4 constitute child 2.
Mutation. A mutation operation is performed on each child. As in crossover, a child is divided into substrings. Then, two randomly selected numbers in a substring are swapped to generate a new substring, and the new substrings constitute a new child with the two numbers swapped. The swap is accepted only if it reduces the total number of links. After mutation, whichever of the two children has fewer total links is chosen as parent 1 for the next generation.
Algorithm 1. Placement function ().
Input: $M_t$: an interconnection matrix for the task graph; $D_d$: a distance matrix for the minimal links between any two cells.
Output: $P$: an optimal permutation matrix; $f_{obj}$: the total connections between cells after placement.
begin
    generate a legal permutation matrix P arbitrarily;
    calculate f_obj by equation (2);
    generate parent 1 by encoding matrix P;
    while run times < k-times do
        generate a valid parent 2 arbitrarily;
        (child 1, child 2) = crossover(parent 1, parent 2);
        (child 1, child 2) = mutation(child 1, child 2);
        calculate f_obj of child 1 and child 2;
        parent 1 = minimum(child 1, child 2);
Figure 5. Encoding for a permutation matrix P.
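A compact Python sketch of Algorithm 1 with the substring-based crossover and mutation described above. It assumes that every substring (segment of sites) is always occupied by the same set of nodes, which holds when the input/output cells are filled exactly by the input/output nodes as in the 5 × 4 example. Chromosomes use 0-based node IDs, and stopping after `k` iterations without improving the best $f_{obj}$ is our reading of "no improvement for k-times iterations". The staged-research details of the original are not reproduced, and all names are hypothetical.

```python
import random
import numpy as np

def total_links(chrom, m_t, d_d):
    """f_obj for a chromosome whose site i (cell i) holds node chrom[i]."""
    d_p = m_t[np.ix_(chrom, chrom)]          # D_p(i, j) = M_t(node at i, node at j)
    return int((d_p * d_d).sum())

def merge(prefix, segment):
    """Put `prefix` first, then the rest of `segment`, dropping repeated alleles."""
    seen, out = set(), []
    for allele in list(prefix) + list(segment):
        if allele not in seen:
            seen.add(allele)
            out.append(allele)
    return out

def crossover(p1, p2, segments, rng):
    """Segment-wise crossover; each child keeps the node set of every segment."""
    c1, c2 = [], []
    for lo, hi in segments:
        s1, s3 = p1[lo:hi], p2[lo:hi]
        cut = rng.randint(1, hi - lo)        # random cross point inside the segment
        c1 += merge(s3[:cut], s1)
        c2 += merge(s1[:cut], s3)
    return c1, c2

def mutate(chrom, segments, cost, rng):
    """Swap two alleles inside one segment; keep the swap only if f_obj drops."""
    candidates = [s for s in segments if s[1] - s[0] >= 2]
    if not candidates:
        return chrom
    lo, hi = rng.choice(candidates)
    i, j = rng.sample(range(lo, hi), 2)
    trial = list(chrom)
    trial[i], trial[j] = trial[j], trial[i]
    return trial if cost(trial) < cost(chrom) else chrom

def random_chrom(base, segments, rng):
    """A valid chromosome: shuffle alleles only within each segment."""
    out = list(base)
    for lo, hi in segments:
        part = out[lo:hi]
        rng.shuffle(part)
        out[lo:hi] = part
    return out

def place(m_t, d_d, base, segments, k=50, seed=0):
    """GA driver after Algorithm 1; stops after k iterations without improvement."""
    rng = random.Random(seed)
    cost = lambda c: total_links(c, m_t, d_d)
    parent1 = random_chrom(base, segments, rng)
    best, best_cost, stale = parent1, cost(parent1), 0
    while stale < k:
        parent2 = random_chrom(base, segments, rng)
        c1, c2 = crossover(parent1, parent2, segments, rng)
        c1, c2 = mutate(c1, segments, cost, rng), mutate(c2, segments, cost, rng)
        parent1 = min(c1, c2, key=cost)
        if cost(parent1) < best_cost:
            best, best_cost, stale = parent1, cost(parent1), 0
        else:
            stale += 1
    return best, best_cost

# Hypothetical 4-node task graph on a 2 x 2 mesh (one unconstrained segment).
m_t = np.array([[0, 0, 0, 0], [0, 0, 0, 0], [1, 2, 0, 0], [0, 0, 1, 0]])
d_d = np.array([[0, 1, 1, 2], [1, 0, 2, 1], [1, 2, 0, 1], [2, 1, 1, 0]])
best, best_cost = place(m_t, d_d, base=[0, 1, 2, 3], segments=[(0, 4)])
print(best, best_cost)
```

For the input/output constraint, one would pass one segment per row group, e.g. `[(0, 4), (4, 16), (16, 20)]` for the 5 × 4 array, so that input and output nodes never leave the first and last rows.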
IACA for routing
The permuted interconnection matrix $D_p$ obtained after placement indicates two kinds of connections: adjacent connections and long-distance connections. Adjacent connections are connected directly in our routing process because they are already the shortest paths. The long-distance connections in $D_p$ are of two types. In the first type, the source cell and the destination cell are nonadjacent but in the same row or the same column. In our routing scheme, these long-distance connections are divided into adjacent connections along that row or column in order to minimize the total links; these adjacent connections form the shortest path for this type of long-distance connection. Thus, after the adjacent connections and the first-type long-distance connections are routed, a new permuted interconnection matrix $D_p$ is obtained, which is the basis for the subsequent routing. In the second type, the source cell and the destination cell are in neither the same row nor the same column. For these connections, the communication bandwidth makes routing quite difficult, and the difficulty grows with the complexity of the connections. Moreover, the connections interact with each other, so it is hard to find proper paths manually or by exhaustive search. An ant colony algorithm can search on several computational threads based on both local information and global information about the quality of previously obtained results. 22, 23 Thus, an IACA is used to realize the second-type long-distance connections, taking full advantage of prior information about the connections to find a promising solution. For a long-distance connection, the shortest distance between the source cell and the destination cell is fixed when no faulty cell exists. Therefore, the IACA is not used to find a shortest path but to search for a suitable path that satisfies the communication bandwidth.
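The decomposition of a first-type long-distance connection (same row or same column) into adjacent hops can be sketched as follows, assuming 0-based row-major cell IDs; the function name is hypothetical. Consecutive cells in the returned list are the adjacent connections that replace the long-distance entry.

```python
def straight_path(src, dst, n_cols):
    """Split a same-row or same-column connection into adjacent hops.

    Returns the cells visited from src to dst (0-based, row-major IDs),
    which is a shortest path on the mesh.
    """
    (r0, c0), (r1, c1) = divmod(src, n_cols), divmod(dst, n_cols)
    if r0 != r1 and c0 != c1:
        raise ValueError("source and destination are not in one row or column")
    dr = (r1 > r0) - (r1 < r0)       # -1, 0 or +1
    dc = (c1 > c0) - (c1 < c0)
    path, r, c = [src], r0, c0
    while (r, c) != (r1, c1):
        r, c = r + dr, c + dc
        path.append(r * n_cols + c)
    return path

# On a 5 x 4 array: along row 0, and down column 1 from the first to the last row.
print(straight_path(0, 3, 4), straight_path(1, 17, 4))
```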
Each ant knows the coordinates of the source cell $c_s$ and the destination cell $c_d$ and moves from $c_s$ to $c_d$ by forwarding from one cell to the next. It can move only in a direction that brings it closer to the destination cell. The ants choose a direction according to a probability that is a function of the local pheromone trails. An ant $k$ located in cell $c_i$ uses the pheromone $\tau_{ij}$ of each cell $c_j \in allowed_k$ to compute the probability

$$p_{ij}^{k} = \frac{\tau_{ij}}{\sum_{l \in allowed_k} \tau_{il}}$$

where $allowed_k$ is the allowed list of ant $k$. The pheromone $\tau_{ij}$, associated with the edge joining cells $c_i$ and $c_j$, is updated as follows

$$\tau_{ij} \leftarrow (1 - \rho)\, \tau_{ij} + \sum_{k=1}^{m} \Delta\tau_{ij}^{k}$$

where $\rho$ is the evaporation rate, $m$ is the number of ants, and $\Delta\tau_{ij}^{k}$ is the quantity of pheromone laid on edge $(i, j)$ by ant $k$

$$\Delta\tau_{ij}^{k} = \begin{cases} F / L_k & \text{if ant } k \text{ uses edge } (i, j) \\ 0 & \text{otherwise} \end{cases}$$

where $F$ is a constant and $L_k$ is the length of the path constructed by ant $k$. For all long-distance connections, a colony of ants moves from the source cells to the destination cells until all ants complete their tours. When all long-distance connections are routed, a routing matrix is formed. The configurations for the Embryonics are developed if $|M_a - M_r| \geq 0$; otherwise, the routing process is repeated. A penalty-pheromone $\Delta\tau_{ij}^{co}$ is added to the congested areas to decrease their chance of being selected in a retried routing process. The penalty-pheromone is given as follows. A requested routing matrix is found by repeated iterations of the rip-up and retry scheme. In our algorithm, many ants may be needed to find valid paths. Most of them find invalid paths and deposit pheromone trails on those paths, which increases the probability that a subsequent ant finds a valid path.
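The following is a minimal Ant System style sketch for routing one second-type connection under bandwidth limits, assuming 0-based row-major cell IDs and a residual-capacity dictionary keyed by directed edges. Ants move only toward the destination, choose moves with probability proportional to $\tau_{ij}$, and pheromone is evaporated with $(1 - \rho)$ and reinforced by $F / L_k$. Because the penalty-pheromone formula is not given here, the multiplicative `penalty` on edges lacking capacity is a hypothetical stand-in, not the authors' rule; all parameter values are illustrative.

```python
import random

def mesh_capacity(n_rows, n_cols, bw):
    """Residual bandwidth of every directed line between neighbouring cells."""
    cap = {}
    for r in range(n_rows):
        for c in range(n_cols):
            a = r * n_cols + c
            if c + 1 < n_cols:
                cap[(a, a + 1)] = cap[(a + 1, a)] = bw
            if r + 1 < n_rows:
                cap[(a, a + n_cols)] = cap[(a + n_cols, a)] = bw
    return cap

def moves_toward(cell, dst, n_cols):
    """Allowed list: neighbours of `cell` that reduce the distance to `dst`."""
    (r, c), (rd, cd) = divmod(cell, n_cols), divmod(dst, n_cols)
    moves = []
    if rd != r:
        moves.append((r + (1 if rd > r else -1)) * n_cols + c)
    if cd != c:
        moves.append(r * n_cols + c + (1 if cd > c else -1))
    return moves

def aco_route(src, dst, n_cols, capacity, vol=1, m=10, rho=0.5,
              F=1.0, penalty=0.1, max_iter=50, seed=0):
    """Return a shortest path whose lines all have capacity >= vol, or None."""
    rng = random.Random(seed)
    tau = {edge: 1.0 for edge in capacity}              # initial pheromone
    for _ in range(max_iter):
        paths = []
        for _ant in range(m):
            path = [src]
            while path[-1] != dst:
                moves = moves_toward(path[-1], dst, n_cols)
                weights = [tau[(path[-1], nxt)] for nxt in moves]  # p ~ tau_ij
                path.append(rng.choices(moves, weights=weights)[0])
            paths.append(path)
        for edge in tau:                                # evaporation (1 - rho)
            tau[edge] *= 1.0 - rho
        for path in paths:                              # deposit F / L_k
            hops = list(zip(path, path[1:]))
            for edge in hops:
                tau[edge] += F / len(hops)
        for path in paths:
            hops = list(zip(path, path[1:]))
            if all(capacity[e] >= vol for e in hops):
                return path
            for e in hops:                              # hypothetical congestion penalty
                if capacity[e] < vol:
                    tau[e] *= penalty
    return None

# 3 x 3 mesh with bandwidth 1; two lines are already used up by earlier routes.
cap = mesh_capacity(3, 3, bw=1)
cap[(0, 1)] = 0
cap[(3, 4)] = 0
print(aco_route(0, 8, 3, cap))
```

With lines 0 → 1 and 3 → 4 saturated, the only bandwidth-feasible shortest path from cell 0 to cell 8 is 0, 3, 6, 7, 8, which the colony converges to.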
Experimental results
Implementing an Embryonics 4 × 4 multiplier
In order to demonstrate the development of configurations for LUT-based Embryonics, a universal 4 × 4 binary multiplication function with four inputs was built. This function can also serve as a building block for constructing proportional-derivative (PD), proportional-integral (PI), or proportional-integral-derivative (PID) controllers. 24, 25 Although this is a relatively simple problem, it allows all of the basic concepts and functions of configuration development to be validated. Figure 7 shows a 4 × 4 array multiplier with 4-bit inputs $X = x_3 x_2 x_1 x_0$ and $Y = y_3 y_2 y_1 y_0$ and 8-bit output $Z = z_7 z_6 z_5 z_4 z_3 z_2 z_1 z_0$. Figure 8 shows the task graph of the 4 × 4 array multiplier, in which each logic-node has an ID. In the task graph, the logic-nodes with IDs 1-8 implement two AND functions on inputs $X$ and $Y$. The other logic-nodes realize a full adder function with four inputs, a sum output, and a carry output. The implementation of this 4 × 4 multiplier requires 20 cells.
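As a bit-level cross-check of the multiplier's arithmetic, the Python sketch below models a textbook ripple-carry array multiplier built from AND gates and full adders. It is a behavioral model only and does not claim to reproduce the exact cell partitioning of Figure 8 or the authors' hardware description; for example, it gives 7 × 3 = 00010101 in binary, i.e. 21 in decimal.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry-out)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def array_multiply(x, y, n=4):
    """Multiply two n-bit numbers with an AND/full-adder array (ripple carry)."""
    xb = [(x >> i) & 1 for i in range(n)]
    yb = [(y >> i) & 1 for i in range(n)]
    acc = [xb[j] & yb[0] for j in range(n)]   # row 0: partial products x_j AND y_0
    z = [acc[0]]                              # z_0 needs no adder
    acc = acc[1:] + [0]                       # acc[j] now has weight j + 1
    for i in range(1, n):                     # one row of n full adders per y_i
        carry, new = 0, []
        for j in range(n):
            s, carry = full_adder(acc[j], xb[j] & yb[i], carry)
            new.append(s)
        z.append(new[0])                      # lowest bit of this row is z_i
        acc = new[1:] + [carry]
    z.extend(acc)                             # remaining bits z_n .. z_{2n-1}
    return sum(bit << k for k, bit in enumerate(z))

print(format(array_multiply(0b0111, 0b0011), "08b"))  # 00010101
```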
We construct an Embryonics consisting of a 5 × 4 matrix of cells to implement the multiplier, as shown in Figure 9, where 8 cells (in the first and last rows) with a multiplexer to select the input signals realize the AND function and 12 cells (rows 2-4) realize the full adder function. More information about the cell architecture can be found in Zhang et al. 26 Our placement and routing program was run in MATLAB 7.0 on Windows XP with a dual-core 3.00 GHz CPU. Table 1 shows the routing statistics obtained by running both the IACA of this article and the maze-running technique 13 many times. An optimized configuration of the 4 × 4 multiplier is obtained, as shown in Figure 9. Each cell has a unique title at the top of its box; the number in the title indicates which logic-node of the task graph is placed in that cell. The directed lines represent the connections between cells.
According to the resulting configuration, the 4 × 4 multiplier was synthesized and simulated in Xilinx ISE Design Suite 13.2. Figure 10 shows the simulation result of the multiplier implemented in an Embryonics. For instance, at 557.699 ns, the 4-bit inputs are X = 0111 (decimal 7) and Y = 0011 (decimal 3), and the 8-bit output is Z = 00010101 (decimal 21, hexadecimal 15). Table 2 summarizes the hardware overhead and supply power obtained by running Xilinx ISE for a Virtex-6 device, both for the configuration in Figure 9 and for that in Zhang et al. 26
Discussion
In some task graphs, the number of input and output signals of some logic-nodes exceeds the communication bandwidth of the cells in the architecture graph. In this case, placement and routing cannot be accomplished, so these logic-nodes should be divided into several sub-nodes whose input and output signals do not exceed the communication bandwidth of the cells. For a task graph with a large number of logic-nodes, the runtime of placement and routing increases rapidly. To decrease the runtime, a complex task graph should be partitioned into several subgraphs; details of a partitioning approach can be found in Le Beux et al. 27 For a practical application, a trade-off between runtime and performance is required when developing configurations for Embryonics.
Conclusions and future work
This article presents an approach to developing configurations for Embryonics based on graphic mapping. Using metric embedding, the configuration problem is formulated as a BQAP, which can be viewed as a placement and routing problem subject to the constraints of the Embryonics architecture. A genetic algorithm with staged research is used to find a placement matrix, and an IACA provides a valid routing matrix based on the placement matrix. An optimized configuration for LUT-based Embryonics can then be found. A 4 × 4 multiplier has shown the effectiveness of the proposed approach. With the proposed approach, a designer can complete an Embryonics design quickly. Furthermore, our approach is also suitable for off-line self-repair of Embryonics. In extensions to this work, we plan to apply this approach to dynamic reconfiguration of Embryonics.
Table 2. Hardware overhead and supply power (columns: configuration in Figure 9; configuration in Zhang et al.).
Declaration of conflicting interests
