Dolinar, S., T.-M. Ko and R. McEliece, Some VLSI decompositions of the de Bruijn graph, Discrete Mathematics 106/107 (1992) 189-198.
We define a VLSI decomposition of a directed graph G to be a collection of isomorphic vertex-disjoint subgraphs of G which together contain all of G's vertices. We call the isomorphic subgraphs comprising the decomposition building blocks for the graph G, and we refer to the edges contained in the collection of subgraphs as internal edges. The efficiency of a VLSI decomposition of G is the fraction of the total number of edges in G which are internal edges. In this paper we will present a general construction for efficient VLSI decompositions for the family of de Bruijn graphs. Using the methods to be explained in this paper, we have found a 64-chip VLSI decomposition of the de Bruijn graph $B_{13}$ with efficiency 0.754, which is being used by JPL design engineers to build a single-board Viterbi decoder for the $K = 15$, rate 1/4 convolutional code which will be used on NASA's Galileo mission.
Introduction
Let G be a directed graph. We think of G as a representation of the wiring diagram for an electronic circuit, with the vertices of G representing arithmetic processors and the edges of G representing wires connecting the processors. In modern VLSI technology, if the circuit is too large to fit on a single chip, it may be possible to build it by wiring together two or more appropriately designed chips. Each processor must then be placed on one of the chips, while the wires of the circuit may be either internal to the chips (intrachip wires) or external (interchip wires). For reasons of economy and simplicity, it is desirable, though not necessary, for the chips to be identical. Returning now to the graph G, we are thus motivated to define a VLSI decomposition of G as a collection of isomorphic vertex-disjoint subgraphs of G which together contain all of G's vertices and a subset of its edges. We call the isomorphic subgraphs comprising the decomposition building blocks for the graph G, and we refer to the edges contained in the collection of subgraphs as internal edges. Since internal edges are, as a rule, much more convenient for the circuit designer than external edges, we define the efficiency of a VLSI decomposition of G as the fraction of the total number of edges in G which are internal edges.
In this paper we will present a general construction for efficient VLSI decompositions for the family of de Bruijn graphs (a formal definition of the de Bruijn graph $B_n$ will be given in Section 3). We focus on this class of graphs because they represent the circuit diagrams for fully parallel Viterbi decoders. Indeed, a constraint length $K$, rate $1/n$ convolutional code has the de Bruijn graph $B_{K-2}$ as the circuit diagram for its Viterbi decoder, and NASA is using a $K = 15$, rate 1/4 convolutional code on the Galileo mission. As we will see, the methods described in this paper can be used to design a 64-chip VLSI decomposition of $B_{13}$ with efficiency 0.754, and this chip is being used by JPL design engineers to build a single-board Viterbi decoder for the Galileo code. (Earlier, but less efficient, VLSI decompositions for de Bruijn graphs were presented in [1] and [2]. Those led to a 256-chip VLSI decomposition of $B_{13}$ of efficiency 0.563, which was used to design a multi-board decoder for the Galileo code.)
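The chip sizes involved follow from counting vertices. Since $B_n$ has $2^n$ vertices (as with $B_3$ and its 8 vertices), a decomposition of $B_{13}$ into $2^r$ identical chips places $2^{13-r}$ vertices on each chip:

```latex
\begin{aligned}
K = 15 \;&\Rightarrow\; B_{K-2} = B_{13}, \qquad |V(B_{13})| = 2^{13} = 8192,\\
64 = 2^{6} \text{ chips} \;&\Rightarrow\; 2^{13-6} = 2^{7} = 128 \text{ vertices per chip},\\
256 = 2^{8} \text{ chips} \;&\Rightarrow\; 2^{13-8} = 2^{5} = 32 \text{ vertices per chip}.
\end{aligned}
```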
As a sample of our results, consider the de Bruijn graph $B_3$ in Fig. 1. It contains 8 vertices and 16 edges. In Fig. 2 we see a two-chip VLSI decomposition of $B_3$, in which the underlying building block contains four vertices and three edges. This decomposition has efficiency 3/8. It turns out that the building block in Fig. 2 is a universal building block, in the sense that it can be used to build any de Bruijn graph $B_n$ with $n \geq 3$. Indeed, this building block is just one of a large family of universal building blocks we will describe in Theorem 3.1.
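The efficiency figure can be checked by counting edges: every vertex of $B_3$ has out-degree 2, so $B_3$ has $2 \cdot 2^3 = 16$ edges, and each of the two chips contributes three internal edges.

```latex
\text{efficiency} = \frac{2 \cdot 3}{|E(B_3)|} = \frac{6}{2 \cdot 2^{3}} = \frac{6}{16} = \frac{3}{8}.
```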
Here is a summary of the rest of the paper. In Section 2 we will introduce some algebraic notation and prove two simple lemmas that will be needed in the proof of our main theorem. In Section 3 we will give a formal definition of the de Bruijn graph $B_n$ and show that subgraphs of $B_n$ defined by certain rank functions are universal de Bruijn building blocks. In Section 4 we will use the results of Section 3 to exhibit a 4-chip VLSI decomposition of $B_5$ of efficiency 0.50. Finally, in Section 5 we will list the most efficient universal building blocks we know of (including, in Fig. 8, a description of the building block which is being used on the one-board Viterbi decoder for the Galileo code), and state some asymptotic results.
Algebraic preliminaries
Let $V_n$ be the set of all $n$-dimensional binary vectors. We begin by defining three linear mappings $L$, $R$, and $C$ ('left', 'right', and 'center') from $V_n$ to $V_{n-1}$.
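The definitions of $L$, $R$ and $C$ did not survive in this copy of the text. The sketch below reconstructs them as assumptions chosen to agree with the rest of the paper: $L$ keeps the first $n-1$ coordinates, $R$ keeps the last $n-1$, $C$ adds adjacent coordinates mod 2, and $x \to y$ is taken to be an edge of $B_n$ when $Lx = Ry$. With these choices $C^2(11000) = 110$, as in the example of Section 4, and $L$ and $R$ commute with $C$, which is the property the proof of Theorem 3.1 draws from Lemma 2.1. The helper names (`apply_power`, `bits`, `commutes_with_C`) are hypothetical.

```python
from itertools import product

Vec = tuple[int, ...]  # an element of V_n, as a tuple of 0/1 coordinates


def L(x: Vec) -> Vec:
    """Assumed 'left' map V_n -> V_{n-1}: keep the first n-1 coordinates."""
    return x[:-1]


def R(x: Vec) -> Vec:
    """Assumed 'right' map V_n -> V_{n-1}: keep the last n-1 coordinates."""
    return x[1:]


def C(x: Vec) -> Vec:
    """Assumed 'center' map V_n -> V_{n-1}: x_j + x_{j+1} (mod 2) for each j."""
    return tuple((a + b) % 2 for a, b in zip(x, x[1:]))


def apply_power(f, r: int, x: Vec) -> Vec:
    """Apply the map f to x r times, e.g. apply_power(C, r, x) = C^r x."""
    for _ in range(r):
        x = f(x)
    return x


def is_edge(x: Vec, y: Vec) -> bool:
    """x -> y is an edge of B_n under the assumed convention Lx = Ry."""
    return L(x) == R(y)


def bits(s: str) -> Vec:
    """Convert a bit string such as '11000' to a tuple of ints."""
    return tuple(int(ch) for ch in s)


def commutes_with_C(n: int) -> bool:
    """Check LC = CL and RC = CR on every vector of V_n."""
    return all(L(C(x)) == C(L(x)) and R(C(x)) == C(R(x))
               for x in product((0, 1), repeat=n))


print(apply_power(C, 2, bits("11000")))  # (1, 1, 0), i.e. 110
```

Under this convention each vertex $x$ of $B_n$ has the two successors $[b, x_1, \ldots, x_{n-1}]$, $b = 0, 1$, which gives the 16 edges of $B_3$.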
Lemma 2.2. If $x$ and $y$ are two $n$-vectors with $C^r x = C^r y$ and $B(x, y) \geq r$, then $x = y$.
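Proof. We use induction on $r$. For $r = 1$, the assertion is that if $Cx = Cy$, and if $x$ and $y$ agree in at least one coordinate, then $x$ and $y$ are identical. To see that this is so, note that $C$ is a linear mapping from $V_n$ to $V_{n-1}$. Its nullspace, i.e., the set of $x$'s such that $Cx = 0$, is the set of vectors $[x_1, \ldots, x_n]$ with $x_1 = \cdots = x_n$, namely $00\cdots0$ and $11\cdots1$. Hence if $Cx = Cy$, then $x + y$ is one of these two vectors, and if $x$ and $y$ also agree in some coordinate, then $x + y = 00\cdots0$, i.e., $x = y$. … Table 1. The corresponding graph $B_3(p)$ is shown in Fig. 3.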
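Lemma 2.2 is easy to confirm by brute force for small $n$. The sketch assumes, as the proof of Theorem 3.1 indicates, that $B(x, y)$ is the length of the longest run of consecutive coordinates in which $x$ and $y$ agree, and it uses the same assumed center map $C$ (adjacent sums mod 2). The function names are hypothetical.

```python
from itertools import product


def C(x):
    """Assumed center map: adjacent sums mod 2."""
    return tuple((a + b) % 2 for a, b in zip(x, x[1:]))


def C_power(r, x):
    """Return C^r x."""
    for _ in range(r):
        x = C(x)
    return x


def longest_agreement(x, y):
    """Assumed meaning of B(x, y): length of the longest run of consecutive
    coordinates in which x and y agree."""
    best = run = 0
    for a, b in zip(x, y):
        run = run + 1 if a == b else 0
        best = max(best, run)
    return best


def lemma_2_2_holds(n, r):
    """Check on all pairs in V_n: C^r x = C^r y and B(x, y) >= r imply x = y."""
    vecs = list(product((0, 1), repeat=n))
    image = {x: C_power(r, x) for x in vecs}
    return not any(x != y and image[x] == image[y]
                   and longest_agreement(x, y) >= r
                   for x in vecs for y in vecs)
```

The pair $000$, $111$ shows why the agreement hypothesis is needed: $C$ sends both to $00$, yet they agree in no coordinate.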
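It turns out, rather surprisingly, that any graph of the form $B_n(p)$ can be used as a building block for any de Bruijn graph $B_N$ with $N \geq n$. For this reason we call a subgraph of $B_n$ that can be used to build every $B_N$ with $N \geq n$ a universal de Bruijn building block, or UBB.

Fig. 3. The graph $B_3(p)$, derived from the rank function in Table 1.

Theorem 3.1. The graph $B_n(p)$ is a UBB, i.e., it can be used to build any de Bruijn graph $B_N$ with $N \geq n$.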
Proof. We will show that $B_n(p)$ builds $B_{n+r}$ for all $r \geq 1$. For any $X = [X_1, X_2, \ldots, X_{n+r}] \in V_{n+r}$, suppose that $C^r X = x \in V_n$ and $p(x) = i$. We define the $r$-bit chip number of $X$, denoted by $\mathrm{num}(X)$, as
$$\mathrm{num}(X) = [X_{i+1}, \ldots, X_{i+r}]. \qquad (3.1)$$
Note that since $0 \leq i \leq n$, we have $1 \leq i + 1 \leq i + r \leq n + r$, so that the chip number as defined in (3.1) 'fits' within the field of $X$. In building $B_{n+r}$ with $2^r$ copies of $B_n(p)$ ('chips'), numbered $00\cdots0$ to $11\cdots1$, we place vertex $X$ on the chip numbered $\mathrm{num}(X)$, at the location corresponding to $x = C^r X$. Lemma 2.2 shows that no two vertices of $B_{n+r}$ can be assigned the same location on the same chip (two such vertices $X$ and $Y$ would have $C^r X = C^r Y$ and would agree in the $r$ consecutive positions $i+1, \ldots, i+r$), so that each of the $2^{n+r}$ vertices in $B_{n+r}$ is assigned a unique 'home' on one of the $2^r$ chips. What remains to show is that the connections within the chips correspond to connections in the big graph $B_{n+r}$, i.e., that if $\mathrm{num}(X) = \mathrm{num}(Y)$ and if $C^r X$ and $C^r Y$ are connected on $B_n(p)$, then $X$ and $Y$ are connected in $B_{n+r}$, i.e.,
$$LX = RY.$$
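To see this, we reason as follows. Let $i = p(C^r X)$. Since $C^r X$ and $C^r Y$ are connected on $B_n(p)$, we have $p(C^r Y) = i + 1$, so by (3.1), $\mathrm{num}(X) = [X_{i+1}, \ldots, X_{i+r}]$ and $\mathrm{num}(Y) = [Y_{i+2}, \ldots, Y_{i+r+1}]$. Coordinates $i+1, \ldots, i+r$ of $LX$ are $X_{i+1}, \ldots, X_{i+r}$, while the same coordinates of $RY$ are $Y_{i+2}, \ldots, Y_{i+r+1}$, and these coincide because $\mathrm{num}(X) = \mathrm{num}(Y)$. Thus $LX$ and $RY$ agree on $r$ consecutive positions, i.e.,
$$B(LX, RY) \geq r. \qquad (3.2)$$
But also, since $C^r X$ and $C^r Y$ are connected on $B_n(p)$, we have $L C^r X = R C^r Y$, which, by Lemma 2.1, implies
$$C^r LX = C^r RY. \qquad (3.3)$$
Combining (3.2) and (3.3), using Lemma 2.2, we find that $LX = RY$, which is what we set out to prove. □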
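The construction in the proof can be checked mechanically. The sketch below places every vertex of $B_{n+r}$ using (3.1), confirms that the placement is one-to-one, and then checks that each edge of each copy of $B_n(p)$ joins two vertices $X$, $Y$ with $LX = RY$. It relies on the same assumed definitions of $L$, $R$ and $C$ (keep the first $n-1$ coordinates, keep the last $n-1$, adjacent sums mod 2) and takes $B_n(p)$ to consist of the edges $x \to y$ of $B_n$ with $p(y) = p(x) + 1$; the rank function is any dictionary from $n$-bit tuples to $\{0, \ldots, n\}$, and the function names are hypothetical.

```python
from itertools import product


def L(x):
    """Assumed 'left' map: drop the last coordinate."""
    return x[:-1]


def R(x):
    """Assumed 'right' map: drop the first coordinate."""
    return x[1:]


def C(x):
    """Assumed 'center' map: adjacent sums mod 2."""
    return tuple((a + b) % 2 for a, b in zip(x, x[1:]))


def C_power(r, x):
    """Return C^r x."""
    for _ in range(r):
        x = C(x)
    return x


def block_edges(n, p):
    """Edges of B_n(p): edges x -> y of B_n (Lx = Ry) with p(y) = p(x) + 1."""
    V = list(product((0, 1), repeat=n))
    return {(x, y) for x in V for y in V if L(x) == R(y) and p[y] == p[x] + 1}


def build(n, r, p):
    """Lay B_{n+r} out on 2^r copies of B_n(p) as in the proof of Theorem 3.1.

    Returns the number of internal edges; raises ValueError if two vertices
    share a home or if some chip edge is not an edge of B_{n+r}."""
    home = {}                               # (chip number, location) -> X
    for X in product((0, 1), repeat=n + r):
        x = C_power(r, X)
        i = p[x]
        home[(X[i:i + r], x)] = X           # (3.1): chip number X_{i+1..i+r}
    if len(home) != 2 ** (n + r):
        raise ValueError("two vertices of B_{n+r} share a home")
    edges = block_edges(n, p)
    internal = 0
    for chip in product((0, 1), repeat=r):
        for x, y in edges:
            X, Y = home[(chip, x)], home[(chip, y)]
            if L(X) != R(Y):
                raise ValueError(f"{X} -> {Y} is not an edge of B_{n + r}")
            internal += 1
    return internal
```

For example, with the hypothetical rank function $p(00) = 0$, $p(10) = 1$, $p(01) = 2$, $p(11) = 1$, the block $B_2(p)$ has three edges and `build(2, 1, p)` returns 6: two chips of three edges each, the situation described for Fig. 2.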
The following theorem, whose proof may be found in [5], gives a partial converse to Theorem 3.1. To state it, we introduce the notion of a graded digraph. A digraph $G$ with vertex set $V$ is graded of rank $m$ if there is a rank function $p: V \to \{0, 1, \ldots, m\}$ such that $p(y) = p(x) + 1$ whenever there is a directed edge from $x$ to $y$. Thus $B_n(p)$ is a graded subgraph of $B_n$ of rank $n$.

Theorem 3.2. If $G$ is a subgraph of $B_n$ with $2^n$ vertices which builds all de Bruijn graphs $B_N$ for $N \geq N_0$, then $G$ is a graded subgraph of $B_n$.

Combining Theorems 3.1 and 3.2, we see that a necessary condition for a subgraph of $B_n$ to be a UBB is that it be graded, and a sufficient condition is that it be graded of rank $\leq n$. Most graded subgraphs of $B_n$ with rank exceeding $n$ seem to be UBBs; however, there is a graded subgraph of $B_4$ of rank 5 which is not a UBB, so the whole story is apparently quite complicated.
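Whether a given subgraph is graded, and of what rank, can be decided by propagating ranks along edges in both directions: within each weakly connected component a rank function is fixed up to an additive constant, so the digraph is graded of rank $m$ exactly when no two paths force different ranks on the same vertex and each component's ranks span at most $m$. A minimal sketch, with a hypothetical function name and bit strings as vertex labels:

```python
from collections import deque
from typing import Hashable, Iterable, Optional


def grading(vertices: Iterable[Hashable],
            edges: Iterable[tuple[Hashable, Hashable]],
            m: int) -> Optional[dict]:
    """Return a rank function p: V -> {0, ..., m} with p(y) = p(x) + 1 for every
    edge x -> y, or None if the digraph is not graded of rank m."""
    vertices = list(vertices)
    nbrs = {v: [] for v in vertices}
    for x, y in edges:
        nbrs[x].append((y, +1))   # forward along x -> y: rank goes up by 1
        nbrs[y].append((x, -1))   # backward along x -> y: rank goes down by 1
    p = {}
    for start in vertices:
        if start in p:
            continue
        p[start] = 0
        component, queue = [start], deque([start])
        while queue:
            u = queue.popleft()
            for w, step in nbrs[u]:
                if w not in p:
                    p[w] = p[u] + step
                    component.append(w)
                    queue.append(w)
                elif p[w] != p[u] + step:
                    return None   # two paths force different ranks on w
        # Ranks in a component are fixed up to a shift; normalize the minimum to 0.
        low = min(p[v] for v in component)
        if max(p[v] for v in component) - low > m:
            return None
        for v in component:
            p[v] -= low
    return p
```

For instance, the three-edge subgraph $00 \to 10$, $10 \to 01$, $11 \to 01$ of $B_2$ is graded of rank 2 (ranks 0, 1, 2, 1 for $00$, $10$, $01$, $11$) but not of rank 1, while any subgraph containing a self-loop or a directed cycle is not graded at all.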
Example
We illustrate the construction in Theorem 3.1 by building the graph $B_5$ with four copies of the graph $B_3(p)$ in Fig. 3. We begin with Table 2, which lists, for each of the 32 possible 5-bit vectors $X$, the 3-bit vector $x = C^2 X$ and the corresponding rank $p(x)$.
We number the four copies of $B_3(p)$ 00, 01, 10 and 11. Table 2 can be used to find the chip number and the location within a chip of each 5-bit vector $X$, as follows. For a given $X$, the value $x = C^2 X$ gives the location, and the two bits of $X$ in positions $p(x) + 1$ and $p(x) + 2$, which are underlined in the table, give the chip number. For example, consider $X = 11000$. According to the table, $x = 110$, $p(x) = 1$, and the underlined bits are 10. Thus $X$ must be placed in location 110 in chip number 10. The complete assignment of vertices of $B_5$ to the four chips is shown in Fig. 4.
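The table lookup just described takes only a few lines of code. In the sketch below, `chip_and_location` (a hypothetical name) computes $x = C^2 X$ by XOR-ing adjacent bits and reads the chip number from positions $p(x) + 1$ and $p(x) + 2$. Table 1 is not reproduced here, so the full rank function `P3` is a hypothetical stand-in: it is one rank function on $V_3$ with $p(110) = 1$ whose graph $B_3(p)$ has eight edges under the edge convention assumed in the earlier sketches, and it need not coincide with Table 1 or with the assignment in Fig. 4.

```python
from itertools import product


def C(x: str) -> str:
    """Center map on bit strings: XOR of each pair of adjacent bits."""
    return "".join(str(int(a) ^ int(b)) for a, b in zip(x, x[1:]))


def chip_and_location(X: str, r: int, p: dict[str, int]) -> tuple[str, str]:
    """Return (num(X), C^r X): the r bits of X in positions p(x)+1..p(x)+r
    (1-based) and the location x = C^r X."""
    x = X
    for _ in range(r):
        x = C(x)
    i = p[x]
    return X[i:i + r], x


# Hypothetical rank function on V_3 (only p("110") = 1 is taken from the text).
P3 = {"100": 0, "101": 0, "111": 0, "010": 1,
      "110": 1, "011": 1, "001": 2, "000": 3}

# Assign all 32 vertices of B_5 to the four chips 00, 01, 10, 11.
chips: dict[str, dict[str, str]] = {}
for bits in product("01", repeat=5):
    X = "".join(bits)
    chip, location = chip_and_location(X, 2, P3)
    chips.setdefault(chip, {})[location] = X

print(chip_and_location("11000", 2, P3))  # ('10', '110')
```

Only $p(110)$ is consulted for $X = 11000$, so this entry agrees with the text regardless of how the rest of `P3` is chosen; each of the four chips receives exactly eight vertices, one per location.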
The most efficient known UBBs
In Section 3 we saw that any subgraph of $B_n$ induced by a rank-$n$ function is a UBB. The efficiency $e_n$ of a UBB is independent of the size of the de Bruijn graph it is used to build: a UBB with $E_n$ edges builds $B_{n+r}$ from $2^r$ copies, and $B_{n+r}$ has $2^{n+r+1}$ edges, so
$$e_n = \frac{2^r E_n}{2^{n+r+1}} = \frac{E_n}{2^{n+1}}, \qquad (5.1)$$
independent of $r$.

Table 3. The most efficient known universal de Bruijn building blocks, for $1 \leq n \leq 8$. The entries for $n \leq 4$ are known to be optimal. For larger values of $n$, improvements may be possible. (In the table, $E_n$ denotes the number of edges and $e_n$ the efficiency defined in (5.1).)

Table 3 lists the number of edges in the most efficient UBBs we have been able to find, using ad hoc methods, for $1 \leq n \leq 8$.¹ We have been able to show by exhaustive search that the entries for $1 \leq n \leq 4$ are optimal, but for larger values of $n$, improvements may be possible. For $n = 1$, an optimal building block consists of a single edge. The optimal universal building block for $n = 2$ is shown in Fig. 2 (to see that the chips in Fig. 2 do indeed correspond to subgraphs of $B_2$, just apply the operator $C$ to them). An optimal UBB for $n = 3$ has already been shown in Fig. 3. The best known UBBs for $n = 4$ through $n = 7$ are shown in Figs. 5–8.

¹ The building blocks for $n = 5$ and $n = 8$ were first discovered by a clever computer search algorithm developed by Gordon Oliver.

… Table 3, where $c_n$ for the best known chips is tabulated for $1 \leq n \leq 8$, something much stronger seems to be true, and we end our paper with a conjecture.
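For very small $n$ the best building block of the form $B_n(p)$ can be found by trying every rank function. The sketch below does this under the same assumed edge convention as before (the successors of $x$ are $[b, x_1, \ldots, x_{n-1}]$, i.e., $Lx = Ry$) and evaluates (5.1). It is only feasible for $n \leq 3$, since there are $(n+1)^{2^n}$ candidate rank functions, and it searches rank-$n$ functions only, not all graded subgraphs, so on its own it gives a lower bound on the optimum. The function names are hypothetical.

```python
from itertools import product


def de_bruijn_edges(n):
    """Edges of B_n on n-bit tuples: x -> (b, x_1, ..., x_{n-1}) for b = 0, 1
    (the assumed convention Lx = Ry)."""
    return [(x, (b,) + x[:-1])
            for x in product((0, 1), repeat=n) for b in (0, 1)]


def best_block_size(n):
    """Largest number of edges of B_n(p) over all p: V_n -> {0, ..., n}."""
    vertices = list(product((0, 1), repeat=n))
    index = {v: k for k, v in enumerate(vertices)}
    edges = [(index[x], index[y]) for x, y in de_bruijn_edges(n)]
    best = 0
    for ranks in product(range(n + 1), repeat=len(vertices)):
        best = max(best, sum(ranks[b] == ranks[a] + 1 for a, b in edges))
    return best


def efficiency(n, edges_in_block):
    """(5.1): e_n = E_n / 2^(n+1), the same for every B_{n+r} being built."""
    return edges_in_block / 2 ** (n + 1)


for n in (1, 2, 3):
    E = best_block_size(n)
    print(n, E, efficiency(n, E))
```

For $n = 1, 2, 3$ the search returns 1, 3 and 8 edges, consistent with the single-edge block, the three-edge block of Fig. 2 (efficiency 3/8), and the efficiency 0.50 of the four-chip decomposition of $B_5$ in Section 4.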
Conjecture. If $c_n^*$ is as defined above, then $\lim_{n \to \infty} c_n^* = 2$.
