Some VLSI decompositions of the de Bruijn graph  by Dolinar, Sam et al.
Discrete Mathematics 106/107 (1992) 189-198 
North-Holland 
Some VLSI decompositions of the 
de Bruijn graph* 
Sam Dolinar, Tsz-Mei Ko and Robert McEliece 
Jet Propulsion Laboratory and Department of Electrical Engineering, California Institute of 
Technology, Pasadena, CA 91125, USA 
Received 12 March 1992 
Abstract 
Dolinar, S., T.-M. Ko and R. McEliece, Some VLSI decompositions of the de Bruijn graph,
Discrete Mathematics 106/107 (1992) 189-198. 
We define a VLSI decomposition of a directed graph G to be a collection of isomorphic 
vertex-disjoint subgraphs of G which together contain all of G’s vertices. We call the 
isomorphic subgraphs comprising the decomposition building blocks for the graph G, and we 
refer to the edges contained in the collection of subgraphs as internal edges. The efficiency of a 
VLSI decomposition of G is the fraction of the total number of edges in G which are internal 
edges. In this paper we will present a general construction for efficient VLSI decompositions 
for the family of de Bruijn graphs. Using the methods to be explained in this paper, we have 
found a 64-chip VLSI decomposition of the de Bruijn graph $B_{13}$ with efficiency 0.754, which is
being used by JPL design engineers to build a single-board Viterbi decoder for the $K = 15$, rate
1/4 convolutional code which will be used on NASA’s Galileo mission.
1. Introduction 
Let G be a directed graph. We think of G as a representation of the wiring 
diagram for an electronic circuit, with the vertices of G representing arithmetic 
processors and the edges of G representing wires connecting the processors. In 
modern VLSI technology, if the circuit is too large to fit on a single chip, it may 
be possible to build it by wiring together two or more appropriately designed 
chips. Each processor must then be placed on one of the chips, but the wires of 
the circuit may be either internal to the chips (intrachip wires) or external 
(interchip wires). For reasons of economy and simplicity, it is desirable, though
not necessary, for the chips to be identical. Returning now to the graph G, we are
thus motivated to define a VLSI decomposition of G as a collection of isomorphic
vertex-disjoint subgraphs of G which together contain all of G’s vertices and a
subset of its edges. We call the isomorphic subgraphs comprising the decomposition
building blocks for the graph G, and we refer to the edges contained in the
collection of subgraphs as internal edges. Since internal edges are, as a rule, much
more convenient for the circuit designer than external edges, we define the
efficiency of a VLSI decomposition of G as the fraction of the total number of
edges in G which are internal edges.

Correspondence to: R. McEliece, Department of Electrical Engineering, California Institute of
Technology, Pasadena, CA 91125, USA.
* The research described in this paper was partially carried out at the Jet Propulsion Laboratory,
California Institute of Technology, under a contract with the National Aeronautics and Space
Administration. Tsz-Mei Ko’s contribution was also supported by National Security Agency Grant
No. MDA904-90-H-1007. Robert McEliece’s contribution was also supported by AFOSR grant
91-0037 and a grant from Pacific Bell.
0012-365X/92/$05.00 © 1992 Elsevier Science Publishers B.V. All rights reserved
In this paper we will present a general construction for efficient VLSI 
decompositions for the family of de Bruijn graphs (a formal definition of the de 
Bruijn graph $B_n$ will be given in Section 3). We focus on this class of graphs
because they represent the circuit diagrams for fully parallel Viterbi decoders.
Indeed, a constraint length $K$, rate $1/n$ convolutional code has the de Bruijn
graph $B_{K-2}$ as the circuit diagram for its Viterbi decoder, and NASA is using a
$K = 15$, rate 1/4 convolutional code on the Galileo mission. As we will see, the
methods described in this paper can be used to design a 64-chip VLSI
decomposition of $B_{13}$ with efficiency 0.754, and this chip is being used by JPL
design engineers to build a single-board Viterbi decoder for the Galileo code.
(Earlier, but less efficient, VLSI decompositions for de Bruijn graphs were
presented in [1] and [2]. Those led to a 256-chip VLSI decomposition of $B_{13}$ of
efficiency 0.563, which was used to design a multi-board decoder for the Galileo
code.)
As a sample of our results, consider the de Bruijn graph $B_3$ in Fig. 1. It
contains 8 vertices and 16 edges. In Fig. 2 we see a two-chip VLSI decomposition
of $B_3$, in which the underlying building block contains four vertices and three
edges. This decomposition has efficiency 3/8, since the two chips together contain
$2 \cdot 3 = 6$ of the 16 edges. It turns out that the building block in Fig. 2 is a
universal building block, in the sense that it can be used to build any de Bruijn
graph $B_n$ with $n \ge 3$. Indeed, this building block is just one of a large
family of universal building blocks we will describe in Theorem 3.1.

Fig. 1. The de Bruijn graph $B_3$.

Fig. 2. A two-chip VLSI decomposition of $B_3$, with efficiency 0.375.
Here is a summary of the rest of the paper. In Section 2 we will introduce some 
algebraic notation and prove two simple lemmas that will be needed in the proof 
of our main theorem. In Section 3 we will give a formal definition of the de 
Bruijn graph $B_n$ and show that subgraphs of $B_n$ defined by certain rank functions
are universal de Bruijn building blocks. In Section 4 we will use the results of 
Section 3 to exhibit a 4-chip VLSI decomposition of B5 of efficiency 0.50. Finally, 
in Section 5 we will list the most efficient universal building blocks we know of 
(including, in Fig. 8, a description of the building block which is being used on 
the one-board Viterbi decoder for the Galileo code), and state some asymptotic 
results. 
2. Algebraic preliminaries 
Let $V_n$ be the set of all $n$-dimensional binary vectors. We begin by defining
three linear mappings $L$, $R$, and $C$ (‘left’, ‘right’, and ‘center’) from $V_n$ to $V_{n-1}$.
(Technically, the mappings $L$, $R$, and $C$ are each families of mappings, one for
each $n \ge 2$.) If $x = [x_1, \ldots, x_n]$ is a binary vector of length $n$, then

$$Lx = [x_1, \ldots, x_{n-1}],$$
$$Rx = [x_2, \ldots, x_n],$$
$$Cx = [x_1 + x_2, \ldots, x_{n-1} + x_n],$$

where the additions are modulo 2. For example, if $x = [10110]$, then $Lx = [1011]$,
$Rx = [0110]$, and $Cx = [1101]$.
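The three maps are easy to experiment with. The following Python sketch is our own illustration, not code from the paper: it represents a vector of $V_n$ as a tuple of 0/1 integers, and the helpers `bits` and `show` are hypothetical conveniences for converting to and from strings. It reproduces the example above.

```python
def L(x):
    """'Left' map V_n -> V_{n-1}: drop the last component."""
    return tuple(x[:-1])

def R(x):
    """'Right' map V_n -> V_{n-1}: drop the first component."""
    return tuple(x[1:])

def C(x):
    """'Center' map V_n -> V_{n-1}: sums of adjacent components, modulo 2."""
    return tuple((a + b) % 2 for a, b in zip(x, x[1:]))

def bits(s):
    """Hypothetical helper: parse a string such as '10110' into a tuple of 0/1 ints."""
    return tuple(int(ch) for ch in s)

def show(v):
    """Hypothetical helper: format a tuple of bits as a string."""
    return "".join(str(b) for b in v)

x = bits("10110")
print(show(L(x)), show(R(x)), show(C(x)))  # 1011 0110 1101
```

Tuples are used so that vectors can later serve as dictionary keys or graph vertices.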
Lemma 2.1. The mappings $L$ and $R$ commute with $C$, i.e., $CLx = LCx$ and
$CRx = RCx$ for any binary vector $x$ of length $\ge 3$.
Proof. By direct computation we find that if $x = [x_1, \ldots, x_n]$, then

$$CLx = LCx = [x_1 + x_2, \ldots, x_{n-2} + x_{n-1}],$$
$$CRx = RCx = [x_2 + x_3, \ldots, x_{n-1} + x_n]. \qquad \square$$
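Lemma 2.1 can also be spot-checked by brute force for small lengths. This is an illustration only, not a substitute for the direct computation; vectors are assumed to be tuples of 0/1 integers, and `lemma_2_1_holds` is a hypothetical helper name.

```python
from itertools import product

def L(x):
    return x[:-1]            # drop the last component

def R(x):
    return x[1:]             # drop the first component

def C(x):
    return tuple((a + b) % 2 for a, b in zip(x, x[1:]))   # adjacent sums mod 2

def lemma_2_1_holds(n):
    """Check CLx == LCx and CRx == RCx for every x in V_n."""
    return all(C(L(x)) == L(C(x)) and C(R(x)) == R(C(x))
               for x in product((0, 1), repeat=n))

print(all(lemma_2_1_holds(n) for n in range(3, 11)))  # True
```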
Now we define the burst agreement B(x, y) between two binary n-vectors x and 
y as the length of the largest block of consecutive components on which x and y 
agree. For example, if x = [11010010] and y = [01110001], then B(x, y) = 3
because x and y agree in positions 4, 5, and 6, but in no set of four or more 
consecutive positions. 
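The burst agreement is a longest-run computation. A minimal sketch, assuming equal-length sequences of bits (strings or tuples both work); the function name `burst_agreement` is our own:

```python
def burst_agreement(x, y):
    """B(x, y): length of the longest block of consecutive positions where x and y agree."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    best = run = 0
    for a, b in zip(x, y):
        run = run + 1 if a == b else 0   # extend the current run or restart it
        best = max(best, run)
    return best

print(burst_agreement("11010010", "01110001"))  # 3
```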
Lemma 2.2. If $x$ and $y$ are two $n$-vectors with $C^r x = C^r y$ and $B(x, y) \ge r$, then
$x = y$.
Proof. We use induction on $r$. For $r = 1$, the assertion is that if $Cx = Cy$, and if $x$
and $y$ agree in at least one coordinate, then $x$ and $y$ are identical. To see that this
is so, note that $C$ is a linear mapping from $V_n$ to $V_{n-1}$. Its nullspace, i.e., the set
of $x$'s such that $Cx = 0$, is the set of vectors $[x_1, \ldots, x_n]$ such that
$x_1 + x_2 = x_2 + x_3 = \cdots = x_{n-1} + x_n = 0$. This set contains only the two vectors
$[00 \ldots 0]$ and $[11 \ldots 1]$. Thus if $Cx = Cy$, then either $x = y$ or $x = y + [11 \ldots 1]$,
i.e., $x$ and $y$ differ in all $n$ positions. It follows that if $Cx = Cy$ and if $x$ and $y$ agree in at least
one place, then $x = y$. This completes the proof for $r = 1$.
We now assume $r \ge 2$, and that the lemma has been proved for all $r' < r$. If
$B(x, y) \ge r$, i.e., if $x$ and $y$ agree on $r$ consecutive positions, then clearly $Cx$ and
$Cy$ agree on at least $r - 1$ consecutive positions. Thus if we let $x' = Cx$ and
$y' = Cy$, then $B(x', y') \ge r - 1$. Also, the hypothesis $C^r x = C^r y$ is equivalent to
$C^{r-1} x' = C^{r-1} y'$. Thus by the induction hypothesis, $x' = y'$, i.e., $Cx = Cy$. But
also $B(x, y) \ge r \ge 1$, so that by the $r = 1$ case of the lemma, which has already
been proved, $x = y$. $\square$
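For small $n$ the lemma can be confirmed exhaustively. The sketch below is our own, with hypothetical helper names: it groups the vectors of $V_n$ by their image under $C^r$, since only pairs with $C^r x = C^r y$ matter, and checks that no two distinct vectors in a group have burst agreement at least $r$. It is a sanity check for $n \le 8$, not a proof.

```python
from itertools import product

def C(x):
    """Adjacent sums modulo 2 on a tuple of bits."""
    return tuple((a + b) % 2 for a, b in zip(x, x[1:]))

def C_pow(x, r):
    """Apply C r times (V_n -> V_{n-r})."""
    for _ in range(r):
        x = C(x)
    return x

def burst_agreement(x, y):
    """Length of the longest run of positions where x and y agree."""
    best = run = 0
    for a, b in zip(x, y):
        run = run + 1 if a == b else 0
        best = max(best, run)
    return best

def lemma_2_2_holds(n, r):
    """Check that C^r x == C^r y and B(x, y) >= r imply x == y, for all x, y in V_n."""
    groups = {}
    for v in product((0, 1), repeat=n):
        groups.setdefault(C_pow(v, r), []).append(v)
    for group in groups.values():
        for k, x in enumerate(group):
            for y in group[k + 1:]:              # distinct pairs only
                if burst_agreement(x, y) >= r:
                    return False                 # would contradict Lemma 2.2
    return True

print(all(lemma_2_2_holds(n, r) for n in range(2, 9) for r in range(1, n)))  # True
```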
3. A construction for universal de Bruijn building blocks 
The de Bruijn graph $B_n$ can be defined as the directed graph with $V_n$ as vertex
set, and with a directed edge from $x$ to $y$ if and only if $Lx = Ry$. (See Golomb [3,
Sections 2.2 and 6.2] for more about the de Bruijn graph.)
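As a concrete reading of this definition, the sketch below (our own, with vertices as tuples of bits and a hypothetical function name) enumerates the edges of $B_n$. Since $Lx = Ry$ forces $y = [b, x_1, \ldots, x_{n-1}]$ with $b \in \{0, 1\}$, every vertex has exactly two successors, so $B_n$ has $2^{n+1}$ edges; for $n = 3$ this gives the 16 edges of Fig. 1.

```python
from itertools import product

def de_bruijn_edges(n):
    """All edges (x, y) of B_n: x -> y exactly when Lx == Ry, i.e. x[:-1] == y[1:]."""
    V = list(product((0, 1), repeat=n))
    return [(x, y) for x in V for y in V if x[:-1] == y[1:]]

edges = de_bruijn_edges(3)
print(len(edges))                                  # 16
print([y for x, y in edges if x == (1, 0, 0)])     # [(0, 1, 0), (1, 1, 0)]
```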
Now let $\rho$ be a mapping from $V_n$ to the set $\{0, 1, \ldots, n\}$. For $x \in V_n$, we call
$\rho(x)$ the rank of $x$. We define the $\rho$-subgraph of $B_n$, denoted by $B_n(\rho)$, as
follows. The vertex set of $B_n(\rho)$ is $V_n$, and $x$ and $y$ are connected by a directed
edge in $B_n(\rho)$ if and only if (a) $x$ and $y$ are connected by a directed edge in $B_n$,
and (b) $\rho(y) = \rho(x) + 1$. For example, consider the rank mapping $\rho: V_3 \to \{0, 1, 2, 3\}$
described in Table 1. The corresponding graph $B_3(\rho)$ is shown in
Fig. 3.
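A minimal sketch of the $\rho$-subgraph construction, using the rank values of Table 1 with vertices written as 3-bit strings; the names `RHO` and `rho_subgraph_edges` are hypothetical. It finds 8 edges, matching the edge count listed for $n = 3$ in Table 3.

```python
from itertools import product

# Rank function rho of Table 1, with vertices of B_3 written as bit strings.
RHO = {"000": 3, "001": 2, "010": 1, "011": 2,
       "100": 0, "101": 0, "110": 1, "111": 1}

def rho_subgraph_edges(rho, n):
    """Edges of B_n(rho): (a) x -> y is an edge of B_n (Lx == Ry), and (b) rho(y) == rho(x) + 1."""
    V = ["".join(b) for b in product("01", repeat=n)]
    return [(x, y) for x in V for y in V
            if x[:-1] == y[1:] and rho[y] == rho[x] + 1]

edges = rho_subgraph_edges(RHO, 3)
print(len(edges))   # 8
for x, y in edges:
    print(x, "->", y, f"(rank {RHO[x]} -> {RHO[y]})")
```

Note that, for instance, 011 -> 001 is an edge of $B_3$ but not of $B_3(\rho)$, because both endpoints have rank 2.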
It turns out, rather surprisingly, that any graph of the form $B_n(\rho)$ can be used
as a building block for any de Bruijn graph $B_N$ with $N \ge n$. For this reason we call
the graph $B_n(\rho)$ a universal de Bruijn building block (UBB). Theorem 3.1, which
follows, spells this out.

Table 1
A rank function on the vertices of $B_3$. The corresponding graph $B_3(\rho)$
is shown in Fig. 3

x      ρ(x)      x      ρ(x)
000    3         100    0
001    2         101    0
010    1         110    1
011    2         111    1

[Fig. 3: vertices arranged in columns by rank; ρ = 0: 100, 101; ρ = 1: 010, 110, 111; ρ = 2: 001, 011; ρ = 3: 000]
Fig. 3. The graph $B_3(\rho)$, derived from the rank function in Table 1. All edges are directed from left
to right.
Theorem 3.1. The graph $B_n(\rho)$ is a UBB, i.e., it can be used to build any de
Bruijn graph $B_N$ with $N \ge n$.
Proof. We will show that $B_n(\rho)$ builds $B_{n+r}$ for all $r \ge 1$. For any
$X = [X_1, X_2, \ldots, X_{n+r}] \in V_{n+r}$, suppose that $C^r X = x \in V_n$ and $\rho(x) = i$. We define
the $r$-bit chip number of $X$, denoted by $\mathrm{num}(X)$, as

$$\mathrm{num}(X) = [X_{i+1}, \ldots, X_{i+r}]. \qquad (3.1)$$

Note that since $0 \le i \le n$, we have $1 \le i + 1 \le i + r \le n + r$, so that the chip number as
defined in (3.1) ‘fits’ within the field of $X$. In building $B_{n+r}$ with $2^r$ copies of $B_n(\rho)$
(‘chips’), numbered $00\ldots0$ to $11\ldots1$, we place vertex $X$ on the chip numbered
$\mathrm{num}(X)$, at the location corresponding to $x = C^r X$. If two vertices $X$ and $Y$ were
assigned the same location on the same chip, then $C^r X = C^r Y$; since both locations then
have the same rank $i$, $X$ and $Y$ would also agree in the $r$ consecutive positions
$i + 1, \ldots, i + r$, so that $B(X, Y) \ge r$. Lemma 2.2 therefore shows that no two
vertices of $B_{n+r}$ can be assigned the same location on the same chip, so that each
of the $2^{n+r}$ vertices in $B_{n+r}$ is assigned a unique ‘home’ on one of the $2^r$ chips.
What remains to show is that the connections within the chips correspond to
connections in the big graph $B_{n+r}$, i.e., that if $\mathrm{num}(X) = \mathrm{num}(Y)$ and if $C^r X$ and
$C^r Y$ are connected on $B_n(\rho)$, then $X$ and $Y$ are connected in $B_{n+r}$, i.e.,
$LX = RY$.
To see this, we reason as follows. Since $C^r X$ and $C^r Y$ are connected on $B_n(\rho)$,
we have $\rho(C^r Y) = \rho(C^r X) + 1$. Thus if $\rho(C^r X) = i$, then $\rho(C^r Y) = i + 1$, and so,
since $\mathrm{num}(X) = \mathrm{num}(Y)$, we have

$$[X_{i+1}, \ldots, X_{i+r}] = [Y_{i+2}, \ldots, Y_{i+r+1}].$$

Since the $j$th component of $LX$ is $X_j$ and the $j$th component of $RY$ is $Y_{j+1}$, this says
that $LX$ and $RY$ agree on $r$ consecutive positions, i.e.,

$$B(LX, RY) \ge r. \qquad (3.2)$$
But also, since $C^r X$ and $C^r Y$ are connected on $B_n(\rho)$, we have $LC^r X = RC^r Y$,
which, by Lemma 2.1, implies

$$C^r LX = C^r RY. \qquad (3.3)$$

Combining (3.2) and (3.3), using Lemma 2.2, we find that $LX = RY$, which is
what we set out to prove. $\square$
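The construction in the proof can be exercised directly. The sketch below is our own illustration, with hypothetical names `place` and `check_theorem_3_1`. It uses the rank function of Table 1 ($n = 3$), places every vertex $X$ of $B_{3+r}$ on chip $\mathrm{num}(X)$ at location $C^r X$ as in (3.1), and then checks the two claims of the proof: the placement is one-to-one, and every edge of every chip joins two vertices $X$, $Y$ with $LX = RY$. Python slices are 0-based, so $[X_{i+1}, \ldots, X_{i+r}]$ becomes `X[i:i + r]`.

```python
from itertools import product

# Rank function of Table 1 (n = 3); vertices are bit strings.
RHO = {"000": 3, "001": 2, "010": 1, "011": 2,
       "100": 0, "101": 0, "110": 1, "111": 1}

def C(x):
    """Adjacent sums modulo 2 on a bit string."""
    return "".join(str(int(a) ^ int(b)) for a, b in zip(x, x[1:]))

def C_pow(x, r):
    """Apply C r times, mapping V_{n+r} to V_n."""
    for _ in range(r):
        x = C(x)
    return x

def place(X, rho, r):
    """(chip, location) of a vertex X of B_{n+r}: num(X) as in (3.1), location C^r X."""
    x = C_pow(X, r)
    i = rho[x]
    return X[i:i + r], x          # 0-based slice of [X_{i+1}, ..., X_{i+r}]

def check_theorem_3_1(rho, n, r):
    """Return (one_to_one, edges_ok, internal_edges, total_edges) for building B_{n+r}."""
    total = 2 ** (n + r + 1)      # every vertex of B_{n+r} has out-degree 2
    V = ["".join(b) for b in product("01", repeat=n + r)]
    home = {X: place(X, rho, r) for X in V}
    if len(set(home.values())) != len(V):
        return False, False, 0, total
    vertex_at = {h: X for X, h in home.items()}
    # Edges of one chip, i.e. of B_n(rho).
    chip_edges = [(u, v) for u in rho for v in rho
                  if u[:-1] == v[1:] and rho[v] == rho[u] + 1]
    edges_ok, internal = True, 0
    for chip in ("".join(b) for b in product("01", repeat=r)):
        for u, v in chip_edges:
            X, Y = vertex_at[(chip, u)], vertex_at[(chip, v)]
            edges_ok = edges_ok and X[:-1] == Y[1:]   # LX == RY, so X -> Y in B_{n+r}
            internal += 1
    return True, edges_ok, internal, total

print(check_theorem_3_1(RHO, 3, 2))   # (True, True, 32, 64)
print(place("11000", RHO, 2))          # ('10', '110')
```

For $r = 2$, 32 of the 64 edges of $B_5$ are internal, an efficiency of 0.50.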
The following theorem, whose proof may be found in [5], gives a partial 
converse to Theorem 3.1. To state it, we introduce the notion of a graded 
digraph. A digraph $G$ with vertex set $V$ is graded of rank $m$ if there is a rank
function $\rho: V \to \{0, 1, \ldots, m\}$ such that $\rho(y) = \rho(x) + 1$ whenever there is a directed
edge from $x$ to $y$. Thus $B_n(\rho)$ is a graded subgraph of $B_n$ of rank $n$.
Theorem 3.2. If $G$ is a subgraph of $B_n$ with $2^n$ vertices which builds all de
Bruijn graphs $B_N$ for $N \ge N_0$, then $G$ is a graded subgraph of $B_n$.
Combining Theorems 3.1 and 3.2, we see that a necessary condition for a
subgraph of $B_n$ to be a UBB is that it be graded, and a sufficient condition is that
it be graded of rank $\le n$. Most graded subgraphs of $B_n$ with rank exceeding $n$
seem to be UBBs; however, there is a graded subgraph of $B_4$ of rank 5 which is
not a UBB, so the whole story is apparently quite complicated.
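Testing the necessary condition (gradedness) is straightforward: propagate the constraint $\rho(y) = \rho(x) + 1$ along edges with a breadth-first search over each weakly connected component, and report failure on any contradiction. The helper `graded_rank` below is our own illustration; it returns the smallest $m$ for which the digraph is graded of rank $m$, or `None`. It says nothing about universality, which, as noted above, is subtler once the rank exceeds $n$.

```python
from collections import defaultdict, deque

def graded_rank(vertices, edges):
    """Smallest m such that the digraph is graded of rank m, or None if it is not graded."""
    nbrs = defaultdict(list)
    for x, y in edges:
        nbrs[x].append((y, 1))    # following x -> y forward raises the rank by 1
        nbrs[y].append((x, -1))   # following it backward lowers the rank by 1
    level, m = {}, 0
    for s in vertices:
        if s in level:
            continue
        level[s], comp, queue = 0, [s], deque([s])
        while queue:                      # BFS over one weakly connected component
            u = queue.popleft()
            for v, d in nbrs[u]:
                if v not in level:
                    level[v] = level[u] + d
                    comp.append(v)
                    queue.append(v)
                elif level[v] != level[u] + d:
                    return None           # conflicting ranks, e.g. a loop or a directed cycle
        # Shift the component so its lowest rank is 0; its span is the rank it needs.
        m = max(m, max(level[v] for v in comp) - min(level[v] for v in comp))
    return m

# Example: the graph B_3(rho) of Fig. 3, built from the ranks in Table 1.
RHO = {"000": 3, "001": 2, "010": 1, "011": 2,
       "100": 0, "101": 0, "110": 1, "111": 1}
V3 = sorted(RHO)
E3 = [(x, y) for x in V3 for y in V3 if x[:-1] == y[1:] and RHO[y] == RHO[x] + 1]
print(graded_rank(V3, E3))        # 3

# The full de Bruijn graph B_2 has loops such as 00 -> 00, so it is not graded.
V2 = ["00", "01", "10", "11"]
print(graded_rank(V2, [(x, y) for x in V2 for y in V2 if x[:-1] == y[1:]]))  # None
```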
4. Example 
We illustrate the construction in Theorem 3.1 by building the graph $B_5$ with
four copies of the graph $B_3(\rho)$ in Fig. 3. We begin with Table 2, which lists, for
each of the 32 possible 5-bit vectors $X$, the 3-bit vector $x = C^2 X$ and the
corresponding rank $\rho(x)$.
We number the four copies of $B_3(\rho)$ 00, 01, 10 and 11. Table 2 can be used to
find the chip number and the location within a chip of each 5-bit vector $X$, as
follows. For a given $X$, the value $x = C^2 X$ gives the location, and the two bits of
$X$ in positions $\rho(x) + 1$ and $\rho(x) + 2$, which are underlined in the table, give the
chip number. For example, consider $X = 11000$. According to the table, $x = 110$
(indeed, $CX = 0100$ and $C^2 X = 110$), $\rho(x) = 1$, and the underlined bits are 10.
Thus $X$ must be placed in location 110 in chip number 10. The complete assignment
of vertices of $B_5$ to the four chips is shown in Fig. 4.
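The rows of Table 2 can be recomputed from the rule just described. Here is a sketch that regenerates them; it is our own illustration with hypothetical names, and square brackets stand in for the underlining of the two chip bits.

```python
from itertools import product

# Rank function of Table 1.
RHO = {"000": 3, "001": 2, "010": 1, "011": 2,
       "100": 0, "101": 0, "110": 1, "111": 1}

def C(x):
    """Adjacent sums modulo 2 on a bit string."""
    return "".join(str(int(a) ^ int(b)) for a, b in zip(x, x[1:]))

def table_2_row(X):
    """(x, rho(x), chip, marked X) for a 5-bit string X."""
    x = C(C(X))                    # location on the chip: x = C^2 X
    i = RHO[x]
    chip = X[i:i + 2]              # bits of X in positions rho(x)+1 and rho(x)+2
    marked = X[:i] + "[" + chip + "]" + X[i + 2:]   # brackets stand in for underlining
    return x, i, chip, marked

rows = {"".join(b): table_2_row("".join(b)) for b in product("01", repeat=5)}
for X, (x, i, chip, marked) in rows.items():
    print(X, x, i, marked, "-> chip", chip)

print(rows["11000"])   # ('110', 1, '10', '1[10]00')
```

Grouping the rows by chip number reproduces the four chips of Fig. 4, eight vertices each.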
[Fig. 4: chips 00, 01, 10 and 11, each a labelled copy of the graph of Fig. 3]
Fig. 4. Four copies of the graph of Fig. 3 labelled to form a subgraph of $B_5$. This is a four-chip VLSI
decomposition of $B_5$ of efficiency 0.50.

Table 2
A table for building $B_5$ from 4 copies of the graph in Fig. 3

5. The most efficient known UBBs
In Section 3 we saw that any subgraph of $B_n$ defined by a rank function $\rho: V_n \to \{0, 1, \ldots, n\}$
is a UBB. The efficiency $e_n$ of a UBB is independent of the size of the de Bruijn
graph which it is used to build. Indeed, if there are $E_n$ edges in $B_n(\rho)$ and $2^r$
copies of $B_n(\rho)$ are used to build $B_{n+r}$, then there are a total of $2^r E_n$ internal
edges out of $2^{n+r+1}$ total edges in $B_{n+r}$ (every vertex of $B_{n+r}$ has out-degree 2).
Thus the efficiency is

$$e_n = \frac{2^r E_n}{2^{n+r+1}} = \frac{E_n}{2^{n+1}}, \qquad (5.1)$$

independent of $r$. Table 3 lists the number of edges in the most efficient UBBs we
have been able to find, using ad hoc methods, for $1 \le n \le 8$.¹ We have been able
to show by exhaustive search that the entries for $1 \le n \le 4$ are optimal, but for
larger values of $n$, improvements may be possible. For $n = 1$, an optimal building
block consists of a single edge. The optimal universal building block for $n = 2$ is
shown in Fig. 2 (to see that the chips in Fig. 2 do indeed correspond to subgraphs
of $B_2$, just apply the operator $C$ to them). An optimal UBB for $n = 3$ has already
been shown in Fig. 3. The best known UBBs for $n = 4$ through $n = 7$ are shown in
Figs. 5 through 8. In these figures, the binary strings are represented by their
decimal equivalents.

Table 3
The most efficient known universal de Bruijn building blocks, for $1 \le n \le 8$. The entries for
$n \le 4$ are known to be optimal. For larger values of $n$, improvements may be possible. (In
the table $E_n$ denotes the number of edges, $e_n$ is the efficiency defined in (5.1), and $c_n$ is
defined in (5.2).)

n     E_n     e_n      c_n
1       1     0.250    1.500
2       3     0.375    1.875
3       8     0.500    2.000
4      19     0.594    2.031
5      43     0.672    1.969
6      92     0.719    1.969
7     193     0.754    1.969
8     398     0.777    2.004
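For the smallest cases, the optimality claims behind Table 3 can be re-checked by a naive exhaustive search over all $(n+1)^{2^n}$ rank functions $\rho: V_n \to \{0, \ldots, n\}$. The sketch below is our own and searches only graphs of the form $B_n(\rho)$; it is practical only for $n \le 3$ (for $n = 4$ there are already $5^{16} \approx 1.5 \times 10^{11}$ rank functions). Vertices are encoded as integers with $x_1$ as the most significant bit, a representation choice on our part.

```python
from itertools import product

def best_rho_subgraph(n):
    """Largest edge count of B_n(rho) over all rho: V_n -> {0, ..., n}, and its efficiency."""
    V = range(2 ** n)                  # vertex v encodes x_1...x_n, x_1 most significant
    mask = (1 << (n - 1)) - 1          # keeps the low n-1 bits
    # Lx drops x_n (x >> 1); Ry drops y_1 (y & mask). Edge x -> y iff Lx == Ry.
    edges = [(x, y) for x in V for y in V if (x >> 1) == (y & mask)]
    best = 0
    for rho in product(range(n + 1), repeat=2 ** n):   # rho[v] is the rank of vertex v
        count = sum(1 for x, y in edges if rho[y] == rho[x] + 1)
        best = max(best, count)
    return best, best / 2 ** (n + 1)   # efficiency from (5.1)

for n in (1, 2, 3):
    print(n, *best_rho_subgraph(n))
```

The search should reproduce the first three rows of Table 3: 1, 3 and 8 edges, with efficiencies 0.25, 0.375 and 0.5.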
[Fig. 5: vertices arranged in columns ρ = 0 to 4]
Fig. 5. The most efficient $B_4(\rho)$ building block. The edge count is 19.

[Fig. 6: vertices arranged in columns ρ = 0 to 5]
Fig. 6. The most efficient known $B_5(\rho)$ building block. The edge count is 43.

[Fig. 7: vertices arranged in columns ρ = 0 to 6]
Fig. 7. The most efficient known $B_6(\rho)$ building block. The edge count is 92.

[Fig. 8: vertices arranged in columns ρ = 0 to 7]
Fig. 8. The most efficient known $B_7(\rho)$ building block. The edge count is 193.

¹ The building blocks for $n = 5$ and $n = 8$ were first discovered by a clever computer search
algorithm developed by Gordon Oliver.

If $e_n^*$ denotes the maximum possible efficiency of a $B_n(\rho)$ graph, it is natural to
wonder about the asymptotic behavior of $e_n^*$ as $n \to \infty$. Thanks to Eric Schwabe
[4], we know quite a lot about this. If we define the quantity $c_n^*$ by the formula

$$c_n^* = (n + 1)(1 - e_n^*), \qquad (5.2)$$

then Schwabe has proved that

$$A \le \liminf_{n \to \infty} c_n^* \le \limsup_{n \to \infty} c_n^* \le B, \qquad (5.3)$$

where $A$ and $B$ are positive constants with $A > 1$ and $B \le 8$. Thus the efficiency of
the best UBB behaves like $1 - K/n$ as $n \to \infty$. However, if we look at Table 3,
where $c_n$ for the best known chips is tabulated for $1 \le n \le 8$, something much
stronger seems to be true, and we end our paper with a conjecture.
Conjecture. If $c_n^*$ is as defined above, then $\lim_{n \to \infty} c_n^* = 2$.
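The $e_n$ and $c_n$ columns of Table 3 follow from the edge counts alone. The sketch below recomputes them from (5.1) and from the form $c_n = (n+1)(1 - e_n)$ for (5.2), which reproduces every $c_n$ entry of the table. The function names are hypothetical.

```python
# Best known edge counts E_n from Table 3, n = 1, ..., 8.
E = {1: 1, 2: 3, 3: 8, 4: 19, 5: 43, 6: 92, 7: 193, 8: 398}

def efficiency(n, E_n):
    """Efficiency (5.1): e_n = E_n / 2^(n+1)."""
    return E_n / 2 ** (n + 1)

def c_value(n, E_n):
    """Form of (5.2) used here: c_n = (n + 1) * (1 - e_n)."""
    return (n + 1) * (1 - efficiency(n, E_n))

for n, E_n in E.items():
    print(f"{n}  {E_n:3d}  {efficiency(n, E_n):.3f}  {c_value(n, E_n):.3f}")
```

For example, the $n = 7$ row ($193/256 \approx 0.754$) is the efficiency of the 64-chip decomposition of $B_{13}$ mentioned in the Introduction, since $B_7(\rho)$ builds $B_{7+6}$ with $2^6$ chips.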
References 
[1] O. Collins, S. Dolinar, R. McEliece and F. Pollara, A VLSI decomposition of the de Bruijn
graph, J. Assoc. Comput. Mach., to appear.
[2] O. Collins, F. Pollara, S. Dolinar and J. Statman, Wiring Viterbi decoders (splitting de Bruijn
graphs), JPL TDA Progress Report 42-96 (October-December 1988) 93-103.
[3] S. Golomb, Shift Register Sequences, revised edition (Aegean Park Press, Laguna Hills, CA,
1982).
[4] E. Schwabe, Efficient embeddings and simulations for hypercubic networks, MIT Ph.D. Thesis,
1991, MIT/LCS/TR-508.
[5] S. Dolinar, T.-M. Ko and R. McEliece, VLSI decompositions for de Bruijn graphs, Proc. 1992
International Symp. Circuits and Systems, 1855-1858.
