Improved Subset Generation For The MU-Decoder by Agarwal, Utsav
Louisiana State University
LSU Digital Commons
LSU Master's Theses Graduate School
2017
Improved Subset Generation For The MU-Decoder
Utsav Agarwal
Louisiana State University and Agricultural and Mechanical College, utsav.agarwal444@gmail.com
Recommended Citation
Agarwal, Utsav, "Improved Subset Generation For The MU-Decoder" (2017). LSU Master's Theses. 4395.
https://digitalcommons.lsu.edu/gradschool_theses/4395
IMPROVED SUBSET GENERATION FOR THE MU-DECODER
A Thesis
Submitted to the Graduate Faculty of the
Louisiana State University and
Agricultural and Mechanical College
in partial fulfillment of the
requirements for the degree of





B.Tech, West Bengal University of Technology, 2012
May 2017
Acknowledgments
I would like to thank my advisor, Dr. Ramachandran Vaidyanathan, for his patient
guidance and constant support, not only in my research but also in making me a better
person. I am indeed blessed to do research under such a wise pundit. I would also like to
thank the committee members, Dr. Jerry Trahan and Dr. Konstantin (Costas) Busch, for
helping me develop the thesis better. I would like to thank the circle of my strength and
pride, my family, which includes my parents, brother, grandparents, uncles, aunts and my
cousins. I would like to thank my family of friends who made me feel at home and supported me




Acknowledgments
List of Tables
List of Figures
Abstract
Chapter
1 Introduction
1.1 Problem Definition
2 MU-Decoder
2.1 MU-Decoder Structure and Ordered Partitions
2.2 Properties of MU-Decoder
3 Totally-Ordered Sets
3.1 Totally-Ordered Sets in the Boolean Lattice
3.2 Totally-Ordered Source and Output Sets
3.3 Canonical Form of Source Set
4 Generating a Given Totally Ordered Set
4.1 Generating a Large Totally-Ordered Set
4.2 Generating a Set of Non-isomorphic Totally Ordered Sets
5 Hardware Enhancement for Partition Generation
5.1 Ordered Partition Translation
6 Generic Subsets
6.1 Traversing the Boolean Lattice
6.1.1 Single Total Order S
7 Conclusion
7.1 Work Covered
7.2 Future Work
Bibliography
Vita
List of Tables
1.1 Results in this thesis
5.1 Block Number of a ∈ B0m over translation
List of Figures
1.1 An Illustration of Frame Granularity
1.2 Boolean Lattice
1.3 Totally-Ordered Sets
1.4 Totally Ordered Subset Restriction
2.1 x-to-n MU-Decoder MD(x, z, y, n)
3.1 Boolean Lattice G4
5.1 Current Selector Module
5.2 C-uniform translation
5.3 Address Generator
5.4 Selector Module Hardware Structure
6.1 2d spaced totally-ordered subsets
6.2 XY Totally-Ordered subsets
7.1 Multiple Totally-Ordered paths produced from one Totally-Ordered set of subsets
Abstract
The MU-Decoder is a hardware subset generator that finds use in partial reconfiguration
of FPGAs and in numerous other applications. It is capable of generating a set S of subsets
of a large set Zn with n elements. If the subsets in S satisfy the “isomorphic totally-ordered
property,” then the MU-Decoder works very efficiently to produce a set of u subsets
in O(log n) time and $\Theta(n\sqrt{u}\log n)$ gate cost. In contrast, a naive approach requires Θ(un)
gate cost. We show that this low cost for the MU-Decoder can be achieved without the
isomorphism constraint, thereby allowing S to include a much wider range of subsets. We
also show that if additional constraints on the relative sizes of the subsets in S can be placed,
then u subsets can be generated with $\Theta(n\sqrt{u})$ cost. This uses a new hardware enhancement
proposed in this thesis. Finally, we show that by properly selecting S and by using some












Many modern applications utilize Field Programmable Gate Arrays (FPGAs) [1, 35],
including intelligent systems [3, 23, 24], scientific applications [8, 13], defense and
aerospace systems [19, 25, 27, 34], communication and signal processing [6, 14, 22, 26, 32],
instrumentation [10, 30], and finance [33].
Partial Reconfiguration (PR) [4, 5, 7, 9, 18, 20, 31] is an important feature of reconfigurable
computing that allows a portion of the configuration fabric to be reconfigured at runtime.
The feature allows for an efficient use of the chip’s real estate. However, PR needs to be
quick to be useful in real time. The unit of reconfiguration is typically called a “frame.”
A frame may contain hundreds or thousands of reconfigurable bits (the entire chip could
contain millions). If frames are too large (coarse-grained), a larger portion of the configurable
fabric may have to be reconfigured than is actually needed.
Example 1.1. We consider the illustrative example taken from Ashrafi [2] and Jordan [17].
Figure 1.1 shows how frame granularity affects the partial reconfiguration of a chip. The
blue regions represent the area that needs to be reconfigured, and the yellow
regions represent the extra part of each frame that must be reconfigured as well during
partial reconfiguration.
This points toward using small frames. However, to configure a frame, it must first be
selected and then a data path established to it, before the configuration bits can be input
to the frame. Conventional ways to select a frame come down to the use of a 1-hot decoder
that selects one frame at a time (or a 1-element subset of the set of all frames). If we follow
the illustration of Example 1.1 and make frames small, then there will be many frames to
reconfigure and many iterations of using the 1-hot decoder, which is a time-consuming
exercise. The MU-Decoder [16] shows how a subset (of more than 1 frame) can be selected.
The scan-path architecture [2] shows how the configuration bits can be delivered to the
(a) PR area: 15% (b) Conf area: 75% (6/8 frames)
(c) Conf area: 100% (all 8 frames) (d) Conf area: 39% (7/18 frames)
(e) Conf area: 24% (17/72 frames)
Figure 1.1: An Illustration of Frame Granularity. Part (a) shows the location of the part needing
reconfiguration. It occupies only 15% of the total area. Parts (b)-(e) show the impact
of various frame sizes and shapes. For example, in part (b), 6 of the 8 “tall” frames are
needed to cover the blue area. So while the partial reconfiguration (PR) area is 15%, the
configuration area is 6/8 or 75%. The frames in part (c) fare worse, requiring a total
reconfiguration. With smaller frames, the configuration area reduces.
multiple selected frames. The MU-Decoder is, however, efficient only for certain types of
subsets. In this thesis, we extend the range of subsets for which the MU-Decoder works. We
also construct a framework for generating arbitrary subsets on the MU-Decoder.
The core problem addressed is that of generating a subset. While this is useful in
FPGA partial reconfiguration, the application of subset generation is much wider, including
wireless and heterogeneous networks [11], vehicle control units [31], database servers [5],
image compression [29], bioinformatics [15] and many more.
1.1 Problem Definition
Let Zn = {0, 1, · · · , n− 1}. The goal is to generate a set S = {Si : 0 ≤ i < u, Si ⊆ Zn}.
In this thesis, we focus on a “totally-ordered” set S where S0 ⊂ S1 ⊂ · · · ⊂ Su−1. The
$2^n$ subsets of Zn can be viewed on a Boolean Lattice (Hasse diagram) [28] as shown in
Figure 1.2(a) (a concrete example with n = 4 is in Figure 3.1 on page 14). In this
lattice, a totally-ordered set is a set of points on a path from ∅ to Zn, as shown in Figure 1.2(b)
with S0, S1, · · · , Sw−1 ∈ S, which is totally-ordered.
It has been shown [17] that S can be produced using an MU-Decoder with a delay of
O(log n) and gate cost of $O(n\sqrt{u}\log u)$. Jordan and Vaidyanathan [16] then extended this to
“isomorphic” totally-ordered sets (see Figure 1.3(a)) $\mathcal{S}^0, \mathcal{S}^1, \cdots, \mathcal{S}^{v-1}$ with
$\mathcal{S}^m = \{S^m_i : 0 \le i < w \text{ and } S^m_i \subseteq Z_n\}$. Here the cardinalities of the subsets of any two
$\mathcal{S}^m$, $\mathcal{S}^{m'}$ are in one-to-one correspondence. That is, if $\mathcal{S}^m$ contains subsets of sizes
$a_0, a_1, \cdots, a_{w-1}$, then so does $\mathcal{S}^{m'}$.
The cost of producing $\mathcal{S}^0, \mathcal{S}^1, \cdots, \mathcal{S}^{v-1}$ is O((v + w)n log w). If the uw subsets were to be
produced naively by a simple (uw) × n look-up table (LUT), its cost would be O(uwn). So
the cost of producing the subsets on the MU-Decoder is substantially smaller than that of
a brute force “LUT Decoder” [17] method. While the LUT Decoder can produce arbitrary
subsets expensively, the MU-Decoder can produce only certain types of subsets (for example,
isomorphic totally-ordered subsets) inexpensively. In a way, the work of this thesis is to
extend the types of subsets that the MU-Decoder can produce efficiently. We show that the


















(a) Subsets of size ℓ (b) Subsets S = {S0, S1, · · · , Sw−1}
Figure 1.2: Boolean Lattice
before) but without the restriction of isomorphism on the totally-ordered sets S0, S1, · · · , Sv−1
(see Figure 1.3(b)). To implement subsets of the type in Figure 1.3(b) with the method of
Jordan and Vaidyanathan, we would need an MU-Decoder of cost O(vwn log(uv)). Considering
that v and w could be quite large (in the hundreds), the difference in cost could be significant (a
factor of tens). Next, we develop properties of totally-ordered sets that allow us to increase
the efficiency of the MU-Decoder. For the set S0, S1, · · · , Sv−1, each with w subsets, the
cost can be reduced further to Θ(n(v + w)). This is done through a hardware enhancement
of the MU-Decoder that does not change its gate-cost significantly. However, it adds some
additional conditions on the sets S0, S1, · · · , Sv−1. These conditions, while not as strict as
isomorphism, are stricter than just total order. They put constraints on the gaps in the path
representing totally-ordered sets (Figure 1.4). The shaded circles in the figure represent the
original (generator) subsets of each totally-ordered set and the unshaded circles represent






(a) Isomorphic Totally-Ordered Sets (b) Non-Isomorphic Totally-Ordered Sets
Figure 1.3: Totally-Ordered Sets
on the generator subsets is that the smallest of these subsets must have a minimum size.
The generated subsets must be equally spaced in the Boolean Lattice
(at equal Hamming distance [12]) from their corresponding generator subsets. The above
approach allows us to pick the “best” sets of subsets S0, S1, · · · , Sv−1 such that they are
strategically placed close (on the Boolean Lattice) to the subsets that we wish to generate.





$z^2 \log z$ subsets in O(d log n) time
using an MU-Decoder of gate cost O(zn log z) and delay O(log n). This is a substantial
expansion of the MU-Decoder range for efficient operation. In fact, the method used here
can also be ported to the LUT Decoder. However, the cost of the LUT Decoder will still be
$O(z^2 n \log z)$.
Chapter 2 presents the preliminary concepts used throughout the thesis.





Figure 1.4: Totally Ordered Subset Restriction
In Chapter 4, we prove that isomorphism is not required to produce disjoint output sets
which are individually totally-ordered.
In Chapter 5, we discuss hardware enhancements for the MU-Decoder, which increase
its productivity by accommodating the concept of translation (introduced in this thesis) into
the hardware.
In Chapter 6, we use a generic set of subsets and cover an additional d distance on the





factor increase in subsets in O(d log n) time.
Finally, in Chapter 7, we summarize our findings and discuss the future scope of this




 | | Constraint | Cost | Subsets | Equivalent LUT cost | Ref.
2 | MD(⌈log(z − 1)⌉, | | Θ(zn log z) | z² | nz² | [16]
3 | MD(⌈log(z − 1)⌉, ⌈log z⌉, z, n) | Non-Isomorphic | Θ(zn log z) | z² | nz² | [this thesis]
4 | MD*(⌈log(z − 1)⌉, ⌈log(z log z)⌉, z, n) | Translated Partitions | O(zn log z) | z² log z | nz² log z | [this thesis]
5 | MD*(⌈log(z − 1)⌉, ⌈log(z log z)⌉, z, n) | Generic Subsets+ | | | |
* This MU-Decoder uses a hardware enhancement proposed in this thesis.
+ Subsets with Hamming distance d from total order with O(d log n) delay




In this chapter, we describe the MU-Decoder and some of its relevant properties. We
start by defining the characteristic representation of a subset, that is, its standard
form as a string of bits. The first section then describes the MU-Decoder, with a diagram
and the associated terminology, following Jordan and Vaidyanathan [16]. We define input
and output words and the selector address, and note that this thesis uses ordered partitions,
not regular partitions, to represent the selector module. We describe the casting of a source
word into an ordered partition to produce the output word with the help of an indicator
set. We show how these words are grouped into sets of subsets and how the casting of
source words into a set of ordered partitions still holds after grouping, and support this with
an example.
The next section describes the properties of the MU-Decoder. Since
all these properties were established in earlier work, we state them with the appropriate
citations. In that section, we discuss the gate cost, delay and subsets
produced by an MU-Decoder. We then define totally-ordered sets along with the
property of isomorphism. Before we proceed, we define the characteristic representation of a
set.
Definition 2.1. Let Zv = {0, 1, · · · , v−1} be a v-element set. Every subset S ⊆ Zv can be
represented in hardware as a binary string 〈s0, s1, · · · , sv−1〉 where si = 1 if and only if i ∈ S .
This representation of a subset is called a characteristic representation of a subset. We will,
in general, not distinguish a subset from its characteristic representation. Conversely, every
v-bit binary string represents a subset of Zv.
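As a small illustration of Definition 2.1, the following Python sketch converts between a subset of Zv and its characteristic string. The function names `to_characteristic` and `from_characteristic` are illustrative choices and do not appear in the thesis.

```python
def to_characteristic(subset, v):
    """Return the v-bit characteristic string <s_0, ..., s_{v-1}> of subset ⊆ Z_v."""
    return [1 if i in subset else 0 for i in range(v)]


def from_characteristic(bits):
    """Return the subset of Z_v represented by a v-bit characteristic string."""
    return {i for i, b in enumerate(bits) if b == 1}


bits = to_characteristic({0, 6}, 8)
print(bits)                       # [1, 0, 0, 0, 0, 0, 1, 0]
print(from_characteristic(bits))  # {0, 6}
```

The two functions are inverses of each other, which is why a subset and its characteristic representation can be used interchangeably.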
In this thesis, we will use subsets of different Zv’s and sets of subsets of these Zv’s. In
general we will use the term “subset” to refer to S ⊆ Zv and the term “set” to refer to
$\mathcal{S} = \{S_0, S_1, \cdots\}$, where $S_i \subseteq Z_v$, that is, $\mathcal{S} \subseteq \mathcal{P}(Z_v)$.
2.1 MU-Decoder Structure and Ordered Partitions
The MU-Decoder was proposed by Jordan and Vaidyanathan [16, 17]. It allows for
efficient generation of multiple subsets of Zn. Figure 2.1 shows the general structure of an






















Figure 2.1: x-to-n MU-Decoder MD(x, z, y, n). Labeled components include the $2^y \times n \log z$ selector module and its selector word.
x + y external input bits, z internal signals and n output bits. Functionally, the x-bit input
word selects one of $2^x = X$ locations in the LUT and outputs the corresponding z-bit source
word. This source word and a separate y-bit selector address are input to the Mapping Unit
where they are converted into an n-bit output word representing a subset S ⊆ Zn. The
Mapping Unit multicasts the z-bit source word to the various output word bits based on an
ordered partition selected by the y-bit selector address. Before we proceed, it is important
to understand the roles of the source word and an ordered partition in generating an output
word.
Definition 2.2. An ordered z-partition π of Zn is a list 〈B0, B1, · · · , Bz−1〉 of pairwise disjoint
subsets (blocks) of Zn whose union is Zn.




The difference between an ordered partition and a conventional partition is (a) the blocks
Bi of an ordered partition can be empty and (b) the blocks are ordered. In this thesis, all
ordered partitions have z-blocks (where z is the source word size). We will use the term
ordered partition to mean an ordered z-partition. Let L = 〈L(i) : 0 ≤ i < z〉 be a source
word and let π = 〈Bi : 0 ≤ i < z〉 be an ordered partition.
Definition 2.3. For any Boolean variable (condition) ν, any set S defines the indicator set:
$$[1(S, \nu)] = \begin{cases} S, & \text{if } \nu = 1 \\ \emptyset, & \text{if } \nu = 0 \end{cases}$$
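We are now in a position to define how a source word combines with an ordered partition
to produce a subset S.
Definition 2.4. The cast of a source word L into an ordered partition π is
$$L \circ \pi = S = \bigcup_{i=0}^{z-1} [1(B_i, L(i))]$$
(where S ⊆ Zn). It is easy to see that for all a ∈ Zn, a ∈ S if and only if there exists i such
that a ∈ Bi and L(i) = 1.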
In the definition above, we remind the reader that there always exists a unique block Bi
to which a belongs. The question is simply that of the block number and the corresponding
source word bit.
Example 2.1. For n = 8, let source words L0 = 〈0111〉 and L1 = 〈0011〉, and let ordered
partitions π0 = 〈{0, 2, 4, 6}, {1, 5}, {3}, {7}〉 and π1 = 〈{0, 1, 2, 3}, {4, 5}, {6}, {7}〉. Then
from Definition 2.4 we have
$$L_0 \circ \pi_0 = S^0_0 = \bigcup_{i=0}^{z-1} [1(B^0_i, L_0(i))] = [1(B^0_0, L_0(0))] \cup [1(B^0_1, L_0(1))] \cup [1(B^0_2, L_0(2))] \cup [1(B^0_3, L_0(3))] = \emptyset \cup B^0_1 \cup B^0_2 \cup B^0_3 = \langle 0, 1, 0, 1, 0, 1, 0, 1 \rangle.$$
Similarly we get $L_1 \circ \pi_0 = S^0_1 = \langle 0, 0, 0, 1, 0, 0, 0, 1 \rangle$, $L_0 \circ \pi_1 = S^1_0 = \langle 0, 0, 0, 0, 1, 1, 1, 1 \rangle$ and
$L_1 \circ \pi_1 = S^1_1 = \langle 0, 0, 0, 0, 0, 0, 1, 1 \rangle$.
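The cast in Example 2.1 can be checked mechanically. The following Python sketch is an illustration only; the helper names `cast` and `characteristic` are hypothetical and not part of the thesis. It forms the union of the blocks whose source bit is 1 and prints the four output words.

```python
def cast(word, partition):
    """Cast source word L into ordered partition pi: union of blocks B_i with L(i) = 1."""
    result = set()
    for bit, block in zip(word, partition):
        if bit == 1:  # the indicator set [1(B_i, L(i))] is B_i when L(i) = 1, else empty
            result |= block
    return result


def characteristic(subset, n):
    """n-bit characteristic string of subset ⊆ Z_n."""
    return [1 if a in subset else 0 for a in range(n)]


n = 8
L0, L1 = [0, 1, 1, 1], [0, 0, 1, 1]
pi0 = [{0, 2, 4, 6}, {1, 5}, {3}, {7}]
pi1 = [{0, 1, 2, 3}, {4, 5}, {6}, {7}]

for label, word, part in [("L0 o pi0", L0, pi0), ("L1 o pi0", L1, pi0),
                          ("L0 o pi1", L0, pi1), ("L1 o pi1", L1, pi1)]:
    print(label, characteristic(cast(word, part), n))
```

The printed words agree with the four output subsets listed in the example.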
In general the LUT of Figure 2.1 contains the $X = 2^x$ source words $L_0, L_1, \cdots, L_{X-1}$. The
source set is $\mathcal{L} = \{L_i : 0 \le i < X\}$. Similarly the selector module contains the representation
of at least $2^y = Y$ separate ordered partitions. Let $\Pi = \{\pi_j : 0 \le j < Y\}$ denote the set of
ordered partitions. Then we will use the notation $\mathcal{L} \circ \pi = \mathcal{S}$ and
$$\mathcal{L} \circ \Pi = \bigcup_{j=0}^{Y-1} \mathcal{L} \circ \pi_j = \bigcup_{j=0}^{Y-1} \mathcal{S}^j = \mathcal{S}$$
to indicate sets of subsets of Zn.
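For Example 2.1 we have $\mathcal{L} = \{L_0, L_1\}$ and $\Pi = \{\pi_0, \pi_1\}$; hence the sets of subsets are
$\mathcal{S}^0 = \{S^0_0, S^0_1\} = \mathcal{L} \circ \pi_0$ and $\mathcal{S}^1 = \{S^1_0, S^1_1\} = \mathcal{L} \circ \pi_1$. Hence $\mathcal{S} = \mathcal{S}^0 \cup \mathcal{S}^1 = \mathcal{L} \circ \Pi$.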
Observe that if the characteristic representation of S is 〈S(a) : 0 ≤ a < n〉, then S(a) = 1
if and only if a ∈ Bi and L(i) = 1. To implement this, consider a hardwiring of the blocks
of π as shown in Figure 2.1; a cast of L into π is then simply a multicast of L through this
hardwiring. The Mapping Unit implements this multicast by configuring, through the selector
word, each of its n z-to-1 MUXes. Theorem 2.1 gives the gate cost and delay of MD(x, z, y, n).
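The multiplexer view of the cast can be sketched in software. In the Python illustration below, an ordered partition is stored as one block number per output bit, which plays the role of the select input of that bit's z-to-1 MUX. The names `selector_word` and `mapping_unit` are hypothetical, and the sketch models only the functional behavior, not the gate-level structure of the Mapping Unit.

```python
def selector_word(partition, n):
    """For each output bit a, return the block number i with a in B_i (the MUX select value)."""
    select = [None] * n
    for i, block in enumerate(partition):
        for a in block:
            select[a] = i
    return select


def mapping_unit(word, select):
    """Output bit a is the source bit chosen by MUX a, that is, S(a) = L(select[a])."""
    return [word[i] for i in select]


pi0 = [{0, 2, 4, 6}, {1, 5}, {3}, {7}]
sel = selector_word(pi0, 8)
print(sel)                              # [0, 1, 0, 2, 0, 1, 0, 3]
print(mapping_unit([0, 1, 1, 1], sel))  # [0, 1, 0, 1, 0, 1, 0, 1]
```

Each select value needs about log z bits, so one ordered partition occupies roughly n log z bits, which is consistent with the n log z factor in the gate cost of Theorem 2.1.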
2.2 Properties of MU-Decoder
In this section, we list some relevant properties of the MU-Decoder. These are all from
Jordan and Vaidyanathan [16]
Theorem 2.1. An MU-Decoder MD(x, z, y, n) as shown in Figure 2.1 has
Gate Cost = $O(2^x(x + z) + n \log z\,(z + 2^y))$, Delay = $O(x + \log z + y + \log n)$,
producing $\min\{2^x, 2^y \lfloor \log z \rfloor\}$ independent subsets or as many as $O(2^{x+y})$ subsets.
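To get a feel for how the terms of Theorem 2.1 scale, the following Python sketch evaluates the expressions inside the O(·) bounds for one set of parameters. The parameter values are hypothetical and chosen only for illustration, and the results ignore the constant factors hidden by the asymptotic notation, so they are not gate counts.

```python
import math


def gate_cost_expr(x, z, y, n):
    """Expression inside the O(.) gate-cost bound: 2^x (x + z) + n log z (z + 2^y)."""
    return 2**x * (x + z) + n * math.log2(z) * (z + 2**y)


def delay_expr(x, z, y, n):
    """Expression inside the O(.) delay bound: x + log z + y + log n."""
    return x + math.log2(z) + y + math.log2(n)


# Hypothetical parameters, not taken from the thesis.
x, z, y, n = 4, 16, 4, 1024
print(gate_cost_expr(x, z, y, n))  # 131392.0
print(delay_expr(x, z, y, n))      # 22.0
```

For these values the n log z (z + 2^y) term of the Mapping Unit dominates the LUT term 2^x (x + z).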
Now we define a totally-ordered set of subsets.
Definition 2.5. A set S = {Si : 0 ≤ i < u} is totally ordered if and only if there exists an
ordering (permutation) f of the indices in Zu such that for all 0 ≤ i < u−1, $S_{f(i)} \subset S_{f(i+1)}$.
Later in Section 3.3 (see page 17) we show that we can assume without loss of generality
that f(i) = i; hence S0 ⊂ S1 ⊂ · · · ⊂ Su−1 as stated by Jordan and Vaidyanathan [16].
Definition 2.6. Let S0, S1, · · · , Sv−1 be totally-ordered sets. These form an isomorphic set of
totally-ordered sets if and only if the cardinalities of their subsets are in one-to-one correspondence.
That is, if S0 has elements of cardinality a0 < a1 < · · · < an−1, then so do all Si’s.
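Definition 2.6 can be expressed as a short check. The Python sketch below is illustrative; the names `is_chain` and `is_isomorphic_family` and the example sets A, B, C are hypothetical. It treats a family as isomorphic when every member is a chain under strict inclusion and all members have the same list of subset sizes.

```python
def is_chain(subsets):
    """True if the subsets can be ordered so that each is a proper subset of the next."""
    ordered = sorted(subsets, key=len)
    return all(a < b for a, b in zip(ordered, ordered[1:]))


def is_isomorphic_family(family):
    """Definition 2.6 check: every member is a chain and all share the same subset sizes."""
    profiles = {tuple(sorted(len(s) for s in chain)) for chain in family}
    return all(is_chain(chain) for chain in family) and len(profiles) == 1


A = [{0}, {0, 1}, {0, 1, 2, 3}]   # sizes 1, 2, 4
B = [{5}, {4, 5}, {4, 5, 6, 7}]   # sizes 1, 2, 4
C = [{2}, {2, 3, 4}, {2, 3, 4, 5}]  # sizes 1, 3, 4

print(is_isomorphic_family([A, B]))  # True
print(is_isomorphic_family([A, C]))  # False
```

The pair A, C is the non-isomorphic situation: both are totally ordered, but their subsets sit at different levels of the Boolean lattice.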
Figure 1.3(a) (see page 5) illustrates an isomorphic totally-ordered set. We note that each
line represents a totally-ordered set, and these lines contain 4 subsets each. A subset on one
totally-ordered set shares the level with a subset from each of the other totally-ordered sets.
From Figure 1.2(a) we know that at level ℓ all subsets are of size ℓ. Hence the subsets are




In this chapter, we discuss the properties of “totally-ordered sets” as they relate to
the MU-Decoder. Jordan and Vaidyanathan [16] showed that totally-ordered sets have an
efficient implementation on the MU-Decoder. In this thesis, we expand the scope of this
observation by extending the range of sets for which the MU-Decoder is efficient. In this
chapter, we formally define totally-ordered sets and derive some properties of the source set
(see Definition 2.5 on page 12) needed for producing totally-ordered sets. We end the chapter
with the definition of a “canonical form” of source words (Section 3.3). We then justify the
assumption that the source words are represented in canonical form, which is simply a convenience
for studying totally-ordered sets.
We begin with a definition of a totally-ordered set (of subsets of Zn), taken from Jordan and
Vaidyanathan [16].
Definition 3.1. A set S = {S0, S1, · · · , Su−1} ⊆ P(Zn) is totally-ordered if and only if
there exists an ordering of elements of S, such that S0 ⊂ S1 ⊂ · · · ⊂ Su−1.
Example 3.1. Let n = 8, u = 5. Then with S0 = {0, 6}, S1 = {0, 1, 6}, S2 = {0, 1, 5, 6},
S3 = {0, 1, 5, 6, 7} and S4 = {0, 1, 2, 5, 6, 7}, the set S = {S0, S1, · · · , S4} ⊆ P(Z8) is totally-ordered,
with S0 ⊂ S1 ⊂ S2 ⊂ S3 ⊂ S4. In terms of the characteristic string of a set we have
S0 = 〈10000010〉, S1 = 〈11000010〉, S2 = 〈11000110〉, S3 = 〈11000111〉 and S4 = 〈11100111〉.
Recall that $\vec{S}$ is the characteristic string of set S.
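Definition 3.1 asks for the existence of an ordering. Because proper inclusion forces strictly increasing sizes, sorting by size is the only candidate ordering, so the test reduces to one pass. The Python sketch below is illustrative, and `chain_order` is a hypothetical name; it recovers the chain of Example 3.1 from a shuffled list.

```python
def chain_order(subsets):
    """Return the subsets as a strict chain S_0 ⊂ S_1 ⊂ ..., or None if no such ordering exists."""
    ordered = sorted(subsets, key=len)
    if all(a < b for a, b in zip(ordered, ordered[1:])):
        return ordered
    return None


shuffled = [{0, 1, 5, 6}, {0, 6}, {0, 1, 2, 5, 6, 7}, {0, 1, 6}, {0, 1, 5, 6, 7}]
print(chain_order(shuffled))
print(chain_order([{0, 1}, {1, 2}]))  # None
```

The second call fails because the two subsets have the same size and neither contains the other.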
3.1 Totally-Ordered Sets in the Boolean Lattice
For an n-element set Zn, its power set P(Zn) (with $2^n$ elements) can be represented as a
Boolean lattice [28]. The most common expression of this lattice is as a Hasse diagram [28].
This is a graph Gn with $2^n$ nodes (one per element of P(Zn)). Two nodes Si, Sj with
characteristic strings σi, σj have an edge between them if and only if σi, σj differ in exactly
one bit, that is, Si, Sj differ by exactly one element of Zn. Typically nodes of Gn are






of Zn) with ℓ elements. Therefore, the empty set ∅ is the only one at level 0 and the set Zn
is the only one at level n. Figure 3.1 shows the representation of Boolean Lattice G4 with $2^4$
















Figure 3.1: Boolean Lattice G4
of size ℓ. Observe that if there is an edge in Gn between Si and Sj then they must be in
adjacent levels, say ℓ and ℓ + 1. Without loss of generality let Si have ℓ elements and Sj
have ℓ + 1 elements. Then Sj = Si ∪ {a}, where a is the only element of Sj that is not in Si.
Therefore, Si ⊂ Sj. We view Gn as a directed (acyclic) graph where each undirected edge
between Si and Sj with Si ⊂ Sj is viewed as a directed edge from Si to Sj. In this view, all the
edges of Figure 3.1 are directed towards the top. The following observation is now immediate.
Lemma 3.1. If S0 ⊂ S1 ⊂ · · · ⊂ Su−1 then the totally-ordered set S = {S0, S1, · · · , Su−1}
corresponds to a path in Gn traversing S0, S1, · · · , Su−1 in that order.
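Lemma 3.1 can be illustrated by constructing such a path explicitly. The Python sketch below assumes that the path is allowed to pass through intermediate nodes when consecutive subsets of the chain differ by more than one element; the function name `lattice_path` is hypothetical.

```python
def lattice_path(chain):
    """Expand a strict chain S_0 ⊂ S_1 ⊂ ... into a directed path in G_n.

    Consecutive nodes of the returned path differ by exactly one element.
    """
    path = [set(chain[0])]
    for nxt in chain[1:]:
        if not path[-1] < nxt:
            raise ValueError("input is not a strict chain")
        for a in sorted(nxt - path[-1]):  # add the missing elements one at a time
            path.append(path[-1] | {a})
    return path


chain = [{0}, {0, 1, 2}, {0, 1, 2, 3}]
path = lattice_path(chain)
print(path)  # [{0}, {0, 1}, {0, 1, 2}, {0, 1, 2, 3}]
```

Every subset of the chain appears on the path in its original order, and each step moves up exactly one level of the lattice.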
In Definition 2.4 (see page 10) we described how a source set L and a set of ordered
partitions Π produce an output set S = L ◦ Π. In Section 3.2 we further study the
relationship between the source set L and the output set S, given that one of them is a
totally-ordered set. In Section 3.3 we first define a canonical form for the source words in
a totally-ordered source set. Then we show that every totally-ordered source set can be
assumed to be in canonical form. This representation makes many of the proofs starting
from Chapter 4 more concise.
3.2 Totally-Ordered Source and Output Sets
Let L = {L0, L1, · · · , LX−1} be a set of z-bit source words and let π = 〈B0, B1, · · · , Bz−1〉
be an ordered z-partition of Zn. Let S = L ◦ π = {S0, S1, · · · , Su−1} be a set of u
subsets of Zn, where u ≤ X.
Lemma 3.2. If L is a totally-ordered set then S = L ◦ π is a totally-ordered set.
Proof. We proceed in the contrapositive direction. Suppose S is not a totally-ordered set.
This implies that there exist different subsets Si, Sj ∈ S such that Si ⊄ Sj and Sj ⊄ Si.
This further implies that there exist elements a, b ∈ Zn such that a ∈ Si, a ∉ Sj and b ∈ Sj,
b ∉ Si. For blocks Bq, Br of ordered partition π, let a ∈ Bq ∈ π and b ∈ Br ∈ π; observe
that q ≠ r, otherwise a, b ∈ Bq ∈ π and a ∈ Si if and only if b ∈ Si. For some source words
Li, Lj ∈ L, let Si = Li ◦ π and Sj = Lj ◦ π. Now a ∈ Si and b ∉ Si imply that Li(q) = 1
and Li(r) = 0 (see Definition 2.4 on page 10), or q ∈ Li and r ∉ Li. Similarly, b ∈ Sj and
a ∉ Sj imply that Lj(r) = 1 and Lj(q) = 0 (Chapter 2), or r ∈ Lj and q ∉ Lj. Thus Li ⊈ Lj
and Lj ⊈ Li, implying that L is also not a totally-ordered set.
Example 3.2. Let $\mathcal{S}^i = \{S^i_0, S^i_1, \cdots, S^i_{X-1}\}$ not be a totally-ordered set, with
           A B C D E F G H I J K L M N O
$S^i_0$ =  1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
$S^i_1$ =  0 1 1 0 1 0 1 0 1 0 1 0 1 0 1
From the above values, $A \in S^i_0$, $A \notin S^i_1$ and similarly $B \notin S^i_0$, $B \in S^i_1$. Now
assuming the output set is the result of the computation $\mathcal{L} \circ \pi_i = \mathcal{S}^i$ (Definition 2.4), let
the ordered partition πi = 〈{B}, {A}, {C,E,G, I,K,M,O}, {D,F,H, J, L,N}〉, and hence
the corresponding $\mathcal{L} = \{L_0, L_1, \cdots, L_{X-1}\}$ has L0 = 0110 and L1 = 1010, where $L_0 \circ \pi_i = S^i_0$ and
$L_1 \circ \pi_i = S^i_1$. Hence L0 ⊈ L1 and L1 ⊈ L0, implying $\mathcal{L}$ is not a totally-ordered set.
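The data of Example 3.2 and the monotonicity that underlies Lemma 3.2 can be checked with a short script. The Python sketch below is an illustration; the helper names `cast` and `contained` are hypothetical. It verifies that the two outputs and the two source words are incomparable, and then confirms by exhaustive search over all 4-bit words that containment of source words always carries over to the outputs for this partition.

```python
from itertools import product


def cast(word, partition):
    """L o pi: union of the blocks B_i whose source bit L(i) is 1."""
    return set().union(*(block for bit, block in zip(word, partition) if bit))


def contained(u, v):
    """True if source word u, viewed as a subset of Z_z, is contained in v."""
    return all(a <= b for a, b in zip(u, v))


pi = [{"B"}, {"A"}, set("CEGIKMO"), set("DFHJLN")]
L0, L1 = (0, 1, 1, 0), (1, 0, 1, 0)
S0, S1 = cast(L0, pi), cast(L1, pi)

# Example 3.2: the outputs are incomparable, and so are the source words.
outputs_incomparable = not (S0 <= S1 or S1 <= S0)
words_incomparable = not (contained(L0, L1) or contained(L1, L0))

# Monotonicity behind Lemma 3.2, checked over all pairs of 4-bit words.
words = list(product((0, 1), repeat=4))
monotone = all(cast(u, pi) <= cast(v, pi)
               for u in words for v in words if contained(u, v))

print(outputs_incomparable, words_incomparable, monotone)  # True True True
```

The exhaustive check covers only this one partition, so it supports the lemma by example rather than proving it.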
We now proceed to a similar result in the opposite direction.
Lemma 3.3. Let L ◦ π = S, and let π be an ordered partition with no empty blocks. If S is a
totally-ordered set then L is a totally-ordered set.
Proof. Again we proceed in the contrapositive direction. Suppose that L is not totally-ordered.
This implies that there exist distinct source words Li, Lj ∈ L such that Li ⊄ Lj
and Lj ⊄ Li. This further implies that there exist elements q, r ∈ Zz such that q ∈ Li, q ∉ Lj
and r ∈ Lj, r ∉ Li. We can further say that Li(q) = 1, Li(r) = 0 and Lj(r) = 1, Lj(q) = 0.
Let a ∈ Bq ∈ π and b ∈ Br ∈ π. (The existence of a, b is assured as Bq, Br ≠ ∅.) This
further implies that a ∈ Si = Li ◦ π, a ∉ Sj = Lj ◦ π and b ∈ Sj = Lj ◦ π, b ∉ Si = Li ◦ π.
Hence we can conclude that S is not a totally-ordered set.
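The requirement in Lemma 3.3 that π have no empty blocks is needed. The Python sketch below gives a small counterexample with hypothetical data: two incomparable source words produce comparable outputs when the block that distinguishes them is empty, and incomparable outputs once every block is non-empty. The helper names are illustrative.

```python
def cast(word, partition):
    """L o pi: union of the blocks B_i whose source bit L(i) is 1."""
    return set().union(*(block for bit, block in zip(word, partition) if bit))


def comparable(a, b):
    """True if one set contains the other."""
    return a <= b or b <= a


def support(word):
    """The source word viewed as a subset of Z_z."""
    return {i for i, bit in enumerate(word) if bit}


L0, L1 = (1, 0, 1), (1, 1, 0)  # supports {0, 2} and {0, 1}: incomparable

pi_empty = [{0}, {1}, set()]   # block B_2 is empty
pi_full = [{0}, {1}, {2}]      # no empty blocks

print(comparable(support(L0), support(L1)))                # False
print(comparable(cast(L0, pi_empty), cast(L1, pi_empty)))  # True
print(comparable(cast(L0, pi_full), cast(L1, pi_full)))    # False
```

With the empty block, bit 2 of the source word has no effect on the output, so the output set is totally ordered even though the source set is not.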
Example 3.3. Let $\mathcal{L} = \{L_0, L_1, \cdots, L_{X-1}\}$, with L0 = 0110 and L1 = 1010, where $L_0 \circ \pi_i = S^i_0$
and $L_1 \circ \pi_i = S^i_1$. Now L0(1) = 1, L0(0) = 0 and L1(0) = 1, L1(1) = 0. Let πi =
〈{B}, {A}, {C,E,G, I,K,M,O}, {D,F,H, J, L,N}〉. Assuming $\mathcal{S}^i = \{S^i_0, S^i_1, \cdots, S^i_{X-1}\}$,
the corresponding subsets are
           A B C D E F G H I J K L M N O
$S^i_0$ =  1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
$S^i_1$ =  0 1 1 0 1 0 1 0 1 0 1 0 1 0 1
Hence $S^i_0 \not\subseteq S^i_1$ and $S^i_1 \not\subseteq S^i_0$, implying $\mathcal{S}^i$ is not a totally-ordered set.
3.3 Canonical Form of Source Set
We first define a reverse sorted string.
Definition 3.2. A binary string 〈a0, a1, · · · , am−1〉 is reverse sorted if and only if every 1
in the string precedes any 0.
Example 3.4. For m = 8, $\vec{L_1}$ = 〈11100000〉 is reverse sorted whereas $\vec{L_2}$ = 〈10110000〉 and
$\vec{L_3}$ = 〈00000011〉 are not.
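Definition 3.2 translates into a one-line test: a binary string is reverse sorted exactly when it never contains a 0 immediately followed by a 1. The Python sketch below applies this to the strings of Example 3.4 written as 8-bit strings; the function name `is_reverse_sorted` is illustrative.

```python
def is_reverse_sorted(bits):
    """True if every 1 precedes every 0, that is, the string has no '01' substring."""
    return "01" not in bits


print(is_reverse_sorted("11100000"))  # True
print(is_reverse_sorted("10110000"))  # False
print(is_reverse_sorted("00000011"))  # False
```

The all-zero and all-one strings are also reverse sorted under this definition.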
Recall from Definition 2.1 (see page 8) that the characteristic string of a set L ⊆ Zz
depends on the characteristic order $\vec{c}$ assumed on Zz.
Definition 3.3. Let $\mathcal{L} \subseteq \mathcal{P}(Z_z)$, and let $\vec{c} = \langle c_0, c_1, \cdots, c_{z-1} \rangle$ be a characteristic order of
Zz. The characteristic order $\vec{c}$ is in canonical form if and only if the characteristic string
$\vec{L}$ is reverse sorted for every $L \in \mathcal{L}$.
Example 3.5. For z = 8 and $\mathcal{L} = \{L_0, L_1\}$, let L0 = {1, 3, 5} and L1 = {1, 2, 3, 4, 5}.
Consider the characteristic orders $\vec{c_1}$ = 〈0, 1, 2, 3, 4, 5, 6, 7〉, $\vec{c_2}$ = 〈6, 1, 2, 3, 4, 5, 0, 7〉,
$\vec{c_3}$ = 〈1, 3, 5, 2, 4, 0, 6, 7〉 and $\vec{c_4}$ = 〈5, 1, 3, 4, 2, 6, 0, 7〉. Under $\vec{c_1}$ we have L0 = 〈01010100〉 and
L1 = 〈01111100〉. Here $\vec{c_1}$ is the natural order, so it gives the usual characteristic representation
of L0, L1. For characteristic order $\vec{c_2}$ we have rearranged the positions of elements of
Zz. However, the characteristic strings of L0 and L1 are still the same as under $\vec{c_1}$. For
$\vec{c_3}$ we see that the positions are rearranged in such a way that the resulting characteristic
strings are L0 = 〈11100000〉 and L1 = 〈11111000〉. Here all 1’s precede all the 0’s,
which is the reverse sorted order. For $\vec{c_4}$, we see that the characteristic strings of L0 and L1
are the same as under $\vec{c_3}$ and hence reverse sorted. Thus $\vec{c_3}$ and $\vec{c_4}$ are in canonical
form whereas $\vec{c_1}$ and $\vec{c_2}$ are not. We note $\vec{c_3} \ne \vec{c_4}$.
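The four characteristic orders of Example 3.5 can be checked directly. The Python sketch below is an illustration with hypothetical names (`characteristic_string`, the dictionary keys c1 to c4); it lists the characteristic strings of L0 and L1 under each order and reports whether both are reverse sorted.

```python
def characteristic_string(subset, order):
    """Characteristic string of subset under the characteristic order <c_0, ..., c_{z-1}>."""
    return "".join("1" if c in subset else "0" for c in order)


L0, L1 = {1, 3, 5}, {1, 2, 3, 4, 5}
orders = {
    "c1": [0, 1, 2, 3, 4, 5, 6, 7],
    "c2": [6, 1, 2, 3, 4, 5, 0, 7],
    "c3": [1, 3, 5, 2, 4, 0, 6, 7],
    "c4": [5, 1, 3, 4, 2, 6, 0, 7],
}

for name, order in orders.items():
    s0 = characteristic_string(L0, order)
    s1 = characteristic_string(L1, order)
    canonical = "01" not in s0 and "01" not in s1  # both strings reverse sorted
    print(name, s0, s1, canonical)
```

The output marks c3 and c4 as canonical and c1 and c2 as not canonical, in agreement with the example.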
We now show that there is no loss of generality in assuming a canonical order for the
source set L.
Let L = {L0, L1, · · · , LX−1} be totally-ordered with L0 ⊂ L1 ⊂ · · · ⊂ LX−1. We now
construct a characteristic order $\vec{c} = \langle c_0, c_1, \cdots, c_{z-1} \rangle$ as described below. Let $|L_0| = m_0$,
$|L_i - L_{i-1}| = m_i$ for 0 < i < X and let $n_i = m_0 + m_1 + \cdots + m_i = |L_i|$. In constructing
$\vec{c}$ we enumerate the $m_0$ elements of L0 first; that is, $L_0 = \{c_0, c_1, \cdots, c_{m_0-1}\}$. The relative
order of these $m_0$ elements is irrelevant. Next, for 0 < i < X, we assign the $m_i$ elements of
$L_i - L_{i-1}$ to the positions $\{c_{n_{i-1}}, c_{n_{i-1}+1}, \cdots, c_{n_i-1}\}$. We will call such a characteristic order $\vec{c}$ a
standard characteristic order.
Theorem 3.1. Every standard characteristic order is canonical for a totally-ordered L.
Proof. Without loss of generality let L = {L0, L1, · · · , LX−1} with L0 ⊂ L1 ⊂ · · · ⊂ LX−1
and |Li| = ni, for 0 ≤ i < X. Let −→c = 〈c0, c1, · · · , cz−1〉 be a standard characteristic order.
From our construction, −→c has the following form:
$$\vec{c} = \Big\langle \underbrace{c_0, c_1, \cdots, c_{n_0-1}}_{L_0},\ \underbrace{c_{n_0}, c_{n_0+1}, \cdots, c_{n_1-1}}_{L_1 - L_0},\ \cdots,\ \underbrace{c_{n_{i-1}}, c_{n_{i-1}+1}, \cdots, c_{n_i-1}}_{L_i - L_{i-1}},\ \cdots \Big\rangle$$
Hence the representation of Li under −→c has the following form:
$$\vec{L_i} = \Big\langle \underbrace{\underbrace{1, 1, \cdots, 1}_{L_0},\ \underbrace{1, 1, \cdots, 1}_{L_1 - L_0},\ \cdots,\ \underbrace{1, 1, \cdots, 1}_{L_i - L_{i-1}}}_{L_i},\ \underbrace{0, 0, \cdots, 0}_{L_{i+1} - L_i},\ \cdots,\ \underbrace{0, 0, \cdots, 0}_{L_{X-1} - L_{X-2}} \Big\rangle$$
which is reverse sorted (Definition 3.2).
Thus we may, without any loss of generality, assume that every totally-ordered source
set L is represented in canonical order. This assumption does not change L; it only makes
our proofs in subsequent chapters much easier.
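The construction of a standard characteristic order can be sketched in a few lines of Python. This is an illustrative sketch only: the function name is an assumption, and the choice to sort the elements inside each group (and to place the elements outside the largest set last) is arbitrary, since the construction leaves their relative order free.

```python
def standard_characteristic_order(chain, z):
    """Build a standard characteristic order for a chain L0 ⊂ L1 ⊂ ... over Z_z.

    Elements of L0 come first, then those of L1 − L0, and so on; elements that
    belong to no set of the chain are appended at the end (an assumption here).
    """
    order, placed = [], set()
    for L in chain:
        if not placed <= L:
            raise ValueError("the sets must form a chain L0 ⊂ L1 ⊂ ...")
        order.extend(sorted(L - placed))   # relative order inside a group is irrelevant
        placed = set(L)
    order.extend(sorted(set(range(z)) - placed))
    return order


chain = [{1, 3, 5}, {1, 2, 3, 4, 5}]
order = standard_characteristic_order(chain, 8)
strings = ["".join("1" if c in L else "0" for c in order) for L in chain]
print(order, strings)
```

For the source set of Example 3.5 this choice of tie-breaking happens to reproduce the order 〈1, 3, 5, 2, 4, 0, 6, 7〉, and both characteristic strings come out reverse sorted.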
Chapter 4
Generating a Given Totally Ordered
Set
Let S = L ◦ π be any set of output subsets that can be produced from a source set L and
a single ordered partition π. In this chapter, we will first produce multiple ordered partitions
π0, π1, · · · , πy−1 for a totally-ordered S. This result is important as the result of Jordan
and Vaidyanathan [16] applies only to a relatively small set S with at most X = z − 1
elements. Further, we show that any decomposition of S into S0, S1, · · · , Sk−1 works, as long
as |Si| ≤ X.
In a separate result, Jordan and Vaidyanathan [16] showed that a set S = S0 ∪ S1 ∪ · · · ∪
Sk−1 of isomorphic totally-ordered sets S0, S1, · · · , Sk−1 with |Si| ≤ X can be implemented
as MU(x, x+ 1, log k, n). We extend this to work for any set S = S0 ∪ S1 ∪ · · · ∪ Sk−1 of
totally-ordered sets (not necessarily isomorphic).
4.1 Generating a Large Totally-Ordered Set
We begin with the case where a totally-ordered set S is small. Recall that L is the
source set. Suppose that |L| ≥ |S|. In Lemma 3.3 we showed that if S = L ◦ π and S is
totally-ordered, then so is L (assuming π has no empty block). The following lemma, in a
way, works in the opposite direction.
Lemma 4.1. For any given L and S that are both totally-ordered sets, with |L| ≥ |S| there
exists an ordered partition π such that L ◦ π = S.
Proof. Let L = {L0, L1, · · · , LX−1} with Li ⊂ Li+1 for 0 ≤ i < X − 1 and S = {S0, S1, · · · , Su−1}
with Si ⊂ Si+1 for 0 ≤ i < u − 1. Here u ≤ X. Recall that z is the source word length;
for the Li to be distinct we need X ≤ z − 1, assuming Li ≠ ∅ and Li ≠ Zz. Without loss of generality assume








construct ordered partition π = {Bi : 0 ≤ i < z} as follows: block B0 = S0 and, for 0 < i < u,
Bi = Si − Si−1. Elements of Zn − Su−1 can be placed in any manner among blocks Bu
to Bz−1. Observe that
$$\bigcup_{j=0}^{i} B_j = S_i$$
where 0 ≤ i < u. Thus a ∈ Li ◦ π if and only if a ∈ B0 ∪ B1 ∪ · · · ∪ Bi = Si. That is, Si = Li ◦ π.
Therefore S = L ◦ π.
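The block construction in this proof can be illustrated with a short Python sketch. It is not taken from the original work: the function names are hypothetical, the source set is assumed to be canonical with the i-th source word having 1s in positions 0 to i, and L ◦ π is modelled as the union of the blocks whose source bit is 1. All leftover elements are placed in block Bu, which is just one of the allowed placements.

```python
def apply_word(word, partition):
    """L ∘ π for one source word: union of the blocks whose source bit is 1."""
    out = set()
    for bit, block in zip(word, partition):
        if bit:
            out |= block
    return out


def partition_for_chain(chain, n, z):
    """Blocks B0 = S0 and Bi = Si − S(i−1); leftover elements of Z_n go to block u."""
    u = len(chain)
    if u > z - 1:
        raise ValueError("need at most z − 1 output subsets")
    blocks = [set() for _ in range(z)]
    previous = set()
    for i, S in enumerate(chain):
        blocks[i] = set(S) - previous
        previous = set(S)
    blocks[u] = set(range(n)) - previous   # one valid placement among B_u .. B_{z−1}
    return blocks


def canonical_source_words(u, z):
    """Assumed canonical source set: word i has 1s in positions 0..i."""
    return [[1] * (i + 1) + [0] * (z - i - 1) for i in range(u)]


chain = [{0}, {0, 1, 2}, {0, 1, 2, 5}]          # S0 ⊂ S1 ⊂ S2 over Z_8
pi = partition_for_chain(chain, n=8, z=4)
outputs = [apply_word(w, pi) for w in canonical_source_words(len(chain), 4)]
print(pi, outputs)
```

Each source word reproduces the corresponding Si, so the single ordered partition generates the whole chain.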
The above result also appears in Jordan and Vaidyanathan [16], although expressed
less formally. This result extends in a simple manner to large totally-ordered sets S with




Sj = S and Si ∩ Sj = ∅ for i ≠ j. Any such partition of S suffices: since S is
totally-ordered, clearly each Si is totally-ordered.





. For each Si, applying Lemma 2.4 gives us an ordered
partition πi such that Si = L ◦ πi. Thus with Π = {πi : 0 ≤ i < v}, we get S = L ◦ Π.
Lemma 4.2. For any totally-ordered output set S and a totally-ordered source set L there





Theorem 4.1. A MU(x, 2^x + 1, y, n) can produce a totally-ordered set of at most 2^{x+y}
subsets.
Proof. Setting |L| = 2^x and v = 2^y in Lemma 4.2, we have |S| ≤ v|L| = 2^{x+y}.
This result extends the idea in Jordan and Vaidyanathan [16] to large totally-ordered sets.
The above theorem shows a method to implement a large totally-ordered set S on a MU-Decoder.
In doing so, we partitioned S into 2x blocks. Clearly, it is useful to have each block
of S contain approximately the same number of elements, so that the number of blocks is
reduced. Thus S =
v−1⋃
i=0











However, which elements of S are in Si is not clear. For the purpose of the results in this
chapter, this question is irrelevant. However, in Chapter 5 we will show that a particular
way of constructing Si is advantageous.





Si be a set of output sets with Si ≠ ∅, Si ∩ Sj = ∅ for i ≠ j and each Si
totally-ordered. Then recall from Definition 2.6 that S is a set of isomorphic totally-ordered sets
if and only if for each Si, Sj and any S^i ∈ Si there exists an S^j ∈ Sj such that |S^i| = |S^j|.
Jordan and Vaidyanathan [16] proved that if S is a set of isomorphic totally-ordered sets and
if |Si| ≤ |L| then an MU-Decoder can generate S as in Definition 2.6.
The main contribution of this definition is to leverage a common L for all v = 2^y totally-ordered
sets. We now show that the isomorphism restriction is not needed and that the number
of ordered partitions could exceed v.
Theorem 4.2. Let S = ⋃_{i=0}^{v−1} Si be a set of output sets where the Si are non-empty, totally-ordered
and pairwise disjoint. Then for any totally-ordered source set L, there exists a set
Π of at most v + |S|/|L| ordered partitions such that S = L ◦ Π.
Proof. By Lemma 2.4 each Si can be generated by L and an ordered partition set πi ( so



























|Si| = v + |S|/|L|.
The above is used later for Generic Subset Generation.
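The bound of Theorem 4.2 can be checked numerically. The sketch below is an illustration under a stated assumption: each totally-ordered Si is split into consecutive pieces of at most |L| subsets, with one ordered partition per piece (the function name and the sample sizes are hypothetical).

```python
def partitions_needed(chain_sizes, source_size):
    """One ordered partition per piece of at most `source_size` subsets,
    i.e. ceil(|Si| / |L|) partitions for each totally-ordered Si."""
    return sum(-(-size // source_size) for size in chain_sizes)


chain_sizes = [5, 3, 9]      # |S0|, |S1|, |S2|, so v = 3 and |S| = 17
source_size = 4              # |L|
needed = partitions_needed(chain_sizes, source_size)
bound = len(chain_sizes) + sum(chain_sizes) / source_size   # v + |S|/|L|
print(needed, bound)         # 6 partitions suffice, bound is 7.25
```

Since each ceiling exceeds |Si|/|L| by less than 1, the count never exceeds v + |S|/|L|.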
Chapter 5
Hardware Enhancement for Partition
Generation
In Chapter 2 we discussed the structure of the MU-Decoder (Figure 2.1, Page 9). In
this chapter, we particularly focus on the selector module of the MU-Decoder. The selector
module accepts a selector address and selects the corresponding ordered partition to be used
with the MUXes for mapping the source words. Originally each selector address produces
one ordered partition. In this chapter, we propose hardware enhancement for the selector
module that allows each selector address to produce multiple ordered partitions.
The original selector module produces 2^y ordered z-partitions with a hardware cost
of O(2^y n log z), where n is the size of the output word representing an output subset of
Zn = {0, 1, · · · , n − 1}. With the enhancements, we will be able to produce k·2^y ordered
z-partitions with a cost of O(2^y n log h); that is, with the same cost as before, we can produce
a factor of k more ordered partitions.
In the next section we provide an overview of the new hardware. Section 5.1 is devoted
to the idea of “Translation” of one ordered partition to another. This operation is essential






Figure 5.1: Current Selector Module
5.1 Ordered Partition Translation
In this section we describe an operation called translation on an ordered partition π0 that
produces another ordered partition π1. Central to this operation is a “Translation Vector”
−→C that guides the translation of π0 to π1. Subsequently we will develop the properties of
translation.
Definition 5.1. Let π0 = 〈B^0_m : 0 ≤ m < z〉 be an ordered partition on
Zn, with |B^0_m| = ℓ^0_m. For each 0 ≤ m < z, let Cm ⊆ B^0_m with Cz−1 = ∅. Denote |Cm| = cm.
Let vector −→C = 〈Cm : 0 ≤ m < z〉. Ordered partition π1 = 〈B^1_m : 0 ≤ m < z〉 is a unit-translation
of π0 with respect to −→C (denoted π0 −C→ π1) if and only if B^1_0 = B^0_0 − C0 and,
for all 0 < m < z, B^1_m = (B^0_m − Cm) ∪ Cm−1.
Example 5.1. Let ordered partition π0 = 〈{3, 4, 11, 12}, {2, 5, 10, 13}, {1, 6, 9, 14}, {0, 7, 8, 15}〉
and ordered partition π1 = 〈{3, 4}, {11, 12, 2, 5, 10}, {13, 1}, {6, 9, 14, 0, 7, 8, 15}〉. Then π0 −C→ π1,
that is, π1 is a unit-translation of π0 with respect to vector −→C = 〈{11, 12}, {13}, {6, 9, 14}, {}〉.
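A unit-translation is straightforward to express in code. The following Python sketch is illustrative only (the function name is an assumption); it applies Definition 5.1 directly and reproduces π1 of Example 5.1.

```python
def unit_translate(partition, vector):
    """Definition 5.1: B1_0 = B0_0 − C_0 and B1_m = (B0_m − C_m) ∪ C_{m−1} for m > 0."""
    z = len(partition)
    if len(vector) != z or vector[z - 1]:
        raise ValueError("need one C_m per block, with the last C_m empty")
    if not all(C <= B for C, B in zip(vector, partition)):
        raise ValueError("each C_m must be a subset of its block B0_m")
    translated = []
    for m in range(z):
        incoming = vector[m - 1] if m > 0 else set()
        translated.append((partition[m] - vector[m]) | incoming)
    return translated


pi0 = [{3, 4, 11, 12}, {2, 5, 10, 13}, {1, 6, 9, 14}, {0, 7, 8, 15}]
C = [{11, 12}, {13}, {6, 9, 14}, set()]
pi1 = unit_translate(pi0, C)
print(pi1)
```

The elements of each Cm move to the next block, while all other elements keep their block number.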






B^2_0 ≠ B^0_0 − D0 for any D0 ⊆ B^0_0, we have B^2_0 ⊈ B^0_0. Moreover though B^2_1 =
(B^0_1 − D1) ∪ D0 where D0 = 〈12〉, D1 = 〈13〉. But B^2_2 ≠ (B^0_2 − D2) ∪ D1, B^2_3 ≠ (B^0_3 − D3) ∪ D1
and D3 ≠ ∅.
Let ordered partition π3 = 〈{3, 4}, {11, 12, 2}, {5, 10}, {13, 1, 6, 9, 14, 0, 7, 8, 15}〉; then π1 −E→ π3,
that is, π3 is a unit-translation of π1 with respect to vector −→E = 〈{}, {5, 10}, {13, 1}, {}〉.
We also note that π3 is not a unit-translation of π0 with respect to any vector −→F, because 13 ∈ B^0_1, so it should be in B^3_1 = (B^0_1 − F1) ∪ F0 or
B^3_2 = (B^0_2 − F2) ∪ F1, but 13 ∈ B^3_3, which is not possible through a unit-translation.
We observe that when π0 is translated to π1, cm ≤ |B^0_m| elements of B^0_m ∈ π0 move to
B^1_{m+1} ∈ π1. All the remaining elements of B^0_m keep the same block number, that is,
they move to B^1_m ∈ π1.
Definition 5.2. For each element a ∈ Zn, let r(a) be a unique rank (number) from Zn.
The value a and its rank r(a) are not related. A set of ranks r is consistent with an ordered
partition π = 〈B0, B1, · · · , Bz−2, Bz−1〉 if and only if the following condition holds: for all
a ∈ Bi and b ∈ Bj, if i < j then r(a) < r(b), and if a and b are distinct elements of the same block Bi then still r(a) ≠ r(b).




−E→ π3 is a 2-translation with ordered partition π0 =
〈{3, 4, 11, 12}, {2, 5, 10, 13}, {1, 6, 9, 14}, {0, 7, 8, 15}〉 and the rank of the
ordered partition r(π0) = 〈{0, 1, 2, 3}, {4, 5, 6, 7}, {8, 9, 10, 11}, {12, 13, 14, 15}〉.




−C1→ π2 · · · πk−1 −C_{k−1}→ πk be a series of unit-translations.
This sequence of k unit-translations from π0 to πk will be called a k-translation if and only if
there exists a rank r for each element of Zn that is consistent with every ordered partition πi
(0 ≤ i ≤ k).
This implies that the cm elements of B^0_m that move into B^1_{m+1} are the highest ranked
elements of B^0_m.
Lemma 5.1. Suppose π0 −C→ π1. Then for all a ∈ Zn, if a ∈ B^0_m then a ∈ B^1_{m′} where
m′ ∈ {m, m + 1}.
Proof. Let position a ∈ B^0_m. Every block in ordered partition π1 is B^1_m = (B^0_m −
Cm) ∪ Cm−1, where C−1 = ∅. Since a ∈ B^0_m, a ∉ B^0_{m−1} if it exists (the blocks are disjoint).
That is, a ∉ Cm−1 ⊆ B^0_{m−1}. Now we consider two cases. First, if a ∈ Cm, then a ∉ B^1_m but
a ∈ B^1_{m+1} = (B^0_{m+1} − Cm+1) ∪ Cm. Second, if a ∉ Cm then a ∈ B^1_m = (B^0_m − Cm) ∪ Cm−1.
Remark: The block number of any element of block B^0_m increases by at most 1 as we
translate from π0 to π1.
Theorem 5.1. Let L be a source set and π0, π1 be ordered z-partitions such that π0 −C→ π1.
Let output sets S0 = L ◦ π0 and S1 = L ◦ π1. If L is totally-ordered, then S0 ∪ S1 is totally-ordered.
Proof. Without loss of generality, we assume that the source set L is in canonical form (see
Section 3.3 on page 17). Clearly, by Lemma 3.2 (Page 15), S0 and S1 are individually
totally-ordered, as L is totally-ordered. Suppose that S0 ∪ S1 is not totally-ordered. Then
there exist subsets S^0_i ∈ S0 and S^1_j ∈ S1 such that S^0_i ⊈ S^1_j and S^1_j ⊈ S^0_i. This implies that
there are elements a, b ∈ Zn such that a ∈ S^0_i, b ∉ S^0_i and a ∉ S^1_j, b ∈ S^1_j. This implies that
a, b are in different blocks of π0; similarly they are in different blocks of π1. Let a ∈ B^0_î,
b ∈ B^0_ĩ where î ≠ ĩ and let a ∈ B^1_ĵ and b ∈ B^1_j̃ where ĵ ≠ j̃. Let S^0_i = Li′ ◦ π0 and S^1_j = Lj′ ◦ π1.
Since a ∈ S^0_i, a ∈ B^0_î and S^0_i = Li′ ◦ π0, we have Li′(î) = 1. Similarly Li′(ĩ) = 0, Lj′(ĵ) = 0
and Lj′(j̃) = 1. Now since source set L is in canonical form, from Li′(î) = 1 and Li′(ĩ) = 0
we can say
î < ĩ (5.1)
similarly ĵ > j̃ (5.2)
Now since a ∈ B^0_î and a ∈ B^1_ĵ, a might have moved to a new block in translation.
By Lemma 5.1 î ≤ ĵ ≤ î + 1 (5.3)
Similarly ĩ ≤ j̃ ≤ ĩ + 1 (5.4)
Combining these, ĩ ≤ j̃ (by 5.4) < ĵ (by 5.2) ≤ î + 1 (by 5.3) ≤ ĩ (by 5.1), which is a contradiction. Hence the
theorem.
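Theorem 5.1 can be spot-checked on the partitions of Example 5.1. The sketch below is an illustration, not the author's procedure; it assumes a canonical source set whose words are 1000, 1100 and 1110, models L ◦ π as the union of the blocks whose source bit is 1, and uses hypothetical helper names.

```python
def apply_word(word, partition):
    """L ∘ π for one source word: union of the blocks whose source bit is 1."""
    out = set()
    for bit, block in zip(word, partition):
        if bit:
            out |= block
    return frozenset(out)


def is_totally_ordered(family):
    """A family of sets is a chain iff, sorted by size, each set contains the previous one."""
    ordered = sorted(family, key=len)
    return all(a <= b for a, b in zip(ordered, ordered[1:]))


# pi0 and pi1 of Example 5.1, where pi1 is a unit-translation of pi0.
pi0 = [{3, 4, 11, 12}, {2, 5, 10, 13}, {1, 6, 9, 14}, {0, 7, 8, 15}]
pi1 = [{3, 4}, {11, 12, 2, 5, 10}, {13, 1}, {6, 9, 14, 0, 7, 8, 15}]
source = [[1, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 0]]   # assumed canonical source set

S0 = {apply_word(w, pi0) for w in source}
S1 = {apply_word(w, pi1) for w in source}
print(is_totally_ordered(S0 | S1))

# For contrast: swapping two blocks of pi0 is not a unit-translation.
pi_swapped = [pi0[1], pi0[0], pi0[2], pi0[3]]
S_swapped = {apply_word(w, pi_swapped) for w in source}
print(is_totally_ordered(S0 | S_swapped))
```

The union for the translated partition forms a chain, whereas the swapped partition yields two incomparable subsets of size 4.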
Lemma 5.2. Let G^1_{j+1} = B^1_0 ∪ B^1_1 ∪ · · · ∪ B^1_{j+1} and G^0_{j+1} = B^0_0 ∪ B^0_1 ∪ · · · ∪ B^0_{j+1}. Then
G^1_{j+1} = G^0_{j+1} − Cj+1, where 0 ≤ j + 1 < z − 1.
Proof. From Definition 5.1 we know that B^1_0 = B^0_0 − C0 and, for all 0 < j < z, B^1_j = (B^0_j − Cj) ∪ Cj−1,
where Cz−1 = ∅. Hence
$$\begin{aligned}
G^1_{j+1} &= B^1_0 \cup B^1_1 \cup \cdots \cup B^1_{j+1} \\
&= (B^0_0 - C_0) \cup \{(B^0_1 - C_1) \cup C_0\} \cup \cdots \cup \{(B^0_{j+1} - C_{j+1}) \cup C_j\} \\
&= B^0_0 \cup B^0_1 \cup \cdots \cup B^0_j \cup (B^0_{j+1} - C_{j+1}) \\
&= G^0_j \cup (B^0_{j+1} - C_{j+1}).
\end{aligned}$$
Since all blocks are disjoint and Cj+1 ⊆ B^0_{j+1}, we get G^1_{j+1} = G^0_{j+1} − Cj+1; moreover G^1_{j+1} ⊆ G^0_{j+1}.
Now since Cz−1 = ∅ and the union of all the blocks gives us Zn, the identity also holds for the last block, where both sides equal Zn. Thus
G^1_{j+1} = G^0_{j+1} − Cj+1 (5.5)
which proves the lemma.
Let us again consider the translation π0 −C→ π1, which yields the output sets S0 = L ◦ π0
and S1 = L ◦ π1. From Theorem 5.1 we know that if source set L is totally-ordered then
S0 ∪ S1 is totally-ordered as well. We now derive the circumstances under which S0 and S1
are disjoint.
Lemma 5.3. Let π0 −C→ π1 and, for totally-ordered source set L, let S0 = L ◦ π0 and
S1 = L ◦ π1. If 0 < ci < |B^0_i| for all 0 ≤ i < z − 1, where Cz−1 = ∅, then S0 ∩ S1 = ∅.
Proof. We first observe that since 0 < ci < |B^0_i| for 0 ≤ i < z − 1, the blocks of π0 are
non-empty. We proceed by contradiction. Let S ∈ S0 ∩ S1. Then there exist
some 0 ≤ u′, v′ < X such that S = Lu′ ◦ π0 = Lv′ ◦ π1. We can, without loss of generality,













. Using Definition 2.4 we can say Lu′ ◦ π0 = ⋃_{i=0}^{z−1} [1(B^0_i, Lu′(i))] =
B^0_0 ∪ B^0_1 ∪ · · · ∪ B^0_u = G^0_u (Lemma 5.2). Similarly
Lv′ ◦ π1 = ⋃_{i=0}^{z−1} [1(B^1_i, Lv′(i))] = B^1_0 ∪ B^1_1 ∪ · · · ∪ B^1_v = G^1_v.
Hence S = G^0_u = G^1_v (5.6)
Using Equation (5.5), where G^1_v = G^0_v − Cv,
we get G^0_u = G^0_v − Cv (5.7)
We now consider a few cases:
i For u = v, Equation (5.7) gives G^0_v = G^0_v − Cv. Hence Cv = ∅.
This is not possible as cv > 0.
ii For u > v, the union of blocks G^0_u = B^0_0 ∪ B^0_1 ∪ · · · ∪ B^0_v ∪ B^0_{v+1} ∪ · · · ∪ B^0_u =
G^0_v ∪ (B^0_{v+1} ∪ · · · ∪ B^0_u). Since the blocks are non-empty and disjoint, G^0_u ⊋ G^0_v ⊇ G^0_v − Cv = G^0_u, which is
the necessary contradiction.
iii For u < v, let u + i = v where i > 0 is an integer, and let H = B^0_{u+1} ∪ B^0_{u+2} ∪ · · · ∪ B^0_{u+i}. Then
$$\begin{aligned}
G^0_u &= G^0_v - C_v && \text{Equation (5.7)} \\
&= G^0_{u+i} - C_{u+i} \\
&= (G^0_u \cup H) - C_{u+i} \\
&= (G^0_u - C_{u+i}) \cup (H - C_{u+i}).
\end{aligned}$$
Now since Cu+i ⊆ B^0_{u+i}, which is disjoint from G^0_u, we can say
G^0_u = G^0_u ∪ (H − Cu+i).
Hence H − Cu+i ⊆ G^0_u.
This is possible only if H − Cu+i = ∅ because all the blocks of an ordered partition are
disjoint, meaning H ∩ G^0_u = Cu+i ∩ G^0_u = ∅.
From H − Cu+i = ∅
we can say H ⊆ Cu+i.
Now since all the blocks are non-empty and H = B^0_{u+1} ∪ B^0_{u+2} ∪ · · · ∪ B^0_{u+i},
we can say
B^0_{u+i} ⊆ H ⊆ Cu+i.
Therefore |B^0_{u+i}| ≤ cu+i,
which contradicts our assumption that cu+i < |B^0_{u+i}|.
Lemma 5.4. Let π0 −C→ π1 and, for totally-ordered source set L, let S0 = L ◦ π0 and
S1 = L ◦ π1. Then S0 ∩ S1 = ∅ implies that for all 0 < m < z − 1, 0 < cm < |B^0_m|.
Proof. Suppose the conclusion is false. That is, it is not the case that 0 < cm < |B^0_m| for all 0 < m < z − 1.
This implies that there exists m such that cm = |B^0_m| or cm = 0 ≠ |B^0_m|.
i For cm = |B^0_m|, that is, Cm = B^0_m:
$$\begin{aligned}
G^1_m &= G^0_m - C_m && \text{Equation (5.5)} \\
&= G^0_m - B^0_m && \text{using } C_m = B^0_m \\
&= (B^0_0 \cup B^0_1 \cup \cdots \cup B^0_m) - B^0_m \\
&= B^0_0 \cup B^0_1 \cup \cdots \cup B^0_{m-1} && \text{because the blocks } B^0_i \text{ are pairwise disjoint.}
\end{aligned}$$
So G^1_m = G^0_{m−1}, which implies that the unions of blocks receiving
value 1 in π1 and π0 are equal.
Hence the result:
=⇒ S0 ∩ S1 ≠ ∅
ii For cm = 0 ≠ |B^0_m|, that is, Cm = ∅:
G^1_m = G^0_m − Cm by Equation (5.5),
so G^1_m = G^0_m, which implies that the unions of blocks receiving value
1 in π1 and π0 are equal. Hence the result:
=⇒ S0 ∩ S1 ≠ ∅
Hence the lemma.
Lemma 5.5. Let π0 −C→ π1 and, for totally-ordered source set L, let S0 = L ◦ π0 and
S1 = L ◦ π1. Then for all 0 < m < z − 1 we have 0 < cm < |B^0_m| if and only if S0 ∩ S1 = ∅.
Proof. Evident from the results of Lemmas 5.3 and 5.4.
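The disjointness condition can be explored numerically through the prefix unions of Lemma 5.2. The sketch below is illustrative and rests on an assumption that is not spelled out above: the source set is canonical and consists of the z − 1 words with 1s in positions 0 to m (0 ≤ m < z − 1), so that the outputs are exactly the prefix unions G^0_m and G^1_m = G^0_m − Cm. The function names are hypothetical.

```python
from itertools import accumulate


def prefix_unions(partition):
    """G_m = B_0 ∪ ... ∪ B_m for every block index m."""
    return [frozenset(g) for g in accumulate(partition, lambda a, b: a | b)]


def outputs_disjoint(partition, vector):
    """Compare the outputs G0_m with G1_m = G0_m − C_m (Lemma 5.2) for m < z − 1."""
    g0 = prefix_unions(partition)[:-1]
    g1 = [g - c for g, c in zip(g0, vector)]
    return not set(g0) & set(g1)


pi0 = [{3, 4, 11, 12}, {2, 5, 10, 13}, {1, 6, 9, 14}, {0, 7, 8, 15}]

C_proper = [{11, 12}, {13}, {6, 9, 14}, set()]        # 0 < c_m < |B_m| for m < z − 1
C_empty = [{11, 12}, set(), {6, 9, 14}, set()]        # c_1 = 0
C_whole = [{11, 12}, {2, 5, 10, 13}, {6, 9}, set()]   # c_1 = |B_1|

print(outputs_disjoint(pi0, C_proper),
      outputs_disjoint(pi0, C_empty),
      outputs_disjoint(pi0, C_whole))
```

Only the vector whose components are all proper, non-empty subsets of their blocks gives disjoint output sets; an empty or whole-block component makes one translated output coincide with an original one.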
We now consider a particular class of translation π0 −C→ π1, in which for all 0 ≤ m < z − 1,
B^0_m ≠ ∅ implies 0 < cm = c ≤ |B^0_m|. That is, c is a lower bound on the size of the smallest block
of π0. We will call such a translation a C-uniform unit-translation, or simply a C-uniform
translation where the context is clear. Before we proceed let us consider the following
example.
Example 5.2. For n = 8, that is Zn = {0, 1, 2, 3, 4, 5, 6, 7}, let z = 3 and let π0 =
〈{0, 5, 7}, {2, 3}, {1, 4, 6}〉. Here the block number for elements 0, 5, 7 is 0, that of 2, 3 is
1 and that of 1, 4, 6 is 2. Let −→C = 〈{5, 7}, {2, 3}〉; then for π0 −C→ π1 we have π1 =
〈{0}, {5, 7}, {2, 3, 1, 4, 6}〉. The block number for 0 is still 0, however the block number
of 5, 7 is now 1. This translation causes the block number of the elements of Zn to change.
Notice that the number of elements entering and leaving each block is uniform (c = 2 ≤ |B^0_m|)
except for the first and last blocks. Hence this is a C-uniform translation.
Lemma 5.6. There exists a C-uniform translation π0 −C→ π1 if and only if no block B^0_m
(0 ≤ m < z − 1) has 0 < c′ < c elements, that is, every non-empty block has ≥ c elements.
Proof. If every block has at least c elements then consider any −→C = 〈C0, C1, · · · , Cz−2〉
with cm = c. Since |B^0_m| ≥ c, cm ≤ |B^0_m| is always true. Conversely, if some block
has 0 < |B^0_m| = c′ < c, then cm must be equal to c for a C-uniform translation. However
cm ≤ |B^0_m| < c. Thus the C-uniform translation is not possible.
Lemma 5.7. If π0 −C→ π1 is a C-uniform translation with respect to −→C, and if every block
B^0_m of ordered partition π0 (for 0 ≤ m < z − 1) has ≥ c elements, then every block B^1_m of
ordered partition π1 (for 0 < m < z − 1) keeps its size, that is |B^1_m| = |B^0_m|.
Proof. Recall from Definition 5.1 that B^1_m = (B^0_m − Cm) ∪ Cm−1 and Cm ⊆ B^0_m. Since
Cm−1 ∩ B^0_m = ∅, we have |B^1_m| = |B^0_m| − |Cm| + |Cm−1| = |B^0_m| − cm + cm−1. Now cm = cm−1 = c,
hence |B^1_m| = |B^0_m| − c + c = |B^0_m|.
Putting Lemmas 5.6, 5.7 together we can say that if the number of elements in every






−C1→ π2 · · · πk−1 −C_{k−1}→ πk.
This is because the block size remains the same through the translations, which is ≥ c.
We now consider such a C-uniform k-translation in which, from each block B^i_m, c elements
shift out to the corresponding block B^{i+1}_{m+1} ∈ πi+1.
Example 5.3. For n = 16, that is Zn = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}, let
z = 4 and let π0 = 〈{0, 5, 7, 8, 9, 14}, {2, 3, 13, 15}, {1, 4, 6, 12}, {10, 11}〉, where the blocks are listed in order from block 0 to block 3. Let
−→C0 = 〈{9, 14}, {13, 15}, {6, 12}〉; then for π0 −C0→ π1 we have
π1 = 〈{0, 5, 7, 8}, {9, 14, 2, 3}, {13, 15, 1, 4}, {6, 12, 10, 11}〉. Let −→C1 = 〈{7, 8}, {2, 3}, {1, 4}〉;
then for π1 −C1→ π2 we have π2 = 〈{0, 5}, {7, 8, 9, 14}, {2, 3, 13, 15}, {1, 4, 6, 12, 10, 11}〉. The
block number for 0, 5 is still 0, however the block number of 9, 14 is now 1. These translations
cause the block number of the elements of Zn to change. Notice that the number of elements
entering and leaving each block is uniform (c = 2 ≤ |B^0_m|), hence the block size remains
constant (except for the first and last blocks). Hence this is a C-uniform 2-translation. The extent of
change in block numbers is tracked later in the chapter.
We recall from Example 5.3 that as we translate from one ordered partition to another,
the block numbers of elements of Zn change. We now track the extent of change to the block




−C1→ π2 · · · πk−1 −C_{k−1}→ πk.
Specifically, we consider any element a ∈ B^0_m whose block number in π0 is m. As we move
from π0 to π1, π2, · · · , πk, the block number of a could increase. This increase is calculated in
the following paragraphs.
By Definition 5.2 there is a rank r of the elements of Zn that
is consistent with every πi (0 ≤ i ≤ k), where the πi are produced by the C-uniform k-translation from π0
through πk. Considering π0 −C0→ π1, let some element a ∈ B^0_m move to block B^1_{m+1}; recall
that a ∈ B^1_m or B^1_{m+1}. Thus the maximum increase in the block number of element a is 1. If
g(k) denotes the maximum increase in the block number of an element as we translate from
π0 to πk, then g(1) = 1. Figure 5.2 shows the contents of a block B^1_{m+1} with ≥ c elements.









− 1 = ℓ1 (where |B^1_{m+1}| ≥ c). After another ℓ1 translation a ∈ B^1_{m+1}
moves to B^2_{m+2} only if it is one of the largest c ranked elements of B^1_{m+1}. If ℓ1 ≥ 1, then












Figure 5.2: C-uniform translation
|B^1_{m+1}| ≥ 2c and a will not move out of B^1_{m+1} to B^2_{m+2}. It will remain in B^1_{m+1}, that is, the block
number of a will not increase. In fact, it is easy to show that the block number of a will
increase only after ℓ1 additional translations. So for all 1 ≤ ℓ′ ≤ ℓ1, a ∈ B^{ℓ′}_{m+1}. It is possible
that a ∈ B^{ℓ′+1}_{m+2}. Table 5.1 extends this argument to show the number of translations needed
for each block number increase. We now restrict our attention to a particular case where
Block Number Number of Translations
m 0(initial)
m+ 1 1 = k1






















Table 5.1: Block Number of a ∈ B0m over translation
|B^0_0| ≥ kc. We call this a non-depleting C-uniform k-translation. It is easy to verify that
|B^i_0| ≥ 0 for each block B^i_0 (0 ≤ i < k) and that, since |B^0_m| ≥ c for all 0 < m < z − 1, |B^i_m| ≥ c. This is
because for m > 0, a block receives c elements and gives up c elements. It is only B^i_0 that
does not receive any element in a non-depleting C-uniform k-translation.
Hence: |B^i_0| = |B^0_0| − ic for 0 ≤ i < k (5.8)
and |B^i_m| = |B^0_m| for 0 < m < z − 1 (5.9)
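Equations (5.8) and (5.9) can be checked by tracking block sizes alone. The following Python sketch is an illustration under the assumption that every block except the last is non-empty and gives up exactly c elements per translation; the function name and the sample sizes 7, 4, 4, 1 with c = 2 are chosen here for illustration.

```python
def translate_sizes(sizes, c):
    """Block sizes after one C-uniform unit-translation: every block except the
    last gives up c elements to its successor."""
    if any(size < c for size in sizes[:-1]):
        raise ValueError("every block except the last needs at least c elements")
    new = list(sizes)
    for m in range(len(sizes) - 1):
        new[m] -= c
        new[m + 1] += c
    return new


c, k = 2, 3
history = [[7, 4, 4, 1]]          # |B0_0| = 7 >= k*c = 6, so the translation is non-depleting
for _ in range(k):
    history.append(translate_sizes(history[-1], c))
for i, sizes in enumerate(history):
    print(i, sizes)
```

The first block shrinks by c per translation, the middle blocks keep their sizes, and the last block absorbs the shifted elements.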
In Table 5.1 we see that k_h > k_{h−1} ≥ 1. Therefore using Equation (5.9), we can say |B^{k_h}_m| =
|B^0_m|. Thus from the table, we can say

































































An unintended consequence of this is the following Lemma.











k-translation and still have each element’s block number increase by at most h.
Suppose we have an ordered partition π0 = 〈B0m : 0 ≤ m < z〉 with z blocks in which B00




2 , · · · , B0z−1 has at least c elements each. Then π0




−C1→ π2 · · · πk−1 −C_{k−1}→
πk) ensuring that the block number of any element of Zn increases (relative to that of
π0) by at most h.




1, · · · , Ciz−2,
〉
has |Cim| = c.
Example 5.4. For n = 16, that is Zn = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15}, let
z = 4, c = 2 and let π0 = 〈{11, 0, 5, 7, 8, 9, 14}, {2, 3, 13, 15}, {1, 4, 6, 12}, {10}〉, where the blocks are listed in order from block 0 to block 3. Let
−→C0 = 〈{9, 14}, {13, 15}, {6, 12}〉; then for π0 −C0→ π1 we have
π1 = 〈{11, 0, 5, 7, 8}, {9, 14, 2, 3}, {13, 15, 1, 4}, {6, 12, 10}〉. Let −→C1 = 〈{7, 8}, {2, 3}, {1, 4}〉;
then for π1 −C1→ π2 we have π2 = 〈{11, 0, 5}, {7, 8, 9, 14}, {2, 3, 13, 15}, {1, 4, 6, 12, 10}〉. Let
−→C2 = 〈{0, 5}, {9, 14}, {13, 15}〉; then for π2 −C2→ π3 we have
π3 = 〈{11}, {0, 5, 7, 8}, {9, 14, 2, 3}, {13, 15, 1, 4, 6, 12, 10}〉.
These 3 translations cause the block number of the elements of Zn to change by up to 2.
Notice that the size of the first block is greater than the product of the number of translations
and the shift size; moreover the number of elements entering and leaving each block is uniform
(c = 2 ≤ |B^0_m|), hence the block size remains constant (except for the first and last blocks).







The block numbers for elements {11, 0, 5, 7, 8, 9, 14, 2, 3, 13, 15, 1, 4, 6, 12, 10} changed from
0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3 to 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3 respectively. Hence
the changes in block number for the elements are 0, 1, 1, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 1, 1, 0; we call
this the increment vector for π0 −→ πh. In general the increment vector 〈d0, d1, · · · , dn−1〉
gives da, the increase in block number for element a ∈ Zn, with 0 ≤ da ≤ h.
Recall that an ordered partition π = 〈B0, B1, · · · , Bz−1〉 is represented in the address
generator module (see Figure 5.3) as a sequence 〈e0, e1, · · · , ea, · · · , en−1〉, where for all a ∈
Zn, ea = m if and only if a ∈ Bm. Clearly 0 ≤ ea < z. Let us call the sequence 〈ea : a ∈ Zn〉
the partition vector. Given any partition vector −→e^0 = 〈e^0_a : a ∈ Zn〉 (corresponding to
ordered partition π0) and an increment vector −→d = 〈da : a ∈ Zn〉 for π0 −→ πq, where q = k,
one can generate the partition vector −→e^q = 〈e^q_a : a ∈ Zn〉 as e^q_a = e^0_a + da. Notice that
0 ≤ da ≤ h. Figure 5.3 shows how πq can be generated from π0.
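The relation e^q_a = e^0_a + da amounts to an element-wise addition of two vectors. The Python sketch below illustrates it with the block numbers and increments quoted for Example 5.4; the variable and function names are assumptions made for this illustration.

```python
# Elements in the order they are listed in the text, with their block numbers
# in pi0 and their increments (the increment vector of Example 5.4).
elements  = [11, 0, 5, 7, 8, 9, 14, 2, 3, 13, 15, 1, 4, 6, 12, 10]
e0_listed = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3]
d_listed  = [0, 1, 1, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 1, 1, 0]

n, z = 16, 4
e0 = [0] * n     # partition vector of pi0, indexed by element a
d = [0] * n      # increment vector, indexed by element a
for a, block, increment in zip(elements, e0_listed, d_listed):
    e0[a] = block
    d[a] = increment

# One addition per element yields the partition vector of the translated partition.
eq = [block + increment for block, increment in zip(e0, d)]


def blocks_from_vector(vector, z):
    """Recover the ordered partition <B_0, ..., B_{z-1}> from a partition vector."""
    return [{a for a, m in enumerate(vector) if m == i} for i in range(z)]


print(blocks_from_vector(eq, z))
```

The recovered blocks are {11}, {0, 5, 7, 8}, {2, 3, 9, 14} and {1, 4, 6, 10, 12, 13, 15}, matching the final block numbers listed above.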
Figure 5.3: Address Generator (labels: partition vector e^0_0 · · · e^0_a · · · e^0_{n−1}; increment vector d0 d1 · · · da · · · dn−1; log z-bit address; partition vector e^q_0 · · · e^q_a · · · e^q_{n−1})
In the method of Jordan and Vaidyanathan [16] the address generator stored the partition vector
for every ordered partition it needs to use. This required a 2^y × n log z LUT with gate
cost O(2^y n log z). Now we only need to store the increment vectors, each of size n log h (as
opposed to n log z for the entire ordered partition). If 2^{y1} increment vectors are stored then
the cost is O(2^{y1} n log h) for the LUT and 2n log z for the input and output partition vectors.
The additions, even if performed by a ripple carry adder, have O(n log z) cost. Thus the total




. Now if h
is a constant then y1 = y + log log z. Thus with the same cost of O(2^y n log z), the enhanced
MU-Decoder can now produce 2^{y1} ordered partitions and 2^{y1+x} subsets rather than the 2^{y+x}
subsets in the original solution. Further generalizing this, if we use 2^y ordered partitions
each capable of handling 2^{y1} increment vectors, then with a cost of O(2^y n log z + 2^{y1} n log h)






. This gives the hardware structure of Figure 5.4.
In general it is more efficient to decrease y (to even 0), however, a larger y allows















Until now we have looked at generating a given totally-ordered set. In this chapter, we
generate a generic set of subsets of Zn, from which other subsets can be discovered.
6.1 Traversing the Boolean Lattice
Given a subset S ⊆ Zn, supersets of S can be generated by adding one element of Zn−S
at a time to S . This amounts to moving up the Boolean lattice, one level at a time. In fact,
a one-hot decoder generates subsets this way.
Example 6.1. On a 1-hot decoder the subset S2 = {0, 1, 2} may be generated as S0 = {0},
S1 = S0 ∪ {1} and S2 = S1 ∪ {2}.
Similarly, subsets can be formed by removing one element at a time, that is, moving down
the Boolean lattice. Thus the path length of moving from S1 to Ŝ = (S1 − S2) ∪ S3 (where
S2 ⊆ S1 and S1 ∩ S3 is empty) is |S2| + |S3|, which is a good reflection of the cost of generating
Ŝ from S1.
The overall idea is to generate S1 in O(log n) time and then spend another O(d log n)
time to generate S from S1. The set S1 is called a primary set and S (derived from S1) a
secondary set. Now given a set S = 〈Si : 0 ≤ i < u〉 of primary sets (produced by an MU-
Decoder), we try to produce a set Ŝ = {Ŝi} ⊇ S of secondary subsets that can be produced
within a distance d of S.
6.1.1 Single Total Order S
Let S = {Si : 0 ≤ i < u} be totally-ordered. Using Lemma 4.1, S can be produced
using an MU-Decoder in O(log n) time. Select S so that Si ⊆ Si+1 and |Si+1| − |Si| ≥ 2d.
This implies that the totally-ordered set S occupies a path in the Boolean lattice in which
individual elements of S are at least 2d apart (see Figure 6.1). From each primary set Si,
reached in O(log n) time using an MU-Decoder, secondary sets at distance d can be reached.
Since the Hamming distance between any Si and Sj is ≥ 2d, a d-distance set can be reached from
only one primary set.
Figure 6.1: 2d spaced totally-ordered subsets
Lemma 6.1. The number of d-distance secondary sets reachable from any primary set






Proof. Out of the n possible elements in (or not in) Si, select d to be not in (or in) Ŝi (a
secondary set).
Example 6.2. For set {0, 1, 2}, a 1-hot decoder generates {0}, then generates {1} and
accumulates this with {0} to get {0, 1}, and finally generates {2} to accumulate with the prior
result to get {0, 1, 2}.
In general, the above accumulation is a move up from a point in the Boolean lattice. A
move down can be done in a similar way (using an active-0 1-hot decoder).
Given any set S, another set S′ that differs from S in d elements (Hamming distance d)
can be produced from S in d iterations. Here we create a set of primary subsets S, such that
each pair of elements of S has a Hamming distance > 2d; see Figure 6.1.
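The secondary sets at Hamming distance exactly d from a primary set can be enumerated by flipping the membership of d of the n elements. The Python sketch below is illustrative only; the function name and the small parameters are assumptions.

```python
from itertools import combinations
from math import comb


def sets_at_distance(primary, n, d):
    """All subsets of Z_n whose Hamming distance from `primary` is exactly d:
    choose d of the n elements and flip their membership."""
    return [frozenset(set(primary) ^ set(flip)) for flip in combinations(range(n), d)]


primary = {0, 1, 2}
n, d = 5, 2
secondary = sets_at_distance(primary, n, d)
print(len(secondary), comb(n, d))
```

Each choice of d elements gives a distinct secondary set, so the count equals the binomial coefficient C(n, d).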
Lemma 6.2. If primary set of subsets is totally ordered and if it has a pairwise Hamming





z2 log z subsets can be generated in O(d log n) time by an
MD(⌈log(z − 1)⌉, 0, z, n).
Proof. This MU-Decoder represents a single line with 2^x = z elements. Therefore each
source word corresponds to one primary subset and there is only one partition being used
(y = 0). The totally-ordered property for a single line is proved by Jordan and Vaidyanathan [17] and






The following theorem is deduced in a similar way to Lemma 6.1 using Theorem 4.1
(page 21).
Theorem 6.1. Now if we have 2^y ≤ n/(2d) (d ≥ 1) primary sets, where all the subsets have a





z2 log z2y totally-ordered subsets can be generated
in O(d log n) time by an MD(⌈log(z − 1)⌉, ⌈log(z log z)⌉, z, n).
Proof. 2^y ≤ n/(2d) primary subsets can be spaced as in Figure 6.1.
To see the significance of this result, we observe that for d = 1 we have nz2 log z subsets at
O(nz log z) cost and O(log n) delay. However with z = 2, there is no significant change to the




(an increase by an n
2
2
factor). Given that n is much larger than z, this is a substantial
increase in the number of subsets produced. More generally, for about the same cost and
delay, with any constant d we will have a polynomial increase in the number of subsets
produced.
Let S = S0 ∪ S1 ∪ · · · ∪ SY−1, where the Si's are pairwise disjoint and equal in size. Suppose
Si = L ◦ πi. Then S can be generated as shown in Definition 2.4. We now examine
whether all these ordered partitions πi can be generated from a single ordered partition π0
as shown in Lemma 5.8 (see page 34). To translate k ordered partitions from ordered
partition π0, we need |B^0_0| > kc and each block in π0 to have at least c elements (except
the last one). We pick any S (a path of XY elements), where X = |L| is the source set size and
Y is the number of partitions, including the primary and the translated ones. Let the elements
of S be subsets uniformly distributed on the path. That is, |Si − Si−1| ≥ c ≥ 2d for each i
(see Figure 6.2). Now let us divide the XY totally-ordered subsets from Figure 6.2 into Y
sets of subsets as below
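$$\begin{aligned}
\mathcal{S}^0 &= \{S_{Y-1}, S_{2Y-1}, \cdots, S_{iY-1}, \cdots, S_{XY-1}\} \\
\mathcal{S}^1 &= \{S_{Y-2}, S_{2Y-2}, \cdots, S_{iY-2}, \cdots, S_{XY-2}\} \\
&\ \ \vdots \\
\mathcal{S}^j &= \{S_{Y-(j+1)}, S_{2Y-(j+1)}, \cdots, S_{iY-(j+1)}, \cdots, S_{XY-(j+1)}\} \\
&\ \ \vdots \\
\mathcal{S}^{Y-1} &= \{S_0, S_Y, \cdots, S_{(i-1)Y}, \cdots, S_{(X-1)Y}\}
\end{aligned}$$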
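The division of the XY subsets into Y interleaved families follows a simple index rule: family j takes the subsets with indices iY − (j + 1) for 1 ≤ i ≤ X. The Python sketch below illustrates this rule on subset labels; the function name and the tiny parameters are assumptions for illustration.

```python
def split_chain(chain, X, Y):
    """Family j collects S_{iY-(j+1)} for i = 1..X, so consecutive members of a
    family are Y positions apart on the path."""
    if len(chain) != X * Y:
        raise ValueError("the chain must contain exactly X*Y subsets")
    return [[chain[i * Y - (j + 1)] for i in range(1, X + 1)] for j in range(Y)]


labels = [f"S{t}" for t in range(6)]      # stand-ins for S_0 ⊂ S_1 ⊂ ... ⊂ S_5
families = split_chain(labels, X=3, Y=2)
for j, family in enumerate(families):
    print(j, family)
```

With X = 3 and Y = 2, family 0 is {S1, S3, S5} and family 1 is {S0, S2, S4}; the families are disjoint and together cover the whole path.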
Now Sj = {Sj0, S
j
1, · · · , S
j

























X = Zn − S
j
X−1. We
used X = 2^x = z − 1 from Jordan and Vaidyanathan [16].










0 = SY−1 − SY−(j+1). Since Y − 1− (Y − (j + 1)) = j, we can say
|SY−1| − |SY−(j+1)| = cj. Clearly S00 ⊇ S
j















= (S_{(i+1)Y−1} − S_{iY−1}) − (S_{(i+1)Y−(j+1)} − S_{iY−(j+1)})
Now (i + 1)Y − 1 ≥ (i + 1)Y − (j + 1) ≥ iY − 1 ≥ iY − (j + 1),
or S_{(i+1)Y−1} ⊃ S_{(i+1)Y−(j+1)} ⊃ S_{iY−1} ⊃ S_{iY−(j+1)}.
Hence B^0_i − B^j_i = S_{(i+1)Y−1} − S_{(i+1)Y−(j+1)},
or |B^0_i − B^j_i| = jc.
Hence we can say π0 → πj is a non-depleting C-uniform k-translation. We can say
that if element a moves into B^j_i from B^{j−1}_{i−1}, then a never moves out of B^j_i. Hence
S^0, S^1, · · · , S^{Y−1} can be generated from a single ordered partition π0. Therefore for each
totally-ordered set, the number of generator partitions is only 1, namely π0, and the rest are
generated using the increment vectors. Hence y = y1 + y2 = y2.
Lemma 6.4. S0, S1, · · · , SY−1 are disjoint.
Proof. We can say from the construction of Figure 6.1 that since |Si−Si−1| ≥ c, hence each





















In this thesis, we proved that for any source set and any output set, we can produce
an ordered partition that maps the given source set to the output set, as long as both
sets are totally-ordered. We further extended this result to a collection of output sets:
if each output set is individually totally-ordered, then all of them can be produced from
any single totally-ordered source set, and we produce the required ordered partitions
for these output sets (for the same source set). This alleviates the limitations due to
isomorphism (see Jordan and Vaidyanathan [16]). It allows us to produce different sets
of individually totally-ordered output sets from a single totally-ordered source set,
effectively reducing the cost of using a large LUT-Decoder. A set of subsets such as the
one in Figure 1.3(b) (see page 5) can be produced at the same cost as a regular MU-Decoder,
whereas under the isomorphism constraint the cost would rise significantly.
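The idea of mapping one totally-ordered source set to a totally-ordered output set through an ordered partition can be illustrated with a small sketch. It is not the translation algorithm of this thesis. It assumes a simplified model of the mapping unit in which source bit q is copied to every output position in block B_q, so a source subset Q produces the union of the blocks B_q for q in Q. It also assumes that both chains are strictly increasing and have the same length, and that at least one source bit is never set whenever some output position is never selected. The names build_partition and apply_mapping are hypothetical.

```python
def build_partition(source_chain, output_chain, z, n):
    """Return blocks B_0..B_{z-1} of Z_n so that source_chain[k] maps to output_chain[k].

    Assumed model: output position p is driven by exactly one source bit,
    and block B_q collects the positions driven by source bit q.
    Blocks may be empty in this sketch.
    """
    if len(source_chain) != len(output_chain):
        raise ValueError("chains must have the same length")
    blocks = [set() for _ in range(z)]
    prev_src, prev_out = frozenset(), frozenset()
    for src, out in zip(source_chain, output_chain):
        new_bits = sorted(src - prev_src)  # bits first set at this step
        new_positions = out - prev_out     # positions first selected here
        if new_positions and not new_bits:
            raise ValueError("source chain does not grow where output does")
        for p in new_positions:
            blocks[new_bits[0]].add(p)
        prev_src, prev_out = src, out
    # Positions never selected are driven by a source bit that is never set.
    leftover = set(range(n)) - set(prev_out)
    spare = [q for q in range(z) if q not in prev_src]
    if leftover and not spare:
        raise ValueError("no unused source bit for unselected positions")
    if leftover:
        blocks[spare[0]] |= leftover
    return [frozenset(b) for b in blocks]


def apply_mapping(blocks, source_subset):
    """Output subset produced by a source subset under the assumed model."""
    out = set()
    for q in source_subset:
        out |= blocks[q]
    return frozenset(out)


if __name__ == "__main__":
    source = [frozenset({1}), frozenset({1, 3}), frozenset({0, 1, 3})]
    output = [frozenset({2, 5}), frozenset({2, 5, 6}), frozenset({0, 2, 4, 5, 6})]
    blocks = build_partition(source, output, z=4, n=8)
    for src, expected in zip(source, output):
        print(sorted(src), "->", sorted(apply_mapping(blocks, src)), sorted(expected))
```

Because both chains are nested, a position that first appears in the k-th output subset is driven by a bit that first appears in the k-th source subset, so it is selected by that source subset and by every later one, and by no earlier one.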
We further worked on hardware enhancements for generating additional ordered partitions
from existing ordered partitions inside the MU-Decoder itself, which increases
the number of subsets produced for about the same hardware cost. We enhance the existing
hardware of the MU-Decoder to produce these additional subsets. This additional hardware
does not change the complexity of the previously assumed cost of the MU-Decoder, and it
generates Θ(log z) additional subsets. Producing these additional subsets has a limitation:
the generator and generated subsets together are totally-ordered. So far we generated, for $\pi^0$,




Si is totally-ordered. However, our method does not require this. All we im-




Si being totally-ordered, we have strong evidence that is all we require for
S0 ∪ Si to be totally-ordered. In this thesis, we focused our work on producing subsets which
can be produced cheaply, and on using these subsets to approach a given set of subsets at a
distance of up to log n from the produced subset. In this thesis, we focused our work producing





$2^y z^2 \log z$ subsets in $O(d \log n)$ time using a MU-Decoder
$MD(\lceil \log(z-1) \rceil, \lceil \log(z \log z) \rceil, z, n)$ of gate cost
$O(zn \log z)$ and delay $O(\log n)$, where $2^y \le \frac{n}{2d}$ and $d \ge 1$. This is a
substantial expansion of the MU-Decoder range for efficient operation.
7.2 Future Work
This work opens up several possible directions for future work.
Is the translation algorithm the best possible? Are similar algorithms (to generate
ordered partitions) possible for non-totally-ordered sets?
The cost of the enhancement is driven by the cost $O(n 2^y)$ of the increment vectors. We
assume that these vectors are independent. Could one be derived from another?
The MU-Decoder itself is a reconfigurable device that is reconfigured through its LUTs.
Can the MU-Decoder be used as a LUT? For example, the increment vector π is as big as a
subset of $Z_n$.
So far the MU-Decoder used is “universal” [17], in which all z source bits are sent to all
n MUXes. We want to determine whether there is a “good” set of subsets (as in Chapter 6) for
which each MUX receives only a subset of the source bits. This would drastically reduce the
cost of the interconnects in the mapping unit and the address selector LUT size. (It has been
shown by Kongari [21] that these interconnects are a large contributor to the MU-Decoder area.)




Figure 7.1: Multiple Totally-Ordered paths produced from one Totally-Ordered set of subsets
Bibliography
[1] Intel FPGAs, Solutions, 2017.
[2] Arash Ashrafi. An architecture for configuring an efficient scan path for a subset of
elements. Master’s thesis, Louisiana State University, 2016.
[3] P. N. Bachate and S. M. Mahamuni. FPGA based robots for industrial security and
application. In 2016 IEEE International Conference on Recent Trends in Electronics,
Information Communication Technology (RTEICT), pages 1757–1760, May 2016.
[4] S. Banerjee, E. Bozorgzadeh, and N. D. Dutt. Integrating Physical Constraints in
HW-SW Partitioning for Architectures with Partial Dynamic Reconfiguration. IEEE
Transactions on Very Large Scale Integration (VLSI) Systems, 14(11):1189–1202, Nov
2006.
[5] A. Becher, F. Bauer, D. Ziener, and J. Teich. Energy-aware SQL query acceleration
through FPGA-based dynamic partial reconfiguration. In 2014 24th International Con-
ference on Field Programmable Logic and Applications (FPL), pages 1–8, Sept 2014.
[6] SL Bishop, Suresh Rai, B Gunturk, Jerry L Trahan, and Ramachandran Vaidyanathan.
Reconfigurable implementation of wavelet integer lifting transforms for image
compression. In ReConFig 2006, pages 1–9. IEEE, 2006.
[7] Christophe Bobda. Introduction to reconfigurable computing: architectures, algorithms,
and applications. Springer Science & Business Media, 2007.
[8] F. Chekired, A. Mellit, S.A. Kalogirou, and C. Larbes. Intelligent maximum power
point trackers for photovoltaic applications using FPGA chip: A comparative study.
Solar Energy, 101:83 – 99, 2014.
[9] S. Corbetta, M. Morandi, M. Novati, M. D. Santambrogio, D. Sciuto, and P. Spoletini.
Internal and External Bitstream Relocation for Partial Dynamic Reconfiguration. IEEE
Transactions on Very Large Scale Integration (VLSI) Systems, 17(11):1650–1654, Nov
2009.
[10] V. R. Deskar, M. P. V. Kumar, P. Kumar, and M. S. Gururaj. Design of net meter
using FPGA. In 2016 IEEE International Conference on Recent Trends in Electronics,
Information Communication Technology (RTEICT), pages 322–326, May 2016.
[11] Ghada Dessouky and Ahmad-Reza Sadeghi. Poster: Exploiting dynamic partial re-
configuration for improved resistance against power analysis attacks on FPGAs. In
Proceedings of the 9th ACM Conference on Security & Privacy in Wireless and
Mobile Networks, WiSec ’16, pages 223–224, New York, NY, USA, 2016. ACM.
[12] G. Forney. Generalized minimum distance decoding. IEEE Transactions on Information
Theory, 12(2):125–131, Apr 1966.
[13] A. Gregerson, A. Farmahini-Farahani, B. Buchli, S. Naumov, M. Bachtis, K. Compton,
M. Schulte, W. H. Smith, and S. Dasu. FPGA design analysis of the clustering algo-
rithm for the CERN Large Hadron Collider. In 2009 17th IEEE Symposium on Field
Programmable Custom Computing Machines, pages 19–26, April 2009.
[14] P. Greisen, M. Runo, P. Guillet, S. Heinzle, A. Smolic, H. Kaeslin, and M. Gross. Evalua-
tion and FPGA implementation of sparse linear solvers for video processing applications.
IEEE Transactions on Circuits and Systems for Video Technology, 23(8):1402–1407,
Aug 2013.
[15] H. M. Hussain, K. Benkrid, and H. Seker. Dynamic partial reconfiguration implemen-
tation of the svm/knn multi-classifier on FPGA for bioinformatics application. In 2015
37th Annual International Conference of the IEEE Engineering in Medicine and Biology
Society (EMBC), pages 7667–7670, Aug 2015.
[16] M. C. Jordan and R. Vaidyanathan. MU-Decoders: A class of fast and efficient con-
figurable decoders. In 2010 IEEE International Symposium on Parallel Distributed
Processing, Workshops and Phd Forum (IPDPSW), pages 1–4, April 2010.
[17] Matthew Collin Jordan. A configurable decoder for pin-limited applications. Master’s
thesis, Louisiana State University, 2006.
[18] Cindy Kao. Benefits of partial reconfiguration. Xcell journal, 55:65–67, 2005.
[19] P. N. Karthik and K. Suresh. Devise and establishment of property specification lan-
guage to verify the complex behaviour of FPGA ethernet IP core. In 2016 IEEE In-
ternational Conference on Recent Trends in Electronics, Information Communication
Technology (RTEICT), pages 763–768, May 2016.
[20] Hirak Kashyap and Ricardo Chaves. Compact and On-the-Fly Secure Dynamic Recon-
figuration for Volatile FPGAs. ACM Trans. Reconfigurable Technol. Syst., 9(2):11:1–
11:22, January 2016.
[21] Raghavendra Kongari. Cost and performance modeling of the MU-Decoder. Master’s
thesis, Louisiana State University, 2011.
[22] B. Koziel, R. Azarderakhsh, M. Mozaffari Kermani, and D. Jao. Post-quantum cryptog-
raphy on FPGA based on isogenies on elliptic curves. IEEE Transactions on Circuits
and Systems I: Regular Papers, 64(1):86–99, Jan 2017.
[23] J. Kumar, Shanmukha, Murali, J. Kumar, and R. Bhakthavatchalu. Design and im-
plementation of hodgkin and huxley spiking neuron model on FPGA. In 2016 IEEE
International Conference on Recent Trends in Electronics, Information Communication
Technology (RTEICT), pages 1483–1487, May 2016.
[24] Yufei Ma, N. Suda, Yu Cao, J. S. Seo, and S. Vrudhula. Scalable and modularized RTL
compilation of convolutional neural networks onto FPGA. In 2016 26th International
Conference on Field Programmable Logic and Applications (FPL), pages 1–8, Aug 2016.
[25] M. R. Maheshwarappa, M. D. J. Bowyer, and C. P. Bridges. Improvements in CPU
and FPGA performance for small satellite SDR applications. IEEE Transactions on
Aerospace and Electronic Systems, PP(99):1–1, 2017.
[26] P. Narapureddy, C. M. Ananda, B. P. Kumar, and E. P. J. Kumar. Design and im-
plementation of fiber channel based high speed serial transmitter for data protocol on
FPGA. In 2016 IEEE International Conference on Recent Trends in Electronics, Infor-
mation Communication Technology (RTEICT), pages 926–931, May 2016.
[27] David Ratter. FPGAs on Mars. Xcell J, 50:8–11, 2004.
[28] Kenneth H. Rosen. Discrete Mathematics and Its Applications. McGraw-Hill Higher
Education, 7th edition, 2012.
[29] J. Sarkhawas, P. Khandekar, and A. Kulkarni. Variable quality factor jpeg image com-
pression using dynamic partial reconfiguration and microblaze. In 2015 International
Conference on Computing Communication Control and Automation, pages 620–624, Feb
2015.
[30] A. C. Shettar, K. M. Sudarshan, and S. Rehman. FPGA design and implementation of
digital pwm technique for DC-DC converters. In IEEE International Conf. on Recent
Trends in Electronics, Information Communication Technology, pages 918–920, May
2016.
[31] S. Shreejith, K. Vipin, S. A. Fahmy, and M. Lukasiewycz. An approach for redundancy
in flexray networks using FPGA partial reconfiguration. In 2013 Design, Automation
Test in Europe Conference Exhibition (DATE), pages 721–724, March 2013.
[32] Nitin Srivastava, Jerry L Trahan, Ramachandran Vaidyanathan, and Suresh Rai. Adap-
tive image filtering using run-time reconfiguration. In Reconfigurable Architectures
Workshop, International Parallel and Distributed Processing Symposium, 2003.
[33] J. K. Toft and A. Nannarelli. Energy efficient FPGA based hardware accelerators for
financial applications. In 2014 NORCHIP, pages 1–6, Oct 2014.
[34] Y. Wang, Q. Liu, and A. E. Fathy. Cw and pulse doppler radar processing based on
FPGA for human sensing applications. IEEE Trans. Geoscience and Remote Sensing,
51(5):3097–3107, 2013.
[35] Xilinx Inc., Applications, 2017.
Vita
Utsav Agarwal was born on March 3, 1989, in Kolkata, India. He received his elementary
education from Abhinav Bharati High School, following which he received the rest of his
schooling from The Assembly of God Church School. He joined Meghnad Saha Institute of
Technology in the field of Electronics and Communication Engineering and graduated with a
Bachelor of Technology in May 2012. After working for a couple of years in India, he joined
Louisiana State University, USA, as a graduate student in the Electrical and Computer
Engineering Department. He is expected to receive his Master of Science in Electrical
Engineering in May 2017.
