FPGA Bit-stream Compression Using Run-length Encoding by P. M.Sandeep & C.S Manikandababu
International Journal of Electronics Communication and Computer Technology (IJECCT) 
Volume 3 Issue 2 (March 2013) 
 
ISSN:2249-7838                                                                                                                       IJECCT | www.ijecct.org  386 
 
FPGA Bit-stream Compression Using Run-length Encoding
P. M. Sandeep
P.G. Scholar, M.E. VLSI Design
Sri Ramakrishna Engineering College
Coimbatore, India
C. S. Manikandababu
Assistant Professor, M.E. VLSI Design
Sri Ramakrishna Engineering College
Coimbatore, India
 
Abstract— Reconfigurable systems use bit-stream compression
to reduce the bit-stream size and the memory requirement.
Compression also improves the effective communication bandwidth,
which reduces the reconfiguration time. Existing research has
explored either efficient compression with slow decompression or
fast decompression at the cost of compression efficiency. This
paper proposes a decode-aware compression technique to improve
both compression and decompression efficiency. The three major
contributions of this paper are: i) an efficient bitmask selection
technique that can create a large set of matching patterns; ii) a
bitmask-based compression that uses bitmask and dictionary
selection to significantly reduce the memory requirement; and
iii) an efficient combination of bitmask-based compression and
run-length encoding of repetitive patterns.
Keywords- Bitmask-based compression; decompression engine;
Field-Programmable Gate Array (FPGA).
I.   INTRODUCTION 
      FIELD-PROGRAMMABLE GATE ARRAYS (FPGAs) are widely used in reconfigurable systems. The description of the logic circuit is entered using a hardware description language (HDL) such as VHDL or Verilog; alternatively, the logic design is drawn using a schematic editor. A logic synthesis program transforms the HDL or schematic into a netlist, which describes the logic gates in the design and their interconnections. The implementation tool then maps the logic gates and interconnections onto the FPGA. Each configurable logic block (CLB) in the FPGA contains lookup tables (LUTs) that perform the logic operations. The mapping tool collects netlist gates into groups that fit into the LUTs, and the place-and-route tool then assigns the gate collections to specific CLBs while opening or closing the switches in the routing matrices to connect the gates together. When the implementation phase is complete, a program extracts the state of the switches in the routing matrices and generates a bit-stream in which the ones and zeroes correspond to open or closed switches.
      Since the configuration information for an FPGA has to be stored in internal or external memory as bit-streams, the limited memory size and access bandwidth become key factors in determining how many different functionalities a system can be configured with and how quickly the configuration can be performed. Because memory with more capacity and access bandwidth is costly, bit-stream compression techniques lessen the memory constraint by reducing the size of the bit-stream, so that the same memory can store more configuration information. The efficiency of bit-stream compression is measured using the compression ratio (CR), defined as the ratio between the compressed bit-stream size (CS) and the original bit-stream size (OS), i.e., CR = CS/OS. A smaller compression ratio therefore implies a better compression technique. Among the various compression techniques that have been proposed, bitmask-based compression [5] seems attractive for bit-stream compression because of its good compression ratio and relatively simple decompression scheme. This approach combines the advantages of previous compression techniques with good compression ratio and those with fast decompression.
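As a minimal illustration of the CR definition above, the following Python sketch computes CR = CS/OS for sizes given in bits. The function name and the sample sizes are hypothetical; they only show that a smaller CR corresponds to a smaller compressed bit-stream.

```python
def compression_ratio(compressed_bits: int, original_bits: int) -> float:
    """Return CR = CS / OS; a smaller value means better compression."""
    if original_bits <= 0:
        raise ValueError("original bit-stream size must be positive")
    if compressed_bits < 0:
        raise ValueError("compressed size cannot be negative")
    return compressed_bits / original_bits


if __name__ == "__main__":
    # Hypothetical sizes: a 1,000,000-bit stream compressed to 600,000 bits.
    cr = compression_ratio(600_000, 1_000_000)
    print(round(cr, 2))  # 0.6
```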
II.  RELATED WORK 
     The existing bit-stream compression techniques can be classified into two categories based on whether they need special hardware support during decompression. Some approaches require special hardware features to access the configuration memory, such as wildcard registers, partial reconfiguration, or frame readback, which are provided only by certain FPGAs. For example, the wildcard compression scheme [6] was developed for the Xilinx XC6200 series FPGA, which supports wildcard registers. Using these registers, the same logic configuration can be written to multiple cells in a single operation. Pan et al. [1] used frame reordering and active frame readback to exploit the redundancy between frames.
 The difference between consecutive frames (the difference vector) is encoded using either Huffman-based run-length encoding or LZSS-based compression. Such sophisticated encoding schemes can produce excellent compression. However, the decompression overhead in [1] is a major bottleneck in reconfigurable systems.
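The frame-difference idea of [1] can be sketched as follows, assuming that the difference vector is the bitwise XOR of two equally sized frames (an assumption of this sketch). Only a simple zero-run encoding of the difference vector is shown; the Huffman and LZSS stages described above are omitted, and the frame contents are invented for illustration.

```python
from typing import List, Optional, Tuple


def difference_vector(prev_frame: bytes, frame: bytes) -> bytes:
    """Assumed difference vector: bitwise XOR of two consecutive frames."""
    if len(prev_frame) != len(frame):
        raise ValueError("frames must have the same length")
    return bytes(a ^ b for a, b in zip(prev_frame, frame))


def zero_run_encode(diff: bytes) -> List[Tuple[int, Optional[int]]]:
    """Encode as (number of zero bytes, following non-zero byte) pairs.

    A trailing run of zero bytes is emitted as (run, None).
    """
    out: List[Tuple[int, Optional[int]]] = []
    run = 0
    for byte in diff:
        if byte == 0:
            run += 1
        else:
            out.append((run, byte))
            run = 0
    if run:
        out.append((run, None))
    return out


if __name__ == "__main__":
    # Hypothetical frames that differ in a single byte.
    prev = bytes([0x01, 0x02, 0x03, 0x04, 0x05])
    cur = bytes([0x01, 0x02, 0x07, 0x04, 0x05])
    print(zero_run_encode(difference_vector(prev, cur)))  # [(2, 4), (2, None)]
```

Similar consecutive frames produce a difference vector dominated by zeros, which is why run-length-style coding of that vector pays off.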
In contrast, many bit-stream compression techniques access the configuration memory only linearly during decompression and can therefore be applied to virtually all FPGAs. The basic idea behind most of these techniques is to divide the entire bit-stream into many small words and then compress them with common algorithms such as Huffman coding, arithmetic coding, or dictionary-based (e.g., LZSS-based) compression. For instance, Xilinx [9] introduced a bit-stream compression algorithm based on LZ77, which is integrated in the System ACE controller. Huebner et al. proposed an LZSS-based technique for the Xilinx Virtex XCV2000E FPGA, whose decompression engine is carefully designed to achieve fast decompression. Stefan et al. [11] observed that simpler algorithms like LZSS successfully keep the decompression overhead within an acceptable range but compromise on compression efficiency. On the other hand, compression techniques using complex algorithms can achieve significant compression but incur considerable hardware overhead during decompression. Unfortunately, the authors did not model the buffering circuitry of the decompression engine in their work. Hence, the hardware overhead presented for some variable-length coding techniques may be inaccurate.
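The word-splitting step described above, followed by one of the named algorithms (Huffman coding), can be sketched in Python. This is a textbook Huffman construction applied to hypothetical 8-bit words, not the specific encoder of any cited work; `split_words` and `huffman_code` are illustrative names.

```python
import heapq
from collections import Counter
from itertools import count
from typing import Dict, List


def split_words(bits: str, word_bits: int = 8) -> List[str]:
    """Divide a bit-stream (given as a '0'/'1' string) into fixed-size words."""
    if len(bits) % word_bits:
        raise ValueError("bit-stream length must be a multiple of word_bits")
    return [bits[i:i + word_bits] for i in range(0, len(bits), word_bits)]


def huffman_code(words: List[str]) -> Dict[str, str]:
    """Build a Huffman code (word -> codeword) from word frequencies."""
    freq = Counter(words)
    if not freq:
        return {}
    if len(freq) == 1:  # a single distinct word still needs a 1-bit code
        return {next(iter(freq)): "0"}
    tie = count()  # tie-breaker so heapq never has to compare the dicts
    heap = [(f, next(tie), {w: ""}) for w, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {w: "0" + c for w, c in left.items()}
        merged.update({w: "1" + c for w, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]


if __name__ == "__main__":
    # Hypothetical 80-bit stream in which one word dominates.
    stream = "00000000" * 6 + "11111111" * 2 + "10101010" + "01010101"
    words = split_words(stream)
    code = huffman_code(words)
    encoded = "".join(code[w] for w in words)
    print(len(stream), "->", len(encoded), "bits")  # 80 -> 16 bits
```

The resulting codewords have different lengths, so code boundaries no longer line up with memory words; this is the source of the buffering-circuitry cost discussed above.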
To increase the decompression throughput of complex compression algorithms, parallel decompression can be used. Nikara et al. [12] improved the throughput by employing speculative parallel decoders. Qin et al. [13] introduced a placement technique for compressed bit-streams that enables parallel decompression. However, since the structure of each decoder and its buffering circuitry is unchanged, the area overhead is multiplied as well. Most importantly, this approach does not reduce the speed overhead introduced by the buffering circuitry for variable-length coded (VLC) bit-streams. In contrast, our proposed approach will significantly improve the maximum operating frequency by effectively addressing the buffering circuitry problem.
III.  BACKGROUND AND MOTIVATION 
In this section, we briefly analyze the decompression hardware complexity of common variable-length compression techniques. This analysis forms the basis of our approach. In the following discussion, we use the term symbol to refer to a sequence of uncompressed bits and code to refer to the compression result (of a symbol) produced by the compression algorithm. While compression efficiency is a straightforward and widely used criterion for evaluating compression techniques, the complexity of the decompression hardware determines whether an algorithm with a promising compression ratio can be applied to commercial FPGAs. Interestingly, our study shows that the complexity of the decompression algorithm is not the only factor that determines the hardware complexity.
IV.  DECODE-AWARE BIT-STREAM COMPRESSION 
    
 
 
Figure 1.   Decode-aware bit-stream compression. 
      Fig. 1 [9] shows the decode-aware bit-stream compression framework. On the compression side, the FPGA configuration bit-stream is analyzed to select profitable dictionary entries and bitmask patterns. The compressed bit-stream is then generated using bitmask-based compression and run-length encoding (RLE). Next, our decode-aware placement algorithm is employed to place the compressed bit-stream in memory for efficient decompression. At run time, the compressed bit-stream is transmitted from the memory to the decompression engine, which reproduces the original configuration bit-stream.
    Because memories and communication buses are designed in multiples of bytes (8 bits), storing dictionaries or transmitting data in sizes that are not a multiple of a byte is inefficient. Therefore, we restrict the symbol length to multiples of eight bits in our current implementation. Since the dictionary for bit-stream compression is small compared to the bit-stream itself, we use a dictionary of $d = 2^i$ entries to fully utilize the bits for dictionary indexing, where $i$ is the number of indexing bits.
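The relation $d = 2^i$ and the byte-multiple restriction can be checked with a small Python sketch: an $i$-bit index addresses exactly $2^i$ entries, so any other dictionary size leaves index codes unused. The helper names below are hypothetical.

```python
def index_bits(dict_entries: int) -> int:
    """Number of index bits i for a dictionary of d = 2**i entries."""
    if dict_entries < 2 or dict_entries & (dict_entries - 1):
        raise ValueError("dictionary size must be a power of two (d = 2**i, i >= 1)")
    return dict_entries.bit_length() - 1


def unused_index_codes(dict_entries: int) -> int:
    """Index codes left unused when d is not a power of two."""
    if dict_entries < 1:
        raise ValueError("dictionary must have at least one entry")
    bits = max(1, (dict_entries - 1).bit_length())  # ceil(log2(d)), at least 1
    return (1 << bits) - dict_entries


def check_symbol_length(symbol_bits: int) -> None:
    """Symbols are restricted to whole bytes (multiples of eight bits)."""
    if symbol_bits <= 0 or symbol_bits % 8:
        raise ValueError("symbol length must be a positive multiple of 8")


if __name__ == "__main__":
    print(index_bits(16))          # 4
    print(unused_index_codes(5))   # 3: a 3-bit index could address 8 entries
    check_symbol_length(32)        # passes silently
```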
A.  Bitmask Selection 
     A bitmask is a pattern of binary values that is combined with a value using bitwise AND, so that the bits of the value in positions where the mask is zero are cleared. A bitmask can also be used to set selected bits using bitwise OR, or to invert them using bitwise exclusive OR (XOR). The bitmask-based approach tries to capture as many bit differences as possible using mask patterns without adding significant cost (extra bits), so that the CR improves. Our compression technique also ensures that the decompression efficiency remains the same as that of existing techniques. Fig. 2 [5] shows compression using bitmask selection. Bit-stream words that cannot be compressed using dictionary selection alone are compressed using bitmask selection. The selection of bitmasks plays an important role in bitmask-based compression.
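As a hedged sketch of how a word can be matched to a dictionary entry through a bitmask, the Python function below tries an exact match first and then a single 2-bit mask at every offset (a sliding mask). The 8-bit word width, the 2-bit mask width, offsets counted from the most significant bit, the first-match policy, and the tuple-based output are assumptions for illustration, not the exact code format or selection strategy of [5].

```python
from typing import List, Tuple


def encode_word(word: int, dictionary: List[int], word_bits: int = 8,
                mask_bits: int = 2) -> Tuple:
    """Encode one word: exact match, single-bitmask match, or uncompressed.

    Offsets are counted from the most significant bit (assumed convention).
    """
    # 1) Exact dictionary match.
    for idx, entry in enumerate(dictionary):
        if entry == word:
            return ("dict", idx)
    # 2) Sliding bitmask: every differing bit must fall inside one mask window.
    for idx, entry in enumerate(dictionary):
        diff = entry ^ word
        for offset in range(word_bits - mask_bits + 1):
            shift = word_bits - mask_bits - offset
            window = ((1 << mask_bits) - 1) << shift
            if (diff & ~window) == 0:
                # The mask is never 0 here, since exact matches returned above.
                return ("bitmask", idx, offset, diff >> shift)
    # 3) No match: the word is stored uncompressed.
    return ("raw", word)


if __name__ == "__main__":
    dictionary = [0b11110000]
    print(encode_word(0b11110000, dictionary))  # ('dict', 0)
    print(encode_word(0b11000000, dictionary))  # ('bitmask', 0, 2, 3)
    print(encode_word(0b00001111, dictionary))  # ('raw', 15)
```

A real selector would compare candidate masks and dictionary entries by the bits they save; this sketch simply returns the first fit.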
 
 
Figure 2.   Bit-stream compression using bitmask selection approach 
B.  Dictionary Selection 
     Dictionary-based code-compression techniques provide compression efficiency as well as a fast decompression mechanism. The basic idea is to store commonly occurring instruction sequences in a dictionary. Repeating occurrences are replaced with a code word that points to the dictionary index containing the pattern. The compressed program consists of both code words and uncompressed instructions. Fig. 3 [9] shows an example of dictionary-based code compression using a simple program binary. The binary consists of ten 8-bit patterns, i.e., a total of 80 bits. The dictionary has two 8-bit entries. The compressed bit-stream requires 62 bits, and the dictionary requires 16 bits. In this case, the CR is (62 + 16)/80 = 97.5%. Because the CR achieved by dictionary selection alone is this high, it does not yield an effective compression technique by itself. Therefore, bit-stream words that cannot be compressed using dictionary selection can be compressed by bitmask selection, which yields a smaller compression ratio.
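The arithmetic of the dictionary example can be reproduced with a small Python sketch. It assumes a 1-bit flag per word (0 = dictionary hit, 1 = uncompressed), selects the most frequent words as dictionary entries, and charges the dictionary's own size to the CR. The input words are hypothetical, chosen so that 4 of the 10 words hit a 2-entry dictionary, which gives 4 × 2 + 6 × 9 = 62 bits as in the example.

```python
from collections import Counter
from typing import List, Tuple


def dictionary_compress(words: List[int], dict_size: int = 2,
                        word_bits: int = 8) -> Tuple[List[int], List[str], float]:
    """Dictionary-only compression with a 1-bit flag per word (assumed format).

    flag '0' + index -> word found in the dictionary
    flag '1' + word  -> word stored uncompressed
    """
    # Select the most frequent words as dictionary entries.
    dictionary = [w for w, _ in Counter(words).most_common(dict_size)]
    index_bits = max(1, (dict_size - 1).bit_length())
    lookup = {w: i for i, w in enumerate(dictionary)}
    codes = []
    for w in words:
        if w in lookup:
            codes.append("0" + format(lookup[w], f"0{index_bits}b"))
        else:
            codes.append("1" + format(w, f"0{word_bits}b"))
    compressed_bits = sum(len(c) for c in codes)
    dictionary_bits = len(dictionary) * word_bits
    # The CR counts the dictionary as part of the compressed size.
    cr = (compressed_bits + dictionary_bits) / (len(words) * word_bits)
    return dictionary, codes, cr


if __name__ == "__main__":
    words = [0x00, 0x00, 0xFF, 0xFF, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06]
    dictionary, codes, cr = dictionary_compress(words)
    print("compressed:", sum(map(len, codes)),
          "dictionary:", 8 * len(dictionary), "CR:", round(cr, 4))
    # compressed: 62 dictionary: 16 CR: 0.975
```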
Figure 3.    Bit-stream compression using dictionary selection 
 
Figure 4.   RLE based compression 
C.  Run Length Encoding of Compressed Words 
 
The configuration bit-stream usually contains consecutive repeating bit sequences. Although bitmask-based compression [5] encodes such patterns as the same repeated compressed words, it is suggested in [2] and [4] that run-length encoding (RLE) of these sequences may yield a better compression result. Interestingly, no extra bits are needed to represent such an encoding. Note that the bitmask value 0 is never used, because this value would mean an exact match, which would have been encoded using zero bitmasks. Using this value as a special marker, these repetitions can be encoded without changing the code format of bitmask-based compression.
     Fig. 4 [9] illustrates bitmask-based RLE. The input contains the word “00000000” repeated five times. In normal bitmask-based compression, these words would be compressed as repeated compressed words, whereas our approach replaces such repetitions using a bitmask of “00”. In this example, the first occurrence is encoded as usual, whereas the remaining four repetitions are encoded using RLE. The number of repetitions is encoded in the bitmask offset and dictionary index bits combined. In this example, the bitmask offset is “10” and the dictionary index is “0”; therefore, the number of repetitions is “100” (i.e., 4).
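The RLE rule above can be sketched in Python: the first word of a run is emitted as usual, and the remaining repetitions are emitted as markers whose bitmask field is all zeros and whose repeat count is split across the offset bits (high-order part) and the dictionary index bits (low-order part). Field widths default to the example's 2 offset bits, 1 index bit, and 2 bitmask bits. The `Word` and `RleMarker` types, and the splitting of runs longer than the count field can hold, are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Word:
    """Placeholder for a word emitted in the normal bitmask-based format."""
    value: str


@dataclass
class RleMarker:
    """Repeat marker: all-zero bitmask field, count in offset + index bits."""
    bitmask: str
    offset: str
    index: str


def rle_encode(words: List[str], offset_bits: int = 2, index_bits: int = 1,
               mask_bits: int = 2) -> List[Union[Word, RleMarker]]:
    count_bits = offset_bits + index_bits
    max_count = (1 << count_bits) - 1  # a count of 0 is never emitted
    out: List[Union[Word, RleMarker]] = []
    i = 0
    while i < len(words):
        out.append(Word(words[i]))  # first occurrence is encoded as usual
        j = i + 1
        while j < len(words) and words[j] == words[i]:
            j += 1
        repeats = j - i - 1
        while repeats > 0:  # long runs are split over several markers
            n = min(repeats, max_count)
            field = format(n, f"0{count_bits}b")
            out.append(RleMarker("0" * mask_bits, field[:offset_bits],
                                 field[offset_bits:]))
            repeats -= n
        i = j
    return out


if __name__ == "__main__":
    for token in rle_encode(["00000000"] * 5):
        print(token)
    # Word(value='00000000')
    # RleMarker(bitmask='00', offset='10', index='0')
```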
D. Decompression Engine 
     The decompression engine is a hardware component that decodes the compressed configuration bit-stream and feeds the uncompressed bit-stream to the configuration unit of the FPGA. A decompression engine usually has two parts: the buffering circuitry, which buffers and aligns codes fetched from memory, and the decoders, which perform the decompression operation to generate the original symbols.
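To illustrate what the buffering circuitry has to do, the following Python model (a software analogy rather than a hardware design; the `BitReader` name is hypothetical) fetches whole bytes from memory into a buffer and hands out fields of arbitrary bit width, most significant bit first. Variable-length codes make this realignment necessary because code boundaries do not coincide with byte boundaries.

```python
class BitReader:
    """Software model of buffering circuitry: fetch bytes, serve n-bit fields."""

    def __init__(self, memory: bytes):
        self.memory = memory
        self.pos = 0     # index of the next byte to fetch from memory
        self.buffer = 0  # bits currently held, right-aligned
        self.nbits = 0   # number of valid bits in the buffer

    def read(self, n: int) -> int:
        """Return the next n bits (MSB first) as an integer."""
        if n < 0:
            raise ValueError("field width must be non-negative")
        while self.nbits < n:  # refill one byte at a time
            if self.pos >= len(self.memory):
                raise EOFError("compressed stream exhausted")
            self.buffer = (self.buffer << 8) | self.memory[self.pos]
            self.pos += 1
            self.nbits += 8
        self.nbits -= n
        value = (self.buffer >> self.nbits) & ((1 << n) - 1)
        self.buffer &= (1 << self.nbits) - 1  # drop the consumed bits
        return value


if __name__ == "__main__":
    reader = BitReader(bytes([0b10110011, 0b01011100]))
    print(reader.read(3), reader.read(7), reader.read(6))  # 5 77 28
```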
 
 
Figure 5.   Decompression Engine 
The decompression engine (DCE) design shown in Fig. 5 [9] can easily handle bitmasks and provides fast decompression. Its most important feature is the addition of an XOR gate to the decompression scheme used for dictionary-based compression. The engine generates a word-length bitmask (a mask as wide as the uncompressed symbol), which is then XORed with the dictionary entry. This word-length bitmask is created by placing the bitmask at the position specified in the encoding. The bitmask generation is done in parallel with the dictionary access.
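The XOR step of the decompression engine can be modeled in Python as below. The sketch assumes an 8-bit symbol, a 2-bit bitmask, and an offset counted from the most significant bit; the function name is hypothetical. In hardware, the mask generation and the dictionary lookup proceed in parallel, whereas this software model simply computes them one after the other.

```python
from typing import Sequence


def decode_bitmask_word(dictionary: Sequence[int], index: int, mask: int,
                        offset: int, word_bits: int = 8,
                        mask_bits: int = 2) -> int:
    """Rebuild a symbol as dictionary[index] XOR (word-length mask).

    The short mask is placed `offset` bits from the most significant bit.
    """
    if not 0 <= offset <= word_bits - mask_bits:
        raise ValueError("mask does not fit inside the word")
    if not 0 <= mask < (1 << mask_bits):
        raise ValueError("mask wider than mask_bits")
    # Mask generation (done in parallel with the dictionary read in hardware).
    full_mask = mask << (word_bits - mask_bits - offset)
    # Dictionary access followed by the single XOR stage.
    return dictionary[index] ^ full_mask


if __name__ == "__main__":
    dictionary = [0b00000000, 0b11110000]
    print(format(decode_bitmask_word(dictionary, 1, 0b11, 2), "08b"))  # 11000000
```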
V.  CONCLUSION 
The existing compression algorithms provide either good compression with slow decompression or fast decompression at the cost of compression efficiency. In this paper, we proposed a decoding-aware compression technique that aims to obtain both the best possible compression and fast decompression performance. The proposed compression technique analyzes the effect of its parameters on the compression ratio and chooses the optimal ones automatically. We also exploit run-length encoding of consecutive repetitive patterns, efficiently combined with bitmask-based compression, to further improve both the compression ratio and the decompression efficiency.
REFERENCES 
 
[1] J. H. Pan, T. Mitra, and W. F. Wong, “Configuration bit-stream compression for dynamically reconfigurable FPGAs,” in Proc. Int. Conf. Comput.-Aided Des., 2004, pp. 766–773.
[2] S. Hauck and W. D. Wilson, “Runlength compression techniques for FPGA configurations,” in Proc. IEEE Symp. Field-Program. Custom Comput. Mach., 1999, pp. 286–287.
[3] A. Dandalis and V. K. Prasanna, “Configuration compression for FPGA-based embedded systems,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 12, pp. 1394–1398, Dec. 2005.
[4] D. Koch, C. Beckhoff, and J. Teich, “Bit-stream decompression for high speed FPGA configuration from slow memories,” in Proc. Int. Conf. Field-Program. Technol., 2007, pp. 161–168.
[5] S. Seong and P. Mishra, “Bitmask-based code compression for embedded systems,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 27, no. 4, pp. 673–685, Apr. 2008.
[6] S. Hauck, Z. Li, and E. Schwabe, “Configuration compression for the Xilinx XC6200 FPGA,” IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 18, no. 8, pp. 1107–1113, Aug. 1999.
[7] D. A. Huffman, “A method for the construction of minimum-redundancy codes,” Proc. IRE, vol. 40, no. 9, pp. 1098–1101, 1952.
[8] A. Moffat, R. Neal, and I. H. Witten, “Arithmetic coding revisited,” in Proc. Data Compression Conf., 1995, pp. 202–211.
[9] X. Qin, C. Murthy, and P. Mishra, “Decoding-aware compression of FPGA bit-streams,” in Proc. Data Compression Conf., 2011, pp. 411–419.
 
 