A Fast Fractal Image Compression Algorithm Combined with Graphic Processor Unit by Guo, Hui & He, Jie
TELKOMNIKA, Vol.13, No.3, September 2015, pp. 1089~1096 
ISSN: 1693-6930, accredited A by DIKTI, Decree No: 58/DIKTI/Kep/2013 
DOI: 10.12928/TELKOMNIKA.v13i3.1776
  
Received March 23, 2015; Revised July 8, 2015; Accepted July 25, 2015 
A Fast Fractal Image Compression Algorithm 
Combined with Graphic Processor Unit 
 
 
Hui Guo*, Jie He  
School of Information and Electronic Engineering, Wuzhou University, Wuzhou 543002, Guangxi, China
*Corresponding author, e-mail: guohui928@qq.com 
 
 
Abstract 
Fractal image compression encoding is computationally intensive. To address this, a
serial-to-parallel conversion mechanism is built for the encoding procedure. Exploiting the
single-instruction, multiple-thread execution model of the compute unified device architecture
(CUDA), a parallel computational model of fractal encoding is built on the graphics processing
unit (GPU) to parallelize the search for the best-matching block, which is the most
time-consuming part of serial execution. Experimental results indicate that the proposed algorithm
shortens the encoding time to the millisecond scale and significantly improves the efficiency of
fractal image encoding while keeping the decoded image in good quality.
 
Keywords: fractal image compression, graphic processor unit, compute unified device architecture, 
parallel computing 
 
Copyright © 2015 Universitas Ahmad Dahlan. All rights reserved. 
 
 
1. Introduction 
Fractal image encoding is a spatial-domain compression method with a high compression
ratio and high decoding quality. However, its long encoding time has limited its adoption. To
reduce the encoding time, many researchers have proposed feature classification or clustering
methods that shorten the search for matching blocks [1-4]. Jiang Zheng et al. [5] proposed a
fractal encoding method based on K-means clustering optimization. Because the K-means
clustering fractal encoding algorithm depends on the data distribution, Wu Yiquan and Sun Ziyi
[6] proposed a method based on immune particle swarm optimization (PSO) and kernel fuzzy
clustering, which is about 6 times faster than the basic fractal encoding algorithm. Hui Guo et
al. [7] modified the quadtree fractal encoding method in combination with the human visual
system so that the decoding distortion stays below what the human eye can perceive, but the
speed-up is no more than 27 times.
The work cited above shortens the encoding time by optimizing the encoding algorithm
itself. Fractal encoding, however, is inherently serial: the matching procedure performs a global
or classified local search of the D-block pool for each R block in turn, which amounts to
repeating the same procedure serially, so encoding is time-consuming. Parallelizing this
procedure is therefore a feasible optimization. Given the wide availability of parallel hardware
today, the encoding speed can improve significantly if fractal encoding is mapped onto popular,
low-cost parallel hardware through a suitable implementation mechanism. The graphics
processing unit (GPU) has a large number of parallel hardware arithmetic units suited to parallel
computation over many data objects. The compute unified device architecture (CUDA) is a
software architecture and programming model for handling and managing GPU computation.
With its single-instruction, multiple-data execution model, it uses the CPU to process the
sequential portion of an application and, at the same time, executes the compute-intensive
portion in parallel on the GPU through an API, with the thread as the basic unit [8].
This paper presents a fast fractal image compression algorithm that runs on the GPU
and uses CUDA for parallel encoding. The parallel encoding method comprises three
components: spatial contraction of the domain blocks by the 4-neighborhood average method,
preprocessing of the range and domain blocks, and computation of the minimum mean square
error. In the spatial contraction stage, each GPU thread performs the average sampling of one
domain block. In the preprocessing stage, each GPU thread computes the sum of pixels and the
sum of squared pixels for a range block or a domain block. In the computation of the minimum
mean square error, each range block is assigned its own thread, which performs the affine
transformation and finds the minimum mean square error. Experimental results indicate that the
proposed algorithm is roughly 115 to 123 times faster than the traditional fractal compression
method while keeping the decoded image in good quality.
 
 
2. Traditional Fractal Encoding Method 
Mandelbrot first suggested that a fractal image could be regarded as an iterated function
system [9]. He observed that many objects in nature contain similar parts, and pointed out that a
fractal cloud could be described by a simple mathematical function. In 1988, Barnsley and Sloan
proposed the fractal image compression method, which exploits the local self-similarity of an
image [10]. The practical block-based fractal encoding put forward by Jacquin, Barnsley's
doctoral student, was developed on this basis [11]. The fractal encoding method first partitions
an image into non-overlapping R×R blocks and possibly overlapping D×D blocks, called range
blocks (R blocks) and domain blocks (D blocks), respectively. Domain blocks must be larger
than range blocks. Each domain block is then average-sampled so that its size matches that of
the range blocks. All the sampled domain blocks are stored in the domain pool $S_D$.
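
The partitioning step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the helper names `range_blocks` and `domain_blocks` are hypothetical, the image is a plain list of rows, and domain blocks are assumed to slide one pixel at a time.

```python
def range_blocks(image, r):
    """Yield (top, left, block) for the non-overlapping r x r range blocks."""
    n = len(image)
    for top in range(0, n, r):
        for left in range(0, n, r):
            yield top, left, [row[left:left + r] for row in image[top:top + r]]


def domain_blocks(image, d, step=1):
    """Yield (top, left, block) for d x d domain blocks taken every `step` pixels.

    step=1 (maximally overlapping blocks) is an assumption of this sketch.
    """
    n = len(image)
    for top in range(0, n - d + 1, step):
        for left in range(0, n - d + 1, step):
            yield top, left, [row[left:left + d] for row in image[top:top + d]]


# Toy 8x8 image with R = 2 and D = 2R = 4.
image = [[(x + y) % 256 for x in range(8)] for y in range(8)]
ranges = list(range_blocks(image, 2))
domains = list(domain_blocks(image, 4))
print(len(ranges), len(domains))  # 16 25
```

With N = 8, R = 2 and D = 4, the sketch yields (N/R)² = 16 range blocks and (N−D+1)² = 25 domain blocks.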
An N×N image is partitioned into domain blocks $D_i$ and range blocks $R_j$, where
$i = 0, 1, 2, \ldots, (N-2R+1)^2$ and $j = 0, 1, 2, \ldots, (N/R)^2$. For each range block, the
best-matching domain block is searched for in the domain pool $S_D$ under the minimum square
error criterion. The affine transformation is given by Formula 1.

$$
\begin{bmatrix} x \\ y \\ R_{xy} \end{bmatrix} =
\begin{bmatrix} a & b & 0 \\ c & d & 0 \\ 0 & 0 & s_i \end{bmatrix}
\begin{bmatrix} x \\ y \\ D_{xy} \end{bmatrix} +
\begin{bmatrix} 0 \\ 0 \\ o_i \end{bmatrix}
\tag{1}
$$
 
Here $x = 0, 1, 2, \ldots, R-1$ and $y = 0, 1, 2, \ldots, R-1$; $R_{xy}$ and $D_{xy}$ are the
pixel values within range block $R_j$ and domain block $D_i$. Parameters a, b, c and d select one
of the 8 isometric transformations of the block, 4 rotations and 4 reflections, as shown in
Figure 1. $s_i$ and $o_i$ are the contrast and brightness adjustment coefficients, respectively,
used when domain block $D_i$ is matched with range block $R_j$. They are computed by Formula 2
and Formula 3, respectively.
 
 
Figure 1. 8 Isometric Transformations 
 
      ]'[)'( ]'[' 22
2
  
   

 R
x
R
y xy
R
x
R
y xy
R
x
R
y xy
R
x
R
y xy
R
x
R
y xyxy
i
DDR
RDRDR
S
    
(2) 
 
$$
o_i = \frac{\sum_{x=0}^{R-1}\sum_{y=0}^{R-1} R_{xy} - s_i \sum_{x=0}^{R-1}\sum_{y=0}^{R-1} D'_{xy}}{R^2}
\tag{3}
$$
 
$D'_{xy}$ is the pixel value of the domain block after one of the 8 isometric
transformations. The root-mean-square error $RMS_{i-j}$ obtained when domain block $D_i$ is
matched with range block $R_j$ is given by Formula 4, and the minimum root-mean-square error
$RMS_{\min}$ by Formula 5.
 
  2/1221 


    R
x
R
y
xyjxyjji RoDsR
RMS        (4) 
 
$$
RMS_{\min} = \min\left(RMS_{i-j}\right), \quad i = 0, 1, 2, \ldots, (N-2R+1)^2, \quad j = 0, 1, 2, \ldots, (N/R)^2
\tag{5}
$$
 
Let I be the original image, R the size of a range block and D the size of a domain block. The
algorithm proceeds as follows:
Step 1: Partition the original image I into non-overlapping range blocks $R_j$ of size
R×R.
Step 2: Partition the original image I into possibly overlapping domain blocks $D_i$ of
size D×D.
Step 3: Average-sample the domain blocks so that their size matches that of the range
blocks.
Step 4: For each range block $R_j$, search the domain pool $S_D$ for the domain block
$D_i$ whose affine transformation gives the minimum mean square error with respect to $R_j$;
this $D_i$ is the best-matching block for $R_j$.
Step 5: For each range block $R_j$, record the fractal code (IFS code)
$w_i(i, n, s_i, o_i)$, which consists of:
(1) The index i of the best-matching block $D_i$;
(2) The index n (from 0 to 7) of the isometric transformation applied to $D_i$ to match $R_j$;
(3) The contrast adjustment coefficient $s_i$ and the brightness adjustment coefficient $o_i$.
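
Steps 1 to 5 can be put together as a small serial full-search encoder. The Python sketch below is illustrative only: all function names are hypothetical, D = 2R and a one-pixel domain step are assumed, the ordering of the 8 isometries (and therefore the meaning of n) is an assumption, and quantization of the code is omitted.

```python
import math


def shrink(block):
    """Average each 2 x 2 group so a D x D block becomes R x R (D = 2R)."""
    k = len(block) // 2
    return [[(block[2 * y][2 * x] + block[2 * y][2 * x + 1]
              + block[2 * y + 1][2 * x] + block[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(k)] for y in range(k)]


def isometries(block):
    """Return 8 variants: 4 rotations, each followed by its mirror image."""
    variants, cur = [], [list(row) for row in block]
    for _ in range(4):
        variants.append(cur)
        variants.append([row[::-1] for row in cur])    # mirror left-right
        cur = [list(col) for col in zip(*cur[::-1])]   # rotate 90 degrees clockwise
    return variants


def fit(r_block, d_block):
    """Least-squares contrast s, brightness o and RMS error (Formulas 2 to 4)."""
    r = [p for row in r_block for p in row]
    d = [p for row in d_block for p in row]
    n = len(r)
    sum_r, sum_d = sum(r), sum(d)
    denom = n * sum(q * q for q in d) - sum_d ** 2
    cross = sum(p * q for p, q in zip(r, d))
    s = 0.0 if denom == 0 else (n * cross - sum_r * sum_d) / denom  # assumed fallback
    o = (sum_r - s * sum_d) / n
    rms = math.sqrt(sum((s * q + o - p) ** 2 for p, q in zip(r, d)) / n)
    return s, o, rms


def encode(image, r_size):
    """Full-search serial encoder; returns one (i, n, s, o) tuple per range block."""
    size, d_size = len(image), 2 * r_size
    pool = [isometries(shrink([row[left:left + d_size]
                               for row in image[top:top + d_size]]))
            for top in range(size - d_size + 1)
            for left in range(size - d_size + 1)]
    codes = []
    for top in range(0, size, r_size):
        for left in range(0, size, r_size):
            r_block = [row[left:left + r_size] for row in image[top:top + r_size]]
            best = None
            for i, variants in enumerate(pool):
                for n, d_block in enumerate(variants):
                    s, o, rms = fit(r_block, d_block)
                    if best is None or rms < best[0]:
                        best = (rms, i, n, s, o)
            codes.append(best[1:])
    return codes


image = [[(3 * x + 5 * y) % 17 for x in range(8)] for y in range(8)]
codes = encode(image, 2)
print(len(codes))  # 16, one code per range block
```

The triple loop over range blocks, domain blocks and isometries is the part whose cost grows fastest with image size, which is why it is the natural target for parallelization.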
 
 
 
3. GPU Combined Parallel Fractal Encoding Algorithm 
CUDA is a hardware and software architecture for handling and managing GPU
computation. An application handles its sequential part on the CPU and executes its
compute-intensive part in parallel on the GPU through the CUDA API, thereby making fuller use
of the large-scale concurrent computing power of the graphics card. At run time, the concurrent
part of a CUDA program is performed by kernel functions. The basic execution unit of a kernel
function on the GPU is the thread. CUDA can launch a large number of concurrent threads that
operate on different addresses; these threads execute the kernel function and process the data
in parallel. Figure 2 shows the basic execution principle of CUDA. A CUDA program consists of
two parts: host code and device code. The host code is the serial part that runs on the CPU,
whereas the device code is the parallel part that runs on the GPU. The host code is mainly in
charge of scheduling and of logic-heavy serial operations such as GPU initialization and data
exchange, while the device code is mainly responsible for the highly parallel data processing in
the program.
 
 
 
 
Figure 2. Execution Principle of CUDA 
 
 
The traditional fractal image compression method is time-consuming because of the
large amount of computation needed to search for the matching block of each range block. To
speed up the search, this paper proposes a GPU-based fast fractal image compression algorithm
that uses CUDA for parallel encoding. The parallel execution mechanism is shown in Figure 3, in
which T1, T2, ... are GPU threads, $D_{i\_sum}$ and $D_{i\_powersum}$ are the sum of pixels and
the sum of squared pixels of domain block $D_i$, and $R_{j\_sum}$ and $R_{j\_powersum}$ are the
sum of pixels and the sum of squared pixels of range block $R_j$. The parallel fractal image
compression method consists of three steps: average sampling of the domain blocks,
preprocessing of the range blocks and domain blocks, and computation of the minimum mean
square error.
Average-Sample in Figure 3 denotes average sampling. Before the search for matching
blocks, a 4-neighborhood pixel average is applied to the D blocks in the D-block pool. This
computation is distributed evenly over all threads, and the processed data are saved in texture
memory. The 4-neighborhood average process is shown in Figure 4.
 
 
Figure 3. Parallel Processing on GPU 
 
 
 
Figure 4. 4-Neighborhood Average Method 
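
A minimal Python sketch of the 4-neighborhood average is given below. It assumes that the method averages each non-overlapping 2×2 group of pixels, which halves the block side; the figure is the authoritative description, and the function name `average_sample` is hypothetical. On the GPU, one thread would run this routine for one domain block.

```python
def average_sample(block):
    """Shrink a 2k x 2k block to k x k by averaging each 2 x 2 group of pixels."""
    k = len(block) // 2
    return [
        [
            (block[2 * y][2 * x] + block[2 * y][2 * x + 1]
             + block[2 * y + 1][2 * x] + block[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(k)
        ]
        for y in range(k)
    ]


d_block = [
    [10, 20, 30, 40],
    [30, 40, 50, 60],
    [1, 3, 5, 7],
    [5, 7, 9, 11],
]
print(average_sample(d_block))  # [[25.0, 45.0], [4.0, 8.0]]
```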
 
 
Precompute in Figure 3 denotes preprocessing. After the D blocks have been
sub-sampled, GPU threads precompute the $R_{j\_sum}$ and $R_{j\_powersum}$ values of each
range block, as well as the $D_{i\_sum}$ and $D_{i\_powersum}$ values of each domain block,
using the following formulas:
 
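$$
R_{j\_sum} = \sum_{x=0}^{R-1}\sum_{y=0}^{R-1} R_{xy}
\tag{6}
$$

$$
R_{j\_powersum} = \sum_{x=0}^{R-1}\sum_{y=0}^{R-1} R_{xy}^2
\tag{7}
$$

$$
D_{i\_sum} = \sum_{x=0}^{R-1}\sum_{y=0}^{R-1} D_{xy}
\tag{8}
$$

$$
D_{i\_powersum} = \sum_{x=0}^{R-1}\sum_{y=0}^{R-1} D_{xy}^2
\tag{9}
$$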
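
The paper does not spell out how the precomputed sums are consumed, but a plausible use is sketched below in Python: once the four sums of Formulas 6 to 9 are known, the only quantity that still depends on the specific block pair is the cross term, and the squared error follows from expanding the square in Formula 4. The function names and the zero-denominator fallback are assumptions of this sketch.

```python
def block_sums(block):
    """Return (sum of pixels, sum of squared pixels), as in Formulas 6 to 9."""
    pixels = [p for row in block for p in row]
    return sum(pixels), sum(p * p for p in pixels)


def match_with_sums(r_block, d_block, r_sums, d_sums):
    """Return (s, o, squared error) using precomputed sums.

    Only the cross term sum(R_xy * D_xy) is computed per block pair.
    """
    n = len(r_block) * len(r_block[0])
    r_sum, r_pow = r_sums
    d_sum, d_pow = d_sums
    cross = sum(a * b for ra, rb in zip(r_block, d_block) for a, b in zip(ra, rb))
    denom = n * d_pow - d_sum ** 2
    s = 0.0 if denom == 0 else (n * cross - r_sum * d_sum) / denom  # assumed fallback
    o = (r_sum - s * d_sum) / n
    # Expansion of sum((s*D + o - R)^2) in terms of the precomputed sums.
    err = (s * s * d_pow + n * o * o + r_pow
           + 2 * s * o * d_sum - 2 * s * cross - 2 * o * r_sum)
    return s, o, err


r_block = [[12, 14], [16, 18]]
d_block = [[1, 2], [3, 4]]
print(match_with_sums(r_block, d_block, block_sums(r_block), block_sums(d_block)))
```

The RMS error of Formula 4 is then the square root of `err / n`; in floating point the expanded form can come out very slightly negative, so a real implementation would clamp it at zero first.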
 
The last stage is the computation of the minimum root-mean-square error. The R blocks
are distributed evenly over the threads; each thread searches the D-block pool for the match of
its R block, computes the affine transformation and generates the RMS values. $RMS_{\min}$ is
then found by comparing the generated RMS values, and the D block corresponding to
$RMS_{\min}$ is the matching block. Its IFS code $w_i(i, n, s_i, o_i)$ is recorded and quantized
to obtain the fractal code of each range block. The matching process is shown in Figure 5,
where T0, T1, T2, ... are GPU threads.
The steps of fractal image encoding with CUDA-based parallel encoding are as follows:
Input: An N×N grayscale image with 8-bit pixel quantization, where N is generally a
power of 2.
Output: The fractal code $w_i(i, n, s_i, o_i)$.
Step 1: Read the image data on the CPU side. Partition the original image I into
non-overlapping range blocks $R_j$ of size R×R and into possibly overlapping domain blocks
$D_i$ of size D×D, and transfer these data to device memory.
Step 2: Perform average sampling to build the codebook. Assign one thread to each
domain block to carry out its sub-sampling.
Step 3: Preprocess the domain blocks and range blocks, that is, compute $R_{j\_sum}$,
$R_{j\_powersum}$, $D_{i\_sum}$ and $D_{i\_powersum}$, assigning the preprocessing of each
domain block and range block to a thread.
Step 4: Compute the root-mean-square error RMS. In the kernel function, distribute the
range blocks evenly over the threads; each thread matches its range block against all the
domain blocks in the domain pool, performing the affine transformation and the RMS
computation. Finally, find $RMS_{\min}$ and record the corresponding fractal code
$w_i(i, n, s_i, o_i)$.
Step 5: Transfer the fractal code from the device to the CPU.
Step 6: Output the fractal code $w_i(i, n, s_i, o_i)$.
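
The structure of Step 4, one independent search task per range block, can be mimicked on the CPU with a thread pool. The Python sketch below is only a structural analogue, not CUDA code: Python threads do not give a GPU-like speed-up, blocks are flat pixel lists, the isometries are left out for brevity, and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def best_match(r_block, pool):
    """Search the whole pool for one range block.

    Returns (index of best domain block, s, o, squared error).
    """
    n = len(r_block)
    sum_r = sum(r_block)
    best = None
    for i, d_block in enumerate(pool):
        sum_d = sum(d_block)
        denom = n * sum(q * q for q in d_block) - sum_d ** 2
        cross = sum(p * q for p, q in zip(r_block, d_block))
        s = 0.0 if denom == 0 else (n * cross - sum_r * sum_d) / denom  # assumed fallback
        o = (sum_r - s * sum_d) / n
        err = sum((s * q + o - p) ** 2 for p, q in zip(r_block, d_block))
        if best is None or err < best[3]:
            best = (i, s, o, err)
    return best


def encode_parallel(range_blocks, pool, workers=4):
    """Submit one independent search task per range block, keeping input order."""
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(lambda r: best_match(r, pool), range_blocks))


pool = [[1, 2, 3, 4], [1, 0, 0, 1], [0, 0, 5, 5]]
range_blocks = [[12, 14, 16, 18], [5, 2, 2, 5], [1, 1, 11, 11]]
codes = encode_parallel(range_blocks, pool)
print([c[0] for c in codes])  # [0, 1, 2]
```

Each task reads the shared domain pool but writes only its own result, so no synchronization between tasks is needed; the same property is what makes the one-thread-per-range-block mapping suitable for the GPU.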
 
 
 
 
Figure 5. Matching Process between Range Blocks and Domain Blocks 
 
 
4. Experimental Result Analysis 
To verify the algorithm, four standard 256×256 8-bit test images, Lena, Pepper, Cat
and Cell, are used. These images are representative in terms of the balance and variation of
texture and edge detail, and are well suited to testing image processing algorithms. For all test
images, the range block size is set to 4×4 and the domain block size to 8×8. The development
environment is Microsoft Visual Studio 2010 with CUDA 6.0 and OpenCV 2.3; the system runs
64-bit Windows 7 with 4 GB of memory, an NVIDIA GeForce GT 630M graphics card and a
Core i3 CPU. The traditional fractal encoding algorithm and the GPU-combined parallel fractal
encoding algorithm are each used to encode the four images. For decoding, a blank matrix is
created as the initial image, and the original image is approximated after 9 iterations. The
experimental results are shown below.
 
 
 
(a) Decoded Images by Traditional Algorithm 
 
(b) Decoded Images by Parallel Algorithm 
 
Figure 6. Effects of Decoded Images by Two Algorithms 
 
 
As Figure 6 shows, the images decoded after encoding with the traditional fractal
algorithm and with the GPU-combined parallel algorithm are visually indistinguishable, which
demonstrates the feasibility of the proposed algorithm. Table 1 lists the encoding time T (in s),
the PSNR (in dB) of the decoded images, and the speed-up ratio of the parallel algorithm over
the traditional one.
 
 
Table 1. Experimental Data by Using Both Algorithms
Image     Traditional Algorithm      Parallel Algorithm       Speed-up Ratio
          T (s)      PSNR (dB)       T (s)      PSNR (dB)
Lena      29.13      31.46           0.243      31.46         120
Pepper    29.11      32.00           0.237      32.00         123
Cat       29.00      36.98           0.252      36.96         115
Cell      29.06      34.92           0.245      34.94         119
 
The data in Table 1 show that the encoding time of the GPU-combined parallel fractal
encoding method is significantly shorter than that of the traditional fractal encoding algorithm:
the speed-up ratio reaches up to 123 times, while the PSNR of the decoded images differs by at
most 0.02 dB. It is therefore feasible to use the GPU with CUDA to parallelize the encoding
process of fractal image compression, which is of great significance for the wider adoption of
fractal image compression.
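
The paper reports PSNR without stating the formula. Assuming the standard definition for 8-bit images, PSNR = 10·log10(255² / MSE), the measure can be sketched in Python as follows; the function name `psnr` and the toy images are hypothetical.

```python
import math


def psnr(original, decoded, peak=255.0):
    """PSNR in dB between two equally sized grayscale images (lists of rows)."""
    diffs = [(a - b) ** 2
             for row_a, row_b in zip(original, decoded)
             for a, b in zip(row_a, row_b)]
    mse = sum(diffs) / len(diffs)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak * peak / mse)


original = [[100, 100], [100, 100]]
decoded = [[105, 95], [100, 100]]
print(round(psnr(original, decoded), 2))
```

Here the mean square error is 12.5, so the result is 10·log10(65025 / 12.5), a little over 37 dB.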
 
 
5. Conclusion 
This paper has proposed a fast fractal image compression algorithm that uses CUDA on
the GPU. The parallel method consists of three steps: average sampling of the domain blocks,
preprocessing of the range blocks and domain blocks, and computation of the minimum mean
square error. Experiments show that the proposed algorithm is up to 123 times faster than the
traditional fractal image compression method while keeping the decoded images in good quality.
Future work will use the GPU to further optimize the CPU code so as to make fractal image
compression real-time, and will apply these methods to other fields such as dynamic image
encoding.
 
 
Acknowledgements  
This work was supported by Guangxi Natural Science Foundation Program (Grant No. 
2013GXNSFBA019275, Grant No. 2013GXNSFBA019276) and Guangxi University of Science 
and Technology Research Program (Grant No.2013YB227, Grant No.2013YB228). 
 
 
References 
[1] Bo Wang, Yubin Gao. An Image Compression Scheme Based on Fuzzy Neural Network.
TELKOMNIKA Telecommunication Computing Electronics and Control. 2015; 13(1): 137-145.
[2] Mohsen Nasri, Abdelhamid Helali, Halim Sghaier, Hassen Maaref. Efficient JPEG2000 Image 
Compression Scheme for Multihop Wireless Networks. TELKOMNIKA Telecommunication Computing 
Electronics and Control. 2011; 9(2): 311-318. 
[3] Cui Xin-Xia, Luo Chen-Xu. Fractal and Chaos Characteristics in Rock Milled Process. TELKOMNIKA 
Indonesian Journal of Electrical Engineering. 2014; 12(1): 530-538. 
[4] Yi Li, Hongchan Zheng, Guohua Peng, Min Zhou. Normal Vector Based Subdivision Scheme to 
Generate Fractal Curves. TELKOMNIKA Indonesian Journal of Electrical Engineering. 2013; 11(8): 
4273-4281. 
[5] Jiang Zheng, Jiang Mingyan. A Fast Fractal Image Compression Algorithm Based on K-mean
Clustering Optimization. Journal of Electrical & Electronic Education. 2006; 36(03): 22-25.
[6] Wu Yiquan, Sun Ziyi. Fast Fractal Image Coding Based Immunity Particle Swarm Optimization and
Fuzzy Kernel Clustering. Journal of Beijing University of Posts and Telecommunications. 2011;
34(01): 69-74.
[7] Hui Guo, Yunping Zheng, Jie He. A New HVS-Based Fractal Image Compression Algorithm. Lecture
Notes in Electrical Engineering. 2012; 138(2): 753-759.
[8] Ma Wei-wei, Sun Dong, Wu Xian-liang. Research on high-order SFDTD parallel computing based on
GPU. Journal of Hefei University of Technology (Natural Science). 2012; 35(7): 926-929.
[9] BB Mandelbrot. The Fractal Geometry of Nature. Second edition. WH Freeman, New York. 1982.
[10] M Barnsley, A Sloan. A better way to compress images. BYTE. 1988; (1): 215-223.
[11] AE Jacquin. Image coding based on a fractal theory of iterated contractive image transformations.
IEEE Trans. Image Processing. 1992; 1(1): 18-30.
 
