A project report submitted in partial fulfilment of the 
requirements for the award of the degree of 

Faculty of Electrical Engineering 

Dedicated to my beloved father, mother, brother, and all my friends for their 

 First and foremost, I would like to express my earnest gratitude to my project 
supervisor, Professor Dr. Mohamed Khalil bin Mohd Hani, for his benevolent 
guidance and continuous motivation throughout this venture.  Without 
his supervision and constant support, this project would not have been accomplished 
successfully. 
 I would like to thank my parents, siblings and friends, who have been truly 
supportive throughout my master's studies.  Their encouragement meant a lot to me 
in overcoming all forms of hardship throughout the period of my study. 
 In addition, I would like to take this opportunity to thank Intel Malaysia for 
supporting me financially in pursuing this master's degree.  My sincere appreciation also 
goes to the faculty members for providing all the necessary resources to enable the 
work in this project.  Lastly, I would like to thank Lee Yee Hui for giving me 
guidance at the beginning of the project to help me in preparing the necessary 

Two-dimensional (2D) convolution is a prevalent mathematical operation used in 
different areas of digital signal processing such as image processing, video 
processing and analog signal transmission.  The computation-intensive nature of the 
2D convolution operation, along with the stringent demands of real-time image 
processing in terms of response time and throughput rate, rules out the use of 
general-purpose processors in image processing solutions.  Thus, this project 
proposes the design of a fully dedicated 2D convolution hardware based on a systolic 
array architecture with an integrated pipeline design, in order to achieve optimum 
hardware performance in terms of processing time and throughput rate.  To achieve 
this objective, the entire hardware design is described in SystemVerilog, and the 
cut-set systolization procedure is applied to map the 2D convolution algorithm to a 
3 x 3 systolic array hardware design.  After design and integration, the accelerated 
2D convolution hardware design undergoes performance benchmarking.  Based on the 
benchmark report, the implemented 2D convolution hardware is capable of achieving 
a throughput rate of 168M outputs per second.  In addition, it takes 1.54 ms to 
complete the execution of a 2D convolution on a 512 x 512 grayscale image.  In 
comparison with a general-purpose processor, the implemented design outperforms 
the general-purpose processor in terms of execution speed by 43%.  This performance 
breakthrough marks an important milestone for the pipelined 2D convolution 
hardware design based on systolic array 

Two-dimensional convolution is a common mathematical operation that is widely 
used in digital signal processing fields such as image processing, digital video 
processing, and analog and digital signal transmission.  Because the 2D convolution 
operation is highly computation-intensive and real-time image processing imposes 
strict response-time demands, the general-purpose processors commonly used in 
computers cannot meet the requirements and specifications of real-time image 
processing.  General-purpose processors are therefore no longer favoured for use in 
image processing.  To solve this problem, this project produces a design based on a 
systolic array architecture together with a pipeline architecture to achieve optimum 
performance in terms of execution time and throughput rate.  The design is 
described entirely in SystemVerilog.  The cut-set systolization procedure is also 
used to convert the 2D convolution algorithm into a design based on a 3 x 3 systolic 
array.  Once the design was complete, it underwent a performance test.  Based on the 
performance report, the design achieves a throughput rate of 168M outputs per 
second.  In addition, the design takes 1.54 ms to perform one 2D convolution 
operation on a 512 x 512 image.  Compared with a general-purpose processor, the 
design runs 43 percent faster.  This impressive performance demonstrates that the 
design will play a very important role in real-time image processing in the future. 
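
For readers who want a software point of reference for the operation summarised above, the following is a minimal Python sketch of direct-form 2D convolution with a 3 x 3 kernel, together with a rough estimate of execution time from the quoted throughput.  It is not the SystemVerilog design described in this report.  The function names conv2d_3x3 and estimated_time_ms are hypothetical, and the handling of image borders (only outputs whose 3 x 3 window lies fully inside the image are produced) and the one-result-per-output timing model are assumptions made for illustration only.

```python
def conv2d_3x3(image, kernel):
    """Direct-form 2D convolution of a 2D list with a 3 x 3 kernel.

    Assumption: only the "valid" region is computed, so an R x C image
    yields an (R - 2) x (C - 2) result.
    """
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(rows - 2):
        out_row = []
        for c in range(cols - 2):
            acc = 0
            for i in range(3):
                for j in range(3):
                    # True convolution flips the kernel in both axes.
                    acc += kernel[2 - i][2 - j] * image[r + i][c + j]
            out_row.append(acc)
        out.append(out_row)
    return out


def estimated_time_ms(num_outputs, outputs_per_second=168e6):
    """Rough time estimate: number of outputs divided by throughput.

    Assumption: the hardware sustains the quoted 168M outputs per second
    for the whole image, with pipeline fill latency ignored.
    """
    return num_outputs / outputs_per_second * 1e3


if __name__ == "__main__":
    image = [[1, 2, 3, 4],
             [5, 6, 7, 8],
             [9, 10, 11, 12],
             [13, 14, 15, 16]]
    identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
    # With an identity kernel the interior pixels pass through unchanged.
    print(conv2d_3x3(image, identity))
    # Estimates for a 512 x 512 image, with and without border outputs.
    print(round(estimated_time_ms(512 * 512), 2))
    print(round(estimated_time_ms(510 * 510), 2))
```

Under these assumptions both estimates come out within about 0.02 ms of the 1.54 ms quoted above, which suggests the throughput and execution-time figures are mutually consistent; the exact border handling and clocking of the implemented design are not stated in this section.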