Realnočasna obdelava slikovnega podatkovnega toka by HUDOMALJ, UROŠ
 
 
University of Ljubljana 
Faculty of Electrical Engineering 
Uroš Hudomalj 
Real-Time Processing of Image Streams 
Master’s thesis 
2nd cycle postgraduate study programme in Electrical Engineering 
Supervisors: 
Dr.-Ing. Markus Plattner, Technical University of Munich 
M. Sc. Christopher Mandla, Max Planck Institute for extraterrestrial 
Physics 





Throughout the making of this thesis I was lucky to have received a great deal 
of support and guidance. First of all, I would like to thank my supervisors at the 
Technical University of Munich and the Max Planck Institute for extraterrestrial 
Physics, Dr. Markus Plattner and M. Sc. Christopher Mandla. They were always 
available to discuss any issues I might have stumbled upon and have helped to steer 
me in the right direction. I am thankful for all their comments and advice about the 
thesis. I would also like to thank my supervisor at the Faculty of Electrical 
Engineering, University of Ljubljana, Prof. Dr. Tadej Tuma, for his guidance and wise 
counsel. 
Moreover, I would like to thank the Likar Foundation, whose scholarship 
gave me peace of mind while I worked on the thesis during my study exchange in 
Munich. The study exchange was part of the Erasmus+ Programme of the European 
Union. 
In addition, I would like to express my sincere gratitude to my parents and my 
brother Marko for their everyday support and professional counsel throughout my 
years of study. My profound gratitude also goes to my girlfriend. Teodora, thank you 
for your unwavering support and continuous encouragement through the making of 
this thesis.  
Finally, a sincere thank you also to all the others who have helped me in 
countless ways with my thesis, the study exchange and life in Munich. I would 
especially like to thank my friend, Hayden Wen, for proofreading the thesis. This 





1 Introduction ......................................................................................................... 25 
2 System Architecture ............................................................................................ 29 
2.1 System Overview .......................................................................................... 29 
2.2 Camera .......................................................................................................... 29 
2.3 Development Board ...................................................................................... 30 
2.3.1 Field Programmable Gate Array .............................................................. 31 
2.3.2 Double Data Rate Memory ...................................................................... 33 
3 Camera Interfaces ............................................................................................... 35 
3.1 General Purpose I/F Used By Camera I/F .................................................... 35 
3.1.1 Low Voltage Differential Signaling ........................................................ 35 
3.1.2 Transition Minimized Differential Signaling .......................................... 36 
3.1.3 Channel Link............................................................................................ 36 
3.1.4 Universal Serial Bus ................................................................................ 37 
3.1.5 Inter-Integrated Circuit ............................................................................ 38 
3.1.6 D-, C- and M-PHY................................................................................... 38 
3.1.7 UniPro ...................................................................................................... 38 
3.2 HDMI ............................................................................................................ 39 
3.3 USB3 Vision ................................................................................................. 39 
3.4 Camera Serial Interface................................................................................. 39 
3.5 GigE Vision .................................................................................................. 40 
3.6 CoaXPress ..................................................................................................... 40 
3.7 Camera Link ................................................................................................. 41 
3.8 Camera Link HS ........................................................................................... 43 
3.9 Summary of Camera Interfaces .................................................................... 43 
4 Image Processing Algorithms ............................................................................. 45 
4.1 Image Filtering .............................................................................................. 45 
4.2 Edge Detection .............................................................................................. 51 
4.3 Background Subtraction ................................................................................ 54 
4.4 Flat Field Correction ..................................................................................... 54 
4.5 Image Averaging ........................................................................................... 55 
5 System Design ...................................................................................................... 57 
5.1 Camera Link Interface .................................................................................. 58 
5.1.1 Channel Link Transmitter ........................................................................ 58 
5.1.2 Channel Link Receiver ............................................................................ 60 
5.1.3 Camera Link Transmitter ......................................................................... 63 
5.1.4 Camera Link Receiver ............................................................................. 64 
5.2 Image Processing Algorithms ....................................................................... 65 
5.2.1 Algorithms Using Internal Memory ......................................................... 66 
5.2.1.1 Storing and Accessing Pixels in SRAM ........................................... 66 
5.2.1.2 Image Filtering ................................................................................. 72 
5.2.1.3 Edge Detection ................................................................................. 74 
5.2.2 Algorithms Using External Memory ....................................................... 75 
5.2.2.1 Storing and Accessing Pixels in DDR Memory ............................... 76 
5.2.2.2 Background Subtraction ................................................................... 82 
5.2.2.3 Flat Field Correction ......................................................................... 84 
5.2.2.4 Image Averaging .............................................................................. 87 
5.2.2.5 Two-Dimensional (Inverse) Fast Fourier Transform ....................... 90 
6 System Verification ............................................................................................. 93 
6.1 Verification of Camera Link Interface .......................................................... 93 
6.1.1 Channel Link Transmitter ........................................................................ 93 
6.1.2 Channel Link Receiver ............................................................................ 94 
6.1.3 Camera Link Transmitter ......................................................................... 94 
6.1.4 Camera Link Receiver ............................................................................. 96 
6.2 Verification of Image Processing Algorithms .............................................. 97 
6.2.1 Image Filtering ......................................................................................... 98 
6.2.2 Edge Detection ......................................................................................... 99 
6.2.3 Background Subtraction ........................................................................ 100 
6.2.4 Flat Field Correction .............................................................................. 101 
6.2.5 Image Averaging.................................................................................... 104 
6.3 System Demonstration ................................................................................ 105 






Figure 2.1: System overview...................................................................................... 29 
Figure 2.2: Connection between camera and PC. ...................................................... 30 
Figure 2.3: Example of architecture of a SmartFusion2 M2S050 SoC FPGA [41]. .. 32 
Figure 3.1: Representation of signals on LVDS line. ................................................ 36 
Figure 3.2: Timing diagram of one data line and clock of Channel Link. ................. 37 
Figure 3.3: Difference between Channel Link generations [67]. ............................... 37 
Figure 3.4: Structural comparison between CSI-2 and CSI-3 [92]. ........................... 40 
Figure 3.5: Connection of a camera and a frame grabber with Camera Link I/F for 
either base, medium, full or 72-bit configuration. Connections of individual 
signals for monochrome 8-bit mode to Channel Link chips (squares) are depicted 
as well. Note: Px denotes a pixel with a bit depth of 8, where x is 0-8, and signal 
“S” denotes a spare synchronization bit. ............................................................. 42 
Figure 4.1: Applying a simplified spatial filter. ......................................................... 47 
Figure 4.2: Image of planet Saturn with its six moons taken during Cassini space 
mission [111]. Note that the image was transformed to grayscale and 
downscaled to the size of 1216 x 1936. The positions of Saturn’s moons are 
highlighted in red. ............................................................................................. 48 
Figure 4.3: Magnitude spectrum of the image from Figure 4.2. ................................ 48 
Figure 4.4: Spatial representation of a Gaussian LPF with standard deviation of 4.6. 
The centered values are enlarged. ..................................................................... 49 
Figure 4.5: Frequency representation of a Gaussian LPF with the standard deviation 
of 4.6 in the spatial domain .............................................................................. 50 
Figure 4.6: Result of filtering the image with the LPF. ............................................. 51 
Figure 4.7: Frequency representation of the image filtered with a LPF. ................... 51 
Figure 4.8: Prewitt operator masks for horizontal and vertical directions respectively.
 .......................................................................................................................... 52 
Figure 4.9: Detected edges of the image from Figure 4.2 using Prewitt operator. .... 53 
Figure 4.10: Detected edges of the image from Figure 4.6 using Prewitt operator. .. 53 
Figure 4.11: Magnitude of the frequency response of the simplified (red) and original 
non-simplified (blue) running average filters of length 8. ................................ 56 
Figure 5.1: Top schematic of Channel Link transmitter. ........................................... 58 
Figure 5.2: Schematic of PISO register. ..................................................................... 59 
Figure 5.3: Code snippet of clock generator of the transmitter. ................................. 59 
Figure 5.4: Output signals of a standard Channel Link chip [63]. ............................. 60 
Figure 5.5: Top schematic of Channel Link receiver. ................................................ 60 
Figure 5.6: Schematic of SIPO register. ..................................................................... 61 
Figure 5.7: Relationship between clocks and data of Channel Link receiver: (I) 
misalignment between the clock clk_par and its data signal clk_par_data, (II) 
defined phase between the clocks clk_par and clk_ser to sample data at the 
center of the eye diagram, and (III) misalignment between clk_par and the start 
of the first bit of the incoming data. .................................................................. 61 
Figure 5.8: Comparison of received clock signal with the expected clock pattern. ... 62 
Figure 5.9: Extracting data bits based on clock pattern match. .................................. 62 
Figure 5.10: Schematics of Camera Link transmitter. ............................................... 64 
Figure 5.11: Schematics of Camera Link receiver. .................................................... 65 
Figure 5.12: Buffering of incoming image. The set of 8 pixels currently arriving from 
the camera is highlighted in green. The pixels which can be processed are marked 
in yellow and the pixels stored in memory in red. .................................... 67 
Figure 5.13: Circuit for storing an incoming set of pixels and reading out three sets of 
pixels at the same columns of three consecutive rows. .................................... 68 
Figure 5.14: Input and output signals of circuit from Figure 5.13 through time. At 
time n the incoming set of 8 pixels is denoted as P[n][0:7] and is highlighted in 
green. The outputs at time n are highlighted in blue. ....................................... 69 
Figure 5.15: Example of output pixels of circuit from Figure 5.13. .......................... 70 
Figure 5.16: Circuit for storing previous values of the read outputs of circuit from 
Figure 5.13. In case pixels at the boundaries of the frame are being processed it 
replaces the pixels outside the frame with zeros. .............................................. 71 
Figure 5.17: Full circuit for image processing with a filter mask. ............................. 72 
Figure 5.18: Component “pixels_processing” which processes each pixel in parallel.
 .......................................................................................................................... 73 
Figure 5.19: Component for processing a single pixel. .............................................. 73 
Figure 5.20: Component “calculation” of image filtering circuit. ............................. 74 
Figure 5.21: Component “calculation” of edge detection circuit. .............................. 75 
Figure 5.22: Circuit for accessing DDR memory. ..................................................... 76 
Figure 5.23: Circuit for initialization of FDDR controller. ........................................ 77 
Figure 5.24: State machine of write channel of AXI master. ..................................... 77 
Figure 5.25: State machine of read address channel of AXI master. ......................... 78 
Figure 5.26: State machine of read data channel of AXI master. .............................. 79 
Figure 5.27: Circuit for maximizing DDR memory access throughput. .................... 80 
Figure 5.28: Simulation of circuit for increased DDR memory throughput. ............. 81 
Figure 5.29: Circuit for background subtraction. ....................................................... 83 
Figure 5.30: VHDL code snippet for background subtraction processing................. 83 
Figure 5.31: Simulation of background subtraction circuit. ...................................... 84 
Figure 5.32: Circuit for flat field correction............................................................... 85 
Figure 5.33: Simulation of flat field correction circuit. ............................................. 86 
Figure 5.34: Circuit for calculation of flat field corrected image. ............................. 86 
Figure 5.35: Direct implementation of image averaging algorithm. .......................... 88 
Figure 5.36: Resource efficient implementation of image averaging algorithm. ...... 88 
Figure 5.37: Modified implementation of image averaging algorithm. ..................... 89 
Figure 5.38: Full circuit of image averaging. ............................................................. 90 
Figure 6.1: Simulation results of developed Tx component. The red lines mark one 
parallel clock cycle. .......................................................................................... 93 
Figure 6.2: Channel Link loopback test using Camera Link FMCs and cable. ......... 94 
Figure 6.3: Reference image from MATLAB for Camera Link transmitter 
verification. ....................................................................................................... 95 
Figure 6.4: Received image by PC for Camera Link transmitter verification. .......... 95 
Figure 6.5: Schematic of FPGA receiving the Camera Link signal from camera and 
then transmitting it to PC. ................................................................................. 96 
Figure 6.6: Received image from the camera through the FPGA. ............................. 96 
Figure 6.7: Verification flow of implementations of image processing algorithms. . 97 
Figure 6.8: Image processed by the implemented circuit for image filtering (top left), 
the reference image (top right) and their absolute difference (bottom left) with 
its histogram (bottom right). ............................................................................. 98 
Figure 6.9: Image processed by the implemented edge detection circuit (top left), 
reference image (top right) and their absolute difference (bottom left) with its 
histogram (bottom right). ................................................................................ 100 
Figure 6.10: Background image (left) and image whose pixels are to be classified as 
either background or foreground (right) [139]. ................................................ 101 
Figure 6.11: Image processed by the implemented background subtraction circuit. 101 
Figure 6.12: Response to the dark image D(x, y) (left) and response to the flat image 
F(x, y) (right). .................................................................................................... 102 
Figure 6.13: Original image (top left) [140], distorted image (top right), image 
corrected by the implemented FFC circuit (bottom left), and the complemented 
absolute difference between the original and the corrected image (bottom 
right). ............................................................................................................... 103 
Figure 6.14: Image with added noise (left) and its original (right). ......................... 104 
Figure 6.15: Output image of the implemented circuit of the simplified running 
average filter (top left), output image of the supposed implementation of the 
original non-simplified version of the algorithm (top right), complement of 
their absolute difference (bottom left), and the corresponding histogram 
(bottom right). ................................................................................................. 105 
Figure 6.16: Demonstration setup. ........................................................................... 105 
Figure 6.17: Captured image from demonstration of processing a live video stream 





Table 2.1: Comparison of different FPGA/SoC development boards. *Note: Xilinx’s 
[39] and Microsemi’s [40] DSP slices differ a bit in their capacity. ................ 31 
Table 3.1: Different configurations of Camera Link. ................................................ 41 





Acronyms and Symbols 






AHB Advanced Microcontroller Bus Architecture High-performance Bus 
AIA Automated Imaging Association 
ASIC Application Specific Integrated Circuit 
AXI Advanced eXtensible Interface 
CCC Clock Conditioning Circuit 
CLHS Camera Link HS 
CRC Cyclic Redundancy Check 
CSI Camera Serial Interface 
DDR Double Data Rate 
DFF D Flip-Flop 
DFT Discrete Fourier Transform 
DSP Digital Signal Processing 
DVAL Data Valid 
DVI Digital Visual Interface 
FDDR Fabric DDR (controller) 
FFC Flat-Field Correction 
FFT Fast Fourier Transform 
FIFO First-In-First-Out 
FMC FPGA Mezzanine Cards 
EMI Electromagnetic Interference 
ELT Extremely Large Telescope 
ESO European Southern Observatory 
FPGA Field Programmable Gate Array 
fps frames per second 
FVAL Frame Valid 
HDL Hardware Description Language 
HDMI High-Definition Multimedia Interface 
I2C Inter-Integrated Circuit 
IDFT Inverse Discrete Fourier Transform 
I/F Interface 
I/Os Inputs/Outputs 
JIIA Japan Industrial Imaging Association 
LE Logic Element 
LPF Low-Pass Filter 
LUT Look-Up Table 
LVAL Line Valid 
LVDS Low-Voltage Differential Signaling 
MICADO Multi-AO Imaging CamerA for Deep Observations 
MIPI Mobile Industry Processor Interface 
NRZI Non Return to Zero Invert 
OSI Open Systems Interconnection 
PC Personal Computer 
PHY Physical Layer 
PISO Parallel-In-Serial-Out 
PLL Phase-Locked Loop 





TMDS Transition Minimized Differential Signaling 
Tx Transmitter 
USB Universal Serial Bus 
VHDL Very High Speed Integrated Circuit Hardware Description Language 
 
 
In this thesis the names of the signals of circuits are denoted as signal. Individual 
components of the circuits are marked with double quotation marks. The meaning of 
the signals and the purpose of the components is apparent from the figure of the 
corresponding circuit or is explained in the accompanying text. 
The names of reference MATLAB functions from Image Processing Toolbox 




Cameras are nowadays an indispensable part of many different systems. Applications 
that process images captured by cameras are used in the automotive industry, for 
example in autonomous driving, for mass surveillance, for medical purposes, and for 
product quality assurance in manufacturing processes. Cameras are also important 
for observing space and the Earth. 
With suitable image processing algorithms, key information can be extracted 
automatically from the images captured by cameras. In most applications, the 
information stays relevant and useful only if the images are processed in real time. 
Real-time image processing, however, is becoming increasingly demanding, as we 
want to capture ever larger images and ever more images per second. This increases 
the data rate of the image stream. Traditional image processing systems, such as 
computers, cannot process such high data rates in real time. Application-specific 
systems therefore have to be used. 
If an existing system is not capable of real-time processing, a system that 
processes the captured images in real time can be added between the camera and the 
intended image receiver. Such systems are often designed as application-specific 
integrated circuits (ASICs) or are based on FPGAs. The latter are usually more 
suitable, since they allow the system to be adapted easily to new requirements as 
they arise. In addition, for small production runs, using FPGAs is cheaper than 
manufacturing ASICs. 
The purpose of this master's thesis was to investigate the capabilities of 
FPGA-based systems for real-time processing of image streams and to develop a system 
that could be used for the MICADO project. MICADO is an imaging instrument that will 
be part of the ELT, which is being developed by ESO. The instrument will be used to 
observe stars, discover new exoplanets and explore regions where gravitational 
fields are extremely strong, such as the surroundings of the supermassive black hole 
at the center of our galaxy. Astronomical images are extremely large, which results 
in a high image stream data rate. The thesis therefore also identifies the 
limitations of real-time image stream processing with FPGAs. 
The developed system was implemented on the SmartFusion2 Advanced Development 
Kit from Microsemi. The development board contains a SmartFusion2 M2S150 SoC FPGA. 
Instead of a telescope, the GO-2400M PMCL industrial camera from JAI was used to 
capture images. The camera sends the captured images over the Camera Link 
interface. 
The developed system is inserted between the camera and the intended image 
receiver, so it must be able to receive and transmit images over the interface used 
by the camera. For this reason, receiver and transmitter logic circuits for the 
Camera Link interface were developed as part of the thesis. Both circuits were 
implemented in the FPGA. They support the Camera Link interface in base, medium or 
full configuration at frequencies up to 38.2 MHz. 
In addition to the Camera Link receiver and transmitter, implementations of 
several commonly used image processing algorithms were developed for the system, 
namely: 
• image filtering, 
• edge detection, 
• background subtraction, 
• image averaging and 
• flat-field correction. 
Which of the listed algorithms are actually implemented in the FPGA depends on the 
application. 
The largest image that the camera used can capture is 1216 x 1936 pixels. The 
camera sends 8 pixels at a time over the interface. The frame rate determines the 
interface frequency used. The camera supports interface frequencies of 37.125, 74.25 
and 84.85 MHz. Because of the frequency limitations of the input/output units of the 
FPGA used, the developed system can receive data over the interface at frequencies 
up to 38.2 MHz. The frame rate was therefore limited to 118.23 frames per second. At 
a bit depth of 8 bits, the data rate of the image stream is thus 2.2 Gbps. 
An image stream can be processed in real time only if the processing rate is at 
least equal to the data rate of the stream. The designed system therefore also 
processes data at 2.2 Gbps. This rate was achieved by processing 8 pixels in 
parallel at the same frequency as that used by the interface. 
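The data rate figures quoted above can be checked with a short calculation. The 
following Python sketch is illustrative only: the function names are hypothetical, 
and it assumes that the camera runs at its lowest supported interface frequency of 
37.125 MHz, the only supported value below the 38.2 MHz limit of the system. 

```python
"""Back-of-the-envelope check of the image stream data rate (illustrative sketch)."""

PIXELS_PER_FRAME = 1216 * 1936   # maximum image size of the camera
BITS_PER_PIXEL = 8               # bit depth of the captured images
FRAME_RATE_FPS = 118.23          # frame rate limit stated in the text
PIXELS_PER_CLOCK = 8             # pixels sent over the interface at a time
INTERFACE_CLOCK_HZ = 37.125e6    # assumption: lowest clock supported by the camera


def stream_rate_bps(pixels_per_frame, bits_per_pixel, frame_rate_fps):
    """Data rate of the image payload in bits per second."""
    return pixels_per_frame * bits_per_pixel * frame_rate_fps


def parallel_rate_bps(pixels_per_clock, bits_per_pixel, clock_hz):
    """Rate at which a circuit handling several pixels per clock moves data."""
    return pixels_per_clock * bits_per_pixel * clock_hz


if __name__ == "__main__":
    stream = stream_rate_bps(PIXELS_PER_FRAME, BITS_PER_PIXEL, FRAME_RATE_FPS)
    capacity = parallel_rate_bps(PIXELS_PER_CLOCK, BITS_PER_PIXEL, INTERFACE_CLOCK_HZ)
    print(f"image stream: {stream / 1e9:.2f} Gbps")
    print(f"8 pixels per clock at 37.125 MHz: {capacity / 1e9:.2f} Gbps")
    print("real-time capable:", capacity >= stream)
```

Under these assumptions the payload rate comes to roughly 2.2 Gbps, and handling 
8 pixels per clock cycle at the interface frequency stays slightly above it. 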
Some of the implemented algorithms need to store one or more images in memory. 
The internal SRAM of the FPGA is too small to hold such a large amount of data, so 
these algorithms need access to a larger, external DDR memory. Access to DDR memory 
is slower than access to SRAM, especially when data in the DDR memory are accessed 
individually. The memory access rate directly limits the image processing rate. A 
circuit was therefore developed that achieves the highest possible DDR memory access 
rate by using several consecutive overlapping burst accesses. For the development 
system used, the DDR memory access rate is limited to 5.3 Gbps. At an image stream 
data rate of 2.2 Gbps, only algorithms that need at most two accesses to DDR memory 
can therefore be implemented in the system. 
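The limit of two DDR accesses follows from dividing the available memory access 
rate by the image stream data rate. A minimal Python sketch of this budget, assuming 
that every access moves the full image stream once (the function name is 
hypothetical): 

```python
"""DDR access budget for a pixel stream (illustrative sketch)."""

DDR_ACCESS_RATE_BPS = 5.3e9   # DDR access rate limit stated in the text
STREAM_RATE_BPS = 2.2e9       # image stream data rate stated in the text


def max_full_rate_accesses(ddr_rate_bps, stream_rate_bps):
    """Number of reads/writes of the whole stream that fit into the DDR rate."""
    if stream_rate_bps <= 0:
        raise ValueError("stream rate must be positive")
    return int(ddr_rate_bps // stream_rate_bps)


if __name__ == "__main__":
    budget = max_full_rate_accesses(DDR_ACCESS_RATE_BPS, STREAM_RATE_BPS)
    print("full-rate DDR accesses available:", budget)
```

For example, an algorithm that reads one stored image and writes one updated image 
for every incoming frame would use up this budget. 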
The algorithms were developed in Simulink. HDL descriptions of the circuits were 
then generated and later implemented in the FPGA. The operation of the generated 
circuits was simulated. The processed output images obtained by simulation were then 
compared with images processed by the same algorithms implemented with MATLAB's 
Image Processing Toolbox. The correctness of the generated circuits was verified on 
the basis of the agreement between the two processing methods. 
With its Vision HDL Toolbox, MATLAB already offers a solution for implementing 
image filtering and edge detection algorithms in FPGAs. However, at the time the 
thesis was written, the toolbox did not support parallel processing of multiple 
pixels and did not provide sufficient data rates. This solution was therefore not 
used in the development of the system. Both limitations were later removed in the 
latest version of the toolbox. MATLAB does not, however, offer solutions for the 
other three algorithms that we wanted to implement. 
In addition to the circuit for receiving and transmitting the image stream over 
the Camera Link interface, five algorithms for real-time image processing were thus 
successfully implemented as part of the thesis. The operation of the system was 
demonstrated on the example of real-time edge detection. 
In the future, the developed system could be upgraded to support even higher 
image stream processing rates and more complex algorithms. In that case, however, a 
more capable FPGA would have to be used as the basis of the system. 
 
Key words: image processing, real-time image processing, FPGA, camera, 




This master’s thesis presents an FPGA-based system for real-time processing of 
a Camera Link image stream. The system is based on Microsemi’s SmartFusion2 
Advanced Development Kit, which includes a SmartFusion2 M2S150 SoC FPGA, and on 
an industrial camera which uses the Camera Link I/F. The system is meant to be 
inserted between the camera and the intended receiver, providing real-time processing 
of the images captured by the camera. The developed system is capable of processing 
images at frame rates up to 118.23 fps at the maximum image size provided by the 
camera used, namely 1216 by 1936 pixels. 
To acquire the images sent by the camera and to transmit the processed images to 
the intended receiver, a Camera Link receiver and a Camera Link transmitter were 
designed. The receiver and transmitter implemented in the FPGA support the Camera 
Link interface in base, medium and full configurations at frequencies up to 38.2 MHz. 
Several widely used image preprocessing algorithms were designed for 
implementation in the system: image filtering, edge detection, background 
subtraction, image averaging, and flat-field correction. Some of these 
algorithms use external DDR memory. Therefore, a circuit for accessing the DDR 
memory was designed as well. The circuit was designed to achieve the highest possible 
throughput of the DDR memory used. 
 
Key words: image processing, real-time image processing, FPGA, camera, 





Cameras are nowadays indispensable in a plethora of applications, ranging from 
autonomous driving in the automotive industry [1], [2], security and surveillance 
[3], [4], medicine [5], and product quality assurance in manufacturing [6] to 
observation of both space [7] and Earth [8], [9]. 
Most of these applications require the images captured by the camera to be 
processed in real time for them to be relevant and useful and to prevent loss of 
image frames [10]. Real-time processing means that the processing of data has to be 
completed in the time available between two successive input samples [11]. For 
images, this means that an image has to be processed before the next frame arrives. 
If this criterion is not met, frames may be dropped, causing loss of information. 
For most of the mentioned applications, a missed frame could have catastrophic 
consequences. A real-time processing system with such strict processing criteria is 
referred to as a hard real-time system [8], [12]. 
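As a rough illustration of this criterion, the time available per frame is the 
reciprocal of the frame rate. The Python sketch below uses the 118.23 fps quoted in 
the abstract; the function names and the sample processing times are hypothetical. 

```python
"""Per-frame deadline check for real-time processing (illustrative sketch)."""


def frame_period_s(frame_rate_fps):
    """Time between two successive frames in seconds."""
    if frame_rate_fps <= 0:
        raise ValueError("frame rate must be positive")
    return 1.0 / frame_rate_fps


def meets_deadline(processing_time_s, frame_rate_fps):
    """True if a frame is fully processed before the next one arrives."""
    return processing_time_s <= frame_period_s(frame_rate_fps)


if __name__ == "__main__":
    fps = 118.23
    print(f"frame period: {frame_period_s(fps) * 1e3:.3f} ms")
    for t in (0.005, 0.010):  # hypothetical processing times in seconds
        status = "ok" if meets_deadline(t, fps) else "frames dropped"
        print(f"{t * 1e3:.0f} ms per frame -> {status}")
```

At this frame rate a frame has to be handled in about 8.5 ms, so the hypothetical 
10 ms case would miss the deadline. 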
To extract the needed information from a captured image, different processing 
algorithms have to be applied. These range from primitive operations such as noise 
reduction to more advanced ones such as feature extraction, pattern recognition and 
object recognition, on the basis of which automatic decisions can be made. Which 
algorithms are used depends on the specific application. However, some algorithms 
are widely used across applications to preprocess the images. The preprocessing 
algorithms enhance the features of interest in the image, simplifying higher-level 
processing [13]. 
However, processing of images in real-time is getting more and more difficult 
as we are striving to capture ever larger images and at increasing frame rates, altogether 
increasing processing requirements. Therefore, traditional methods for processing 
images from a camera, i.e. with a single or multiple processors, are insufficient and 
other options have to be considered especially for embedded applications where 
processing resources are limited. The solution for achieving processing in real-time is 
to add components to the system with dedicated functionality of image processing. 
The processing can be implemented for example in Application Specific Integrated 
Circuits (ASICs), which are optimized for power consumption and size. However, 
ASICs are more expensive and harder to adapt if the application changes than the 
alternative of implementing the processing in Field Programmable Gate Arrays 
(FPGAs). Image processing is application specific, so a modifiable system is more 
appropriate, especially for products made in low quantities [11]. If 
an FPGA is not included in the system which receives the images, a simple solution is 
to insert the FPGA between the camera and the receiver. In this case the FPGA must 
acquire the image sent by the camera via an interface (I/F), process it and then send it 
to the intended receiver via the same or even a different I/F. The receiver notices only 
an additional latency produced by the processing in the FPGA. 
In this thesis, a system which is capable of processing an image stream in real-
time is developed. The system is based on an FPGA development board which is meant 
to be inserted between the camera and the intended receiver, when the latter does not 
include the needed hardware resources for real-time image processing. However, the 
developed system could easily be simplified to be implemented on a system which 
already includes an FPGA as well. To insert the system between the camera 
and the intended receiver, a receiver (Rx) and a transmitter (Tx) compatible with the 
I/F of the camera are needed. For this thesis, an industrial camera which uses the 
Camera Link I/F was chosen. Therefore, a Camera Link receiver and transmitter were 
designed as well. 
For the real-time image processing, some of the widely used algorithms for 
preprocessing of images are designed to be implemented in the system. The 
implemented algorithms include: 
• image filtering, e.g. for noise reduction, 
• edge detection, 
• background subtraction for detection of moving objects, 
• image averaging for noise reduction, and 
• flat-field correction (FFC) for lowering distortions in the image produced by 
the camera. 
The use of the listed algorithms depends on the application specific needs. With 
their implementation in the developed system, further high-level image processing can 
be sped up. 
The real-time image processing algorithms are developed in Simulink [14] using 
the HDL library, which allows automatic generation of HDL code with HDL Coder 
[15]. The generated code can be synthesized and implemented on an FPGA. 
Development of the algorithms with Simulink also allows for straightforward 
verification of the implemented algorithms as their functionality can be compared with 
MATLAB’s functions from the Image Processing Toolbox [16]. 
MATLAB already provides solutions for implementing image filtering and edge 
detection algorithms in an FPGA with its Vision HDL Toolbox [17]. However, other 
algorithms which need information about multiple frames are not supported. 
Moreover, even the supported algorithms have only just recently been updated to 
process multiple pixels in parallel. Prior to this update processing of high frame rate 
and high resolution video with those algorithms was not possible [18], [19]. This thesis 
aims to tackle these limitations. 
The system was developed with the goal to investigate the possibilities of 
real-time image processing with FPGAs which could be used for Multi-AO Imaging 
CamerA for Deep Observations (MICADO) project [20], [21]. MICADO is an 
instrument being developed by a consortium of partners from around Europe, 
including the Max-Planck-Institute for extraterrestrial Physics, and the European 
Southern Observatory (ESO). MICADO is one of the first-light instruments for the 
Extremely Large Telescope (ELT) [20]. The ELT is a ground-based telescope 
currently being constructed by ESO. When completed it will be the world's largest 
optical/near-infrared telescope [22]. The MICADO instrument will be used to study 
stars in nearby galaxies as well as for discovering exoplanets. It will also be able to 
explore environments where gravitational forces are extremely strong. One such 
example is the area close to the supermassive black hole at the center of our galaxy 
[20]. 
Astronomical images can be very large [23]. Therefore, one of the requirements 
of the developed system was to process images of the maximum size that can be 
provided by the camera. It was also desired to process the images at the highest 
possible achievable frame rate. 
In the following sections, the developed system as well as the theory behind it 
are presented in detail. Section 2 gives a more detailed overview of the system and 
of the components used. Section 3 describes and compares different camera 
interfaces which are commonly used by industrial cameras. The mentioned image 
processing algorithms and their practical usage are illustrated in Section 4. The 
implementation of the Camera Link I/F circuits and the image processing algorithms 
is presented in Section 5. Section 6 provides the verification of the developed 
system. Section 7 concludes the thesis and gives an outlook on possible further 
work. 
 
2 System Architecture 
In the following subsections an overview of the developed system for real-time 
processing of a Camera Link image stream is given. Its main components, namely the 
used camera and development board, are presented in detail as well. 
2.1 System Overview 
For the purpose of this thesis, the industrial-grade camera GO-2400M PMCL 
[24] by Jai was chosen for image capturing. A PCIe frame grabber board [25] from 
Silicon Software in combination with a personal computer (PC) was used to receive 
the images from the camera. For configuring the frame grabber and displaying the 
received images the microDisplay software was used [26]. 
The developed real-time image processing algorithms were implemented in a 
SmartFusion2 M2S150 system-on-chip (SoC) FPGA [27] of the Microsemi’s 
SmartFusion2 Advanced Development Kit [28]. For connecting the camera to the 
development board and the development board to the PC, two FPGA Mezzanine Cards 
(FMCs) [29] by Alpha Data were used. The system setup is depicted in Figure 2.1. 
 
Figure 2.1: System overview. 
2.2 Camera 
The camera [24] provides images of a maximum size of 1936 by 1216 pixels at 
a maximum frame rate of 165.59 frames per second (fps) with bit depths of 8 or 10 
bits. It uses the Camera Link I/F at one of the three possible frequencies: 37.125, 74.25, 
or 84.85 MHz, with either base, medium or full Camera Link configuration. The 
camera configuration is dependent on the frequency of the Camera Link I/F and the 
camera tap configuration, which also define the Camera Link configuration used [30]. 
The camera configuration defines the bit rate and consequently the processing 
requirements of the FPGA. One of the requirements of the thesis was to process images 
at the highest possible frame rate. This rate is achievable when the camera operates in 
the 1X8-1Y tap configuration, in which 8 pixels are transmitted via the I/F in each 
clock cycle. Following the camera datasheet [30], the camera was configured to the 
maximum image size, a bit depth of 8 and the 1X8-1Y tap configuration. In this 
configuration the camera uses the full Camera Link configuration. 
The camera was configured using Jai’s GO-2400 Control Tool [31]. 
The functionality of the camera and the correctness of its configuration were 
verified by directly connecting the camera to the frame grabber of the PC (Figure 2.2). 
The configuration worked as expected as the frame grabber successfully displayed the 
incoming images. 
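The processing requirement that follows from this configuration can be estimated with a few lines of arithmetic. The sketch below uses only the image size, bit depth and frame rate quoted above; it assumes that the maximum frame rate applies to the maximum image size and it ignores blanking intervals, so the figures are rough estimates rather than datasheet values.

```python
# Camera parameters quoted in this section.
WIDTH, HEIGHT = 1936, 1216   # maximum image size in pixels
BIT_DEPTH = 8                # bits per pixel in the chosen configuration
FRAME_RATE = 165.59          # maximum frame rate in frames per second

# Time available for processing one frame before the next one arrives.
frame_period_s = 1.0 / FRAME_RATE

# Amount of image data produced per second (payload only, no blanking).
pixels_per_frame = WIDTH * HEIGHT
pixel_rate = pixels_per_frame * FRAME_RATE
payload_bit_rate = pixel_rate * BIT_DEPTH

print(f"frame period: {frame_period_s * 1e3:.3f} ms")
print(f"pixel rate:   {pixel_rate / 1e6:.1f} Mpixel/s")
print(f"payload rate: {payload_bit_rate / 1e9:.2f} Gbit/s")
```

Under these assumptions a frame has to be handled in roughly 6 ms and the payload amounts to about 3.1 Gbit/s.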
 
Figure 2.2: Connection between camera and PC. 
2.3 Development Board 
Microsemi's SmartFusion2 Advanced Development Kit [28] was chosen for 
the system because it supports the simultaneous connection of two FMCs. This 
requirement is essential for the designed system, as can be seen from the overview 
of the system (Figure 2.1). Other development boards by Microsemi [32] include at 
most one FMC connector, except for the RTG4 Development Kit [33]. The latter 
includes an RT4G150 FPGA which has more resources than the SmartFusion2 
M2S150 SoC FPGA of the chosen board [27]. However, the RT4G150 FPGA is 
resistant to radiation-induced configuration upsets and is therefore a lot more 
expensive [34] than the chosen FPGA [35]. Similarly, there are few development 
boards with two FMC connectors by Xilinx [36]. All but one [38] of those boards are 
a lot more expensive than the chosen board, e.g. [37]. The remaining one [38] is of 
approximately the same price as the chosen board but includes fewer hardware 
resources. An overview of the specifications of the mentioned development boards is 
given in Table 2.1. 
 























                   SmartFusion2 Advanced   RTG4 Development   Xilinx board   Xilinx board 
                   Development Kit [28]    Kit [33]           [37]           [38] 
Logic elements     146k                    152k               326k           85k 
DSP slices *       240                     462                840            220 
Total SRAM         4.5 Mb                  5.2 Mb             16 Mb          4.9 Mb 
DDR maximum        667 Mbps                667 Mbps           1600 Mbps      1333 Mbps 
transfer rate 
FMC connectors     2                       2                  2              2 
Price              $1,043 [12]             $21,576 [11]       $1,695         $895 
Table 2.1: Comparison of different FPGA/SoC development boards. *Note: Xilinx’s [39] and 
Microsemi’s [40] DSP slices differ a bit in their capacity. 
Based on the comparison of the specifications of the different development boards, 
Microsemi's SmartFusion2 Advanced Development Kit proves to be an appropriate 
choice for the developed system. 
The two most important parts of the chosen development board for the design of 
the developed system are the FPGA and the external memory. The FPGA is the 
central part of the system, as this is where the image processing algorithms as well as 
the circuit for interfacing the camera are implemented. As is shown in later 
sections, the developed system needs to store a lot more data than can be stored in the 
internal memory of the FPGA. Therefore, external memory needs to be used. These 
two parts of the development board are described in more detail in the following 
subsections. 
2.3.1 Field Programmable Gate Array 
The FPGA implements the designed circuits in the form of interconnected logic 
elements (LEs), also referred to as logic cells. Each LE includes a 4-input look-up table 
(LUT) which can be programmed to perform any 4-input combinatorial function. The 
LE includes besides the LUT also a D flip-flop (DFF) from which sequential circuits 
can be designed. Moreover, the FPGA has optimized circuits for digital signal 
processing (DSP) whose use makes designs more efficient. Similarly, the FPGA 
includes additional internal memory in the form of SRAM blocks. The way the 
building blocks of the FPGA are interconnected between one another can be 
programmed. Subsequently, the programmed content of the LUTs of the LEs and the 
interconnection of the building blocks define the functionality of the FPGA [41], [42]. 
Figure 2.3 shows an example of the internal architecture of an FPGA from the same 
family as the one used in this thesis. 
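The way a programmed 4-input LUT realizes an arbitrary combinatorial function can be illustrated in software. The following sketch is only a behavioral model with hypothetical function names: the 16 configuration bits of the LUT are held in an integer and the four inputs select one of them. It does not describe the actual configuration format of the FPGA.

```python
def program_lut4(func):
    """Build the 16-bit truth table of a 4-input Boolean function.

    Bit i of the result holds the function value for the input
    combination whose binary encoding (a = bit 0 ... d = bit 3) is i.
    """
    table = 0
    for index in range(16):
        a = (index >> 0) & 1
        b = (index >> 1) & 1
        c = (index >> 2) & 1
        d = (index >> 3) & 1
        if func(a, b, c, d):
            table |= 1 << index
    return table


def lut4(table, a, b, c, d):
    """Evaluate a programmed LUT: the inputs address one stored bit."""
    index = a | (b << 1) | (c << 2) | (d << 3)
    return (table >> index) & 1


# Two different functions realized by the same structure.
and_table = program_lut4(lambda a, b, c, d: a & b & c & d)
parity_table = program_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)

print(hex(and_table), hex(parity_table))
print(lut4(parity_table, 1, 0, 1, 1))
```

Changing only the stored table changes the function, which is what makes the LE programmable; the DFF of the LE would register the LUT output on a clock edge.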
 
Figure 2.3: Example of architecture of a SmartFusion2 M2S050 SoC FPGA [41]. 
Note that Figure 2.3 does not show how the building blocks of the FPGA are 
interconnected. The signals are routed differently based on their purpose, i.e. clock and 
data signals. The clock signals are usually connected to a lot of other elements which 
can be on different parts of the circuit. To ensure that the clock signals arrive at the 
same time to all the elements they are routed over dedicated clock distribution 
networks with low skew. For generating different clocks in the circuit the FPGA 
includes the Clock Conditioning Circuit (CCC) [43]. The CCC is a programmable 
phase-locked loop (PLL) circuit. From the input signal of a certain frequency the CCC 
can generate up to 4 clocks of lower or higher frequencies than the frequency of the 
input. The phase of the output clocks can be set [44]. 
Figure 2.3 shows another important part of the FPGA, namely its inputs and 
outputs (I/Os). Like the other building blocks, the I/Os can be connected to other 
blocks and their functionality configured [41]. The I/Os of the FPGA support different I/O 
standards amongst which is also the Low-Voltage Differential Signaling (LVDS) 
which is used by the Camera Link I/F. The maximum supported frequency of the 
LVDS I/O signal is 267.5 MHz [45]. 
2.3.2 Double Data Rate Memory 
The chosen development board has a total of 1 GB of external double data rate 
(DDR) memory available [28]. This is a lot more than the internal SRAM memory. 
Therefore, it can be used to store more data. However, access times to the external 
DDR memory are a lot longer than times for accessing SRAM [46]. Therefore, special 
care has to be taken to maximize the throughput of the DDR memory. 
For accessing the DDR memory the FPGA includes an ASIC, named Fabric 
DDR (FDDR) controller. The FDDR controller provides the DDR memory with 
control and data signals as well as the clock. The controller runs at a maximum of 
333 MHz, so the DDR has the same limitation as well [47]. The controller executes 
read and write transactions on the DDR memory via either a 32 bit wide AHB or a 
64 bit wide AXI bus [48], [49]. The controller requires its clock to be a multiple 
of the bus clock. Therefore, the configuration providing the highest data throughput 
is for the DDR to operate at 333 MHz and the bus at 166.66 MHz [50]. 
The use of the AXI bus provides higher throughputs than the AHB bus. This is 
on the account of the wider bus. The AXI bus also includes separate channels for 
reading and writing as well as for control and data signals. The separate read and write 
channels allow for simultaneous read and write operations on the bus. Moreover, 
because the control and data channels are separated a new transaction can be issued 
even before the last one has completed. This is referred to as overlapping transactions. 
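The effect of the bus width can be made concrete with a rough upper bound. The sketch below assumes one data beat per bus clock cycle and the same 166.66 MHz clock for both buses; it ignores protocol overhead and all DDR access effects, so the numbers are raw bus limits per direction and not the throughput that is achievable at the DDR memory.

```python
BUS_CLOCK_HZ = 166.66e6   # bus clock from the configuration described above
DDR_CLOCK_HZ = 333e6      # FDDR controller and DDR clock


def raw_bus_bandwidth_bps(width_bits, clock_hz):
    """Upper bound: one data beat of width_bits per clock cycle."""
    return width_bits * clock_hz


axi_bps = raw_bus_bandwidth_bps(64, BUS_CLOCK_HZ)
ahb_bps = raw_bus_bandwidth_bps(32, BUS_CLOCK_HZ)

print(f"AXI (64 bit): {axi_bps / 1e9:.2f} Gbit/s")
print(f"AHB (32 bit): {ahb_bps / 1e9:.2f} Gbit/s")
print(f"controller clock / bus clock: {DDR_CLOCK_HZ / BUS_CLOCK_HZ:.2f}")
```

At equal clock frequency the AXI bus therefore offers twice the raw bandwidth of the AHB bus, before the separate read and write channels are even taken into account.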
The FDDR controller allows simultaneous read and write operations. However, 
the operations are internally queued as DDR cannot be simultaneously read from and 
written to. The controller also supports overlapping transactions but only for read 
accesses and just up to 5 [47], [50]. As already hinted, the FDDR controller internally 
reorders the incoming transactions for optimal accessing of the DDR, thus maximizing 
throughput. 
In order to achieve the highest possible throughput it is imperative to access 
the memory not with individual accesses but in burst mode. In burst mode only one 
address is provided and a sequence of values is read or written. With burst accesses 
the access latency is 
length of 16 [49]. Moreover, to maximize the throughput the DDR should be accessed 
in a way which minimizes the changes of rows of the DDR. When a new row of the 
DDR is accessed it must first be pre-charged. This takes additional time before the row 
can be accessed, thus reducing the overall throughput. With all the mentioned 
optimizations for maximizing the memory throughput the maximum achievable 
throughput of the DDR is 5.32 Gbps [50]. However, the DDR memory has to be 
periodically refreshed which is not taken into account in the aforementioned maximal 




3 Camera Interfaces 
An interface is the point at which two entities make contact [52]. In the case of 
camera interfaces, one of those entities is a camera which transmits captured images 
to the other entity, usually a computer [10]. 
There exist many different I/F standards for cameras, each having its unique 
features [10], [53], [54]. However, they share some similarities. For example, 
differential signaling is used to lower susceptibility to electromagnetic interference 
(EMI). This allows high data rates which are needed for transmitting video. Moreover, 
serialization of data is used to lower the number of needed lines for the interconnect, 
saving space and costs [55]. Most of the camera I/Fs use other general purpose I/Fs as 
their basis and adapt them to specific needs for video transmission applications. 
To choose the most appropriate camera I/F for a certain application the pros and 
cons of each I/F should be considered. In the following subsections some of the most 
widely used camera I/Fs are presented and compared. First, however, the general 
purpose I/Fs on which some of the camera I/Fs are based are presented. 
3.1 General Purpose I/F Used By Camera I/F 
The following general purpose I/Fs are used as a basis for the camera I/Fs. They 
differ vastly from one another because they define interconnects which correspond 
to different layers of the Open Systems Interconnection (OSI) reference model [56]. 
3.1.1 Low Voltage Differential Signaling 
Low voltage differential signaling (LVDS) is an I/F used for high-speed, low-
power transmission of binary data over copper. It is defined by the TIA/EIA-644 
standard, created in 1996. The data is transmitted as a voltage difference between two 
differential lines (V+ and V-). The lines are terminated using a resistor that matches 
the characteristic impedance of the lines, which is usually 100 Ohm. Figure 3.1 depicts 
the signals on an LVDS line and the data the signals represent. LVDS uses low 
voltage levels (typically 350 mV) to reduce power consumption and increase switching 
speed, up to a theoretical maximum transmission rate of 1.923 Gbps [57]–[59]. 
 
Figure 3.1: Representation of signals on LVDS line. 
3.1.2 Transition Minimized Differential Signaling 
Transition minimized differential signaling (TMDS) is a serial I/F developed by 
Silicon Image Inc. It utilizes a specific 8b/10b encoding to maximize resistance to EMI 
by achieving DC-balance and minimizing the number of transitions at the same time. 
Moreover, to further minimize susceptibility to interference it uses twisted pairs and 
differential signaling, similarly to LVDS [60], [61]. 
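The idea of minimizing transitions can be sketched in a few lines. The example below models only the first, transition-minimizing step of the encoding as it is commonly described for TMDS, which maps 8 data bits to 9 bits; the second, DC-balancing step that produces the 10th bit is omitted. The function names are hypothetical and the sketch is an illustration under these assumptions, not a complete TMDS encoder.

```python
def tmds_stage1(byte):
    """Map 8 data bits to 9 bits (list, LSB first) with few transitions."""
    d = [(byte >> i) & 1 for i in range(8)]
    ones = sum(d)
    # Chain the bits with XNOR when the byte contains many ones, else with XOR.
    use_xnor = ones > 4 or (ones == 4 and d[0] == 0)
    q = [d[0]]
    for i in range(1, 8):
        bit = q[i - 1] ^ d[i]
        q.append(bit ^ 1 if use_xnor else bit)
    q.append(0 if use_xnor else 1)  # ninth bit records which operation was used
    return q


def tmds_stage1_decode(q):
    """Invert tmds_stage1 and return the original byte."""
    use_xnor = q[8] == 0
    d = [q[0]]
    for i in range(1, 8):
        bit = q[i] ^ q[i - 1]
        d.append(bit ^ 1 if use_xnor else bit)
    return sum(bit << i for i, bit in enumerate(d))


def transitions(bits):
    """Count level changes between neighbouring bits."""
    return sum(bits[i] != bits[i - 1] for i in range(1, len(bits)))


raw = [(0x55 >> i) & 1 for i in range(8)]   # alternating bit pattern
encoded = tmds_stage1(0x55)
print(transitions(raw), transitions(encoded[:8]))
```

For the alternating pattern 0x55 the raw byte contains seven transitions, whereas the encoded data bits contain three.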
3.1.3 Channel Link 
Channel Link is a general purpose 7:1 serializer/deserializer (SerDes) interface 
developed by Texas Instruments (formerly National Semiconductor) in the 1990s [55], 
[62]. It was developed as an alternative to the widening of data buses, which was 
needed due to the increasing throughput requirements [63], [64]. Channel Link is a 
low cost solution because it uses few lines and does not require synchronization 
training patterns nor a clock source at the receiver end [62]. The data is transmitted 
over n+1 LVDS lines, where: 




One additional LVDS line is used for transmitting the bus clock [62]. The clock 
has a duty cycle of 4:3 [65]. Figure 3.2 shows the timing diagram of one data line 
and the clock transmitted via Channel Link. The most significant bit (MSB) is 
transmitted first. 
Figure 3.2: Timing diagram of one data line and clock of Channel Link. 
Channel Link is capable of serializing parallel buses of widths up to 48 bits and 
clocks up to 133 MHz. In the case of a 48 bit wide bus, the 7th bit of each of the 
serialized data lines is used for DC-balancing [62], [65]. 
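The 7:1 serialization described above can be sketched as follows. A 28 bit parallel word is split into four groups of 7 bits, and each group is shifted out MSB first on its own data line within one bus clock period, so each line runs at seven times the bus clock. The contiguous assignment of bits to lines used here is a hypothetical simplification; the real bit-to-line mapping is defined by the Channel Link devices. The 84.85 MHz clock is the highest Camera Link frequency of the camera from Section 2.2.

```python
def split_into_lines(word, n_lines):
    """Split a parallel word of 7 * n_lines bits into 7-bit groups.

    Hypothetical mapping: line k carries bits 7k ... 7k+6 of the word.
    """
    if not 0 <= word < 1 << (7 * n_lines):
        raise ValueError("word does not fit on the given number of lines")
    return [(word >> (7 * line)) & 0x7F for line in range(n_lines)]


def serialize_7to1(group):
    """Return the 7 bits of one group in transmission order, MSB first."""
    return [(group >> i) & 1 for i in range(6, -1, -1)]


def line_rate_bps(bus_clock_hz):
    """Each data line carries 7 bits per bus clock period."""
    return 7 * bus_clock_hz


word = 0xABCDEF1                      # example 28 bit parallel word
groups = split_into_lines(word, 4)
streams = [serialize_7to1(g) for g in groups]
for line, bits in enumerate(streams):
    print(f"line {line}: {bits}")
print(f"line rate at 84.85 MHz: {line_rate_bps(84.85e6) / 1e6:.2f} Mbit/s")
```

With a bus clock of 84.85 MHz each LVDS data line therefore has to carry about 594 Mbit/s.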
Channel Link has developed through the years into its 2nd and 3rd generation, 
named Channel Link II and Channel Link III respectively. The 3rd generation was 
introduced in 2010 [66]. Compared to Channel Link, the 2nd generation further 
decreases the number of signal pairs required by encoding data and clock onto a 
single pair. The 3rd generation extends the advantage of the 2nd by additionally 
incorporating a low-speed, bidirectional I2C control bus on the same pair. This 
further reduces 
interconnect size, weight and cost [10], [66], [67]. Figure 3.3 illustrates the differences 
between Channel Link generations. 
 
Figure 3.3: Difference between Channel Link generations [67]. 
3.1.4 Universal Serial Bus 
Universal Serial Bus (USB) is a widespread general purpose I/F initially meant 
to provide interconnect between PCs and telephones as well as to unify I/Fs for PC 
peripherals, adding plug-and-play feature to them. It was designed in the 90s by 
Compaq, Intel, Microsoft and NEC [68]. Since then the USB has progressed into its 
2nd [69] and 3rd generation [70] to answer the increasing demands for higher transfer 
rates. USB 2.0 increased the initial maximal transfer rate of 12 Mbit/s to 480 Mbit/s 
[69]. With the currently newest version USB 3.2 the maximum transfer rate was 
increased to 2 GB/s [71]. 
As the name suggests, the USB transmits data as a serial bit stream. Therefore, 
the USB serializes the data before transmitting it [68]–[71]. The transmitted data 
is encoded, and the encoding has changed between generations. The 1st and 2nd 
generation use the so-called “Non Return to Zero Invert” (NRZI) encoding, where a 
logical one is represented by no change in signal level and a logical zero by a 
change in the level [68], [69]. The 3rd generation switched to 8b/10b 
encoding [70] and to 128b/132b encoding for the maximum transfer rates in USB 3.2 
[71]. In the first two generations the data is transmitted over a differential pair, 
denoted D+/D-. Besides the two differential lines, the USB I/F also provides VBUS 
and GND lines for powering the devices [68], [69]. Later versions added additional 
lines to increase the transfer rates [70], [71]. 
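The NRZI rule described above is simple enough to be captured in a short sketch. The initial line level is an assumption made for the example, and bit stuffing, which the USB additionally applies to guarantee regular transitions, is left out.

```python
def nrzi_encode(bits, initial_level=1):
    """Encode bits as line levels: 1 keeps the level, 0 toggles it."""
    level = initial_level
    levels = []
    for bit in bits:
        if bit == 0:
            level ^= 1
        levels.append(level)
    return levels


def nrzi_decode(levels, initial_level=1):
    """Recover bits: no level change means 1, a change means 0."""
    previous = initial_level
    bits = []
    for level in levels:
        bits.append(1 if level == previous else 0)
        previous = level
    return bits


data = [0, 1, 1, 0, 0, 1]
line = nrzi_encode(data)
print(line)               # [0, 0, 0, 1, 0, 0]
print(nrzi_decode(line))  # [0, 1, 1, 0, 0, 1]
```

A long run of ones produces no transitions at all, which is the reason an additional mechanism is needed for clock recovery.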
3.1.5 Inter-Integrated Circuit 
Inter-Integrated Circuit (I2C) is a bidirectional 2-wire bus developed in 1982 by 
Philips Semiconductors (now NXP Semiconductors). It supports multiple masters and 
multiple slaves on the bus as each device has a unique address. Therefore, the I2C is 
very space efficient for connecting multiple devices as it requires only two lines, a 
serial data line (SDA) and a serial clock line (SCL). Transfer rates of the I2C can reach 
up to 5 Mbit/s [72]. The I2C is used in different applications [73], especially for control 
purposes [74]. 
3.1.6 D-, C- and M-PHY 
The D-, C- and M-PHY are all physical layer (PHY) standards from Mobile 
Industry Processor Interface (MIPI) Alliance [75], targeted for use in mobile industry. 
All the mentioned PHY standards are similar as they serialize data and use differential 
signaling for transmission on each lane. All of them are also scalable by simply 
incorporating multiple lanes. However, M-PHY uses a special 8b/10b encoding with 
an embedded clock, reducing the need for a dedicated clock lane and increasing 
resistance to EMI, thus increasing the maximal transmission rate per lane. On the 
other hand, unlike the other PHY standards, it does not include a low-power mode 
[76], [77]. 
3.1.7 UniPro 
UniPro is MIPI’s transport layer for interconnecting chipsets and 
peripherals in mobile devices, developed in 2007. It uses the M-PHY standard as the 
physical layer. The UniPro’s main applications are either as a standalone interface for 
chip-to-chip and interprocessor communications or as a building block for multimedia 
interfaces in devices like smartphones, laptops and cameras, to name but a few. The 
I/F can be implemented on as few as four wires and also provides a quality of 
service feature [78]. 
3.2 HDMI 
High-Definition Multimedia Interface (HDMI) is a widely used interface with 
over 4 billion consumer devices supporting it [79]. It was developed in 2003. Amongst 
the founding members are Philips, Panasonic, Sony and Toshiba [80]. The main 
applications of HDMI include transmitting audiovisual signals from DVD-players, 
computers and cameras to video displays. It also features content protection using data 
encryption and authorization [81], [82].  
There are multiple versions of the HDMI standard. The currently newest one, 
version 2.1, has a bandwidth capability of 48 Gbps [83]. For data transfer it 
uses TMDS with four differential lines, three for data and one for clock. The data lines 
carry video control signals, video pixels or auxiliary data like audio [81]. The data on 
the data line is serialized 10:1 [81]. HDMI also includes two additional lines for status 
exchange using I2C I/F and an optional line for additional control. The HDMI 
complies with the CEA-861 standard and is thus compatible with older Digital Visual 
Interface (DVI) [81]. 
3.3 USB3 Vision 
USB3 Vision is an I/F for vision applications which is based on the USB 3.0 
standard [84]. It was standardized in 2011 by Automated Imaging Association (AIA), 
which is the largest machine vision trade group in the world [85]. The purpose of USB3 
Vision is to standardize the use of the USB in the machine vision industry [86]. The 
main advantage of this I/F is the use of the USB, as no additional hardware, i.e. 
frame grabbers at the receiver, and no camera manufacturer specific drivers are 
needed [87]. USB3 Vision only defines its own transport layer, which is specially 
adapted to the needs of machine vision technology [86]. 
3.4 Camera Serial Interface 
Camera Serial Interface (CSI) is an I/F between a camera and a processor, 
developed by MIPI Alliance [77]. The CSI advantages are low power consumption, 
high-performance, low cost and low EMI which are crucial characteristics for mobile 
industry [88], [89]. 
The CSI has two versions which are currently used, namely CSI-2 and CSI-3. 
The former builds upon MIPI’s standards D-PHY and C-PHY [77]. The CSI-2 is a 
simple, high-speed, packet based protocol primarily intended for point-to-point image 
and video transmission between cameras and host devices. It is widely adopted by the 
mobile industry [88]–[90]. The CSI-3, on the other hand, is an application layer I/F 
used to integrate cameras on top of UniPro [91]. It was developed in 2012 and supports 
even higher resolutions and frame rates than CSI-2 due to higher maximal data 
transmission rates per lane of M-PHY compared to D- and C-PHY [76], [88]. The 
structural comparison of the two CSI I/Fs is shown in Figure 3.4. 
 
Figure 3.4: Structural comparison between CSI-2 and CSI-3 [92]. 
3.5 GigE Vision 
GigE Vision is a widespread camera I/F standard built upon standard IP 
networks using Ethernet. It was developed by AIA [93]–[95]. The GigE Vision 
protocol merely defines the communication between video sources and sinks which 
use the Ethernet for the interconnect and IP with UDP as the transport layer [94], [95]. 
The main advantages of the GigE Vision are high scalability and low cost as well as 
long transmission lengths due to the use of standard Ethernet hardware [93]. It can also 
utilize Power over Ethernet (PoE) when available [96]. 
3.6 CoaXPress 
CoaXPress is an I/F developed mainly for machine vision applications but is also 
suitable for other high speed data transmission applications. The standard was defined 
in 2010 under supervision of Japan Industrial Imaging Association (JIIA) [97], [98]. 
As the name suggests it uses coaxial cables over which data is transmitted in packets, 
encoded using 8b/10b encoding. Over the same cable, power can be supplied to the 
camera. Moreover, multiple cables can be used to increase I/F bandwidth [99], [100]. 
3.7 Camera Link 
Camera Link is an I/F for vision applications developed by AIA in 2000 [57]. It 
builds upon Channel Link I/F to provide a specification more appropriate for vision 
applications, standardizing connections between cameras and frame grabbers [101]. 
The Camera Link supports six different configurations depending on the number 
of data bits transmitted in a single clock cycle (Table 3.1). Depending on the number 
of data bits transmitted either one or two cables need to be used. The cables can also 
provide power to the camera [101]. 
 
Configuration   Number of data bits   Number of cables 
Lite            10                    1 
Base            24                    1 
Medium          48                    2 
Full            64                    2 
72 bit          72                    2 
80 bit          80                    2 
Table 3.1: Different configurations of Camera Link. 
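The values of Table 3.1 can be used to estimate which configuration a given camera setup needs and what peak data rate it offers. In the sketch below the rule of picking the smallest configuration that carries the required number of bits per clock is an assumption made for illustration, as are the function names. The 64 bits per clock and the 84.85 MHz clock correspond to the full configuration and the highest frequency of the camera described in Section 2.2.

```python
# (name, data bits per clock, number of cables), taken from Table 3.1.
CAMERA_LINK_CONFIGURATIONS = [
    ("Lite", 10, 1),
    ("Base", 24, 1),
    ("Medium", 48, 2),
    ("Full", 64, 2),
    ("72 bit", 72, 2),
    ("80 bit", 80, 2),
]


def smallest_configuration(bits_per_clock):
    """Return the first configuration that carries bits_per_clock data bits."""
    for name, data_bits, cables in CAMERA_LINK_CONFIGURATIONS:
        if bits_per_clock <= data_bits:
            return name, data_bits, cables
    raise ValueError("no Camera Link configuration carries that many bits")


def peak_data_rate_bps(data_bits, clock_hz):
    """Peak payload rate if every clock cycle carries valid data bits."""
    return data_bits * clock_hz


name, data_bits, cables = smallest_configuration(64)
rate_bps = peak_data_rate_bps(data_bits, 84.85e6)
print(name, cables, f"{rate_bps / 8e6:.1f} MB/s")  # Full 2 678.8 MB/s
```

The peak rate is an upper bound, since clock cycles during which LVAL or FVAL is low carry no pixel data.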
Besides data bits, synchronization bits are also transmitted. The Camera Link 
interface defines four synchronization signals: frame valid (FVAL), line valid 
(LVAL), data valid (DVAL) and spare (to be defined for future use) [101]. 
The DVAL signal carries redundant control information. All of the control 
signals should be (redundantly) transmitted by each Channel Link chip as well as the 
clock signal. The 80 bit version uses all of the redundant synchronization signals to 
carry data bits, thus increasing throughput. Depending on the number of transmitted 
data bits either one, two or three 28 bit Channel Link chips are needed. The data bits 
represent raw pixel values [101]. 
Figure 3.5 shows connection between a camera and a frame grabber using 
Camera Link in either base, medium, full or 72 bit configuration. Connection of data 
and synchronization bits to individual Channel Link chips for mono color 8-bit mode 
can be seen in the figure. The Camera Link also defines 4 general-purpose camera 
control signals and an asynchronous serial communication from the camera to the 
frame grabber and vice versa. These lines are shown in the figure as well. 
 
Figure 3.5: Connection of a camera and a frame grabber with Camera Link I/F for either base, 
medium, full or 72 bit configuration. Connections of individual signals for mono color 8-bit mode to 
Channel Link chips (squares) are depicted as well. Note: Px denotes a pixel of bit depth of 8, where x 
is 0-8, and signal “S” denotes spare synchronization bit. 
3.8 Camera Link HS 
Camera Link HS (CLHS) is a new I/F standard developed from Camera Link in 
2012 by AIA [54]. Main improvements are increased maximum cable lengths, low 
jitter and low latency, as well as increased bandwidth with the use of 64b/66b 
encoding. The CLHS defines the use of optical fibers, further increasing maximum 
cable lengths. It also implements cyclic redundancy check (CRC) for error detection 
[102], [103]. Sending of the same image to multiple frame grabbers of multiple 
computers is simplified with the CLHS, which can be used to increase the computing 
capabilities or to establish a failsafe system [104]. 
3.9 Summary of Camera Interfaces 
Table 3.2 shows a comparison of the camera I/Fs presented in the previous 
subsections. The most important characteristics, namely the maximum achievable 
bandwidth and maximum (passive) cable length, are compared. For each I/F the option 
of powering the camera over the same cable used for data transfer is stated as well. 








Interface     Maximum bandwidth       Maximum cable length   Power over cable   Main features 
HDMI          6 GB/s [83]             3 m [83]               No                 Content protection 
USB3 Vision   350 MB/s [86]           5 m [87]               Yes [87]           Use of standard USB 
CSI           … GB/s per lane) [92]   …                      …                  … 
GigE Vision   125 MB/s [93]           100 m [93]             Yes [96]           Use of standard Ethernet hardware, … 
CoaXPress     …                       100+ m [100]           Yes [100]          … 
Camera Link   …                       10 m [53]              Yes [101]          No data overhead 
CLHS          2.1 GB/s [103]          100+ m [102]           Yes [104]          High bandwidths and lengths with included error detection 
Table 3.2: Comparison of camera interfaces. 
Based on Table 3.2 it can be seen that Camera Link I/F provides reasonably high 
bandwidths at relatively long lengths. It also provides the option of powering the 
camera over the same cable used for the I/F. However, the most important reason, why 
it is the most appropriate choice for a real-time image processing system, is the fact 
that it transmits raw pixel values without any overhead. The receiver and transmitter 
are therefore simpler compared to other I/F, meaning they use less FPGA resources. 
As there is no overhead, the processing latencies are shorter. Because the I/F is not 




4 Image Processing Algorithms 
In the following subsections some of the standard image processing algorithms 
are presented and their practical usage illustrated. A good understanding of the 
algorithms is necessary to achieve an efficient hardware implementation. The 
implementation of the image processing algorithms in an FPGA is described in 
Section 5. 
Much of image processing is done on grayscale images. Even in the case of color
images, some processing is done on each individual color channel of the image [13].
The focus of this section is on monochrome digital images, as the utilized camera is
monochrome as well.
Monochrome digital images can be represented as two-dimensional discrete
functions 𝑓(𝑥, 𝑦), where x and y are the coordinates of a pixel in the image. The value
of the function 𝑓(𝑥, 𝑦) represents the light intensity of the pixel, which is commonly
referred to as the gray level of the pixel [13], [107].
4.1 Image Filtering 
Similarly to one-dimensional (1D) signals, two-dimensional (2D) signals can be
described in the frequency domain by applying the 2D discrete Fourier transform (DFT)
[13], [107]. Different frequencies correspond to different properties of the image. For
example, edges, where there is a large change in light intensity, are associated with
higher frequencies. As with 1D signals, certain features of an image can be highlighted
by applying different filters [107].
A filter can be applied to the image either in the spatial domain or in the
frequency domain. In the spatial domain the image is filtered by calculating the 2D
convolution of the image with the impulse response of the filter according to
Equation (4.1).
 $$r(x, y) = \frac{1}{M N} \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} h(m, n)\, f(x - m, y - n) \tag{4.1}$$
𝑟(𝑥, 𝑦) represents the filtered image and ℎ(𝑥, 𝑦) the impulse response of the filter,
which is also called the mask. The parameters M and N denote the size of the image
and the filter.
The same operation can be done in the frequency domain according to Equation (4.2)
due to the convolution theorem, which states that the Fourier transform of a
convolution of two signals equals the point-wise product of their Fourier transforms.
 𝑅(𝑢, 𝑣) = 𝐹(𝑢, 𝑣) ∙ 𝐻(𝑢, 𝑣) (4.2) 
𝑅(𝑢, 𝑣), 𝐹(𝑢, 𝑣) and 𝐻(𝑢, 𝑣) are Fourier transforms of 𝑟(𝑥, 𝑦), 𝑓(𝑥, 𝑦) 
and ℎ(𝑥, 𝑦) respectively. 𝐻(𝑢, 𝑣) is commonly referred to as the system or transfer 
function of the filter. 
To apply the filter in the frequency domain, the input image 𝑓(𝑥, 𝑦) first has to 
be transformed into the frequency domain by applying 2D DFT. After the filtering the 
result has to be transformed from the frequency domain back to the spatial domain 
using 2D inverse DFT (IDFT) in order to get the filtered image [13], [107]. 
The 2D DFT and IDFT can be implemented using the 2D Fast Fourier Transform
(FFT), which is the name for a family of very efficient algorithms for calculating the
DFT. The efficiency of an algorithm is usually expressed by its computational
complexity, i.e. the amount of resources needed by the algorithm [108]. The
computational complexity of the 2D FFT is 𝑂(𝑁𝑀 log(𝑁𝑀)), compared to 𝑂((𝑁𝑀)²) for
the direct computation of the DFT. The same computational complexities apply to
filtering in the frequency and spatial domain, respectively [109], [110].
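The equivalence of the two filtering approaches can be checked numerically. The following is only a minimal sketch in plain Python under stated assumptions: it uses a direct (non-fast) DFT on a tiny 4 by 4 example, treats the image as periodic (circular convolution) and omits any normalization factor of the convolution. The helper names dft2 and circular_conv2 as well as the example values are hypothetical.

```python
import cmath

def dft2(img, inverse=False):
    """Naive 2D DFT or IDFT of a list-of-lists image (direct computation, not an FFT)."""
    rows, cols = len(img), len(img[0])
    sign = 1 if inverse else -1
    out = [[0j] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            acc = 0j
            for x in range(rows):
                for y in range(cols):
                    angle = 2 * cmath.pi * (u * x / rows + v * y / cols)
                    acc += img[x][y] * cmath.exp(sign * 1j * angle)
            out[u][v] = acc / (rows * cols) if inverse else acc
    return out

def circular_conv2(f, h):
    """Direct 2D circular convolution in the spatial domain (no normalization)."""
    rows, cols = len(f), len(f[0])
    return [[sum(h[m][n] * f[(x - m) % rows][(y - n) % cols]
                 for m in range(rows) for n in range(cols))
             for y in range(cols)] for x in range(rows)]

# Hypothetical 4 x 4 image and filter mask (assumed values, for illustration only).
f = [[1, 2, 0, 1], [0, 3, 1, 2], [4, 1, 0, 0], [2, 2, 1, 3]]
h = [[0.5, 0.25, 0, 0.25], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]

# Spatial domain: convolution of the image with the mask.
r_spatial = circular_conv2(f, h)

# Frequency domain: point-wise product of the transforms, then the inverse transform.
F, H = dft2(f), dft2(h)
R = [[F[u][v] * H[u][v] for v in range(4)] for u in range(4)]
r_freq = [[value.real for value in row] for row in dft2(R, inverse=True)]

max_diff = max(abs(r_spatial[x][y] - r_freq[x][y]) for x in range(4) for y in range(4))
print(max_diff)
```

Under these assumptions the two results should differ only by floating-point rounding errors.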
Calculating the filtered image with the FFT approach therefore seems more resource
efficient. However, this is not the case if small spatial filters are used. For most
applications small spatial filters can be constructed by simplifying the full filter
function. For example, small coefficients of the full filter function can be replaced
with zeros. This limits the calculations of Equation (4.1) to just the non-zero filter
coefficients, i.e. a smaller filter:
 $$r(x, y) = \frac{1}{M N} \sum_{m=-a}^{a} \sum_{n=-b}^{b} h_S(m, n)\, f(x - m, y - n) \tag{4.3}$$
where ℎ𝑆(𝑚, 𝑛) is the simplified spatial filter function of size 2a+1 by 2b+1. Usually
𝑎 ≪ 𝑀 and 𝑏 ≪ 𝑁 hold. In that case it is more resource efficient to apply the filter
in the spatial domain than in the frequency domain. When the simplified filter is
correctly constructed, the difference between the results is negligible for practical
purposes [13], [109]. The constructed filters are usually of a square shape. The process
of applying the small spatial filter according to Equation (4.3) is visualized in
Figure 4.1. For simplicity, a spatial filter of size 3 by 3 is used. The process depicted
in the figure is repeated for each pixel of the image. Note that a small problem arises
with the pixels at the boundaries of the image. Their processing would require pixels
outside the image, which are unknown. Different assumptions can be made for those
pixels, the simplest one being that their values are zero [13].
 
Figure 4.1: Applying a simplified spatial filter. 
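The procedure of applying a small mask to every pixel can be sketched in a few lines of Python. This is only an illustrative software model, not the hardware implementation: pixels outside the image are assumed to be zero, any normalization is assumed to be contained in the mask coefficients, and the function name filter_small and the averaging mask are hypothetical.

```python
def filter_small(image, mask):
    """Apply a (2a+1) x (2b+1) mask; pixels outside the image are taken as zero."""
    rows, cols = len(image), len(image[0])
    a, b = len(mask) // 2, len(mask[0]) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for x in range(rows):
        for y in range(cols):
            acc = 0.0
            for m in range(-a, a + 1):
                for n in range(-b, b + 1):
                    xs, ys = x - m, y - n
                    # Skip neighbors outside the image (zero boundary assumption).
                    if 0 <= xs < rows and 0 <= ys < cols:
                        acc += mask[m + a][n + b] * image[xs][ys]
            out[x][y] = acc
    return out

# Hypothetical 3 x 3 averaging mask applied to a constant 3 x 4 test image.
box = [[1 / 9] * 3 for _ in range(3)]
image = [[9, 9, 9, 9] for _ in range(3)]
smoothed = filter_small(image, box)
print(smoothed[1][1], smoothed[0][0])
```

An interior pixel keeps its value, while a corner pixel is darkened because only four of its nine neighbors lie inside the image. This illustrates the boundary effect of the zero assumption.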
An example of an astronomical image is presented in Figure 4.2. It shows the
planet Saturn. The image is a combination of multiple images taken during the fourth
year of the Cassini space mission [111]. On close inspection, six of Saturn’s moons
can be seen. For easier detection their positions are highlighted in red. Figure 4.3
shows the frequency representation of the image. Note that only the magnitude spectrum
of the image is shown and that the magnitudes are scaled according to
Equation (4.4).
 𝐺(𝑢, 𝑣) = 𝑙𝑜𝑔(1 + |𝐹(𝑢, 𝑣)|) (4.4) 
In Equation (4.4) |𝐹(𝑢, 𝑣)| is the magnitude spectrum of the image and 𝐺(𝑢, 𝑣) 
the image to be plotted [109]. 
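The scaling from Equation (4.4) can be illustrated with a short Python sketch. It assumes the natural logarithm and uses a direct DFT on a tiny constant test image; the function name log_magnitude_spectrum and the test image are hypothetical.

```python
import cmath
import math

def log_magnitude_spectrum(img):
    """Return G(u, v) = log(1 + |F(u, v)|) using a direct 2D DFT (illustration only)."""
    rows, cols = len(img), len(img[0])
    G = [[0.0] * cols for _ in range(rows)]
    for u in range(rows):
        for v in range(cols):
            F_uv = sum(img[x][y] * cmath.exp(-2j * cmath.pi * (u * x / rows + v * y / cols))
                       for x in range(rows) for y in range(cols))
            G[u][v] = math.log(1 + abs(F_uv))
    return G

# A constant image only has a zero-frequency (DC) component.
img = [[10] * 4 for _ in range(4)]
G = log_magnitude_spectrum(img)
print(G[0][0], G[1][2])
```

The logarithm compresses the large dynamic range of the spectrum, and adding 1 keeps the plotted value at zero where the magnitude is zero.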
 
Figure 4.2: Image of planet Saturn with its six moons taken during Cassini space mission [111]. Note 
that the image was transformed to grayscale and downscaled to the size of 1216 x 1936. The positions 
of Saturn’s moons are highlighted in red. 
 
Figure 4.3: Magnitude spectrum of the image from Figure 4.2. 
Captured images from cameras, like all real signals, include some noise, i.e. an
unwanted component of the image. Low-pass filters (LPFs) are used to minimize the
high-frequency noise, which is reflected in the small details of the image. With the
unwanted details removed from the image by an LPF, the relevant information is more
easily extracted [13], [109].
In Figure 4.4 the spatial representation of an LPF with a Gaussian distribution is
presented. The standard deviation σ of the Gaussian function used in the spatial
domain is 4.6. Note that the values of the filter quickly fall towards zero. Therefore,
the center values are enlarged for clearer presentation. The corresponding frequency
representation of the LPF is shown in Figure 4.5.
 
Figure 4.4: Spatial representation of a Gaussian LPF with standard deviation of 4.6. The centered 
values are enlarged. 
 
Figure 4.5: Frequency representation of a Gaussian LPF with a standard deviation of 4.6 in the
spatial domain.
The Gaussian function is commonly used as an LPF for images because it declines
towards zero without any ringing [13]. Moreover, the Gaussian function falls to
approximately 1% of its maximum already at 3σ away from its center and even below
4 ppm for values further than 5σ. For practical purposes the influence of those values
can be neglected [112]. This limits the calculation of the applied filter to just the
non-negligible coefficients as in Equation (4.3), which vastly simplifies the
calculation of the filtered image.
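The quoted decay of the Gaussian function can be verified with a few lines of Python. The sketch also builds a truncated 1D Gaussian mask; truncating at 3σ and normalizing the coefficients to a sum of one are assumptions made for the illustration, and the function name gaussian_kernel_1d is hypothetical.

```python
import math

def gaussian_kernel_1d(sigma, radius):
    """Sampled Gaussian with 2 * radius + 1 taps, normalized to a sum of one."""
    weights = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

sigma = 4.6

# Value of the Gaussian relative to its maximum at 3 and 5 standard deviations.
rel_3sigma = math.exp(-0.5 * 3 ** 2)
rel_5sigma = math.exp(-0.5 * 5 ** 2)

# Assumed truncation: keep only the coefficients within 3 sigma of the center.
radius = math.ceil(3 * sigma)
kernel = gaussian_kernel_1d(sigma, radius)

print(f"{rel_3sigma:.4f} {rel_5sigma:.2e} {len(kernel)}")
```

With σ = 4.6 this truncation leaves 29 coefficients per dimension, which is far fewer than the number of pixels in a row or column of the image.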
The result of filtering the image from Figure 4.2 with the presented LPF
according to Equation (4.1) is shown in Figure 4.6. The frequency representation of
the filtered image is shown in Figure 4.7. When comparing the spatial images of
Figure 4.2 and Figure 4.6 it can be seen that the small details of the original image
(Figure 4.2) are mostly removed. It can also be observed that the frequency spectrum
of the filtered image (Figure 4.7) is attenuated in the high-frequency range, which is
in accordance with Equation (4.2). Note that the same results would have been
achieved if the filtering had been done in the frequency domain. Practically the same
results would also have been achieved if the filtering had been done with a spatial
filter consisting of just the zoomed-in values in Figure 4.4.
 
Figure 4.6: Result of filtering the image with the LPF. 
 
Figure 4.7: Frequency representation of the image filtered with an LPF.
4.2 Edge Detection 
Another common application in image processing is edge detection [13], [109],
[113]. As already mentioned, edges are represented by discontinuities in the light
intensity of the image and correspond to high frequencies. The purpose of edge
detection is to highlight those discontinuities and to de-emphasize regions where the
light intensity is mostly constant [13].
Differentiation operators are used for detecting changes in the spatial domain. Those
operators act as high-pass filters. Therefore, edge detection with differentiation
operators is a form of image filtering [109]. The most commonly used operators for
this purpose are the Laplacian, Prewitt and Sobel operators. The first is based on the
second-order derivative [13], while the latter two are based on the first-order
derivative [109]. All of these operators give similar results for detecting edges [113].
The simplest one of them is the Prewitt operator. It detects edges by calculating the
gradient of the image in the horizontal and vertical directions. The corresponding
masks for this operator are shown in Figure 4.8. The edges are obtained by calculating
the magnitude of the gradient.
 
Figure 4.8: Prewitt operator masks for horizontal and vertical directions respectively. 
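Edge detection with the Prewitt operator can be sketched in Python as follows. The masks below are the commonly used form of the Prewitt operator and are assumed to correspond to those in Figure 4.8. To avoid artificial edges at the image boundary, only interior pixels are processed in this sketch; the function names and the test image are hypothetical.

```python
import math

# Commonly used Prewitt masks (assumed to match Figure 4.8).
PREWITT_H = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]    # intensity change along a row
PREWITT_V = [[-1, -1, -1], [0, 0, 0], [1, 1, 1]]    # intensity change along a column

def apply_mask(image, mask, x, y):
    """Weighted sum of the 3 x 3 neighborhood centered at row x and column y."""
    return sum(mask[i][j] * image[x + i - 1][y + j - 1]
               for i in range(3) for j in range(3))

def prewitt_magnitude(image):
    """Gradient magnitude for interior pixels; boundary pixels are left at zero."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for x in range(1, rows - 1):
        for y in range(1, cols - 1):
            gh = apply_mask(image, PREWITT_H, x, y)
            gv = apply_mask(image, PREWITT_V, x, y)
            out[x][y] = math.hypot(gh, gv)
    return out

# Test image with a vertical edge between the second and third column.
image = [[0, 0, 10, 10] for _ in range(4)]
edges = prewitt_magnitude(image)
print(edges[1])
```

Both interior pixels next to the intensity step give a gradient magnitude of 30, while a region of constant intensity would give zero.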
Figure 4.9 shows the magnitude of the gradient of the image from Figure 4.2.
The gradient was calculated using the Prewitt operator. In Figure 4.9 Saturn’s moons
can be seen more clearly.
 
Figure 4.9: Detected edges of the image from Figure 4.2 using Prewitt operator. 
However, the small details in the image can make higher-level processing based
on the detected edges more difficult [114]. Therefore, it is common to apply an LPF
before detecting the edges. The edges detected in the filtered image (Figure 4.6) are
shown in Figure 4.10. With the small details removed, further processing is
simplified.
 
Figure 4.10: Detected edges of the image from Figure 4.6 using Prewitt operator. 
4.3 Background Subtraction 
Background subtraction is a commonly used method for the detection of moving
objects and their subsequent recognition and tracking. It is used in many different
applications, from surveillance, medical image processing and virtual reality [3],
[115], [116] to space applications, e.g. space debris tracking [117].
The idea behind it is to subtract a reference image of the background 𝐵(𝑥, 𝑦) from
the current image 𝑓(𝑥, 𝑦). If the absolute difference between the pixels of the two
images is above a certain threshold 𝑇ℎ, those pixels are classified as foreground,
i.e. as part of an object which entered the scene [3], [116]. This is mathematically
expressed by Equation (4.5).
 |𝑓(𝑥, 𝑦) − 𝐵(𝑥, 𝑦)| > 𝑇ℎ (4.5) 
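The classification from Equation (4.5) can be sketched in Python as follows. The function name foreground_mask, the pixel values and the threshold are hypothetical and only serve as an illustration.

```python
def foreground_mask(frame, background, threshold):
    """Mark pixels whose absolute difference to the background exceeds the threshold."""
    return [[abs(f - b) > threshold for f, b in zip(frame_row, background_row)]
            for frame_row, background_row in zip(frame, background)]

# Hypothetical 2 x 3 background and current frame with two changed pixels.
background = [[20, 21, 19], [20, 20, 22]]
frame = [[21, 90, 19], [18, 20, 95]]
mask = foreground_mask(frame, background, threshold=10)
print(mask)
```

Note that in a hardware implementation with unsigned pixel values the absolute difference has to be formed carefully, for example by subtracting the smaller value from the larger one, to avoid underflow.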
The biggest challenge with background subtraction is how to acquire a valid
representation of the background. It must not include any moving objects and has to be
regularly updated to take the changing illumination into account. There are many
approaches to tackling these issues [115], [118].
4.4 Flat Field Correction 
When capturing an image with a camera, the image is distorted to some degree.
Some fixed-pattern noise is present due to the distortion caused by the camera’s
lens and the small manufacturing differences between the responsivities of the
individual pixels. The distortions make image processing tasks such as object
recognition more difficult. Flat field correction (FFC) is applied to correct those
distortions [119]–[123].
The distortion of the original image 𝑈(𝑥, 𝑦) can be mathematically described as: 
 𝑁(𝑥, 𝑦) = 𝑓(𝑈(𝑥, 𝑦)) (4.6) 
where 𝑁(𝑥, 𝑦) is the distorted image captured by the camera. One of the simplest
ways of describing the distortion is a linear model according to Equation (4.7):
 𝑁(𝑥, 𝑦) = 𝑈(𝑥, 𝑦)𝑆𝑀(𝑥, 𝑦) + 𝑆𝐴(𝑥, 𝑦) (4.7) 
where 𝑆𝑀(𝑥, 𝑦) is the multiplicative and 𝑆𝐴(𝑥, 𝑦) the additive parameter of the
distortion. If the parameters of the distortion are known, the original
image can be calculated according to Equation (4.8).
 $$U(x, y) = \frac{N(x, y) - S_A(x, y)}{S_M(x, y)} \tag{4.8}$$
To acquire the model parameters, two reference images have to be taken. The first
reference image is taken with no light reaching the sensor, i.e. a dark image. When no
light reaches the sensor, all pixels of the acquired image should be zero. However,
this is not the case due to the dark current of the sensor [119], [120], [122]. Because
𝑈(𝑥, 𝑦) = 0 for the dark image, the parameter 𝑆𝐴(𝑥, 𝑦) follows from Equation (4.7)
according to Equation (4.9).
 𝑆𝐴(𝑥, 𝑦) =  𝐷(𝑥, 𝑦) (4.9) 
The variable 𝐷(𝑥, 𝑦) denotes the response of the camera to the dark image. The
other reference image is taken under uniform illumination of the sensor which produces
values close to, but not exceeding, the saturation level of the pixels. Such an image
is also referred to as a flat image. From Equation (4.7) and based on Equation (4.9) the
parameter 𝑆𝑀(𝑥, 𝑦) can be calculated according to Equation (4.10):
 $$S_M(x, y) = \frac{F(x, y) - D(x, y)}{K} \tag{4.10}$$
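The complete correction chain of Equations (4.8) to (4.10) can be sketched in Python. The sketch rests on assumptions which are not stated explicitly above: 𝐹(𝑥, 𝑦) is taken to be the response of the camera to the flat image and 𝐾 a constant which equals the undistorted value of the flat image. The function name flat_field_correct and all numeric values are hypothetical.

```python
def flat_field_correct(raw, dark, flat, k):
    """Per-pixel correction U = (N - S_A) / S_M with S_A = D and S_M = (F - D) / K."""
    corrected = []
    for raw_row, dark_row, flat_row in zip(raw, dark, flat):
        row = []
        for n, d, f in zip(raw_row, dark_row, flat_row):
            s_a = d                        # Equation (4.9)
            s_m = (f - d) / k              # Equation (4.10); assumes f != d
            row.append((n - s_a) / s_m)    # Equation (4.8)
        corrected.append(row)
    return corrected

# Hypothetical sensor model: per-pixel gain and dark offset (assumed values).
gain = [[1.0, 0.8], [1.2, 0.5]]
dark = [[2, 4], [3, 1]]
k = 200                                    # assumed undistorted level of the flat image
scene = [[50, 50], [100, 100]]             # undistorted image U

# Images as the camera would capture them according to the linear model (4.7).
flat = [[k * g + d for g, d in zip(g_row, d_row)] for g_row, d_row in zip(gain, dark)]
raw = [[u * g + d for u, g, d in zip(u_row, g_row, d_row)]
       for u_row, g_row, d_row in zip(scene, gain, dark)]

corrected = flat_field_correct(raw, dark, flat, k)
print(corrected)
```

Under the linear model the corrected image reproduces the undistorted scene up to rounding errors. A pixel whose flat and dark responses are equal would lead to a division by zero and needs separate handling.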
4.5 Image Averaging 
Another common way to improve the quality of the captured images is to combine
the information from multiple images [111]. One such simple approach is to average
multiple images in a sequence, which reduces the noise in the output image [13]. To
achieve this, each pixel can be treated as a 1D time signal which is filtered with a
running average filter according to Equation (4.11):
 $$y(n) = \frac{1}{L} \sum_{i=0}^{L-1} x(n - i) \tag{4.11}$$
where 𝑦(𝑛) is a pixel of the averaged image and 𝑥(𝑛 − 𝑖) are the last L consecutive
values of that pixel [124]. The form of the running average filter from Equation (4.11)
is very inefficient for hardware implementations in an FPGA as it requires L memory
accesses and L-1 additions. A more efficient form is presented in Equation (4.12).
 $$y(n) = y(n - 1) + \frac{1}{L} \left[ x(n) - x(n - L) \right] \tag{4.12}$$
The filter form from Equation (4.12) requires only one addition and one
subtraction as well as four memory accesses, namely to fetch the L-th previous value
of the pixel 𝑥(𝑛 − 𝐿), to store the current value of the pixel 𝑥(𝑛), to fetch the previous
averaged value of the pixel 𝑦(𝑛 − 1) and to store the current averaged value of the
pixel 𝑦(𝑛).
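That the recursive form gives the same output as the direct running average can be checked with a short Python sketch. Both functions assume that all samples before the start of the sequence are zero; the function names and the sample values are hypothetical.

```python
def running_average_direct(x, L):
    """Average of the last L samples; samples before the start are taken as zero."""
    return [sum(x[n - i] for i in range(L) if n - i >= 0) / L for n in range(len(x))]

def running_average_recursive(x, L):
    """Recursive form of Equation (4.12): one addition and one subtraction per sample."""
    y, previous = [], 0.0
    for n in range(len(x)):
        oldest = x[n - L] if n - L >= 0 else 0.0
        previous = previous + (x[n] - oldest) / L
        y.append(previous)
    return y

# Hypothetical sequence of values of a single pixel over consecutive frames.
samples = [8, 16, 8, 0, 24, 8, 8, 16, 40, 8, 0, 8]
direct = running_average_direct(samples, 4)
recursive = running_average_recursive(samples, 4)
print(direct == recursive)
```

In a fixed-point hardware implementation the equivalence additionally depends on how the division by L is rounded, which is not modeled here.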
Moreover, the filter form in Equation (4.12) can be further simplified
by assuming 𝑥(𝑛 − 𝐿) ≅ 𝑦(𝑛 − 1), which results in:
 $$y(n) = \left(1 - \frac{1}{L}\right) y(n - 1) + \frac{1}{L}\, x(n) \tag{4.13}$$
With the simplified form only two memory accesses are needed, namely one to
fetch 𝑦(𝑛 − 1) and one to store the calculated 𝑦(𝑛). The simplification is valid if the
pixels are not changing much over time. The difference between the simplified and
original non-simplified running average filters can be observed by plotting their
frequency characteristics (Figure 4.11).
 
Figure 4.11: Magnitude of the frequency response of the simplified (red) and original non-simplified 
(blue) running average filters of length 8. 
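The frequency characteristics of the two filter forms can also be evaluated numerically. The following Python sketch uses the closed-form magnitude responses of a length-L running average and of the first-order recursive filter 𝑦(𝑛) = (1 − 1/𝐿)𝑦(𝑛 − 1) + 𝑥(𝑛)/𝐿. It assumes that the cut-off frequency is defined as the −3 dB point and that the frequency is expressed in rad/sample; the function names are hypothetical.

```python
import math

L = 8

def mag_running_average(w):
    """Magnitude response of the length-L running average at w rad/sample."""
    if w == 0:
        return 1.0
    return abs(math.sin(w * L / 2) / (L * math.sin(w / 2)))

def mag_simplified(w):
    """Magnitude response of y(n) = (1 - 1/L) y(n-1) + x(n)/L at w rad/sample."""
    a = 1 / L
    return a / math.sqrt(1 - 2 * (1 - a) * math.cos(w) + (1 - a) ** 2)

def cutoff(mag, lo=1e-6, hi=0.6):
    """Bisection for the -3 dB point; assumes the response falls monotonically on [lo, hi]."""
    target = 1 / math.sqrt(2)
    for _ in range(60):
        mid = (lo + hi) / 2
        if mag(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

wc_simplified = cutoff(mag_simplified)
wc_running = cutoff(mag_running_average)
print(f"{wc_simplified:.4f} {wc_running:.4f}")
```

Under these assumptions the simplified filter has the lower cut-off frequency of the two, and both responses approach one at low frequencies.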
The normalized cut-off frequencies of the simplified and non-simplified running 
average filters from Figure 4.11 are 0.133 and 0.350, respectively. The figure clearly 
shows that the filters are comparable only at low frequencies, i.e. if the signal is slowly 
changing. Note that this is in accordance with the initial assumption about simplifying 




5 System Design 
In order for the system to process the images coming from the camera they need 
to be acquired by the FPGA first. Therefore, a Camera Link receiver (Rx) needs to be 
designed. Similarly, to send the processed images further to the intended receiver, a 
Camera Link transmitter (Tx) is needed as well. 
As indicated in Section 4, some of the image processing algorithms need to
simultaneously read up to two images from memory, or to write one image to it while
reading another one out. The internal SRAM is not large enough to store even a
single image. Therefore, external DDR memory needs to be used. To recall, the
maximum throughput of the DDR memory used is 5.32 Gbps (Section 2.3.2). For the
system to process the incoming images in real time with algorithms which require
two memory accesses for the processing of each individual pixel, the data rate
of the incoming pixels is limited to 2.66 Gbps. At the full image size of 1216 by 1936
pixels with a bit depth of 8 bits this means that the maximum achievable frame rate is
141 fps. However, due to the refreshing of the DDR memory its average throughput is
even lower (Section 2.3.2), and so is the achievable frame rate. To achieve such frame
rates the Camera Link has to operate at either 74.25 MHz or 84.85 MHz [30]. As
explained in Section 3.7, the Camera Link serializes signals 7:1, meaning they arrive
at the FPGA at 519.75 MHz and 593.95 MHz, respectively. These frequencies are above
the limit of the LVDS I/Os of the FPGA (Section 2.3.1). However, when operating
at 74.25 MHz the data could still be correctly acquired if the I/Os were configured to
use their DDR registers, as this effectively doubles the allowed data rate to 535 Mbps.
Nevertheless, this would vastly increase the complexity of the Camera Link I/F design
as well as of the implementation of the processing algorithms. Therefore, it was
decided to use the Camera Link I/F at the frequency of 37.125 MHz, hence achieving
the frame rate of 118.23 fps [30].
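The quoted limits follow from simple arithmetic, which can be reproduced with the short Python sketch below. Only the figures stated in the text are used; the variable names are hypothetical.

```python
DDR_THROUGHPUT_BPS = 5.32e9          # maximum DDR throughput stated in the text
ACCESSES_PER_PIXEL = 2               # memory accesses needed for each pixel
IMAGE_ROWS, IMAGE_COLS = 1216, 1936  # full image size
BIT_DEPTH = 8
SERIALIZATION_FACTOR = 7             # Camera Link serializes the signals 7:1

# Maximum data rate of the incoming pixels and the resulting frame rate.
max_pixel_rate_bps = DDR_THROUGHPUT_BPS / ACCESSES_PER_PIXEL
bits_per_image = IMAGE_ROWS * IMAGE_COLS * BIT_DEPTH
max_fps = max_pixel_rate_bps / bits_per_image

# Serial rates at the FPGA inputs for the two candidate parallel clock frequencies.
serial_rates_mhz = [f * SERIALIZATION_FACTOR for f in (74.25, 84.85)]

print(f"{max_pixel_rate_bps / 1e9:.2f} Gbps, {max_fps:.1f} fps, {serial_rates_mhz}")
```

The calculation neglects the refreshing of the DDR memory, so it gives an upper bound of the achievable frame rate.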
In the following subsections the implementation of the developed system for
real-time image processing is presented. The image processing algorithms
(Section 5.2) are implemented in Simulink, from which the HDL descriptions of the
digital circuits are generated. However, Simulink does not support the use of
external memory in automatic HDL code generation. Therefore, a circuit for writing
to and reading from the DDR memory is developed in Libero [125]. Libero is also
used to implement the Camera Link Rx and Tx (Section 5.1).
5.1 Camera Link Interface 
The developed Camera Link Rx and Tx are designed for the mentioned operating
frequency of 37.125 MHz. The designed components are compatible with the base,
medium, full and 72 bit Camera Link configurations. As explained in Section 3.7, the
Camera Link builds upon the Channel Link I/F. Therefore, a 28 bit Tx and Rx for the
Channel Link are developed first.
Microsemi already provides a Tx and an Rx core, specially designed for
SmartFusion2, which are meant to implement the Channel Link I/F [126]. However, the
cores do not abide by the Channel Link specifications [62], [66], [67] as they require
a training pattern for synchronization [126]. The cores developed for the system do
not need any training patterns for synchronization and use 80 percent fewer DFFs and
55 percent fewer LUTs than Microsemi’s cores.
5.1.1 Channel Link Transmitter 
Figure 5.1 shows the top schematic of the designed Channel Link Tx. As can be
seen from the figure, all the outputs of the Tx component are differential. The circuit
includes a clock conditioning circuit (“CCC”). The “CCC” is used to generate a serial
clock (clk_ser) for the serialized output data (TX_DATA) from the input parallel clock
(clk_par). The clk_par is the clock of the input parallel data (CL_data). The frequency
of clk_ser is 7 times higher than the frequency of clk_par.
 
Figure 5.1: Top schematic of Channel Link transmitter. 
The component named “channel_link_tx” does the 7:1 serialization of each of 
its four data inputs and generates the transmitted clock with a duty cycle of 4:3. For 
each serialization a Parallel-In-Serial-Out (PISO) register is used as shown in Figure 
5.2. The figure represents a simplified circuit as the asynchronous active low clear 
signals of DFFs are not shown. The parallel data are written into the DFFs when a 
rising edge of the clk_par is detected and then shifted out one by one in the next 7 
clock cycles. In the figure it can be seen that there is an additional DFF used for 
detecting the rising edge of clk_par. 
 
Figure 5.2: Schematic of PISO register. 
The output clock is generated by the code snippet in Figure 5.3. It implements
a counter based on which the value of the clock signal (clk_sig) is chosen. The
values are chosen such that clk_sig has a duty cycle of 4:3 and is correctly aligned
with the output data according to Figure 3.2. By converting clk_sig to a differential
signal the output clock of the Tx (TX_CLK) is generated.
 




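 -- Process header (assumed sensitivity list: serial clock and asynchronous reset)
 process (clk_ser, nrst)
 begin
  -- Asynchronous active-low reset
  if nrst='0' then
   clk_sig <= '0';
   clk_counter <= (others=>'0');
  elsif rising_edge(clk_ser) then
   -- Count the 7 serial clock cycles of one parallel clock period (0 to 6)
   clk_counter <= clk_counter + 1;
   if clk_counter >= 6 then
    clk_counter <= (others=>'0');
   end if;
   -- Select the output clock level for the current position in the period
   clk_sig <= CLOCK_PATTERN(to_integer(clk_counter));
  end if;
 end process;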
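The behavior of the counter can be illustrated with a small Python model. The content of CLOCK_PATTERN is not given in the listing; the model assumes that it equals the clock pattern “1100011” expected by the receiver and that it is indexed from left to right. The function name simulate_tx_clock is hypothetical.

```python
CLOCK_PATTERN = "1100011"   # assumption: same pattern as expected by the receiver

def simulate_tx_clock(n_cycles):
    """Model the counter: one clk_sig value is produced per rising edge of clk_ser."""
    counter, trace = 0, []
    for _ in range(n_cycles):
        # Both signal assignments use the counter value from before the clock edge.
        clk_sig = CLOCK_PATTERN[counter]
        counter = 0 if counter >= 6 else counter + 1
        trace.append(clk_sig)
    return "".join(trace)

trace = simulate_tx_clock(14)
print(trace)
```

Within each period of 7 serial clock cycles the modeled clock signal is high for 4 cycles and low for 3 cycles, which corresponds to the duty cycle of 4:3.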
The component titled “ChannelLink_to_LVDS” simply maps the input data bits to
the individual LVDS lines according to Figure 5.4, so that the signals comply with
the signal naming of a standard Channel Link chip [63].
 
Figure 5.4: Output signals of a standard Channel Link chip [63]. 
5.1.2 Channel Link Receiver 
Figure 5.5 shows the top schematic of the designed Channel Link Rx. In the
figure, the conversion from differential to single-ended signals of the input data
signals (data_ser_in) and the input parallel clock (clk_par) of the Channel Link I/F is
not shown. Similarly to the Tx, the Rx includes a “CCC”. It generates a serial clock
(clk_ser) from the clk_par. The frequency of the clk_ser is 7 times higher than the
frequency of the clk_par and is used to sample the incoming data signals. The clk_ser
has an exactly defined phase relative to the clk_par. Setting the right phase of clk_ser
is needed to sample the data signals at the center of the eye diagram [127]. The Channel
Link does not use any synchronization patterns. Therefore, the transmitted clock signal
(clk_par) with its distinct pattern (Figure 3.2) is used as a source of continuous
synchronization for the correct acquisition of the data.
 
Figure 5.5: Top schematic of Channel Link receiver. 
The sampling of each of the four input data lines is done with its own Serial-In-
Parallel-Out (SIPO) register in the “channel_link_rx” component. A simplified version 
of SIPO is shown in Figure 5.6. An output register is added to receive parallel data at 
each rising edge of the clk_par. 
 
Figure 5.6: Schematic of SIPO register. 
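The interplay of the PISO register of the Tx and the SIPO register of the Rx can be illustrated with a behavioral Python model. It assumes that the most significant bit is sent first and that the receiver is perfectly aligned to the word boundaries. Neither is taken from the schematics, and the function names are hypothetical.

```python
def piso_serialize(words):
    """Shift each 7 bit word out bit by bit, most significant bit first (assumed order)."""
    bits = []
    for word in words:
        bits.extend((word >> i) & 1 for i in range(6, -1, -1))
    return bits

def sipo_deserialize(bits):
    """Shift the bits into a 7 bit register and latch it every 7th serial clock cycle."""
    shift_reg, words = 0, []
    for i, bit in enumerate(bits, start=1):
        shift_reg = ((shift_reg << 1) | bit) & 0x7F
        if i % 7 == 0:      # corresponds to the rising edge of the parallel clock
            words.append(shift_reg)
    return words

words = [0b1010011, 0b0000001, 0b1111110]
recovered = sipo_deserialize(piso_serialize(words))
print(recovered == words)
```

The model recovers the original words only because the word boundaries are known. In the real receiver the boundaries have to be found from the sampled clock pattern.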
The received input clock clk_par also gets sampled within its own SIPO 
structure via clk_par_data input of the “channel_link_rx” component. The clock 
sampled by the SIPO is denoted as clk_par_data_SIPO. The SIPO register is 13 bits 
long in order to be able to acquire the incoming data correctly even if there is a 
misalignment between the received clock signal clk_par and the data signals 
data_ser_in and clk_par_data. The misalignments happen due to different delays in 
their paths through the circuit because clocks and data signals are routed differently. 
The relationship between clocks and data of the Rx is visualized in Figure 5.7. 
 
Figure 5.7: Relationship between clocks and data of Channel Link receiver: (I) misalignment between 
the clock clk_par and its data signal clk_par_data, (II) defined phase between the clocks clk_par and 
clk_ser to sample data at the center of the eye diagram, and (III) misalignment between clk_par and 
the start of the first bit of the incoming data. 
The SIPO of length 13 bits with additional logic is able to acquire the data
correctly up to a misalignment of 6/7 of the parallel clock period. A shorter SIPO
would allow for a smaller margin of error in the alignment. A longer one, on the
other hand, would improve the margin, but it would then no longer be possible to
extract the right data, as two or more complete parallel data words could be present
in the register. Therefore, a SIPO of length 13 is used.
The additional logic to extract the correct 7 data bits out of the 13 bit SIPO
register is also implemented in the “channel_link_rx” component. All contiguous 7 bit
subsets of the clk_par_data_SIPO are compared with the expected clock pattern
(clk_pattern), namely the sequence “1100011”. The comparison is done based on the
schematic presented in Figure 5.8.
 
Figure 5.8: Comparison of received clock signal with the expected clock pattern. 
The results of the comparators are connected to a circuit (Figure 5.9) which
outputs the set of data bits at the same position as where the match between the
expected clock pattern and the sampled clock is found. Should multiple matches be
found by mistake, the oldest data is output. If no matches are found, a constant value
of “0000000” is output and the signal aligned is cleared. This is done separately for
each of the four input data lines.
 
Figure 5.9: Extracting data bits based on clock pattern match. 
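The extraction of the data bits based on the clock pattern match can be sketched with the following Python model. The contents of the 13 bit SIPO registers are represented as strings with the oldest bit first, which is an assumption about the bit ordering. The function name extract_word and the example data are hypothetical.

```python
CLK_PATTERN = "1100011"

def extract_word(clk_sipo, data_sipo):
    """Return (aligned, word) for two 13 bit SIPO contents, oldest bit first."""
    for start in range(7):                        # 7 windows of 7 bits, oldest first
        if clk_sipo[start:start + 7] == CLK_PATTERN:
            return True, data_sipo[start:start + 7]
    return False, "0000000"

# Serial streams as transmitted: three consecutive words on the clock and a data line.
clk_stream = CLK_PATTERN * 3
data_stream = "1010101" + "0011100" + "1111000"

# Content of the SIPO registers for misalignments of 0 to 6 bits.
results = [extract_word(clk_stream[o:o + 13], data_stream[o:o + 13]) for o in range(7)]
for offset, result in enumerate(results):
    print(offset, result)
```

For each of the seven possible misalignments exactly one window matches the clock pattern, as no cyclic shift of “1100011” other than the pattern itself equals the pattern, and a complete data word is extracted from the same position.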
Similarly to the Tx, the Rx (Figure 5.5) includes a component titled
“LVDS_to_ChannelLink” which maps the received data bits from each individual
LVDS line to the standard Channel Link bit positions as shown in Figure 5.4.
5.1.3 Camera Link Transmitter 
Three instances of the designed Channel Link Tx are joined into a Camera Link
Tx as shown in Figure 5.10. As can be seen from the figure, a single “CCC” is enough
to produce the clocks for all “channel_link_tx” components. A
“CameraLink_to_ChannelLink” component is added to each Channel Link module for
connecting the actual pixel values and synchronization signals to the standard input
signals of a Channel Link module [101]. For the purpose of the thesis it is assumed
that the pixel values have a constant bit depth of 8 bits. The figure also shows the
control signals of the Camera Link I/F. The control signals are simply routed through
the FPGA. Note the multiple inverters in the schematic. They undo the inversion
introduced by the FMCs [29], so that the data is sent with the correct polarity.
 
Figure 5.10: Schematics of Camera Link transmitter. 
5.1.4 Camera Link Receiver 
As with the Tx module, three Channel Link Rx instances are joined into a Camera
Link Rx (Figure 5.11). As mentioned in Section 3.7, multiple clocks are redundantly
sent via the Camera Link I/F. As they are redundant, only one of them is used to
generate the serial clock (clk_ser) and the parallel data clock (clk_par). The components
“ChanneLink_to_CameraLink” do the translation from Channel Link signals to actual
pixel bits and synchronization signals. The Camera Link control signals are shown as
well. Inverters are again needed due to the FMCs.
 
Figure 5.11: Schematics of Camera Link receiver. 
5.2 Image Processing Algorithms 
The following subsections describe the implementations of the image processing
algorithms from Section 4. The implementations take into account the targeted
hardware platform, i.e. the chosen camera and FPGA, and ensure that the incoming
pixels are processed in hard real-time, i.e. that no frames are dropped and thus no
information is lost. As already mentioned, a way to meet the hard real-time
processing requirements of the system is to process the incoming pixels at the same
rate as the new pixels are arriving. The following implementations utilize the parallel
processing capabilities of the FPGA in order to process the high data throughput of the
camera.
To recall, the algorithms should process the images of the camera at their
maximum size, namely 1216 by 1936 pixels, and at the frame rate of 118.23 fps. To
achieve this frame rate the camera uses the full Camera Link configuration at the
frequency of 37.125 MHz. In the full Camera Link configuration the camera sends a
set of 8 sequential pixels from the same row each clock cycle. The pixels have a bit
depth of 8 bits. Between consecutive frames 40 blank rows are sent, and at the start of
each row 64 additional blank pixels. The blank rows and pixels are meant for
synchronization together with the sent synchronization signals FVAL, LVAL and DVAL.
Together with the blank rows and pixels the size of the full frame is 1256 by 2000
pixels [30]. The blank pixels represent only a small part of the frame. For simplicity
of the implementation they are treated as part of the image and processed together
with it, as they do not influence the result.
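The relation between the frame geometry, the clock frequency and the frame rate can be reproduced with the short Python sketch below. It uses only the figures stated above; the variable names are hypothetical.

```python
ACTIVE_ROWS, ACTIVE_COLS = 1216, 1936
BLANK_ROWS, BLANK_COLS = 40, 64
PIXELS_PER_CLOCK = 8
CLOCK_HZ = 37.125e6

# Size of the full frame including the blank rows and pixels.
full_rows = ACTIVE_ROWS + BLANK_ROWS
full_cols = ACTIVE_COLS + BLANK_COLS

# Number of pixel sets per row and clock cycles per frame.
sets_per_row = full_cols // PIXELS_PER_CLOCK
clocks_per_frame = full_rows * sets_per_row
frame_rate = CLOCK_HZ / clocks_per_frame

# Share of the frame which is taken up by blank pixels.
blank_fraction = 1 - (ACTIVE_ROWS * ACTIVE_COLS) / (full_rows * full_cols)

print(full_rows, full_cols, sets_per_row, round(frame_rate, 2), round(blank_fraction, 4))
```

The blank rows and pixels make up roughly 6 percent of the full frame, so processing them together with the image costs little additional throughput.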
The algorithms from Section 4 can be divided into two categories based on their 
memory requirements. Some algorithms require more memory than is available in the 
FPGA itself in the form of internal SRAM. Therefore, external DDR memory needs 
to be used. 
5.2.1 Algorithms Using Internal Memory 
The presented image filtering (Section 4.1) and edge detection (Section 4.2) 
techniques require information about the pixels in the neighborhood of the pixel being 
processed, i.e. from multiple columns and rows. The size of the neighborhood 
corresponds to the size of the filter mask used. The pixels of the image from the camera 
are arriving in rows. Therefore, a number of rows corresponding to the size of the filter 
need to be stored in memory. 
5.2.1.1 Storing and Accessing Pixels in SRAM 
Storing a single row of the mentioned full frame of the camera requires 2 KB of
memory. There is a total of 561 KB of SRAM available in the used FPGA
[27]. This is more than enough for filtering and edge detection because small filters
are typically used, as already indicated in Sections 4.1 and 4.2. Note that for the
following examples the filter size is chosen to be 3 by 3 for simplicity.
To process the set of 8 pixels arriving from the camera, the pixels from the
next row would have to be known. However, those have not yet arrived. On the other
hand, the incoming pixels represent the last row of pixels needed for processing the
pixels of the previous row at the same positions, i.e. at the same columns. Processing
the previous row also requires the row before the previous one. Therefore, if the
mentioned previous two rows are stored, the previous row can be processed using the
incoming pixels. This is illustrated in Figure 5.12, which shows how an image
arrives from the camera. It is the same image of Saturn as in Figure 4.2. However,
the image’s resolution is vastly decreased so that individual pixels can be seen. The
incoming pixel set is highlighted in green. The pixels which can be processed are
marked with yellow. The pixels currently stored in memory are highlighted in red.
 
Figure 5.12: Buffering of the incoming image. The set of 8 pixels currently arriving from the camera is
highlighted in green. The pixels which can be processed are marked with yellow and the pixels stored
in memory with red.
Figure 5.13 shows the circuit for storing the incoming set of 8 pixels (pixels_in) 
in SRAM blocks and simultaneously reading out 3 sets of 8 pixels (cCol_x). The output 
sets contain pixels at the same columns of 3 consecutive rows, namely the current 
(cCol_cRow) and the previous (cCol_pRow) rows as well as the row before the 
previous one (cCol_bRow). Based on these sets the pixels of the cCol_pRow can be
processed. The processing is done by multiplying the pixels’ neighborhoods of 3 by 3
pixels with the filter coefficients according to the procedure depicted in Figure 4.1.
 
Figure 5.13: Circuit for storing an incoming set of pixels and reading out three sets of pixels at the 
same columns of three consecutive rows. 
The SRAM blocks used are dual-port, meaning that a read and a write operation can
be done in the same clock cycle. The incoming data (din) is stored at the write address
(wr_addr) of the SRAM block whose write enable input (wr_en) is set. The read data
appears at the output one clock cycle after the read address (rd_addr) has been set.
The wr_addr corresponds to the index of the column of the incoming pixel set. 
The address is generated by a cyclic counter “wr_addr counter” which is incremented 
each clock cycle, i.e. for each set of incoming pixels. It resets when reaching the size 
of the width of the full frame. When it resets, the counter “write_row counter” is 
incremented. The “write_row counter” enables one of the SRAM blocks for storing 
the din. The counter is cyclic as well and resets when reaching the number of rows of
the filter. Its value corresponds to the row index of the incoming pixel set modulo 3.
This is how incoming sets of 8 pixels are cyclically stored row by row in SRAM
blocks.
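The cyclic storage described above can be sketched as a small behavioural model in Python. This is only a software illustration, not the circuit from Figure 5.13: the class name RowBuffer and its attributes are hypothetical, each memory row stands for one SRAM block, and the one-cycle read latency of the SRAM is ignored.

```python
class RowBuffer:
    """Behavioural model of three cyclically written row memories."""

    def __init__(self, sets_per_row, rows=3):
        self.sets_per_row = sets_per_row      # frame width in sets of 8 pixels
        self.rows = rows                      # number of filter rows (3)
        self.mem = [[None] * sets_per_row for _ in range(rows)]
        self.wr_addr = 0                      # models the "wr_addr counter"
        self.write_row = 0                    # models the "write_row counter"

    def write(self, pixel_set):
        """Store one incoming set and advance both cyclic counters."""
        self.mem[self.write_row][self.wr_addr] = pixel_set
        self.wr_addr += 1
        if self.wr_addr == self.sets_per_row:     # end of the image row
            self.wr_addr = 0
            self.write_row = (self.write_row + 1) % self.rows
```

Writing four rows into such a buffer overwrites the first row with the fourth one, which is the behaviour expected from a buffer holding only the three rows needed by a 3 by 3 filter.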
While the incoming data is being stored, 3 pixel sets are simultaneously read from the
SRAM blocks. The rd_addr corresponds to the index of the column of the read pixel
sets. It is the same as wr_addr but delayed by one clock cycle, as the data first has to
be stored in an SRAM block. Therefore, the data is read in the next clock cycle. As
explained, it takes another clock cycle for the value to appear at the output of the
SRAM block. Therefore, the read pixels are delayed by two clock cycles compared to
the incoming pixels.
For each pixel set read from the SRAM blocks (rd_dout) it has to be determined to
which of the three rows of the filter it belongs. This is done with three
multiplexers, namely “select before_previous/previous/current row”. Their select
signals carry the index of the SRAM block where the corresponding row is stored. The
index of the current row is held by the “write_row counter”. Its output is delayed by
two clock cycles in order to be synchronized with the read pixels. The indexes of the
previous and the next rows could simply be calculated according to Equations (5.1)
and (5.2), respectively. Because the three SRAM blocks are written cyclically, the
block that will be written next is the one currently holding the row before the
previous one.
$$ \mathit{prev}_{idx} = (\mathit{curr}_{idx} - 1) \bmod 3 \tag{5.1} $$
$$ \mathit{next}_{idx} = (\mathit{curr}_{idx} + 1) \bmod 3 \tag{5.2} $$
However, this would be a very inefficient implementation from the hardware
perspective, as adders would be needed. The most efficient solution here is to simply
construct LUTs which hold, for each possible current row index, the matching indexes
of the previous row and the row before the previous one. Those LUTs are denoted as
“before_previous row” and “previous row” in the circuit from Figure 5.13.
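As an illustration, the contents of two such LUTs for three SRAM blocks can be written out directly. The sketch below is in Python and only models the lookup; the names PREVIOUS_ROW, BEFORE_PREVIOUS_ROW and row_indexes are hypothetical and do not come from the circuit.

```python
# Hypothetical LUT contents; the index is the SRAM block holding the current row.
PREVIOUS_ROW = (2, 0, 1)          # (curr_idx - 1) mod 3
BEFORE_PREVIOUS_ROW = (1, 2, 0)   # (curr_idx - 2) mod 3, equal to (curr_idx + 1) mod 3


def row_indexes(curr_idx):
    """Return the block indexes of the (before-previous, previous, current) rows."""
    return BEFORE_PREVIOUS_ROW[curr_idx], PREVIOUS_ROW[curr_idx], curr_idx
```

Each table has only three entries, so the lookup replaces the adder and the modulo operation of Equations (5.1) and (5.2) with a fixed mapping.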
Figure 5.14 demonstrates the relationship between the input pixels_in and the output
cCol_x signals of the circuit for storing and reading the pixels. One can notice that the
output signal cCol_cRow is delayed by two clock cycles in comparison with the input,
as already explained. This delay could be avoided if additional hardware were used.
Nevertheless, due to the limited resources in the FPGA the focus of the thesis was to
minimize the resource usage of the system instead of its latency.
 
Figure 5.14: Input and output signals of circuit from Figure 5.13 through time. At time n the incoming 
set of 8 pixels is denoted as P[n][0:7] and is highlighted in green. The outputs at time n are 
highlighted in blue. 
Figure 5.15 demonstrates which pixels are output by the circuit from Figure 5.13.
The output pixels are highlighted in blue and the input pixel set in green. Pixels for
which a filter mask can be directly applied are marked in yellow. The filter mask is
indicated by orange dots. It can be noted that for the processing of the first pixel of
the cCol_pRow the previous outputs cCol_x would also have to be known. Similarly,
for the processing of the last pixel of the cCol_pRow the next outputs would have to be
available.
 
Figure 5.15: Example of output pixels of circuit from Figure 5.13. 
It is easy to get the previous values of the outputs cCol_x by simply buffering
them (pCol_x). This is done by the circuit in Figure 5.16 with the three buffers
“previous_col_of before_previous/previous/current_row”. The circuit also has one
more important task. It replaces the values of the pixels outside the frame with zeros
so that pixels at the boundaries of the frame can be correctly calculated. This is done
with the eight multiplexers. Their select signals are connected to circuits detecting the
boundaries of the frame.
 
Figure 5.16: Circuit for storing previous values of the read outputs of circuit from Figure 5.13. In case 
pixels at the boundaries of the frame are being processed it replaces the pixels outside the frame with 
zeros. 
It may seem that even with the previous outputs pCol_x the last pixel of cCol_pRow
still cannot be processed. While this holds true, with the pCol_x and cCol_x the last
pixel of the previous output set pCol_pRow can be processed. By delaying the
processed first seven pixels of cCol_pRow by one clock cycle they can be combined
with the processed last pixel of pCol_pRow to get a full processed set of 8 pixels. This
is done by the components “align_8th pixel” in the full circuit shown in Figure 5.17.
The processed first seven pixels of cCol_pRow appear at the output res of the
component “pixels_processing” and the processed last pixel of pCol_pRow at the
component’s output res_8th.
 
Figure 5.17: Full circuit for image processing with a filter mask. 
The full circuit for image processing with a filter mask includes the two circuits
from Figure 5.13 and Figure 5.16, named “get_neighborhood_rows” and
“get_neighborhood_pixels” respectively. When processing the last pixel of the set
there is one more thing that has to be kept in mind. When the last pixel of the set is
also the last pixel in a row, its processing is independent of the next pixel set as the
pixels outside the frame are treated as zeros. Therefore, this variant of the last pixel is
processed together with res and also delayed by the same amount (res_8th_last). When
the end of a row is detected, res_8th_last is output instead of res_8th. That is how the
processed output pixel set (pixels_out) is obtained.
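The alignment of the eighth pixel can be illustrated with a simplified one-dimensional model in Python. It is a sketch under several assumptions: only a horizontal 3-tap mask is applied instead of the full 3 by 3 mask, the row is a non-empty list whose length is a multiple of the set size, and the function names are hypothetical. The variables res, res_8th and res_8th_last mirror the signals named in the text.

```python
def filter_row_in_sets(row, coeffs, set_size=8):
    """Filter one image row with a 3-tap mask, consuming it in sets of pixels."""
    left, mid, right = coeffs
    sets = [row[i:i + set_size] for i in range(0, len(row), set_size)]
    out = []
    p_col = [0] * set_size          # previous set; zeros model the frame border
    delayed = None                  # (res, res_8th_last) of the previous step
    for c_col in sets:
        window = [p_col[-1]] + c_col            # adds the left neighbour of pixel 0
        # First seven pixels of the current set: all neighbours are known.
        res = [left * window[i] + mid * window[i + 1] + right * window[i + 2]
               for i in range(set_size - 1)]
        # Last pixel of the previous set: its right neighbour has just arrived.
        res_8th = left * p_col[-2] + mid * p_col[-1] + right * c_col[0]
        # Last pixel of the current set, treated as the last pixel in the row.
        res_8th_last = left * c_col[-2] + mid * c_col[-1]
        if delayed is not None:
            out.extend(delayed[0] + [res_8th])  # role of "align_8th pixel"
        delayed = (res, res_8th_last)
        p_col = c_col
    out.extend(delayed[0] + [delayed[1]])       # end of the row detected
    return out


def filter_row_direct(row, coeffs):
    """Reference: the same 3-tap filter applied to the zero-padded row."""
    padded = [0] + list(row) + [0]
    return [coeffs[0] * padded[i] + coeffs[1] * padded[i + 1]
            + coeffs[2] * padded[i + 2] for i in range(len(row))]
```

Both functions give the same result for a row of, for example, 24 pixels, which shows that delaying the first seven results by one step and appending the late eighth result reproduces the directly filtered row.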
Note that the full circuit from Figure 5.17 includes a component
“detect_frame_boundries” which is used to detect the frame’s boundaries by observing
the rising and falling edges of the synchronization signals. The component also delays
the synchronization signals by the same amount as the input pixels are delayed,
keeping the processed frame synchronized.
5.2.1.2 Image Filtering 
The component “pixels_processing” from Figure 5.17 defines what kind of 
processing is done on the pixels. For implementing filtering, the component has to do 
the multiplications of the pixels’ neighborhoods of 3 by 3 with the filter coefficients 
for each pixel of cCol_pRow. The processing of each pixel is done in parallel as shown 
in Figure 5.18. Note that there are two different components for processing the last 
pixel in the set. The component “pixel_7_L” processes the last pixel as if it were the 
last pixel in a row. Which of the two is actually output was already explained in Section 
5.2.1.1. 
 
Figure 5.18: Component “pixels_processing” which processes each pixel in parallel. 
The components for processing each pixel are all identical in their function. An 
example of one is shown in Figure 5.19. Each component first extracts the neighboring 
pixels of size 3 by 3 of the pixel it is processing. This is done in component 
“choose_matrix”. Note that for the first and the last pixel, the previous outputs pCol_x 
are also needed. 
 
Figure 5.19: Component for processing a single pixel. 
Once the neighboring pixels are extracted the filter multiplications are done. 
Those are done by the component “calculation” which is shown in Figure 5.20. 
 
Figure 5.20: Component “calculation” of image filtering circuit. 
The component “calculation” first extracts a row of the pixel neighborhood and
the corresponding row of the filter coefficients. The extracted values are multiplied
element-wise and then summed together. This is repeated for the two remaining rows.
The results of the summed multiplications of rows are added together to get the
processed pixel value. These operations have to be done at a clock frequency three
times higher than that of the rest of the circuit in order to process the incoming data at
the same rate as new data arrives. Processing at a higher frequency was chosen to
lower the number of multipliers needed. With the described implementation only 3
multipliers and 5 adders are needed for each of the 8 pixels processed in parallel. The
same result could be achieved by direct multiplication and later addition of all
neighboring pixels at the same clock frequency as the rest of the circuit. However,
such an implementation would require 9 multipliers and 8 adders for each pixel
processed in parallel.
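The trade-off between the two variants can be sketched in Python. The model below is only an illustration of the arithmetic, not of the hardware timing: each iteration of the loop stands for one cycle of the three times faster clock, and the function names are hypothetical.

```python
def filter_pixel_time_multiplexed(neighborhood, coeffs):
    """Compute one 3x3 filter output using three multiplications per fast cycle."""
    acc = 0
    for row in range(3):                       # one fast clock cycle per row
        products = [neighborhood[row][col] * coeffs[row][col] for col in range(3)]
        acc += products[0] + products[1] + products[2]
    return acc


def filter_pixel_direct(neighborhood, coeffs):
    """Reference: all nine products formed at once."""
    return sum(neighborhood[r][c] * coeffs[r][c]
               for r in range(3) for c in range(3))
```

Both functions return the same value for any neighborhood and coefficient matrix. The first one reuses three multiplications per step, which corresponds to the smaller number of multipliers at the price of the higher clock frequency.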
5.2.1.3 Edge Detection 
The implementation of edge detection is very similar to the image filtering from
Section 5.2.1.2 because edge detection is a form of filtering. The only difference
between the two implementations is in the “calculation” component. In Figure 5.21 the
“calculation” component of edge detection is shown. Instead of multiplying the
neighboring pixels with the Prewitt operator masks for the horizontal and vertical
directions, 3 additions and 3 subtractions are used for each direction. The results of
applying the operators in both directions are then squared and added together. The
square root of the sum is calculated at the end. The result is the magnitude of the
gradient which represents the edges in the image.
 
Figure 5.21: Component “calculation” of edge detection circuit. 
In Figure 5.21 there are some buffers added in the data path. Their purpose is to
pipeline the calculation process, increasing the maximum frequency at which the
circuit can operate. Regarding the speed of the calculation, the largest delay in the
data path is produced by the circuit for calculating the square root. To reduce its delay
it is not implemented using the standard component from Simulink’s HDL library.
Instead it is replaced with a LUT. The output is limited to 8 bits because of the image’s
bit depth, thus the sum can be limited to 16 bits as well. Therefore, it is enough for the
LUT to have 2^16 entries. Each entry is an 8 bit number. Hence 64 KB of memory is
needed.
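The calculation with the square root LUT can be sketched in Python as follows. This is a behavioural illustration under assumptions: the LUT is filled with the integer square root, the sum of squares is saturated to 16 bits before the lookup, and the sign convention of the two Prewitt directions is chosen arbitrarily, which does not affect the magnitude. The names SQRT_LUT and prewitt_magnitude are hypothetical.

```python
import math

# 2**16 entries of 8 bits each; the index is the saturated sum of squares.
SQRT_LUT = [math.isqrt(s) for s in range(1 << 16)]


def prewitt_magnitude(nb):
    """Gradient magnitude of a 3x3 neighbourhood nb[row][col] of 8-bit pixels."""
    # Horizontal direction: right column minus left column.
    gx = (nb[0][2] + nb[1][2] + nb[2][2]) - (nb[0][0] + nb[1][0] + nb[2][0])
    # Vertical direction: bottom row minus top row.
    gy = (nb[2][0] + nb[2][1] + nb[2][2]) - (nb[0][0] + nb[0][1] + nb[0][2])
    total = gx * gx + gy * gy
    total = min(total, (1 << 16) - 1)      # limit the sum so the result fits 8 bits
    return SQRT_LUT[total]
```

A uniform neighborhood gives a magnitude of 0, while a strong edge saturates at 255, so the output always fits the 8 bit depth of the image.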
Note that edge detection could also be implemented by just correctly choosing
the filter coefficients of the presented image filtering implementation. Two such
filters would need to be implemented in parallel and the circuit for magnitude
calculation added. However, this would be very resource inefficient as the multipliers
would be used only for trivial multiplications by ±1 and 0. Therefore, the presented
edge detection implementation is the better choice.
5.2.2 Algorithms Using External Memory 
Compared to the other presented algorithms the background subtraction (Section
4.3), FFC (Section 4.4) and image averaging (Section 4.5) algorithms require at least
one image to be stored in memory. To store one full frame a total of 2.4 MB is needed.
The FPGA does not include enough SRAM to store such a large amount of data.
Therefore, the data has to be stored in the external DDR memory [28]. As already
mentioned, access times to external DDR memory are considerably longer than those
for accessing SRAM, thus limiting the processing capabilities. Therefore, special care
has to be taken while designing systems which use the DDR memory.
5.2.2.1 Storing and Accessing Pixels in DDR Memory 
The circuit for accessing the DDR memory is shown in Figure 5.22. It includes the
mentioned FDDR controller, an oscillator as the clock source (“OSC”), a clock
conditioning circuit (“CCC”) for generating all the needed clocks, a circuit for FDDR
controller initialization, and an AXI master for issuing transactions. One can also note
the component “AXI_timing”. It performs timing optimization of the AXI bus signals,
which is needed because the DDR memory and the AXI bus run at different
frequencies (Section 2.3.2). The timing optimization is done according to [128].
 
Figure 5.22: Circuit for accessing DDR memory. 
The FDDR controller first has to be correctly initialized. This includes
initializing its registers and performing the correct power-up sequence of the
controller and the DDR memory. This is done by the circuit from Figure 5.23
according to [129]. The circuit includes a simple processor core (“coreABC”) [130]
which initializes the FDDR controller’s registers through the component
“CoreConfigP” [131]. The core “CoreResetP” ensures the right reset sequence [132].
 
Figure 5.23: Circuit for initialization of FDDR controller. 
The AXI master is the component which issues write and read transactions to 
and from the DDR via the FDDR controller. It is implemented as multiple state 
machines, which are presented in Figure 5.24, Figure 5.25 and Figure 5.26. 
 
Figure 5.24: State machine of write channel of AXI master. 
When a write transaction is started, i.e. signal write_start is set, the address to
be written to (wr_addr) is placed on the write address channel of the bus (AWADDR).
At the same time it is signaled that the address on the bus is valid (AWVALID). The
master then waits until the slave accepts the wr_addr (AWREADY). When the address
is accepted the master provides the first data on the bus (WDATA) and marks it as valid
(WVALID). It also removes the valid flag of the wr_addr. The master then waits for
the slave to accept the provided data (WREADY). When the data is accepted, the master
can provide new data on the bus for a burst access. When all the data has been sent by
the master (WLAST) and accepted by the slave, a new write transaction can start.
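The write sequence described above can be sketched as a small state machine model in Python. It is a simplified illustration and not the state machine from Figure 5.24: the state names IDLE, ADDR and DATA and the class AxiWriteMaster are assumptions, the write response channel is left out, and only the handshake signals named in the text are modelled.

```python
class AxiWriteMaster:
    """Simplified model of the write state machine (state names are assumptions)."""

    def __init__(self, burst_len=16):
        self.burst_len = burst_len
        self.state = "IDLE"
        self.beat = 0

    def step(self, write_start=False, awready=False, wready=False):
        """Advance one clock cycle; return the flags driven after the update."""
        if self.state == "IDLE" and write_start:
            self.state = "ADDR"                 # present AWADDR, assert AWVALID
        elif self.state == "ADDR" and awready:
            self.state, self.beat = "DATA", 0   # address accepted, first WDATA
        elif self.state == "DATA" and wready:
            self.beat += 1                      # slave accepted one data beat
            if self.beat == self.burst_len:
                self.state = "IDLE"             # whole burst sent and accepted
        return {
            "AWVALID": self.state == "ADDR",
            "WVALID": self.state == "DATA",
            "WLAST": self.state == "DATA" and self.beat == self.burst_len - 1,
        }
```

Calling step once per clock cycle with the slave's ready signals walks the model through the address phase and the data beats of one burst, after which it returns to the idle state and a new transaction can be started.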
 
Figure 5.25: State machine of read address channel of AXI master. 
Similarly, when a read transaction is started, i.e. signal read_start is set, the
address to be read from (rd_addr) is placed on the read address channel of the bus
(ARADDR) and it is signaled that there is a valid address on the bus (ARVALID). When
the slave accepts the read address the master can already provide a new read address
even if the data from the first address has not yet been received. This is because the
slave supports overlapping read transactions.
 
Figure 5.26: State machine of read data channel of AXI master. 
As soon as a read transaction is started, i.e. signal read_start is set and a read
address rd_addr is provided by the master, the master can wait to receive the data
provided by the slave. The FDDR controller signals that new read data is available by
first clearing the RLAST flag from the previous read transaction. When this happens
the master signals that it is ready to accept the new data (RREADY). When the data on
the read bus is valid the master accepts it until the last data of the transaction is
received (RLAST). Then the data from the next transaction can be received.
As already explained in Section 2.3.2, in order to maximize the throughput of
memory access it is imperative to access the memory in bursts and to minimize
switching between different rows of the DDR memory. One way of achieving this is to
do multiple bursts of either read or write operations in a sequence. Therefore, the data
to be written to or read from the DDR memory has to be buffered to allow multiple
transactions in a sequence. This is done with the circuit shown in Figure 5.27. The
circuit includes the component “DDR” which is a modified circuit for accessing the
DDR memory (Figure 5.22).
 
Figure 5.27: Circuit for maximizing DDR memory access throughput. 
The data to be written to (data_in) and the data read from (data_out) the DDR
memory are buffered with First-In-First-Out (FIFO) buffers, named “FIFO_WR”
and “FIFO_RD” respectively. The FIFOs have to be able to buffer at least enough data
for one whole sequence. While a read sequence is in progress, “FIFO_RD” is being
filled from the DDR memory, whereas “FIFO_WR” cannot be emptied and keeps
accumulating the incoming data. At the end of the read sequence “FIFO_WR” is
therefore full. Then a write sequence has to start and the roles are reversed:
“FIFO_WR” is emptied into the DDR memory while “FIFO_RD” is only drained by the
output. In reality the sequence should start a bit sooner due to the delays between the
start of the sequence and the first data read or written.
The component “memory_controller” from the circuit in Figure 5.27 keeps track 
of the fill levels of the FIFOs through their signals almost empty (AEMPTY) and almost 
full (AFULL). The AEMPTY and AFULL signals are used instead of the empty 
(EMPTY) and full (FULL) signals as the sequence should start a bit sooner due to the 
mentioned delays. Based on these control signals a read or a write sequence of bursts 
is initiated. When a sequence is initiated the “memory_controller” sets the write_start 
or read_start signals of the AXI master accordingly. This is presented in the simulation 
of the circuit in Figure 5.28. 
 
Figure 5.28: Simulation of circuit for increased DDR memory throughput. 
Figure 5.28 shows the moment when “FIFO_WR” gets filled with enough values
that AEMPTY is cleared, signaling that a write sequence can start, i.e. signal
write_start is set for a clock cycle and the first write address is applied (I). The next 15
bursts follow (II). When the last burst write has started, the read sequence can already
be initiated (III). Due to the support of overlapping read transactions 5 read addresses
are applied in sequence (IV). The delay between the first applied read address and the
first data received can be seen (V). When the first data is read, the “FIFO_RD” signals
that it is not empty anymore (VI). The remaining read bursts are then made. When the
“FIFO_WR” gets filled with enough values that AEMPTY is cleared again, the process
repeats.
The “memory_controller” ensures that “FIFO_WR” never gets completely full,
as new incoming data would otherwise be lost. Similarly, it keeps the “FIFO_RD”
always filled with at least some data so that valid data can always be read from the
FIFO. In order for the circuit from Figure 5.27 to function correctly the sum of the data
rates of data_in and data_out must not exceed the throughput of the DDR memory.
Otherwise the “FIFO_WR” would overflow and the “FIFO_RD” underflow as already
indicated.
The number of bursts in a sequence was chosen to be 16. With 16 bursts and the 
burst length of 16 with bus width of 64 bits one full row of the DDR is read or written 
each sequence [51]. If a smaller sequence were used, the memory throughput would 
be lower as different rows of the DDR memory would be used more frequently due to 
the more frequent changes in read and write operations. On the other hand if a larger 
sequence were used, larger FIFOs would be needed, i.e. more resources would be used. 
The component “memory_controller” has one more important task. It
generates the addresses of the locations for reading and storing the data in DDR
memory. The sequence of the generated addresses for reading and storing depends on
the application, as shown in the following subsections.
As already mentioned the component “DDR” is a modified circuit for accessing 
the DDR memory (Figure 5.22). The only difference is in the AXI master. It provides 
data to be written by enabling reading from “FIFO_WR” through setting the fifo_wr_re 
signal. Similarly, when receiving the read data from the bus it enables writing to 
“FIFO_RD” through setting the fifo_rd_we signal. The write and read data signals of 
the AXI bus are therefore routed directly to the data output of the “FIFO_WR” and the 
data input of the “FIFO_RD”, respectively. 
With the presented implementation of accessing the DDR memory its maximum
possible throughput of 5.32 Gbps is achieved. On the other hand, the data rate of the
incoming set of 8 pixels with a bit depth of 8 bits at a frequency of 37.125 MHz equals
2.376 Gbps. As already explained, such throughput is large enough either to
simultaneously store an incoming set of 8 pixels and read another set, or to read two
pixel sets out of the DDR memory. Some margin is even left in this case, which is
beneficial as the DDR memory has to be periodically refreshed, lowering the average
achievable throughput by a small amount. The algorithms which use the DDR memory
have to be designed according to these limitations.
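The throughput budget can be checked with a few lines of arithmetic. The Python sketch below only restates the numbers given in the text; the constant names are hypothetical.

```python
PIXELS_PER_SET = 8
BITS_PER_PIXEL = 8
PIXEL_CLOCK_HZ = 37.125e6
DDR_THROUGHPUT_BPS = 5.32e9          # maximum throughput quoted in the text

# Data rate of one stream of pixel sets in bits per second.
stream_bps = PIXELS_PER_SET * BITS_PER_PIXEL * PIXEL_CLOCK_HZ
# Two simultaneous streams, e.g. one written and one read, or two read.
two_streams_bps = 2 * stream_bps
margin_bps = DDR_THROUGHPUT_BPS - two_streams_bps

# Amount of data moved by one sequence of bursts.
BURSTS_PER_SEQUENCE = 16
BEATS_PER_BURST = 16
BUS_WIDTH_BITS = 64
bytes_per_sequence = BURSTS_PER_SEQUENCE * BEATS_PER_BURST * BUS_WIDTH_BITS // 8
```

With these values one stream needs 2.376 Gbps and two streams need 4.752 Gbps, which leaves a margin of roughly 0.57 Gbps. A third simultaneous stream would exceed the available throughput. One sequence of 16 bursts transfers 2048 bytes.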
Note that for the following algorithms there is no need to store the 
synchronization signals of the incoming Camera Link stream. The incoming 
synchronization signals just need to be delayed correctly in order to keep the processed 
output frame synchronized. For simplicity, the synchronization signals are not shown 
in the following implementations. 
5.2.2.2 Background Subtraction 
A technique for detecting a moving object entering a scene is described in 
Section 4.3. To detect the object the difference between each pixel of the current and 
the background image is evaluated. If the difference of a pixel is above a threshold, it 
is a part of the object in the foreground. Therefore, the evaluation described by the 
Equation (4.5) has to be done on each of the 8 pixels of the incoming set. 
But first the image of the background has to be acquired and stored in memory. 
Later the background image is continuously read from the memory 8 pixels at a time. 
Reading only one pixel set from the DDR memory meets its mentioned constraints. 
The circuit for background subtraction is shown in Figure 5.29. The circuit is 
based on the circuit for accessing DDR memory (Figure 5.27). It includes a modified 
“memory_controller” and two additional components, namely “sub_calc_8” and 
“sub_controller”. 
 
Figure 5.29: Circuit for background subtraction. 
The “sub_calc_8” includes 8 parallel implementations of the operation described 
by the Equation (4.5), one for each of the incoming 8 pixels. The operation is 
implemented by the VHDL description presented in Figure 5.30. As it can be seen 
from the code snippet the output is a binary image with pixels of the detected object 
saturated to the maximal value of 255. 
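The thresholding of one pixel set can also be modelled in software. The following Python sketch mirrors the behaviour described above for 8 pixels at a time. The function name and the threshold value are hypothetical, as the threshold used in the circuit is not stated here.

```python
THRESHOLD = 20   # hypothetical value; the threshold used in the circuit is not stated


def subtract_background_set(current, background, threshold=THRESHOLD):
    """Model of background subtraction on one set of 8-bit pixels."""
    out = []
    for c, b in zip(current, background):
        diff = c - b                         # signed difference
        # Within the threshold: background (0), otherwise detected object (255).
        out.append(0 if -threshold <= diff <= threshold else 255)
    return out
```

A difference exactly equal to the threshold still counts as background in this model, which follows the comparison with less-than-or-equal used in the VHDL description.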
 
 -- Extend 8 bit unsigned inputs to signed representation
 c_e <= signed('0' & c);   -- extend current pixel
 b_e <= signed('0' & b);   -- extend background pixel
 -- Subtract
 y_e <= c_e - b_e;
 -- Apply threshold
 y <= (others => '0') when y_e <= THRESHOLD and y_e >= -THRESHOLD else
      (others => '1');

Figure 5.30: VHDL code snippet for background subtraction processing.
The “sub_calc_8” also includes an input named sub_en. As long as it is cleared,
the input pixel set data_in is routed directly to the output y, i.e. no background
subtraction is done. This is depicted in the simulation of the circuit in Figure 5.31.
When the input store_frame gets set (I), the component “sub_controller” holds the
“memory_controller” and both FIFOs in reset until the start of the next frame (II).
When the new frame begins it starts getting stored in memory (III). This frame
represents the background image. The “memory_controller” starts reading out the
background image as soon as a part of it has been written to the memory, but reads
only as much data as fits into the “FIFO_RD” without overflowing it, i.e. until its
afull signal gets set (IV).
When the frame after the background one begins (V) the signal sub_en is set. The 
sub_en enables the processing of the pixels, thus reading of the background pixels 
from the “FIFO_RD” starts as well. Therefore, it is important the “FIFO_RD” already 
has the first pixels of the background image. The “memory_controller” then constantly 
reads it out (VI). 
 
Figure 5.31: Simulation of background subtraction circuit. 
Note that with the described implementation the background image gets updated
only when a new signal for storing the background image arrives. For some
applications like outdoor surveillance this could give inadequate results because the
background changes with the illumination. In such cases the background should be
updated according to the models described in [115], [118]. However, those models
require more resources and memory accesses than can be implemented in the used
FPGA.
5.2.2.3 Flat Field Correction 
The FFC technique for correcting the distortions in the image is presented in 
Section 4.4. With a linear model of distortion the image can be corrected according to: 
$$ U(x, y) = K \, \frac{N(x, y) - D(x, y)}{F(x, y) - D(x, y)} \tag{5.3} $$
For the linear model two reference images are needed. The response to the dark
image 𝑈(𝑥, 𝑦) = 0 is denoted as 𝐷(𝑥, 𝑦) and the response to the flat image
𝑈(𝑥, 𝑦) = 𝐾 as 𝐹(𝑥, 𝑦). Similarly to the background subtraction, the reference images
first have to be acquired and stored in memory, and later continuously read from it.
The correction needs to be done on all of the 8 pixels of the incoming pixel set at a
time. Therefore, two pixel sets need to be read from the memory. This still meets the
mentioned constraints of the DDR memory. The circuit for the FFC is shown in Figure
5.32.
 
Figure 5.32: Circuit for flat field correction. 
The circuit works very much like the one for background subtraction (Figure 
5.29), although it differs a bit as two images have to be stored and read. The simulation 
of the circuit is shown in Figure 5.33. The component “FFC_controller” has two input 
signals for initiating storing of 𝐷(𝑥, 𝑦) and 𝐹(𝑥, 𝑦) respectively. After either of the 
signals is set (I and III) the next frame is stored (II and IV) as either 𝐷(𝑥, 𝑦) or 𝐹(𝑥, 𝑦). 
Once both reference images are stored they are continuously read. They are buffered 
by two “FIFO_RD” components. Therefore, the “memory_controller” is also modified 
to route the fifo_rd_we and the read_data from the “DDR” to the correct “FIFO_RD”. 
Note that here the processing starts at the second frame after the last reference frame 
has been stored (VI). In the frame after the last reference frame both “FIFO_RD” 
components are filled (V). This is not done already during the storing of the reference 
frames like in the implementation of the background subtraction because it would 
require a more complex “memory_controller”. 
 
Figure 5.33: Simulation of flat field correction circuit. 
The last difference between the FFC and background subtraction circuits is of 
course in the calculation of the processed pixels. The FFC implements Equation (5.3) 
according to Figure 5.34. The component “FFC_calc_8” implements 8 of such circuits 
in parallel. Similarly to “sub_calc_8”, it includes an enable signal (en_ffc). As long as 
the signal is not set, the input pixels are simply routed to the outputs. The en_ffc gets 
set when the reference images have been acquired. 
 
Figure 5.34: Circuit for calculation of flat field corrected image. 
When examining Equation (5.3) and its implementation (Figure 5.34) it
would seem that it can be optimized by storing the difference in the denominator
directly in the memory. This would indeed save one adder for the subtraction but at
the cost of increased complexity of the control circuit. For simplicity reasons it was
decided against such an implementation. Moreover, the stored result could also be
divided by the parameter K beforehand, removing the need for the multiplier during
the operation. However, this would vastly decrease the accuracy of the result due to
intermediate rounding.
In order to remove the need for a multiplier it can be assumed that parameter K
is set to a power of 2. In this case the multiplication can be replaced with a simple
bit shift, simplifying the circuit. When applying the flat image it is difficult to set the
source to an exact value because one would need a reference sensor with the same
responsivity as the one in the camera to set the correct intensity of the source. However,
the distortions of the camera’s lens are usually the smallest at the center of the image.
Moreover, it can be assumed that the manufacturing differences between the
responsivities of individual pixels are random, meaning their mean value is
approximately 0. Therefore, one can take the average of the center pixels as the
reference value when setting the source for applying the flat image. As explained in
Section 4.4 it should be close to saturation. Therefore, the value of 128, the largest
power of 2 that fits into 8 bits, is the most appropriate choice.
In Figure 5.34 one can note a multiplexer. Its purpose is to define the output in
case of division by zero. This can happen if 𝐷(𝑥, 𝑦) = 𝐹(𝑥, 𝑦) which is rarely the case.
However, if it does happen, the two reference pixels are not enough to correct the
distortion. To make such errors noticeable in the output image, a zero is output.
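The correction of a single pixel with K = 128 can be sketched in Python as follows. The sketch is an illustration under assumptions: the name ffc_pixel is hypothetical, the result is clamped to the 8 bit range, which the text does not specify, and Python's floor division may round negative intermediate values differently than a hardware divider would.

```python
K_SHIFT = 7                     # K = 2**7 = 128


def ffc_pixel(n, d, f):
    """Flat field correction of one 8-bit pixel according to Equation (5.3)."""
    denominator = f - d
    if denominator == 0:
        return 0                                # division by zero: output zero
    # The multiplication by K is replaced with a left shift by 7 bits.
    corrected = ((n - d) << K_SHIFT) // denominator
    return max(0, min(255, corrected))          # clamping is an assumption
```

A pixel equal to its flat reference is mapped to 128 and a pixel equal to its dark reference to 0, which matches the two reference points of the linear model.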
5.2.2.4 Image Averaging 
In Section 4.5 the image averaging algorithm is presented. The equation of the 
running average filter is: 
$$ y_F(n) = y_F(n - 1) + \frac{1}{L} \left[ x(n) - x(n - L) \right] \tag{5.4} $$
where 𝑦𝐹(𝑛) is a pixel of the averaged output image, 𝑦𝐹(𝑛 − 1) the previous output 
value of that pixel, 𝑥(𝑛) the incoming value of the pixel and 𝑥(𝑛 − 𝐿) its L-th previous 
value. The parameter L denotes the length of the averaging filter. 
In order to filter the incoming pixels from the camera the filter has to be applied
to each of the 8 pixels of the incoming pixel set. As mentioned in Section 4.5 the
implementation of such a filter would need 4 memory accesses for each pixel. The
DDR memory of the used FPGA board does not provide high enough throughput for
this. However, as explained in Section 4.5, the running average filter can be
simplified to:
$$ y_S(n) = \left( 1 - \frac{1}{L} \right) y_S(n - 1) + \frac{1}{L} x(n) \tag{5.5} $$
The simplified filter form satisfies the limitations of the DDR memory 
throughput, i.e. the pixel set of the previous output 𝑦𝑆(𝑛 − 1) is read and the output 
pixels 𝑦𝑆(𝑛) of the current set written back to the memory. A direct implementation 
of the filter from Equation (5.5) would yield the circuit in Figure 5.35. Note that such 
an implementation is very resource inefficient as it requires a multiplier and a divider. 
 
Figure 5.35: Direct implementation of image averaging algorithm. 
For a more resource efficient implementation it is assumed that the length of the
averaging filter L is a power of 2. With such an assumption there is no need for a
hardware divider because the division by L can be replaced with a simple shift of bits
by log2(L) positions to the right. Note that L is limited to 128 because the pixels have
a bit depth of 8 bits.
Moreover, by rewriting Equation (5.5) into Equation (5.6) one can notice that
there is no need for a multiplier either. This can also be seen from the implementation
of Equation (5.6) in Figure 5.36.
$$ y_S(n) = y_S(n - 1) + \frac{1}{L} \left( x(n) - y_S(n - 1) \right) \tag{5.6} $$
 
Figure 5.36: Resource efficient implementation of image averaging algorithm. 
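One update of Equation (5.6) with L restricted to a power of 2 can be sketched in Python with only a subtraction, a shift and an addition. The function name average_step and the parameter log2_l are hypothetical. The sketch works on plain integers, so the bits shifted out are discarded. Whether the circuit keeps additional fractional bits is not covered by this model.

```python
def average_step(y_prev, x, log2_l):
    """One update of Equation (5.6) with L = 2**log2_l, using only add and shift."""
    # In Python the right shift of a negative integer is an arithmetic shift.
    return y_prev + ((x - y_prev) >> log2_l)
```

For example, with L = 4 a previous output of 0 and an input of 200 give the new output 50.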
However, the implementation from Figure 5.36 does not give the exact same 
result as the original non-simplified filter Equation (5.4) even if the assumption 
𝑥(𝑛 − 𝐿) ≅ 𝑦𝑆(𝑛 − 1) holds true. Both filters should in the ideal case produce the 
same results, meaning their difference Equation (5.7) should ideally be 0. 
 
$$ \mathit{dif}(n) = y_F(n) - y_S(n) = y_F(n - 1) - y_S(n - 1) - \frac{1}{L} \left( x(n - L) - y_S(n - 1) \right) \tag{5.7} $$
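For digital filters it is assumed that their memory elements are initialized with
zeros, i.e. 𝑓(𝑛 < 0) = 0. Therefore, both filters give the same output at the start, i.e.
𝑦𝐹(0) = 𝑦𝑆(0) = 𝑥(0)/𝐿. However, at all other points of time there is a difference
between them:
$$ \mathit{dif}(n) = \begin{cases} \dfrac{1}{L} \sum\limits_{k=1}^{n} y_S(k - 1) & ; \; 0 < n < L \\ \mathit{dif}(L - 1) - \dfrac{1}{L} \sum\limits_{k=L}^{n} \left( x(k - L) - y_S(k - 1) \right) & ; \; n \ge L \end{cases} \tag{5.8} $$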
From Equation (5.8) it is clear that even if the assumption for the 
simplification holds true, namely that 𝑥(𝑛 − 𝐿) ≅ 𝑦𝑆(𝑛 − 1), there will be a 
difference between the two filter implementations, because the constant term 
𝑑𝑖𝑓(𝐿 − 1) remains. 
The source of the constant error is in the calculations of the first L output images. 
The non-simplified version treats 𝑥(𝑛 − 𝐿) = 0 for the first L images. On the other 
hand, the simplified version treats 𝑦𝑆(𝑛 − 1) = 0 just for the first image. Therefore, 
the assumption 𝑥(𝑛 − 𝐿) ≅ 𝑦𝑆(𝑛 − 1) is wrong for the next L-1 images. This is solved 
by the implementation presented in Figure 5.37. For the first L images the 
𝑥(𝑛 − 𝐿) part of the circuit is replaced with zeros. The signal n<L is set once 
L images have been processed, giving in the ideal case the same result as the 
non-simplified version of the filter. 
 
Figure 5.37: Modified implementation of image averaging algorithm. 
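The effect of the initialization can be checked numerically with a small model of the three filter variants. The sketch below is an illustration under assumptions: the recursion of the non-simplified filter is taken in the form implied by Equation (5.7), the modification is modeled as read from the description above (the approximated term is forced to zero for the first L frames), and the function names are invented for this example.

```python
from fractions import Fraction


def full_filter(x, L):
    """Non-simplified running average: y(n) = y(n-1) + (x(n) - x(n-L)) / L."""
    y, out = Fraction(0), []
    for n, xn in enumerate(x):
        x_old = x[n - L] if n >= L else 0  # x(n < 0) = 0
        y = y + Fraction(xn - x_old, L)
        out.append(y)
    return out


def simplified_filter(x, L, zero_first_L=False):
    """Simplified form: x(n-L) is approximated by the previous output y(n-1).

    With zero_first_L=True the approximated term is forced to zero for the
    first L frames, which models the modified circuit.
    """
    y, out = Fraction(0), []
    for n, xn in enumerate(x):
        approx_old = 0 if (zero_first_L and n < L) else y
        y = y + (xn - approx_old) / Fraction(L)
        out.append(y)
    return out


if __name__ == "__main__":
    frames = [100] * 20  # one pixel position with a constant value
    y_f = full_filter(frames, 8)
    y_s = simplified_filter(frames, 8)
    y_m = simplified_filter(frames, 8, zero_first_L=True)
    for n in (0, 7, 8, 19):
        print(n, float(y_f[n]), float(y_s[n]), float(y_m[n]))
```

For a constant input the modified variant reproduces the non-simplified filter exactly in this model, whereas the unmodified simplified form deviates from it after the first frame.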
An overall implementation of the image averaging algorithm, including the 
circuit for accessing the DDR memory, is shown in Figure 5.38. The 
“memory_controller” is modified to enable reading from the “FIFO_RD” at the start 
of the second frame, after the FIFO has already been filled with pixels of the first frame. 
As already indicated, the component “avr_calc_8” implements 8 circuits from Figure 
5.37 in order to process each of the 8 pixels from the set. It also includes a frame 
counter to correctly set the signal n<L. 
 
Figure 5.38: Full circuit of image averaging. 
The assumption made here for the simplification of the image averaging filter is 
valid for the setup and applications in this thesis. The camera used has a high frame 
rate, meaning that sequential frames do not differ much for slowly changing processes 
such as observations of space. 
5.2.2.5 Two-Dimensional (Inverse) Fast Fourier Transform 
The 2D (I)FFT is useful when applying large filters or when we are interested in 
the frequency representation of an image as indicated in Section 4.1. The most 
straightforward way of performing the 2D (I)FFT is to first perform a 1D (I)FFT on 
each row of the image and store the results. Thereafter the 1D (I)FFT is performed on 
each column of the previously calculated result. Such algorithms are referred to as 
Row-Column (RC) algorithms. The implementation of the RC algorithm in an FPGA is 
feasible for small image sizes for which the intermediate results can be stored in 
SRAM [133]. However, if large images are used, the SRAM is usually too small. 
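The row-column decomposition can be illustrated with a minimal software sketch. It uses a naive 1D DFT instead of an FFT to stay short, which does not change the structure of the algorithm; the function names are chosen for this example only. The intermediate array `rows` corresponds to the results that have to be stored between the two passes.

```python
import cmath


def dft(seq):
    """Naive 1D discrete Fourier transform of a sequence."""
    n = len(seq)
    return [sum(seq[k] * cmath.exp(-2j * cmath.pi * u * k / n) for k in range(n))
            for u in range(n)]


def dft2_row_column(img):
    """2D DFT computed as 1D transforms of the rows, then of the columns."""
    rows = [dft(row) for row in img]               # first pass: every row
    cols = [dft(list(col)) for col in zip(*rows)]  # second pass: every column
    return [list(r) for r in zip(*cols)]           # back to row-major order


def dft2_direct(img):
    """2D DFT evaluated directly from its definition, used as a reference."""
    h, w = len(img), len(img[0])
    return [[sum(img[r][c] * cmath.exp(-2j * cmath.pi * (u * r / h + v * c / w))
                 for r in range(h) for c in range(w))
             for v in range(w)] for u in range(h)]


if __name__ == "__main__":
    image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
    rc, ref = dft2_row_column(image), dft2_direct(image)
    print(max(abs(a - b) for ra, rb in zip(rc, ref) for a, b in zip(ra, rb)))
```

The second pass reads the intermediate results column by column, which is the access pattern that becomes critical once these results no longer fit into SRAM.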
Therefore, the intermediate results need to be stored in DDR. The DDR has much longer 
access times when it is accessed column-wise. Therefore, different algorithms are 
being developed to overcome the limitations of accessing the DDR. One way is to 
transpose the results of the first 1D (I)FFT, thus changing column-wise accesses to 
row-wise ones and speeding up the overall calculation [134]. Nevertheless, the transpose 
algorithms require an additional memory access [135]. Even more efficient approaches 
exist. However, none of the reported implementations [136], [137] achieve the frame 
rate set in the thesis even with DDR memories with higher throughputs than the one 




6 System Verification 
After the system had been designed, its functionality was verified and evaluated. 
For the verification process each individual component of the system was verified 
independently. At the end the system was joined together to demonstrate a real-time 
processing application on a live video stream as well. 
6.1 Verification of Camera Link Interface 
To verify the implemented Camera Link Tx and Rx, the designed Channel 
Link Tx and Rx were verified first, as they are used by the Camera Link I/F. The 
verifications were carried out by simulating the circuits or testing them in hardware and 
observing whether the output data matched the expected outcome. The individual 
verifications are described in detail in the following subsections. 
6.1.1 Channel Link Transmitter 
Figure 6.1 shows the outputs of the Channel Link Tx when it is simulated with the 
data input connected to a constant value of x“61D5DD8” (which is expected to 
produce the output patterns “0011100”, “1100011”, “1010101” and “1111000” at 
LVDS data lines 3 to 0, respectively). For simplicity, only the nominally positive signals 
of the outputs are shown in the figure. As can be seen from the figure, the data 
corresponds to the Channel Link standard as presented in Figure 5.4, confirming the 
correctness of the designed Channel Link Tx component. 
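How the 28-bit input word is distributed over the four LVDS data lines can be sketched in a few lines of code. The bit assignment in `LINE_BITS` is an assumption: it follows the mapping commonly found in datasheets of 28-bit Channel Link serializers and is not copied from Figure 5.4. The names `LINE_BITS` and `serialize` are invented for this example. With this assumed mapping the test value reproduces the four patterns quoted above.

```python
# Assumed assignment of the 28 parallel input bits to the four LVDS data
# lines, listed in the order in which the bits appear in each 7-bit pattern.
LINE_BITS = {
    3: (23, 17, 16, 11, 10, 5, 27),
    2: (26, 25, 24, 22, 21, 20, 19),
    1: (18, 15, 14, 13, 12, 9, 8),
    0: (7, 6, 4, 3, 2, 1, 0),
}


def serialize(word):
    """Return the 7-bit pattern of each LVDS data line for a 28-bit word."""
    if not 0 <= word < (1 << 28):
        raise ValueError("word must fit into 28 bits")
    return {
        line: "".join(str((word >> bit) & 1) for bit in bits)
        for line, bits in LINE_BITS.items()
    }


if __name__ == "__main__":
    patterns = serialize(0x61D5DD8)
    for line in sorted(patterns, reverse=True):
        print(f"LVDS data line {line}: {patterns[line]}")
```

A test value of this kind is convenient because each data line carries a visually distinct pattern, so a wrong bit assignment shows up directly in the simulated waveform.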
 
Figure 6.1: Simulation results of the developed Tx component. The red lines mark one parallel clock 
cycle. 
6.1.2 Channel Link Receiver 
The compatibility of the designed Channel Link Rx with the designed Channel 
Link Tx was tested in a loopback configuration (Figure 6.2) implemented in the FPGA 
of the development board. The two FMCs were used to provide the loopback 
connection via a Camera Link cable. Only one data channel of 7 bits was tested at a 
time. The data to be transmitted via the I/F were set using 7 DIP switches on the 
development board. The data was continuously sent with the designed Tx. The 
received data of the Rx were connected to the LEDs on the FPGA. The sent and 
received data were identical, confirming that the designed Channel Link Tx and Rx 
are compatible. 
 
Figure 6.2: Channel Link loopback test using Camera Link FMCs and cable. 
6.1.3 Camera Link Transmitter 
The functionality of the custom Camera Link Tx was verified by implementing 
it independently from the Rx module in the FPGA and connecting it to the frame 
grabber of the PC. An image generator was also implemented in the FPGA and 
connected to the inputs of the Tx. The image generator was designed in Simulink from 
which its VHDL description was automatically generated by the HDL coder. The 
generator outputs pixels together with Camera Link synchronization signals to produce 
a constant image which has rows of different shades of gray (Figure 6.3). The 
synchronization signals were defined according to [30] to emulate the camera chosen 
for the thesis. 
 
Figure 6.3: Reference image from MATLAB for Camera Link transmitter verification. 
Figure 6.4 presents a screenshot of the image received by the PC. As can be 
seen, the received (Figure 6.4) and the reference (Figure 6.3) images are identical, 
confirming the correctness of the designed Camera Link Tx. 
 
Figure 6.4: Received image by PC for Camera Link transmitter verification. 
The Tx module was further tested to demonstrate the successful transmission 
of images at the frame rate of 118.23 fps. For this purpose the image generator was 
modified to output the same image circularly shifted by a single line each 
frame, producing a moving image. 
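The behavior of the modified image generator can be modeled in software as follows. This is only a sketch and not the Simulink model from which the VHDL description was generated: the image dimensions, the gray-level ramp and the direction of the shift are assumptions, and the function names are invented for this example.

```python
def reference_image(height, width):
    """Constant test image whose rows have different shades of gray (8 bit)."""
    return [[(row * 255) // max(height - 1, 1)] * width for row in range(height)]


def frame(base, index):
    """Frame number `index`: the base image circularly shifted by `index` lines.

    The shift direction (downwards) is an assumption of this sketch.
    """
    split = len(base) - index % len(base)
    return base[split:] + base[:split]


if __name__ == "__main__":
    base = reference_image(height=4, width=3)
    for k in range(3):
        print(k, [row[0] for row in frame(base, k)])
```

After as many frames as the image has lines the sequence repeats, so the received video shows a continuously scrolling pattern in which a dropped or repeated frame is easy to spot.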
6.1.4 Camera Link Receiver 
The verification of the custom Camera Link Rx was carried out by a hardware 
test with the use of the verified Camera Link Tx. The modules were implemented in 
the FPGA according to Figure 6.5. The camera was connected to the Rx, and the Tx 
was connected to the frame grabber of the PC. 
 
Figure 6.5: Schematic of FPGA receiving the Camera Link signal from camera and then transmitting 
it to PC. 
The designed Rx proved to work according to the Camera Link standard as 
images were successfully transmitted from the camera to the PC through the FPGA at 
the frequency of 37.125 MHz at all Camera Link configurations. An example of a 
received image can be seen in Figure 6.6. 
 
Figure 6.6: Received image from the camera through the FPGA. 
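The relation between the clock frequency and the achievable frame rate can be estimated with a short calculation. The sketch assumes that 8 pixels of 8 bits are transferred per clock cycle, as suggested by the 8-pixel sets processed in Section 5.2.2.4, and it ignores all blanking intervals. It therefore only gives an upper bound, not the actual timing of the camera.

```python
PIXEL_CLOCK_HZ = 37.125e6   # Camera Link clock frequency used in the test
PIXELS_PER_CLOCK = 8        # assumption: 8 pixels of 8 bits per clock cycle
HEIGHT, WIDTH = 1216, 1936  # maximum image size of the camera in pixels


def max_frame_rate(clock_hz, pixels_per_clock, height, width):
    """Upper bound on the frame rate if every clock cycle carried valid pixels."""
    return clock_hz * pixels_per_clock / (height * width)


if __name__ == "__main__":
    bound = max_frame_rate(PIXEL_CLOCK_HZ, PIXELS_PER_CLOCK, HEIGHT, WIDTH)
    print(f"upper bound: {bound:.2f} fps")
```

Under these assumptions the bound lies somewhat above the 118.23 fps reported for the system, which is consistent with part of each frame period being used for synchronization rather than pixel data.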
6.2 Verification of Image Processing Algorithms 
In the following subsections the implementations of the image processing 
algorithms described in Section 5.2 are verified. Their verification is based on the 
simulations of the implemented circuits described in VHDL. The test benches for the 
simulations of the circuits are written in Verilog. The model of the DDR memory, 
which is used in some implementations, is taken from [50, p. 7]. The simulations are 
done with ModelSim [138]. For the purpose of the simulations the images to be 
processed are converted with MATLAB to a Camera Link stream. The stream is 
exported in a .txt file so that it can be read from the test bench. Similarly, the test bench 
outputs a Camera Link stream in a .txt file. The processed images are reconstructed 
with MATLAB and evaluated. Moreover, the processed images are compared 
with the results of the same algorithms implemented in MATLAB using 
the Image Processing Toolbox. The described verification flow of implementations of 
the image processing algorithms is depicted in Figure 6.7. 
 
Figure 6.7: Verification flow of implementations of image processing algorithms. 
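The conversion between an image and a text stream, which frames the simulation in the flow above, can be sketched as follows. The thesis performs this step in MATLAB and its file format is not reproduced here, so the line format below ("FVAL LVAL pixel", one pixel per line) is purely hypothetical, as are the function names.

```python
def image_to_stream(image):
    """Serialize an 8-bit grayscale image into text lines, one pixel per line.

    Hypothetical line format: "<FVAL> <LVAL> <pixel>". A line with LVAL = 0
    is emitted after every row to model the gap between two image lines.
    """
    lines = []
    for row in image:
        lines.extend(f"1 1 {pixel}" for pixel in row)
        lines.append("1 0 0")
    lines.append("0 0 0")  # frame valid de-asserted: end of frame
    return lines


def stream_to_image(lines):
    """Rebuild the image from the text lines written by image_to_stream."""
    image, row = [], []
    for line in lines:
        fval, lval, pixel = (int(token) for token in line.split())
        if fval and lval:
            row.append(pixel)
        elif row:          # line valid dropped: the current row is complete
            image.append(row)
            row = []
    return image


if __name__ == "__main__":
    original = [[0, 128, 255], [1, 2, 3]]
    stream = image_to_stream(original)
    print(len(stream), stream_to_image(stream) == original)
```

A round trip through both functions returns the original image, which is the property that the export and the reconstruction step need to have so that any difference seen in the evaluation stems from the simulated circuit.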
In order to evaluate the processed images, they are displayed. The absolute 
difference between the reference and the implemented versions is analyzed as well. To 
quantify the difference between the versions, the histogram of the resulting 
absolute difference is also observed. 
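The evaluation metrics used in the following subsections can be written down compactly. The sketch below is an illustration in plain Python rather than the MATLAB scripts used in the thesis; images are represented as lists of rows of 8-bit values, and the function names are chosen for this example.

```python
def abs_difference(img_a, img_b):
    """Pixel-wise absolute difference of two equally sized images."""
    return [[abs(a - b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(img_a, img_b)]


def histogram(img, levels=256):
    """Number of pixels per gray level."""
    counts = [0] * levels
    for row in img:
        for value in row:
            counts[value] += 1
    return counts


def differing_share(diff):
    """Fraction of pixels whose absolute difference is not zero."""
    total = sum(len(row) for row in diff)
    return sum(1 for row in diff for value in row if value) / total


if __name__ == "__main__":
    reference = [[10, 20], [30, 40]]
    implemented = [[10, 21], [30, 40]]
    diff = abs_difference(reference, implemented)
    print(histogram(diff)[:3], differing_share(diff))
```

A histogram with all pixels in bin 0 corresponds to a completely black difference image, i.e. to identical results of the two versions.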
Note that the simulations take a long time to complete, especially if multiple 
images need to be processed in sequence, i.e. for background subtraction, FFC and 
image averaging. Therefore, those algorithms are evaluated on images of smaller size 
than the maximal image size of the camera. 
6.2.1 Image Filtering 
The implementation of the image filtering algorithm is first verified with a filter 
size of 3 by 3. The filter implements a Gaussian LPF with a standard deviation of 0.5. 
It is generated in MATLAB with the function ‘fspecial’. The reference filtering of the 
image with the generated filter is done using the function ‘imfilter’. The image of Saturn 
(Figure 4.2) is used as the input image for the verification. In Figure 6.8 the results of both 
methods are shown, as well as the absolute difference between them and its 
corresponding histogram. 
 
Figure 6.8: Image processed by the implemented circuit for image filtering (top left), the reference 
image (top right) and their absolute difference (bottom left) with its histogram (bottom right). 
The two images seem the same. The image of the absolute difference does 
not show visible differences either. However, the histogram shows that not all the pixels have 
the same value, i.e. the difference image is not completely black. The source of the 
difference lies in the rounding of the coefficients. For the hardware implementation 
they are rounded to a 16-bit fixed-point representation. If they were 
implemented in double-precision floating-point format, there would be no 
difference. However, such an implementation would require a lot more hardware 
resources, e.g. floating-point multipliers and larger memories. The maximum 
difference using the 16-bit fixed-point representation is 1 and altogether below 
3% of all pixels differ. This is sufficient for all practical applications. 
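The effect of the coefficient rounding can be reproduced with a small model. The sketch assumes that the coefficients are stored with 16 fractional bits (`FRAC_BITS`), which is not stated in this section, and it uses zero padding at the image border. The kernel is the usual normalized Gaussian, and the test image is an arbitrary synthetic pattern instead of the image of Saturn.

```python
import math

FRAC_BITS = 16  # assumed number of fractional bits of the coefficients


def gaussian_kernel(size=3, sigma=0.5):
    """Normalized Gaussian low-pass kernel of the given size."""
    half = size // 2
    raw = [[math.exp(-(r * r + c * c) / (2 * sigma * sigma))
            for c in range(-half, half + 1)] for r in range(-half, half + 1)]
    total = sum(map(sum, raw))
    return [[v / total for v in row] for row in raw]


def filter_image(img, kernel, fixed_point):
    """Filter with zero padding, with exact or quantized coefficients."""
    h, w, half = len(img), len(img[0]), len(kernel) // 2
    if fixed_point:
        kernel = [[round(v * (1 << FRAC_BITS)) for v in row] for row in kernel]
    out = []
    for r in range(h):
        out_row = []
        for c in range(w):
            acc = 0
            for dr in range(-half, half + 1):
                for dc in range(-half, half + 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += kernel[dr + half][dc + half] * img[rr][cc]
            if fixed_point:
                acc = (acc + (1 << (FRAC_BITS - 1))) >> FRAC_BITS  # round
            else:
                acc = math.floor(acc + 0.5)
            out_row.append(min(max(acc, 0), 255))
        out.append(out_row)
    return out


if __name__ == "__main__":
    image = [[(37 * r + 91 * c + r * c) % 256 for c in range(16)] for r in range(16)]
    kernel = gaussian_kernel()
    ref = filter_image(image, kernel, fixed_point=False)
    fix = filter_image(image, kernel, fixed_point=True)
    print(max(abs(a - b) for ra, rb in zip(ref, fix) for a, b in zip(ra, rb)))
```

Because the accumulated coefficient error stays far below one gray level in this model, the two results can differ by at most one level after rounding, which agrees with the maximum difference of 1 observed above.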
The implementation of image filtering was also verified with a filter size of 7 by 
7. The same small difference due to rounding was observed there as well. Note that 
larger filters were not constructed for two reasons. Firstly, the implementation 
described in Section 5.2.1.1 is more difficult to generalize to larger filter sizes. 
Secondly, the filter size of 7 by 7 uses up most of the available resources in the used 
FPGA. Therefore, implementing larger filters in this FPGA is infeasible. 
6.2.2 Edge Detection 
Edge detection is a form of filtering. Therefore, MATLAB uses the same 
functions for generating the edge detection filter (‘fspecial’ with parameter “Prewitt”) 
and applying it (‘imfilter’). With the described function for filter generation only a 
mask for one direction of the Prewitt operator is generated. To get the mask for the 
other direction the generated filter is simply transposed. Both masks are applied to the 
image to be processed. To get the reference result of edge detection both filtered 
images are squared and added together. The square root of the sum is calculated in 
order to get the magnitude of the gradient of the image. In Figure 6.9 results of the 
simulations of the implemented edge detection and the described method of calculation 
of the reference response are displayed. Their absolute difference and its 
corresponding histogram are shown as well. For the verification the image of Saturn 
(Figure 4.2) was again used. 
 
Figure 6.9: Image processed by the implemented edge detection circuit (top left), reference image (top 
right) and their absolute difference (bottom left) with its histogram (bottom right). 
From the images in Figure 6.9 it can be clearly seen that both methods yield the 
same results, as their absolute difference is a completely black image, which is confirmed 
by the histogram. In this case there is no problem with the precision of the calculations, 
as the Prewitt operator includes only coefficients of ±1 or 0. 
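The reference computation described in this subsection can be summarized in a short sketch. It is written in plain Python instead of MATLAB, uses zero padding at the image border, and the function names are chosen for this example. The mask values correspond to the Prewitt operator for one direction, and the second mask is obtained by transposition.

```python
import math

PREWITT = [[1, 1, 1], [0, 0, 0], [-1, -1, -1]]    # mask for one direction
PREWITT_T = [list(col) for col in zip(*PREWITT)]  # transposed mask


def correlate3x3(img, mask):
    """3x3 correlation with zero padding outside the image."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += mask[dr + 1][dc + 1] * img[rr][cc]
            out[r][c] = acc
    return out


def gradient_magnitude(img):
    """Square root of the sum of the squared responses of both masks."""
    g1 = correlate3x3(img, PREWITT)
    g2 = correlate3x3(img, PREWITT_T)
    return [[math.sqrt(a * a + b * b) for a, b in zip(r1, r2)]
            for r1, r2 in zip(g1, g2)]


if __name__ == "__main__":
    step = [[0, 0, 10, 10, 10] for _ in range(5)]  # vertical step edge
    for row in gradient_magnitude(step):
        print([round(v, 1) for v in row])
```

Since all mask coefficients are ±1 or 0, the filter responses are exact integers, and only the final square root requires a decision about rounding.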
6.2.3 Background Subtraction 
For the verification of the background subtraction the left image in Figure 6.10 
is used as the background image. The right image in the figure is used as the image to 
be processed, i.e. to classify its pixels as either back- or foreground. The threshold 
parameter Th (Equation (4.5)) of the implemented algorithm is set to the value of 15. 
The image processed by the implemented circuit is displayed in Figure 6.11. 
 
Figure 6.10: Background image (left) and image to classify its pixels as either back- or foreground 
(right) [139]. 
 
Figure 6.11: Image processed by the implemented background subtraction circuit. 
From Figure 6.11 it can clearly be seen that some pixels are incorrectly 
classified. However, this is due to the simplicity of the algorithm and not due to its 
implementation. The problem is that the illumination of the background has changed. 
The algorithm does not take such changes into account. As already mentioned, for 
better classification other background subtraction methods should be used. However, 
they cannot be implemented in the used FPGA due to resource limitations. 
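Why a change in illumination leads to misclassification can be seen from a minimal model of the algorithm. The sketch assumes that a pixel is classified as foreground when its absolute difference to the background exceeds the threshold; whether Equation (4.5) uses a strict comparison is not restated here, so this detail is an assumption, as are the function name and the output values 0 and 255.

```python
def subtract_background(image, background, threshold):
    """Return a mask: 255 for foreground pixels, 0 for background pixels."""
    return [[255 if abs(p - b) > threshold else 0 for p, b in zip(row_i, row_b)]
            for row_i, row_b in zip(image, background)]


if __name__ == "__main__":
    background = [[100, 100, 100, 100]]
    with_object = [[100, 102, 200, 100]]  # small noise and one object pixel
    brighter = [[120, 120, 120, 120]]     # same scene, changed illumination
    print(subtract_background(with_object, background, 15))
    print(subtract_background(brighter, background, 15))
```

In the second call every pixel is marked as foreground although the scene contains no new object, which mirrors the misclassified regions visible in Figure 6.11.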
6.2.4 Flat Field Correction 
For the verification of the FFC algorithm the distortions of the camera have to 
be modeled. As already indicated in Section 4.4 only linear distortions are considered. 
Those are defined by the two model parameters, namely the response to the dark 
image 𝐷(𝑥, 𝑦), when no light reaches the sensor, and the response to the flat 
image 𝐹(𝑥, 𝑦), when the sensor is exposed to a constant light. As explained in the 
implementation of the FFC (Section 5.2.2.3) the intensity of the flat image K is set to 
128. For the verification the images in Figure 6.12 are used. The image on the left 
represents 𝐷(𝑥, 𝑦) and the image on the right 𝐹(𝑥, 𝑦). 
 
Figure 6.12: Responses to the dark image 𝐷(𝑥, 𝑦) (left) and response to the flat image 𝐹(𝑥, 𝑦) (right). 
The input image (Figure 6.13 - top left) is distorted with the applied model. The 
resulting distorted image is displayed at the top right of Figure 6.13. The distorted 
image is then processed with the circuit which implements FFC. The processed image 
is displayed at bottom left of Figure 6.13. The complement of the absolute difference 
between the corrected image and the original is also shown (Figure 6.13 - bottom 
right). 
 
Figure 6.13: Original image (top left) [140], distorted image (top right), image corrected by the 
implemented FFC circuit (bottom left), and the complemented absolute difference between the 
original and the corrected image (bottom right). 
When comparing the corrected and the distorted image one can clearly see that 
the image has been improved. At first glance it seems that the corrected image 
equals the original. However, upon thorough inspection one can notice the differences 
between them. Those are apparent in the complemented image of their absolute 
difference. As can be seen, the differences are most notable at the corners. The 
differences between the original and the corrected version stem from the nature of the 
distortions themselves. For example, due to the optical distortions, light coming to the 
pixels at the edges of the sensor is attenuated. Therefore, small variations in the light 
intensity of neighboring pixels in those regions are no longer visible in the captured 
image if they are attenuated below the sensitivity of the pixel. Those distortions cannot 
be corrected anymore. The model reflects these kinds of distortions. 
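The loss of information in strongly attenuated regions can be demonstrated with a single-pixel model. The sketch assumes the common linear form of the distortion and of its correction, with the dark response as offset and (F − D)/K as gain; the exact formulation of Section 4.4 is not restated here, and the function names and example values are chosen for illustration only.

```python
K = 128  # intensity of the flat image, as set for the verification


def distort(pixel, dark, flat):
    """Assumed linear camera model: offset `dark`, gain (flat - dark) / K."""
    return min(255, max(0, round(dark + pixel * (flat - dark) / K)))


def correct(raw, dark, flat):
    """Assumed linear flat-field correction, inverting the model above."""
    if flat == dark:
        return 0  # no usable response at this pixel
    return min(255, max(0, round((raw - dark) * K / (flat - dark))))


if __name__ == "__main__":
    # Center of the sensor: gain 1, the original value is recovered.
    print(correct(distort(100, dark=10, flat=138), dark=10, flat=138))
    # Corner of the sensor: gain 0.25, neighboring values become identical.
    print(distort(100, dark=10, flat=42), distort(101, dark=10, flat=42))
    print(correct(distort(101, dark=10, flat=42), dark=10, flat=42))
```

With a gain of 0.25 the inputs 100 and 101 are captured as the same value, so the correction returns 100 for both and the original difference cannot be restored.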
6.2.5 Image Averaging 
In order to be able to observe the benefits of the image averaging algorithm, first 
a set of noisy images needs to be constructed. They are created with MATLAB using 
the function ‘imnoise’ with its parameters set to generate noise with a Gaussian 
distribution with zero mean value and variance of 0.01. An example of the constructed 
image with added noise is shown on the left of Figure 6.14. The original image is 
shown on the right. 
 
Figure 6.14: Image with added noise (left) and its original (right). 
For the verification of the implemented image averaging algorithm (Section 
5.2.2.4), i.e. the simplified filter form, the length of the filter L is set to 8. The result 
of averaging 8 noisy images with the implemented circuit is shown at the top left of 
Figure 6.15. If the FPGA supported 4 accesses to the DDR memory, the non-simplified 
filter form could have been implemented (Section 5.2.2.4). The output of such a 
hypothetical circuit is shown for comparison (top right). The complement of their absolute 
difference is displayed as well, together with the histogram of the absolute difference 
(bottom left and bottom right, respectively). 
 
Figure 6.15: Output image of the implemented circuit of the simplified running average filter (top 
left), output image of the supposed implementation of the original non-simplified version of the 
algorithm (top right), complement of their absolute difference (bottom left), and the corresponding 
histogram (bottom right). 
From the figure it is evident that both filters reduce the noise in the image. The 
image of the complement of their absolute difference and its histogram show that there 
is a small difference between them. Nevertheless, the difference is small enough to be 
neglected for most practical purposes. Therefore, these results validate the 
simplification of the filter. 
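The noise reduction achieved by averaging can be checked with a simple numerical experiment. The sketch generates Gaussian noise with zero mean and a variance of 0.01 on a normalized intensity scale from 0 to 1, similar to the setting described above, but it uses Python's random number generator with a fixed seed and omits the clipping to the valid range. The function names and the constant test image are assumptions of this example.

```python
import random
import statistics


def noisy_frames(value, n_pixels, n_frames, variance=0.01, seed=0):
    """Frames of a constant image with added zero-mean Gaussian noise."""
    rng = random.Random(seed)
    sigma = variance ** 0.5
    return [[value + rng.gauss(0.0, sigma) for _ in range(n_pixels)]
            for _ in range(n_frames)]


def average(frames):
    """Pixel-wise mean of all frames."""
    return [sum(pixels) / len(frames) for pixels in zip(*frames)]


if __name__ == "__main__":
    frames = noisy_frames(value=0.5, n_pixels=2000, n_frames=8)
    print(statistics.pstdev(frames[0]), statistics.pstdev(average(frames)))
```

For independent noise the standard deviation is expected to drop by roughly the square root of the number of averaged frames, i.e. by a factor close to 2.8 for L = 8.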
6.3 System Demonstration 
The designed Camera Link Tx and Rx as well as the edge detection algorithm 
were joined together into one system to demonstrate its use 
on a live video stream coming from the camera. The setup is presented in 
Figure 6.16 and a captured image can be seen in Figure 6.17. 
 
Figure 6.16: Demonstration setup. 
 






In this master’s thesis an FPGA-based system for real-time processing of a 
Camera Link image stream was successfully developed. The system is based on 
Microsemi’s SmartFusion2 Advanced Development Kit, which includes a 
SmartFusion2 M2S150 SoC FPGA, and on an industrial camera which uses the Camera 
Link I/F. The system is meant to be inserted between the camera and the intended 
receiver, providing real-time processing of the images captured by the camera. It is 
capable of processing images at frame rates of up to 118.23 fps at the maximum image 
size provided by the used camera, namely 1216 by 1936 pixels. Therefore, the 
designed system met the initially set goals. 
A Camera Link Rx was designed for acquiring the images sent by the camera. 
In order to transmit the processed images to the intended receiver, a Camera Link Tx 
was designed as well. Because the Camera Link I/F is based on the Channel Link I/F, 
a Channel Link Rx and Tx were developed first. The circuits were implemented in the 
chosen FPGA and support the Camera Link I/F at base, medium and full configurations 
at frequencies up to 38.2 MHz. 
Some of the widely used algorithms for preprocessing of images were designed 
to be implemented in the system. The implemented algorithms were: 
• image filtering, e.g. for noise reduction, 
• edge detection, 
• background subtraction for detection of moving objects, 
• image averaging for noise reduction, and 
• flat-field correction (FFC) for lowering distortions in the image produced by 
the camera. 
Some of the listed algorithms need to store multiple images, for which DDR 
memory is used. To achieve the aforementioned frame rate also for the algorithms 
using the DDR memory a circuit for accessing the DDR needed to be designed. The 
circuit achieved the highest possible throughput of the used DDR memory. 
108 7 Conclusion 
 
The functionality of the developed Camera Link I/F circuits as well as of the 
implemented image processing algorithms was successfully verified. The former was 
verified by simulating the circuits as well as testing them in hardware and observing if 
the output data matched the expected outcome. The latter was verified by comparing 
the results with reference implementations of the algorithms in MATLAB. At the end 
the system was joined together to successfully demonstrate a real-time processing 
application of edge detection on a live video stream coming from the camera. 
Nevertheless, the performance of the system could be further improved. First of 
all, the frame rate of the processed images could be increased if the implemented 
Camera Link Rx and Tx supported higher frequencies. As explained, for the FPGA 
used in the thesis this would require the use of DDR registers of the I/Os because of 
their frequency limitations. The use of DDR registers would increase the complexity 
of the whole design but allow the Camera Link I/F to function at frequencies of up to 
76.4 MHz. Nevertheless, even this increased operating frequency would not allow 
processing of images at frame rates close to the camera’s maximum for all the 
implemented algorithms. The processing frame rate of algorithms which use the DDR 
memory would not increase much. The reason for this is the throughput limitation of 
the DDR memory. To be more precise the limitation lies in the FDDR controller for 
accessing the DDR. Its operating frequency is limited to 333 MHz due to fabric 
limitations. Because it provides the clock for the DDR it also limits its use. As 
mentioned, the FDDR is implemented as an ASIC. Therefore, it is not possible to 
increase its throughput. The only way to also vastly improve the processing rate of 
algorithms which use the DDR memory would be to use other development boards. 
As mentioned those are a lot more expensive than the one used in this thesis. Moreover, 
on a development board with a more powerful FPGA other also high-level algorithms 






[1] B. Ranft and C. Stiller, “The Role of Machine Vision for Intelligent Vehicles,” 
IEEE Trans. Intell. Veh., vol. 1, no. 1, pp. 8–19, Mar. 2016. 
[2] Y. Yang, H. Luo, H. Xu, and F. Wu, “Towards Real-Time Traffic Sign Detection 
and Classification,” IEEE Trans. Intell. Transport. Syst., vol. 17, no. 7, pp. 2022–
2031, Jul. 2016. 
[3] S. H. Shaikh, K. Saeed, and N. Chaki, Moving object detection using background 
subtraction. New York: Springer, 2014. 
[4] S. A. Meshram and R. S. Lande, “Traffic surveillance by using image processing,” 
in 2018 International Conference on Research in Intelligent and Computing in 
Engineering (RICE), San Salvador, 2018, pp. 1–3. 
[5] G. Dougherty, Digital Image Processing for Medical Applications. New York: 
Cambridge University Press, 2009. 
[6] B. G. Batchelor, Ed., Machine vision handbook. London: Springer, 2012. 
[7] K. Akiyama et al., “First M87 Event Horizon Telescope Results. III. Data 
Processing and Calibration,” The Astrophysical Journal Letters, p. 52, 2019. 
[8] “Real-time anomaly and pattern recognition in high-resolution optical satellite 
imagery for future on-board image processing on Earth observation satellites,” 
PhD Thesis, Universität der Bundeswehr München, 2014. 
[9] NASA, “NASA’s ECOSTRESS Detects Amazon Fires from Space,” NASA/JPL. 
[Online]. Available: https://www.jpl.nasa.gov/news/news.php?feature=7490. 
[Accessed: 28-Sep-2019]. 
[10] B. Brown, “Camera Connections,” Vision Systems Design, 01-Apr-2008. 
[Online]. Available: https://www.vision-systems.com/non-factory/environment-
agriculture/article/16739233/camera-connections. [Accessed: 22-Sep-2019]. 
[11] C. Li, S. Balla-Arabe, and F. Yang-Song, Architecture-Aware Optimization 
Strategies in Real-time Image Processing. Great Britain and USA: John Wiley & 
Sons, Inc., 2017. 
[12] K. G. Shin and P. Ramanathan, “Real-time computing: a new discipline of 
computer science and engineering,” Proc. IEEE, vol. 82, no. 1, pp. 6–24, Jan. 
1994. 
[13] R. C. Gonzalez and R. E. Woods, Digital Image Processing, 2nd ed. New 
Jersey: Prentice-Hall, 2002. 
110 References 
 
[14] MathWorks, “Simulink - Simulation and Model-Based Design.” [Online]. 
Available: https://www.mathworks.com/products/simulink.html. [Accessed: 22-
Sep-2019]. 
[15] MathWorks, “HDL Coder - MATLAB & Simulink.” [Online]. Available: 
https://www.mathworks.com/products/hdl-coder.html. [Accessed: 22-Sep-2019]. 
[16] MathWorks, “Image Processing Toolbox.” [Online]. Available: 
https://www.mathworks.com/products/image.html. [Accessed: 22-Sep-2019]. 
[17] MathWorks, “Vision HDL Toolbox.” [Online]. Available: 
https://www.mathworks.com/products/vision-hdl.html. [Accessed: 01-Oct-2019]. 
[18] MathWorks, “Release Notes for Vision HDL Toolbox - MATLAB & 
Simulink.” [Online]. Available: 
https://www.mathworks.com/help/visionhdl/release-notes.html. [Accessed: 01-
Oct-2019]. 
[19] MathWorks, “MathWorks Announces Release 2019b of MATLAB and 
Simulink.” [Online]. Available: 
https://www.mathworks.com/company/newsroom/mathworks-announces-
release-r2019b-of-matlab-and-simulink.html. [Accessed: 01-Oct-2019]. 
[20] ESO, “MICADO.” [Online]. Available: https://www.eso.org/public/teles-
instr/elt/elt-instr/micado/. [Accessed: 22-Sep-2019]. 
[21] MPE, “MICADO.” [Online]. Available: http://www.mpe.mpg.de/ir/micado. 
[Accessed: 22-Sep-2019]. 
[22] ESO, “The European Extremely Large Telescope (‘ELT’) Project.” [Online]. 
Available: https://www.eso.org/sci/facilities/eelt/. [Accessed: 22-Sep-2019]. 
[23] K. Kuehn and R. Hupe, “Real-Time Analysis of Large Astronomical Images,” 
Journal of Astronomical Instrumentation, vol. 1, no. 1, Mar. 2012. 
[24] Jai, “GO-2400-PMCL datasheet.” [Online]. Available: 
https://www.jai.com/uploads/documents/English-Manual-Datasheet/GO-
Series/Datasheet_GO-2400-PMCL.pdf. 
[25] SiliconSoftware, “Datasheet SiliconSoftware microEnable 5 marathon VCL.” 
[Online]. Available: https://silicon.software/pdf-generator/?product_id=763. 
[26] SiliconSoftware, “microDisplay.” [Online]. Available: 
http://www.siliconsoftware.de/download/live_docu/RT5/en/Tools/microDisplay.
html. 
[27] Microsemi, “PB0115: Product Brief SmartFusion2 SoC FPGA revision 27.” 
2018. 
[28] Microsemi, “UG0557: User Guide SmartFusion2 SoC FPGA Advanced 
Development Kit revision 4.” 2016. 
[29] Alpha Data, FMC-CAMERALINK User Manual revision 2.7. 2019. 
[30] Jai, “User Manual: GO-2400M-PMCL and GO-2400C-PMCL.” [Online]. 
Available: https://www.jai.com/uploads/documents/English-Manual-
Datasheet/GO-Series/Manual_GO-2400-PMCL.pdf. 





[32] Microsemi, “Boards and Kits.” [Online]. Available: 
https://www.microsemi.com/product-directory/design-resources/1712-dev-kits-
boards. 
[33] Microsemi, “PB0051: Product Brief RTG4 FPGA revision 11.” 2019. 
[34] Digi-Key, “RTG4-DEV-KIT-CG-1.” [Online]. Available: 
https://www.digikey.com/product-detail/en/microsemi-corporation/RTG4-DEV-
KIT-CG-1/RTG4-DEV-KIT-CG-1-ND/7696692. 
[35] Digi-Key, “M2S150-ADV-DEV-KIT.” [Online]. Available: 
https://www.digikey.com/products/en?keywords=smartfusion2%20advance%20
development%20kit. 
[36] Xilinx, “Boards and Kits by Board Function.” [Online]. Available: 
https://www.origin.xilinx.com/products/boards-and-kits/see-all-bk-board-
function.html. 
[37] Xilinx, “Xilinx Kintex-7 FPGA KC705 Evaluation Kit.” [Online]. Available: 
https://www.origin.xilinx.com/products/boards-and-kits/ek-k7-kc705-
g.html#hardware. 
[38] Xilinx, “Xilinx Zynq-7000 SoC ZC702 Evaluation Kit.” [Online]. Available: 
https://www.origin.xilinx.com/products/boards-and-kits/ek-z7-zc702-g.html. 
[39] Xilinx, UltraScale Architecture UltraScale Architecture DSP Slice: User 
Guide: UG579 (v1.9). 2019. 
[40] Synopsys, “Inferring Microsemi SmartFusion2, IGLOO2 and RTG4 MACC 
Blocks.” 2016. 
[41] Microsemi, “UG0445: User Guide SmartFusion2 IGLOO2 Fabric revision 7.” 
2019. 
[42] S. Hauck and A. DeHon, Eds., Reconfigurable Computing: The Theory and 
Practice of FPGA-Based Computation. Elsevier, 2008. 
[43] Microsemi, “UG0449: User Guide SmartFusion2 and IGLOO2 Clocking 
Resources revision 7.” 2017. 
[44] Microsemi, “IGLOO2/SmartFusion2: Clock Conditioning Circuit with PLL 
Configuration.” 2014. 
[45] Microsemi, “DS0128: Datasheet IGLOO2 FPGA and SmartFusion2 SoC 
FPGA revision 12.” 2018. 
[46] L. Null and J. Lobur, The essentials of computer organization and architecture, 
Fourth edition. Burlington, Massachusetts: Jones & Bartlett Learning, 2015. 
[47] Microsemi, “SmartFusion2 FPGA Fabric DDR Controller Configuration.” 
2014. 
[48] Microsemi, “Application Note AC409: Connecting User Logic to AXI 
Interfaces of High-Performance Communication Blocks in the SmartFusion2 
Devices - Libero SoC v11.7.” 2016. 
[49] ARM, “AMBA AXI Protocol Specification v1.0.” 2003. 
[50] Microsemi, “Application Note AC422: SmartFusion2 - Optimizing DDR 
Controller for Improved Efficiency - Libero SoC v11.7.” 2016. 




[52] P. Loshin, TCP/IP Clearly Explained. Morgan Kaufmann Publishers, 2003. 
[53] National Instruments, “Choosing the Right Camera Bus,” 05-Mar-2019. 
[Online]. Available: http://www.ni.com/de-de/innovations/white-
papers/12/choosing-the-right-camera-bus.html. 
[54] AIA, EMVA, and JIIA, “Global Machine Vision Interface Standards.” 2016. 
[55] National Semiconductor, “Channel Link II Design Guide.” 2011. 
[56] ISO, “Information Technology - Open System Interconnection - Basic 
Reference Model: The Basic Model.” 1994. 
[57] PULNiX America Inc., “Camera Link: Specifications of the Camera Link 
Interface Standard for Digital Cameras and Frame Grabbers.” 2000. 
[58] Texas Instruments, “Interface Circuits for TIA/EIA-644 (LVDS): Design 
Notes.” 2002. 
[59] “Low-voltage differential signaling.” [Online]. Available: 
https://en.wikipedia.org/wiki/Low-voltage_differential_signaling. [Accessed: 22-
Sep-2019]. 
[60] “Transition-minimized differential signaling.” [Online]. Available: 
https://en.wikipedia.org/wiki/Transition-minimized_differential_signaling. 
[Accessed: 22-Sep-2019]. 
[61] R. Murphy, “What is TMDS and why is it in my HDMI?” [Online]. Available: 
https://www.ramelectronics.net/tmds.aspx. [Accessed: 22-Sep-2019]. 
[62] National Semiconductor, “National Semiconductor Channel Link Design 
Guide.” 2006. 
[63] Texas Instruments, “DS90CR287/DS90CR288A datasheet.” 2013. 
[64] “Channel Link.” [Online]. Available: 
https://en.wikipedia.org/wiki/Channel_Link. [Accessed: 22-Sep-2019]. 
[65] Texas Instruments, “DS90CR485 datasheet.” 2013. 
[66] National Semiconductor, “National Semiconductor’s Channel Link III SerDes 
First to Integrate Zero Latency Control Channel for Industrial Video 
Applications.” 2010. 
[67] National Semiconductor, “Channel Link Embedded Clock Serializers and 
Deserializers.” 2010. 
[68] Compaq, Intel, Microsoft, and NEC, “Universal Serial Bus Specification: 
Revision 1.1.” 1998. 
[69] Compaq et al., “Universal Serial Bus Specification: Revision 2.0.” 2000. 
[70] Hewlett-Packard, Intel, Microsoft, ST-NXP, and Texas Instruments, 
“Universal Serial Bus 3.0 Specification: Revision 1.0.” 2008. 
[71] Apple et al., “Universal Serial Bus 3.2 Specification: Revision 1.0.” 2017. 
[72] NXP Semiconductors, “UM10204: I2C-bus specification and user manual.” 
2014. 
[73] P. Łukasz, O. Marcin, K. Dariusz, and J. Marek, “I2C Interface Design for 
Hardware Master Devices,” 2012. 
[74] M. H. Daware and A. S. Patil, “Implementation of I2C Bus Protocol on FPGA,” 
vol. 2, no. 8, 2015. 
[75] MIPI Alliance, “Camera and Imaging.” [Online]. Available: 
https://mipi.org/specifications/camera-and-imaging. [Accessed: 22-Sep-2019]. 
[76] ADVANTEST, “Question: ‘What’s the difference between MIPI D‐PHY and 
MIPI M‐PHY?’” [Online]. Available: 
https://www.advantest.com/documents/11348/9523df99-15e1-41b4-a13b-
6a3157fc5df0. [Accessed: 22-Sep-2019]. 
[77] Tektronix, “Understanding and performing MIPI M-PHY Physical and 
Protocol Layer.” 2012. 
[78] MIPI Alliance, “Specifications: MIPI UniPro.” [Online]. Available: 
https://www.mipi.org/specifications/unipro-specifications. [Accessed: 22-Sep-
2019]. 
[79] HDMI Licensing Administrator, Inc., “Press: HDMI Interface Extends 
Exceptional Digital Quality with Single-Cable Simplicity to Over 4 Billion 
Consumer Devices.” [Online]. Available: 
https://www.hdmi.org/press/press_release.aspx?prid=137. [Accessed: 22-Sep-
2019]. 
[80] HDMI Licensing Administrator, Inc., “About us.” [Online]. Available: 
https://www.hdmi.org/about/index.aspx. [Accessed: 22-Sep-2019]. 
[81] Hitachi Ltd. et al., “High-Definition Multimedia Interface: Specification 
Version 1.3a.” 2006. 
[82] Digital Content Protection, “HDCP deciphered: White Paper.” 2008. 
[83] HDMI Licensing Administrator, Inc., “Overview.” [Online]. Available: 
https://www.hdmi.org/manufacturer/hdmi_2_1/index.aspx. [Accessed: 22-Sep-
2019]. 
[84] AIA, “USB3 Vision: version 1.0.1.” 2015. 
[85] AIA, “About AIA - global vision systems trade association.” [Online]. 
Available: https://www.visiononline.org/mvo-content.cfm/machine-
vision/About-AIA/id/81. [Accessed: 22-Sep-2019]. 
[86] Basler, “USB 3.0 Interface and USB3 Vision Standard.” [Online]. Available: 
https://www.baslerweb.com/en/vision-campus/interfaces-and-standards/usb3/. 
[Accessed: 22-Sep-2019]. 
[87] Stemmer Imaging, “USB3 Vision.” [Online]. Available: 
https://www.stemmer-imaging.com/en/knowledge-base/usb3-vision/. [Accessed: 
22-Sep-2019]. 
[88] P. Lefkin and R. Wietfeldt, “Understanding MIPI Alliance Interface 
Specifications,” 2014. [Online]. Available: 
https://www.electronicdesign.com/communications/understanding-mipi-alliance-
interface-specifications. [Accessed: 22-Sep-2019]. 
[89] MIPI Alliance, “Specifications: MIPI Camera Serial Interface 2 (MIPI CSI-2).” 
[Online]. Available: https://mipi.org/specifications/csi-2. 
[Accessed: 22-Sep-2019]. 
[90] K. Lim, G. S. Kim, S. Kim, and K. Baek, “A Multi-Lane MIPI CSI Receiver 
for Mobile Camera Applications,” IEEE Trans. Consumer Electron., vol. 56, no. 
3, pp. 1185–1190, Aug. 2010. 
[91] MIPI Alliance, “Specifications: MIPI Camera Serial Interface 3 (MIPI CSI-
3).” [Online]. Available: https://www.mipi.org/specifications/csi-3. [Accessed: 
22-Sep-2019]. 
[92] Allied Vision, “MIPI CSI-2 A new camera interface for developers of 
embedded machine vision systems,” 2017. [Online]. Available: 
https://www.stemmer-imaging.com/media/uploads/avt/AL/ALLIED-VISION-
EN-MIPI-CSI-2-A-new-camera-interface-for-developers-of-embedded-machine-
vision-systems.pdf. [Accessed: 22-Sep-2019]. 
[93] AIA, “Vision Standards: GigE Vision.” [Online]. Available: 
https://www.visiononline.org/vision-standards-details.cfm?type=5. [Accessed: 
22-Sep-2019]. 
[94] Stemmer Imaging, “GigE Vision.” [Online]. Available: https://www.stemmer-
imaging.com/en/knowledge-base/gige-vision/. [Accessed: 22-Sep-2019]. 
[95] AIA, “GigE Vision Specification - v2.1.” 2018. 
[96] Basler, “Gigabit Ethernet and GigE Vision.” [Online]. Available: 
https://www.baslerweb.com/en/vision-campus/interfaces-and-standards/gigabit-
ethernet/. [Accessed: 22-Sep-2019]. 
[97] “CoaXPress.” [Online]. Available: https://en.wikipedia.org/wiki/CoaXPress. 
[Accessed: 22-Sep-2019]. 
[98] JIIA, “About the JIIA.” [Online]. Available: http://jiia.org/en/about/outline/. 
[Accessed: 22-Sep-2019]. 
[99] CoaXPress, “The world’s leading interface standard for high-speed imaging.” 
[Online]. Available: http://www.coaxpress.com/. [Accessed: 22-Sep-2019]. 
[100] JIIA, “CoaXPress Standard: Version 1.1.1.” 2015. 
[101] AIA, “Camera Link Specification – v2.1.” 2018. 
[102] AIA, “Vision Standards: Camera Link HS.” [Online]. Available: 
https://www.visiononline.org/vision-standards-details.cfm?type=10. [Accessed: 
22-Sep-2019]. 
[103] AIA, “Camera Link HS Specification - v1.0.” 2012. 
[104] Stemmer Imaging, “Camera Link HS.” [Online]. Available: 
https://www.stemmer-imaging.com/en/knowledge-base/cameralink-hs/. 
[Accessed: 22-Sep-2019]. 
[105] AIA, “Vision Standards: Camera Link.” [Online]. Available: 
https://www.visiononline.org/vision-standards-details.cfm?type=6. [Accessed: 
22-Sep-2019]. 
[106] Vision Systems Design, “SYSTEM PERFORMANCE: Exposing jitter and 
latency myths in Camera Link and GigE Vision systems,” 01-Jan-2011. [Online]. 
Available: https://www.vision-systems.com/home/article/16737772/system-
performance-exposing-jitter-and-latency-myths-in-camera-link-and-gige-vision-
systems. [Accessed: 22-Sep-2019]. 
[107] H. J. Trussell and M. J. Vrhel, Fundamentals of Digital Imaging. Cambridge 
University Press, 2008. 
[108] P. Pudlak, Logical foundations of mathematics and computational complexity: 
a gentle introduction, 1st ed. New York: Springer, 2013. 
[109] A. C. Bovik, The Essential Guide to Image Processing. Elsevier, 2009. 
[110] Wu-Sheng Lu and Andreas Antoniou, Two-Dimensional Digital Filters. 
Marcel Dekker, Inc., 1992. 




[112] M. S. Nixon and A. S. Aguado, Feature Extraction and Image Processing. 
Newnes, 2002. 
[113] D. Adlakha, D. Adlakha, and R. Tanwar, “Analytical Comparison between 
Sobel and Prewitt Edge Detection Techniques,” International Journal of Scientific 
& Engineering Research, vol. 7, no. 1, p. 4, 2016. 
[114] C. I. Gonzalez, P. Melin, J. R. Castro, and O. Castillo, Edge Detection Methods 
Based on Generalized Type-2 Fuzzy Logic. Cham: Springer International 
Publishing, 2017. 
[115] W. Kim and C. Jung, “Illumination-Invariant Background Subtraction: 
Comparative Review, Models, and Prospects,” IEEE Access, vol. 5, pp. 8369–
8384, 2017. 
[116] O. Barnich and M. Van Droogenbroeck, “ViBe: A Universal Background 
Subtraction Algorithm for Video Sequences,” IEEE Transactions on Image 
Processing, vol. 20, no. 6, 2011. 
[117] P. Torteek et al., “Space debris tracking based on fuzzy running Gaussian 
average adaptive particle filter track-before-detect algorithm,” Research in 
Astronomy and Astrophysics, vol. 17, no. 2, 2017. 
[118] M. Piccardi, “Background subtraction techniques: a review,” in 2004 IEEE 
International Conference on Systems, Man and Cybernetics, 2004. 
[119] V. Van Nieuwenhove, J. De Beenhouwer, F. De Carlo, L. Mancini, F. Marone, 
and J. Sijbers, “Dynamic intensity normalization using eigen flat fields in X-ray 
imaging,” Opt. Express, vol. 23, no. 21, Oct. 2015. 
[120] Stemmer Imaging, “LEIA explains: What is the effect of flat field correction 
and why is it absolutely essential for the use of line scan cameras?” [Online]. 
Available: https://www.stemmer-imaging.com/en-gb/technical-tips/leia-explains-
what-is-the-effect-of-flat-field-correction/. [Accessed: 22-Sep-2019]. 
[121] Y. Zheng, R. Xiao, J. C. Gee, and C. Kambhamettu, “Single-Image Vignetting 
Correction from Gradient Distribution Symmetries,” IEEE Transactions on 
Pattern Analysis and Machine Intelligence, vol. 35, no. 6, 2013. 
[122] D. Tomaževič, B. Likar, and F. Pernuš, “Comparative evaluation of 
retrospective shading correction methods,” Journal of Microscopy, vol. 208, no. 
3, pp. 212–223, 2002. 
[123] P. Kask, K. Palo, C. Hinnah, and T. Pommerencke, “Flat field correction for 
high-throughput imaging of fluorescent samples,” Journal of Microscopy, vol. 
263, no. 3, pp. 328–340, 2016. 
[124] W.-S. Gan and S. M. Kuo, Embedded Signal Processing with the Micro Signal 
Architecture. USA: John Wiley & Sons, Inc., 2007. 
[125] Microsemi, “Libero SoC v12.0 and later.” [Online]. Available: 
https://www.microsemi.com/product-directory/design-resources/1750-libero-soc. 
[Accessed: 22-Sep-2019]. 
[126] Microsemi, “UG0645: User Guide Low Voltage Differential Signaling 7:1 
revision 4.” 2016. 
[127] W. Gang, L. Cai, W. Shigang, G. Kai, and W. Xiaolan, “Implementation of 
Camera Link Interface on Virtex-5 FPGA,” RJASET, vol. 5, no. 22, pp. 5244–
5248, May 2013. 
[128] Microsemi, “Application Note AC450: Timing Optimization for AXI3 DDR 
Interfaces Using SmartFusion2/IGLOO2 - Libero SoC v11.7.” 2016. 
[129] Microsemi, “IGLOO2: DDR Controller and Serial High Speed Controller 
Standalone Initialization Methodology.” 2014. 
[130] Microsemi, “CoreABC v3.7: Handbook.” 2016. 
[131] Microsemi, “CoreConfigP v7.1: Handbook,” p. 18, 2016. 
[132] Microsemi, “CoreResetP v7.1: Handbook,” p. 16, 2016. 
[133] R. Srikantaswamy, J. Laha, T. Bhatia, and A. Mal, “Competent solution for 
2D-FFT and 2D-IFFT Computation using FPGA IP-core,” International Journal 
of Computer Applications, 2015. 
[134] T. Lenart, M. Gustafsson, and V. Öwall, “A Hardware Acceleration Platform 
for Digital Holographic Imaging,” Journal of Signal Processing Systems, vol. 52, 
no. 3, pp. 297–311, Sep. 2008. 
[135] T. Liu, “Mapping Large 2D FFTs onto FPGAs using High Level Synthesis,” 
Master’s thesis, Eindhoven University of Technology, 2016. 
[136] J. S. Kim, C.-L. Yu, L. Deng, S. Kestur, V. Narayanan, and C. Chakrabarti, 
“FPGA architecture for 2D Discrete Fourier Transform based on 2D 
decomposition for large-sized data,” in 2009 IEEE Workshop on Signal 
Processing Systems, Tampere, Finland, 2009, pp. 121–126. 
[137] B. Akin, P. A. Milder, F. Franchetti, and J. C. Hoe, “Memory Bandwidth 
Efficient Two-Dimensional Fast Fourier Transform Algorithm and 
Implementation for Large Problem Sizes,” in 2012 IEEE 20th International 
Symposium on Field-Programmable Custom Computing Machines, Toronto, 
Canada, 2012, pp. 188–191. 
[138] Microsemi, “ModelSim ME.” [Online]. Available: 
https://www.microsemi.com/product-directory/dev-tools/4900-modelsim. 
[Accessed: 22-Sep-2019]. 
[139] “Index of /~narayana/castanza/I2Rdataset.” [Online]. Available: http://vis-
www.cs.umass.edu/~narayana/castanza/I2Rdataset/Lobby.zip. [Accessed: 22-
Sep-2019]. 
[140] M. Wakin, “Standard test images - Lena/Lenna,” 13-May-2003. [Online]. 
Available: https://www.ece.rice.edu/~wakin/images/lena512.bmp. [Accessed: 22-
Sep-2019]. 
 
