Abstract - Correlation filters, owing to their three prominent advantages, have proven very effective for automatic target detection, biometric verification and security applications. In this paper, correlation filters are implemented in FPGA hardware in view of their importance in real-time applications. The hardware implementation results are compared with results generated in software. The two sets of results agree to within a negligible variation on the order of 10^-4, as demonstrated in the experimental section, and the hardware implementation also reduces computation time. The hardware design of these filters is implemented in LabVIEW and can subsequently be employed in real-time security applications. The design may be extended to other advanced variants of correlation filters in future work.
I. INTRODUCTION
In theory, human and machine vision are similar, but identifying target objects is a far harder image processing task for machines than for humans when distortion, noise and cluttered backgrounds are present. Better performance can be achieved by closely relating the response of artificial sensors to human visual perception. The sensors chosen for target detection should therefore approximate human senses as closely as possible, whatever variations and occlusions are present in the test images.
Correlation filters, which have been used successfully for object and shape recognition, are designed for implementation in software as well as hardware because of three pronounced benefits: easy detection of the correlation peak, good distortion tolerance, and the ability to suppress clutter noise. These filters can accommodate a wide range of distortions while still achieving reliable object recognition, and a hardware implementation makes them efficient for real-time applications. The advanced variants of correlation filters have been used in biomedical, defense and security applications. Kumar et al. proposed the Optimal Trade-off Maximum Average Correlation Height (OT-MACH) filter, which minimizes an energy function. Different correlation filters can be obtained by changing the values of the Optimal Trade-off (OT) parameters [1]. The advanced variants are resistant to distortion, noise and background clutter. Another attractive attribute of the correlation filter is that it produces pronounced peaks for test images using simple computations. These are linear filters, and several variants have been implemented in software. These filters have not yet been implemented in hardware.
Several other solutions exist for the hardware implementation of the filter, but some, such as hardwired solutions, are less flexible and less cost effective. Media processors include new Very-Long-Instruction-Word (VLIW) and programmable super-scalar Digital Signal Processors (DSPs) with embedded heavy-duty computational blocks or cores. These are highly efficient, low-cost, flexible processors well suited to the hardware design of correlation filters.
In this paper, the hardware design of a correlation filter is implemented in National Instruments LabVIEW (LV) 8.6, a graphical, dataflow-based programming environment for embedded design. LV provides the Single-Cycle Timed Loop (SCTL), a special case of the LV FPGA Timed Loop structure. When used with an FPGA clock source, this loop executes all functions inside it within one cycle of the specified clock. The SCTL can be used with derived clocks to run the loop at rates other than the default 40 MHz. The timing properties of the Timed Loop cannot be changed when it is used with an FPGA source. Code for conventional sub-systems can be written in a hardware description language (HDL) and incorporated into LV. The optimized architecture of the OT-MACH filter is proposed in the LV FPGA design environment and put into operation on the FPGA. The flow first invokes the LV FPGA synthesis tool, then maps the resulting code and runs the platform-specific tools of the targeted FPGA. A Xilinx Kintex-7 is the target FPGA used in our experiments. The experimental platform is the National Instruments FlexRIO board, which comprises a Xilinx Kintex-7 device integrated with the host.
This paper is organized into five sections. Following this introduction, Section II reviews related work, from early techniques to related implementations. Section III describes the detailed methodology used to implement correlation filters on the FPGA using LV. Section IV presents the computed results of the hardware architecture and compares them with a software implementation in MATLAB. Section V gives brief conclusions.
II. BACKGROUND REVIEW
Young et al. proposed a hybrid optical correlator in which the input image is Fourier transformed digitally at video rate by a digital signal processor [2]. The input data is combined with a digital template and then loaded onto a high-frame-rate Spatial Light Modulator (SLM). The output of the optical Fourier transform then yields the correlation between the input image and the reference image. This hybrid optical correlator gave higher speed than previous implementations.
Various optimized approaches to 1-D and 2-D Fast Fourier Transform (FFT) implementation, which is used for implementing correlation filters, have been proposed in the past. Uzun et al. proposed a high-level framework including 1D-FFT and 2D-FFT FPGA implementations for real-time applications [3]. Parallelism in the 2D-FFT calculation is achieved with multiple 1D-FFT processors that share additional external memory.
Kumar et al. presented a survey of the variants of correlation filters and their mathematical expressions [4]. A hard problem in pattern recognition is detecting a target in the presence of distortion in orientation, position and scale, especially against cluttered backgrounds. For this problem, Rehman et al. proposed a methodology in which a logarithmic correlation filter is combined with a band-pass difference of Gaussian (DOG) filter [5]. The resulting method achieved invariant object identification despite several forms of distortion.
Bone et al. introduced a method combining two existing techniques, which was capable of producing invariance to several kinds of rotation and scale distortion [6]. In this procedure, the Maximum Average Correlation Height (MACH) filter was used to provide orientation invariance and tolerance to noisy backgrounds, together with a logarithmic mapping algorithm, which is invariant to in-plane rotation and scale changes and was implemented in the form of a memory shift.
Recently, Awan et al. proposed a combined framework of the EEMACH filter and the DOG filter that gives a pronounced peak in the presence of background clutter and distortions [7]. The proposed filter gave enhanced target recognition results, leading to a higher percentage of correct automated decisions in all situations compared with previously proposed correlation filters.
The values of the optimal trade-off correlation parameters can be selected randomly on the basis of experiments. Tehsin et al. optimized these values for a given dataset and application using a hierarchical particle swarm optimization algorithm [8].
Diaz et al. implemented non-linear correlation filters on an FPGA [9]. Non-linear correlation filters are computationally expensive, and the hardware design was claimed to lower the computational cost, but the paper does not report the reduced computation times. These filters are robust to non-Gaussian noise and non-homogeneous illumination.
Many researchers have proposed variants of correlation filters with enhanced performance over the past few decades [11-16]. These filters are used efficiently in many applications, including biomedical, defense and security applications. In view of its effectiveness, we have implemented the OT-MACH filter in hardware for use in real-time applications.
III. METHODOLOGY
An optimal trade-off approach, proposed by Kumar, is used to minimize the following energy function [1, 10]:
The resultant Optimal Trade-off MACH filter is given as [1]:
Here α, β and γ are the optimal trade-off parameters, which are used to implement three variants of the correlation filter,
S_x is a diagonal matrix measuring the similarity of the training data to the class mean in the frequency domain [1]:
and C is the spectral density matrix of the additive input white noise.
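For reference, the optimal trade-off energy function, the resulting filter and the similarity matrix are commonly written in the correlation filter literature in the form sketched below. The symbols $m_x$ (mean of the Fourier-transformed training images), $D_x$ (diagonal average power spectral density of the training images), $\delta$, $N$, $X_i$ and $M_x$ are assumed notation here, since only $\alpha$, $\beta$, $\gamma$, $S_x$ and $C$ are named in the text.

```latex
\begin{align}
E(h) &= \alpha\, h^{+} C h + \beta\, h^{+} D_x h + \gamma\, h^{+} S_x h - \delta\, \lvert h^{+} m_x \rvert \\
h &= \frac{m_x^{*}}{\alpha C + \beta D_x + \gamma S_x} \\
S_x &= \frac{1}{N} \sum_{i=1}^{N} (X_i - M_x)(X_i - M_x)^{*}
\end{align}
```

Because $C$, $D_x$ and $S_x$ are all diagonal, the division in the second line is an element-wise division per frequency bin.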
In this paper we implement the hardware design of the MACH filter, which gives a correlation peak for a targeted object. It maximizes the performance measure called Average Correlation Height (ACH) while minimizing the Average Similarity Measure (ASM). The values α = 0.01, β = 0.1 and γ = 0.3 were chosen for the MACH filter implementation [17].
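To make the role of the three parameters concrete, the following is a minimal software sketch of filter synthesis with the values quoted above. It assumes the commonly used diagonal form of the OT-MACH filter and a flat (white) noise spectrum; the function name `ot_mach_filter` and the `noise_psd` parameter are hypothetical and are not taken from the LabVIEW design.

```python
from typing import List


def ot_mach_filter(spectra: List[List[complex]],
                   alpha: float = 0.01,
                   beta: float = 0.1,
                   gamma: float = 0.3,
                   noise_psd: float = 1.0) -> List[complex]:
    """Build a filter from flattened frequency-domain training images.

    Assumption: all matrices in the trade-off expression are diagonal,
    so the filter is an element-wise division per frequency bin.
    """
    n = len(spectra)
    size = len(spectra[0])
    h = []
    for k in range(size):
        column = [s[k] for s in spectra]
        mean_k = sum(column) / n                              # class mean
        d_k = sum(abs(v) ** 2 for v in column) / n            # average power
        s_k = sum(abs(v - mean_k) ** 2 for v in column) / n   # similarity term
        denom = alpha * noise_psd + beta * d_k + gamma * s_k  # white noise: C = noise_psd * I
        h.append(mean_k.conjugate() / denom)
    return h


# Two toy "spectra" with two frequency bins each (illustrative data only).
h = ot_mach_filter([[1 + 0j, 2j], [3 + 0j, 0j]])
```

Changing `alpha`, `beta` and `gamma` shifts the balance between noise tolerance, peak sharpness and distortion tolerance, which is how different filter variants are obtained from the same expression.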
Correlation filters are implemented in the frequency domain, where correlation reduces to a simple element-wise multiplication. For the 2D-FFT we accelerate a 1D-FFT algorithm in the LabVIEW (LV) FPGA module using an SCTL block. The SCTL is a special case of the LV FPGA Timed Loop structure. When used with an FPGA, it executes every function inside the loop within one cycle of the selected clock. The SCTL can be used with derived clocks to run the loop at rates other than the default 40 MHz. On the FPGA target, the timing properties of the Timed Loop cannot be changed dynamically. Without an SCTL, one iteration of a while loop in an FPGA VI takes a minimum of three clock cycles. This overhead is due to the enable chain, which is applied to ensure dataflow when the FPGA VI is compiled into a bit file. The performance of an SCTL in an FPGA VI depends on what is inside the loop. Inside an SCTL, logic is implemented in combinatorial hardware, so the generated FPGA configuration uses fewer resources. For example, outside an SCTL an addition is performed and its result stored, then a multiplication is performed and its result stored again. The SCTL performs both operations in one clock cycle and avoids the step of storing the intermediate result. FPGA resources are conserved because no flip-flop is required between the operations to hold that result.
Figure 1 gives the complete block diagram of the hardware design of the MACH filter. The FFT of the images is implemented on the FPGA in LV, and the remainder of the implementation runs on the LV host. In host mode, LV uses building blocks that run on the system's processor. In our design the FPGA is used for the FFT because it requires the most computation, so the FPGA acts as a separate processor. Once the FFTs of the training images have been computed, they are stored on the host for further experimentation, which reduces computation.
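Building a 2D-FFT from a 1D-FFT, as described above, is usually done with the row-column method: transform every row, then every column of the result. The sketch below illustrates this in software only and does not reproduce the LabVIEW FPGA block diagram; the function names `fft1d` and `fft2d` are hypothetical, and the recursive radix-2 algorithm is an assumed choice that requires power-of-two sizes (such as 16x16).

```python
import cmath
from typing import List


def fft1d(x: List[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd part
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft2d(image: List[List[complex]]) -> List[List[complex]]:
    """2D FFT by the row-column method: 1D FFT of rows, then of columns."""
    rows = [fft1d(r) for r in image]
    cols = [fft1d(list(c)) for c in zip(*rows)]  # transform each column
    return [list(r) for r in zip(*cols)]         # transpose back to row-major


spectrum = fft2d([[1, 2], [3, 4]])
```

Because each row (and then each column) is transformed independently, the 1D transforms can be distributed over parallel hardware units, which is what makes this decomposition attractive on an FPGA.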
IV. EXPERIMENTAL RESULTS
For the hardware implementation on the FPGA, the MACH filter was trained on images rotated out of plane from 0 to 40 degrees, as shown in Figure 2. Out-of-plane rotation is an important factor in tracking any object. Images of 16x16 pixel resolution from the Amsterdam image library, which contains out-of-plane rotated images, were used [18]. The FPGA hardware resources used by the complete implementation after place and route are detailed in Table 1 below.
Only 22.1% of the Slice Registers, 18% of the block RAMs, 56.4% of the Slice LUTs and 1% of the DSP 48a blocks are used out of the total hardware resources. All timing requirements are met; computation for a 16x16 pixel image takes 324 µs. The FPGA VI code is also generic across several recent NI LV FPGA targets, such as the 7975R and 7972R.
Table 1. FPGA device utilization after place and route (columns: Device Utilization, Used, Total, Percentage).
The correlation peak for 25 out-of-plane rotated test images, generated in MATLAB, is given in Figure 5. The Correlation Output Peak Intensity (COPI) is the parameter used to analyze the correlation peak [19, 20]. The COPI value of the simple MACH filter for the test image is 0.0016. The computer used for the experimental results is an Intel Core 2 Duo at 2.1 GHz with 3 GB of RAM. The hardware design takes less computation time than the software design. The computational cost of training and testing in the hardware and software implementations is given in Table 4. The computation time measured in MATLAB varies between runs. The average time for training and testing is 674 ms in MATLAB and 348 ms in LabVIEW. The hardware computational cost is thus roughly half the software cost, which is advantageous for real-time applications.
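The two figures of merit used above can be sketched as follows. This assumes the usual definition of COPI as the maximum squared magnitude of the correlation plane; the helper names `copi` and `speedup` and the toy correlation plane are hypothetical, and only the 674 ms and 348 ms timings come from the text.

```python
from typing import List


def copi(correlation_plane: List[List[complex]]) -> float:
    """Correlation Output Peak Intensity: max of |c(x, y)|^2 over the plane."""
    return max(abs(v) ** 2 for row in correlation_plane for v in row)


def speedup(software_ms: float, hardware_ms: float) -> float:
    """Ratio of software run time to hardware run time."""
    return software_ms / hardware_ms


# Toy 2x2 correlation plane (illustrative values, not measured data).
plane = [[0.1 + 0j, 0.2j], [0.5 + 0j, 0j]]
peak = copi(plane)

# Average training-plus-testing times reported in the text.
ratio = speedup(674.0, 348.0)
```

With the reported averages the ratio comes to a little over 1.9, consistent with the hardware cost being roughly half the software cost.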
V. CONCLUSION
Hardware implementation is the ultimate goal of much research and development work on software algorithms. However, the practical hardware application of any study has its own complications, variations and peculiarities, which need to be addressed during the development and implementation phases. Many software tools have been developed for FPGA programming; LabVIEW is one of these, and it proved accurate and reliable. The hardware results closely match the software results while roughly halving the computation time, which opens avenues for the real-time deployment of correlation filters in medical, security and military applications. The proposed hardware implementation is flexible and can be modified for use with different images and techniques.
