ABSTRACT This paper presents a visual-servoing tracking method for a Delta robot that uses a zero-mean normalized cross-correlation (ZNCC)-based grayscale template matching hardware core on field-programmable gate arrays (FPGAs). A concurrent FPGA-based ZNCC hardware core with cascading multiplication-accumulate (MAC) circuits is designed, which largely reduces FPGA hardware resource consumption. A compact optical imaging system with a front 45°-slant mirror and an optical filter film is proposed, which efficiently filters out background clutter. Trajectory visual-servoing tracking and dynamic tracking experiments are implemented on our Delta robotic visual tracking platform. The experimental results indicate that the presented FPGA-based embedded robotic visual tracking method efficiently improves object trajectory tracking performance.
I. INTRODUCTION
Visual tracking seeks to continuously locate and follow an object of interest across sequential image frames. Visual servoing extends visual tracking into a control loop, using visual feedback to regulate the machine's state automatically. Owing to these merits, visual tracking / servoing technology has been applied in many fields, such as surveillance, traffic regulation, autonomous vehicle navigation, micromanipulation, and industrial automation. To address the lasting challenges of occlusion, background clutter, shape deformation, and scale variation, many visual tracking methods have been presented, such as area/feature matching-based methods, classifier-based methods, and particle filters, each targeting improvements at a different stage of visual tracking [1], [2]. Robotic visual servoing systems are mainly categorized into position-based visual servoing and image-based visual servoing [3]. In practical applications, real-time performance and precision deserve particular attention in position-based visual tracking.
The associate editor coordinating the review of this manuscript and approving it for publication was Bora Onat.
Matching-based visual tracking / servoing is more suitable for industrial scenarios, which are more regular than natural scenarios. Template matching based visual tracking is the most common approach owing to the simplicity of its window-shifted matching computation, whose kernel similarity measures include cross-correlation (CC), normalized cross-correlation (NCC), zero-mean normalized cross-correlation (ZNCC), the sum of squared differences (SSD), and the sum of absolute differences (SAD). Considering the limited performance of correlation filter trackers due to the direction of correlation computation and a single template with an improper updating scheme, Liu presented a multi-template-matching-based tracker using mutual buddies similarity and memory filtering [4]. Li adopted SSD-based template matching with gradient-based inverse optimized fast searching to implement an in-plane micro-motion tracker for a precision positioning stage based on micro-vision [5]. In order to improve the tracking precision of visual servoing for microscopic objects, a six-axis motion visual tracker was presented, in which the in-plane 3-DOF motion was obtained by NCC-based angle-discriminated template matching with affine constraints [6]. Feature matching based visual tracking / servoing is commonly more robust than template matching based methods. Liu presented a binocular micro-vision based 3D-motion tracker using high-precision statistical estimation of feature points by a Kalman filter [7]. A stereo micro-vision based optical fiber precision alignment system was built, in which the 5-DOF motion of the optical fibers was extracted by feature stereo correspondence matching for position-based visual servoing [8]. Shape features or angle features were also used in the visual servoing systems of a mobile robot and an inverted pendulum [9], [10].
However, owing to their algorithmic simplicity, template matching based trackers could attract more attention from researchers than feature matching based trackers, provided that the heavy computational cost of their numerous multiplication-accumulate (MAC) operations and exhaustive search is overcome.
In order to accelerate template matching computation, especially ZNCC, soft-computing schemes such as coarse-to-fine search [11], the successive elimination technique [12], [13], the integral image technique [14], [15], and bound partial correlation [16] were presented. Moreover, hardware-accelerated template matching on Graphics Processing Units (GPUs), Digital Signal Processors (DSPs), and Field Programmable Gate Arrays (FPGAs) may be chosen because it runs faster than software-algorithm-accelerated methods. Among these, the FPGA is more suitable for template matching because window-shifted operations map naturally onto in-stream and parallel processing. Because the SAD algorithm is the most straightforward to implement, SAD-based stereo matching / tracking on FPGAs is found in many research works [17]-[19]. Although the SAD-based FPGA circuit is simple, its object-locating precision depends more heavily on the environmental lighting condition than that of algorithms that are robust to illumination variations. Local correspondence methods on FPGAs have been successfully applied mostly in real-time stereo vision [19], [20] and seldom in visual tracking / servoing [21]. Some image-based registration techniques on FPGAs, such as Blob detection, were adopted in robotic visual servoing [22]-[24]. However, because Blob-on-FPGA is a binary algorithm, it can hardly locate and track objects precisely.
Based on the state of the art of FPGA-based robotic visual tracking, we develop a ZNCC template matching based visual-servoing tracking system on a Delta robot in an eye-to-hand configuration, according to the real-time and precision requirements of robotic visual tracking. Because the captured images usually have cluttered backgrounds, and the installation space for the imaging units is limited, we present a compact optical design that helps capture a clear image of the tracked object directly by optically filtering out background disturbances. Based on this FPGA-based visual tracking system, various tracking experiments are implemented. This paper's main contributions are summarized as follows:
1) A concurrent ZNCC template matching hardware core with cascading MAC circuits on FPGAs is designed for robot visual tracking, which consumes fewer FPGA resources than the other concurrent designs, and surpasses other FPGA-based template matching in precision and robustness.
2) In order to extend the field of view (FOV) as much as possible and filter out the background clutter, a compact optical imaging system is designed, which includes a simple optical sheet filter and a 45°-slant plane mirror.
3) An eye-to-hand FPGA-based robotic visual-servoing tracking system is built and verified experimentally in an actual scenario based on a Delta robot.
The rest of this paper is arranged as follows. First, a concurrent ZNCC grayscale template matching method with cascading MAC circuits on FPGAs is presented in Section II. Then a compact optical imaging system with a front optical sheet filter and a 45°-slant plane mirror is designed, and the robotic visual servoing scheme and the ZNCC-based visual tracking core on FPGAs are described in Section III. In Section IV, the visual tracking experiments for the presented FPGA-based robotic tracking method are implemented and discussed. Finally, some conclusions are drawn in Section V.
II. ZNCC-BASED TEMPLATE MATCHING ON FPGAs

A. ZNCC-BASED TEMPLATE MATCHING
It is well known that the ZNCC algorithm is more robust to illumination variations than other correspondence matching methods, such as SAD, CC, and NCC. In order to verify this, we compare the matching performance of the ZNCC and NCC methods on a simulated unevenly illuminated image on a PC; the results are shown in Table 1 (the detailed experiment descriptions are omitted). The anti-illumination-variation performance of ZNCC clearly surpasses that of NCC.
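The robustness claim above can be illustrated with a minimal sketch. It is not the omitted PC experiment: the patch values, the gain of 0.5, and the offset of 80 are hypothetical, and the function names `ncc` and `zncc` are chosen here for illustration. It only shows that a linear brightness change leaves the ZNCC score at 1 while the NCC score drops.

```python
import math

def ncc(a, b):
    """Normalized cross-correlation of two equal-length pixel lists."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den

def zncc(a, b):
    """Zero-mean NCC: subtract each patch mean before correlating."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    return ncc([x - ma for x in a], [y - mb for y in b])

# Hypothetical 3x3 patch, flattened row by row.
template = [10, 50, 200, 90, 30, 120, 60, 220, 15]
# The same patch under an assumed illumination change: gain 0.5, offset +80.
relit = [0.5 * p + 80 for p in template]

ncc_score = ncc(template, relit)    # penalized by the brightness offset
zncc_score = zncc(template, relit)  # unaffected by gain and offset
```

The offset is what separates the two measures: NCC already cancels a pure gain, but only the mean subtraction in ZNCC cancels an additive brightness shift.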
However, the ZNCC algorithm needs large numbers of multiplication and addition operations, which may require many logic hardware resources in an FPGA. Meanwhile, ZNCC-based template matching runs as a sliding window computation, which is well suited to stream processing in FPGAs. This paper presents a concurrent ZNCC circuit with a cascading MAC scheme on an FPGA that consumes fewer valuable resources and meets strict timing. The grayscale template matching scheme is described in Fig. 1.
As shown in Fig. 1(a), the object boxed in red is found by the template T(x, y), and the object searching process is depicted in Fig. 1(b). As the template slides over the source image I(x, y) from left to right and top to bottom in steps of one pixel column, the overlapping pixels of the template image and the reference image are used to compute the similarity metric according to (1).
$$\mathrm{ZNCC}(x, y) = \frac{\sum_{i,j}\left(I_{i',j'} - \bar{I}_{x,y}\right)\left(T_{i,j} - \bar{T}\right)}{\sqrt{\sum_{i,j}\left(I_{i',j'} - \bar{I}_{x,y}\right)^{2}\,\sum_{i,j}\left(T_{i,j} - \bar{T}\right)^{2}}} \tag{1}$$
where the pixel index $i' = x + i$ and the pixel index $j' = y + j$, $T_{i,j}$ is the pixel value of the template image T(x, y) at (i, j), $\bar{T}$ is the average of all pixel values in the image T(x, y), $I_{i',j'}$ is the cached image pixel value from the video stream at the template window index (i, j), and $\bar{I}_{x,y}$ is the average of all pixel values in the cached image.
FIGURE 1. Working mechanism of grayscale template matching: (a) the template T(x, y), of size m × n, is searched in the image I(x, y) until the maximum similarity of correspondence matching is found; (b) sliding window operations for similarity metric computation between the template matrix and the original image matrix in each system clock cycle.
FIGURE 2. Simplified ZNCC expression for in-stream and parallel computation on an FPGA, in which terminals I, II, III, IV, and V are the outputs of correlation computation between reference and template images, the related energy difference between reference and template, a sum of the reference image, and energy of the reference image, respectively.
Due to the large number of MAC computations in (1), the ZNCC formula should be simplified further for stream processing and parallel computation on an FPGA. The simplified equation [25] is shown in Fig. 2.
In the simplified ZNCC expression, C and the outputs II and IV are constants that can be precomputed outside the FPGA, so they are integrated into the final FPGA circuit as constants. In contrast, the outputs I, III, and V must be computed in the FPGA because they change with the reference image. Among these three subunits, the correlation coefficient (CC) subunit needs the largest number of MAC operations. Because the template image must be fed into the computation module of unit I, it should be pre-stored in Block RAM in the FPGA. Several solutions have been proposed for this computation. Lindoso proposed a cascading MAC circuit using valuable DSP slice resources in an FPGA for CC computation [26]. Zhou developed a reconfigurable hybrid MAC scheme that combines cascading and concurrent MAC computation [27]. In contrast, we present a concurrent computation architecture with cascading MAC computation for CC computation, which is described in the next subsection. In our practical circuit, the division and square root operations are implemented by calling the LPM-DIVIDE IP and the ALTSQRT IP provided in the Quartus II IDE, respectively.
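The split between precomputed template terms and streamed image terms can be sketched as follows. This is the standard algebraic expansion of ZNCC into running sums, not a transcription of the expression in Fig. 2, whose exact form is not recoverable here; the function names and pixel values are hypothetical. The three sums computed per window loosely play the roles of the correlation, image-sum, and image-energy outputs described above.

```python
import math

def template_constants(t):
    """Precompute the template-only terms once, outside the pixel stream."""
    n = len(t)
    s_t = sum(t)
    e_t = n * sum(v * v for v in t) - s_t * s_t   # N*sum(T^2) - (sum T)^2
    return n, s_t, e_t

def zncc_from_sums(window, t, consts):
    """ZNCC from three sums over the window: sum(I*T), sum(I), sum(I^2)."""
    n, s_t, e_t = consts
    s_it = sum(i * v for i, v in zip(window, t))  # correlation term
    s_i = sum(window)                             # image sum
    s_ii = sum(i * i for i in window)             # image energy
    num = n * s_it - s_i * s_t
    den = math.sqrt((n * s_ii - s_i * s_i) * e_t)
    return num / den

def zncc_direct(window, t):
    """Reference implementation with explicit mean subtraction."""
    n = len(t)
    mi = sum(window) / n
    mt = sum(t) / n
    num = sum((i - mi) * (v - mt) for i, v in zip(window, t))
    den = math.sqrt(sum((i - mi) ** 2 for i in window) *
                    sum((v - mt) ** 2 for v in t))
    return num / den

template = [10, 50, 200, 90]   # hypothetical flattened template
window = [12, 40, 180, 100]    # hypothetical flattened image window
consts = template_constants(template)
score = zncc_from_sums(window, template, consts)
```

Because the sums involve only multiplications and additions, a single division and a single square root remain per window, which is consistent with using one divider IP and one square-root IP at the end of the pipeline.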
B. IMAGE CACHING ON FPGAs
According to the simplified ZNCC formula, the reference image from the camera and the template image should be cached before further processing in an FPGA. Compared to a memory-based image caching method, a shift-register-based image caching method can easily satisfy the real-time processing requirements of an imaging pixel stream due to its inherently concurrent nature. We therefore adopt a synchronized FIFO buffering technique for multi-row pixel buffering, whose mechanism is shown in Fig. 3. In Fig. 3, every FIFO shares the same clock signal clock. The read-enable terminal rd_en of the current FIFO is linked to the write-enable terminal wr_en of the next neighboring FIFO, so the output q is passed synchronously to the data input terminal data of the next neighboring FIFO. After data have been poured into every FIFO, every row of grayscale pixels is buffered, and every FIFO outputs its buffered data signal line01, . . . , or line31.
To illustrate the buffering process, the timing sequence of the multi-row buffering is depicted in Fig. 4. All row data are aligned once the 32-row buffering is finished, as indicated by the transition of the signal dout_vld, which marks the windowed pixel output as valid. The data alignment can also be observed in the red box zone in Fig. 4. After alignment, the serialized imaging pixel stream is mapped to an image matrix for the following template matching.
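The row alignment produced by chained FIFOs can be mimicked in software. The following is a behavioral sketch only, assuming each FIFO delays its input by exactly one image row; it does not model the rd_en/wr_en handshake or the clocking of Fig. 3, and the function name and the small 4 × 4 test image are hypothetical.

```python
from collections import deque

def line_buffer_columns(pixels, width, rows):
    """Yield (x, y, column) once `rows` image rows are vertically aligned.

    `column` holds the pixels at column x of rows y-rows+1 .. y, top to bottom.
    """
    fifos = [deque() for _ in range(rows - 1)]   # one FIFO per buffered row
    for n, p in enumerate(pixels):
        taps = [p]                               # newest row comes straight from the stream
        data = p
        for fifo in fifos:
            fifo.append(data)
            if len(fifo) > width:
                data = fifo.popleft()            # pixel delayed by exactly one row
                taps.append(data)
            else:
                break                            # downstream FIFOs are not fed yet
        if len(taps) == rows:                    # plays the role of dout_vld
            y, x = divmod(n, width)
            yield x, y, taps[::-1]

# A 4x4 image streamed row by row, pixel values 0..15, with 3 aligned rows.
columns = list(line_buffer_columns(range(16), width=4, rows=3))
```

With this input the first valid column is `(0, 2, [0, 4, 8])`, i.e. the output only becomes valid after two full rows have been buffered, which mirrors the delayed assertion of the valid signal in the timing diagram.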
In order to construct correlation operations in an FPGA, a FIFO-based column-scanning 32-row-pixel buffering scheme is designed, which pipelines buffered sub-images for template matching according to a special row-column scanning mode, as depicted in Fig. 5. This buffering process is a kind of shift-register-based buffering method. Template matching with a size of 32 × 32 runs once every 32 clock cycles. This kind of concurrent operation makes template matching run very fast, which is different from the memory-based approach [25]. A shortcoming is that the matching-based location precision is low at some positions because the column-scanning process steps by the template size along the row direction, whereas the location precision is high along the column direction because of the one-pixel step. However, high matching-based location precision can still be obtained for moving object tracking by keeping high correlation results and neglecting low correlation results. This paper adopts this approach for high-precision real-time object tracking.
C. FPGA-BASED ZNCC DESIGN
According to the simplified ZNCC expression above, a concurrent pipelining hardware architecture is designed as shown in Fig. 6. The REGs in Fig. 6 represent the registers that buffer data in the pipeline at every clock cycle.
In order to realize concurrent operation and resource reuse in an FPGA, the hardware architectures for units I, III, and V in the ZNCC expression are designed as follows. The circuit architecture for unit III in Fig. 2 is depicted in Fig. 7; its inputs come from the outputs of the buffering circuit in Fig. 3, and it integrates the windowed camera pixel stream.
The circuit architecture for unit V in Fig. 2 is depicted in Fig. 8, in which the two-row buffered pixels of the camera image stream are multiplied by each other in a concurrent mode, and sum-addition reuse is adopted for the final squared sum. This circuit structure largely reduces the consumption of adder resources with the aid of the semi-concurrent processing mode.
The circuit architecture for unit I is depicted in Fig. 9, in which one row of buffered pixels of the camera image stream is multiplied by the corresponding row of buffered template pixels in a concurrent mode, and sum-addition reuse is adopted for the final correlation sum. Similarly, this kind of semi-concurrent processing largely reduces hardware resource consumption.
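The semi-concurrent idea, a row-wide multiply step followed by accumulation into one reused sum, can be sketched in software as below. This is only an illustration of the arithmetic: the actual partitioning of multipliers and adders in Figs. 7 to 9 is not reproduced, and the function names and 2 × 2 values are hypothetical.

```python
def row_mac(pixel_row, template_row):
    """One row-wide MAC step: multiply element-wise, then sum the products."""
    return sum(p * t for p, t in zip(pixel_row, template_row))

def correlation_sum(window, template):
    """Accumulate the row MAC results over all rows with one reused accumulator."""
    acc = 0
    for pixel_row, template_row in zip(window, template):
        acc += row_mac(pixel_row, template_row)
    return acc

def energy_sum(window):
    """Squared sum of the window: the same MAC structure with both inputs equal."""
    return correlation_sum(window, window)

window = [[1, 2], [3, 4]]      # hypothetical 2x2 image window
template = [[5, 6], [7, 8]]    # hypothetical 2x2 template
corr = correlation_sum(window, template)
energy = energy_sum(window)
```

Reusing one accumulator across rows means the multiplier count scales with the row width rather than with the full window area, which is the kind of saving the semi-concurrent mode aims for.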
D. MATCHING EXPERIMENTS
In the evaluation experiment, the size of the reference image is 640 × 480, and the sizes of the template images are 20 × 20, 32 × 32, and 60 × 60, respectively. The ZNCC algorithm
The hardware consumption comparisons on an Altera Cyclone V (5CSEMA5F31C6) between the fully concurrent design and the concurrent design with multiplication-accumulate (MAC) circuits for ZNCC are listed in Table 3 under the same conditions: reference image size 640 × 480 and template image size 32 × 32. The result indicates that the presented concurrent architecture for ZNCC consumes fewer FPGA hardware resources.
In order to compare our method with another competitive FPGA-based ZNCC method [25], we compare their hardware resource consumption on the same Stratix IV FPGA (EP4SE530H35C2) used in [25]. The results are listed in Table 4. Our method consumes fewer hardware resources than the ZNCC circuit in [25] in both non-DSP and DSP modes. The reasons are that the correlation window computations are implemented by MAC circuits in our design but by feedback FIFOs in the circuit of [25], and that the windowed image is formed by register-based pipelining in our method but by memory-based pipelining in [25].
III. DESIGN ON FPGA-BASED ROBOTIC VISUAL TRACKING
A. DESIGN ON OPTICAL IMAGING SYSTEM
To meet the requirements of compactness and a large FOV, a novel compact back-view optical imaging system with a 45°-slant plane mirror is developed for eye-to-hand robotic visual tracking on a Delta robot, as depicted in Fig. 10(a). The working distance of the camera is S1 + S2, and the FOV size is W. In the proposed imaging system, the optical axis of the micro camera, which is connected to the FPGA board (DE1-SOC), is horizontal along the X-axis of the Delta robot. A 45°-slant plane mirror is mounted on the horizontal optical axis to turn the optical axis to the vertical orientation along the Z-axis of the Delta robot. This light path configuration enlarges the actual FOV within the limited installation space. However, the captured image has a cluttered background because light scattered from the robot body easily enters the camera's lens when the object is viewed from below.
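Why folding the light path enlarges the FOV can be seen from a simple pinhole-camera estimate. The sketch below is an approximation under stated assumptions: it uses the pinhole relation (object distance much larger than the focal length), and every numeric value in it is hypothetical rather than taken from the setup.

```python
def fov_width_mm(sensor_width_mm, focal_length_mm, s1_mm, s2_mm):
    """Pinhole-model FOV width at the object plane for a folded path S1 + S2."""
    # A plane mirror folds the optical path without changing its length,
    # so the effective working distance is the sum of both segments.
    working_distance = s1_mm + s2_mm
    return sensor_width_mm * working_distance / focal_length_mm

# Hypothetical values: 4.5 mm wide sensor, 4 mm lens, S1 = 100 mm, S2 = 300 mm.
w = fov_width_mm(sensor_width_mm=4.5, focal_length_mm=4.0, s1_mm=100.0, s2_mm=300.0)
```

In this model the FOV width grows linearly with S1 + S2, so the horizontal segment S1 adds working distance that the vertical clearance alone could not provide.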
In Fig. 10(b) , the moving target is hardly discriminated from the background. Facing this problem, we ingeniously add a front semi-transparent optical sheet in the light path, as shown in Fig. 10 . From Fig. 10(c) , the clutteredbackground is successfully removed because of the semitransparent sheet functions as a special nonlinear optical 88874 VOLUME 7, 2019 filter. The filtered object image can largely reduce the burden of image processing on FPGAs.
B. ZNCC-BASED VISUAL TRACKING CORE ON FPGAs
Object position extraction is a key module in the robotic visual tracking system because of the requirements of real-time performance and high precision. We design a ZNCC-based object position extraction circuit on an Altera FPGA, shown in Fig. 11, which consists of four main modules: an image stream buffering core, a ZNCC-based template matching core, a VGA display core, and a UART data transmission core. The image stream buffering core buffers the camera video stream using FIFO row buffering and buffers the template image using shift-register-based row buffering. The VGA display core outputs the captured image and the tracked object, boxed in red, to a VGA monitor in real time. The UART transmission core transmits data from the FPGA to the PC.
C. DELTA ROBOTIC VISUAL TRACKING CONTROL
Based on the presented ZNCC-based template matching core on FPGAs, a robotic visual tracking control scheme with a Delta robot is proposed, as shown in Fig. 12. The position error e between the position Po of the tracked target and the end position Pt of the Delta robot is used to control the robot's motion through a proportional regulator. The positions Po and Pt are calculated by the FPGA-based ZNCC core and the robot kinematics mapping, respectively. The error e is mapped to the joint position error by the proportional gain K and the inverse kinematics mapping. Meanwhile, the joint servo motors are controlled by a PID regulator Hpid in an inner loop. From Fig. 12, the open-loop transfer function can be simply treated as K × Hpid when only the position loop is considered and the influence of other loop units is ignored due to the system's complexity. Because the control performance depends on the PID parameters, the system can be regarded as stable under normal conditions. Nevertheless, we evaluate the control stability experimentally in Section IV.
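The outer proportional loop can be sketched as a discrete-time simulation. This is a heavily simplified illustration, not the controller of Fig. 12: the inner PID joint loop is idealized as perfect, the inverse kinematics mapping is omitted so the update acts directly in Cartesian space, and the function name, the gain 0.5, and the target position are hypothetical.

```python
def track(target_positions, k, start=(0.0, 0.0)):
    """Discrete proportional tracking: each frame, move by K times the error.

    Returns the error magnitude observed at every frame.
    """
    pt = list(start)                                # end position Pt of the robot
    errors = []
    for po in target_positions:                     # Po measured once per frame
        e = [po[0] - pt[0], po[1] - pt[1]]          # e = Po - Pt
        pt = [pt[0] + k * e[0], pt[1] + k * e[1]]   # commanded step, assumed executed exactly
        errors.append((e[0] ** 2 + e[1] ** 2) ** 0.5)
    return errors

# A stationary target at a hypothetical position (10, 5) mm, observed for 20 frames.
errors = track([(10.0, 5.0)] * 20, k=0.5)
```

In this idealized model the error shrinks by a factor of (1 - K) per frame, so it converges for 0 < K < 2; the real loop also contains the PID dynamics and measurement delay, which is why stability is checked experimentally.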
IV. EXPERIMENTS AND VERIFICATION
A. EXPERIMENTAL SETUP
In order to evaluate the presented robotic visual tracking method, different visual tracking experiments are implemented. The experimental setup is shown in Fig. 13(a), and is mainly composed of a Delta robot, an FPGA-based embedded vision imaging system, and an industrial control computer. The Delta robot was previously constructed by ourselves, and its controller is a PCI-based 4-axis motion controller in the industrial control computer. The FPGA-based embedded vision imaging system consists of a DE1-SOC FPGA development board with an integrated micro camera (752 × 480 MT9V034 with a 4 mm focal-length lens), a 45°-slant plane mirror, a semi-transparent optical film, and a diffuse reflection light source. The extracted object position is transmitted to the PC through an RS232 interface.
Based on this experimental setup, a Leica AT901 laser tracker is adopted for tracking performance evaluation in the visual tracking experiments. A target reflection ball is mounted on the object for the visual tracking experiment, shown in Fig. 13(b), and on the end of the robot for the visual tracking experiment based on visual servoing, shown in Fig. 13(c). In order to evaluate the dynamic tracking performance, a dynamical analyzer (DT9857E with QuickDAQ by DT) with a Kilter piezoelectric accelerometer is adopted.
B. EXPERIMENTAL MATCHING COMPARISONS & DISCUSSION
As Fig. 10(c) shows, uneven illumination inevitably appears at different locations in the FOV of our imaging system. Hence, we must evaluate the matching-based location performance of our ZNCC-on-FPGA method in comparison with other FPGA-based methods. ZNCC on a PC and SAD, NCC, and ZNCC on FPGAs are used to implement object matching under an arbitrary object motion speed. We randomly designate 39 object positions on the FOV plane as the matching points, as shown in Fig. 14. The evaluation results are listed in Table 5, in which ZNCC on a PC is taken as the reference standard. The comparisons show that the ZNCC-on-FPGA method exhibits better object-locating performance than the other FPGA-based methods under uneven illumination.
In order to further evaluate the tracking performance, it is necessary to discuss the differences from another competing tracking method reported in [5]. The tracking algorithm in [5] is an iteration-based optimization method, which searches for the object in sub-regions of neighboring frames. Hence, the object tracking in [5] does not have strict timing, owing to the unstable motion of the object and the non-real-time PC operating system, which may result in severe performance degradation. We compare our matching time with the results of [5] (Table 3 in [5]) under the same frame image size of 480 × 480 and template size of 100 × 100. Our running time is 3.65812 ms, which is less than the average time cost of 4.2 ms of AIOS in [5]. Our presented FPGA-based ZNCC method therefore runs faster and more stably than AIOS in [5].
C. REAL-TIME OBJECT TRAJECTORY TRACKING
In order to evaluate the object tracking performance of ZNCC-based template matching on FPGAs independently, a visual tracking experiment without visual servoing is implemented. While an object is dragged and slides on the glass plate with a semi-transparent film, the object position is computed and recorded in real time by the FPGA-based ZNCC circuit and is measured by the laser tracker simultaneously. The visual tracking comparisons are shown in Fig. 15. Our ZNCC-based visual tracking error is less than 0.73 mm at object motion speeds below 0.7 m/s when the motion trajectory is smooth.
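A tracking error of this kind can be computed as the largest point-wise deviation between the two recorded trajectories. The sketch below assumes the FPGA and laser tracker samples are already time-aligned and expressed in the same coordinate frame in millimeters, which the text does not detail; the function name and the sample points are hypothetical.

```python
import math

def max_tracking_error(measured, reference):
    """Largest point-wise Euclidean deviation between two aligned 2D trajectories."""
    return max(math.dist(m, r) for m, r in zip(measured, reference))

# Hypothetical samples in mm: FPGA-based positions vs. laser tracker positions.
fpga_track = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
laser_track = [(0.0, 0.0), (1.0, 1.5), (2.3, 2.4)]
err = max_tracking_error(fpga_track, laser_track)
```

Reporting the maximum rather than the mean deviation gives a conservative bound of the sort quoted for the smooth-trajectory case.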
D. REAL-TIME OBJECT VISUAL-SERVOING TRACKING
In order to evaluate the tracking performance based on visual servoing, robotic visual-servoing tracking experiments at different object motion speeds are implemented. The actual trajectories, recorded by our FPGA-based ZNCC method and the laser tracker, and their deviations are depicted in Fig. 16. The trajectories at the four speeds are all arbitrarily commanded along irregular traces that turn sharply at some points. The results show that the tracking deviations increase with the object motion speed and, in particular, that the dynamical tracking deviation fluctuates strongly when the object swerves sharply. However, with the visual servoing control, the tracking deviations tend to settle after the sharp-turning points. This verifies that the presented FPGA-based ZNCC robotic visual tracking method is efficient.
E. DYNAMICAL PERFORMANCE OF REAL-TIME TRACKING
In order to further evaluate the dynamical visual-servoing tracking performance, we design a special acceleration-based dynamical experiment. In this experiment, the end of the Delta robot is commanded to move to a defined position along a linear trajectory at three motion speeds. When the robot stops, the vibration acceleration temporal signals are recorded using the dynamical analyzer described in Subsection A. The experiments at the three motion speeds are implemented under open-loop and visual-servoing closed-loop control modes, respectively. The vibration acceleration temporal signals are drawn in Fig. 17. The reduced stabilization times are listed in Table 6, which shows that the maximum vibration accelerations and stabilization times in the closed-loop control mode are all smaller than those in the open-loop mode. This demonstrates that visual servoing control successfully suppresses the impact-induced residual vibration and improves the dynamical tracking performance. This can be explained using the Nyquist sampling theorem, as the sampling frequency of 60 Hz is larger than twice the first-order natural frequency of the Delta robot. The actual sampling frequency is limited by the GPIO interface to the micro camera. With a high-speed low-level camera interface, such as Camera Link, the dynamical performance can be improved considerably further if the measurement precision of the tracked object is increased simultaneously.
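One common way to extract a stabilization time from a recorded acceleration signal is a threshold criterion, sketched below. How the stabilization times in Table 6 were actually obtained is not stated, so the criterion, the function name, the threshold, and the sample values here are all assumptions for illustration.

```python
def stabilization_time(accel, sample_rate_hz, threshold):
    """Time after which |acceleration| stays below `threshold` until the record ends."""
    last_above = -1
    for n, a in enumerate(accel):
        if abs(a) >= threshold:
            last_above = n                 # remember the latest sample still vibrating
    return (last_above + 1) / sample_rate_hz

# Hypothetical decaying residual vibration sampled at 1 kHz.
signal = [1.0, -0.8, 0.5, -0.3, 0.05, -0.02, 0.01]
t_settle = stabilization_time(signal, sample_rate_hz=1000.0, threshold=0.1)
```

Applying the same threshold to the open-loop and closed-loop records would make the two stabilization times directly comparable.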
V. CONCLUSIONS
Visual servoing can provide an efficient solution for closed-loop control of a Delta robot. This paper proposes a ZNCC-based grayscale template matching concurrent circuit with cascading MAC computation on FPGAs, which largely reduces the consumption of valuable hardware resources on an FPGA. Based on the presented FPGA-based ZNCC method, we develop a robotic visual-servoing tracking system using a Delta robot. In order to overcome the target-matching issue caused by cluttered image backgrounds, we design a compact optical imaging system with a 45°-slant plane mirror and a front optical filter film in the optical path. This largely lowers the complexity of the image preprocessing circuit in the FPGA. Various robotic visual trajectory tracking and dynamical visual-servoing tracking experiments are implemented. The experimental results verify the effectiveness of the method in robotic visual tracking, and the presented visual-servoing tracking method achieves better tracking precision and dynamical tracking performance than the open-loop control mode.
