Abstract. We present an FPGA implementation of image processing algorithms, which is attached to a special purpose embedded board. Though typical embedded boards are weak in their computing powers, we strengthen them with FPGA-based parallel processing chips. Our implementation finally shows remarkable speed-ups. We present our overall architecture and implementation details. Our system shows a reasonable solution to enhance the computing power of relatively low-power devices.
Introduction †
When an digital image is processed with pixel-wise masking operations, most of its processing time are used for repeatedly applying the same operation to each pixel [1] . These kinds of repeated operations are one of the most suitable ones to be accelerated by parallel processing techniques. Thus, we have plenty of parallel-processing methods for these image processing operations.
In various fields including automatic control of mechanical robots, there are lots of hardware devices based on these kinds of simple digital image processing techniques [2] . These devices generally have plenty of hardware specifications, while their purposes are usually restricted to the pre-defined areas. In this case, it is most important to find the most optimal way of implementing the specified functionality with the restricted hardware devices.
In this paper, we aim to apply image-processing techniques for the motion control of a two-legs walking robot, as shown in Figure 1 . As its main computing power, this robot is equipped with an embedded board, which is actually hard to achieve image processing operations in real-time. Thus, we accelerated the given hardware with parallelized special hardware: an FPGA chip.
We analyzed the required image processing operations, and designed overall architecture. We use HDL (hardware description language) [3] to design our specialpurpose FPGA chips. This FPGA implementation of parallel image processing algorithms enables us to achieve the real-time motion control of the walking robot, which actually removes the pre-defined shaped objects. Our implementation result shows the way of accelerating low-tier embedded systems with FPGA-based parallel processing.
Fig. 1. Our final implementation:
Section 2 represents our design of the parallelized image processing system, and also our FPGA implementation. Implementation results are in Section 3, and finally conclusions are followed in Section 4.
Design and Implementation
The target embedded device is an ARM-based development board, with uCLinux Operating System. More specifically, it is equipped with the Eagle-series main CPU, 64MB SDRAM, 64MB NAND Flash memory, and an FPGA connection with 16bit address and data buses [4] .
Our overall system architecture is represented in Figure 2 . The simulator part has actually three different versions of image processing algorithms: a PC-based software implementation, another target embedded device-based software implementation, and our FPGA-based parallelized implementation. We can execute a specific image processing algorithm for all three implementation, and compare the results each other. The two-legs walking robot act as an I/O device. Its internal camera gets the digital images and transfers those images to the embedded board. The robot motion control algorithm works on this embedded board, with the computing support of the FPGA chip. Finally, our robot detects some geometric objects as obstacles in its walking path. The camera-captured images from the robot are 320x240 resolution with RGB colors. At the first stage of our algorithm, we apply a median filter to remove potential noises. And then, a morphology filter is used to additionally remove noises in the image.
At the next stage, the distance-transformation matrices are used to isolate the candidate regions for the geometric objects [5] . It also checks their RGB colors with color correction techniques. Additional location information are used to identify those geometric objects in the input images. Figure 3 shows the information flow on the embedded board attached to the walking robot. From the NTSC-type video camera, the images are captured and converted into digital images. Then, our FPGA chip performs a median filter and another morphology filter on those images. The main CPU processes the distancetransformation matrix algorithm in its sequentially implemented execution. Then, the
Advanced Science and Technology Letters
Vol.28 (CIA 2013) motion control algorithms for the walking robot are executed to make required motions of the robot.
Implementation Results
To make FPGA implementations, we carefully selected a set of mask-based image processing algorithms: median filter, morphology filter, mean filter, sharpening filter, and edge detection filter. For comparison purpose, we actually implemented all these algorithms on three different forms: a PC-based software implementation, an embedded board-based software implementation, and an FPGA implementation. In the case of the embedded board-based software implementation, the target embedded board has relatively low computing power, and thus, shows much slow execution speeds. For example, the median filter algorithm works in 1,070 msec on the target embedded board, while the PC-version works in 44 msec, which is about 25 times faster. Our FPGA implementation fully utilizes the speed-up of parallelized implementation, and shows much faster execution speeds of 97 msec. For the practical experiments, we have checked the success and failure of the 2-legs walking robot, as shown in Table 1 . We provide three different types of objects: circles, triangles, and squares. We tested 30 times for each object, to both of embedded-board based software implementation and our FPGA-based parallelized implementation. As shown in Table 1 , our FPGA implementation outperforms the software implementation for circles and triangles, while shows similar score for squares. Conclusively, our FPGA-based parallelized implementation outperforms the software-based implementation, with remarkable speed-ups.
Conclusion
In this paper, we realized specific image processing algorithms into their parallelized implementations, as FPGA chips, for relatively low-powered embedded devices. Our final analysis shows that these FPGA implementations mark final scores equal to or better than typical embedded device implementations. In contrast, from the processing time point of view, it works much faster than embedded device implementations, and even similar speed to the PC implementations. Our implementation results show that parallelized implementations of image processing algorithms on FPGA chips can supplement the low computing power of typical embedded devices.
