Recently, embedded technology has been widely applied to machine vision, and embedded vision systems have become increasingly popular. This paper reviews the advances in embedded vision systems, and then compares and analyzes their frameworks in terms of processing ability, cost and performance. Some unsolved problems of embedded vision systems are discussed. Finally, the future of embedded vision systems is outlined.
I. Introduction
Traditionally, machine vision systems have been built around personal computers (PCs), which usually use dedicated cards to capture images, while the image processing runs on the CPU of the PC. PC-based vision systems have been shown to handle complex image processing algorithms and to satisfy the requirements of general applications. However, a PC-based system is large, which limits its use in compact systems. Recently, embedded systems have developed rapidly and are widely applied in industrial control, automobiles and robotics. With the help of embedded systems, many kinds of application-specific embedded vision systems (EVS) have been developed.
The EVS originated roughly in the 1980s. Since then, more and more industrial organizations, companies and academic institutions have devoted research to the EVS, driven by advances in integrated circuit (IC) chips, embedded system design and CMOS-based image sensors. In particular, rapid progress has enabled the three basic elements of the EVS (i.e., image sensor, image processing and result output device) to be built into one unit. The EVS loads the image processing algorithms onto dedicated hardware that includes a power supply module, an I/O module, a memory control module, an image sensor module and a data processing module. These modules are combined into an integrated embedded design, which makes the EVS low-cost, easy to install and easy to use. The functions and performance of the EVS have become very powerful with the advances in embedded technology, especially the enhanced capabilities of microprocessors, DSPs and FPGAs, as well as higher memory integration at low cost. The attractive characteristics of the EVS (i.e., small size, high reliability and portability) allow it to cover a wide range of applications, from low-level image processing to high-level video stream processing.
The rest of the paper is organized as follows. The hardware architecture is reviewed in Section II. The related algorithms are discussed in Section III. The existing unsolved problems and the prospects in the EVS are discussed in Section IV. The concluding remarks are provided in Section V. 
II. Hardware Architecture of the EVS
The EVS mainly consists of three modules, namely an image sensor module, a communication module and an image processing module. Figure 1 sketches the architecture. The image sensor module captures the image information, converts the optical signal into an analog signal and then into a digital image signal, and sends it to the image processing module. A CCD or CMOS sensor with an ADC is used in this module. The communication module is the pivotal part of the EVS, connecting all its parts together. It transfers the image data, the extracted information, the processing results and the control information within the system. Communication can be implemented through various interfaces such as Ethernet, serial ports and parallel ports. With Ethernet, many EVS units can form a distributed vision system network.
A. Architecture of image processing module
According to the number of processors used in the image processing module, there are two kinds of architectures, namely the singular processor system and the multiple processors system. For the singular processor system, several kinds of embedded processors are available for the EVS, namely the microprocessor, ASIC (Application Specific Integrated Circuit), DSP (Digital Signal Processor) and FPGA (Field Programmable Gate Array). For the multiple processors system, several of the processors listed above can be combined in either a serial structure or a parallel structure.
1) Singular Processor Architecture: In this architecture, some researchers chose a microprocessor as the embedded processor [1]-[3], [4], [6], which is cheap but has limited processing power. Some adopted a DSP [5], [7], which is low-cost and more powerful for image processing and video stream processing; in practice, however, more than one DSP is usually employed. Some used an ASIC, which has good processing capacity but high design cost and risk. Others deployed an FPGA [8]-[10], [11], [21]. The hallmark of the FPGA is that it suits the implementation of many parallel vision algorithms, and an FPGA-based EVS has the merits of a short development cycle and fast implementation. More advanced applications on HPCs (Handheld Personal Computers) or PDAs (Personal Digital Assistants) [15] have grown swiftly in recent years.
a) Based on Microprocessor: A RISC processor can be chosen as the embedded processor in the processing module [1]-[3], [6], and the ARM microprocessor [20] can also be adopted. Because of the limited capability of the microprocessor, only simple algorithms can be realized in an EVS based on it, such as color blob tracking, color statistics [1]-[3], frame differencing, edge detection and color histogramming [1], [2].
An EVS based on an SX52 microprocessor and a CMOS image sensor was developed in [1]. It consists of four chips, namely a CMOS image sensor (OV6620), an SX52 microprocessor, an RS232 level shifter and a frame buffer. Figure 2 illustrates its hardware architecture. The functions of this system include color blob tracking, color statistics, frame differencing and noise filtering. An example of color blob tracking is shown in Figure 3(a), and an example of frame differencing is given in Figure 3(b). The frame buffer raises the processing speed to as much as 50 fps, which is considerably high for a low-cost processor. The system also has low power consumption. The accompanying drawback, however, is that additional on-chip algorithms are hard to develop in firmware, given the scarce resources and the complexity of firmware coding on the SX52 microprocessor.
Figure 3: Processing image [1]: (a) Color blob tracking; (b) The left image is the reference frame, and the right one shows the bitmap
A novel miniature programmable vision module that combines an analog very-large-scale integrated (aVLSI) circuit with a digital post-processor (MPC555) was presented in [6]. The MPC555 controls the sensor read-out and allows additional, high-level processing of the image and optical data. This system is intended to offer cheap yet powerful vision capabilities to small robotic platforms. It also has potential in science education, where students could easily implement different computer vision algorithms well suited to cognitive tasks such as ego-motion estimation or object tracking.
b) Based on DSP: A DSP is a processor for signal processing that performs many processing algorithms with dedicated internal hardware. The instruction set of a DSP as well as its logic are fixed, which means that the connections between logic gates cannot be changed. Systems based on a DSP are more powerful than those based on a microprocessor, so more functions become available, such as object tracking and classification [5], [7]. The cost of this kind of EVS is lower than that of an ASIC-based one.
An EVS based on a temporal contrast vision sensor and a DSP was developed in [5]. It comprises a temporal contrast vision sensor, a FIFO buffer memory and a simple, low-cost, low-power fixed-point DSP. Figure 4 shows the architecture of the vision system, in which the vision sensor completely suppresses image data redundancy and encodes visual information as sparse Address-Event-Representation (AER) data. The sensor delivers data with high temporal resolution at a low data rate. Various post-processing algorithms, such as object tracking, vehicle speed measurement and object classification, have been implemented on this EVS. In experiments on highway traffic acquisition and people tracking, the system achieved very good results, as shown in Figure 5. The system is compact and low-power: its total size is 7x7x7 cm and its power consumption is only 2.5 W.
Figure 4: System architecture [5]
Figure 5: People tracking results [5]
c) Based on ASIC: An ASIC is hardware dedicated to certain fixed-point algorithms or special applications. Sometimes a system implemented with a general-purpose processor or a DSP cannot satisfy the speed requirements of the image processing algorithms. An alternative is to implement the EVS with an ASIC, which is the fastest option for implementing various algorithms. However, it is impractical for low-volume applications because it is expensive to produce in small quantities. Developing an EVS with an ASIC is a high-risk and time-consuming task. Since an ASIC is designed for a specific application, the chip and the hardware circuit must be redesigned whenever the algorithm changes. It therefore lacks flexibility for extended use.
d) Based on FPGA: The FPGA is a chip that emerged with the development of IC manufacturing techniques. Its functions can be changed by reconfiguring the internal hardware logic, so it can meet the needs of different applications through programming. The main advantage of FPGA-based design is the flexibility to exploit the inherently parallel nature of many vision problems. Compared with ASIC designs, the design-implement-test-debug cycle with an FPGA is relatively short, and making minor modifications to an existing design is simple. In contrast to ASIC hardware, less actual hardware is needed if the system is designed to support multiple, mutually exclusive modes of operation. A vision system that uses an FPGA to implement the image processing algorithms is not only flexible in software design but also approaches the execution speed of dedicated hardware. Indeed, it has a high performance-to-price ratio.
A programmable parallel architecture intended for signal pre-processing in intelligent embedded vision systems was described in [8]. It was implemented and tested on a Celoxica RC 1000 prototyping platform with a Xilinx XCV2000E FPGA, with images captured by a CMOS digital camera. Figure 6 illustrates its architecture. Several pre-processing functions are provided (namely filtering, correlation, transformation and edge detection). The processing speed reaches 667 frames per second on images of 256x256 pixels. The system is implemented as an SOPC (System on a Programmable Chip) and combines a parallel architecture with fast processing speed. Figure 7 shows the pre-processing result for number plate recognition.
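As a rough check on what the frame rate reported for [8] implies, the following minimal sketch converts 667 frames per second at 256x256 pixels into a pixel throughput. The function name `pixel_throughput` is illustrative and does not come from [8], and the calculation assumes that every pixel of every frame is processed exactly once.

```python
def pixel_throughput(width: int, height: int, fps: float) -> float:
    """Return the number of pixels processed per second."""
    return width * height * fps


# Figures quoted in the text for the system in [8]:
# 256x256 pixels at up to 667 frames per second.
rate = pixel_throughput(256, 256, 667)
time_per_pixel_ns = 1e9 / rate  # average time budget per pixel

print(f"{rate / 1e6:.1f} Mpixel/s")          # about 43.7 Mpixel/s
print(f"{time_per_pixel_ns:.1f} ns/pixel")   # about 22.9 ns per pixel
```

Under this assumption the average budget is roughly 23 ns per pixel, which gives a feel for why such frame rates are usually associated with a parallel hardware architecture rather than a sequential processor.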
Figure 6: System architecture [8]
Figure 7: Results of numberplate [8]
In [11], a novel FPGA-based architecture dedicated to active vision was proposed. It permits a high degree of versatility and allows the implementation of parallel image processing algorithms. Its design is based on the assumption that the strategy of visual processes can be divided into three successive tasks: attention, focusing and high-level processing. Figure 8 shows its architecture. Figure 9 gives examples of the processing: the edge detection performed by the attention module and the template tracking performed by the focusing module. This approach is based on FPGA technology and a CMOS imager, and it reduces the classical bottleneck between sensor and processing. The design achieves high-speed vision in real environments. The drawback is that the high-level processing has to be performed on a host computer rather than on the embedded system.
Figure 8: Architecture of the system [11]
2) Multi-processors Architecture: The multi-processors system can adopt different architectures, including serial and parallel structures [16], [17]. In the former case, each processor performs one part of the image processing task in sequence, and the processors are cascaded through memory buffers (see Figure 10). In the latter case, the image data is divided into many blocks that are processed by different processors (see Figure 11). The parallel structure is limited by board space and cost, and it is not suitable for every kind of image processing. In addition, the processors generally do not play equal roles: one processor is used as the main processor, and the others are co-processors. They collaborate to realize the needed functions.
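The difference between the serial and the parallel structure can be illustrated with a small software model. This is only a sketch under stated assumptions: the "processors" are ordinary function calls, the names `serial_structure` and `parallel_structure` are hypothetical, and the image is split by rows, which the cited works do not necessarily do.

```python
from typing import Callable, List

Image = List[List[int]]  # grayscale image as rows of 0..255 values


def threshold(img: Image, t: int = 128) -> Image:
    """Pointwise binarization: each output pixel depends on one input pixel."""
    return [[255 if p >= t else 0 for p in row] for row in img]


def invert(img: Image) -> Image:
    """Pointwise inversion of a grayscale image."""
    return [[255 - p for p in row] for row in img]


def serial_structure(img: Image, stages: List[Callable[[Image], Image]]) -> Image:
    """Cascade: each stage models one processor and passes its output
    (the memory buffer between processors) to the next stage."""
    buffer = img
    for stage in stages:
        buffer = stage(buffer)
    return buffer


def parallel_structure(img: Image, op: Callable[[Image], Image], n_proc: int) -> Image:
    """Block split: rows are divided into up to n_proc blocks, each block is
    processed independently, and the partial results are concatenated."""
    rows_per_block = max(1, -(-len(img) // n_proc))  # ceiling division
    blocks = [img[i:i + rows_per_block] for i in range(0, len(img), rows_per_block)]
    result: Image = []
    for block in blocks:  # on real hardware these blocks would run concurrently
        result.extend(op(block))
    return result


frame = [[10, 200, 90], [130, 40, 250], [0, 128, 255], [77, 180, 20]]
serial_out = serial_structure(frame, [invert, threshold])
parallel_out = parallel_structure(frame, lambda b: threshold(invert(b)), n_proc=2)
print(serial_out == parallel_out)  # True for pointwise operations
```

The two structures agree here because both operations are pointwise. A neighborhood operation such as a 3x3 filter would need the rows adjacent to each block boundary, which is one reason a plain block split does not fit every algorithm.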
Processor combinations include an SIMD (Single Instruction Multiple Data) processor with a micro-controller [19], [22], a DSP with an FPGA [12]-[14], an FPGA with an ANN (Artificial Neural Network) [18], and so on. The cost of an EVS based on this architecture is higher than that of the singular processor architecture, because the circuit design is more complicated.
a) Based on SIMD processor and micro-controller: In this combination, the SIMD processor is dedicated to the image processing, and the micro-controller, e.g., an 8031 or 8051, serves as the local host that executes the high-level processing tasks without real-time requirements. The micro-controller contains all the components needed to make a small yet complete system. It also has a large number of usable I/O pins to control the image sensors and their peripherals.
In [19], a wireless smart camera based on an SIMD video-analysis processor (named IC3D) and an 8051 micro-controller as a local host was introduced. The IC3D is a member of NXP Semiconductors' Xetal family of SIMD processors. Its kernel is formed by the Linear Processor Array (LPA) with 320 RISC processors with 10-bit-wide data paths. It executes the low-level image processing, while the 8051 handles intermediate and high-level processing and control. Both processors are coupled through a dual-port RAM that lets them work in a shared workspace at their own processing pace. Despite its high pixel performance, the IC3D is an inherently low-power processor. For typical applications such as feature finding and face detection, the power consumption is usually below 100 mW.
Figure 9: The processing results [11]: (a) Motion detection; (b) Template tracking
b) Based on DSP and FPGA: A system whose processor architecture is a hybrid of DSP and FPGA remains programmable after manufacture. It also provides high computation performance and can be used in many real-time image processing applications. Its main advantage is structural flexibility. The hybrid architecture is quite general and is suitable for modular design. Its development cycle is short and the system is easy to maintain. Different systems schedule the computation tasks between the DSP and the FPGA differently. In general, the DSP is the main processor and the FPGA is the co-processor. The degree of parallelism is also an issue: the DSP and the FPGA may run at the same time, or the DSP may sit idle waiting for the FPGA's results while the FPGA is processing. The performance of this processor architecture is good, and it is commonly used in image processing.
An EVS composed of a DSP and a dedicated LSI (large-scale integration) device was proposed in [12]. Some low-level image processing, such as spatial filtering, feature extraction and block matching, is implemented by the LSI. This system employs TI's TMS320C6713 as the floating-point DSP and an Altera FPGA device as the image processor LSI. The system architecture is shown in Figure 12. A stereovision-based navigation algorithm is implemented with this system on a mobile service robot, which performs visual navigation in a building hallway. Figure 13 illustrates the map and path information. Figure 14 gives the results: green and blue points are feature points that match the map, yellow points are feature points that do not match the map, and the red point is the estimated current position. The system operates at 86 MHz, and the power consumption is only 10 W.
Figure 12: System architecture [12]
Figure 13: Map and path data [12]
Figure 14: Navigation experiment results [12]
An EVS based on DSP and FPGA was described in [13], where the DSP is used as the main processor and the FPGA as the co-processor. The DSP and the FPGA are driven in parallel for the execution of crucial parts of the vision algorithms. This method is called Resource Optimized Co-processor (ROC) here. Conventionally, co-processors are used to accelerate time-consuming calculations by substituting DSP routines: the DSP interrupts execution, starts the co-processor, and waits for the co-processing to finish. The ROC approach is different; its execution flow is illustrated in Figure 15. As the DSP and the FPGA run in parallel, the processing speed of the system is fast enough to satisfy the real-time requirements of high-speed vision. As an example, the realization of an embedded vision sensor for robot soccer is shown in Figure 16, which demonstrates the usefulness and power of the ROC approach. The algorithms realized are Bayer interpolation, background filtering to remove the playing field, HSV-based segmentation for robust color-based classification, and region-based detection to identify the ball and the position and viewing angle of each robot. It can be seen from Figure 16 that the original DSP version needs 16 ms for execution, the conventional co-processing technique needs 18.7 ms, and the ROC approach needs only 8.6 ms.
[13]: (a) Without co-processor; (b) Conventional co-processing; and (c) Resource optimized co-processing
c) Based on FPGAs and ANN: So far there has been little research on systems that combine traditional image processing techniques with an ANN. A hybrid architecture with FPGAs and an ANN was proposed in [18]. It combines programmable hardware for the image processing tasks with digital hardware implementing an ANN for the pattern recognition and classification tasks. The FPGA is used for the pre-processing. The EVS with FPGAs and ANN has been successfully applied to real-time detection and recognition of road signs.
Feature: Development of the Embedded Vision System: A Survey
The main performance characteristics of the EVS with different architectures are listed in Table I. Among the singular processor architectures, the EVS based on ASIC has the best processing power, but its cost is high. Among the multi-processors architectures, the EVS based on the hybrid of DSP and FPGA performs best, and this architecture is a good choice for the EVS.
B. EVS Networks
"More eyes see more than one": a network of cameras can accomplish complex tasks by using multiple views and collaborative processing. It is more than simple multi-camera vision, because an image processing module is built into each camera. Many EVS units can therefore be configured as a network and cooperate with each other in a distributed mode [19], [22]-[24]. The structure is shown in Figure 17.
In [22], a smart camera network is applied to human pose estimation, with the focus on distributed algorithm design. Processing the acquired video at the source camera facilitates the operation of scalable vision networks by avoiding the transfer of raw images. This allows efficient collaboration between the cameras under communication and latency constraints, leading to good performance.
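The benefit of sending processed data instead of raw images can be made concrete with a back-of-the-envelope comparison. The sketch below is illustrative only: the frame size, the number of features and the bytes per feature are hypothetical values chosen for the example and are not taken from [22], and the helper names are assumptions.

```python
def raw_frame_bytes(width: int, height: int, bytes_per_pixel: int = 1) -> int:
    """Size of one uncompressed frame in bytes."""
    return width * height * bytes_per_pixel


def feature_packet_bytes(n_features: int, bytes_per_feature: int) -> int:
    """Size of a packet that carries only the extracted features."""
    return n_features * bytes_per_feature


# Hypothetical numbers, chosen only for illustration:
raw = raw_frame_bytes(320, 240)                 # 8-bit grayscale frame
features = feature_packet_bytes(20, 8)          # 20 features, 8 bytes each
reduction = raw / features

print(raw, features, reduction)  # 76800 160 480.0
```

With these assumed values each camera would send 480 times less data per frame than it would by transferring the raw image, which is the kind of saving that makes collaboration feasible over a bandwidth-limited link.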
III. The Related Algorithms in EVS
As shown in Figure 1, the EVS mainly consists of three modules (i.e., image sensor module, communication module and image processing module). The core of the EVS lies in the execution hardware, such as the image processing module, and in the algorithm design. Although many algorithms have been proposed for PC-based vision systems, problems of performance, such as robustness and real-time capability, remain when they are employed directly in an EVS.
A. Image pre-processing
Image pre-processing is necessary in applications because it makes the subsequent image processing more accurate. Image pre-processing algorithms include noise removal, binarization, smoothing, regularization, linear or nonlinear transformation, image enhancement, median filtering and so on. They can be implemented in an EVS.
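Two of the pre-processing steps listed above, median filtering and binarization, are simple enough to sketch in a few lines. The code below is a minimal pure-Python illustration, assuming a grayscale image stored as a list of rows; the function names, the 3x3 window, the border handling and the threshold of 128 are choices made for this example rather than anything prescribed by the surveyed systems.

```python
from statistics import median
from typing import List

Image = List[List[int]]  # grayscale image as rows of 0..255 values


def median_filter_3x3(img: Image) -> Image:
    """Replace each interior pixel by the median of its 3x3 neighborhood.
    Border pixels are copied unchanged (an assumption of this sketch)."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = int(median(window))
    return out


def binarize(img: Image, t: int = 128) -> Image:
    """Map pixels at or above the threshold t to 1 and all others to 0."""
    return [[1 if p >= t else 0 for p in row] for row in img]


# A dark image with a single bright noise pixel at row 1, column 1.
noisy = [
    [10, 10, 10, 10],
    [10, 255, 10, 10],
    [10, 10, 10, 10],
]
clean = median_filter_3x3(noisy)
print(clean[1][1])             # 10: the isolated bright pixel is removed
print(binarize(noisy)[1][1])   # 1: without filtering, the noise survives
print(binarize(clean)[1][1])   # 0: after filtering, it does not
```

The example shows why pre-processing matters for the later stages: the same threshold gives a spurious foreground pixel on the raw image and a clean result on the filtered one.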
B. Image segmentation
Image segmentation algorithms include classification based on the image feature space, region-based methods (such as region growing), edge-based methods (such as edge detection and active edges), methods based on function optimization (such as Bayesian methods), and hybrid methods that consider both edge and region information. The EVS handles simple segmentation algorithms, such as threshold segmentation and edge detection, very well.
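As an illustration of the simple edge-based methods mentioned above, the following sketch marks edge pixels with the Sobel operator. The Sobel kernels are used here only as one common choice; the survey does not state which edge detector the cited systems use. The function name `sobel_edges`, the |gx| + |gy| approximation of the gradient magnitude and the threshold value are assumptions of this example.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of 0..255 values

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_edges(img: Image, t: int) -> Image:
    """Return a binary map with 1 where |gx| + |gy| exceeds t.
    Border pixels are left at 0 (an assumption of this sketch)."""
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            for j in range(3):
                for i in range(3):
                    p = img[y + j - 1][x + i - 1]
                    gx += SOBEL_X[j][i] * p
                    gy += SOBEL_Y[j][i] * p
            if abs(gx) + abs(gy) > t:
                edges[y][x] = 1
    return edges


# A vertical step edge: two dark columns followed by three bright columns.
step = [[0, 0, 200, 200, 200] for _ in range(4)]
edge_map = sobel_edges(step, t=400)
for row in edge_map:
    print(row)
```

Only integer multiplications, additions and comparisons are involved, which is consistent with the observation that this class of algorithm maps well onto modest embedded hardware.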
C. Motion detection and tracking
Motion detection algorithms include background differencing, frame differencing, optical flow and foreground modeling. Motion tracking algorithms include tracking based on points, regions, contours and models. The EVS can realize background differencing, frame differencing and some simple optical flow methods well. Motion tracking, however, requires a large amount of computation, so it remains a challenge for the EVS.
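Frame differencing, the simplest of the motion detection methods named above, can be sketched as follows. This is a minimal illustration, assuming two grayscale frames of equal size stored as lists of rows; the function names, the threshold and the use of a centroid as a crude position estimate are choices made for this example and are not taken from any of the surveyed systems.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # grayscale image as rows of 0..255 values


def frame_difference(prev: Image, curr: Image, t: int) -> Image:
    """Binary motion mask: 1 where the absolute difference exceeds t."""
    return [
        [1 if abs(c - p) > t else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def centroid(mask: Image) -> Optional[Tuple[float, float]]:
    """Return the (x, y) centroid of the moving pixels, or None if there are none."""
    points = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not points:
        return None
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


# Reference frame: uniform background. Current frame: a bright 2x2 object.
reference = [[50] * 4 for _ in range(4)]
current = [row[:] for row in reference]
for y in (1, 2):
    for x in (2, 3):
        current[y][x] = 200

mask = frame_difference(reference, current, t=30)
print(centroid(mask))                                       # (2.5, 1.5)
print(centroid(frame_difference(reference, reference, 30)))  # None: no motion
```

The cost is one subtraction and one comparison per pixel, which helps explain why frame differencing is among the functions that even microprocessor-based systems can provide.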
Driven by research interest and application requirements, current hotspots of algorithm research include improving the performance and robustness of existing techniques, fusion of 2D and 3D tracking, irregularity detection and behavior prediction, video surveillance and biometric identification, and multi-sensor data fusion.
IV. Discussion
Although the EVS has made great progress, many challenges remain. On the one hand, the nature of the EVS inherently constrains the hardware configuration, such as the image sensor, the processor and the communication format. On the other hand, some algorithms run well in an experimental environment but are hard to run well in practice. More powerful processors can be developed to overcome the EVS's shortage of processing power, and multi-processor parallel architectures can also be designed for this purpose. Additionally, more optimized algorithms are needed for the EVS. With the development of design techniques and manufacturing technology, embedded processors are becoming more and more powerful. The hybrid of FPGA and DSP, or of FPGA and ANN, could be a good choice in parallel architecture design for the EVS. The key to a multi-processor parallel architecture is how the resources of the various processors are used to distribute the tasks and coordinate the communication among the processors. Designing good parallel algorithms is therefore of utmost importance, so that each processor does its best in its role and the performance of the whole system is maximized. Optimized algorithms can be tailored to the nature of the system to exploit its strengths and avoid its weaknesses. For distributed applications, a good and efficient distributed algorithm, covering data fusion and data communication, is very important. Since wireless communication is limited by bandwidth, the key is to design a system in which each EVS transfers not the raw images but a small amount of processed data. In addition, how to make good use of the data transferred by the distributed EVS is another problem. Further investigation into refining fusion algorithms for this purpose is therefore needed.
Figure 16: The results [13]
Figure 17: Networks Structure
The characteristics of the EVS, namely low cost, easy installation and ease of use, have made it develop fast. In middle- and low-level vision applications, the processing performance of the EVS is catching up with that of PC-based vision systems. With the development of embedded vision techniques, the processing speed will become faster, the precision higher, the operation easier and the cost lower. The EVS will be used in more high-level vision applications in the future, and it will play an increasingly important role in machine vision, industry, medicine, military affairs and aviation.
V. Conclusion
In this paper, previous related work and the latest technologies concerning the EVS are reviewed. The general architecture is described in detail in terms of processing capacity, cost and performance. Although much progress has been made in the design of the EVS, there is still a long way to go to create an intelligent EVS. The main existing problems of the EVS are discussed. A promising approach is perhaps the hybrid of FPGA and DSP, or of FPGA and ANN, which points to a bright future for the EVS in diverse emerging real-world visual applications.
