We present a software-hardware cooperative method for multi-projector seamless tiled display systems that consists of two main steps. First, an ordinary PC (Personal Computer) preprocesses the display system in software. Second, one or more hardware image processors based on FPGAs (Field Programmable Gate Arrays) process the source video image without the aid of computers. The method uses an optimized algorithm. Experimental results show that the proposed method is effective, low-cost, and high-performance.
Introduction
Large-screen display systems are widely used as display interfaces for multimedia applications such as scientific visualization and immersive cinema. Various tiled display solutions have been proposed, as introduced in [1], to overcome the size limitations of existing display devices, and multi-projector seamless tiled display systems are becoming increasingly popular.
A multi-projector seamless tiled display system mainly consists of a large screen, video image processing equipment, and an array of projectors. Descriptions of how such systems are built can be found in [2, 3].
PC clusters and complex hardware processors (usually built from GPUs or FPGA arrays) are commonly used as the image processing equipment of tiled display systems. A PC cluster uses PCs and software to process video images, and the flexibility of software allows various splicing algorithms to be implemented. However, this solution is vulnerable to virus infections and system crashes, which reduce the robustness of the display system. A complex hardware processor uses expensive hardware to process video images. Since no software runs during daily projection, virus infections and system crashes are avoided, and the robustness of the system can be very high. However, the splicing algorithm of such a hardware processor is usually fixed, which limits the flexibility of the display system. Whether a PC cluster or a complex hardware processor is used, the cost and power consumption of the display system are very high.
An FPGA is a programmable chip that is frequently used for data computing. The computing capability of a single FPGA chip has been shown to be sufficient for real-time image processing [4, 5, 6].
In this paper, we propose a software-hardware cooperative method for tiled display systems. A PC is used to preprocess the system, and several FPGA-based image processors are used for daily image processing with an optimized algorithm. Benefiting from the software-hardware cooperation and the optimized algorithm, the method can process high-resolution video images at low cost and low power consumption, while the display system achieves both high flexibility and high robustness. The next section presents the mechanism of the proposed method and the hardware framework of the image processor. Section 3 presents experiments that illustrate our approach. Finally, conclusions are drawn at the end of the paper.
Method
To build this seamless tiled display system, a large screen and an array of projectors are first fixed firmly in place. As shown in Fig. 1, the projection regions of neighboring projectors should overlap slightly. The screen and the projectors can then be used for image projection with the proposed method.
As shown in Fig. 2, the proposed method consists of two key steps: preprocessing and image processing. If the system has already been preprocessed, the preprocessing step can be skipped and the image processing step executed directly.
Algorithm and preprocessing
Because the projectors' optical axes may not be exactly perpendicular to the screen, the images displayed on the screen are usually distorted, and the distortion becomes very obvious when the screen is curved. The images sent to the projectors must therefore be geometrically calibrated to obtain a visually comfortable image on the screen.
The color across a geometrically calibrated multi-projector display can vary significantly, owing to inter-projector color variation, the overlaps between projectors, and other factors. Color calibration is needed to solve this problem [7].
A series of splicing algorithms for geometry calibration and color calibration have been proposed. Y. Chen et al. [8] proposed an automatic calibration method that relies on an uncalibrated camera, and a computer-vision-based geometric calibration method was introduced in [9]. [10] and [11] presented algorithms for color calibration, and a multi-projector seamless tiled display system for a marine simulator was introduced in [12]. These algorithms impose very high computation loads for real-time calibration, leading to high cost and power consumption. In our approach, an optimized algorithm is proposed to reduce the computation load.
First, a PC outputs an array of reference images ($I_{ra}$–$I_{rz}$) to the projectors, as shown in Fig. 3. Several reference points are distributed in the reference images for analysis.
To facilitate calculation, an image can be treated as a two-dimensional space whose pixels are represented by points in that space. For a single projector, we define the image sent to the projector as the projection space $S_p$ and the image displayed on the screen as the display space $S_d$.
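A Bezier surface function $Q(u, v)$ is used to describe the mapping function $F_1$ between $S_p$ and $S_d$:

$$Q(u, v) = \sum_{i=0}^{N} \sum_{j=0}^{M} B_i^N(u)\, B_j^M(v)\, p_{ij} \qquad (1)$$

where $u, v \in [0, 1]$,

$$B_i^N(u) = \binom{N}{i} u^i (1 - u)^{N - i}, \qquad B_j^M(v) = \binom{M}{j} v^j (1 - v)^{M - j} \qquad (2)$$

and $p_{ij}$ ($i = 0, 1, \ldots, N$; $j = 0, 1, \ldots, M$) are the control points of the Bezier surface. The coordinates of the reference points in $S_p$ and $S_d$ are measured by the PC and substituted into equation (1). The control points can then be solved for, which yields the mapping function $F_1$ between $S_p$ and $S_d$:

$$(x_d, y_d) = F_1(x_p, y_p) \qquad (3)$$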
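Because equation (1) is linear in the control points once the basis values are known, the fitting step can be sketched as a linear least-squares problem. The sketch below is a minimal illustration under our own assumptions, not the authors' implementation: the reference-point coordinates in $S_p$ are assumed to be normalized to $[0, 1]$ and used directly as $(u, v)$, the solver (normal equations with Gaussian elimination) is our choice, and the names `bernstein`, `bezier_point`, `solve`, and `fit_control_points` are hypothetical.

```python
from math import comb


def bernstein(n, i, t):
    """Bernstein basis polynomial B_i^n(t) = C(n, i) * t^i * (1 - t)^(n - i)."""
    return comb(n, i) * t**i * (1 - t) ** (n - i)


def bezier_point(ctrl, u, v):
    """Evaluate a tensor-product Bezier surface; ctrl[i][j] is an (x, y) pair."""
    n, m = len(ctrl) - 1, len(ctrl[0]) - 1
    x = y = 0.0
    for i in range(n + 1):
        bu = bernstein(n, i, u)
        for j in range(m + 1):
            w = bu * bernstein(m, j, v)
            x += w * ctrl[i][j][0]
            y += w * ctrl[i][j][1]
    return x, y


def solve(a, b):
    """Solve the square system a @ x = b (Gaussian elimination, partial pivoting)."""
    k = len(a)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, k):
            f = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * k
    for r in range(k - 1, -1, -1):
        x[r] = (aug[r][k] - sum(aug[r][c] * x[c] for c in range(r + 1, k))) / aug[r][r]
    return x


def fit_control_points(samples, n, m):
    """Least-squares fit of the (n+1) x (m+1) control points p_ij.

    samples: list of ((u, v), (xd, yd)) pairs, i.e. a reference point's
    normalised position in S_p and its measured position in S_d.
    """
    k = (n + 1) * (m + 1)
    # One row of basis products per reference point; column = i * (m + 1) + j.
    rows = [[bernstein(n, i, u) * bernstein(m, j, v)
             for i in range(n + 1) for j in range(m + 1)]
            for (u, v), _ in samples]
    # Normal equations (A^T A) p = A^T b, solved once per coordinate axis.
    ata = [[sum(r[p] * r[q] for r in rows) for q in range(k)] for p in range(k)]
    axes = []
    for axis in (0, 1):
        atb = [sum(r[p] * d[axis] for r, (_, d) in zip(rows, samples)) for p in range(k)]
        axes.append(solve(ata, atb))
    flat = list(zip(axes[0], axes[1]))
    return [flat[i * (m + 1):(i + 1) * (m + 1)] for i in range(n + 1)]


# Usage: recover a known 3 x 3 control net from a 5 x 5 grid of reference points.
true_ctrl = [[(0.0, 0.0), (50.0, 5.0), (100.0, 0.0)],
             [(5.0, 40.0), (52.0, 48.0), (98.0, 42.0)],
             [(0.0, 90.0), (50.0, 95.0), (100.0, 88.0)]]
grid = [t / 4 for t in range(5)]
samples = [((u, v), bezier_point(true_ctrl, u, v)) for u in grid for v in grid]
fitted = fit_control_points(samples, 2, 2)
```

Measuring more reference points than the $(N+1)(M+1)$ unknowns, spread over distinct $u$ and $v$ values, helps keep the normal equations well-conditioned and lets the least-squares fit average out measurement noise.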
If the source image is to be displayed on the screen without distortion, i.e., with the same shape as the source image, the mapping function $F_2$ between the display space $S_d$ and the source image space $S_s$ can be obtained:

$$(x_s, y_s) = F_2(x_d, y_d) \qquad (4)$$
Thus the mapping relationship between $S_s$ and $S_p$ is:

$$(x_s, y_s) = F_2\big(F_1(x_p, y_p)\big) \qquad (5)$$
For an arbitrary point $P(x, y)$ in the projection space, we define its geometry calibration parameter $\varphi(x, y)$ as the coordinate of the corresponding point in the source image; $\varphi(x, y)$ can be calculated from (5). To reduce the brightness of the overlapping regions, an intensity blending coefficient $m(x, y)$, which represents the luminance weight of a pixel, is defined for each projection space. As shown in Fig. 4, $m$ declines smoothly from 1 to 0 across the overlapping region, so each sub-image fades from 100% to 0% intensity toward its edges. For any pixel on the screen, the values of $m$ contributed by the different projectors sum to 1, so the combined intensity in the overlapping region is reduced to match the intensity in the non-overlapping regions.
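For two horizontally adjacent projectors, a blending coefficient with the two properties described above (a smooth fall from 1 to 0 and a sum of 1 everywhere) can be sketched as follows. The smoothstep-shaped ramp, the use of a single horizontal source coordinate, the overlap bounds in the usage example, and the name `blend_weights` are our assumptions for illustration; the paper does not specify the ramp shape.

```python
def blend_weights(x, overlap_start, overlap_end):
    """Return (m_left, m_right) for a source-image column x.

    Outside the overlap one projector has full weight. Inside it, the left
    weight falls smoothly from 1 to 0 (smoothstep ramp) while the right weight
    rises from 0 to 1, so the two weights always sum to 1.
    """
    if x <= overlap_start:
        return 1.0, 0.0
    if x >= overlap_end:
        return 0.0, 1.0
    t = (x - overlap_start) / (overlap_end - overlap_start)
    s = t * t * (3.0 - 2.0 * t)  # smoothstep: zero slope at both ends of the overlap
    return 1.0 - s, s


# Usage: a 1600-column source whose overlap spans columns 700-900 (hypothetical).
for col in (0, 700, 750, 800, 900, 1599):
    left, right = blend_weights(col, 700, 900)
    print(col, round(left, 3), round(right, 3))
```

In a complete system, the coefficient for a projection-space pixel would be obtained by evaluating such a ramp at that pixel's geometry calibration parameter, i.e., at its corresponding source-image coordinate.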
Next, the inter-projector color variation must be corrected. The R, G, and B channel color curves $I_{ij}(C)$ ($i = 1, 2, 3, \ldots$; $j = R, G, B$) of the projectors are measured separately, where $i$ is the projector number, $j$ is the R/G/B channel, and $C$ is the R/G/B color value. As is common practice, a set of virtual color curves $T_j(C)$ ($j = R, G, B$) is fitted to unify all the projectors. For an arbitrary pixel $P(x, y)$ of the projection space $S_p$, the coordinate $\varphi(x, y)$ of its corresponding point in the source image is obtained through geometry calibration, and its color value $C(\varphi(x, y))$ is sampled from the source image. Taking the virtual color curve $T_j(C)$ and the intensity blending coefficient $m(x, y)$ into account, the desired intensity of the pixel on the screen is $m(x, y) \cdot T_j[C(\varphi(x, y))]$. Given the projector's real color curve $I_{ij}(C)$, the color value $C_p$ of pixel $P(x, y)$ must therefore be:

$$C_p = I_{ij}^{-1}\{m(x, y) \cdot T_j[C(\varphi(x, y))]\} \qquad (6)$$
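Equation (6) can be evaluated directly for one channel of one pixel, as in the sketch below. It assumes, purely for illustration, that the measured curve $I_{ij}$ and the virtual curve $T_j$ are power laws over 8-bit color values, with luminance normalized to the projector's peak; the exponent 2.2, the 0.9 peak of the virtual curve, and the helper names `make_power_curve` and `corrected_value` are hypothetical, not values or functions from the paper.

```python
def make_power_curve(gamma, peak=1.0):
    """Return (curve, inverse) for L(c) = peak * (c / 255) ** gamma, c in [0, 255]."""
    def curve(c):
        return peak * (c / 255.0) ** gamma

    def inverse(level):
        return 255.0 * (level / peak) ** (1.0 / gamma)

    return curve, inverse


def corrected_value(c_src, m, target_curve, projector_inverse):
    """Equation (6): C_p = I^{-1}{ m * T[C] } for one channel of one pixel."""
    return projector_inverse(m * target_curve(c_src))


# Hypothetical curves: the projector reaches its full peak, the virtual curve
# only 90 % of it, so every target level is reachable by the projector.
proj_curve, proj_inverse = make_power_curve(2.2)
target, _ = make_power_curve(2.2, peak=0.9)

c_p = corrected_value(200, 0.5, target, proj_inverse)
emitted = proj_curve(c_p)  # luminance the projector actually produces for c_p
```

Choosing a virtual curve whose peak does not exceed any projector's peak keeps $C_p$ within the 0 to 255 range; evaluating a fitted, irregular $I_{ij}^{-1}$ per pixel is exactly the cost the optimized algorithm below avoids.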
Thus both geometry calibration and color calibration are included in the algorithm above. However, the algorithm is not efficient enough: six operations, most of them complex, are needed for each output pixel in real-time image processing. In particular, $m(x, y) \cdot T_j[C(\varphi(x, y))]$ is real-valued and can take arbitrarily many values, and $I_{ij}(C)$ is an irregular curve fitted to a number of sample points. Evaluating function (6) in real time therefore requires a large amount of computation and places a high performance demand on the real-time image processing equipment.
In our approach, FPGA-based hardware is used as the real-time image processing equipment. The algorithm above is further optimized to reduce the real-time computation load according to the characteristics of FPGAs, which perform multiplications and table lookups efficiently but evaluate complex functions poorly.
In practice, the curves $I_{ij}(C)$ and $T_j(C)$ closely resemble power functions. For a power function $f(C) = C^{\gamma}$ on a normalized scale, $f^{-1}(ab) = f^{-1}(a) \cdot f^{-1}(b)$, so equation (6) can be transformed as:

$$C_p = I_{ij}^{-1}\{m(x, y)\} \cdot I_{ij}^{-1}\{T_j[C(\varphi(x, y))]\} \qquad (7)$$

Since the possible values of $x$, $y$, and $C$ are all finite, the operations $I_{ij}^{-1}\{m(x, y)\}$, $\varphi(x, y)$, and $I_{ij}^{-1}\{T_j(C)\}$ can be calculated in advance and stored as three lookup tables, named $L_{1a}$, $L_{1b}$, and $L_2$, respectively. Because $I_{ij}^{-1}\{m(x, y)\}$ and $\varphi(x, y)$ depend on exactly the same variables, $x$ and $y$, $L_{1a}$ and $L_{1b}$ can be combined into one lookup table, named $L_1$. Equation (7) can then be transformed as:

$$C_p = L_{1a}(x, y) \cdot L_2[C(L_{1b}(x, y))] \qquad (8)$$
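The precomputation and the real-time evaluation of equation (8) can be sketched as below, again assuming for illustration a normalized power-law projector response $I(c) = (c/255)^{\gamma}$ with a hypothetical exponent of 2.2 and nearest-neighbour sampling of the source image. Under that assumption the factorization behind (7) is exact provided the 0 to 255 scale is carried by only one factor: here $L_{1a} = m^{1/\gamma}$ stays on the normalized scale and $L_2$ carries the 8-bit scale. The table layout, the functions `phi` and `m` (standing in for the geometry calibration parameter and the blending coefficient), and all other names are hypothetical; the last lines compare (8) against a direct evaluation of (6).

```python
GAMMA = 2.2  # hypothetical projector exponent: I(c) = (c / 255) ** GAMMA


def target_curve(c):
    """Hypothetical virtual curve T(c): same exponent, peak limited to 90 %."""
    return 0.9 * (c / 255.0) ** GAMMA


def build_luts(width, height, phi, m, gamma=GAMMA, target=target_curve):
    """Precompute the combined table L1 and the colour table L2.

    L1[y][x] = (L1a, L1b): L1a = I^{-1}{m(x, y)} on the normalised scale and
    L1b = phi(x, y), the integer source coordinate of projection pixel (x, y).
    L2[c] = I^{-1}{T(c)} on the 0-255 scale, for every 8-bit value c.
    """
    l1 = [[(m(x, y) ** (1.0 / gamma), phi(x, y)) for x in range(width)]
          for y in range(height)]
    l2 = [255.0 * target(c) ** (1.0 / gamma) for c in range(256)]
    return l1, l2


def output_pixel(l1, l2, source, x, y):
    """Equation (8): two lookups, one sample, one multiplication."""
    l1a, (sx, sy) = l1[y][x]
    return l1a * l2[source[sy][sx]]


def direct_pixel(source, x, y, phi, m, gamma=GAMMA, target=target_curve):
    """Equation (6) evaluated directly, for comparison."""
    sx, sy = phi(x, y)
    return 255.0 * (m(x, y) * target(source[sy][sx])) ** (1.0 / gamma)


# Usage with a tiny synthetic source image and hypothetical phi and m.
source = [[(17 * (x + 3 * y)) % 256 for x in range(6)] for y in range(4)]


def phi(x, y):
    return min(x + 1, 5), y  # hypothetical one-column shift into the source


def m(x, y):
    return 1.0 - x / 10.0  # hypothetical blending ramp


l1, l2 = build_luts(5, 4, phi, m)
max_err = max(abs(output_pixel(l1, l2, source, x, y) - direct_pixel(source, x, y, phi, m))
              for y in range(4) for x in range(5))
```

The per-pixel table $L_1$ grows with the output resolution, while $L_2$ has only 256 entries per channel, which is consistent with the text's choice of external SRAM for $L_1$ and on-chip storage for $L_2$.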
With these tables calculated in advance, only two simple lookups ($L_1$ and $L_2$), one sampling operation ($C$), and one multiplication need to be performed in real time for each output pixel, so the real-time computation required for calibration is reduced significantly.
In the preprocessing step of the proposed method, the two lookup tables above are calculated by PC software as the calibration parameters and stored in a text file. Once the parameters are obtained, the preprocessing step is finished. In a firmly installed tiled display system, the positions of the projectors and the screen do not change noticeably in daily use, so the parameters remain valid. The preprocessing step can therefore be skipped until the projectors or the screen are noticeably moved.
The operations of (8) are then performed in real time by the hardware image processors in the image processing step.
Image processing
Once the preprocessing step is finished, the PC is no longer needed and can be removed from the system. As shown in Fig. 5, one or more hardware image processors are connected to the projectors, and each processor can output several sub-images. For a small display system with few projectors, one processor is enough to supply all the sub-images. However, the cost and manufacturing difficulty of a processor rise rapidly with the number of sub-images it outputs, so when the display system is extended with more projectors, several processors are used together for image processing.
Each processor contains an SD (Secure Digital) memory card, which stores the parameter text file generated in the preprocessing step. A high-resolution video image (the source image, $I_s$) output from any device, such as a computer or a camera, is transmitted to the processors through a one-to-many cable. According to the parameters stored on the SD card, each processor divides and processes the source image into several sub-images, which are then sent separately to the projectors. The images projected on the screen are spliced seamlessly, and a visually comfortable high-resolution image is displayed on the screen.
The key components of an image processor are shown in Fig. 6. A Xilinx FPGA serves as the core device of the processor, controlling the other chips on the board and processing the image data in real time. A Texas Instruments TFP401 chip between the DVI (Digital Visual Interface) input interface and the FPGA decodes the DVI source image data and passes them to the FPGA in real time. Three Samsung SRAMs (Static Random Access Memories) are connected to the FPGA. SRAM I and SRAM II are used as buffers for the decoded image data. Since an SD card retains data without power and is easy to carry, a Kingston SD card is used to store the parameters generated during the preprocessing step. However, the SD card is not fast enough for real-time image processing, so SRAM III, whose maximum operating frequency is 250 MHz, is used to cache the parameters read from the SD card. Several DVI encoding chips (TFP410) are connected between the FPGA and the DVI output interfaces to encode the sub-images into standard DVI video signals for output.
The flow of hardware image processing is shown in Fig. 7. After power-on, the source video image data decoded by the TFP401 are stored into SRAM I and SRAM II in ping-pong mode, and the FPGA reads the parameters $L_1$ and $L_2$ from the SD card, caching $L_1$ in SRAM III and $L_2$ inside the FPGA. The FPGA then generates the output sub-images pixel by pixel. For each output pixel, its $L_1$ entry is read from SRAM III, the pixel value $C(L_{1b})$ is sampled from the source image stored in SRAM I and SRAM II, and $L_2[C(L_{1b})]$ is looked up. The product $L_{1a} \cdot L_2[C(L_{1b})]$ is the desired color value of the output sub-image pixel. The sub-images are generated by the FPGA in real time and sent to the TFP410 chips for encoding. Finally, the encoded sub-images are output to the projectors through DVI cables.
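A purely behavioural software model of this per-pixel flow could be useful for checking the lookup tables before they are loaded into hardware. The sketch below is our illustration, not the authors' FPGA design: Python lists stand in for SRAM I to III, the two frame buffers swap roles once per completed input frame, sampling is nearest-neighbour, and results are rounded and clamped to 8 bits. The class `PingPongProcessor` and its methods are hypothetical.

```python
class PingPongProcessor:
    """Behavioural model (not RTL) of the per-pixel flow for one sub-image."""

    def __init__(self, l1, l2):
        self.l1 = l1                 # L1[y][x] = (L1a, (sx, sy)); SRAM III in hardware
        self.l2 = l2                 # L2[c] for c in 0..255; on-chip in hardware
        self.buffers = [None, None]  # stand-ins for SRAM I and SRAM II
        self.write_index = 0         # buffer that receives the next input frame

    def push_frame(self, frame):
        """Store one decoded input frame, then swap buffer roles (ping-pong)."""
        self.buffers[self.write_index] = frame
        self.write_index ^= 1

    def render(self):
        """Build the output sub-image from the most recently completed frame."""
        frame = self.buffers[self.write_index ^ 1]
        out = []
        for row in self.l1:
            out_row = []
            for l1a, (sx, sy) in row:
                value = l1a * self.l2[frame[sy][sx]]            # L1a * L2[C(L1b)]
                out_row.append(max(0, min(255, round(value))))  # 8-bit output
            out.append(out_row)
        return out


# Usage: identity L2, one half-weight pixel, and two successive input frames.
l1 = [[(1.0, (0, 0)), (0.5, (1, 0)), (1.0, (2, 0))]]
l2 = [float(c) for c in range(256)]
proc = PingPongProcessor(l1, l2)
proc.push_frame([[10, 20, 30]])
first = proc.render()
proc.push_frame([[40, 50, 60]])
second = proc.render()
```

In the hardware the write into one buffer and the read from the other happen concurrently; the sequential model only reproduces which frame each output is computed from.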
Experimental results
To verify the proposed method, several dual-output image processors were fabricated. As shown in Fig. 8, the image processor is implemented on a PCB (Printed Circuit Board) measuring 15.0 cm × 10.0 cm. With one DVI input and two DVI output interfaces, the processor accepts one DVI video and outputs two DVI videos.
A Xilinx Spartan-6 FPGA is chosen as the FPGA chip of the processor. Pipelining and fan-out control techniques are used in the FPGA design to improve the processing speed. The design mainly uses 1822 slice registers, 1301 slice LUTs, and 272 I/O pins. Testing shows that the processor can process any video that a single-link DVI cable can carry. The power consumption of a processor is 7.2 W (12 V, 0.6 A).
To test the image processing performance of the processor, a two-projector system is set up. Two BenQ MP771 projectors project images onto the large screen, and their projection regions partly overlap. We preprocess the system with an ordinary PC and store the parameters on the SD card of the image processor. The PC is then removed, and an image processor is connected to the projectors, as shown in Fig. 9. A computer outputs a source video image with a resolution of 1600 × 720 to the processor. When powered on, the FPGA-based processor outputs two sub-images, each with a resolution of 1024 × 768, which are projected onto the screen by the projectors. All of these video images have a color depth of 24 bits and a refresh rate of 60 Hz. As shown in Fig. 10(a), with the processor switched off, the projectors project their default blue-background images onto the screen. These images are distorted and partly extend beyond the screen, and the overlapping region is much brighter than the non-overlapping regions. The processor is then powered on and divides the source image according to the calibration and blending parameters. As shown in Fig. 10(b), the sub-images are projected properly within the screen and spliced seamlessly together without distortion, and the brightness of the overlapping region is normalized. As a result, a visually comfortable high-resolution image is displayed on the screen.
If a higher resolution is needed, the system can be extended with more dual-output processors. A four-projector system with two dual-output processors is set up for high-resolution projection; its working mechanism is similar to that of the two-projector system. Once the preprocessing step is finished, the two dual-output processors are connected to the system, each driving two of the four projectors. As shown in Fig. 11(a), with the processors switched off, the projectors project their default blue-background images onto the screen. The processors are then powered on and divide the source image according to the calibration and blending parameters. The source video image is increased to a resolution of 3000 × 720, with a color depth of 24 bits and a refresh rate of 60 Hz, and is duplicated into two video images through a one-to-two DVI cable. The two copies are transmitted to the two processors, and each processor outputs two sub-images (1024 × 768, 24-bit, 60 Hz) to its two projectors. As shown in Fig. 11(b), the four sub-images are projected onto the screen, and a high-resolution image is displayed.
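The resolutions used in these experiments can be checked against the single-link DVI limit with a short calculation. Single-link DVI supports a pixel clock of at most 165 MHz. The sketch counts active pixels only, so it gives a lower bound on the required pixel clock: real video timings add horizontal and vertical blanking, which the paper does not specify. The function name `active_pixel_rate` is ours.

```python
SINGLE_LINK_DVI_MAX_HZ = 165_000_000  # maximum single-link DVI pixel clock


def active_pixel_rate(width, height, refresh_hz):
    """Active pixels per second (blanking intervals excluded)."""
    return width * height * refresh_hz


streams = {
    "two-projector source": (1600, 720, 60),
    "four-projector source": (3000, 720, 60),
    "projector sub-image": (1024, 768, 60),
}
for name, (w, h, hz) in streams.items():
    rate = active_pixel_rate(w, h, hz)
    print(f"{name}: {rate / 1e6:.2f} Mpixel/s, "
          f"{100 * rate / SINGLE_LINK_DVI_MAX_HZ:.0f}% of the single-link clock")
```

For the 3000 × 720 source, active pixels alone use about 79% of the 165 MHz limit, so the blanking intervals of whatever timing is used must fit within the remaining margin.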
The system also works when the four projectors are arranged as a 2 × 2 array, as shown in Fig. 12.

As shown in Table I, the proposed image processor (PIP) is compared with the PC-based Marine Simulator [12], the software-based Blackbox [13], and the complex-hardware-based GRID [14]. Functionally, each of them supports geometry calibration and color calibration, which are the most important functions of a tiled display system, and [12, 13, 14] offer additional auxiliary functions. The fifth line of the table shows the approximate minimum power consumption per output channel (MPCPOC). The power consumption of the Marine Simulator is not given, but it can be inferred from [12] that its minimum power consumption per output channel is similar to that of a computer, i.e., at least tens of watts. The last line of the table shows the minimum cost per output channel (MCPOC). Since raw material costs and exchange rates fluctuate, the values in this line are approximate. Blackbox [13] and GRID [14] are commercial products whose costs are not disclosed, so their prices are used instead. The table shows that the power consumption and cost of the proposed image processor are very low.
Yamasaki et al. also proposed a software-hardware cooperative tiled-projection display system in [15], but it differs significantly from the system proposed in this paper. It can be inferred from [15] that, in daily projection, that system relies on PCs (as many as there are image processing hardware units) and a UDP-based network to process the video image. In contrast, the proposed method processes video images in daily projection using only a few simple hardware processors, which significantly reduces the cost and power consumption of the system. Compared with Fig. 13 of [15], Fig. 7 of this paper shows a more streamlined algorithm for hardware. The proposed optimized algorithm significantly reduces the real-time computation, so a low-end FPGA can be used to reduce cost, and the lower real-time computation further reduces the power consumption of the hardware processor. The experiments above show that the calibration performance of the algorithm is effective.
Conclusion
In this paper, we demonstrate a software-hardware cooperative approach for multi-projector seamless tiled display systems. The approach consists of two steps, PC-based preprocessing and hardware-based image processing, and uses an optimized algorithm. The PC is used only to generate the calibration parameters during the preprocessing step; once the parameters are obtained, the PC is no longer needed and can be removed from the system. In daily projection, low-cost, low-power hardware-based image processors are used instead to process high-resolution video images into a series of sub-images. Because no software runs during daily projection, the hardware-based processors achieve high robustness.
Experiments show that the proposed image processors can divide high-resolution video images into several lower-resolution sub-images, which are projected onto a large screen to present a seamless high-resolution image to the audience. The processors implement the optimized algorithm on low-cost Spartan-6 FPGAs. The cost and power consumption of the system are low, while its calibration is effective. The system can be further extended with more processors and projectors if a higher resolution is needed. Overall, the proposed software-hardware cooperative method is effective, low-cost, and high-performance.
Acknowledgements
This work was partially supported by the NSFC (No. 61472350) and the 863 program of China (No. 2012AA011902). 
