Abstract-Wavefront aberrations caused by turbulent or rapidly changing media can considerably degrade the performance of an imaging system. Adaptive optics can dynamically compensate for these wavefront distortions and thus provide corrected imaging. We developed an affordable adaptive optic system that combines CMOS sensor and LCOS display technology with the parallel computing capabilities of FPGA devices. A high-speed and accurate wavefront sensor is a fundamental part of any adaptive optic system. In this paper, an efficient FPGA implementation of the Sum of Absolute Differences (SAD) algorithm is introduced, which accomplishes correlation-based wavefront sensing. The architecture was implemented on a Spartan-3 FPGA and is capable of measuring the incoming wavefront at the data acquisition speed of the sensor.
I. INTRODUCTION
Rapidly changing, turbulent media cause wavefront distortions, which introduce random phase aberrations into the imaging system. Wavefront sensors can measure these distortions, and within an Adaptive Optic (AO) system they can be dynamically compensated by an actuator device [1, 2]. In this way, the AO system provides aberration-corrected imaging. Adaptive optic devices can be applied not only in astronomical telescopes, where they have to cope primarily with atmospheric turbulence, but also in diverse other fields ranging from ophthalmology to laser welding and telecommunication.
We have developed a simple, affordable adaptive optic system that combines a high-speed CMOS sensor, a Liquid Crystal On Silicon (LCOS) display, and Field Programmable Gate Array (FPGA) technology. Although our project primarily targets a special solar telescope application, the introduction of an affordable adaptive optic system can open up new fields of application.
Here we apply the Hartmann-Shack (HS) sensor, which is the most frequently used wavefront sensor. In an HS sensor the input image is projected onto a lenslet array, and each lens of the array forms a miniature image of the source object on an area scan sensor. The shift of each sub-image from its central position is proportional to the local wavefront slope over the corresponding sub-aperture (the wavefront is regarded as locally tilted but flat).
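The proportionality between sub-image shift and local slope can be written out under the usual small-angle approximation. The symbols W (wavefront), \Delta x and \Delta y (physical sub-image shifts on the sensor) and f (lenslet focal length) are introduced here for illustration only and do not appear in the original text:

```latex
\left.\frac{\partial W}{\partial x}\right|_{\text{sub-aperture}} \approx \frac{\Delta x}{f},
\qquad
\left.\frac{\partial W}{\partial y}\right|_{\text{sub-aperture}} \approx \frac{\Delta y}{f}
```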
For point source objects a simple quad cell sensor can be applied, but for extended objects these shifts have to be measured by correlating each sub-image with a reference sub-aperture image. By assembling all these local slopes, the wavefront over the whole pupil can be approximated. Although correlation-based HS sensors are more efficient than quad cells in terms of signal-to-noise ratio [3], they require considerably more computing resources. Even for first-order aberration compensation (tip-tilt), high-speed correlation trackers are frequently applied [4].
In conventional applications the local wavefront slopes have to be measured with numerous, relatively high-resolution sub-apertures at a very high rate, because the medium is turbulent. Consequently, a high-resolution, real-time correlation-based wavefront sensor is required [5]. Parallel processing devices can fulfill these requirements. Although other technologies [6] can also provide the required computing power, the application of FPGA technology is considered here. As the control of the sensor and the actuator usually requires a programmable logic device anyway, it is advantageous to use the same device for the required computation tasks as well. In this way communication bottlenecks can be avoided and a higher speed can be achieved for the closed-loop system. Several FPGA-based wavefront sensors and AO system architectures have been introduced so far [7, 8, 9].
In our AO system, the distortion of the incoming wavefront is measured by an HS wavefront sensor, and a built-in LCOS device displays the corrections computed from the measured and digitally reconstructed wavefront distortions.
Due to the limitations of the FPGA device and the special constraints of the HS sensor, a Sum of Absolute Differences (SAD) method is applied to implement the required correlation-like processing. Several efficient FPGA implementations of the SAD algorithm have emerged [10, 12], driven by the demands of real-time video compression algorithms. Applying SAD instead of cross-correlation does not bias the slope estimates considerably; furthermore, the higher attainable speed allows a much higher closed-loop performance to be achieved.
In the SAD expression (1), S is the size of the sub-aperture, A is the size of the SAD value array, P is the sub-aperture pixel, and R is the reference pixel.
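A standard form of the SAD measure that is consistent with the symbols S, A, P and R defined above is sketched here. The index names i, j, x, y, the zero-based ranges, and the implied reference extent of (S+A-1)×(S+A-1) pixels are assumptions made for illustration, not details confirmed by the text:

```latex
\mathrm{SAD}(x, y) \;=\; \sum_{i=0}^{S-1} \sum_{j=0}^{S-1}
\left| P_{i,j} - R_{\,i+x,\; j+y} \right| ,
\qquad 0 \le x,\, y \le A-1
```

Under this convention, the displacement of a sub-image is estimated from the position (x, y) at which the SAD value array reaches its minimum.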
In this paper a new, highly parallel FPGA-based SAD implementation is introduced that takes into account the special requirements of the correlation-based wavefront sensor.
II. THE FPGA BASED ADAPTIVE OPTIC SYSTEM
Our FPGA-based adaptive optic system contains three main components. The first is the Micron MT9M413 CMOS image sensor with a resolution of 1280×1024 pixels, which is able to acquire 500 full image frames per second. This sensor is extended with an array of micro lenses, called a lenslet array (32×32 lenses, each of which covers a 16×16 pixel sub-aperture on the sensor surface). The second is the Philips DD720 LCOS display, which has a resolution of 1280×768 pixels with a 20 μm pixel size and can display up to 540 full image frames per second. With this device, amplitude or phase modulation can be carried out by using appropriate wave plates and polarizers. The third is an on-board Xilinx Spartan-3 XC3S4000 FPGA, which is responsible for the control of the overall system and the calculation of the correction data. The FPGA has 4 million equivalent system gates and is equipped with 96 18 Kb BlockRAMs and 18×18-bit multipliers.
III. THE IMPLEMENTED ARCHITECTURE ON FPGA
Our primary goal is to implement a high-speed, highly parallel architecture on the FPGA that determines the displacements of the sub-aperture images in order to calculate the wavefront distortions. The block diagram of the architecture implemented on the FPGA is shown in Fig. 1.
The CMOS controller controls the Micron CMOS sensor and receives the image data. The LCOS unit is responsible for computing the correction terms and sending them to the LCOS display. The sub-pixel resolution motion vectors are determined by the Xilinx MicroBlaze soft-core processor. The reference image and the coordinates of the sub-apertures are defined by the host computer via the USB controller.
A. The Shuffle unit
The Shuffle unit is responsible for ordering the pixels for the calculation of the SAD values. During image capture only the pixels of the sub-apertures are used; therefore, SAD values need to be computed (to determine the wavefront) only in these areas. The coordinates of the sub-apertures are defined by the host computer. The Shuffle unit selects the relevant pixels arriving from the CMOS sensor and puts them in the proper order.
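The selection step performed by the Shuffle unit can be illustrated with a small software model. This is only a sketch under assumptions: the function name `select_subaperture_pixels`, the `(top, left)` origin format and the output tuple layout are hypothetical, and the hardware ordering logic is not reproduced.

```python
from typing import Iterator, List, Sequence, Tuple


def select_subaperture_pixels(
    frame: Sequence[Sequence[int]],
    origins: List[Tuple[int, int]],
    size: int,
) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (sub_aperture_id, local_row, local_col, value) in arrival order.

    `origins` holds the hypothetical (top, left) corner of each sub-aperture,
    standing in for the coordinates supplied by the host computer.
    """
    for r, row in enumerate(frame):        # rows arrive one after another
        for c, value in enumerate(row):    # pixels arrive left to right
            for k, (top, left) in enumerate(origins):
                inside_rows = top <= r < top + size
                inside_cols = left <= c < left + size
                if inside_rows and inside_cols:
                    yield k, r - top, c - left, value
                    break                  # sub-apertures do not overlap


# Small demonstration: a 4x6 frame with two 2x2 sub-apertures.
demo_frame = [[10 * r + c for c in range(6)] for r in range(4)]
demo_stream = list(select_subaperture_pixels(demo_frame, [(0, 0), (1, 3)], 2))
```

Pixels outside every sub-aperture are simply dropped, and pixels of different sub-apertures are interleaved exactly as they arrive in the row-wise stream.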
B. The SAD unit
The main building blocks of the SAD unit are the Reference Register unit, the Absolute Differences unit, the SAD Controller unit and the Minimum Finder unit, as shown in Fig. 2. In addition to these elements, the SAD unit also contains an accumulator register array to sum the computed Absolute Difference (AD) values and BlockRAMs to store the partial SAD results.
The SAD unit is designed to compute (1) on the sub-apertures of the input image in real time. Image data are sent by the CMOS sensor in row-wise order, so S×S pixels would have to be stored before the SAD values could be computed. Additionally, there are several sub-apertures in each row, which further increases the memory requirements. This is impractical for large sub-apertures. Instead of storing the sub-aperture windows, the computation of the SAD values is rearranged to match the incoming dataflow, and all partial SAD values are computed in parallel. For example, when the first pixel P_{0,0} arrives, the first term of (1) …
The Reference Register stores the reference values and generates the appropriate reference window for the current pixel of the sub-aperture. The A×A pixel reference window moves from left to right during the computation. The architecture of the Reference Register unit is shown in Fig. 3 for a 5×5 pixel reference image. To simplify the implementation and to utilize the shift register resources of the FPGA, the reference window is fixed in our system and the reference values are shifted instead. The 9 reference values required in the example are always located in a 3×3 window at the upper left corner of the reference array. Accordingly, the Reference Register unit has three operation modes: 1) load, 2) left shift, 3) row shift. In load mode the reference values are loaded into the lower right register (Reg25) and shifted left along the register chain pixel by pixel. In left shift mode the values are shifted circularly within the same row. Finally, row shift mode is very similar to load mode, except that the lower right register (Reg25) is loaded with the contents of the upper left register (Reg1).
The Absolute Differences unit is responsible for calculating the absolute differences between the reference and the input image. This unit is built from an array of processing elements.
The number of processing elements is defined by the size of the sub-aperture, so all the AD values can be calculated in parallel. The structure of one processing element is shown in Fig. 4.
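The dataflow-matched computation described above can be modelled in software as follows. This is a minimal sketch, not the hardware design: it handles a single sub-aperture, assumes the SAD convention SAD(x, y) = Σ |P(i, j) − R(i+x, j+y)| with a reference of (S+A−1)×(S+A−1) pixels, and the function names are hypothetical. The two inner loops over x and y stand for work that the processing element array would perform in parallel.

```python
from typing import List, Sequence

Image = Sequence[Sequence[int]]


def sad_streaming(sub_aperture: Image, reference: Image, a: int) -> List[List[int]]:
    """Update every partial SAD value as each pixel arrives in row-wise order."""
    s = len(sub_aperture)
    acc = [[0] * a for _ in range(a)]  # models the accumulator register array
    for i in range(s):                 # pixel rows in arrival order
        for j in range(s):             # pixels within a row
            p = sub_aperture[i][j]
            # In hardware these a*a updates would happen simultaneously.
            for x in range(a):
                for y in range(a):
                    acc[x][y] += abs(p - reference[i + x][j + y])
    return acc


def sad_direct(sub_aperture: Image, reference: Image, a: int) -> List[List[int]]:
    """Window-based evaluation that needs the whole sub-aperture stored first."""
    s = len(sub_aperture)
    return [
        [
            sum(
                abs(sub_aperture[i][j] - reference[i + x][j + y])
                for i in range(s)
                for j in range(s)
            )
            for y in range(a)
        ]
        for x in range(a)
    ]


# 5x5 reference and a 3x3 sub-aperture cut from it at offset (1, 1).
demo_reference = [[(3 * r + 7 * c * c) % 11 for c in range(5)] for r in range(5)]
demo_sub = [row[1:4] for row in demo_reference[1:4]]
demo_sad = sad_streaming(demo_sub, demo_reference, 3)
```

Both functions produce the same SAD value array; the streaming version never needs more than the current pixel, which is the point of rearranging the computation.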
The Minimum Finder unit determines the minimum of the SAD values and its 4-connected neighbors, and also specifies the locations of these values in the SAD value array. The SAD Controller unit controls the operation of the SAD unit.
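The role of the Minimum Finder, and one way its output could be refined to sub-pixel resolution, can be sketched as follows. The function names are hypothetical, and the three-point parabolic interpolation is an assumption: the text only states that the MicroBlaze processor determines sub-pixel motion vectors, not which method it uses.

```python
from typing import Dict, List, Optional, Tuple


def find_minimum(
    sad: List[List[int]],
) -> Tuple[Tuple[int, int], int, Dict[str, Optional[int]]]:
    """Return the minimum location, its value and its 4-connected neighbors."""
    a = len(sad)
    value, x, y = min((sad[x][y], x, y) for x in range(a) for y in range(a))
    steps = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
    neighbors: Dict[str, Optional[int]] = {}
    for name, (dx, dy) in steps.items():
        nx, ny = x + dx, y + dy
        inside = 0 <= nx < a and 0 <= ny < a
        neighbors[name] = sad[nx][ny] if inside else None  # None at the border
    return (x, y), value, neighbors


def parabolic_offset(lo: float, mid: float, hi: float) -> float:
    """Vertex of the parabola through (-1, lo), (0, mid), (1, hi).

    Assumed refinement step, not taken from the original text.
    """
    denom = lo - 2 * mid + hi
    return 0.0 if denom == 0 else 0.5 * (lo - hi) / denom


demo_array = [[9, 8, 7], [6, 1, 5], [9, 4, 8]]
demo_pos, demo_min, demo_nb = find_minimum(demo_array)
demo_dx = parabolic_offset(demo_nb["left"], demo_min, demo_nb["right"])
```

Passing only the minimum and its four neighbors to the processor keeps the data volume per sub-aperture small, which is presumably why the unit reports exactly these five values.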
To suit the wavefront sensing algorithm, the size of the reference image and the size and number of the sub-apertures are configurable in the VHDL description of the unit. Additionally, several SAD units can work in parallel by slicing the input image and using more Shuffle units.
IV. RESULTS
An FPGA-based adaptive optic system was constructed, and the architecture of the wavefront sensor system was implemented in VHDL on a Spartan-3 XC3S4000 FPGA. The maximal size of the sub-aperture is limited by the area requirement of the SAD units on the FPGA.
The number of applicable sub-apertures is bounded by the row length of the CMOS sensor. The 18 Kb BlockRAMs used in the SAD unit are large enough to store an entire row of computed SAD values in our case. The size of the sub-apertures is important because it determines the required number of BlockRAMs and other logic resources on the FPGA. The performance of the SAD unit was investigated for different sub-aperture and reference image sizes. Static timing analysis of the placed and routed designs shows that the SAD unit can reach an operating frequency of 130 MHz in all cases. The Flip-Flop resource utilization of the SAD unit implemented on the Spartan-3 XC3S4000 FPGA is shown in Fig. 5.
Owing to the 2D arrangement of the processing units, the Flip-Flop resource requirement of the SAD unit increases quadratically with the size of the sub-apertures, but does not depend on the size of the SAD value array. The same behavior can be observed for the 4-input LUT and BlockRAM resource requirements of the SAD unit. To achieve even higher performance, several SAD units can be used in parallel or a higher-performance FPGA (Virtex-4 LX100) can be chosen. For SAD value arrays smaller than 12×12, the number of realizable SAD units is determined by the BlockRAM resources of the FPGA. For larger SAD value arrays the bottleneck is the number of available Flip-Flops on the device.
The number of clock cycles that the SAD unit requires to calculate the SAD values depends on the size of the sub-apertures, as shown in Table I for the maximal SAD value array size; thus the number of sub-apertures that can be handled in real time also differs. One SAD unit running at a 130 MHz clock frequency can handle 2352 sub-apertures of 8×8 pixel resolution on each frame of the CMOS sensor in real time. This covers only 45.94% of the surface of the CMOS sensor, but it requires only 8.72% of the FPGA resources. Therefore, by using three SAD units on the Spartan-3 FPGA, the whole surface of the CMOS sensor can be processed in real time. With 32×32 pixel sub-apertures, 137 sub-apertures can be handled in real time, which makes it possible to process 42.95% of the entire surface of the CMOS sensor. On the higher-performance Virtex-4 LX100 FPGA, three SAD units operating at a 230 MHz clock frequency can be implemented; thus the whole surface of the CMOS sensor can also be processed in real time with 32×32 pixel sub-apertures.
The computing performance of the proposed architecture is compared to the correlation-based wavefront sensor system described in [11]. The results are shown in Table II. They show that our SAD-based wavefront sensor system considerably outperforms the correlation-based system described in [11] in terms of the Area*Time (AT) parameter. For 8×8 pixel sub-apertures, the AT parameter of our SAD unit is 17% better than that of the correlation-based system. This advantage increases to 23% when 16×16 sub-apertures are used. The performance advantage of our system can be further increased by using more SAD units or the higher-performance Virtex-4 LX100 FPGA. Moreover, our special-purpose SAD architecture shows superior performance (16×16 sub-aperture: 10,186 slices and 496 clock cycles, compared to the published 9,478 slices and 1,600 clock cycles) with respect to comparable SAD implementations for motion estimation processors [12].
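Some of the figures quoted above can be cross-checked with simple arithmetic. The sketch below assumes that the sensor runs at its full rate of 500 frames per second and that one SAD unit processes sub-apertures one after another; both are assumptions, and the per-sub-aperture values it derives are upper bounds implied by the quoted counts, not the entries of Table I.

```python
CLOCK_HZ = 130_000_000   # SAD unit clock on the Spartan-3, from the text
FRAME_RATE = 500         # full frames per second of the CMOS sensor (assumed rate)

# Clock cycles available to one SAD unit while a single frame is acquired.
cycles_per_frame = CLOCK_HZ // FRAME_RATE


def cycle_budget(sub_apertures_per_frame: int) -> float:
    """Average number of cycles available per sub-aperture (an upper bound)."""
    return cycles_per_frame / sub_apertures_per_frame


budget_8x8 = cycle_budget(2352)    # 2352 sub-apertures of 8x8 pixels per frame
budget_32x32 = cycle_budget(137)   # 137 sub-apertures of 32x32 pixels per frame

# Area*Time products for the 16x16 comparison with [12]: slices * clock cycles.
at_proposed = 10_186 * 496
at_published = 9_478 * 1_600
at_ratio = at_proposed / at_published

print(f"cycles per frame: {cycles_per_frame}")
print(f"budget per 8x8 sub-aperture: {budget_8x8:.1f} cycles")
print(f"budget per 32x32 sub-aperture: {budget_32x32:.1f} cycles")
print(f"AT ratio (proposed / published): {at_ratio:.3f}")
```

Under these assumptions the proposed unit uses slightly more slices than the design in [12], but its much lower cycle count gives an Area*Time product of roughly one third of the published value.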
V. CONCLUSION
A new wavefront sensor system based on a high-speed, highly parallel SAD calculation unit was implemented on a Spartan-3 FPGA. The system can be used with a wide range of lenslet arrays, because it is fully configurable with respect to the size and number of the sub-apertures and the size of the reference image. The performance of the system was tested with different sub-aperture sizes. The results show that the resource requirement of the SAD calculation unit increases quadratically with the size of the sub-apertures, but does not depend on the size of the SAD value array. With 8×8 pixel sub-apertures, the entire surface of the CMOS sensor can be processed in real time using three SAD units on the Spartan-3 XC3S4000 FPGA. For higher-resolution sub-apertures, the CMOS surface that can be processed in real time is reduced. By using higher-performance FPGAs (Virtex-4), it is possible to handle the whole area of the CMOS sensor in real time. The performance advantage of our system over a correlation-based wavefront sensor architecture is 17% for 8×8 pixel sub-apertures, and this difference increases further when larger sub-apertures are used. The cost of the proposed system is a fraction of the cost of other adaptive optic systems.
