Abstract. Within the field of industrial image processing, the use of colour cameras is becoming ever more common. The established black and white cameras are increasingly being replaced by economical single-chip colour cameras with a Bayer pattern. The additional colour information is particularly important for recognition and inspection tasks. It also becomes interesting for geometric metrology if measuring tasks can be solved more robustly or more exactly. However, only few suitable algorithms are available for detecting edges with the necessary precision, and all approaches require additional computational effort. On the basis of a new filter for edge detection in colour images with subpixel precision, the implementation on a pre-processing hardware platform is presented. Hardware-implemented filters offer the advantage that they can easily be used with existing measuring software, since the filtered output is a single-channel image that unites the information of all colour channels. Advanced field programmable gate arrays represent an ideal platform for the parallel processing of multiple channels. An effective implementation, however, requires a high programming effort. Using the colour filter implementation as an example, the problems that arise are analysed and the chosen solutions are presented.
Introduction
The use of colour images (multi-channel images) in geometric metrology is becoming increasingly interesting. With the continued spread of digital cameras for surveillance, digital video and still photography, and not least in mobile phones, the number of colour sensors continues to rise. This also affects industrial image processing and metrology. Price differences between colour and black and white cameras have become substantially smaller. Since human operators prefer colour images, colour cameras are increasingly used. But the reasons are not only aesthetic. For the measurement of geometric quantities with image processing, the detection of edges is essential. Edges typically constitute only about 5% of the picture content, but carry the largest part of the information content [1]. According to Novak and Shafer [2], 90% of the edges in a colour image also correspond to grey value edges. The remaining 10% are colour edges, which can be detected only by evaluating the colour information. Colour images therefore offer an increase in information content from the viewpoint of geometric metrology as well.
Edge detection in colour images
Why are geometric measurements more difficult in colour images than in grey value images (single-channel images)? Typical methods [3] for subpixel-accurate edge detection presume that only one function value, i.e. the intensity or grey value, varies within the image. In colour images, however, there is more than one value per pixel. To make this difference clear, the technical structure of colour sensors must be understood.
Digital colour imaging techniques for machine vision cameras
Light-sensitive photo-detectors form the basic elements of image sensors. The output voltage of each photo-detector is proportional to the incident light intensity integrated over the exposure time. Image sensors thus capture pure intensity or grey-value images. To acquire colour information, the technology was inspired by nature. The human eye contains three separate types of colour receptors. Each type responds to different wavelengths, which correspond to the colour ranges red, green and blue. The sensitivity is highest in the green range. All other colours result from different mixtures of these three basic colours (Young-Helmholtz theory). To make image sensors sensitive to colour, some way of imitating this red-green-blue (RGB) response of the human eye is needed. There are three fundamental approaches: colour-wheel cameras, three-chip cameras and single-chip cameras with a colour filter array [4]. Single-chip cameras are the most economical variant. Different arrangements of the colour filters are possible, but the Bayer pattern has become the most widely accepted. On a colour image sensor with a Bayer pattern, half of the pixels, arranged in a chessboard pattern, are coated with a green filter. The other half of the pixels are coated alternately with red and blue filters [5], [6]. Each pixel is thus sensitive only to a certain colour range. To obtain a complete colour image with the same dimensions, the missing colour components of each pixel must therefore be interpolated. Different demosaicing algorithms are used for this colour interpolation. Simple algorithms interpolate the colour value from the pixels of the same colour in the neighbourhood. In addition, a demosaicing filter appropriate for geometric measurements has already been introduced in [7]. A digital colour picture thus consists of three colour layers, each of which contains the information of one colour channel (red, green or blue).
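The simple neighbourhood interpolation mentioned above can be illustrated with a minimal sketch. It assumes an RGGB arrangement of the Bayer pattern and a raw image of at least 2 x 2 pixels stored as a list of rows; the function names are hypothetical, and the sketch is not the demosaicing filter from [7].

```python
def bayer_colour(row, col):
    """Filter colour over pixel (row, col), assuming an RGGB layout."""
    if row % 2 == 0:
        return "R" if col % 2 == 0 else "G"
    return "G" if col % 2 == 0 else "B"


def demosaic_nearest_average(raw):
    """Interpolate the two missing colour components of every pixel by
    averaging the same-colour samples in its 3 x 3 neighbourhood.
    Assumes the raw image is at least 2 x 2 pixels."""
    height, width = len(raw), len(raw[0])
    rgb = [[None] * width for _ in range(height)]
    for r in range(height):
        for c in range(width):
            sums = {"R": 0, "G": 0, "B": 0}
            counts = {"R": 0, "G": 0, "B": 0}
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < height and 0 <= cc < width:
                        colour = bayer_colour(rr, cc)
                        sums[colour] += raw[rr][cc]
                        counts[colour] += 1
            own = bayer_colour(r, c)
            pixel = {}
            for colour in "RGB":
                if colour == own:
                    pixel[colour] = raw[r][c]  # measured value is kept as is
                else:
                    pixel[colour] = sums[colour] / counts[colour]
            rgb[r][c] = (pixel["R"], pixel["G"], pixel["B"])
    return rgb
```

Every output pixel then carries three components, which is the three-layer colour picture described above. Averaging of this kind blurs edges, which is one reason why a dedicated demosaicing filter is of interest for measurement tasks.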
The question now arises whether the advantages of colour cameras can be exploited while still attaining the accuracies that can be obtained with grey value cameras of the same resolution. To reach this goal, all pixels must be used for the measuring process, and edges must be detected with subpixel precision. Accordingly, the information from the additional colour channels must flow into the measuring process.
Methodology of edge detection in colour images
Different approaches are possible for the detection of edges in colour images [8]. According to the position of the image recombination in the algorithm, colour edge detection algorithms can be classified as vector methods, multi-dimensional gradient methods and output fusion methods (Figure 1).
Output fusion methods work on the basis of grey value edge detection algorithms. The state-of-the-art method performs an RGB to hue-saturation-intensity (HSI) conversion. The actual measurement is carried out only in the intensity channel, which is a single-channel image comparable to a black and white image. Thus only intensity edges are measurable. The hue and saturation channels are not analysed, so hue and saturation edges are not detected. Multi-dimensional gradient methods determine the orientation and strength of an edge for each point. The gradients of the image components are computed by evaluating the first or second derivative of the image function. There are several approaches [9], [10], [11] for combining them into one result. With vector methods, the problem of combining the information of the individual channels does not arise, since the gradient is determined in vector space. The representation and use of the vectors vary strongly [8].
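A small numerical illustration of why a pure hue edge is lost in the intensity channel: the sketch below uses the HSI intensity I = (R + G + B) / 3 and an invented image row whose left half is reddish and whose right half is greenish with the same intensity. The pixel values are assumptions chosen for the illustration.

```python
def intensity(rgb):
    """Intensity channel of the HSI model: mean of the three components."""
    r, g, b = rgb
    return (r + g + b) / 3


# One image row with a pure colour edge: reddish left, greenish right.
row = [(180, 60, 60)] * 3 + [(60, 180, 60)] * 3

# Differences between neighbouring pixels in the intensity channel.
intensity_row = [intensity(p) for p in row]
intensity_steps = [abs(b - a) for a, b in zip(intensity_row, intensity_row[1:])]

# Euclidean length of the RGB difference vector between neighbouring pixels.
vector_steps = [
    sum((cb - ca) ** 2 for ca, cb in zip(a, b)) ** 0.5
    for a, b in zip(row, row[1:])
]

print(intensity_steps)  # no step anywhere: the edge is invisible
print(vector_steps)     # one clear step at the colour transition
```

The intensity differences are zero along the whole row, whereas the length of the difference vector shows a distinct step at the transition. This is the motivation for evaluating all channels instead of the intensity channel alone.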
There are many image processing algorithms for recognition, detection or colour measurement, but only a few approaches for edge detection in colour images with subpixel precision. For recognition algorithms, exact edge localisation is less important than the actual detection. For geometric measurements, however, a shift of the edge position caused by the detection algorithm is problematic. For this reason, most image processing software uses output fusion methods. This has the advantage that edge detection algorithms from grey value image processing can be applied. Subpixel methods in grey value images enable edge detection with a resolution finer than the pixel pitch of the sensor. Depending on the quality of the image data, a resolution of 1/10 to 1/100 of a pixel is attainable. The disadvantage is that effectively only one third of the sensor information is used.
Filter for edge detection in colour images with subpixel precision
For this paper the goal of the application is clearly defined: special attention is given to the measurement of geometric features in images acquired with multi-channel systems. The usual procedures for highlighting edges are therefore not applicable. A method is needed that also considers differences between the channels, without differences of opposite sign compensating each other (averaging). Hue, saturation and intensity edges must be detected equally. For the research presented in this paper, a gradient filter was used. This is justified by the fact that gradient filters can be integrated into existing image processing software more simply than edge detection algorithms for several channels. After filtering, the image must have the same resolution and size as the original image. In addition, it must consist of only one channel, which contains all edge information. Furthermore, the filter should blur as little as possible and must not affect the edge position. A further important aspect is direction dependency. Absolute direction independence cannot be guaranteed because of the fixed pixel grid, but the pixel neighbourhood considered should be as circular as possible. To fulfil these demands, the new filter presented in [12] was developed. The filter follows the Roberts gradient filter, which fulfils the demand for little blurring. In order to evaluate horizontal and vertical edges in exactly the same way as diagonal edges, additional computations are introduced that are missing in the Roberts gradient filter. In contrast to scalar differences, the presented filter is based on a vector approach. The colour channels of the picture are not understood as layers; instead, each pixel is a vector. Each component of the vector corresponds to the intensity value in one of the colour channels. The filter is based on the concept of an algorithm for edge detection in colour images described in [13].
The filter method used is thus an intermediate form between the vector method and the multi-dimensional gradient method. Conventional edge filters are designed for filtering single-channel images. For a multi-channel image, these filters can only be applied separately to each channel. With the difference vector filter, however, not only intensity edges but also hue and saturation edges are represented in a single-channel image. Each pixel (C1 to C4 in Figure 2) is regarded as a vector whose components represent the values of the individual colour channels (RGB). Owing to this approach, the filter is also suitable for higher-dimensional colour spaces. First, the difference vectors within the 2 x 2 filter window are formed. The horizontal and the vertical difference vectors are each summed separately. The diagonal difference vectors are weighted with the factor 1/√2 because of the larger pixel distance. Then the norms of the four difference vectors are added and multiplied by a scaling factor f (equation (1)). The scaling factor is necessary to fit the edge value k to the value range of the filtered image. To compute the scaling factor, the largest sum of the difference vectors over a complete picture is determined.
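The following is a minimal sketch of one reading of the description above, not the reference formulation of equation (1). It assumes that C1 and C2 form the upper row and C3 and C4 the lower row of the 2 x 2 window, that the diagonal weight is 1/√2, and that the last image row and column are simply left at zero so that the output keeps the input size. All function names are hypothetical.

```python
import math


def diff(a, b):
    """Component-wise difference vector a - b."""
    return tuple(x - y for x, y in zip(a, b))


def add(a, b):
    """Component-wise sum of two vectors."""
    return tuple(x + y for x, y in zip(a, b))


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


def edge_value(c1, c2, c3, c4, f=1.0):
    """Edge value k for one 2 x 2 window (assumed layout: c1 c2 / c3 c4)."""
    horizontal = add(diff(c2, c1), diff(c4, c3))  # summed horizontal differences
    vertical = add(diff(c3, c1), diff(c4, c2))    # summed vertical differences
    w = 1 / math.sqrt(2)                          # weight for the longer diagonals
    diagonal_a = tuple(w * x for x in diff(c4, c1))
    diagonal_b = tuple(w * x for x in diff(c3, c2))
    return f * (norm(horizontal) + norm(vertical)
                + norm(diagonal_a) + norm(diagonal_b))


def filter_image(rgb, f=1.0):
    """Apply the window to a whole image given as rows of colour tuples."""
    height, width = len(rgb), len(rgb[0])
    out = [[0.0] * width for _ in range(height)]
    for r in range(height - 1):
        for c in range(width - 1):
            out[r][c] = edge_value(rgb[r][c], rgb[r][c + 1],
                                   rgb[r + 1][c], rgb[r + 1][c + 1], f)
    return out
```

Because every term is a vector norm, differences of opposite sign in different channels cannot cancel, and the tuples may have any number of components, which reflects the suitability for higher-dimensional colour spaces.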
(1)
A disadvantage of this algorithm, however, is that the entire image must be filtered completely. Filters require several arithmetic operations for every pixel, which increases the computational load. To compensate for the additional computation time, filters can be implemented on pre-processing hardware components such as digital signal processors (DSPs) or field programmable gate arrays (FPGAs). These arithmetic and logic units can be inserted into the data stream either by integrating them directly into a camera or by adding a pre-processing card to the computer. A hardware implementation makes it possible to operate filters in real time with minimal delay.
FPGA implementation of the gradient filter algorithm
Because of the increased computation time for such a filter, an implementation in external logic was considered. FPGAs are particularly suitable for realizing filters. FPGAs are integrated circuits (ICs) that consist of a two-dimensional array of general-purpose logic blocks. These logic blocks can be configured to perform combinational functions or simple logic gates such as AND and OR. In most FPGAs, the logic blocks also include memory elements, which may be simple flip-flops or more complex blocks of memory. This architecture allows the implementation of microprocessors, random access memory (RAM), read-only memory (ROM), digital signal processing (DSP) and logic functions in a single chip. In the presented implementation a low-cost Spartan 3E FPGA (XC3S500E-5) from Xilinx was used.
Difficulties for implementation of digital filters on FPGAs
Owing to the flexible programming possibilities of FPGAs, very high system clocks can be realized. The fundamental programming approach, however, differs from usual software programming strategies. Arithmetic operations must be mapped to hardware, which is not always possible without problems. Furthermore, different implementations can deliver the same result but differ substantially in resource consumption. Shift operations, additions and subtractions can be implemented almost without restriction using basic logic elements. Multiplications are converted into multiplier networks or routed to one of the few hardware multipliers, which are available only in some FPGAs. Divisions and modulo operations, however, are usually not supported in hardware, or only for powers of two. Depending on the required precision and speed, different algorithms can be implemented to realize a division [14]. These implementations are highly resource-intensive. Squaring and extracting a root are especially costly in terms of resource consumption. These hardware restrictions make a filter implementation difficult. It must be considered carefully which hardware resources are used and how an existing algorithm can be simplified.
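To illustrate how a division can be reduced to the shifts, comparisons and subtractions that basic logic elements handle well, the sketch below shows restoring (shift-and-subtract) division for unsigned integers. It is a standard textbook scheme used here as an example and is not necessarily one of the algorithms discussed in [14]; the function name is hypothetical.

```python
def restoring_divide(dividend, divisor, width):
    """Unsigned shift-and-subtract division.

    Produces one quotient bit per step, so a dividend of `width` bits
    needs `width` sequential steps. Returns (quotient, remainder).
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    quotient = 0
    remainder = 0
    for bit in range(width - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        quotient <<= 1
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1
    return quotient, remainder


print(restoring_divide(1000, 7, 10))  # general divisor: 10 sequential steps
print(1000 >> 3)                      # power-of-two divisor: a single shift
```

The number of steps grows with the bit width, which indicates why a general divider is expensive when a result is needed with every pixel clock, whereas a division by a power of two is only a shift.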
There are essentially three difficulties in implementing the filter on an FPGA. To determine the magnitudes of the individual difference vectors, the square root of the sum of the squared components must be extracted (equation (2)).
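$$|\vec{D}| = \sqrt{D_R^2 + D_G^2 + D_B^2} \qquad (2)$$
Here D_R, D_G and D_B denote the red, green and blue components of a difference vector D. In addition, a division is necessary for the computation of the scaling factor f. Finally, to determine the largest sum of the difference vectors of the entire image, a complete image must be buffered.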
Square root calculation
Four difference vectors must be computed for each pixel, which means that four roots must be extracted with each pixel clock. As mentioned in section 4.1, FPGAs do not support root operations in hardware. For the implementation of the filter, three methods of extracting the root were analysed: a look-up table (LUT), Heron's method and the method of bisecting intervals. Equation (3) describes the rule for the bit widths that applies to the calculation of square roots, where n denotes a bit width:
$$n_{\text{radicand}} = 2 \cdot n_{\text{root}} \qquad (3)$$
In image processing, bit widths between 8 and 12 bits are typical, although larger bit widths are possible. For the experimental setup a camera with an 8 bit wide data bus was used. To obtain higher accuracy, the computations were carried out internally with 10 bit values. For 10 bit wide square roots, the radicands must therefore be 20 bits wide. LUT access with binary search can be used for all strictly monotonic functions, and thus also for the square root. The memory in which the table is stored is addressed with the 10 bit output value. The necessary memory size, which holds the square for each index, can be calculated with equation (4).
(4)
Block RAM¹ could be addressed accordingly. The programming effort to store each possible value is, however, enormous. The main problem is that the LUT is limited to a fixed bit width and is thus not scalable. Storing larger bit widths also requires increasingly more memory (Table 1). The use of a LUT is therefore unsuitable for the implementation.
The second approach was the realization of Heron's method. This special case of Newton's method provides an approximation of the square root x of the radicand a (equation (5)).
$$x_{n+1} = \frac{1}{2}\left(x_n + \frac{a}{x_n}\right) \qquad (5)$$
The procedure converges very fast if a good approximation is already present. Since Heron's method can be derived from Newton's method, the order of convergence is 2: the number of correct digits roughly doubles with each step.
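The two approaches discussed so far can be sketched in software as follows. The table of squares with binary search follows the description of the LUT, and the integer iteration follows Heron's rule. The memory estimate in `lut_size_bits` (2^n entries of 2n bits each) is an assumption made for this sketch and is not taken from equation (4) or Table 1; all names are hypothetical.

```python
ROOT_BITS = 10
# Table of squares, addressed with the 10 bit root value.
SQUARES = [i * i for i in range(1 << ROOT_BITS)]


def lut_sqrt(radicand):
    """Largest index i with SQUARES[i] <= radicand, found by binary search."""
    low, high = 0, len(SQUARES) - 1
    while low < high:
        mid = (low + high + 1) // 2
        if SQUARES[mid] <= radicand:
            low = mid
        else:
            high = mid - 1
    return low


def lut_size_bits(root_bits):
    """Assumed table size: 2**n entries, each 2n bits wide."""
    return (1 << root_bits) * 2 * root_bits


def heron_sqrt(a, start):
    """Integer square root by Heron's iteration x <- (x + a // x) // 2.

    `start` must be at least 1 and not smaller than the true integer root.
    Returns (root, number_of_iterations).
    """
    if a == 0:
        return 0, 0
    x = start
    steps = 0
    while True:
        y = (x + a // x) // 2  # one division per iteration
        steps += 1
        if y >= x:             # no further decrease: x is the integer root
            return x, steps
        x = y


print(lut_sqrt(1000000), lut_size_bits(10), lut_size_bits(12))
print(heron_sqrt(1000000, 1024), heron_sqrt(1000000, 1000000))
```

The sketch makes two points from the text visible: the assumed table size grows quickly with the bit width, and Heron's method needs far fewer iterations with a good start value than with a poor one, but each iteration contains a division.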
¹ Embedded Block RAM memory is available in most FPGAs, which allows for on-chip memory implementation [15].
The third approach was the method of bisecting intervals (equation (6)).
(6)
This produces a convergent sequence of nested intervals which converges to a unique point, which must therefore be √a. Each decision halves the number of possible results. The method of bisecting intervals is simple and robust. Since the interval is halved with each step, the method is very efficient in the binary system. Computing the 10 bit square root therefore requires 10 iterations. For the implementation of the filter, the method of bisecting intervals was selected. The reasons for this choice were its advantages: scalability, parallelizability and the lowest computational effort of the examined methods. For faster convergence and optimal hardware consumption, the method was combined with a shortened LUT, which already narrows down the initial value. Table 2 shows the attainable clock delays and the necessary resource consumption for different combinations.
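A minimal software sketch of interval bisection for the integer square root is given below. It assumes a 20 bit radicand and a start interval of [0, 1024); the optional narrower start interval stands in for the effect of a shortened LUT, whose actual contents are not specified here. The function name and the example interval are hypothetical, and the sketch does not reproduce the hardware design.

```python
def bisect_sqrt(a, low=0, high=1 << 10):
    """Integer square root by halving the interval [low, high).

    Requires low**2 <= a < high**2.
    Returns (root, number_of_iterations).
    """
    steps = 0
    while high - low > 1:
        mid = (low + high) // 2
        if mid * mid <= a:
            low = mid    # root lies in the upper half
        else:
            high = mid   # root lies in the lower half
        steps += 1
    return low, steps


# Full 10 bit interval: one iteration per result bit.
print(bisect_sqrt(1000000))

# Hypothetical pre-selected interval of width 16, as a coarse LUT might supply.
print(bisect_sqrt(1000000, low=992, high=1008))
```

With the full interval, each iteration fixes one bit of the result, which matches the 10 iterations stated above. Narrowing the start interval reduces the number of remaining iterations accordingly.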
Scaling of the filtered image
Since it is not possible to buffer a complete image inside the FPGA, and the experimental hardware does not provide external memory, a compromise had to be made. As a boundary condition it was specified that D_max changes only slightly from one image to the next. The scaling factor of the current image can thus be estimated from the D_max of the previous image, increased by a small safety factor s. Since only one division per image must be carried out, sequential algorithms can be used. The computation takes place after each image, during the vertical back porch. The block RAM is used to buffer five image rows, which are needed for the demosaicing process. At the moment, sensor resolutions with a maximum line length of 1024 pixels can therefore be filtered. However, the filter can be adapted to larger sensors by enlarging the line buffer. With the current realization, real-time filtering of images with VGA resolution at 60 frames per second was tested.
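The frame-to-frame estimate of the scaling factor can be sketched as follows. The sketch assumes that f maps the estimated maximum onto the largest output value, i.e. f = out_max / (D_max of the previous image + s), and that values exceeding the output range are clipped. The formula, the clipping, the start value and all names are assumptions made for this illustration.

```python
def scale_stream(frames, out_max=255, s=8, initial_d_max=255):
    """Scale unscaled edge sums frame by frame without buffering a frame.

    `frames` is an iterable of lists of unscaled edge sums, one list per
    image. Yields the scaled values of each image.
    """
    d_max_prev = initial_d_max
    for sums in frames:
        f = out_max / (d_max_prev + s)  # one division per image
        # Clip, because the current image may exceed the estimate.
        yield [min(out_max, int(f * d)) for d in sums]
        # The maximum is collected while the image streams through
        # and is used only for the following image.
        d_max_prev = max(sums)


frames = [[0, 100, 200], [0, 110, 220]]
for scaled in scale_stream(frames, initial_d_max=200):
    print(scaled)
```

In this sketch a larger s lowers f and thereby reduces the chance of clipping when the maximum rises from one image to the next, at the cost of a slightly smaller output range.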
Results and discussion

Conclusion and outlook
Based on a new filter algorithm for edge detection in colour images with subpixel precision, a scalable and parallelizable architecture was developed and implemented on a Xilinx XC3S500E-5 FPGA device. The filter architecture has been presented, and the implemented hardware design was evaluated with a scalable CMOS colour camera. By using a Xilinx Spartan 3E FPGA, cost-effective pre-processing was realized, which reduces the computational load on the host computer. As can be seen in Figure 3, the filter produces an image in which edges are represented as black lines. Different approaches are possible for further optimization of the filter architecture. For the next realization stage, a filter computation with a higher processing clock is intended, so that the clock delay can be reduced further. To adapt the filter better to the requirements of industrial metrology, additional parameterization by the user is planned. The scaling factor f is to be computed within a user-defined area of interest (AOI), so that edges of special interest for a measurement task can be emphasized. To realize this, the AOI coordinates must be passed from the measuring application to the hardware, and appropriate parameter fields must be implemented for the filter. In addition, a parameterization of the safety factor s would be conceivable in order to minimize disturbing environmental influences (lighting). Beyond that, a new hardware platform based on a Spartan 6 FPGA device (XC6SLX45) is planned. The migration to the new hardware gives access to DSP48 slices, which offer better hardware support for filter operations. To investigate the attainable measuring accuracy further, studies with a new edge location criterion are currently being carried out.
