Introduction
Edge detection algorithms make it possible to extract information from an image and to reduce the amount of information that must be stored. An edge is defined as a sharp change in luminosity between two adjacent pixels. Most edge detection techniques fall into two categories: gradient-based techniques and Laplacian-based methods. Gradient-based techniques use the first derivative of the image and look for its maxima and minima. Examples of this type of strategy are the Canny method (Canny, 1986), the Sobel method, the Roberts method (Roberts, 1965) and the Prewitt method (Prewitt, 1970). Laplacian-based techniques, on the other hand, look for the zero crossings of the second derivative of the image. An example is the zero-crossing method (Marr & Hildreth, 1980). Edge extraction mechanisms are normally implemented by running a software realisation on a processor. However, applications with constrained response times (real-time applications) require a specific hardware implementation. The main drawback of implementing edge detection techniques in hardware is the high complexity of the existing algorithms.
The process of edge detection in an image consists of a sequence of stages, one of which is image segmentation. Segmentation divides the image into its constituent parts or objects. When only one region is considered, the image is divided into object and background. The level at which this subdivision is made depends on the application, and segmentation finishes when all the objects of interest for the application have been detected. Image segmentation algorithms are generally based on two basic properties of the grey levels of the image: discontinuity and similarity. Techniques in the first category divide the image according to sharp changes in grey level.
Techniques in the second category include thresholding, region growing, and split-and-merge methods. The simplest segmentation problem arises when the image contains a single object of homogeneous light intensity on a background with a different level of luminosity. In this case the image can be segmented into two regions using a technique based on a threshold parameter. Thresholding is then a simple but effective tool for separating objects from the background. Most thresholding algorithms were initially designed for binary thresholding. The binary procedure can be extended to a multi-level one by using multiple thresholds $T_1, T_2, \ldots, T_n$ to segment the image into n+1 regions (Liao et al., 2001), (Cao et al., 2002), (Oh & Kim, 2006). Multi-level thresholding based on a multi-dimensional histogram resembles the image segmentation algorithms based on pattern clustering.
Binary thresholding techniques classify the pixels of the image into two categories (black and white). This transformation establishes a distinction between the objects of the image and the background. The binary image is generated by comparing the value of each pixel with a threshold T: any value lower than the threshold is considered to belong to an object, whereas values greater than the threshold belong to the background.
where $x_{ij}$ is a pixel of the original image and $y_{ij}$ is the corresponding pixel of the binary image. For a monochrome image whose pixels are encoded with 8 bits, pixel values range from 0 to 255 (L=256). This range is usually expressed with normalized values between 0 and 1.
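The comparison with the threshold can be illustrated with a minimal sketch. The function names and the polarity (objects mapped to 0, background to 1) are assumptions made for the example; the text only states that values below T are objects and values above T are background.

```python
def binarize(image, threshold):
    """Return a binary image from a grey-level image given as a list of rows."""
    # Assumption: pixels below the threshold (objects) map to 0,
    # all other pixels (background) map to 1.
    return [[0 if pixel < threshold else 1 for pixel in row] for row in image]


def normalize(image, levels=256):
    """Map integer grey levels 0..levels-1 to the range [0, 1]."""
    return [[pixel / (levels - 1) for pixel in row] for row in image]


image = [[12, 200, 90],
         [130, 40, 255]]
print(binarize(image, 128))  # [[0, 1, 0], [1, 0, 1]]
```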
Thresholding techniques
A basic technique for threshold calculation is based on the frequency of grey level. In this case the threshold T is calculated by means of the following expression:
where i is the grey level and $p_i$ is the grey level frequency (also known as the probability of the grey level). For an image with n pixels, of which $n_i$ have grey level i: $p_i = n_i / n$ and $\sum_{i=1}^{L} p_i = 1$.
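A small sketch shows how the grey level frequencies are obtained. It also computes a frequency-weighted mean grey level as the threshold; that choice is an assumption, suggested by the later description of the method as an average of the histogram, and is not taken from the missing expression. Grey levels are indexed from 0 here rather than from 1.

```python
def grey_level_frequencies(pixels, levels=256):
    """Return p_i = n_i / n for every grey level i."""
    counts = [0] * levels
    for value in pixels:
        counts[value] += 1
    n = len(pixels)
    return [count / n for count in counts]


def frequency_threshold(pixels, levels=256):
    """Assumed form of the threshold: the mean grey level, sum of i * p_i."""
    p = grey_level_frequencies(pixels, levels)
    return sum(i * p_i for i, p_i in enumerate(p))


pixels = [10, 20, 30, 40]
print(frequency_threshold(pixels))  # 25.0
```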
Otsu's technique (Otsu, 1978) calculates the optimal threshold by maximizing the variance between classes. To do so it performs an exhaustive search over the candidate thresholds, evaluating the between-class variance for each one. One drawback of Otsu's method is the time required to select the threshold. In the case of two-level thresholding the pixels are classified into two classes: $C_1$, with grey levels [1, ..., t], and $C_2$, with grey levels [t+1, ..., L]. The probability distributions of the grey levels for the two classes are:
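The two-level search can be sketched as follows. This is the textbook form of Otsu's criterion (between-class variance written as the product of the class weights and the squared difference of the class means), not a transcription of the chapter's equations. Grey levels are indexed from 0, and the histogram argument is a hypothetical list of pixel counts per level.

```python
def otsu_threshold(hist):
    """Return the level t that maximizes the between-class variance.

    Class C1 holds levels 0..t and class C2 holds levels t+1..L-1.
    """
    total = sum(hist)
    weighted_total = sum(level * count for level, count in enumerate(hist))
    best_t, best_variance = 0, -1.0
    count_1 = 0      # number of pixels in C1
    weighted_1 = 0   # sum of grey levels in C1
    for t in range(len(hist) - 1):
        count_1 += hist[t]
        weighted_1 += t * hist[t]
        count_2 = total - count_1
        if count_1 == 0 or count_2 == 0:
            continue  # one class is empty, variance undefined
        mean_1 = weighted_1 / count_1
        mean_2 = (weighted_total - weighted_1) / count_2
        variance = (count_1 / total) * (count_2 / total) * (mean_1 - mean_2) ** 2
        if variance > best_variance:
            best_t, best_variance = t, variance
    return best_t


# Two well separated clusters: levels 0-1 and levels 8-9.
print(otsu_threshold([5, 5, 0, 0, 0, 0, 0, 0, 5, 5]))  # 1
```

Every candidate threshold is still evaluated, so the cost grows with the number of grey levels; the running sums only avoid recomputing the class statistics from scratch.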
The optimal thresholds $t_1^*, t_2^*, \ldots, t_{M-1}^*$ are chosen to maximize $\sigma_B^2$. Since the second term in equation (17) does not depend on the choice of thresholds $\{t_1, t_2, \ldots, t_{M-1}\}$, the optimal thresholds $\{t_1^*, t_2^*, \ldots, t_{M-1}^*\}$ can be chosen by maximizing a modified variance between classes $(\sigma_B')^2$, defined as the sum of the terms on the right side of equation (17). That is, the optimal threshold values $\{t_1^*, t_2^*, \ldots, t_{M-1}^*\}$ are chosen by
According to the criterion of expression (12) for $\sigma_B^2$ and equation (18) for $(\sigma_B')^2$, in order to find the optimal thresholds, the search region for the maximum $\sigma_B^2$ and for the maximum
This exhaustive search involves $(L-M+1)^{M-1}$ possible combinations. Furthermore, equation (19) is simpler than (13) because it does not require the subtractions.
In 1965 Zadeh proposed fuzzy logic as a reasoning mechanism that uses linguistic terms (Zadeh, 1965). Fuzzy logic is based on fuzzy set theory, in which an element can belong to several sets with different degrees of membership. This contrasts with classic set theory, in which an element either belongs or does not belong to a certain set. Thus a fuzzy set A is defined as
$A = \{(x, \mu(x)) \mid x \in X\}$
where x is an object of the set of objects X and μ(x) is the membership degree of element x in set A. In classic set theory μ(x) takes the value 0 or 1, whereas in fuzzy set theory μ(x) takes values in the range between 0 and 1.
Techniques that apply fuzzy logic to threshold calculation are based mainly on three types of measures of fuzziness (Forero-Vargas & Rojas-Camacho, 2000): entropy, Kaufmann's measure, and Yager's measure. The technique based on entropy consists of minimizing the dispersion of the system. In this way the pixels of the image are grouped into two classes corresponding to the objects and to the background. Huang and Wang (Huang & Wang, 1995) consider that the averages of the data in each class are $\mu_0$ and $\mu_1$. The membership function of each class is defined as:
The calculation of the threshold T is based on the entropy of a fuzzy set, which is calculated using Shannon's function:
The threshold will be that which minimizes the entropy of the data:
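The entropy-based search can be sketched as below. The membership function and the normalization used here follow the commonly cited form of Huang and Wang's method, where the membership of a grey level decreases with its distance to the mean of its class. They are assumptions standing in for the expressions missing from this text, and the histogram argument is a hypothetical list of pixel counts per level.

```python
import math


def shannon(mu):
    """Shannon's function of a membership degree."""
    if mu <= 0.0 or mu >= 1.0:
        return 0.0
    return -mu * math.log(mu) - (1.0 - mu) * math.log(1.0 - mu)


def fuzzy_entropy_threshold(hist):
    """Return the threshold that minimizes the fuzzy entropy of the image."""
    used = [g for g, count in enumerate(hist) if count > 0]
    g_min, g_max = used[0], used[-1]
    spread = g_max - g_min  # keeps every membership degree in [0.5, 1]
    if spread == 0:
        return g_min
    n = sum(hist)
    best_t, best_entropy = g_min, float("inf")
    for t in range(g_min, g_max):
        n0 = sum(hist[g_min:t + 1])
        n1 = n - n0
        mean_0 = sum(g * hist[g] for g in range(g_min, t + 1)) / n0
        mean_1 = sum(g * hist[g] for g in range(t + 1, g_max + 1)) / n1
        entropy = 0.0
        for g in range(g_min, g_max + 1):
            centre = mean_0 if g <= t else mean_1
            membership = 1.0 / (1.0 + abs(g - centre) / spread)
            entropy += shannon(membership) * hist[g]
        entropy /= n * math.log(2.0)
        if entropy < best_entropy:
            best_t, best_entropy = t, entropy
    return best_t


print(fuzzy_entropy_threshold([5, 5, 0, 0, 0, 0, 0, 0, 5, 5]))
```

For the bimodal histogram in the example, any threshold between the two clusters gives the same minimum entropy, and the sketch returns the first one it finds.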
Kaufmann's measure of fuzziness is defined as (Kaufmann, 1975) :
This method uses a distance metric to set A. When w=1 the Hamming distance is used, whereas w=2 gives the Euclidean distance. Yager's method (Yager, 1979) is based on the distance between a fuzzy set and its complement, and basically entails minimizing the following function:
A previously proposed technique is based, from a formal point of view, on calculating the average of the histogram of the image. One advantage of this technique is that the calculation mechanism improves the processing time, since the image only needs to be processed once and the value of the threshold can be calculated directly. From the point of view of hardware implementation, this enables a low-cost circuit for the fuzzy processing module, as discussed in a later section. The fuzzy system receives the input pixel and generates an output that corresponds to the result of the fuzzy inference. Once the image has been read, the output shows the value of threshold T. Basically, the operation carried out by the fuzzy system is that of calculating the centre of gravity of the image histogram with the following expression:
where T is the threshold, M is the number of pixels of the image, R is the number of rules of the fuzzy system, c is the consequent of each rule and α is the activation degree of the rule.
In order to produce the fuzzy inference the universe of discourse of the histogram is divided into a set of N equally distributed membership functions. Figure 1 shows a partition example for N=9. Triangular membership functions have been used since they are easier for hardware implementation. These functions have an overlapping degree of 2 in order to limit the number of active rules. The membership functions of the consequent are singletons equally distributed in the universe of discourse of the histogram. The use of singleton-type membership functions for the consequent allows the application of simplified defuzzification methods such as the Fuzzy Mean. This defuzzification method can be interpreted as one in which each rule proposes a conclusion with a "strength" defined by its grade of activation. The overall action of several rules is obtained by calculating the average of the different conclusions weighted by their grades of activation. This type of processing, based on active rules and a simplified defuzzification method, allows low cost and high speed hardware implementation.
The rule base of the system in figure 2 uses the membership functions defined in figure 1. The knowledge base (membership functions and rule base) is common to all images, so its values can be stored in a ROM.
It is possible to optimize the expression shown in equation (26) if the system is normalized. In this case the sum of the activation grades over the rule base equals 1:
Then (26) becomes:
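With the symbols defined above (T, M, R, c and α), the general expression, the normalization condition and the simplified expression can be written as below. This is a reconstruction from the surrounding text rather than a verbatim copy of equations (26) to (28); the indices i (pixel) and j (rule) are assumptions introduced for the reconstruction.

```latex
\begin{align*}
T &= \frac{\sum_{i=1}^{M}\sum_{j=1}^{R} \alpha_{ij}\, c_{j}}
          {\sum_{i=1}^{M}\sum_{j=1}^{R} \alpha_{ij}}
  && \text{(general form)} \\
\sum_{j=1}^{R} \alpha_{ij} &= 1 \quad \text{for every pixel } i
  && \text{(normalized system)} \\
T &= \frac{1}{M}\sum_{i=1}^{M}\sum_{j=1}^{R} \alpha_{ij}\, c_{j}
  && \text{(simplified form)}
\end{align*}
```

Under the normalization condition the denominator of the general form adds up to M, which is why only the numerator has to be accumulated pixel by pixel.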
For each pixel the system makes the inference in agreement with the rule base of figure 2. The output of the system accumulates the result corresponding to the numerator of (28). The final output is generated with the last pixel of the image after division by M.
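The per-pixel inference and the final division by M can be sketched in software as follows. The function names, the placement of the membership function centres and the use of floating point are assumptions made for the example; the circuit described later works with fixed-point memories.

```python
def fuzzify(x, centres):
    """Return the active (label, degree) pairs for input x.

    The membership functions are triangular, centred on `centres`, with an
    overlapping degree of 2, so at most two labels are active and their
    degrees add up to 1.
    """
    if x <= centres[0]:
        return [(0, 1.0)]
    if x >= centres[-1]:
        return [(len(centres) - 1, 1.0)]
    for k in range(len(centres) - 1):
        low, high = centres[k], centres[k + 1]
        if low <= x <= high:
            degree_high = (x - low) / (high - low)
            return [(k, 1.0 - degree_high), (k + 1, degree_high)]


def fuzzy_threshold(pixels, centres, consequents):
    """Accumulate alpha * c for every pixel and divide by the pixel count."""
    accumulator = 0.0
    for x in pixels:
        for label, alpha in fuzzify(x, centres):
            accumulator += alpha * consequents[label]
    return accumulator / len(pixels)


# N=9 equally distributed membership functions and singleton consequents.
centres = [255 * k / 8 for k in range(9)]
pixels = [10, 20, 30, 40]
print(fuzzy_threshold(pixels, centres, consequents=centres))
```

With equally distributed antecedents and consequents the result coincides with the mean grey level of the image; moving the centres or the consequents changes the output function and hence the threshold.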
Image segmentation
The technique presented above has the disadvantage that the rule base is predetermined, so the threshold does not adapt to the characteristics of the image: it is a linear approximation. One mechanism for adjusting the threshold to the characteristics of the image is to perform a nonlinear approximation. Figure 3 shows some examples of knowledge bases that give rise to nonlinear approximations. The figure shows five fuzzy systems (figure 3a to figure 3e). For each system it shows the membership functions of the antecedents, the output function and the result of the segmentation using the threshold generated in each case. In all cases the membership functions of the antecedents constitute a family of triangular functions with an overlapping degree of two. This structure is determined by the hardware implementation requirements of the system, as discussed in a later section. The base and the position of the membership functions change from one system to another, giving rise to a nonlinear behaviour. This approach makes it possible to obtain thresholds adapted to the characteristics of the image or the requirements of the application. Table 1 shows the thresholds obtained for different images using the Otsu method, the grey level frequency method and the fuzzy systems of figures 3a to 3e.
Hardware implementation
Architecture description
The design goals of the fuzzy inference module (FIM) for calculating the threshold are low cost and high processing speed. The architecture of the FIM circuit is based on the proposal described in (Baturone et al., 2000), shown in Figure 4. The module consists of three stages: fuzzifier, inference and defuzzifier. The inference mechanism is based on active rules: only the rules that are active are processed, which avoids analysing the whole rule base and reduces the processing time. To make this possible the overlapping degree of the membership functions is limited. Another feature of the architecture is the use of singleton consequents, which allows simplified defuzzification methods to be applied and therefore reduces the hardware resources.
The first stage of the architecture is the fuzzification stage. This stage receives the input data and generates, for each input, the pair (Label, membership degree) = (L, μ). This task is performed by the MFC (Membership Function Circuit) blocks. There are several alternatives for the design of MFC blocks (Baturone et al., 2000). One solution is to design the block as an arithmetic circuit that interpolates the right output for each input. This gives a simple and fast circuit, but it restricts the membership functions to triangular and trapezoidal shapes. A more flexible solution is based on a memory: the input acts as a pointer to a memory location that stores the output values. This allows membership functions of any shape; the shape is restricted only by the selected precision and has no influence on the computational load. The drawback is that, at high resolutions, the memory requirements can become very large, since the number of rows in the antecedent memory grows exponentially with the number of bits of the input.
In the case of N membership functions, with P bits of precision for the input, and J bits of precision for the membership degree, the size of the required memory is given by the equation (29).
Since the overlapping degree of the membership functions is fixed, the number of output values of the fuzzification stage is limited. For example, if the overlapping degree is limited to 2 in a system with 2 inputs, only 4 pairs of values (Label, degree) exist, i.e. only 4 rules are activated. The inference stage therefore consists of the block that selects each of the antecedents of the active rules. A set of multiplexers controlled by a counter selects the different combinations of antecedents of the active rules sequentially. In each counter cycle the membership degrees are processed through the conjunction operator to calculate the rule activation degree, while the labels of the antecedents address the memory position that contains the corresponding consequent. The output of the inference stage is the pair of values (Consequent, activation degree) = (c, α) for each rule.
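The selection of active rules can be mimicked in software by enumerating the combinations of active labels, one loop iteration per counter cycle. The function name, the rule table contents and the use of the product as conjunction operator are assumptions made for this sketch.

```python
from functools import reduce
from itertools import product


def infer(fuzzified_inputs, rule_table, t_norm=lambda a, b: a * b):
    """Return the (consequent, activation degree) pair of every active rule.

    fuzzified_inputs: one list of (label, degree) pairs per input.
    rule_table: maps a tuple of antecedent labels to a singleton consequent.
    """
    pairs = []
    for combination in product(*fuzzified_inputs):
        labels = tuple(label for label, _ in combination)
        degrees = [degree for _, degree in combination]
        alpha = reduce(t_norm, degrees)  # conjunction of the antecedents
        pairs.append((rule_table[labels], alpha))
    return pairs


# Two inputs with overlapping degree 2: each input activates two labels.
input_1 = [(0, 0.25), (1, 0.75)]
input_2 = [(2, 0.5), (3, 0.5)]
# Hypothetical rule table with arbitrary consequents.
rules = {(a, b): 10 * a + b for a in range(4) for b in range(4)}
active = infer([input_1, input_2], rules)
print(len(active))  # 4
```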
The last stage performs the defuzzification. Because singleton consequents are used, the defuzzification algorithm only requires operations on the rules. The hardware resources required to implement the Fuzzy Mean defuzzification method are a multiplier, two accumulators and a divider. This defuzzification method corresponds to the following operation:
where the summations extend over the active rules, $c_i$ is the consequent of each rule and $\alpha_i$ is the rule activation degree.
In the case of normalized membership functions with the product as T-norm, the denominator of the previous equation is 1. This means that a divider is not needed, and the defuzzification operation is simplified according to the following expression:
Design and implementation
From the general characteristics of the FIM architecture it is possible to specify a set of simplifications that reduce the hardware resources and increase the parallelism (and thus the processing speed). Regarding the design of the blocks of figure 4, and according to the knowledge base of the threshold system, the memory requirements are: a) the MFC memory requires 256x10 bits; b) the rule memory requires 7x8 bits. Figure 5 shows the system architecture. The FIM module receives an input x corresponding to one pixel. The MFC memory stores the data of the antecedent membership functions according to the scheme shown in figure 6a. Since the overlapping degree is fixed to 2, each row of the memory stores only one linguistic label and one membership degree, (Label, degree) = (L, μ). The other label is obtained by incrementing the stored value by one, since the linguistic labels of the two active membership functions are always consecutive ($L_2 = L_1 + 1$). The other membership degree is obtained from the fact that the membership functions are normalized, through the operation $\mu_2 = 1 - \mu_1$.
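A behavioural sketch of this memory organisation is given below. It stores one (label, degree) pair per input value and rebuilds the second pair from it. The table contents, the equally spaced membership functions and the floating-point degrees are assumptions made for the example; the bit-level layout of the 256x10 memory is not modelled.

```python
N_LABELS = 7                    # matches the 7 rows of the rule memory
STEP = 255 / (N_LABELS - 1)     # assumed spacing between function centres


def build_mfc_memory():
    """One row per 8-bit input value, holding (L1, mu1) only."""
    memory = []
    for x in range(256):
        # The stored label never exceeds N_LABELS - 2, so L1 + 1 is valid.
        label = min(int(x / STEP), N_LABELS - 2)
        degree = 1.0 - (x - label * STEP) / STEP
        memory.append((label, degree))
    return memory


def infer_pixel(x, mfc_memory, rule_memory):
    """Read one row, rebuild the second pair and weight both consequents."""
    label_1, mu_1 = mfc_memory[x]
    label_2, mu_2 = label_1 + 1, 1.0 - mu_1   # L2 = L1 + 1, mu2 = 1 - mu1
    return mu_1 * rule_memory[label_1] + mu_2 * rule_memory[label_2]


mfc_memory = build_mfc_memory()
rule_memory = [k * STEP for k in range(N_LABELS)]  # hypothetical consequents
print(infer_pixel(100, mfc_memory, rule_memory))
```

Both consequents are read in the same step here, which mirrors the role of a memory that can serve two addresses at once.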
Fig. 5. FIM circuit for calculating the threshold
The rule memory is a dual-port memory, which makes it possible to access two active rules simultaneously. This memory is addressed by the linguistic label provided by the MFC, which eliminates the multiplexers and the counter of figure 4. The defuzzification stage receives both the consequents ($c_1$ and $c_2$) and the activation degrees of the active rules ($\mu_1$ and $\mu_2$). The last stage accumulates the result generated by each pixel and divides it by the number of pixels of the image. With the described FIM scheme one inference can be made in each clock cycle. To increase the operating speed of the system, two pixels can be processed in parallel as shown in Figure 7. In this case the most costly blocks (the MFC memory and the divider) are shared by both inputs, and the MFC memory is a dual-port memory. This halves the time required to calculate the threshold.
The circuit of figure 7 has been implemented on a low-cost Xilinx Spartan3 FPGA device (XC3S200). The hardware resources required on the Spartan3 FPGA are shown in Table 2, both with and without the divider. The division block is the most costly block of the system. The circuit implemented on the Spartan3 FPGA operates at a frequency of 50 MHz and processes two pixels in each clock cycle. Thus the processing time of an SVGA image of 800x600 pixels is 4.8 msec, which corresponds to 208 frames per second. For an HD image (1920x1080 pixels) it is possible to process 48 frames per second.
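The quoted figures follow from simple arithmetic, as the short sketch below shows. It assumes that the frame time is just the number of pixels divided by the pixels processed per cycle and by the clock frequency, ignoring any start-up or division latency.

```python
def frame_time_s(width, height, pixels_per_cycle, clock_hz):
    """Time to stream one frame through the circuit, in seconds."""
    cycles = (width * height) / pixels_per_cycle
    return cycles / clock_hz


for name, (width, height) in {"SVGA": (800, 600), "HD": (1920, 1080)}.items():
    t = frame_time_s(width, height, pixels_per_cycle=2, clock_hz=50e6)
    print(f"{name}: {t * 1e3:.3f} ms, {int(1 / t)} frames per second")
```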
Edge detection
This section presents an application of image segmentation to edge detection. The method is applied to the luminosity of the image. An image is a two-dimensional matrix of pixels whose values belong to a certain range. In this section each pixel is encoded with 8 bits, which gives 256 possible grey tones. An image is therefore a function of two variables (dimensions) with values in the range from 0 to 255. The process of edge detection in an image consists of the sequence of stages shown in figure 8. The first stage receives the input image and applies a filter to eliminate noise. The second stage applies a threshold in order to classify the pixels of the image into two categories, black and white; the result is a binary image. Finally, in the last stage the edges are detected.
The filter stage
The filter stage makes it possible to improve the detail of edges in images and to reduce or eliminate noise patterns. Its aim is to eliminate all the points that do not provide any information of interest. Noise is undesired information that appears in the image. It comes mainly from the capture sensor (quantisation noise) and from the transmission of the image (faults in transmitting the information bits). Basically we consider two types of noise: Gaussian and impulsive (salt-and-pepper). Gaussian noise originates from differences in sensor gain, noise in digitization, etc. Impulsive noise is characterized by arbitrary pixel values that are detectable because they are very different from their neighbours. One way to eliminate these types of noise is a low-pass filter, which smooths the image by replacing high and low values with average values. The filter used in the proposed edge detection system is based on the Lukasiewicz bounded-sum operator, which is defined as:
$\mathrm{BoundedSum}(x, y) = \min(1, x + y)$
The behaviour of the bounded-sum is shown in figure 9 . It consists of a normalized function in the [0,1] range. An advantage of applying this operator lies in the simplicity of the hardware realisation.
Hardware Implementation of a Real-Time Image Segmentation Circuit based on Fuzzy Logic for Edge Detection Application
Fig. 9. Bounded sum graphical representation.
The Lukasiewicz bounded-sum filter smooths the image and is suitable for both salt-and-pepper and Gaussian noise. Figure 10 shows the effect of applying this type of filter.
Fig. 10. a) Input image with salt-and-pepper noise, b) Lukasiewicz's bounded sum filter output.
The filter has been applied using a mask based on a 3x3 array. For pixel x ij the weighted mask is applied to obtain the new value y ij , as is shown in the following expression: 
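A minimal sketch of such a filter on a normalized image follows. The weight 1/8 for every pixel of the mask is an assumption taken from the hardware description, which scales the data by 0.125; the handling of border pixels is also an assumption. The bounded sum used is the standard Lukasiewicz operator, min(1, x + y).

```python
def bounded_sum(x, y):
    """Lukasiewicz bounded sum of two values in [0, 1]."""
    return min(1.0, x + y)


def bounded_sum_filter(image, weight=1 / 8):
    """Apply a 3x3 bounded-sum mask to an image with values in [0, 1]."""
    rows, cols = len(image), len(image[0])
    # Assumption: border pixels are copied unchanged.
    output = [row[:] for row in image]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            accumulated = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    scaled = image[i + di][j + dj] * weight
                    accumulated = bounded_sum(accumulated, scaled)
            output[i][j] = accumulated
    return output


# An isolated white pixel (impulsive noise) on a black background.
noisy = [[0.0, 0.0, 0.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 0.0]]
print(bounded_sum_filter(noisy)[1][1])  # 0.125
```

The isolated impulse is attenuated to one eighth of its value, while a uniformly white neighbourhood saturates at 1.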
The segmentation stage
Techniques based on thresholding an image allow pixels to be divided into two categories (black and white). This transformation establishes a distinction between the objects of the image and the background. The binary image is generated by comparing the value of each pixel with a threshold T: any value lower than the threshold is considered to belong to an object, whereas values greater than the threshold belong to the background. In this stage the previously calculated threshold T is applied in order to obtain the binary image.
Edge detection stage
The next step is edge detection. The input image for the edge detection is a binary image in which pixels take the value 0 (black) or 1 (white). In this case an edge appears when a change between black and white takes place between two consecutive pixels:
$x_{edge} = x \oplus y$
where x and y are consecutive pixels, and $x_{edge}$ is the resulting pixel.
Edge generation consists of determining if each pixel has neighbours with different values.
Since the image is binary, every pixel is encoded with one bit (black=0 and white=1). The edge detection operation is obtained by calculating the xor logic operation between neighbouring pixels using a 3x3 mask. Figure 11 shows an example of applying the xor operator to the binary image. Using the 3x3 mask it is possible to refine the edge generation by detecting the orientation of the edges. To this end the four orientations shown in figure 12 can be considered. This enables calculation of the xor operation on 3 pixels. For a horizontal orientation we will therefore have
Figure 13 shows the results obtained when edge detection was carried out on a set of test images.
Fig. 13. Test images and edge detection results.
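The basic, orientation-independent form of the operation can be sketched as follows: a pixel is marked as an edge when the xor between it and any pixel of its 3x3 neighbourhood is 1. The function name, the output polarity (1 marks an edge) and the treatment of border pixels are assumptions made for the example.

```python
def detect_edges(binary):
    """Mark interior pixels whose 3x3 neighbourhood is not uniform."""
    rows, cols = len(binary), len(binary[0])
    # Assumption: border pixels are left as non-edge (0).
    edges = [[0] * cols for _ in range(rows)]
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            centre = binary[i][j]
            differs = 0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    differs |= centre ^ binary[i + di][j + dj]
            edges[i][j] = differs
    return edges


# Black region on the left, white region on the right.
binary = [[0, 0, 0, 1, 1] for _ in range(4)]
for row in detect_edges(binary):
    print(row)
```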
Hardware implementation
The edge detection circuit has been implemented on a low cost FPGA device of the Xilinx Spartan3 family. Figure 14 shows the block diagram for the system. The image is stored in a double port RAM memory. The data memory width is 32 bits. This makes it possible to read two words simultaneously.
In the first phase the value of the threshold T is calculated. The edge detection circuit then starts its operation, reading eight pixels from the memory in each clock cycle (2 words of 32 bits). The edge detection circuit provides four output data in parallel, which are stored in the external memory. Each datum corresponds to a pixel of the edge image. This image is binary, so only one bit is needed to represent the value of each pixel (0 if edge or 1 if background). The new image of the edges is stored in the above mentioned memory.
The edge detection algorithm basically comprises three stages, as shown in figure 8. In the first stage the Lukasiewicz bounded-sum filter is applied. After the filter stage a thresholding step is applied, producing a black and white image. The value of the threshold is obtained by means of a fuzzy system that calculates the threshold from the image. In the third stage the edges of the image are obtained by evaluating the final value of each pixel. Only the pixels that surround the target pixel are of interest (a 3x3 mask). If all the values in the surroundings of a pixel are the same (all white or all black) there is no edge, and the output value assigns the pixel to the background of the image. If a change is detected in any value of the surroundings, the pixel lies on an edge and is therefore assigned the black value.
Figure 15 shows the system processing scheme. Pixels 1 to 9 correspond to the 3x3 mask that moves through the image. The Functional Unit (FU) processes the data stored in the mask registers.
Fig. 15. System schema.
To improve the image processing time the mask was extended to an 8x3 matrix, as shown in figure 16a. Each Functional Unit (FU) operates on a 3x3 mask according to the scheme shown in figure 15. The data are stored in the input registers (R3, R6, R9, …) and in each clock cycle they move to the neighbouring registers to which they are connected. In the third clock cycle the mask registers contain the data of the corresponding image pixels. The functional units then operate on the mask data and generate the outputs. In each clock cycle the mask advances one column in the image. Pixels enter on the right and shift from one stage to the next, leaving on the left-hand side. This is a systolic architecture with linear topology that allows several pixels to be processed in parallel.
Figure 16b shows the input/output ports in the symbol of the system. The system receives two 32-bit input data (D1 and D2). These data come from a double port memory that stores the image. The memory access time makes it possible to read 8 pixels (each of 8 bits) in one clock cycle. The circuit also receives the previously calculated threshold (T) as input. The input control signals are the clock (CLK), the synchronous clear (Clear), and the chip select signal (CS). The circuit generates as output the 4 bits (Dout) corresponding to the output values of the processed pixels stored in R5, R8, R11 and R14. The address of the pixel stored in R5 is also generated by means of the buses Row and Column. The output control signals Dvalid and EndImage indicate, respectively, the validity of the outputs and the completion of the image processing.
The functional unit operates on the 3x3 mask and generates the output value corresponding to the central element of the mask (pixel 5 in figure 15). A block diagram of a functional unit is shown in figure 17. The circuit consists of two pipeline stages, so the data have a latency of two clock cycles. The first stage is the image filter.
Then threshold T is applied. The edge detector, in the output stage, operates on the binary mask (black and white image).
Fig. 17. Functional Unit (FU) circuit schematic.
Figure 18 shows the circuits corresponding to the different blocks of the functional unit (FU). As can be observed in figure 18a, the filter based on Lukasiewicz's bounded sum receives the data stored in registers R1 to R9. These data are scaled by the factor 0.125, i.e. divided by 8, which amounts to a shift of three places to the right. The sum of the data is compared with the value 1, using the carry as control signal. The segmentation circuit (figure 18b) compares the pixel with the threshold value. The output is a binary image (black and white) and therefore requires only one bit. Finally, the output stage receives a 3x3 binary image and carries out the xor operation on its bits. If all the bits of the mask are equal the output pixel belongs to the background, whereas if any bit is different the output is an edge pixel.
The state machine that controls the system is shown in figure 19. This machine has four states. The mask moves through the image by columns. Whenever a row begins, two clock cycles are needed to initialize the mask registers (CYCLE1 and CYCLE2 states). In the next cycle (PROCESSING state) the data are processed, and the data of the following columns are processed in successive cycles.
Fig. 19. FSM of the control unit of the edge detection system
Figure 20 shows the chronogram of the circuit. The operation of the system begins with the falling edge of signal CS. In the third clock cycle the Dvalid signal takes the value 1, indicating a valid output. Input data are provided in each clock cycle. Once Dvalid has been activated, the output data in the following cycles are also valid (since Dvalid=1).
The system has been implemented on an FPGA of the Xilinx Spartan3 family. The circuit for edge detection occupies an area of 318 slices. The full system (which includes the thresholding circuit and the edge detection circuit) occupies 735 slices, which is 38% of the selected FPGA device. Regarding processing speed, the system requires 7.2 msec to generate the edge image of an SVGA image (800x600 pixels) using a 50 MHz clock. This means it is possible to process 132 frames per second. For an HD image (1920x1080 pixels) it is possible to process 32 frames per second.
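One reading that reproduces the 7.2 msec figure is sketched below. It assumes that the threshold calculation (two pixels per clock cycle) and the edge detection (four output pixels per clock cycle) run one after the other on the same frame; this decomposition is an assumption and is not stated in the text.

```python
CLOCK_HZ = 50e6


def total_frame_time_s(width, height):
    """Threshold pass followed by edge detection pass, in seconds."""
    pixels = width * height
    threshold_pass = pixels / 2 / CLOCK_HZ   # assumed: 2 pixels per cycle
    edge_pass = pixels / 4 / CLOCK_HZ        # assumed: 4 pixels per cycle
    return threshold_pass + edge_pass


for name, (width, height) in {"SVGA": (800, 600), "HD": (1920, 1080)}.items():
    t = total_frame_time_s(width, height)
    print(f"{name}: {t * 1e3:.3f} ms, {1 / t:.1f} frames per second")
```

Under these assumptions the SVGA frame takes 7.2 msec, and the HD frame rate comes out at about 32 frames per second. The reciprocal of 7.2 msec is about 139 frames per second, somewhat above the reported 132, so the reported figure may include overhead that this sketch does not model.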
Acknowledgments
This work was supported in part by the European Community under the MOBY-DIC Project FP7-IST-248858 (www.mobydic-project.eu), by Spanish Ministerio de Ciencia y Tecnología under the Project TEC2008-04920, and by Junta de Andalucía under the Project P08-TIC-03674.
Conclusion
This chapter has described a mechanism for binary image segmentation based on the application of fuzzy logic to calculate the threshold. The described thresholding method allows the threshold value to be adjusted to the characteristics of the image. The main advantage of this technique is that it allows a very efficient hardware implementation in terms of cost and speed, which makes it especially suitable for applications that require real-time processing. The technique has been applied to edge detection in images, and the designed circuit has been implemented on an FPGA device.
