This paper describes the design of an ASIC chip for thinning of graylevel images. The chip implements a Min-Max skeletonization algorithm and is based on a pipeline architecture in which each stage of the pipeline performs masking operations on the graylevel images. The chip operates in real time at a frequency of 8 MHz and occupies about 321 mils × 410 mils of silicon area.
INTRODUCTION
There has been a recent trend to develop special architectures for image processing which exploit the inherent parallelism found both in image data and in image processing algorithms. Unless this parallelism is suitably exploited, the sheer number of operations to be performed per second for even low level image processing tasks precludes economic real time processing with currently available processing speeds. It is, therefore, desirable to produce compact, high performance image processors that can execute, in real time, the basic image processing functions needed in different applications. By operating in real time, it is possible to utilize all available data and hence achieve the highest possible throughput; in addition, the need for external buffer memories, which add to system complexity, cost and size, is avoided. Although several such VLSI ASIC chips have been developed [1] [2] [3] [4] for implementing various image processing operations, none has so far been reported for thinning or skeletonization of a digital image.
Thinning procedures play a central role in a broad range of problems in image processing. Before an object is recognized, it is necessary to process the input image and identify the required features. In low level image processing operations commonly employed, thinning of the input image helps in clearly identifying the image boundary.
The skeleton of a region may be defined via the medial axis transformation [5]. The [7] for thinning of digital binary images with the help of a set of 3 × 3 masks. The result of any operation is placed in the central position of the mask. The masks are shown in Fig. 1 and the order in which they are applied is: A1, B1, A2, B2, A3, B3, A4, B4.
Each of the masks consists of two subgroups, white and black, shown in Fig. 1 by the unshaded positions and the positions shaded black, respectively. The positions marked 'x' in the masks are 'don't care' positions and need not be considered for a particular mask. The operation of each mask can be divided into the following three stages:
(i) Find whether all the pixels in the white subgroup are lesser in graylevel value than the pixel on which the mask is currently centred.
(ii) Find whether the pixels in the black subgroup are greater in value than the central pixel.
FIGURE 1 The 3 × 3 masks; panels A1, A2, A3 and A4.
pixels. The memory unit has, therefore, been designed so that it provides a pixel in one column with three rows simultaneously. To obtain all the eight neighbours of a particular pixel simultaneously, a set of six registers (R1-R6) clocked by a 3-phase clock, generated internally, has been incorporated in the design.
CHIP ARCHITECTURE
The overall datapath for the complete operation is shown in Fig. 2. The chip comprises eight blocks, with each block performing the operation corresponding to one mask of Goetcherian's algorithm. Each block has three RAM banks, each of 512 bytes capacity. The algorithm specifies that the operation of the i-th mask should be performed on the output of the (i - 1)th mask. The blocks implementing the mask operations are, therefore, connected in a pipeline style. Each mask requires data corresponding to three consecutive rows for performing any operation and each of them extends the image on all four sides. A mask can start computation when it receives (i) data corresponding to the second row from the previous mask, (ii) the data corresponding to the first row which has already been stored in its memory bank, and, (iii) the second row of data which it is currently receiving.
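The pipelined connection of the blocks can be sketched in software as a chain of row-streaming stages. This is a minimal illustration, not the chip's logic: the names `stage`, `process_row`, `pipeline` and the per-window callback `op` are hypothetical, the actual mask operation is left as a caller-supplied function, and the zero extension is done here by padding lists, which is a software shortcut rather than the multiplexing used on the chip.

```python
# Mask order as stated in the text; one pipeline block per mask.
MASK_ORDER = ["A1", "B1", "A2", "B2", "A3", "B3", "A4", "B4"]


def process_row(above, centre, below, op):
    """Apply op to the 3 x 3 window centred on every pixel of `centre`."""
    # Software shortcut (assumption): pad each row with one zero on either side.
    a, c, b = ([0] + list(r) + [0] for r in (above, centre, below))
    return [op([a[j:j + 3], c[j:j + 3], b[j:j + 3]]) for j in range(len(centre))]


def stage(rows, op):
    """One block: output row r is produced only after input row r + 1 arrives."""
    prev = cur = None
    for nxt in rows:
        if cur is None:
            # First row received: the row above it is the zero extension row.
            prev, cur = [0] * len(nxt), nxt
            continue
        yield process_row(prev, cur, nxt, op)
        prev, cur = cur, nxt
    if cur is not None:
        # Last row: the row below it is the zero extension row.
        yield process_row(prev, cur, [0] * len(cur), op)


def pipeline(rows, ops):
    """Chain the stages so that stage i consumes the output of stage i - 1."""
    stream = iter(rows)
    for op in ops:
        stream = stage(stream, op)
    return stream
```

Because each stage is a generator, a downstream stage starts producing rows as soon as its upstream neighbour has delivered two rows, which mirrors the start condition described above; each stage therefore adds roughly one row of latency.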
The Datapath
The datapath can be followed with reference to the schematic diagram of a mask shown in Fig. 3(a). There are two sets of registers, (R1, R2, R3) and (R4, R5, R6). These two register sets are clocked by the phases PHI2 and PHI3 of a three phase non-overlapping clock. The inset in Fig. 3(a) shows the phases of the clock. The three outputs of the memory unit go to the first set of registers as inputs, while the outputs of the latter are connected to the second set of registers. The outputs of the second set of registers (R4, R5, R6), which are clocked by PHI3, provide the (j - 1)th column pixels of the rows (i - 1), i and (i + 1). The outputs of the first set (R1, R2, R3), clocked by PHI2, give the jth column pixels of the same three rows.
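The way the two register sets assemble a 3 × 3 neighbourhood can be illustrated with a small behavioural model. It rests on two assumptions not stated explicitly in the text: that the memory unit output supplies the (j + 1)th column while the registers hold columns j and (j - 1), and that the three clock phases can be collapsed into one register update per pixel. The function name `sliding_windows` is hypothetical.

```python
def sliding_windows(above, centre, below):
    """Yield 3 x 3 windows from three rows using two three-pixel register sets."""
    first = None    # models R1, R2, R3: pixels of column j
    second = None   # models R4, R5, R6: pixels of column j - 1
    for column in zip(above, centre, below):   # memory unit output, column j + 1
        if first is not None and second is not None:
            # Row r of the window is (column j - 1, column j, column j + 1).
            yield [[second[r], first[r], column[r]] for r in range(3)]
        # Shift: the second set takes the first set's value,
        # the first set takes the memory unit output.
        second, first = first, column
```

The model yields windows only for interior columns; handling of the first and last columns depends on the image extension scheme.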
Implementation of Mask Operations
The mask operations have been implemented with the help of the module Find-Max shown in Fig. 3(a) and (b).
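The two comparison stages of a mask, (i) and (ii), can each be reduced to a single comparison against an extreme value, which is one plausible reading of why a Find-Max module is useful. The sketch below is an assumption-laden illustration: the function name `mask_conditions` is hypothetical, the subgroup pixels are passed in as plain lists, and what the chip does once both tests pass is not modelled.

```python
def mask_conditions(centre, white, black):
    """Return (stage_i, stage_ii) for one mask position.

    stage_i:  every white-subgroup pixel is less than the central pixel.
    stage_ii: every black-subgroup pixel is greater than the central pixel.
    """
    # All white pixels < centre is equivalent to max(white) < centre.
    stage_i = max(white) < centre
    # All black pixels > centre is equivalent to min(black) > centre.
    stage_ii = min(black) > centre
    return stage_i, stage_ii
```

For example, `mask_conditions(5, [1, 2, 3], [7, 9])` satisfies both stages, while raising one white pixel to 5 makes stage (i) fail.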
Since the white subgroup consists of three pixels in each mask and the black subgroup of two pixels (Fig. 1)
The image must be extended at its four sides for operations centred on the pixels that form the boundary of the image. Since 3 × 3 masks are used, each row must be extended by one pixel on either side. Similarly, a whole row of extension pixels must be added before the first row and after the last row. In the reported chip, extension pixels are assigned the graylevel value of zero. Image extension is carried out without introducing new pixels into the image, since that would have resulted in operations centred on those pixels. Four control signals, C3, C4, C5 and C6 (Fig. 3(a)), are used to extend the image at the four sides: the pixel values are multiplexed to zero at the appropriate time instants in a clock cycle.
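Extension by multiplexing, as opposed to storing extra pixels, can be sketched as follows. The image keeps its original size and any neighbour that falls outside it is simply read as zero. The helper names `extended_pixel` and `window_at` are hypothetical, and the single `outside` test stands in for the four control signals, whose individual assignment to sides is not given here.

```python
def extended_pixel(image, r, c):
    """Read pixel (r, c); positions outside the image are forced to zero."""
    rows, cols = len(image), len(image[0])
    outside = r < 0 or r >= rows or c < 0 or c >= cols
    # Acts like a multiplexer: select zero instead of a stored value.
    return 0 if outside else image[r][c]


def window_at(image, r, c):
    """3 x 3 neighbourhood of (r, c) with zero extension at the borders."""
    return [[extended_pixel(image, r + dr, c + dc) for dc in (-1, 0, 1)]
            for dr in (-1, 0, 1)]
```

Since no extension pixel is ever stored, no window is ever centred on one, which is the property the text asks for.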
The Memory Unit
The memory unit (Fig. 4) is organized as three memory banks, each of 512 words with 8 bits per word. The nth row is stored sequentially, as it arrives, in the first memory bank and the (n + 1)th row in the second, while the (n + 2)th row is being written into the third bank. The masks get the three pixels in the required sequence as follows. Let the jth column of the (n + 2)th row be the current data at the input to the chip. These data are latched by clock PHI1. By reading out the jth location of banks 1 and 2 simultaneously, we get the jth column pixels of the nth and (n + 1)th rows. The jth column pixel of the (n + 2)th row is currently available at the output of the latch and can be sent through a suitable bus driver to the different masks. Two sets of registers are used as line delay generators for synchronization so that the different masks receive data from three consecutive rows simultaneously.
FIGURE 6 Control signals generated by CONTROL 2.
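The three-bank organization of the memory unit can be modelled behaviourally as below. This is a sketch under assumptions: the class name `MemoryUnit`, its methods, and in particular the rotation of bank roles at the end of each line are hypothetical, since the text describes only which row sits in which bank at one instant.

```python
BANK_WORDS = 512  # words per bank, 8 bits each, as stated in the text


class MemoryUnit:
    """Three banks: two hold the previous rows, one receives the current row."""

    def __init__(self):
        self.banks = [[0] * BANK_WORDS for _ in range(3)]
        self.write_bank = 0  # bank currently being written

    def push(self, j, pixel):
        """Store the incoming pixel at location j and return the jth column
        pixels of rows n, n + 1 and n + 2 (oldest first)."""
        oldest = self.banks[(self.write_bank + 1) % 3][j]
        middle = self.banks[(self.write_bank + 2) % 3][j]
        self.banks[self.write_bank][j] = pixel & 0xFF  # 8-bit word
        return oldest, middle, pixel

    def end_of_line(self):
        # Assumed rotation: the bank holding the oldest row is reused next.
        self.write_bank = (self.write_bank + 1) % 3
```

With this rotation, each incoming pixel is written once and the two earlier rows are read from the other banks at the same address, so no row is ever copied between banks.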
The
and, when an EOLN signal is detected, it moves to AX. Fig. 5 illustrates the state transitions through one complete frame, and the control signals generated by CONTROL are shown in Fig. 6.
