Abstract: This paper addresses the hardware implementation of a power-efficient retinal encoder system for use in visual prosthesis equipment for blind people. The hardware architecture is inspired by the retinal receptive fields of the mammalian vision system. In the primate vision system, the captured image passes through several filter banks in successive filtering stages. Because the photoreceptors in the primate vision system are arranged in a hexagonal fashion, hexagonal tessellation was found to be the most suitable sampling scheme for retinal image processing. In this work, the different retinal filtering stages and the final edge detection scheme are implemented on an Altera Cyclone II FPGA and in Synopsys Design Vision. An area-efficient Cellular Logic Array Processing (CLAP) algorithm is used as the final-stage edge detector. Both rectangular and hexagonal edge detection schemes were tested, and their performance was analysed using the Berkeley Segmentation Dataset and Receiver Operating Characteristic (ROC) analysis.
Introduction
It has been found that the human visual system processes visual stimuli from the external world more robustly, efficiently and quickly than any digital computer vision system. The primate visual system follows organisation principles that are helpful in developing efficient image processing algorithms. A retinal encoder module converts visual input into neuromorphic pulses in order to stimulate the visual cortex of the human brain and enable the sensation of vision. It is the primary module in any visual prosthesis application (Sousa et al., 2005; Morillas et al., 2007). In retinal signal processing, the photoreceptors are distributed in a hexagonal fashion. Hence, when modelling biological vision based systems such as retinal encoders and vision chips, incorporating hexagonal sampling based operations and the corresponding hardware algorithms and architectures can reduce the hardware utilisation considerably. A bio-inspired implementation architecture will lead to a low-cost, power-efficient and hardware-efficient visual prosthesis system, which was the major motivation for this work. Many papers have been published on the hardware implementation of edge detection algorithms (García Holguín, 2012; Obili, 2013; Singh et al., 2014; Sujatha and Selvathi, 2014). However, edge detection based on hexagonal sampling has not yet been performed. Hence, this work attempts to model the edge detection operation associated with retinal signal processing using a hexagonal sampling scheme.
In this work, the digital hardware simulation of the mammalian visual system is performed by analysing the signal processing aspects and impulse responses of each of its cellular layers. The input image is passed through various filter banks placed in cascade. The processing of visual stimuli in the retina is very complex and not yet known in detail. Therefore, the main focus here is the design and approximation of the impulse responses of spatial digital filters, which are called receptive fields (RF) in neural terms (Romeny, 2002). In the initial stages, a Gaussian filter, a Laplacian of Gaussian filter and a Gabor filter are used to model the photoreceptor cells, the ganglion-horizontal cells and the simple cells respectively. The use of Hex-Gabor filters for efficient image enhancement was proposed by Veni and Narayanankutty (2012). An addressing scheme has also been proposed for general image processing applications such that the neighbourhood information of each pixel is retained and saved in memory for processing. A Hardware Description Language (HDL) based hexagonal mask has been built for each filtering stage, and the calculation is performed by running the mask over the whole image. The work reported by Razak and Taharim (2009) demonstrates the application of the Gabor filter technique to enhance fingerprint images. The behaviour of the layers and different channels of the neuromorphic retina has been successfully modelled by cellular neural/nonlinear networks (CNN) (Vörösházi et al., 2009). A cellular automata based edge detection algorithm known as Cellular Logic Array Processing (CLAP) (Rajan et al., 2004) has been used for the final stage. This algorithm reduces the area overhead of the system while giving promising boundary detection performance. The rods and cones among the photoreceptors are tightly packed in a hexagonal fashion, which is the motivation of this work (Thiem and Hartmann, 2000).
Hence, this work starts with the physiological understanding of the retina and the analysis of several representative retinal models. It then presents the simulation of the models, and finally the hardware implementation and verification. For the Field Programmable Gate Array (FPGA) implementation, a Cyclone II EP2C20F484C7 FPGA is used, and the Application Specific Integrated Circuit (ASIC) implementation was performed using Synopsys software.
The edge detection performance of the encoder module is evaluated using Berkeley Segmentation Dataset images and Receiver Operating Characteristic (ROC) plots (Bowyer et al., 1999; Martin et al., 2001). The output of the module is compared with the manually specified ground truth edge maps of the dataset images to find the percentages of true positives, true negatives, false positives and false negatives. The ROC plot shows that the encoder performs better on the hexagonal grid than on the rectangular sampling grid.
The paper is organised as follows. Techniques used in hexagonal pixel grid based image processing are discussed in Section 2. Section 3 elaborates on the various retinal filter banks and their corresponding signal processing responses. The Very Large Scale Integration (VLSI) implementation architecture is proposed in Section 4. The implementation results are discussed in Section 5.
Hexagonal tessellation scheme
Low-power and hardware-efficient implementation architectures are prerequisites of any biomedical VLSI system. In visual prosthesis applications, however, the image processing resolution cannot be traded off against these prerequisites, so it is necessary to consider other sampling lattices. Hexagonal sampling is a promising way to attain a hardware-efficient image processing architecture without compromising the sampling efficiency of the system. Mersereau (1979) found that while the largest possible rectangular lattice provides 78.5% sampling efficiency, the hexagonal lattice has a sampling efficiency of 90.8%. Later, Vitulli et al. (2002) analysed the Shannon criteria and Nyquist constraints for image sampling and concluded that a hexagonal grid guarantees the same efficiency as a rectangular grid with 13% fewer samples. It has also been found that, for a given resolution capability of the sensor, a hexagonal sampling grid yields smaller quantisation errors than a rectangular grid. Golay (1969) developed a hexagonal image processing framework based on a parallel architecture. He also showed that the hexagonal sampling pattern can be used efficiently to study patterns that may be presented to the computer in any arbitrary orientation. The consistency of its neighbourhood connectivity makes hexagonal tessellation better suited to region and boundary definitions. The equidistance property and better angular resolution also make hexagonal processing better for obtaining boundaries (Middleton and Sivaswamy, 2005). The higher symmetry of the hexagonal structure is also helpful in developing better image processing algorithms: a hexagonal grid has three-way symmetry, while a rectangular grid has only two-way symmetry, and the accuracy of a processing technique improves with higher symmetry (Middleton and Sivaswamy, 2005).
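The figures quoted above can be related to simple lattice geometry. The following is a short worked calculation, not a reproduction of the cited derivations. It assumes an image that is circularly band-limited with radius $W$ in the frequency plane, and reads sampling efficiency as the fraction of one spectral period cell (a square of side $2W$, or a regular hexagon circumscribing the circle) that the band region occupies:

$$\eta_{\mathrm{rect}} = \frac{\pi W^2}{(2W)^2} = \frac{\pi}{4} \approx 0.785, \qquad \eta_{\mathrm{hex}} = \frac{\pi W^2}{2\sqrt{3}\,W^2} = \frac{\pi}{2\sqrt{3}} \approx 0.907$$

$$\frac{\text{hexagonal sample density}}{\text{rectangular sample density}} = \frac{\eta_{\mathrm{rect}}}{\eta_{\mathrm{hex}}} = \frac{\sqrt{3}}{2} \approx 0.866$$

Under these assumptions the hexagonal lattice needs about 13.4% fewer samples, which is close to the figures quoted in the text.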
A simulated hyper-pixel method for resampling was suggested by Wuthrich and Stucki (1991) to evaluate the hexagonal tessellation scheme. Each hexagonal pixel is simulated by a group of square pixels and is therefore called a hyper pixel. Approximating one hexagonal pixel with many rectangular pixels causes heavy data loss and hence a much lower screen resolution. Another resampling method is to suppress alternate rows and columns of the rectangularly sampled image (Rajan et al., 2004) so that each central pixel is surrounded by six pixels. This also leads to data loss through the suppression of alternate pixels. The resampling method used in this work is the half-pixel shift scheme proposed by Staunton and Neil (1989). This approach has been chosen because it is suitable for hardware implementation. With this method, the results can be validated with the help of the address generator and neighbourhood modules for hexagonal sampling schemes.
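The half-pixel shift idea can be illustrated in software. The sketch below is only one possible realisation: it assumes the shift of every odd row is obtained by averaging each pixel with its right-hand neighbour (linear interpolation) and that the last pixel of a row is replicated at the border. The function name `half_pixel_shift` is hypothetical and does not come from the cited work.

```python
def half_pixel_shift(image):
    """Return a mimic-hexagonal copy of a rectangular image (list of rows).

    Assumption: odd rows are shifted right by half a pixel using linear
    interpolation between horizontally adjacent pixels.
    """
    shifted = []
    for r, row in enumerate(image):
        if r % 2 == 0:
            # Even rows keep their original sample positions.
            shifted.append([float(v) for v in row])
        else:
            new_row = []
            for c, v in enumerate(row):
                # Replicate the border pixel when there is no right neighbour.
                right = row[c + 1] if c + 1 < len(row) else v
                new_row.append((v + right) / 2.0)
            shifted.append(new_row)
    return shifted


example = half_pixel_shift([[0, 10, 20], [0, 10, 20]])
```

After the shift, each interior pixel has six nearest neighbours: two in its own row and two in each adjacent row.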
Receptive fields of retina cellular layers
The visual system is considered our most important and most complex sense, involving millions of cells of different types. About one quarter of the nerve cells in the central nervous system are related to the sense of vision. The retinal layers in the visual system perform different pre-processing stages, which are shown in Figure 2. The first stage of filtering is performed by the light-sensitive receptors, i.e. the rods and the cones, which are collectively called photoreceptor cells. The visual stimulus is applied to the photoreceptor cells, and the sampling of the input image is performed at this stage. The photoreceptors are located at the back of the eye, behind the other layers. The receptive field (Romeny, 2002) is a concept used to define the functionality of each cell layer in the visual system: the impulse response of the spatial digital filter that approximates the functionality of a cell layer is termed the receptive field of that layer. As the eye can be viewed as a cascade of different cell structures, such as photoreceptor cells, ganglion cells, horizontal cells and simple cells, the impulse response profile of each stage has to be studied in detail.
Photoreceptor cells
The main function of the photoreceptor cells is to convert the light incident on the surface of the eye into visual stimulus signals. These signals drive the biological image processing that leads to the sense of vision. The photoreceptor cells also perform noise reduction, smoothing high-frequency components out of the input image. The receptive field of the photoreceptor cells can therefore be approximated by a Gaussian smoothing filter (Thiem and Hartmann, 2000). Proper selection of the standard deviation is essential in order to avoid blurring the image.
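To make the role of the standard deviation concrete, the sketch below samples a two-dimensional Gaussian on a small square mask and normalises it so that the coefficients sum to one. The mask size, the function name `gaussian_kernel` and the normalisation by the discrete sum are illustrative assumptions, not the coefficients used in the reported hardware.

```python
import math


def gaussian_kernel(size, sigma):
    """Sample a 2D Gaussian on a size x size grid and normalise it to unit sum."""
    half = size // 2
    kernel = [
        [math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
         for x in range(-half, half + 1)]
        for y in range(-half, half + 1)
    ]
    total = sum(sum(row) for row in kernel)
    return [[v / total for v in row] for row in kernel]


narrow = gaussian_kernel(3, 0.5)  # little smoothing, most weight at the centre
wide = gaussian_kernel(3, 2.0)    # stronger smoothing, flatter weights
```

A small sigma concentrates the weight on the centre pixel (little smoothing), whereas a large sigma spreads it over the neighbours and blurs the image more.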
The Gaussian kernel can be expressed for one-dimensional, two-dimensional and N-dimensional vector spaces by equations (1), (2) and (3) respectively:

$$G_{1D}(x;\sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{x^2}{2\sigma^2}} \tag{1}$$

$$G_{2D}(x,y;\sigma) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2+y^2}{2\sigma^2}} \tag{2}$$

$$G_{ND}(\vec{x};\sigma) = \frac{1}{\left(\sqrt{2\pi}\,\sigma\right)^{N}}\, e^{-\frac{|\vec{x}|^2}{2\sigma^2}} \tag{3}$$

The notation follows Romeny (2002). The parameter σ determines the width of the Gaussian kernel; in statistical terms it is the standard deviation, and its square is the variance. The constant $1/(\sqrt{2\pi}\,\sigma)$ is the normalisation constant. The smoothing operation is performed with a σ value low enough to avoid blurring and high enough to de-noise the image.
Ganglion cells
It has been found (Romeny, 2002) that the receptive field of the ganglion cells can be approximated by the Laplacian of Gaussian (LoG) function. The sensitivity can be either on-centre off-surround or off-centre on-surround, i.e. the response either rises to a maximum at the centre or falls to a minimum negative value there. The LoG function can be further approximated by the Difference of Gaussians (DoG) function (Romeny, 2002). Suitable selection of the parameter σ is important in deciding the characteristic features of the LoG filter.
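The Laplacian of Gaussian receptive field and its Difference of Gaussians approximation can be compared numerically. The sketch below is an illustration under stated assumptions: the LoG is obtained by applying the Laplacian to the normalised two-dimensional Gaussian, and the DoG uses a width ratio of 1.6 between the two Gaussians, a commonly quoted choice that is not taken from this paper. All function names are hypothetical.

```python
import math


def gaussian_2d(x, y, sigma):
    """Normalised 2D Gaussian."""
    return math.exp(-(x * x + y * y) / (2 * sigma * sigma)) / (2 * math.pi * sigma * sigma)


def log_2d(x, y, sigma):
    """Laplacian of the 2D Gaussian: ((r^2 - 2 sigma^2) / sigma^4) * G."""
    r2 = x * x + y * y
    return (r2 - 2 * sigma * sigma) / sigma ** 4 * gaussian_2d(x, y, sigma)


def dog_2d(x, y, sigma, ratio=1.6):
    """Difference of Gaussians (wide minus narrow); the ratio is an assumed value."""
    return gaussian_2d(x, y, ratio * sigma) - gaussian_2d(x, y, sigma)


centre_log = log_2d(0, 0, 1.0)     # negative centre (off-centre profile)
surround_log = log_2d(3, 0, 1.0)   # positive surround
centre_dog = dog_2d(0, 0, 1.0)     # same sign pattern as the LoG
```

With this sign convention both functions are negative at the centre and positive in the surround, which corresponds to an off-centre on-surround profile; negating them gives the on-centre off-surround case. The LoG changes sign at a radius of √2·σ, which is how σ sets the size of the receptive field centre.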

Simple cells
The simple cells in the visual cortex exhibit a feature selection property in different orientations, and their responses to oriented edges and gratings show some special properties. Each cell in the layer is tuned to a different frequency and orientation, with a different phase relationship. This orientation-dependent response has been used in different edge detection schemes to attribute depth to edge and line detections (Thiem and Hartmann, 2000).
The simple cells can be modelled mathematically using digital Gabor filters. A Gabor filter is essentially a Gaussian filter modulated by a sinusoidal function. It has unique properties such as selectivity in space, spatial frequency and orientation. Gabor filters have been found effective for feature extraction in image processing applications, and they are also used to detect textures oriented at different angles in an image. The Gabor filter can be expressed mathematically as follows (notation as per Romeny (2002)):
where σ is the standard deviation of the elliptical Gaussian envelope and the sinusoidal carrier is characterised by its radial frequency. The presence of such a filter, which analyses the input in different orientations, makes the primate vision system a robust pattern recognition system.
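As an illustration of orientation selectivity, the sketch below builds a small bank of Gabor kernels. It uses one common simplified form, an isotropic Gaussian envelope multiplied by a cosine carrier, which is an assumption and not necessarily the exact expression used in the paper (the text mentions an elliptical envelope). The function name `gabor_kernel`, the symbol `omega` for the radial frequency and all parameter values are hypothetical.

```python
import math


def gabor_kernel(size, sigma, omega, theta_deg):
    """Sample exp(-(x^2 + y^2) / (2 sigma^2)) * cos(omega * u) on a size x size grid,
    where u is the coordinate along the orientation theta."""
    theta = math.radians(theta_deg)
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            u = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2 * sigma * sigma))
            row.append(envelope * math.cos(omega * u))
        kernel.append(row)
    return kernel


# Three orientations spaced 60 degrees apart, matching the axes of a hexagonal grid.
bank = {angle: gabor_kernel(5, 1.5, math.pi / 2, angle) for angle in (0, 60, 120)}
```

Each kernel responds most strongly to intensity variation along its own orientation axis, so the three kernels together cover the three principal directions of a hexagonal lattice.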
Cellular automata based edge detection
The output of the retinal pre-processing stage and the visual cortex simple cell stage is fed to an efficient edge detection scheme, so that the bionic pre-processing further improves the edge detection. Since we are considering the VLSI implementation of the retina prototype and edge detector module, an area-efficient edge detection scheme has to be considered first. Edge detection based on cellular automata has better area efficiency than conventional edge detection schemes. The Cellular Logic Array Processing (CLAP) algorithm proposed by Rajan et al. (2004) has been used for the final edge detection on account of its performance, area efficiency and flexibility of implementation in FPGAs. In the CLAP based edge detection scheme, the values of interest are filtered out of the scan window using various basis structures. Basis structures can be expressed as a set of convex polygons that enclose the pixel under consideration. Figure 3 shows the eighteen different convex polygons that should be considered for hexagonal edge detection. In the 7-pixel hexagonal grid, the set of all pixels in the window is itself one such polygon and is known as the supremum basis structure. Subsequent basis structures are obtained by removing the corner pixels of the grid in various combinations. All the basis structure polygons form a lattice with $A$ as the supremum and $D_{1,3,5}$, $D_{2,4,6}$, $C_{1,4}$, $C_{2,5}$ and $C_{3,6}$ as the infimum basis structures. The second layer is derived from the first layer, which contains only the supremum, by removing a single corner neighbour, and the same holds for the subsequent layers. The infimum polygons can be derived from every other basis structure; the existence of an infimum basis structure is therefore a necessary condition for the existence of any other polygon.
The comparison within the window can therefore be restricted, approximately, to the infimum polygons $D_{1,3,5}$, $D_{2,4,6}$, $C_{1,4}$, $C_{2,5}$ and $C_{3,6}$. The entire image is scanned by the hexagonal window in order to detect the edges. The window is moved by placing the mask such that the pixel under consideration is at the centre, surrounded by its six neighbours. At each position of the mask, the window covers a 7-pixel hexagonal sub-image and checks for the existence of any of the five basis structures. The intensity values in the scan window are checked to see whether the gray-distance, i.e. the difference between the maximum and the minimum gray scale value, is less than or equal to a particular threshold value T. If the gray-distance is less than or equal to T, the central cell is assigned the value 0; otherwise it is assigned the value 1. This procedure is continued until the entire image has been scanned, which yields a binary image containing the boundaries of the input image.
Thus the entire edge detection in the CLAP algorithm is performed by comparing the values in the window. Unlike conventional edge detectors, which find boundaries using a convolution process involving a number of multiplier blocks, the CLAP algorithm achieves efficient detection with just a set of comparators. This reduction in hardware requirement is the key feature that makes the algorithm suitable for the final stage of the retinal encoder.
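The comparator-only nature of the scheme can be sketched in software. The example below is a simplified illustration, not the authors' implementation: it applies the gray-distance test to the full 7-pixel window rather than to the individual infimum basis structures, and it assumes a half-pixel-shifted layout in which odd rows are displaced to the right. The function names and the default threshold of 32 are assumptions made for the example.

```python
def hex_neighbours(r, c):
    """Six neighbours of pixel (r, c) when odd rows are shifted right by half a pixel."""
    shift = 0 if r % 2 == 0 else 1
    return [
        (r, c - 1), (r, c + 1),                      # same row
        (r - 1, c - 1 + shift), (r - 1, c + shift),  # row above
        (r + 1, c - 1 + shift), (r + 1, c + shift),  # row below
    ]


def clap_edges(image, threshold=32):
    """Mark a pixel as edge (1) when the gray-distance in its hex window exceeds the threshold."""
    rows, cols = len(image), len(image[0])
    edges = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            window = [image[r][c]] + [image[nr][nc] for nr, nc in hex_neighbours(r, c)]
            # Only comparisons and one subtraction are needed, no multipliers.
            if max(window) - min(window) > threshold:
                edges[r][c] = 1
    return edges


step = [[0, 0, 100, 100, 100] for _ in range(5)]
step_edges = clap_edges(step)
```

For the vertical step image, only the pixels whose hexagonal window straddles the intensity jump are marked; border pixels are left at 0 because their windows are incomplete.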
Hardware architecture
The VLSI architecture for prototyping the retina is proposed as a cascaded digital filter bank structure. Each stage of the cascade represents a particular cellular layer in the primate vision system, and the impulse response of each filter is chosen so that the digital signal processing module approximates the receptive field of that cellular layer. The input image is smoothed in the photoreceptor stage and undergoes image enhancement in the ganglion cell layer. The output of the ganglion cells is fed to a bank of three Gabor filters oriented at 0°, 60° and 120°, since a hexagonal sampling scheme is followed. Each Gabor filter extracts the features of the image along one of the three angles, and the edges are then detected. The edge detection performed by this method was found to be more effective in the segmentation process (Section 5). The VLSI implementation strategy for this operation is shown in Figure 4.
The entire system is a cascaded filter bank with convolution filters in each stage of operation. The final edge detection stage uses comparator-based edge evaluation with the CLAP algorithm. The image is stored in the 'in-system memory' before it is fed to the retinal encoding system. On the Altera DE1 board, which is the implementation target, the 512-Kbyte Static RAM chip is organised as a 256K × 16-bit serial memory (Altera, 2006). The image is stored serially, regardless of the position of each pixel value in the two-dimensional plane. Because of this conversion of two-dimensional data into a one-dimensional signal, the neighbourhood information of each pixel value is lost, which adversely affects the edge detection scheme. A proper addressing scheme is therefore required so that the data are read from the memory in a way that preserves the neighbourhood.
Neighbourhood selection module
The pixel values of the image under consideration are stored in the Static Random Access Memory (SRAM) module as a one-dimensional array, regardless of the neighbourhood definitions of the image. For the two-dimensional convolution scheme, the neighbours of the pixel under consideration have to be selected properly, and the addresses have to be generated parametrically according to the size of the image. For an N × N image, the data are read from addresses 0, N and 2N during the first three clock cycles. During the next three clock pulses the addresses are 1, N + 1 and 2N + 1, and this sequence continues for the subsequent clock cycles. If the input to the retinal module is supplied in this fashion, the neighbourhood details are preserved. The Verilog simulation waveform of the neighbourhood definition module for a 64 × 64 image is shown in Figure 5. The hardware module for neighbourhood selection is shown in Figure 8. The input pixels are selected from the memory using the addresses produced by the neighbourhood selection module. For 3 × 3 mask based filtering, the first output is generated after nine addresses have been produced by this module. Each subsequent output is generated after three further addresses.
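The address sequence described above can be reproduced with a few lines of code. The sketch below models only the address arithmetic for one band of three image rows; the function name `neighbourhood_addresses` and its parameters are hypothetical and do not correspond to signal names in the Verilog design.

```python
def neighbourhood_addresses(n, top_row=0, mask=3):
    """Yield memory addresses column by column for a band of `mask` rows
    of an n x n image stored row-major in a one-dimensional memory."""
    for col in range(n):
        for k in range(mask):
            yield (top_row + k) * n + col


addresses = list(neighbourhood_addresses(64))
first_six = addresses[:6]  # 0, N, 2N, 1, N + 1, 2N + 1 for N = 64
```

Reading the memory in this order delivers one column of the mask every three clock cycles, so a 3 × 3 neighbourhood is complete after nine addresses and is refreshed after every three addresses thereafter.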
Kernel mask implementation
A parameter-based kernel mask is also required, so that the window containing the coefficient information can be used to scan the entire image. The pixel information fetched through suitable address generation is loaded into the mask, which is essentially an N × N register. In the first iteration, the window is fully populated only after N² clock pulses. In subsequent iterations, only N clock pulses are needed to obtain the required output. All the pixels are passed through the window, and the central value in the window is replaced by the newly calculated value, as shown in Figure 6. The hardware module of the filter mask is shown in Figure 8.
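The clock-pulse counts given above (N² pulses for the first window, N for each subsequent one) can be checked with a small behavioural model. This is a software sketch of the register window under the assumption that pixels arrive column by column, one per clock pulse; the generator name `window_outputs` is hypothetical.

```python
from collections import deque


def window_outputs(pixel_stream, n=3):
    """Yield (clock, window) each time the n x n register holds a fresh window.

    Pixels are assumed to arrive one per clock pulse in column-major order.
    """
    columns = deque(maxlen=n)  # the oldest column drops out automatically
    column = []
    for clock, pixel in enumerate(pixel_stream, start=1):
        column.append(pixel)
        if len(column) == n:
            columns.append(column)
            column = []
            if len(columns) == n:
                # Rebuild the window row by row from the stored columns.
                yield clock, [[columns[c][r] for c in range(n)] for r in range(n)]


# A 3 x 5 image whose pixel values equal their row-major addresses.
stream = [r * 5 + c for c in range(5) for r in range(3)]
results = list(window_outputs(stream))
```

For a band that is five pixels wide, the model produces three windows, at clock pulses 9, 12 and 15.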
Convolution filter implementation
The convolution filter is designed as a look-up table based floating point multiplier-adder module. The final module requires a cascade of two such modules along with three parallel Gabor convolution modules. The Gaussian, LoG and Gabor filtering schemes are identical except for their convolution coefficients. A floating point multiplier-adder module has been implemented for gray scale image filtering.
This implementation requires less hardware overhead than conventional floating point arithmetic modules.
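One way a look-up table can replace run-time multiplication is sketched below. This is an assumed reading of the design, not a description of the actual module: it supposes 8-bit gray scale pixels and precomputes, for every mask coefficient, the product with each of the 256 possible pixel values, so that filtering reduces to table reads and additions. The function names and the example coefficients are hypothetical.

```python
def build_luts(coefficients, levels=256):
    """Precompute coefficient * pixel for every possible pixel level (one table per coefficient)."""
    return [[coeff * level for level in range(levels)] for coeff in coefficients]


def lut_convolve_window(window, luts):
    """Convolve one window using table look-ups and additions only."""
    return sum(lut[pixel] for lut, pixel in zip(luts, window))


# Example 3 x 3 smoothing coefficients, flattened row by row (illustrative values).
coeffs = [0.0625, 0.125, 0.0625, 0.125, 0.25, 0.125, 0.0625, 0.125, 0.0625]
luts = build_luts(coeffs)
window = [10, 20, 30, 40, 50, 60, 70, 80, 90]
filtered = lut_convolve_window(window, luts)
```

Because the Gaussian, LoG and Gabor stages differ only in their coefficients, changing the filter in this model amounts to loading a different set of tables.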
CLAP algorithm based edge detection
The CLAP algorithm is essentially a comparator scheme that compares the values at the infimum positions of the scan window with each other, as discussed in Section 3.4. The maximum and minimum gray scale values in the basis structure are found, and the difference between them is taken as the distance value. If the distance value is below a threshold, chosen arbitrarily here as 32 units, the central position is assigned the value zero; otherwise the window detects the presence of a boundary pixel and outputs a value of one. The output of the edge detection is therefore a binary image. In the example considered, the first and fourth inputs differ by more than 32, which drives the output high, while the output is low for the other inputs. Clock dividers are also implemented for the serial input.
The demultiplexer converts the serial data into a parallel set of data according to the size of the window, and it is implemented in Verilog HDL. The proposed final architecture of the retinal module is shown in Figure 7. The various filter stages are realised in a reconfigurable architecture, such that the hardware reconfigures itself as the different filter modules under the control of the counter module.
In the first few clock cycles the whole system configures itself as a Gaussian filter. After nine responses have arrived from the Gaussian module, the multiplexer selects the LoG kernel. The same procedure is followed for the three subsequent Gabor kernels. The outputs of the three Gabor kernels are added together and passed to the edge detection module. In the final edge detection module, the pixels in the mask are compared in five parallel stages. If any of the five basis polygons mentioned in Section 3.4 is found to be present, the central pixel is considered an edge pixel. The hardware implementation modules of the clock generator, pixel router and comparator are shown in Figure 8.
Results and discussions
Each filtering stage and the final edge detection module were modelled in Verilog HDL. Each module was simulated using ModelSim SE 6.5, and synthesis was performed using Altera Quartus II. The target FPGA was the Altera Cyclone II EP2C20F484C7. The Berkeley Segmentation Dataset (BSDS) is used for the performance evaluation of the edge detection scheme. The retinal edge detection output is compared with the ground truth edge map to find the number of true positives, true negatives, false positives and false negatives in the output image. Using these observations, an ROC curve is plotted and examined for both the rectangular and the hexagonal tessellation schemes, as elaborated in Section 5.2.
Verilog HDL simulation results
The results of the various retinal processing stages of the cascaded filter banks are discussed in this section. ModelSim SE 6.5 is used for processing, and the output files are displayed in Matlab. The input image is a half-pixel-shifted mimic hexagon image. The first stage is a Gaussian smoothing filter that removes high-frequency noise, and the LoG filtering stage that follows it performs image enhancement. The outputs of the three parallel Gabor filters at angles of 0°, 60° and 120° are shown. The final feature-extracted image and the edge detection output are shown in Figure 9.
ROC based edge detection performance evaluation
In signal detection theory, an ROC curve is a graphical plot that illustrates the performance of a binary classifier system as its discrimination threshold is varied. Plotting the ROC curve is one of the most effective methods of evaluating the performance of edge detectors. Tables 1 and 2 show the numbers of true and false detections, i.e. the pixel counts after the edge detection process. Ten images from the Berkeley Segmentation Dataset are taken along with their ground truth detections. The percentage of true detections is found to be better with the proposed architecture (the retinal edge detection module). The original aim of ROC analysis was to focus on positive test results, both true positives and false positives. From the ROC curve (Figure 11), it was observed that for the hexagonal sampling grid the percentage of false positives is lower at every value of unmatched ground truth edges. A property of the ROC curve is that the more closely the curve follows the left-hand border and then the top border of the ROC space, the more accurate the test. The ROC test thus shows that edge detection on the hexagonal grid performs better than edge detection on the rectangular grid.
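The counts that feed such an ROC analysis can be computed by a pixel-wise comparison of the detected edge map with the ground truth. The sketch below uses the standard definitions of true positive rate and false positive rate and assumes a strict pixel-to-pixel match with no spatial tolerance; the exact matching rule and axis definitions used for the reported plots may differ. The function names are hypothetical.

```python
def confusion_counts(detected, ground_truth):
    """Count true/false positives and negatives for two binary edge maps of equal size."""
    tp = fp = tn = fn = 0
    for det_row, gt_row in zip(detected, ground_truth):
        for d, g in zip(det_row, gt_row):
            if d and g:
                tp += 1
            elif d and not g:
                fp += 1
            elif not d and g:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def roc_point(tp, fp, tn, fn):
    """Return (false positive rate, true positive rate) for one threshold setting."""
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return fpr, tpr


counts = confusion_counts([[1, 0], [1, 1]], [[1, 0], [0, 1]])
point = roc_point(*counts)
```

Repeating the calculation for a range of edge detection thresholds gives one point per threshold, and joining these points traces the ROC curve for a given sampling grid.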
FPGA implementation results
Synthesis of the architecture was performed using Quartus II software targeting the Cyclone II EP2C20F484C7 FPGA. The details of all the hardware modules were elaborated in Section 4. The results for the convolution filter module are given in Table 3. The results show that the convolution module for hexagonal sampling based operation requires less hardware than the module for rectangular sampling based operation.
ASIC implementation results
An ASIC solution is a collection of silicon, IP and design methodology, tightly defined together and aimed at shortening the design cycle and minimising the development cost of complex systems. This is done by addressing the areas in the construction of an ASIC that have the greatest impact on the design schedule. Hence, an ASIC implementation of the neighbourhood selection module, the CLAP based edge detector and the convolution filter module is attempted in this work using Synopsys Design Vision. The target library used for the analysis was the 90 nm library sc_max, in which the area, delay and power of each module were analysed. The ASIC implementation gave better results than the FPGA implementation. The area utilisation, clock frequency and power consumption of the neighbourhood selection module, the CLAP edge detector and the convolution filter module are given in Tables 4-6 respectively. The implementation details of the final retinal encoder module are given in Table 7. These analyses were performed in order to validate the efficiency of the ASIC design of the proposed architecture, with power consumption expressed in microwatts and the clock period in nanoseconds.
Conclusions
A hardware architecture for a bio-inspired retinal encoder on a hexagonal lattice, which had not been proposed previously, has been presented. The mimic hexagon structure obtained with the half-pixel shift scheme reduces the data loss problems of the resampling stage. A hardware-efficient method for realising a convolution filter on the hexagonal grid has also been proposed. The cellular automata based edge detection scheme reduces the hardware overhead of the edge detector implementation. The edge detector performance has been evaluated using ROC plots on Berkeley Segmentation Dataset images, and the performance was found to be better for the hexagonal image processing scheme. The hardware utilisation of the encoder was also found to be lower for hexagonal operations. The retinal encoder can be improved further by extending the architecture to a multi-layer hexagonal mask for convolution filtering. The implementation can also be improved if the limitations in capturing and displaying images on a hexagonal lattice are resolved.
