The implementation of a reaction-diffusion (RD) cellular automata (CA) model on an FPGA is presented. The model generates Turing-like patterns, e.g., striped and spotted patterns observed in marking patterns over animal skins, human fingerprints, etc. Moreover, this model has simple dynamics and generates striped or spotted patterns at its equilibrium that is reached within few cycles, which implies that the model is suitable for hardware implementation. To this aim, a digital processor architecture based on the RDCA model is proposed. Finally, the self-organized patterns generated on the FPGA by the implemented processor are presented.
Introduction
Self-organized spotted or striped patterns appear in nature, e.g., the surfaces of animals and fish, fingerprints, in colonies of bacteria and other systems [1] [2] [3] [4] [5] [6] . Alan Turing advocated the concept of "diffusion-driven instability" [7] regarding these patterns and their forming processes. This concept indicates that transitions are developed from homogeneous states to spatially inhomogeneous stable states in these systems. The time development of states is described by a sum of reaction and diffusion terms in a system. Reaction represents the production or execution of the state in a local area by summing activators and inhibitors. Diffusion represents a transport process that tends to inhibit any inhomogeneity in the neighboring region. RD equations which are described as partial differential equations exhibit striped or spotted patterns at the equilibrium state.
Various approaches which exploit RD dynamics while reducing nonlinearity to limit computational resources have been investigated. As examples, discretized RD space with cellular automata, discretized parameters and values of dynamics, or both have been presented [8] [9] [10] [11] [12] [13] [14] [15] . In this paper, a RDCA model [16] which has time-dependent diffusion dynamics and discretized reaction dynamics has been used, which has simple dynamics and is suitable for HW implementation. Moreover, we propose a digital processor architecture which generates striped or spotted patterns. Subsequently, we demonstrate that the processor implemented on an FPGA generates self-organized patterns.
This paper is organized as follows. Section 2 describes the RDCA model. Section 3 explains the architecture of the digital RD processor based on the RDCA model. FPGA implementation results are presented in Sect. 4 . Section 5 is devoted to the summary and discussion of an application example using the RDCA model.
A reaction-diffusion cellular automata model
In the RDCA model, each state of a cell is determined by a sigmoid function and a weighted-sum computation that takes interaction with four adjacent cells into account. The weighted-sum computation implies that activators and inhibitors diffuse in individual diffusion fields, and they are convoluted in each of the cells. Each state in the cells is computed as the difference between the states of activators, u, and inhibitors, v, for each cell (x, y) in the field. For the digital HW implementation, the diffusion dynamics is discretized as 
where n represents the time step, β represents the measure of steepness of the function, and c represents an offset value. 1(c) , that corresponds to the difference of activators and inhibitors. Finally, this difference is amplified by the sigmoid function in Fig. 1(e) . "One update" is defined as the application of the resulting amplified output waveform as input of a new processing.
Architectures of a digital RD processor

One-dimensional RD processor architecture
The architecture of a digital RD processor that is suitable to FPGA implementation of the RDCA model is presented in the following. A one-dimensional RD processor architecture if first presented in Fig. 2 , which handles 8 bit value and 120 addresses. As indicated in Figs. 1(b) and 1(c), an input signal is diffused by different diffusion coefficients. Therefore, two blurring filters which have different step functions are implemented in the processor. After double filtering, the blurred input signals and offset value c are subtracted. Finally, the amplified result updates the main memory value. Wave patterns are generated from repeated updates.
The nearest-neighbor kernel shown in Fig. 3 (a) is used and expanded to develop the blurring filters. Expanding the kernel range enables emulating a stronger intensity of the diffusion coefficient. The outputs out 0 in kernel 1 and 2 are respectively given by
where a represents a cell such that a 0 is centered within the filter kernel. These equations correspond to Eqs. (1) and (2) which explain the diffusion dynamics of activators (u) and inhibitors (v), respectively. Figure 4 shows the architecture of the filter module that uses filter kernels 1 and 2. The values originating from cells which are read from the main memory are shifted right, and stored into registers (Z −1 ), one by one and obey a clock sequence. The input signal is blurred by filters with kernel 1 or 2 which share some registers in order to save area. Because the blurring filters need different numbers of delay cells, the final subtraction process needs timing adjustment registers for kernel 1. Then, these outputs of the filters are subtracted. This subtraction process corresponds to the reaction dynamics (Eq. (3)) without amplification by a sigmoid function and subtraction of an offset variable c which is set to 0. Subsequently, the subtracted signal is amplified by a LUT operation, which functionally performs the sigmoid operation in (Eqs. (3) and (4)). Finally, the amplified signal is stored into the main memory. Hence, the proposed architecture has almost an identical number of controllable parameters as the RDCA model, and only differs from the precision of calculations, i.e., the RDCA model treats floating-point numbers and the FPGA implementation treats fixed-point numbers. Figure 5 shows the subtraction timings, where the output of kernel 1 is delayed by two clock cycles. Bus input represents streaming data which is read from a main memory at each falling edge of clock CLK. The kernel 1 valid signal defines the timing of the valid output data from the filter kernel 1. When kernel 1 valid signal is low, data streaming out of kernel 1 is invalid. As shown in Figs. 3(b) and 3(c), the center cell of kernels 1 and 2 are shifted in a proportion that corresponds to two clock cycles, in hardware. Therefore, the delayed output of kernel 1 bus has its timing adjusted for subtraction from the output bus of kernel 2. The kernel 2 valid signal defines the timing of the valid output data from the filter kernel 2. The outputs of kernel 1 and 2 are subtracted when the output of kernel 2 is valid. Subsequently, the filter module valid signal rises. Finally, the output of filter module bus carries data from the blurring filter. 
Two-dimensional RD processor architecture
The architecture of a two-dimensional RD processor which is an expanded version from the onedimensional RD processor is presented in the following. Figure 6 (a) shows a two-dimensional blurring filter kernel which involves four nearest-neighbor cells. The two-dimensional kernel is implemented in the form of two one-dimensional kernels which handle the blurring process into the x and y directions, respectively, as shown in Figs. 6(b) and 6(c). This technique allows to apply the abovementioned onedimensional RD processor to the x and y kernels. In order to generate a blurred 2D image, the results from blurring filters into x and y directions are added. Figure 7 shows the architecture of a two-dimensional RD processor. Using a true dual-port SRAM enables simultaneous reading of two values from different memory addresses. To blur an image data, two one-dimensional blurring filters are required, which are connected to the SRAM outputs and handle the x and y directions. Subsequently, the filtered results are stored into temporary memories to adjust the timing of the 2D blurred image generation process. The sum of values in these temporary memories are stored into the main memory as one update, as shown in Fig. 7 . By repeating updates, 2D striped patterns are formed. The architecture implements two operational steps. First, two values are simultaneously read from the main memory. After blurring and amplifying these values, the filtered results are stored into their temporary memories, respectively, as shown in Fig. 8 . In order to read the memory in the x direction, a row address counter is incremented. On the other hand, to read the memory in the y direction, a column address counter is increased by N which is determined as the width of an image, in pixels. When the memory address reaches the bottom of a column, its address changes to the top of the next column. These read operations are repeated until reading the final pixel.
Second, values are read from each temporary memories (x and y) in the x direction by using the same address counter which increments addresses, as shown in Fig. 9 . After adding these values, the results are stored into the main memory. A finite-state machine controls the address counters that generate the read and write memory addresses, and the read and write enable signal governing the memories operations. Figure 10 shows the timing chart pertaining to one update, which consists of a parallel blurring process and summation of blurred data on the 2D RD processor. The blurring process signal starts the blur filtering process and the update signal triggers the addition of blurred data in x axis and y axis. The Data out main memory signal represents streaming data which consists of a 45 × 45 pixels image initially stored in the two-port SRAM. The image size is supported by the internal FPGA memory capacity. Fixed boundary operation conditions are set. Data in memory x and Data in memory y signals identify the blurred data streams that are stored into x and y memories. When the last (2025th) pixel is stored into these memories, the blur filtering completes as shown in Fig. 8 . Subsequently, the adding process starts as shown in Fig. 9 . The Data in main memory is a data stream which is formed by summation of corresponding pixels in memory x and y. One update process is completed when the data is entirely stored into the main memory.
Implementation of the RD CA processor
We implemented the RDCA processor on an Altera Cyclone IV platform operating at a frequency of 5 MHz. A one-dimensional RD processor based on the above mentioned architecture has first been implemented. Table I Figure 11 shows temporal evolution results of the RD process, which generates waves from a step function input. The x axis represents the memory address, and the y axis represents the update number. The wave pattern in the first update corresponds to the wave pattern in Fig. 1(e) . By repeating updates, wave patterns spread from the center cell.
Subsequently, a two-dimensional RD processor has been implemented. Table I shows the specifica- tion of the 2D RD processor. To blur an image in parallel, the 2D RD processor requires two filters and amplification units. Therefore, the total amount of logic elements, and combinational functions are approximately doubled with respect to the 1D RD processor resources. In addition to the main memory, two temporary memories are required, that are used to adjust summation timings. Figures 12(a)-(c) show results which are obtained from an impulse input located in the center of the image. Figure 12(d) shows the 3D result in the second update. The pixel values are represented by the intensity (contrasting density) in the z axis direction. First, an impulse is applied to the center as an initial input. Because the amplifier has steep slope to generate RD patterns in a small area, dotted patterns are formed by updating several times. Therefore, from an impulse input, striped patterns will be observed in a RD processor which has extended RD space and a smooth slope amplifier module. Figure 13 shows the results which are obtained from a step function input ( Fig. 13(a) ). Though the initial input covers all memory addresses, values corresponding to one half of 8 bit are invalid and treated as such after the first update. By repeatedly updating, a wave pattern expands into the right and left directions as shown in Figs. 13(b)-(d) . Figure 14 shows the results which are obtained from a triangle step function input. By updating several times, a wave pattern expands to diagonal directions as shown in Figs. 14(b)-(d) . After the fifth update, the wave pattern stabilizes into a state as shown in Fig. 14(e) . Figure 15 shows the results which are obtained from a random input. By updating several times, a random wave pattern is gradually formed as shown in Figs. 15(b)-(d) . After the fifth update, a wave pattern stabilizes into a state as shown in Fig. 15(e) . The formation of wave patterns by the implemented RDCA processor is confirmed from these experiments. Figure 16 shows a comparison of RD final results (Figs. 13 and 14) , operating the RDCA formal model (second column) and the proposed hardware architecture from two identical initial conditions. The mean squared error (MSE) metrics is adopted to compare the results in a quantified manner and is presented in the fourth column. The MSE is defined as
where n represents the number of pixels in each image,Ŷ represents the estimated image and Y represents the ground truth image. 8bit values within valid 27 × 27 pixel sub-windows are considered in the calculation (a full-window size is 45 × 45 pixels). Hence, the maximal MSE value equals 65,025. As observed from null MSE values in the first and second rows, identical RD patterns are generated by the RDCA model and the RD processor. Figure 17 shows a comparison of RD trials towards reaching stable state, and operating the RDCA formal model as well as the proposed hardware architecture from two identical initial condition patterns consisting of random noise (Fig. 15) . The RDCA simulation results (Fig. 17(a)-(e) ) and the FPGA measured results (Fig. 17(f)-(j) ) are presented as a function of the number of RD updates. After the first update, the MSE is observed at 27,942. By repeating RD updates, the MSE increases. Increasing from the fourth update, the MSE is observed at 29,882 which corresponds to an approximate 46% of the maximal theoretical MSE value of 65,025. This difference arises from a discrepancy in the number representation and calculation between the computer simulation (32bit) with respect to the proposed model implemented on FPGA (8bit). Moreover, errors originate from the floating point representation and calculation used in computer simulations that contrasts to the fixed point calculation used on the FPGA; these errors are further expanded by the sigmoid function. Consequently, significant errors (e.g., (255 − 0) 2 or (0 − 255) 2 ) in the sigma term in Eq. (7) may occur when striped patterns slightly shift from their high-resolution geometrical expected location. Nevertheless, the nature of the pattern as well as its general and local shapes are preserved.
Summary
The hardware architecture of a Reaction-diffusion processor which uses an RDCA model is presented and has been implemented on an FPGA. The RDCA model which is considered in this paper has simple dynamics and reaches an equilibrium of striped pattern within few updates. The model had earlier been implemented in analog CMOS to repair fingerprint images, and has been confirmed suitable for hardware-based applications. A novel steganography method based on Reaction-diffusion had been earlier shown an efficient application using the reaction-diffusion phenomenon [17] . In RD steganography, the RD process is applied to improve data security after hiding a plain text within a random-dot image. The 2D RD processor presented in this paper is suitable to support the realization of RD-based steganography. Furthermore, this hardware can process large images much faster than software processing, thereby opening the way to real-time usage of 2D RD applications.
