The real-time encryption of pictures is an important subject for many applications, e.g. television broadcast stations, network security, etc. This paper shows how the previously introduced SCAN encryption method can be implemented easily using a binary neural-network autoassociative memory.
Introduction
For many applications, e.g. pay TV broadcast stations, cheap real-time encryption of pictures is an important subject. Here, the transformation of parallel accessible picture elements (pels) to a sequential TV signal can be used to encode it by a special arrangement of the pel sequence. This corresponds to a special scan order of the picture. Since a picture of $n \cdot m$ pels has $(n \cdot m)!$ scan orders, an important feature of the encryption is simplicity. For this kind of problem many schemes have been proposed, for instance a method based on the SCAN context-free language for pyramid data structures [Alex95]. In previous papers, the language has been mathematically defined [Alex89] and a parallel implementation has been proposed [Bour89]. Here we investigate the implementation by specific properties of a neural network model.
0-8186-7728-7/96 $5.00 © 1996 IEEE
The SCAN method
Let us briefly review the SCAN method for picture encryption. The main idea consists of dividing the picture into subpictures. Each subpicture is treated as a picture element (pel). On the next level, each pel can be subdivided into smaller pels, and those again, repeating the division until the pels are reduced to pixels. Hereby, every division defines a level $i$ of square subpictures with $n_i = a_i \cdot a_i$ pels of a certain size. In figure 1 an example of three levels is shown with $a_1 = 2$, $a_2 = 4$, $a_3 = 2$.
On each level, we have a certain scan order for the pels, marked by dotted arrows in figure 1. If we denote each order by a symbol $L_i$, we obtain for the set of scan orders a set of symbols, an alphabet $\{L_i\}$ for the SCAN language. A complete pyramid of $N$ layers is denoted by the expression $L_1 a_1 \# L_2 a_2 \# .. \# L_N a_N$ with $a_i$ being a power of 2. In our example of figure 1 this is denoted as B2#A4#X2.
The sequential algorithm passes recursively from the top level of the pyramid to the bottom and back again, scanning the pels at each level by the appropriate strategy $L_i$. Thus, the iterative application of a scan order from one level to the next higher level generates a certain sequential order of pels, and finally, of pixels.
In our example of figure 1, level 1 has $n_1 = 4$ pels (index 0 .. Since this ordering is deterministic, a parallel scheme can be devised by successively expanding the scan ordering of pels into a scan ordering of pixels [Bour89]. Here, the scan ordering is performed by the parallel transform of the whole set of specially ordered pel indices of one level to the relative pixel index at the next level. At the highest level, the relative pixel indices are the ones of the whole picture, and thus are absolute.
The neural network model
For a real-time application like the encoding of TV signals, the index transformation mechanism must be implemented by a simple, high-speed module. In principle, this module must be able to produce a sequence of arbitrary stored numbers on the input of a keyword, e.g. "X" or "R". The conventional solution would provide a high-speed signal processor working on a RAM. For a picture of $1024 \cdot 1024$ ($= 2^{20}$) pixels we will have to deal with $(2^{20})!$ possible encodings. Since the processor cannot store this huge number of sequences in RAM, they have to be produced by a PROM-based algorithm, which involves additional overhead and demands a processor fast enough to produce the sequence at a constant bit rate. For a picture of $2^{20}$ pixels and a non-interlaced refresh rate of 50 Hz this comes to 20 ms for about $10^6$ pixels, or 20 nanoseconds per pixel address. Since this is in the order of the picture RAM read cycle time, the necessity for a cheap, non-processor based hardware solution is obvious. This paper proposes a new approach for this kind of problem, using binary artificial neural networks as base address sequencing modules.
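The timing budget quoted above is simple arithmetic and can be checked with a small sketch. The function name `ns_per_pixel` is hypothetical and not taken from the paper; the sketch assumes one address per pixel and ignores blanking intervals.

```c
#include <assert.h>

/* Time available per pixel address, in nanoseconds, for a frame of
   `pixels` pixels that is refreshed `rate_hz` times per second.
   Assumption: the whole frame period is spent on pixel addresses. */
double ns_per_pixel(double pixels, double rate_hz)
{
    assert(pixels > 0.0 && rate_hz > 0.0);
    return 1e9 / (pixels * rate_hz);
}
```

For $10^6$ pixels at 50 Hz this yields 20 ns; for exactly $2^{20}$ pixels the value is slightly lower, about 19.07 ns.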
The autoassociative memory
The base mechanism of the sequencer is the well-known one used in associative correlation memory, see [Koh84]. Let us briefly review this model.
Here, the task consists of storing $M$ tuples $(x(t), y(t))$ of input key patterns $x(t) = (x_1, .., x_n)$ and desired output patterns $y(t) = (y_1, .., y_m)$, $x_i, y_i \in \Re$, in such a way that the output is recalled whenever the input key pattern is fed into the system.
The correlation associative memory consists mainly of a matrix $W$ of real-valued weights $(w_{ij})$ between input lines $x_j$ and output activity lines $z_i$. Very often, the suppression of activity noise is obtained by a nonlinear output function $y_i = S(z_i)$.
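The storage is obtained by the learning rule for the weight matrix after the t-th presentation of the tuples

$$w_{ij}(t) = w_{ij}(t-1) + c(t)\, y_i(t)\, x_j(t) \qquad (3.1)$$

which gives, after all $M$ tuples have been presented,

$$w_{ij} = \sum_{t=1}^{M} c(t)\, y_i(t)\, x_j(t) \qquad (3.2)$$

After storing the patterns, the recall process can take place. If we encode all key patterns orthogonally (i.e. $x(k)^T x(p) = \alpha$ if $k = p$, else zero), on the input of a pattern $x(k)$ the activity $z$ will become

$$z = W x(k) = \sum_{t=1}^{M} c(t)\, y(t)\, x(t)^T x(k) = c(k)\, \alpha\, y(k) \qquad (3.3)$$

If we choose appropriate constants $c(t) = [x(t)^T x(t)]^{-1}$ in eq. (3.1), we will obtain directly the output pattern $y(k)$ associated to the input $x(k)$.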
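The storage and recall steps of the linear correlation memory can be sketched in a few lines of C. This is only an illustration under assumptions: the sizes `N_IN` and `N_OUT` and the function names `cm_store` and `cm_recall` are hypothetical, and the learning constant is taken as $c(t) = 1 / (x(t)^T x(t))$ as suggested in the text.

```c
#include <assert.h>
#include <stddef.h>

#define N_IN  4   /* number of input lines (illustrative size) */
#define N_OUT 3   /* number of output lines (illustrative size) */

/* Learning step for one tuple (x, y): w_ij += c * y_i * x_j,
   with the constant c = 1 / (x^T x). */
void cm_store(double w[N_OUT][N_IN],
              const double x[N_IN], const double y[N_OUT])
{
    double norm = 0.0;
    for (size_t j = 0; j < N_IN; j++)
        norm += x[j] * x[j];
    assert(norm > 0.0);            /* the key must not be the zero vector */

    for (size_t i = 0; i < N_OUT; i++)
        for (size_t j = 0; j < N_IN; j++)
            w[i][j] += y[i] * x[j] / norm;
}

/* Recall: linear activity z = W x. */
void cm_recall(double w[N_OUT][N_IN],
               const double x[N_IN], double z[N_OUT])
{
    for (size_t i = 0; i < N_OUT; i++) {
        z[i] = 0.0;
        for (size_t j = 0; j < N_IN; j++)
            z[i] += w[i][j] * x[j];
    }
}
```

With mutually orthogonal keys, for instance x(0) = (2,0,0,0) and x(1) = (0,4,0,0), each recall returns exactly the pattern stored with that key, because all cross terms vanish.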
The binary thresholded model
This was the base function of the model. Now, if we have $n$ components in the input, we can have at most $M = n$ orthogonal base vectors or input-output tuples. This is not much. We can increase the number of tuples if we also consider tuples $x$ which are not orthogonal to all the already stored ones. Certainly, by the linear activity eq. (3.3) all non-orthogonal components will result in cross-talk or noise between the output activity lines. For binary input and output $x_i, y_i \in \{0,1\}$ we can suppress the additional activity by a suitable threshold $s_i$

$$y_i = S(z_i) = \begin{cases} 1 & z_i \ge s_i \\ 0 & z_i < s_i \end{cases} \qquad (3.4)$$

which has to be activity-dependent for arbitrary input $x$ (see [Bra88]). In case of $|x| = \mathrm{const}$ the noise suppression becomes easier and we can choose a constant threshold $s_i = |x|$. By the introduction of a threshold the recall process becomes a classification: a set of different input patterns is mapped on the same output.
In the case of binary weights, Palm [Palm80] even showed that in the limit of sparsely coded tuples the associative memory has a capacity of 69% of the ordinary RAM equivalent, which is quite effective. Since each binary weight is saturated by only one non-zero contribution, the storage equation (3.2) becomes an OR relation instead of the sum

$$w_{ij} = \bigvee_{t} y_i(t)\, x_j(t) \qquad (3.5)$$

For the sparse coding, a pattern of $k$ digits, each one representing a number in the range $0..d-1$, is transformed to $k$ ones on $k \cdot d$ activity lines. Since the number of binary lines grows considerably by this measure, the step from dense coding to sparse coding can be done only on the chip level. Thus, the kernel memory with its binary weights will be located on chip, accompanied by the encoders. This design is shown in figure 3 with additional feedback lines, explained in the next section.
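The binary model with OR storage and thresholded recall can be illustrated by the following minimal sketch. The sizes and the names `bm_store` and `bm_recall` are hypothetical; the threshold is taken as the activity-dependent value $s = |x|$, and the handling of an all-zero input is a choice made only for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define N_IN  8   /* binary input lines (illustrative size) */
#define N_OUT 8   /* binary output lines (illustrative size) */

/* Storage by OR: w_ij = w_ij OR (y_i AND x_j). */
void bm_store(unsigned char w[N_OUT][N_IN],
              const unsigned char x[N_IN], const unsigned char y[N_OUT])
{
    assert(w && x && y);
    for (size_t i = 0; i < N_OUT; i++)
        for (size_t j = 0; j < N_IN; j++)
            w[i][j] |= (unsigned char)(y[i] & x[j]);
}

/* Recall with the threshold s = |x|:
   y_i = 1 if the activity z_i = sum_j w_ij x_j reaches s, else 0. */
void bm_recall(unsigned char w[N_OUT][N_IN],
               const unsigned char x[N_IN], unsigned char y[N_OUT])
{
    unsigned s = 0;
    for (size_t j = 0; j < N_IN; j++)
        s += x[j];

    for (size_t i = 0; i < N_OUT; i++) {
        unsigned z = 0;
        for (size_t j = 0; j < N_IN; j++)
            z += (unsigned)(w[i][j] & x[j]);
        /* An all-zero input recalls nothing (sketch-specific choice). */
        y[i] = (unsigned char)(s > 0 && z >= s);
    }
}
```

Two keys that overlap in one input line, such as (1,1,0,..) and (0,1,1,..), are still recalled correctly here, because the cross-talk activity of 1 stays below the threshold of 2.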
Sequence generation by autoassociative memory
The associative memory presented so far can also be used to obtain a sequence of patterns synchronized by a clock cycle. For this, the input pattern $x(0) = (\mathrm{key}, y(0))$ is associated with the output pattern $y(1)$.
At the clock cycle, the input $x(0)$ produces the output $y(1)$. Now, this is fed back, so the next input pattern will be $x(1) = (\mathrm{key}, y(1))$. If we have already associated and stored this with $y(2)$, the output will become $y(2)$ at the next clock cycle, and so on. We can close the sequence and make a pattern cycle by defining $y(0) = (0..0)$, because the last pattern $y(M)$ will produce no output, resulting in $x(M+1) = (\mathrm{key}, 0..0) = x(0)$.
We see that it suffices to store tuples of the form $(\mathrm{key}+y(t),\, y(t+1))$ for all $t$ to generate a sequence by a feedback associative memory. This has already been proposed by Kohonen [Koh84].
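The feedback mechanism can be sketched with a small binary memory whose weight rows are bit masks. All names (`SeqMem`, `seq_store`, `seq_step`) and the bit layout are assumptions made for illustration. The sketch also deviates from the text in one point: instead of the all-zero pattern $y(0)$ it uses an explicit start pattern, because with the threshold $|x|$ used here an input (key, 0..0) would be contained in every other stored input of the same key.

```c
#include <assert.h>
#include <stdint.h>

#define N_OUT     8       /* fed-back output lines (illustrative size) */
#define KEY_SHIFT N_OUT   /* key bits are placed above the output bits */

/* row[i] holds the binary weights of output line i as one bit mask. */
typedef struct { uint32_t row[N_OUT]; } SeqMem;

static int count_ones(uint32_t v)
{
    int c = 0;
    while (v) { v &= v - 1; c++; }   /* clear the lowest set bit */
    return c;
}

/* Store the tuple ((key, y_now), y_next) by OR-ing the input into
   the weight row of every active output line. */
void seq_store(SeqMem *m, uint32_t key, uint32_t y_now, uint32_t y_next)
{
    assert(y_now < (1u << N_OUT) && y_next < (1u << N_OUT));
    uint32_t x = (key << KEY_SHIFT) | y_now;
    for (int i = 0; i < N_OUT; i++)
        if ((y_next >> i) & 1u)
            m->row[i] |= x;
}

/* One clock cycle: feed (key, y_now) in, threshold at |x|,
   and return the recalled pattern y_next. */
uint32_t seq_step(const SeqMem *m, uint32_t key, uint32_t y_now)
{
    assert(y_now < (1u << N_OUT));
    uint32_t x = (key << KEY_SHIFT) | y_now;
    int s = count_ones(x);
    uint32_t y = 0;
    if (s == 0)
        return 0;                    /* empty input recalls nothing */
    for (int i = 0; i < N_OUT; i++)
        if (count_ones(m->row[i] & x) >= s)
            y |= (uint32_t)1 << i;
    return y;
}
```

Storing the chain start → 0 → 1 → 3 → 2 → start with 1-of-n coded indices under a single key and then calling `seq_step` repeatedly reproduces the scan order (0,1,3,2) cyclically. The sketch stores one key only; several keys sharing the same output lines can produce cross-talk in this simple form.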
The encryption system
Now we can easily implement the encoding system described in section 2 by the mechanism introduced in section 3.
For a better understanding of the underlying mechanism let us use a simple example, e.g. a B2#R2 scheme. In figure 4 the four pels of the B2 scheme are scanned in the order (0,1,3,2), denoted in the upper right corner of each pel, whereas within each pel the four pixels are scanned in the order (0,1,2,3), denoted in the lower right corner of each pixel. The resulting pixel index and its corresponding binary equivalent are shown in the table on the right hand side of the figure. Note that the four bits of the pixel addresses are divided into two groups: two high bits for the rows and two low bits for the columns. This can be generalized: since the picture has $n \times n$ pixels and $n$ is a power of 2, the number of pixels in each row and the number of rows in the whole picture are both powers of 2. Since we have $n$ lines, we have $\log_2(n)$ bits for the rows (the high bits) and $\log_2(n)$ bits for the columns (the low bits) to form the $\log_2(n \times n) = 2\log_2(n)$ address bits for all image pixels. Each row (and each column) is divided into $a_1$ segments. With $a_1$ also a power of 2, the highest $\log_2(a_1)$ bits will denote the address bits of the first scan level, whereas the next lower $\log_2(a_2)$ bits denote the bits of the second scan level, and so on.
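The address composition for the B2#R2 example can be written down as a short sketch. The two scan orders are the ones given in the text; the function name `scan_b2r2` and the assumption that an index p inside a 2x2 level is numbered row by row (row = p / 2, column = p mod 2) are illustrative and not taken from figure 4.

```c
#include <assert.h>

/* Scan orders of the B2#R2 example. */
static const int ORDER_B2[4] = {0, 1, 3, 2};   /* pel order, level 1 */
static const int ORDER_R2[4] = {0, 1, 2, 3};   /* pixel order, level 2 */

/* Fill addr[0..15] with the pixel addresses of the 4x4 picture in
   scan sequence. Assumption: index p of a 2x2 level is row-major,
   i.e. row bit = p >> 1 and column bit = p & 1. */
void scan_b2r2(int addr[16])
{
    int t = 0;
    for (int i = 0; i < 4; i++) {
        int p1 = ORDER_B2[i];
        for (int j = 0; j < 4; j++) {
            int p2 = ORDER_R2[j];
            /* high bit of each part from level 1, low bit from level 2 */
            int row = ((p1 >> 1) << 1) | (p2 >> 1);
            int col = ((p1 & 1) << 1) | (p2 & 1);
            addr[t++] = (row << 2) | col;   /* row bits above column bits */
        }
    }
    assert(t == 16);
}
```

Under these assumptions the resulting address sequence is 0, 1, 4, 5, 2, 3, 6, 7, 10, 11, 14, 15, 8, 9, 12, 13: each level contributes one bit to the row part and one bit to the column part of the address.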
In figure 5 an overview of the whole system design is shown for the example of figure 1. The system consists of three stages, one for each encoding level. Since we choose the number $n_i = a_i^2$ of elements at each level to be a power of two, each pixel address consists of three distinct parts. Each part is generated in parallel by the binary output pattern of each stage. Please note that the correct sequential pixel address has to be obtained by regrouping the column and row parts of the output into the two domains in order to form the pixel address.
The pixel clock is used as the clock cycle reference and synchronization signal. This is one scan step, corresponding to the innermost loop of the sequential algorithm shown in figure 2. All outer loops are generated by the subsequent binary divider stages, which generate one pulse after a loop has finished, resetting the associative memory by activating the appropriate scan pattern scheme number and triggering the next memory step of the next upper level.
Naturally, the necessary binary and analog circuits for sending and receiving the encoded signal and loading or activating the scan pattern numbers are not of interest and therefore not shown here.
Simulations
For a simulation of the parallel, binary hardware operations we can use the parallel hardware built into ordinary CPUs. Since the binary operations of eq. (3.5) can be implemented by logical operations, we have chosen a highly hardware-independent but very efficient program implementation in the C language. In figure 6, an implementation kernel of the readout operation of the binary associative memory is shown which breaks all binary vectors of input, output and storage matrix rows into pieces of the length of a machine word. All bits in each word are processed in parallel by the CPU. Since the basic operations are very simple, they can be held in the instruction and data caches without problems.
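A word-parallel readout of this kind might look as follows. This is a minimal sketch and not the kernel of figure 6: the names `word_t`, `count_ones` and `readout`, the row-major layout of the weight matrix and the threshold $s = |x|$ are all assumptions made for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

typedef unsigned long word_t;                       /* one machine word */
#define WORD_BITS (sizeof(word_t) * CHAR_BIT)

static unsigned count_ones(word_t v)
{
    unsigned c = 0;
    while (v) { v &= v - 1; c++; }                  /* clear lowest set bit */
    return c;
}

/* w: n_out weight rows of n_words words each (row-major);
   x: input vector of n_words words;
   y: output bits, packed into words. */
void readout(const word_t *w, const word_t *x, word_t *y,
             size_t n_out, size_t n_words)
{
    assert(w && x && y);

    unsigned s = 0;                                 /* threshold |x| */
    for (size_t k = 0; k < n_words; k++)
        s += count_ones(x[k]);

    for (size_t i = 0; i < n_out; i++) {
        unsigned z = 0;
        for (size_t k = 0; k < n_words; k++)        /* AND a whole word at once */
            z += count_ones(w[i * n_words + k] & x[k]);

        word_t bit = (word_t)1 << (i % WORD_BITS);  /* next output bit */
        if (s > 0 && z >= s)
            y[i / WORD_BITS] |= bit;
        else
            y[i / WORD_BITS] &= ~bit;
    }
}
```

The inner loop uses only AND operations and bit counting on whole words, so all bits of a word are compared in a single machine operation.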
For the simulation, a real picture has been encoded and decoded in the described sequential manner. In figure 7, the original 400x400 picture is shown. In figure 8, the partially encrypted picture, coded by the B2 scheme for the high level pels, is shown. We see that the first scan scheme just orders the picture on the level of whole pixel blocks. Figure 9 shows how each block is mixed up by the next level, in this case A5, defined by the order {0,1,6,5,2,7,12,11,10,3,8,13,18,17,16,15,4,9,14,19,24,23,22,21,20}. Finally, figure 10 shows the full B2#A5#15#R8 encrypted picture. Note that in this case a non-standard picture with a side length not equal to a power of 2 was also shown to be encrypted; this is possible in all cases by using the G-SCAN (general SCAN) version of the language [Bour86,Bour93].
Discussion
This paper showed how the SCAN encoding language can be efficiently implemented by ordinary feedback associative neural networks. Furthermore, simulations show that the binary version of these sequence-generating modules can also be implemented by conventional computer hardware using boolean operations, or with microprogrammed ASICs [Bour95]. The short programs can be held in the cache buffer, allowing very fast operations in ordinary RISC computers. Nevertheless, for real-time operations this is still too slow and special VLSI implementations should be considered. They will represent the core of a more complex system which sends and receives the encryption/decryption keys along with the pixel stream. Since the associations are evoked in only one clock cycle, the keys can change rapidly (for instance during the synchronization time period of ordinary TV frames) without causing any delay in the encoding/decoding process. Thus, the corresponding decryption cards are very flexible and cannot be copied by conventional reverse engineering approaches without the special neural network chips, which practically prohibits pay TV decoder piracy.
Fig. 6 Readout kernel of the binary associative memory (C code)
Fig. 7 The original 400x400 picture
Fig. 8 The picture, scrambled by B2
Fig. 9 The picture, scrambled by B2#A5
Fig. 10 The picture, scrambled by B2#A5#15#R8
