Abstract-The Sparse Distributed Memory (SDM) proposed by Kanerva provides a simple model for human long-term memory, with a strong underlying mathematical theory. However, there are problematic features in the original SDM model that affect its efficiency and performance in real world applications and for hardware implementation. In this paper, we propose modifications to the SDM model that improve its efficiency and performance in pattern recall. First, the address matrix is built using training samples rather than random binary sequences. This improves the recall performance significantly. Second, the content matrix is modified using a simple tri-state logic rule. This reduces the storage requirements of the SDM and simplifies the implementation logic, making it suitable for hardware implementation. The modified model has been tested using pattern recall experiments. It is found that the modified model can recall clean patterns very well from noisy inputs.
I. INTRODUCTION
I N the human brain, the long-term memory can store large quantities of information for very long durations (sometimes a whole life span). For example, we can remember telephone numbers for many years through repetitiontraining; this information is said to be stored in long-term memory. It has been found that long-term memory encodes information semantically [1] . The memory is also associative [2] , recalling data when an input pattern is sufficiently close to the stored pattern. There is therefore a long-standing research interest in associative memory models.
The Hopfield neural-network model [3] is attractive for its simplicity and its ability to function as a massively parallel, autoassociative memory. However, it is not suitable for human memory as it is quite limited in its ability to store sets of correlated patterns [4] .
Kanerva's Sparse Distributed Memory (SDM) [5] [6] was developed as an abstract mathematical model of human longterm memory. It is a simple content-addressable memory, with some architectural similarity to the structure of the cerebellum, and is able to store randomly distributed input data quite effectively. However, its efficiency in handling non random data, is poor. In order to improve its performance, the SDM model has been treated and modified as a neural network [7] [8] [9] [10] [11].
Hely et. al. [12] introduced an alternative SDM, the SDM signal model, which retains the essential characteristics of the original SDM, whilst providing the memory with a greater scope for plasticity and self-evolution, including [7] . The most significant modification in [12] is that the address matrix is modified by training samples instead of random initialization. The SDM model has been successfully applied in many applications such as number or letter recognition [13] , and handwriting character recognition [14] . It may also be implemented in hardware [15] [16] because of its parallel architecture and simple writing and reading operations.
Recently, the capability of storing and recalling patterns containing rank-order information has been studied. It has been found that the modified SDM model could efficiently store and recover rank-ordered codes [17] [18] [19] [20] .
In this paper, we try to use the SDM model to extract some patterns from the sample dataset. In comparison with other memory model or data retrieval methods, the pattern or feature recalled by the SDM model is not exactly same as one of the input data. Based on the idea of [12] on using training samples in address matrix, we introduce a more direct method to create the address matrix from training data. We also introduce the tri-state logic rule in the content matrix, which reduces the storage size of the matrix and simplifies the learning rule. The new model is consequently much more suitable for efficient hardware implementation.
The rest of this paper is organized as follows: Section II introduced Kanerva's SDM model, which forms the base for the modified model. Section III introduces the modified SDM model where we use several methods to improve its performance. Some simulation results are presented in Section IV. Section V concludes the paper.
II. SDM MODEL
Kanerva's SDM model is illustrated in figure 1. In this model, there are two main matrices: the address matrix, A, and the content matrix, C. In the original SDM model, matrix A is a collection of random (possibly sparse) binary vectors. The content matrix C in the SDM model can be represented using the equation 2.
where Cij (i == 1,2, ... , N, j == 1,2, ... , M) are integers.
Therefore, matrix C has size N x M. In addition to these two matrices, there are some control parameters in the model. The SDM model has several operational steps.
A. Initialization
The two matrices are initialized as follows: C is zeroed, and A is randomly initialized. The ratio of number of 1 compared with number of 0 during the initialization of A is denoted as p (typically set to 0.1 or 0.5). This random initialization of the address matrix A works very well if the stored data is random, or at least fairly uniformly distributed across the input space. However, the performance on nonrandom data is very poor, as only a small part of the address matrix actually get used.
B. Writing Operation
Like a computer memory, the writing operations save data in the memory model. The SDM model modifies its content to store this information. There are two input binary vectors for writing: address vector x == (Xl, X2, ... , XM) and data
For the given input address X == (XI,X2,··· ,XM), the
Hamming distances H (x, ai) between x and every row a.;
The content of the matrix C is updated for the active locations based on the results of vector 8 in equation 6 . A '1' in the input data d increases by 1 the values in the corresponding locations and a 0 decreases the value of the counter in the corresponding locations.
where I is defined as equation 5:
III. MODIFIED SDM MODEL
A number of modifications to the original SDM model have been proposed, to address limitations of the model. One significant issue is the address matrix initialization. The performance of the original SDM model is very poor for nonrandom data. Some methods have been proposed to deal with this problem such as using a genetic algorithm [21] . Another disadvantage of the SDM model in comparison with RAMbased neural networks [22] is that the content matrix C is integer valued; this makes it relatively expensive to produce hardware implementations of SDM.
To read a datum according to an input address x, the memory works similarly to the writing operation except that input data register d is not used.
For a given input address x == (Xl, X2, ... , XM), the Hamming distances H (x, ai) between x and every row a.; of matrix A is calculated and thresholded. The vector 8 is defined based on equation 3. Then, a new vector h in equation 7 is created by summing the related elements in the content matrix C. These sums are then thresholded at zero, which generates a 1 in the jth bit if the jth sum is greater than or equal to zero, and a 0 if the sum is smaller than zero. The thresholded value will be the output data in the output register o. That is,
(1) In binary logic the two levels are logical high and logical low, which generally correspond to a binary 1 and 0 respectively . Signals with one of these two levels can be used in boolean logic for digital circuit design or analysis. In threestate logic , an output device can also be high impedance. This is not a logic level, but means that the output does not control the state of the connected circuit.
The tri-state logic between two bits u and v can be represented in the equation 10
A. Sample-addressed address matrix A
Hely et al. [12] introduced the use of sample data in the address matrix; a similar idea was proposed in [8] . This greatly improves the performance on non-random input data. A randomly populated address matrix is able to memorize inputs across the whole address space . However, this space is typically very large , and in most applications only small regions are actually populated with data. We therefore require a very large number of hard locations to support reasonably fine discrimination between patterns.
We use some training samples to initialize the address matrix in a very straightforward fashion, copying the training samples x = ( Xl , X2 , ... , X M ) directly into the hard locations aj ' For each training sample, we first check whether the exact same pattern is already stored in the hard locations, and if so discard it (so that there are no duplicate rows in the trained matrix A) .
In the reading operation, the threshold is set slightly higher because the minimum value in content matrix C is O. In this case , the storage for the content matrix C is at least 8 times smaller than the original SDM model.
B. Hard binary for the content matrix
One of the most costly elements in the original SDM model is the use of integer counters in the context matrix. These require both relatively high storage and expensive increment/decrement logic , when considered for parallel hardware implementation. We have therefore simplified the operation to use a more compact and cheaply implemented approach.
The first approach we tried was to use a binary content matrix C . The data vector d was copied into the active locations of the content matrix during training. The initial content matrix is empty. When a training sample is input, we calculate the Hamming distance between the input and the sample-addressed address matrix A . For the active locations, we replace the content in the active row by the input data binary vector. In this case, during the writing operation, instead of using equation 6, we use the following equation 9.
A. Experiments on dataset 1
During the read operation, value Z is ignored in the sum in equation 7. For the storage of content matrix C, every clement only occupies 2 bits . It is somewhat smaller than an integer counter, that might typically use 8 or 16 bits, and is much more suitable for parallel implementation as the logic operations involved are very simple.
IV. EXPERIMENTAL RESULTS
In order to compare the efficiency and performance of our Tristate SDM model with the original SDM model, we create two simple pattern datasets. Both of these data sets were created based on clean patterns, but all the training and testing samples have some added noise. Consequently, the inputs of the SDM model are always noisy patterns. In order to keep the experiment simple, for every input sample we let the input address vector be exactly the same as the input data vector.
2,J
C. Tri-state logic
The use of "hard" binary in the content matrix C greatly reduces the storage requirements, but also discards the counting information available from the training samples. To capture some of this information, but retaining a highly efficient structure, we have introduced the use of a tri-state rule .
In digital electronics three-state, tri-state, or 3-state logic allows output ports to have a value of logical 0, 1, or Hi-Z. A Hi-Z output puts the pin in a high impedance state, effectively removing the pin from its influence on the circuit. oto 1 or from 1 to O. Figure 3 shows 100 testing examples.
On each image, white pixel stands for value 1 and black pixel stands for value o. In the following experiment , the dimension of the binary pattern is M = 64 and the number of hard locations is N = 4000. These experiments clearly show several facts. First, the original SDM model has some limitations in recalling the non-random patterns correctly (we experimented with a number of different thresholds , and figure 4 represents the best results achieved). Second, figure 5 shows that we can extract clean patterns from noisy inputs if we fill the address matrix with training samples. Third, figure 7 showed that we can get similar results if we use tri-state rules in the content matrix C . In this case, only 2 bits are needed to store the elements of the content matrix, and the addressing logic is simple. Finally, figure 6 showed that we lose some performance if we further reduce the storage of the content matrix into a purely binary matrix.
B. Experiments on dataset 2
In dataset 2, the pattern is more complex than that in dataset 1. The four patterns are 'rectangle' , 'cross', 'plus' and 'circle' . The size of the image are still 8 x 8, so the dimension of the binary pattern is M = 64. We kept the :iI'
• :.
• :. w iM .. II n:
... This paper proposes a modified SDM model with several changes to the original SDM model. First, the address matrix has been initialized by training samples in a straightforward fashion. This significantly improves the recall performance. Second, two different methods have been used to implement the content matrix, in order to save storage for the hardware implementation. The tri-state technique reduces the storage requirement significantly while keeping relatively high performance. The new model is ideal for hardware implementation of a sparse distributed memory. It also can be treated as a pattern extraction machine to extract clean patterns automatically from noisy inputs. We are currently working on experiments to mix different type of patterns together in the training and testing. Our future work will focus on the FPGA implementation of a powerful and parallel SDM model as a pattern extraction and recognition machine . number of hard location at N = 4000, and added 5% noise to every sample as that for dataset I. Figure 8 shows the 4 clean patterns. Figure 9 shows 100 testing examples.
The experimental results are shown in figures 10, 11, 12 and 13 respectively. From these figures, we can clearly see that the results on dataset 2 are quite similar to those on dataset 1. The results in figure 13 are not as good as those obtained by the integer-valued SDM model using a trained address matrix, but are still impressive. The correct patterns are retrieved, but the exemplar reconstruction is imperfect. The recall output of the sample-addressed SDM model. The address matrix is filled using training samples; the content matr ix is integer valued . 
