Introduction
Natural language is a connected system of symbols with several levels of symbol dependencies. Language comprehension and production require these levels to be realized in sequential order. The ability to learn these sequences arises from the activation of cortical structures in an order determined by evolution and individual development [1]. Hierarchical Temporal Memory (HTM) is a machine learning algorithm that mimics the functionality of the pyramidal neurons of the human neocortex that are responsible for sequence learning. An analog hardware implementation of such algorithms that is compatible with edge devices remains an open research problem.
In this paper, we show that the HTM algorithm can be used for symbol order recognition and for learning sequences from character images. To test the performance of the method, we perform system-level simulations that compare sequences of character images. Testing the similarity of character sequences from images, which we refer to as word recognition, is a step towards using the HTM algorithm for sequence learning tasks such as spell checking.
Current HTM hardware implementations based on CMOS-memristor hybrid circuits have proven useful for face recognition and automatic speech recognition tasks [2, 3]. In these HTM architectures, pixel values serve as the sensory input for both face recognition and speech recognition (using extracted MFCC images), and a deterministic learning approach is used to implement the algorithm. Following the algorithm described in [2, 3], we extend this work to the sequence (word) recognition task. The proposed HTM architecture and estimates of the overall power and area consumption specific to the image-based word recognition task are described in Section 3.
HTM for sequence learning
The HTM algorithm for sequence learning is divided into two main parts: the Spatial Pooler (SP) and the Temporal Memory (TM). The HTM SP identifies spatial patterns in the input sensory data and encodes them as sparse distributed representations (SDRs) through the activation of neuron columns. Spatial pooling was originally implemented in four steps: (1) initialization, (2) overlap, (3) inhibition, and (4) learning [4]. The SP design in this work is based on the learning rules and algorithm proposed in [2]. The HTM TM stores learned synaptic values over time and memorizes the patterns that are likely to follow each other. The synaptic weights of the HTM TM are adjusted according to the spatial variations reflected in the training set.
The block diagram of the HTM algorithm used in this work for sequence learning is illustrated in Fig. 1. The character images are applied as input to the data controller, where initial pre-processing forms the image sequences that represent words. These words are then processed by the HTM SP, which performs feature extraction and encoding and converts the images to SDRs. The output data controller, together with the HTM TM, processes the SDRs either to update the weights in the TM during the training stage or to compare the SP outputs of words during the testing stage. The sequence of digits "0123456789" is used to represent a sequence of letters, which is equivalent to a word that the HTM learns to recognize. The image sequences forming the words used in this paper are created from the MNIST database [5], and an example set of generated word sequences is shown in Fig. 2(a). For HTM training, a set of 100 words is created using images from MNIST. The HTM is trained to recognize the words while accounting for spatial variance and differences in writing styles.
In the testing stage, the accuracy of word sequence recognition is tested on word sequences with a different order of characters, as shown in Fig. 2(b). The testing set of images consists of six categories of a word sequence belonging to the same class. Each category has 60 samples and represents a different level of sequence error, giving a total of 360 word sequences with variable errors. The 60 samples of the first category represent the ideal word sequence order without errors, while the other categories contain word sequences with one, two, three, four, and five character sequence errors. To introduce errors into a sequence, several characters are replaced randomly with other characters, which changes the original order of the sequence.
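The construction of the six test categories can be sketched as follows. This is a minimal illustration that operates on character labels rather than on MNIST images; the function names `inject_errors` and `build_test_set`, the fixed seed, and the requirement that a replaced character differs from the original one are assumptions made for the sketch, not details taken from the paper.

```python
import random

def inject_errors(word, n_errors, alphabet="0123456789", rng=None):
    """Return a copy of `word` in which exactly `n_errors` distinct
    positions are replaced by a different character from `alphabet`."""
    rng = rng or random.Random()
    chars = list(word)
    # Pick distinct positions so that the number of errors is exact.
    for pos in rng.sample(range(len(chars)), n_errors):
        choices = [c for c in alphabet if c != chars[pos]]
        chars[pos] = rng.choice(choices)
    return "".join(chars)

def build_test_set(word="0123456789", max_errors=5, per_category=60, seed=0):
    """Map each error count 0..max_errors to `per_category` corrupted words."""
    rng = random.Random(seed)
    return {
        k: [inject_errors(word, k, rng=rng) for _ in range(per_category)]
        for k in range(max_errors + 1)
    }

test_set = build_test_set()
total = sum(len(samples) for samples in test_set.values())
print(total)  # 6 categories x 60 samples
```

In an image-based setup, each label in a generated word would then be mapped to a handwritten digit image of that class.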
Hardware implementation
The overall hardware implementation of the proposed HTM word recognizer consists of the HTM SP, the HTM TM, and the Memristive Pattern Matcher. The analog hardware implementation of the HTM SP is shown in Fig. 4(a). The Receptor Block (RB) selects randomly generated synaptic weights, which are represented as memristors in random states. In the RB, the input sequence of character image features, represented as pixels, is applied to 10 different sets of random weights using the averaging method. The output of each set is fed to the Receptor Block Mean circuit, which produces the final value of the RB. This corresponds to the calculation of the overlap value in each HTM column. In the next step, the threshold value for the Inhibition Block (IB) is calculated by the memristive averaging circuit. The inhibition of the RBs is performed by a comparator that compares the output of each RB with the obtained threshold value. The comparator output is inverted to produce the final binary output of a single IB. With this operation, important spatial features of the image are extracted and irrelevant features are inhibited.
Figure 4(b) presents the analog hardware implementation of the HTM TM. As shown in the first block of the figure, the output of the HTM SP is applied to a comparator that consists of two inverters. The comparator compares each pixel, taken as a feature of a new training image, with the stored training sequence samples. It decides whether to increase the pixel value of the stored training image (if the pixel of the new image is white, 1V) or to reduce it (if the pixel of the new image is black, 0V). The comparator output is either +∆ or −∆. The pixel of the previous image and the ∆ value are added by the summing multiplier.
The summing multiplier consists of an averaging circuit, which calculates the mean of the inputs, and an amplifier, which multiplies the mean value by 2 to perform the summation. The new temporal pattern is saved in a multi-level memory that stores discrete analog values. When the training cycle is complete, the output of the HTM TM is binarized by the thresholding circuit to perform the word recognition.
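The behaviour of the receptor and inhibition blocks described above can be illustrated with a small behavioural model. This is a sketch under stated assumptions, not a circuit-level description: weights are drawn uniformly from [0, 1] to stand in for random memristor states, each image is assumed to be split into pixel blocks with one RB per block, and an RB whose overlap exceeds the mean threshold is assumed to produce a 1 at the IB output. The function names are hypothetical.

```python
import random
from statistics import mean

def receptor_block(pixels, weight_sets):
    """Overlap of one column: average the weighted pixels within each
    weight set, then average the per-set results (Receptor Block Mean)."""
    per_set = [
        mean([w * p for w, p in zip(weights, pixels)])
        for weights in weight_sets
    ]
    return mean(per_set)

def inhibition(overlaps):
    """Compare every RB output with the mean of all RB outputs and
    return one binary value per RB (assumed polarity: above mean -> 1)."""
    threshold = mean(overlaps)
    return [1 if o > threshold else 0 for o in overlaps]

def spatial_pooler(image_blocks, n_sets=10, seed=0):
    """Encode a list of pixel blocks as a binary pattern."""
    rng = random.Random(seed)
    overlaps = []
    for pixels in image_blocks:
        # Random weights model memristors initialised in random states.
        weight_sets = [[rng.random() for _ in pixels] for _ in range(n_sets)]
        overlaps.append(receptor_block(pixels, weight_sets))
    return inhibition(overlaps)

blocks = [[1.0, 1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], [0.5, 0.5, 0.5, 0.5]]
print(spatial_pooler(blocks))
```

Using the mean of the RB outputs as the threshold keeps the inhibition step free of sorting, which matches the averaging-plus-comparator structure of the circuit.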
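The TM update path (comparator, summing multiplier, multi-level memory, thresholding) can be modelled behaviourally as shown below. The step size `delta`, the number of memory levels, the clipping to the range [0, 1], and the 0.5 thresholds are illustrative assumptions; the paper does not state these values here.

```python
def summing_multiplier(a, b):
    """Add two values the way the circuit does: average, then amplify by 2."""
    return 2.0 * ((a + b) / 2.0)

def tm_update(stored, sp_output, delta=0.1, levels=11):
    """Update the stored pattern with one new SP output.

    A white pixel (value >= 0.5) yields +delta, a black pixel -delta.
    The result is clipped to [0, 1] and quantised to `levels` discrete
    values to mimic a multi-level memory (both are assumptions).
    """
    updated = []
    for s, x in zip(stored, sp_output):
        step = delta if x >= 0.5 else -delta
        value = summing_multiplier(s, step)
        value = min(1.0, max(0.0, value))
        value = round(value * (levels - 1)) / (levels - 1)
        updated.append(value)
    return updated

def binarize(stored, threshold=0.5):
    """Threshold the trained TM pattern for word recognition."""
    return [1 if s > threshold else 0 for s in stored]

stored = [0.5, 0.5, 0.5]
for pattern in [[1, 0, 1]] * 3:   # three training presentations
    stored = tm_update(stored, pattern)
print(stored, binarize(stored))
```

Pixels that are repeatedly white across training samples drift upward and survive the final threshold, while pixels that are repeatedly black drift downward and are suppressed.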
In the word recognition stage, the matching of the HTM TM pattern and new HTM SP output is performed by the Memristive Pattern Matcher shown in Fig.4 
Results
To demonstrate the functionality of the proposed HTM system, we conducted a system-level simulation following Algorithm 1 and hardware-level simulations of the SP and TM. The similarity of the original sequence to the sequences with errors was evaluated at the system level, and the area and power consumption of the HTM were calculated with hardware simulation tools.
The mean similarity of each category to the original sequence is provided in Table 1. The mean similarity to the original sequence gradually decreases as the number of errors in the sequence increases. Fig. 3(a) shows the histogram of similarity scores for each category of testing images, while Fig. 3(b) shows the distribution curve of the histogram values. The curve of the category with five errors is skewed furthest towards the left, representing lower similarity scores, while the curve of the sequence with no errors is skewed furthest towards the right, representing higher similarity scores.
Table 2 shows the on-chip area and power consumption for sequential and parallel processing of the HTM configuration blocks. Parallel processing involves the concurrent computation and simultaneous execution of the SP and TM operations for all input pixels. This optimizes the performance of the HTM in terms of speed, but it is not efficient in terms of area and power consumption. Sequential processing, in turn, reduces the on-chip area and power consumption at the cost of reduced processing speed.
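The system-level similarity evaluation could be sketched as follows. The exact similarity measure is not specified in this section, so the sketch assumes one plausible choice: the fraction of matching bits between two binary patterns, averaged over the characters of a word. The function names are hypothetical.

```python
from statistics import mean

def pattern_similarity(a, b):
    """Fraction of positions at which two binary patterns agree."""
    if len(a) != len(b):
        raise ValueError("patterns must have the same length")
    return sum(1 for x, y in zip(a, b) if x == y) / len(a)

def sequence_similarity(reference, candidate):
    """Mean per-character similarity between two words, where each word
    is a list of binary patterns (one pattern per character)."""
    return mean([pattern_similarity(r, c) for r, c in zip(reference, candidate)])

def mean_similarity_by_category(reference, categories):
    """Map each error category to its mean similarity to the reference."""
    return {
        k: mean([sequence_similarity(reference, s) for s in samples])
        for k, samples in categories.items()
    }

reference = [[1, 0, 1, 0], [0, 1, 0, 1]]
one_error = [[1, 0, 1, 0], [1, 0, 1, 0]]   # second character replaced
print(mean_similarity_by_category(reference, {0: [reference], 1: [one_error]}))
```

With this measure, a category with more replaced characters yields a lower mean score, which is the trend the evaluation looks for.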
Conclusion
In this paper, we demonstrated a practical application of HTM circuits for sequence learning from handwritten character images. The system-level simulation of the HTM circuits showed that it is possible to differentiate between the correct sequence of digits and sequences that do not match it. The proposed application of HTM circuits can be used to perform intelligent back-plane information processing in CMOS image pixel arrays.
