In this paper, we introduce a memristor-based pattern matching circuit that realises a hardware-oriented approach to isolated speech word recognition. As distinct from algorithmic solutions, our approach takes advantage of the memristor to memorise and match speech templates in a fast, parallel, low-power and scalable architecture. When tested for isolated word recognition by simulations on the benchmark TIMIT database, recognition accuracies of 94.5% on clean words and 88.0% on words with a signal-to-noise ratio of 5 dB were achieved.
Introduction:
The inability of present computing hardware architectures to learn and process temporal patterns is a major bottleneck in mimicking the intelligent information processing of the human brain. Speech recognition is one such task that exposes the limitations of the hardware, placing heavy demands on memory and speed in the existing von Neumann architecture. To overcome this limitation, innovation is required in developing new pattern matching architectures and devices that can process and memorise multiple information states in a manner resembling the functionality of biological neural networks.
The hardware implementation of pattern matching units has proved to be quite complex in complementary metal-oxide-semiconductor (CMOS) technology, mainly due to scalability issues, leakage currents, parasitics and device mismatch [1]. The memristor offers excellent scaling prospects and operates analogously to the biological synapse in the human brain, which could overcome several of the limitations of conventional CMOS devices [2]. The ability to memorise states and implement threshold logic similar to biological neurons makes the memristor a functionally suitable device for implementing pattern matching functions [3].
In this paper, we propose a memristor-pattern-matching cell that uses the inherent ability of memristors to implement isolated word recognition in speech, in a fast, parallel, low-power, and scalable hardware architecture. A single input bit is compared to a memorized bit to produce a single output bit. The proposed memristor matching system is verified on a speech recognition task by computer simulations of a word-recognition problem on the standard Texas Instruments and Massachusetts Institute of Technology (TIMIT) speech database [4] .
Proposed System Design: In the proposed system, we build templates of the gallery data by memorising each bit in a memristor array. Applying test data to the trained memristors gives the score indicative of how close a test word is to a gallery word. Template matching, which is the most fundamental classification technique, has been proved successful in software implementations. However, template matching requires a large number of memorized templates to perform well on difficult tasks, which can considerably slow down the recognition process. To address the problem of speed, to enable portable applications, and to enable improvement in the recognition rates by increasing the number of memorized templates, we propose a hardware implementation of the template matching through memristors.
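The gallery matching described above can be illustrated in software. The following is a minimal sketch, not the hardware itself: it assumes binary templates, a score equal to the number of positions where the stored bit and the test bit are both 1, and a decision rule that picks the highest-scoring template. The function name `classify` and the example vectors are hypothetical.

```python
import numpy as np

def classify(gallery, test):
    """Score a binary test vector against every binary gallery template.

    Assumption: the recognised word is the template with the highest score.
    Returns (index of best template, array of scores).
    """
    gallery = np.asarray(gallery, dtype=np.uint8)
    test = np.asarray(test, dtype=np.uint8)
    # Each cell contributes 1 only when the stored bit and the test bit are both 1.
    scores = (gallery & test).sum(axis=1)
    return int(np.argmax(scores)), scores

# Hypothetical 4-bit gallery with three templates.
gallery = [[1, 1, 1, 0],
           [0, 0, 1, 1],
           [1, 0, 0, 0]]
best, scores = classify(gallery, [1, 1, 1, 0])
print(best, scores.tolist())
```

In software the cost of this loop grows with the number of templates, whereas in the proposed array every cell is evaluated in parallel, which is the motivation given above for the hardware approach.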
The hardware template matching system proposed in this paper uses a memristor crossbar array of the kind found in NOR memory arrays, but its use is novel and differs from both NAND and NOR configurations. The fundamental difference is the proposed use of the memristor cell as both a processing and a memory unit, which departs from the von Neumann architecture that limits the role of memory units to data storage. The approach is to perform an operation such as AND in time using a one-input cell, rather than in space using the traditional two (or more) inputs. Using the principles shown in this paper, together with standard circuit design, a trained engineer should be able to design a complete system for other pattern recognition problems.
Proposed Pattern Recogniser: In the proposed cell, a single memristor is used both to remember a bit and to perform single-bit matching between features. Memorisation and recognition are implemented using the following logic sequence. During the training (memorisation) phase, when the input to the memristor is logic 0, the resistance (R_i) of the memristor is trained to a high resistance state (R_H); when the input is 1, the memristor is trained to a low resistance state (R_L). During the testing (recognition) phase, when an input of 0 arrives at a memristor trained to R_H, the output remains 0, and if an input of 1 arrives, the output still remains 0. When a logic 0 arrives at a memristor trained to R_L, the output is 0, and only when an input of 1 arrives does the output rise to 1. Table 1 shows that the proposed logic is conceptually similar to a logical AND operation; however, it has only a single input, while the second bit used for comparison is memorised in the memristor unit. Note that the bits arrive at two different points in time, making this a temporal AND similarity operator.
Fig. 1: Basic hardware realisation of the single-bit memristor cell incorporated with programming circuits. The switches S_W close during the writing phase so that the training voltages V_TR bring the memristors to their required resistance states of high resistance R_H or low resistance R_L. The switches S_R close during the reading phase.
As an illustration, let us take a 4-input template vector P = [1,1,1,0], which will train the corresponding four memristors to [R_L, R_L, R_L, R_H] as per Table 1. Now if a test vector T = [0,1,1,1] appears at the input of the trained memristor cell, the output vector will be [0,1,1,0], so that the similarity score for this particular test vector with the trained template vector is 0+1+1+0 = 2. For test vector T = [0,0,1,1], the output vector is [0,0,1,0] with similarity score 0+0+1+0 = 1, and for T = [1,1,1,0] (the template itself), the similarity score of 1+1+1+0 = 3 is the highest, as expected. Figure 1 shows a single-bit memory element incorporating a memristor and reading and writing switches. Here, writing to the memory is the training (or memorisation) phase and reading from the memory is the testing (or recognition) phase. The switches S_W close during the WRITE phase so that the training voltages V_TR bring the memristors to their required resistance states of high resistance R_H or low resistance R_L. The switches S_R close during the READ phase, when the required inputs, such as V_T generated from the test input voltage, are applied to the cell.
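The worked example above can be checked with a small behavioural model. This is only a logic-level sketch: the resistance states are represented symbolically rather than electrically, and the function names `train`, `read` and `similarity` are hypothetical.

```python
# Symbolic resistance states; no electrical behaviour is modelled here.
R_H, R_L = "R_H", "R_L"

def train(template):
    """Training (WRITE) phase: a 1 stores R_L, a 0 stores R_H."""
    return [R_L if bit == 1 else R_H for bit in template]

def read(states, test):
    """Testing (READ) phase: output is 1 only for input 1 at a cell in R_L."""
    return [1 if (state == R_L and bit == 1) else 0
            for state, bit in zip(states, test)]

def similarity(states, test):
    """Similarity score: sum of the single-bit cell outputs."""
    return sum(read(states, test))

states = train([1, 1, 1, 0])  # template P from the text
for T in ([0, 1, 1, 1], [0, 0, 1, 1], [1, 1, 1, 0]):
    print(T, read(states, T), similarity(states, T))
```

The three test vectors give scores of 2, 1 and 3 respectively, in agreement with the values derived in the text.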
To apply the speech data to the proposed memristor cell array, we encode the analogue value x to binary of B bits using a simple digitisation process [5] :
where X is the digitized matrix, k is the index denoting the coding bit (k = 1, 2, 3, . . . , B) , and thresh is a vector containing the thresholds for each bit level:
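A thresholding scheme of the kind described can be sketched as follows. This is an illustration under stated assumptions, not the exact coding of [5]: it assumes that bit k is set when the analogue value exceeds the k-th threshold, and the evenly spaced threshold values are hypothetical. Note that the code indexes bits from 0, whereas the text uses k = 1, ..., B.

```python
import numpy as np

def digitise(x, thresh):
    """Encode analogue values into B bits by comparing against B thresholds.

    Assumption: X[..., k] = 1 where x > thresh[k], else 0.
    The output has one extra trailing axis of length B = len(thresh).
    """
    x = np.asarray(x, dtype=float)
    thresh = np.asarray(thresh, dtype=float)
    return (x[..., np.newaxis] > thresh).astype(np.uint8)

B = 3
thresh = np.linspace(-1.0, 1.0, B)      # hypothetical thresholds: -1, 0, 1
x = np.array([-1.5, -0.5, 0.5, 1.5])    # hypothetical normalised feature values
X = digitise(x, thresh)
print(X)
```

With this choice, larger analogue values switch on more bits, so a feature of 1.5 is coded as [1, 1, 1] and a feature of -1.5 as [0, 0, 0].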
The memristor model used for simulation was proposed by Kvatinsky et al. [6]. This device has a large R_OFF/R_ON ratio (10^3). Figure 2 shows that when voltages of 3.5 V and -3.5 V are applied to the positive and negative terminals of the memristor, respectively, we get a low resistance state (R_L), and if the polarity is reversed, we get a high resistance state (R_H). These voltage levels, ±3.5 V, are used to set the high and low resistance states of the memristor and are represented as ±V_TR, the training voltages.
Word data were taken from sentences sa1 and sa2 of the TIMIT database. White Gaussian noise was added to replicate a more realistic scenario, and features were extracted using standard signal processing techniques that convert the input waveform into a Bark-scale spectrogram [5]. Specifically, the waveform was divided into sections of length 25 ms, windowed with a Hamming window, with an overlap of 10 ms between adjoining sections. Each frame was discrete Fourier transformed using 512 frequency bins, and the frequency spectrum was re-scaled using a Bark scale filter bank [7], converting the speech waveform into 21 frequency "channels". Mean-variance normalisation was applied to the entire sentence using a moving-window filter of size 1 x 200 (2 s) [8]. The utterances were segmented into words using data provided in the TIMIT database. Zeros were inserted after each word to ensure every template was 100 frames in length (1 second of speech), and all 21 x 100 sized templates were digitised using the technique described above.
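The front end described above can be approximated with the sketch below. Several choices are assumptions rather than details taken from the text: a 16 kHz sampling rate (as used by TIMIT), a 10 ms step between frames (so that 100 frames span roughly 1 s), the Zwicker-Terhardt approximation of the Bark scale, and simple rectangular pooling of FFT bins into Bark bands in place of the filter bank of [7]. Noise addition and mean-variance normalisation are omitted, and the function names are hypothetical.

```python
import numpy as np

def bark(f_hz):
    """Zwicker-Terhardt approximation of the Bark scale (an assumption)."""
    f = np.asarray(f_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

def bark_spectrogram(wave, fs=16000, frame_ms=25, hop_ms=10, n_fft=512,
                     n_channels=21, n_frames=100):
    """Return an (n_channels x n_frames) Bark-band power spectrogram.

    Words shorter than n_frames are zero-padded; longer ones are truncated.
    """
    wave = np.asarray(wave, dtype=float)
    frame_len = int(fs * frame_ms / 1000)        # 400 samples at 16 kHz
    hop = int(fs * hop_ms / 1000)                # 160 samples at 16 kHz
    window = np.hamming(frame_len)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)   # centre frequency of each bin
    # Map every FFT bin to an integer Bark band, clipped to the last channel.
    band = np.minimum(np.floor(bark(freqs)).astype(int), n_channels - 1)
    out = np.zeros((n_channels, n_frames))
    n_avail = max(0, 1 + (len(wave) - frame_len) // hop)
    for t in range(min(n_avail, n_frames)):
        frame = wave[t * hop:t * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame, n=n_fft)) ** 2
        # Sum the power of all bins that fall into the same Bark band.
        out[:, t] = np.bincount(band, weights=power, minlength=n_channels)
    return out

# Hypothetical input: 0.5 s of a 1 kHz tone standing in for a spoken word.
fs = 16000
tone = np.sin(2 * np.pi * 1000 * np.arange(fs // 2) / fs)
template = bark_spectrogram(tone, fs=fs)
print(template.shape)
```

The resulting 21 x 100 array has the same layout as the templates described above and would then be normalised and digitised before being written to the memristor array.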
Results and Discussion: Figure 3 shows the efficacy of the proposed method in recognising isolated speech words across various signal-to-noise ratios (SNRs). The method also compares favourably with other methods reported in the literature [9]. The main contribution of this paper thus lies in the low-power, parallel, scalable memristor cell arrays, which combine storage and processing in a single memory element. We have also shown in [10, 11] that logic formed using memristors contributes to significant improvements in on-chip area and power dissipation compared to CMOS-only configurations. The proposed logic therefore not only achieves better recognition accuracies, but does so in a true hardware configuration, increasing the speed of operation compared to state-of-the-art software solutions.
Since memristors can be stacked in a simple cross-point, layer-by-layer array architecture, they enable low-power, high-density 3-D memories that serve as both storage and processing elements. To build the circuit in very large scale integration (VLSI) technology, the memristor array architecture proposed in [2, 12] could be used to separate the logic elements from the data routing network by lifting the configuration bits, routing switches and associated components out of the CMOS layer and making them part of the interconnect. The results in [2, 12] show that memristor crossbars can be fabricated directly above the CMOS circuits and serve as the reconfigurable data routing network.
Conclusion:
The proposed memristor pattern recognition cell represents a practical application of quantised resistive memory devices in the design of automatic speech recognition systems in hardware. The proposed logic is inspired by the cognitive functionality of the human brain, and is an example of mimicking neuronal logic circuits to memorise and process temporal data. In this approach, we take advantage of the flux-induced memory of a memristor to memorise templates in a fast, parallel, low-power and scalable hardware architecture. The main advantage of such an architecture is the time saved by removing the unnecessary fetch-decode-execute cycles of the traditional von Neumann computing architecture, which significantly improves the speed of operation, and not only in comparison with software-based systems. In addition, since the switching devices are silicon based, integration of the proposed logic with CMOS logic gates is practically feasible, and the two can be used in combination to improve the performance of existing solutions.
