Fast IDS Computing System Method and its Memristor Crossbar-based Hardware Implementation
Sajad Haghzad Klidbary, Student Member, IEEE, Saeed Bagheri Shouraki, and Iman Esmaili Pain Afrakoti
Abstract-Active Learning Method (ALM) is a powerful soft computing tool inspired by the human brain's ability to process complicated information. ALM, which is in essence an adaptive fuzzy learning method, models a Multi-Input Single-Output (MISO) system with several Single-Input Single-Output (SISO) subsystems. The Ink Drop Spread (IDS) operator, the main processing engine of this method, extracts useful features from the data without complicated computations while providing stability and convergence. Despite the strong performance of ALM in applications such as classification, clustering, and modelling, an efficient hardware implementation has remained a challenging problem. The large amount of memory required to store IDS planes and the high computational cost of the IDS computing system are the two main barriers to wider adoption of ALM. In this paper, a novel learning method is proposed that builds on the idea of IDS but eliminates the computational cost of the IDS operator. Unlike traditional approaches, the proposed method finds functions that describe the IDS plane, which largely eliminates the need for a large amount of memory. Narrow Path and Spread, the two main features used in the inference engine of ALM, are then extracted from IDS planes with minimal memory usage and power consumption. The proposed algorithm is fully compatible with memristor-crossbar implementation and significantly decreases the number of required memristors (from quadratic to linear in the resolution of the IDS plane). A simpler algorithm and higher speed make the method suitable for applications where real-time processing, low cost and small implementation size are paramount. Applications in clustering and function approximation are provided, which demonstrate the effective performance of the proposed algorithm.
I. INTRODUCTION
Nowadays, various scientific areas, including artificial neural networks and fuzzy logic, are trying to discover and simulate the way in which the human brain processes information.
On the one hand, artificial neural networks investigate the neural networks of living organisms; fuzzy logic, on the other hand, studies the functional properties of the human brain. Both tools have been widely reported with numerous successful applications in function approximation, time series forecasting, and data mining; however, fuzzy logic has a special status [1]. The term fuzzy logic was first introduced by Lotfi A. Zadeh in 1965 [2]. In system modelling, classic methods describe the system by complicated mathematical equations, whereas fuzzy logic describes the system with a set of linguistic "IF-THEN" expressions. With this approach, fuzzy logic not only relieves the burden of exact computation, but also performs better in the presence of uncertainty [3].
In 1997, Shouraki et al. introduced the Active Learning Method (ALM), one of the powerful algorithms in soft computing [4], [5], [6]. ALM is a modelling and control algorithm that is based on the learning capabilities and expertise of the human brain and tends to find a logical cause-and-effect relationship between events. It was first employed to control an inverted pendulum [7]. Compared with its counterparts in modelling (fuzzy algorithms like Sugeno-Yasukawa [8] and Takagi-Sugeno [9] and neuro-fuzzy algorithms like ANFIS [10]), ALM enjoys a simpler structure and a faster training phase [5]. Avoiding time-consuming, iterative and complicated computations, along with fast learning, stability and noise resistance [11], are some advantages of this algorithm. ALM has been successfully employed in numerous applications in control [7], [12], [13], robotics [14], [15], modelling [11], soft computing and artificial intelligence [16], image processing [17] and real-time processing [15]. ALM is inspired by hypotheses which claim that the human brain interprets information as pattern-like images rather than numerical quantities and tends to break down complex problems into simpler subproblems. The main idea of ALM is to approximate a Multi-Input Single-Output (MISO) system with several Single-Input Single-Output (SISO) subsystems. Each of these subsystems, which is modelled by an Ink Drop Spread (IDS) plane, represents the relationship of the output with respect to one of the inputs of the main system. Two main features called Narrow Path and Spread are extracted from each IDS plane. These two features from all IDS planes are then fed to the inference engine in order to approximate the output. The number of required IDS planes, and with it the complexity of the hardware implementation, is determined by the number of inputs and the number of fuzzy partitions on each input. Therefore the bottleneck is an efficient hardware implementation of the IDS operator.
Different implementations have been proposed. Murakami [18] proposed a digital implementation which requires a large amount of memory and a high-level controller to manage data transfer. The pipeline implementation by Firouzi [19] suffers from digital computing problems such as overflow and finite precision. Tarkhan's analog implementation, however, suffers from high power consumption and high complexity of each IDS plane [20]. A memristor is a nonlinear passive two-terminal electrical element whose resistance is controlled by voltage and which can act as a memory resistor. Because IDS planes map naturally onto the memristor-crossbar structure, implementing IDS planes on this structure is efficient and low in cost. For this reason, Merrikh-bayat proposed the first memristor crossbar hardware implementation of the IDS plane [21], and a more efficient implementation was then proposed by Esmaili [22]; however, both implementations still require a large amount of hardware. They need a large amount of memory to store the information of IDS planes and consequently consume more power. For instance, an IDS plane with resolution $N$ on each axis requires $N^2$ memory cells (memristors). This large number of memristors stores the information of only one IDS plane, and as the number of inputs or their fuzzy partitions increases, the number of IDS planes inevitably increases. In addition, the hardware required to implement the IDS operator and the inference engine also adds to the complexity of the overall hardware implementation.
In our proposed algorithm, a novel approach to describing IDS planes is employed, which eliminates the computational complexity of the IDS operator and requires only $3N$ memory cells (memristors) for each IDS plane. This significantly reduces the hardware complexity of the memory unit from $O(N^2)$ to $O(N)$, where $N$ is the resolution of an IDS plane on each of its axes. Three rows of $N$ memristors each are assumed, two of which store the upper bound and the lower bound of the input-output relationship in an IDS plane. The difference between these two rows (or vectors) is inversely proportional to the degree of belief with which the associated input value has been observed, and the last row keeps the values of the output. In the learning phase, when a new training sample is observed, each element of these three vectors is updated by a fraction of its distance to the newly observed sample.
The motivations for the proposed algorithm are as follows:
1) The large number of memory cells required to store IDS planes, together with the fact that traditional approaches use only part of that memory, calls for a novel approach, particularly in the way IDS planes are described.
2) The large hardware required to implement IDS planes and the IDS operator should be optimized to make it suitable for real-time applications.
3) Hardware should be reduced in size and complexity to lower the power consumption.
4) Traditional approaches need a diode with low reverse current in series with each memristor to reduce the feedback effect. The proposed approach eliminates this need.
5) In traditional approaches, the memristive implementation was based on approximate mathematical equations, whereas the proposed hardware implements accurate mathematical equations.
Therefore the proposed algorithm significantly reduces both the amount of required memory and the computational complexity of the IDS operator. Owing to its analog mechanism, it outperforms digital implementations in terms of learning and test speed. In addition, the proposed hardware can be implemented with considerably lower computational complexity.
The remainder of this paper is organized as follows. The main concepts of ALM and the IDS operator are reviewed in Section II, followed by a brief description of the fourth fundamental electrical element, the memristor, in Section III. Section IV describes our proposed algorithm, and its memristor-crossbar hardware implementation is presented in Section V. The evaluation of our proposed algorithm is presented in Section VI. Finally, Section VII concludes the paper and provides suggestions for future work.
II. ACTIVE LEARNING METHOD (ALM)
Processes and computations in the human brain are believed to be qualitative and imprecise in their essence. This observation has led to a research field called soft computing, which deals with uncertainty and simulates the human brain. Within this field, fuzzy systems, which are inspired by human brain capabilities, can be employed as a tool to deal with uncertainty and to provide stability in real-world problems. ALM has adopted a fuzzy approach and has its basis in hypotheses which claim that humans interpret their surrounding environment in an inexact manner: rather than dealing with quantities and numbers, they learn a general characteristic of the environment. Facing a new problem, humans tend to avoid delving into details; instead, a general understanding of the problem is preferred and, if required, minor problems are overcome first. Therefore, in dealing with complicated problems, humans first attempt to find simpler or familiar concepts and then, by discovering logical connections among these concepts, obtain an inexact definition and a general understanding of the problem without much effort. ALM is an adaptive fuzzy learning method that obtains a clearer understanding of the original problem by splitting a complicated problem into several simpler ones. The ALM approach to splitting a MISO system into several SISO subsystems is illustrated in Fig. 1. ALM has two main steps. The first step updates the IDS planes as training samples are observed, and the second step performs the inference process. These steps are discussed in the following subsections.
A. Ink Drop Spread (IDS) Operator
The heart of ALM is the IDS operator, which acts as a form of fuzzy interpolation. Unlike traditional learning algorithms, in which system behavior is represented by complicated mathematical equations, ALM tries to simulate human brain functionality by providing a qualitative and behavioral description of the system. In ALM, the imprecise and fuzzy way in which the human brain learns from events is modeled by the IDS operator. The function of the IDS operator is inspired by the fact that experiences in the data space are continuous in essence. In other words, the space of learning is not confined to the observed samples; in the vicinity of an observed sample, the learned features are still valid, though with less certainty as we move away from that sample. If the domains of all inputs and the output are quantized, then an IDS plane associated with a SISO subsystem is a gridded plane that depicts the projected input and output samples for a specific domain of variation of the other inputs. The effect that the IDS operator has on an observed sample is analogous to dropping ink at the coordinates of that sample. The IDS operator is applied to all observed input and output samples on the IDS plane. Each IDS unit is comprised of two main parts: a 2-D plane that captures the relationship, and the Feature Extracting Unit that extracts useful information from the pattern formed on the IDS plane, as shown in Fig. 2.
Fig. 2. The structure of an IDS unit. Initially the IDS plane is white (empty), and when a new sample is observed the associated ink drop spreads on the IDS plane, as an ink drop spreads on a sheet of paper. Narrow Path and Spread are two features extracted from the pattern formed on the IDS plane by the feature extraction unit. These features are then fed to the inference engine. In this figure a pyramid membership function is assumed.
For simplicity the number of IDS planes is assumed to be equal to the number of inputs (no partitioning is performed on the input domains). When the $k$-th training sample is observed, where $k = 1, \dots, K$ and $K$ is the size of the training dataset, the $x_i$-$y$ plane, which is associated with the relationship between the output and the $i$-th input, is updated according to the projection of that sample on this IDS plane. For instance, assume two distinct samples are observed in a two-input single-output system, for which two white (empty) IDS planes, $x_1$-$y$ and $x_2$-$y$, are assumed. As shown in Fig. 3, the Narrow Path function for the formed pattern is drawn in red, and the inverse of the Spread around the Narrow Path reveals the degree of certainty regarding the occurrence of that observation. One of the advantages of ALM is that only one epoch is sufficient to learn the training set and no initialization is required. The order in which training samples are observed does not affect the final result, and the algorithm gradually learns through interaction with the system. When a new training sample is observed, only local regions, rather than the whole plane, are updated [11]. It is worth mentioning that the radius over which an ink drop is permitted to spread is an important parameter that affects the performance, convergence speed and output error. This user-defined parameter is inversely correlated with the size of the training set and the density of samples, and it is chosen according to the resolution of the IDS plane as well as the density and distribution of training samples. In the original version of ALM this parameter is chosen through an iterative trial-and-error approach, which is time consuming. As the figure shows, the IDS plane can be treated as a 2-D memory unit in which each cell stores the darkness of the associated grid on the IDS plane.
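The spreading step described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the ink drop is taken to be a Gaussian truncated at the spread radius (a pyramid shape would work equally well), and the names `make_ink_drop`, `spread_ink`, `radius` and `sigma` are hypothetical, not taken from the paper.

```python
import numpy as np


def make_ink_drop(radius, sigma):
    """Return a (2r+1) x (2r+1) Gaussian ink drop, zero outside the radius.

    The Gaussian shape and the hard cut-off are assumptions of this sketch.
    """
    offsets = np.arange(-radius, radius + 1)
    du, dv = np.meshgrid(offsets, offsets, indexing="ij")
    dist2 = du ** 2 + dv ** 2
    drop = np.exp(-dist2 / (2.0 * sigma ** 2))
    drop[dist2 > radius ** 2] = 0.0
    return drop


def spread_ink(plane, x, y, drop):
    """Add one ink drop centred on grid cell (x, y).

    Only the cells within the drop radius change, and the drop is
    clipped at the borders of the plane.
    """
    r = drop.shape[0] // 2
    x0, x1 = max(x - r, 0), min(x + r + 1, plane.shape[0])
    y0, y1 = max(y - r, 0), min(y + r + 1, plane.shape[1])
    plane[x0:x1, y0:y1] += drop[x0 - x + r:x1 - x + r, y0 - y + r:y1 - y + r]
    return plane


# A white (empty) 16 x 16 IDS plane and three quantized training samples.
plane = np.zeros((16, 16))
drop = make_ink_drop(radius=2, sigma=1.0)
for x, y in [(5, 5), (5, 6), (0, 0)]:
    spread_ink(plane, x, y, drop)
```

Because each sample only adds darkness to the plane, the result of this sketch does not depend on the order in which samples are presented, and a single pass over the data suffices, in line with the properties stated above.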
B. Inference Engine in ALM
After the IDS operator has been applied to all IDS planes and the Narrow Path and Spread have been extracted from each of them, these features are fed to the inference engine of ALM. The Narrow Path function in the $x_i$-$y$ plane reveals the relationship between the output and the input $x_i$, and the value of the Spread in this plane, compared to the Spread in other planes, indicates the importance of the input $x_i$ in approximating the output. As shown in Fig. 4, the Spread in some input domains is wider than in others. A wide Spread in some input domain implies that in that domain the output is affected more by the other inputs than by $x_i$, because the output has varied a lot while the input $x_i$ held almost the same value. This is the justification for how a MISO system can be approximated by several SISO subsystems in ALM.
If the Spread is wide for all input domains on an IDS plane, fuzzy partitioning of the universe of discourse of the other inputs is proposed. In this case, a separate IDS plane is required for each partition, and by splitting the input domain more knowledge can be extracted from the IDS planes, which leads to lower approximation error. It should be noted that if the dataset is small, excessive partitioning of the input domain will result in high approximation error.
Various approaches have been proposed to find the Narrow Path, including the maximum operator and the averaging operator. To measure the Spread, the width of the ink pattern around each point can be employed. The following is an approach proposed in [11]. It should be mentioned that, because of the fuzzy and inexact nature of the IDS operator in ALM, different approaches for extracting these features differ little in their final results, and the choice among them depends on hardware implementation and processing speed considerations.
For the dataset $D$ that contains $K$ training samples:
$$D = \{(\mathbf{x}^k, y^k)\}_{k=1}^{K}$$
where each input is in $n$-dimensional space, we have:
$$\mathbf{x}^k = (x_1^k, x_2^k, \dots, x_n^k)$$
The first step of ALM is quantizing the input and output domains in each IDS plane. For simplicity, the number of quantization levels for both input and output can be the same and equal to $N$. With this choice the resolution of the IDS plane is $N \times N$. Therefore, we have:
where $i = 1, \dots, l$ and $l$ is the number of IDS planes. The quantized values are as follows:
If $(x_i, y)$ is a point projected onto the $x_i$-$y$ plane, its darkness is denoted by $d(x_i, y)$, and its membership function is a Gaussian with a maximum of one and an appropriate variance, then observing such a sample entails updating the grid on the $x_i$-$y$ plane as follows:
where $r$ indicates the radius of the ink drop spread and the membership function gives the shape of the ink drop. In keeping with the definitions of Narrow Path and Spread, these two features are calculated as follows:
$$\psi_i(x_i) = \Big\{\, b \;\Big|\; \sum_{y=1}^{b} d(x_i, y) \approx \sum_{y=b}^{N} d(x_i, y) \Big\}$$
$$\sigma_i(x_i) = \max\{\, y \mid d(x_i, y) > T \,\} - \min\{\, y \mid d(x_i, y) > T \,\}$$
where $\psi_i$ and $\sigma_i$ are the Narrow Path and Spread on the $x_i$-$y$ plane, respectively. The first equation implies that the Narrow Path value on the $x_i$-$y$ plane for any given quantized input $x_i$ is $b$ if the sum of the darkness values of the grids above the grid $(x_i, b)$ is approximately equal to the sum of the darkness values of the grids below it. The second equation implies that the Spread value on the $x_i$-$y$ plane for any given quantized input $x_i$ is proportional to the effective width of the pattern formed on the column of grids at coordinate $x_i$. In this equation, the user-defined parameter $T$ indicates the minimum acceptable darkness of a grid on the IDS plane for it to count toward the Spread. When a new test sample $\mathbf{x} = (x_1, \dots, x_n)$ is observed, where $n$ is the dimension of the test sample, the Narrow Path and Spread values for this sample are extracted from all IDS planes and then combined to approximate the output as follows:
$$\hat{y} = \sum_{i=1}^{l} \beta_i \, \psi_i(x_i)$$
where $l$ is the number of IDS planes (if no partitioning is performed, also the number of inputs). As can be seen, inference in ALM is a weighted sum of Narrow Paths, where $\beta_i$ is the weight associated with the $i$-th Narrow Path: the normalized inverse of the Spread, or any other decreasing function of it (the wider the Spread in the $i$-th plane, the less we believe in the Narrow Path of that plane, and vice versa). For those input values whose column on the IDS plane is white, the Narrow Path and Spread are set to predefined default values.
Fig. 5 shows the flowchart of the ALM algorithm. In the first step of this algorithm, data sampling is performed intelligently for those IDS planes where data samples are sparse. Initially no partitioning of the input domains is performed. After the IDS operator is applied and Narrow Path and Spread are extracted, the most effective inputs are identified. The model is then built, and if the approximation error is within a user-defined range, the algorithm stops; otherwise, the input domains are partitioned. If, even after partitioning the input domains, samples on the IDS planes are still sparse and the algorithm fails to decrease the error, it returns to the first step, samples more data and builds a new model to decrease the error. In the next step, if the Spread in an IDS plane is greater than a user-defined threshold, the algorithm identifies that IDS plane and more data sampling is performed for that particular plane. Various methods for partitioning the input domain have been proposed; one option is to double the partitioning in each step [11].
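The feature extraction and weighted-sum inference described earlier in this subsection can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the `threshold` parameter, the choice of counting the Spread in grid cells (so that it is never zero), and the decision to simply skip white columns during inference are all assumptions of the sketch.

```python
import numpy as np


def narrow_path_and_spread(plane, threshold=0.0):
    """Extract Narrow Path and Spread for every quantized input of one plane.

    plane[x, y] holds the darkness of grid (x, y). Columns with no cell
    darker than `threshold` (white columns) are reported as NaN.
    """
    n_x = plane.shape[0]
    narrow = np.full(n_x, np.nan)
    spread = np.full(n_x, np.nan)
    for x in range(n_x):
        column = plane[x]
        dark = np.nonzero(column > threshold)[0]
        if dark.size == 0:
            continue  # white column: nothing was learned here
        cumulative = np.cumsum(column)
        # Narrow Path: first y at which the darkness below reaches half
        # of the total, so the darkness above and below roughly balance.
        narrow[x] = np.searchsorted(cumulative, cumulative[-1] / 2.0)
        # Spread: width, in grid cells, of the sufficiently dark region.
        spread[x] = dark[-1] - dark[0] + 1
    return narrow, spread


def infer(features, x_quantized):
    """Weighted sum of Narrow Paths with normalized inverse-Spread weights.

    features: one (narrow, spread) pair per IDS plane.
    x_quantized: the quantized test input for each plane.
    """
    psi = np.array([f[0][x] for f, x in zip(features, x_quantized)])
    sigma = np.array([f[1][x] for f, x in zip(features, x_quantized)])
    valid = ~np.isnan(psi)
    if not valid.any():
        return float("nan")
    beta = 1.0 / sigma[valid]
    beta /= beta.sum()
    return float(np.dot(beta, psi[valid]))
```

With these weights, a plane whose pattern is narrow at the queried input dominates the estimate, while a plane with a wide pattern contributes little, which is the behavior the weighted sum is meant to capture.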
In Aristotle's logic, knowledge is enhanced by extracting more details; this is in stark contrast with ALM, which is inspired by human brain functionality. When faced with a new learning situation, the human brain tends to discard details and learn the overall behavior. In ALM, if sufficient knowledge is not achieved, the original system is split into simpler subsystems so that more knowledge can be gained within each SISO subsystem, since each subsystem deals with only a particular domain of inputs. Eventually, in the inference unit of ALM, the extracted features, Narrow Path and Spread, from all IDS planes are aggregated to approximate the output. The number of fuzzy rules in the inference unit is equal to the number of IDS planes. In this algorithm the most effective inputs in approximating the output are identified, and the algorithm tries to build a model with minimum complexity and the minimum number of inputs. The flowchart uses two thresholds: one is the desirable error threshold and the other defines the minimum acceptable data density on IDS planes.
III. MEMRISTOR
In addition to the three previously known fundamental electrical elements (resistor, capacitor and inductor), in 1971 Leon Chua mathematically proved and introduced the fourth circuit element, which relates electrical charge to magnetic flux [23], [24]. This element was named Memristor, a combination of "Memory" and "Resistor". Prior to 2008 no successful implementation of this element was reported, mainly because the memristive characteristic is only observable at the nanoscale. In mid-2008, the first memristor was successfully realized at HP research laboratory [25]. The memristor has various applications, such as implementing nonvolatile RAM [26], spiking neural networks [27], human learning algorithms [28], [29], digital circuits [30], [31], programmable analogue circuits [32], [33], [34], [35], and pattern recognition and signal processing [36]. Because of its powerful implementation capabilities, low power consumption and the stability of stored data even without power, the memristor has received considerable attention [37].
Fig. 6
A. Physical structure and equations
The memristor is a passive two-terminal element which relates the electrical charge $q$ to the magnetic flux $\varphi$ as follows:
$$d\varphi = M \, dq \quad (13)$$
Equation (13) can also be written as follows, which indicates that the unit of memristance is Ohm.
$$M(q) = \frac{d\varphi/dt}{dq/dt} = \frac{v(t)}{i(t)} \quad (14)$$
The memristor can behave as a dynamic resistor whose resistance changes with the voltage applied to, or the current passing through, its terminals. If the characteristic of the memristor is linear, the memristance $M$ is constant and the element behaves as a simple resistor. Fig. 7 shows the physical and circuit model of the first memristor realized by HP [25]. As can be seen from Fig. 7, the memristor is comprised of a very thin layer of titanium dioxide (TiO$_2$) of width $D$ sandwiched between two platinum (Pt) contacts. The semiconductor itself is comprised of a doped and an undoped region. The width of the doped region is $w$, and the resistance of this region is lower than that of the undoped region. The variable width of the doped region makes the memristor a dynamic resistor. When a voltage is applied to the two terminals of a memristor, the border between the two regions moves, which changes the total resistance. The resistance varies between two extremes, $R_{ON}$ and $R_{OFF}$. When the applied voltage extends the doped region to the full width ($w = D$), the memristance reaches its minimum value $R_{ON}$; conversely, when the applied voltage extends the undoped region to the full width ($w = 0$), the memristance reaches its maximum value $R_{OFF}$. In the mathematical model of the memristor proposed in [25], as in (15) and (16), the electrical field in the memristor is assumed to be uniform.
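$$w(t) = w_0 + \mu_v \frac{R_{ON}}{D} \, q(t) \quad (15)$$
$$M(q) = R_{ON} \, \frac{w(t)}{D} + R_{OFF} \left( 1 - \frac{w(t)}{D} \right) \quad (16)$$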
where $w_0$ is the initial width of the doped region, $\mu_v$ is the average ion mobility and $q(t)$ is the net electrical charge passing through the element. These equations also reveal that current passing in one direction increases the memristance, whereas current in the opposite direction decreases it; if no current passes through the element, the memristance remains constant and the element behaves like a simple resistor. Thus, the polarity, amplitude and duration of the voltage signal are the main parameters affecting the memristance. To read the value of a memristor, it is sufficient to apply a current signal with amplitude below a threshold for a short period of time and read the voltage across the terminals.
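The qualitative behavior described above (current in one direction lowers the memristance, the opposite direction raises it, and zero current leaves it unchanged) can be checked with a small numerical sketch of the linear ion-drift model reported for the HP device [25]. The function name, the forward-Euler integration, the sign convention (positive current widens the doped region) and the parameter values are illustrative assumptions, not values taken from this paper.

```python
def simulate_memristor(current, dt, r_on=100.0, r_off=16e3,
                       d=10e-9, mu_v=1e-14, w0=5e-9):
    """Integrate the linear-drift model and return the memristance trace.

    current: iterable of current samples in amperes (one per time step).
    dt: time step in seconds.
    d, w0: total and initial doped widths in metres (assumed values).
    mu_v: average ion mobility in m^2 / (V s) (assumed value).
    """
    w = w0
    memristance = []
    for i in current:
        # The doped region grows in proportion to the charge i * dt.
        w += mu_v * r_on / d * i * dt
        # The boundary cannot leave the device.
        w = min(max(w, 0.0), d)
        # Doped and undoped regions act as two resistors in series.
        memristance.append(r_on * w / d + r_off * (1.0 - w / d))
    return memristance


# 100 ms of +100 uA, -100 uA and zero current, starting from w0 = d / 2.
m_pos = simulate_memristor([1e-4] * 100, dt=1e-3)
m_neg = simulate_memristor([-1e-4] * 100, dt=1e-3)
m_zero = simulate_memristor([0.0] * 100, dt=1e-3)
```

In this sketch the memristance always stays between `r_on` and `r_off`, which mirrors the two extremes of the physical device.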
Various algorithms and computational frameworks have been proposed for simulating the computational capabilities of the neural systems of living organisms. Almost all of these frameworks suffer from a major drawback: a lack of compatibility between the hardware and the nature of the problem at hand, as in the implementation of neuromorphic systems on FPGAs [38]. The resulting hardware implementations are inefficient, which makes large-scale implementation of these algorithms infeasible. This problem has been resolved to some extent since the realization of the memristor, which matches the synaptic behavior of a biological neuron and can be implemented in a small size. Numerous studies have shifted to this kind of implementation. In the next subsection the implementation of the IDS plane on the memristor crossbar structure is presented.
B. Memristor crossbar implementation of IDS plane
A memristor crossbar is comprised of a series of horizontal wires passing over vertical ones, and at each intersection, a memristor is connected so that by applying proper voltage over any pair of vertical and horizontal wires, the memristor at that intersection can be accessed. In fact, the memristor crossbar is analogous to an array of analogue memory cells.
The analogue values are stored as memristance of memristors.
The advantages of such an implementation are its nanoscale implementation and low power consumption.
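The view of a crossbar as an array of analogue memory cells can be made concrete with a toy Python model. It is a deliberately idealized sketch: the class and method names and the parameter values are hypothetical, each cell is treated as an ideal resistor addressed by its row and column wire, and effects of a real crossbar such as sneak-path currents are ignored.

```python
import numpy as np


class MemristorCrossbar:
    """Idealized crossbar: one stored memristance per wire intersection."""

    def __init__(self, rows, cols, r_on=100.0, r_off=16e3):
        self.r_on = r_on
        self.r_off = r_off
        # Every cell starts in its high-resistance state.
        self.memristance = np.full((rows, cols), r_off)

    def write(self, row, col, value):
        """Store an analogue value, limited to the physical range."""
        self.memristance[row, col] = np.clip(value, self.r_on, self.r_off)

    def read(self, row, col, v_read=0.1):
        """Apply a small read voltage across one row/column pair and
        return the resulting current (Ohm's law)."""
        return v_read / self.memristance[row, col]


crossbar = MemristorCrossbar(rows=4, cols=4)
crossbar.write(1, 2, 1000.0)
read_current = crossbar.read(1, 2)
```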
As discussed in previous sections, the IDS operator in ALM requires spreading ink drops on 2-D memory planes. Because of the close resemblance between the memristor crossbar structure and IDS planes, this structure has been employed to implement the IDS operator. Fig. 8 shows such an implementation, first proposed by Merrikh-bayat in [21], which includes a Memory Unit and a Computation Unit and is designed for spreading ink drops on an IDS plane of a given resolution. In this structure each memristor plays the role of a pixel in a 2-D IDS plane, and its memristance, which can be set by applying a proper voltage, is set to the value of the corresponding pixel (pixel darkness). Resistors with constant resistance are employed to perform the spreading of the ink drop (blurring). In the training phase, when the CLK signal is active, these resistors become part of the circuit.
Memristor crossbar structures are able to store data in analogue form and thus have higher capacity than digital memory structures. Furthermore, these structures preserve data without energy consumption, and faster reading and writing are possible than with digital memories. To model a system with $n$ inputs with ALM, at least $n$ 2-D IDS planes, or memristor crossbar structures, are required. In addition to the memory units, a number of computational circuits are required to extract Narrow Path and Spread. By taking a novel view of the inference step of ALM, at some cost in accuracy, Esmaili [22] relaxed the large computational hardware requirement, yet the number of circuit elements remained large. In the next section, by incorporating a novel perspective on ink drop spread on IDS planes, we fully describe our contribution, which aims at reducing the complexity of computation and hardware implementation.
IV. PROPOSED ALGORITHM
One of the main challenges in the ALM algorithm, and in the IDS unit in particular, is its hardware implementation and the large amount of memory required to store the information of IDS planes. This becomes even worse when partitioning of the input domains is desirable. In the original IDS unit, the required memory is as large as the whole grid of the IDS plane. In [5] an alternative hardware implementation of the IDS unit was proposed that employs two memory vectors in which any pair of Euclidean-adjacent points is replaced by their mean. In this approach the required memory space is reduced by a factor of seven. A digital parallel hardware implementation was also proposed in [19], but it suffers from digital problems such as overflow and limited precision. Therefore, an alternative approach that decreases the complexity of IDS computations is necessary.
The most critical step in proposing a more efficient alternative IDS algorithm is to find a proper mapping to represent the data space. The new mapping should cope with computational complexity, complicated equations and hardware limitations while obtaining the highest possible precision. In addition, the new mapping should be suitable for real-time applications. A memristor implementation is smaller and denser than one built from other circuit elements and also operates with lower power consumption, which makes it a suitable structure for our purpose. In the proposed algorithm three memory vectors are employed to describe and store all valuable features of an IDS plane: two of them store the lower bound and the upper bound of the output, and the third stores the Narrow Path. Three separate units are required, namely the Storage, Learning and Features Extraction units. Moreover, a high-level controller is required to generate the electric pulses used in the learning phase and in reading the memristance of the memristors. Input domain determination and partitioning are done by the human operator, and appropriate electric pulses are then generated in the Chip Programmer to perform the learning phase in the Memristor Chip. Fig. 9 shows the systematic structure of the new IDS unit (Fast IDS). These units, together with the learning and test phases that the algorithm shares with all learning algorithms, are discussed in more detail in the following subsections.
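Fig. 9. Systematic structure of the new IDS unit (Fast IDS). Block labels recoverable from the figure: Host Computer, IDS.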
A. Initialization and Learning Algorithm
Let the training set contain $K$ samples of $n$ independent variables each, where $K$ and $n$ are the number of samples and the number of independent variables, respectively. In our proposed algorithm, three vectors, denoted here by $L$, $U$ and $C$, describe each IDS plane. All of these vectors are $N$ long, where $N$ is the resolution of the IDS plane.
$L$ and $U$ denote the output lower bound and upper bound, respectively. $C$ is a coefficient of each output sample; it is equivalent to the Narrow Path in the original IDS and eventually converges to it. The difference between $L$ and $U$ acts like the Spread in the original IDS and specifies the degree of certainty around each input point. Thus, the initial values of the describing vectors are as follows:
The learning algorithm is as follows. When a new training sample is observed, the values stored in the three describing vectors are updated according to their distance to the output value of the observed sample. As mentioned before, the output value of each sample is quantized between 0 and $N$. For each observed sample, the distance of its output to each of the describing vectors is calculated. These vectors are then updated locally. If $x$ is a point on the x axis and $(x_s, y_s)$ is the training sample, the updating rules of the describing vectors are as follows:
⍺ [ ] and for we will have:
where $G$ is a Gaussian function with variance $\sigma$. Three parameters are used in the updating rules: $\alpha_1$, $\alpha_2$ and $\sigma$. The learning rates $\alpha_1$ and $\alpha_2$ specify the extent to which the describing vectors are bent toward the observed samples. These two parameters also affect the convergence of the describing vectors toward the training samples. The variance $\sigma$ specifies the local neighborhood of an observed sample over which the updating rules are allowed to have an effect. This parameter is set large for sparse datasets, to cover more of the plane, and small for dense datasets. The parameter settings play an important role in performance as well as convergence speed. In sparse datasets these parameters can be chosen to be large to increase the output precision. With small parameters, multiple-epoch learning, as in artificial neural networks, is also available in this algorithm and can result in higher precision. These parameters can be set either by trial and error or by optimization algorithms such as a Genetic Algorithm. Fig. 10 shows an example of updating the describing vectors in a scenario where two training samples are observed.
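To make the role of the three describing vectors and the three parameters concrete, here is a small Python sketch of one possible local update. It should be read as an assumption-laden illustration rather than the paper's update rule: the exact equations and initial values are not reproduced in the text, so the sketch assumes that the bounds start at the extremes of the quantized output range, that the Narrow Path vector starts at mid-range, and that every element moves toward the observed output by a fraction given by a learning rate times a Gaussian of its distance to the observed input. The names `init_vectors`, `update_vectors`, `read_features`, `alpha1`, `alpha2` and `sigma` are hypothetical.

```python
import numpy as np


def init_vectors(n_levels):
    """Assumed initial state: widest possible bounds, mid-range path."""
    lower = np.zeros(n_levels)
    upper = np.full(n_levels, float(n_levels - 1))
    narrow = np.full(n_levels, (n_levels - 1) / 2.0)
    return lower, upper, narrow


def update_vectors(lower, upper, narrow, x_s, y_s, alpha1, alpha2, sigma):
    """Move all three vectors toward the sample (x_s, y_s), locally.

    alpha1 is applied to the Narrow Path vector, alpha2 to both bounds;
    sigma sets the width of the Gaussian neighborhood around x_s.
    """
    x = np.arange(narrow.size)
    g = np.exp(-((x - x_s) ** 2) / (2.0 * sigma ** 2))
    narrow = narrow + alpha1 * g * (y_s - narrow)
    lower = lower + alpha2 * g * (y_s - lower)
    upper = upper + alpha2 * g * (y_s - upper)
    return lower, upper, narrow


def read_features(lower, upper, narrow, x):
    """Test phase: read Narrow Path and Spread-like width at input x."""
    return narrow[x], upper[x] - lower[x]


lower, upper, narrow = init_vectors(64)
lower, upper, narrow = update_vectors(
    lower, upper, narrow, x_s=20, y_s=40.0, alpha1=0.5, alpha2=0.3, sigma=3.0
)
path_value, width = read_features(lower, upper, narrow, 20)
```

Under these assumptions the gap between the two bounds at an input shrinks by the factor `1 - alpha2 * g` each time a sample is observed nearby, so frequently observed inputs end up with a narrow gap, and each plane needs only three vectors of length `n_levels` instead of a full 2-D grid.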
The proposed algorithm provides a novel description of the IDS plane by introducing three vectors. Unlike the original IDS algorithm, which requires a memory matrix to store the information of each IDS plane, the proposed algorithm needs only three vectors of length . This results in a considerable decrease in memory usage. In the test phase, it is sufficient to read the associated elements of the three describing vectors and then feed these values to the inference engine of the ALM algorithm. In this section, the learning phase and convergence of the algorithm were presented through an illustrative example. As discussed in this section, the ink drop spreading process has been removed and thus its computational complexity has been avoided. In the rest of the paper, we use the term "Fast ALM" to refer to the ALM algorithm whose IDS unit is replaced by the Fast IDS unit. The next section deals with the Memristor-CMOS implementation of our proposed algorithm, whose advantages are presented by comparing its circuit complexity with that of traditional implementations.
V. HARDWARE OF THE PROPOSED ALGORITHM
In this section the hardware implementation of our proposed algorithm is presented. The proposed hardware has units for updating IDS planes (learning phase) as well as a Feature Extraction Unit (inference or test phase). As mentioned in previous sections, the memristor is a memory element that is suitable for storing analogue values. Fig. 11 shows a memristor crossbar structure with two rows of memristors for storing two describing vectors and . Unlike the hardware implementation proposed in [21] (Fig. 8), there is no need for resistors. As in the hardware implementation proposed in [22], there is no need for a resistive ladder structure to provide a symmetric blurring effect, a structure that complicates the hardware implementation. In our implementation, to change the values of the memristors for each input point it is sufficient to apply a voltage with amplitude proportional to the desired memristance change. Depending on the user-defined variance, adjacent memristors are also affected by this voltage signal.
Learning algorithms comprise two main phases: a learning phase and a test phase. The proposed hardware must therefore include circuits for both. In the next subsection, the circuit required to perform the learning phase is presented.
A. Initialization
In learning algorithms, the first step is to learn from the training set. As can be seen from Fig. 12, the proposed circuit is symmetric: the Fig. 12(a-1) and Fig. 12(a-2) circuits store the upper-bound and the lower-bound of the output respectively. The Storage Unit also has a demultiplexer with an address line for choosing the input and its neighbors (m memristors, as shown in Fig. 12) to apply the blurring effect. As in other learning algorithms, parameter initialization is the first step of our proposed algorithm; thus, the initial memristance of all memristors is set between the maximum and minimum resistance . In what follows, the initial resistance of each memristive memory array is computed. Each memristor in this structure is connected to the negative terminal of an Op-Amp, which computes the weighted sum of the input signals. This weight, or gain, is equal to the negative ratio of the feedback resistor of the Op-Amp to the memristor impedance. The feedback resistance in the first stage is equal to , therefore the gain of this stage is . If the output voltage of this stage is we have: (25) , and for the second stage we have: (27)

Initially, the output node should be set to its maximum value, which corresponds to the lowest degree of certainty. This entails that node should have the maximum voltage and node should have the minimum voltage, zero. If in the Fig. 12(b-1) circuit and the initial value of all is assumed to be equal to , then because , the initial voltage of node is equal to . By decreasing the resistance of toward , if , the initial voltage of node becomes zero. In Fig. 12(b-2), if and if the initial value of all is assumed to be equal to , then the initial voltage of node is equal to zero. By increasing the resistance of toward , the initial voltage of node becomes . Therefore, all memristors of the first and second rows are initialized to and respectively. Note the polarity and the connection of the memristors in Fig. 12. This circuit has its own learning rate , which is specified by ratio as defined in (23).

In Fig. 13, the Narrow Path Extraction Circuit is shown. In the proposed circuit, the voltage of the output node should be initialized to half of the resolution of the IDS plane. This entails that the voltage of the node should be equal to ⁄ . In Fig. 13(b-3), if , and and we assume that the initial resistance of is equal to , then the voltage of node is equal to ⁄ . Increasing or decreasing the resistance of increases or decreases the voltage of node respectively.

Fig. 14 shows the Connector Circuit, which comprises Flip-Flops, AND gates, a triangular signal generator, multiplexers and negating Op-Amp circuits. The latter circuit receives a triangular signal with constant frequency and amplitude, which is stored in a capacitor, and then generates the square signals required in the learning phase. The widths of the square signals for the central memristor and the m signals of its neighbors (for simplicity ) are adjusted in this circuit. Fig. 15 shows the signals generated in the Connector Circuit. These signals are selected based on the polarity of the capacitor and are applied to the memristors in the learning phase.
B. Learning Phase
The learning phase in the proposed hardware is as follows. When a new training sample is observed, its distance to all elements of the three describing vectors is calculated and then multiplied by the learning rates and to update the three describing vectors, as shown in Fig. 10. The process of reading the memristor values and updating them is the same for each describing vector. To read the memristance of a memristor, signal ̅ is activated and, as a result, the MOSFET switch is turned on. Applying an appropriate voltage signal to the targeted memristor through the address line of the demultiplexer starts the reading process. Voltage reaches the output node with amplification gain . The output of each Op-Amp stage is as follows:
For voltage y proportional to each sample, proportional voltage should be applied, which is computed as follows:
If the MOSFET switch ̅ is activated, the capacitor holds the voltage . In the next step, by activating the MOSFET switch , the voltage stored in the capacitor is transferred to the Connector Circuit. The user-defined variance defines the number of neighboring memristors involved in the learning process when the targeted memristor is chosen to be updated; based on it, the appropriate output signals are applied to these memristors. The voltage obtained in (31) is equivalent to (21), where is proportional to the output value of the training sample and is proportional to , which is itself proportional to the value stored in the memristor array . This equivalency is shown in (33).
As shown in (33), the output voltage stored in the capacitor is proportional to the distance. Thus the proposed circuit implements the algorithm with acceptable approximation (the two other vectors are handled in the same way as in (33)).
C. Modeling or Test Phase
In the previous subsection the learning phase of the proposed hardware was presented. In this subsection, the hardware required for the test phase, in which previously unseen test samples are modelled, is proposed. To perform inference in the proposed algorithm, the Narrow Path and Spread values at the coordinate of the test sample must first be extracted from the three describing vectors of each IDS plane. The proposed circuit for extracting the Spread is shown in Fig. 12(c-1). We have:
The proposed hardware acts like a differential amplifier that amplifies the difference between and . In Fig. 13(b-2) the output voltage is proportional to the Narrow Path at the given coordinate of the test sample. These values are used in the inference engine of ALM. In the test phase, for each input value , the Narrow Path and Spread are available at any time. For each input , only the associated features are extracted; the computation and storage of all values of the Narrow Path and Spread are not required.
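In software terms, the test-phase extraction described above reduces to reading one element from each describing vector. The following minimal sketch assumes the three vectors are stored as arrays indexed by the quantized input; the function name `extract_features` and the array names are hypothetical and do not come from the paper.

```python
import numpy as np


def extract_features(lower, upper, center, x_index):
    """Read Narrow Path and Spread for a single quantized input.

    Assumption: Spread is taken as the plain difference between the
    upper-bound and lower-bound vectors, mirroring the differential
    amplifier described in the text (any circuit gain is ignored here).
    """
    narrow_path = center[x_index]
    spread = upper[x_index] - lower[x_index]
    return narrow_path, spread


if __name__ == "__main__":
    lower = np.array([0.0, 2.0, 0.0])
    upper = np.array([20.0, 12.0, 20.0])
    center = np.array([10.0, 7.0, 10.0])
    print(extract_features(lower, upper, center, x_index=1))
```

Only the elements at the queried index are touched, so no full Narrow Path or Spread curve has to be computed or stored.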
The proposed circuit performs well only when appropriate signals are applied to the memristor circuit, so that the memristance changes in the desired direction and by the desired amount. To read a memristor, it is sufficient to apply a voltage to the column of the memristor associated with the test sample, apply an appropriate square signal to its row, and read the output voltage. The frequency of such a signal should be high and its width should be so small that the memristor acts like a resistor and its memristance can be read without being changed. To write a new value to a memristor, the square signal should have two main characteristics: its amplitude should be greater than the threshold of the memristor, and it should be wide enough to change the memristance.
For higher precision, more quantization levels are required. This is achieved by adding more memristors to the Storage circuit. Table I compares the circuit complexity of our proposed algorithm (Fast ALM) and ALM [21]. Unlike ALM, in Fast ALM the number of Op-Amps is independent of the resolution of the IDS plane. Table I also reveals that, in terms of size and power consumption, our proposed hardware is considerably more efficient than the hardware proposed in [21]. Furthermore, the proposed hardware has low circuit complexity and a shorter process time without compromising precision.
Another advantage of our proposed hardware compared to traditional hardware is that the need for large numbers of multipliers and adders is largely eliminated. It should be mentioned that this comparison takes into account only the circuit elements shared by both hardware implementations; elements specific to one implementation, such as Flip-Flops and AND gates, are excluded.
VI. SIMULATION
In this section the functionality and performance of our proposed algorithm are evaluated on various applications. Function approximation and classification are the two problems considered. All simulations are performed in MATLAB 2013 on a 2.4 GHz Core i5 processor with 4 GB of RAM. The quantization levels in all simulations are set to .
A. Function Approximation
In this section, the performance of the Fast ALM algorithm and its hardware implementation is evaluated on the function approximation problem. Two 2-input single-output functions are considered as follows.
Figs. 16 and 17 show the two functions defined in (35) and (36). In the proposed hardware, the period of the input triangular signal in the PWM generator is 10 milliseconds ( ). The output of the PWM generator has a 3 volt amplitude and an 80 percent duty cycle ( ). Simulations were conducted in HSPICE, and the memristor's SPICE model was obtained from [39]. Finally, the memristor's parameters were set as follows:
, , and . To evaluate the precision and approximation error, a metric employed by Sugeno in [8] was used, called Fraction of Variance Unexplained (FVU). This metric is directly related to MSE, so that if FVU is zero, MSE is zero too. In (37), the higher the precision, the closer FVU is to zero. The results in Table II are the average output of 100 runs. In the simulation, Fast ALM first splits the two-input single-output system into two single-input single-output subsystems, and the behavior of each subsystem on its IDS plane is captured in describing vectors. The input domain can be partitioned according to the desired precision.
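As an illustration of the error metric, the following sketch computes FVU using its standard definition: the sum of squared prediction errors divided by the sum of squared deviations of the target from its mean. The notation in (37) may differ; the function name `fvu` and its arguments are chosen here for illustration only.

```python
import numpy as np


def fvu(y_true, y_pred):
    """Fraction of Variance Unexplained (standard definition, assumed here).

    FVU = sum((y_true - y_pred)^2) / sum((y_true - mean(y_true))^2)
    It equals 0 for a perfect fit and 1 when the model does no better
    than always predicting the mean of the targets.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    sse = np.sum((y_true - y_pred) ** 2)          # unexplained variation
    sst = np.sum((y_true - y_true.mean()) ** 2)   # total variation
    return sse / sst


if __name__ == "__main__":
    print(fvu([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # perfect fit
    print(fvu([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # mean predictor
```

Because the numerator is the sum of squared errors, FVU is the MSE divided by the variance of the targets, which is why a zero FVU implies a zero MSE.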
As can be seen from Table II, the proposed algorithm performs well in approximating complex functions. As with ALM, in Fast ALM the approximation error rises when the training set is small (low knowledge) or when IDS planes become sparse as a result of over-partitioning. When the training set is large enough, however, partitioning the input domain increases the precision. To illustrate further, Fig. 18 shows the learned pattern and the convergence of the Narrow Path and Spread for a typical IDS plane in IDS and Fast IDS. To evaluate the process time and convergence speed of Fast ALM compared to ALM, function with 2500 training samples is considered. In this experiment, with two partitions on each input domain, Fast ALM and ALM converge in 0.1367 and 2.5990 seconds respectively.
Increasing the partitions to four on each input domain results in 0.2475 and 5.2281 seconds respectively. Fig. 21 shows the convergence speed with respect to the number of training samples. 
Fig. 21 The convergence speed of Fast ALM and ALM in approximating function (horizontal axis: number of training samples; curves: original ALM and Fast ALM). The convergence speed of Fast ALM is considerably higher than that of ALM. In this experiment 8 partitions are considered and the convergence speed is plotted with respect to the number of training samples. Variance for both algorithms is the same and equal to 12. In Fast ALM ⍺ ⍺ .
As can be seen from the simulations, Fast ALM achieves high precision in approximating functions. The next subsection examines the performance of Fast ALM on classification problems.
B. Classification
To further examine the functionality of Fast ALM, its performance on classification problems is presented. To do so, two well-known classification problems, Two-Spiral [40] and Three-Centered-Ring, are considered. The Two-Spiral dataset has two classes, labelled 0 and 1. When no partitioning is performed, both algorithms classify randomly and do not show acceptable results; however, with appropriate partitioning, both algorithms can learn to classify the dataset correctly. Samples of this dataset are obtained from (39), where is the radius, is the angle in radians, is the radius of the spiral ring, is the initial radius and is the number of spiral rounds. Fig. 22 shows the Two-Spiral dataset with 196 samples overall. To evaluate Fast ALM on Two-Spiral classification, 400 training samples are first introduced to both algorithms, their performance is then evaluated in classifying 600 unseen test samples, and the results are shown in Table III.
As another example, the Three-Centered-Ring problem is considered. Samples are positioned on three concentric rings with different radii. Equation (40) gives the equations from which the samples of each class are obtained. In this dataset three class labels, 0, 1 and 2, are considered. The simulation results indicate that the proposed algorithm employs simpler hardware and software while performing the learning process at higher speed and with precision comparable to that of ALM. It should also be mentioned that, because of its small hardware requirements, the proposed algorithm is practical to implement in hardware.
VII. CONCLUSION
One of the processing tools in soft computing is ALM-IDS, which is inspired by some aspects of human brain behavior. Despite its great performance in various applications such as function approximation and classification, it suffers from high computational and hardware complexity, which hinders its wider adoption. In this paper a novel learning algorithm based on the IDS operator was proposed that employs three describing vectors for each IDS plane. The proposed algorithm shows the same performance as ALM, yet with a considerable decrease in process time and hardware size.

Fig. 24 The associated output of a typical input is fuzzy and any arbitrary membership function can be assigned to it with regard to describing vectors. The center of membership function is located on the Narrow Path and it is expanded according to the Spread value. In (b) for each input a Gaussian membership function is centered on the Narrow Path and its variance is considered to be proportional to the Spread value.

In this paper, the proposed algorithm was first described in full detail and its performance was then evaluated on two applications. The simulation results show that the proposed algorithm performs well and that the proposed hardware is smaller and simpler than traditional implementations. The advantages of the proposed algorithm can therefore be summarized as follows.
1) Appropriate precision and speed.
2) Low implementation cost.
3) Considerable decrease in the number of memristors from to .
4) Qualitative description of the IDS plane is replaced by a quantitative one.
5) Unlike artificial neural networks, whose learned knowledge is not easy to represent, Fast ALM, like ALM, is pattern based and provides a useful and easy-to-understand representation of the learned knowledge.
6) Because the CMOS circuit and the memristor crossbar structure are isolated, the proposed hardware is compatible with the memristor / CMOS platform, which means it can be implemented with CMOL technology.
7) Nano-scale implementations of memristors consume little power to change their memristance.
8) FPGA or ASIC implementations of ALM are not efficient in terms of hardware size and power consumption.
9) Like the ALM algorithm, Fast ALM provides a fuzzy output for each IDS plane, but because the upper-bound and the lower-bound are extracted, a fuzzy membership function can be defined on the output values as shown in Fig. 24.
Fast ALM addresses the drawbacks of ALM to a great extent. The proposed algorithm provides a promising tool for various applications such as function approximation and classification. For noisy datasets, multi-epoch learning can be employed, which was not available in ALM. In multi-epoch learning, each sample is introduced to the learning algorithm more than once, which reduces the impact of noise.
Like ALM, Fast ALM also requires the input domain to be partitioned. Further research is needed to find the optimal number of partitions and their positions. Optimization algorithms such as the Genetic Algorithm have been used to perform such partitioning [41].
Another challenge facing researchers in memristive systems is the high computational cost of simulating large memristor crossbar structures. Although the proposed algorithm considerably reduces the required number of memristors, a high-resolution process still needs a larger memristor crossbar structure, which in turn requires more powerful processors to simulate. Because the proposed algorithm is highly parallelizable, GPUs and multi-core processors can be used to run processes in parallel and reduce the simulation time.
