Neuromemrisitive Architecture of HTM with On-Device Learning and
  Neurogenesis by Zyarah, Abdullah M. & Kudithipudi, Dhireesha
1Neuromemrisitive Architecture of HTM with On-Device
Learning and Neurogenesis
ABDULLAH M. ZYARAH and DHIREESHA KUDITHIPUDI, Neuromorphic AI Lab, Rochester
Institute of Technology, USA
Hierarchical temporal memory (HTM) is a biomimetic sequence memory algorithm that holds promise
for invariant representations of spatial and spatiotemporal inputs. This paper presents a comprehensive
neuromemristive crossbar architecture for the spatial pooler (SP) and the sparse distributed representation
classifier, which are fundamental to the algorithm. There are several unique features in the proposed ar-
chitecture that tightly link with the HTM algorithm. A memristor that is suitable for emulating the HTM
synapses is identified and a new Z-window function is proposed. The architecture exploits the concept of
synthetic synapses to enable potential synapses in the HTM. The crossbar for the SP avoids dark spots caused
by unutilized crossbar regions and supports rapid on-chip training within 2 clock cycles. This research also
leverages plasticity mechanisms such as neurogenesis and homeostatic intrinsic plasticity to strengthen the
robustness and performance of the SP. The proposed design is benchmarked for image recognition tasks using
MNIST and Yale faces datasets, and is evaluated using different metrics including entropy, sparseness, and
noise robustness. Detailed power analysis at different stages of the SP operations is performed to demonstrate
the suitability for mobile platforms.
CCS Concepts: • Computing methodologies→ Neural networks; •Hardware→ Emerging technolo-
gies;
Additional Key Words and Phrases: Hierarchical temporal memory, Spatial pooler, Sparse distributed repre-
sentation, Memristor, Neurogenesis
ACM Reference Format:
Abdullah M. Zyarah and Dhireesha Kudithipudi. 2018. Neuromemrisitive Architecture of HTMwith On-Device
Learning and Neurogenesis. ACM J. Emerg. Technol. Comput. Syst. 1, 1, Article 1 (January 2018), 25 pages.
https://doi.org/10.1145/3300971
1 INTRODUCTION
Mammalian brains process massive amounts of multi-model data for learning, memory, perception,
and cognition. All of this information is either spatial, spatio-temporal or spectro-temporal. Model-
ing such behavior in information processing algorithms can facilitate solutions to complex real-life
tasks. Hierarchical temporal memory (HTM) [12, 15] is a theoretical framework that processes
Authors’ address: Abdullah M. Zyarah; Dhireesha Kudithipudi, Neuromorphic AI Lab, Rochester Institute of Technology, 1
Lomb Memorial Dr, Rochester, NY, 14623, USA, amz6011@rit.edu.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee
provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and
the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored.
Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires
prior specific permission and/or a fee. Request permissions from permissions@acm.org.
© 2018 Association for Computing Machinery.
1550-4832/2018/1-ART1 $15.00
https://doi.org/10.1145/3300971
ACM J. Emerg. Technol. Comput. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2018.
ar
X
iv
:1
81
2.
10
73
0v
1 
 [c
s.E
T]
  2
7 D
ec
 20
18
1:2 Abdullah M. Zyarah and Dhireesha Kudithipudi
spatial and temporal information by emulating the structural and algorithmic properties of the
neocortex. HTM offers features such as online learning, multiple simultaneous predictions, sparse
distributed representations, and noise robustness [13]. These properties make the algorithm attrac-
tive for a wide range of applications such as regression and classification [24, 32, 39], prediction [26],
natural language processing and anomaly detection [21, 25]. At a high level, HTM is a sequence-
memory algorithm that learns and recalls patterns of multi-variate time series data. At its core, this
is achieved through three key components: encoder, spatial pooler (SP) and temporal memory. The
input encoder constitutes the binary distributed representation of input data, whereas the SP and
temporal memory continuously transform the input data into sparse distributed representations
(SDR) and learn transitions between sequences, respectively.
Deploying the HTM algorithm on mobile and embedded devices can enable real-time prediction and
anomaly detection tasks. Specifically, the SP of HTM has critical features including fast adaptation
to changing input statistics and noise robustness that can be adopted in hardware. There are few
research groups that study the digital and mixed-signal architectures for HTM. However, HTM has
been continually evolving and most of the published architectures focus on the earlier deprecated
versions of the algorithm. The first wave of architectures were published circa 2007, that focused
on the first generation of the algorithm (Zeta). In 2007, Kenneth et al. realized Zeta-HTM on FPGA
for image recognition [32]. The model has 81 parallel computational nodes arranged hierarchically
in 3 layers and offers 148x speedup over the software counterpart. A Verilog implementation of the
single fundamental unit in HTM, a node, is proposed in 2013 [37]. The second generation of the
architectures were investigated circa 2015. Zyarah et al. [43], designed a scalable design with 100
mini-columns and demonstrated for classification with SVM. The authors also proposed a temporal
memory design for prediction [42]. In 2016, nonvolatile memory based SP implementation is
presented by Streat et al. [34], considering the physical constraints of the commodity NVRAM.
Later, a memristor-based implementation of SP is proposed by James et al. [17]. Although the
proposed design is power efficient, it lacks reconfigurability which is important for learning and
making predictions. Recently, Truong et al. presented a memristor-based crossbar to model the
SP of HTM algorithm [36]. However, due to the fact that HTM is dominated by dynamic sparse
connections, using the traditional crossbar structure leads to dark spots (unused regions) in the
crossbar. Additionally, most of these research studies do not include the hardware classifier design
which is integrated with the SP. Therefore, designing an overarching HTM SP architecture and its
associated SDR classifier for energy-constrained platforms with on-device learning supported by
dynamic interconnects is still an open research area.
This paper presents a comprehensive memristor crossbar architecture of the HTM-SP and its
associated SDR classifier. The proposed architecture incorporates several unique features that
tightly link with the HTM algorithm. A memristor that is suitable for emulating the HTM synapses
is identified and a new Z-window function is proposed. The architecture exploits the concept of
synthetic synapses to enable potential synapses in the HTM. The crossbar for the SP avoids dark
spots caused by unutilized crossbar regions and supports rapid on-chip training within 2 clock
cycles. This research also leverages plasticity mechanisms such as neurogenesis and homeostatic
intrinsic plasticity to strengthen the robustness and performance of the SP. The proposed design is
benchmarked for image recognition tasks using MNIST and Yale faces datasets, and is evaluated
using different metrics including entropy, sparseness, and noise robustness. Detailed power analysis
at different stages of the SP operations is performed to demonstrate the suitability for mobile
platforms.
ACM J. Emerg. Technol. Comput. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2018.
Neuromemrisitive Architecture of HTM with On-Device Learning and Neurogenesis 1:3
The rest of the paper is organized as follows: an overview of HTM is presented in Section 2. Section 3
and 4 discuss the design methodology and the hardware implementation. The experimental setup
and SP evaluation are described in Section 5 and 6. Section 7 demonstrates the experimental results.
The paper is summarized in Section 8.
2 OVERVIEW OF HTM
HTM is a sequence memory algorithm that aims at emulating the foundational principles of the
neocortex. HTM is structured from ascending hierarchical regions of cellular layers that enable
the network to capture spatial and temporal patterns. The cells in HTM are a simplified model
of the common excitatory neuron in the neocortex, known as the pyramidal neuron. Similar to
pyramidal neurons, HTM cells have hundreds of synaptic connections that enable them to recognize
independent patterns of cellular activities. The cell synaptic connections are assigned to three
integration zones, namely proximal, basal, and apical [8, 14]. Each zone is composed of either one
proximal segment or several dendritic segments. A segment, either proximal or dendritic, comprises
multiple synapses to capture the cellular activities of the space to which it is linked. The proximal
dendritic segment defines the cell’s receptive field in the input space (feed-forward input) and
sufficient activities detected on the proximal dendrites lead to the generation of a somatic action
potential. The basal and apical dendritic segments hold the synaptic connections with nearby
cells and other cells in higher levels in the hierarchy. Therefore, the basal and apical segments
are dedicated to observe contextual and feedback inputs. It is important to note that the activities
detected on the basal and apical dendrites enable the cells to make prediction via depolarizing it
slightly without causing the generation of an action potential [13].
Min-column
Proximal  
connection
Level-2
Level-1
Cell
Region
Distal 
connection 
Fig. 1. High-level architecture of HTM with two levels. The first level has two regions, and one region is
confined for the second level. Each region is structured by columns of vertically stacked cells.
The cells, in each HTM region, are arranged in a columnar organization called a mini-column.
In a given mini-column, cells share the same proximal synaptic connections, i.e. they share the
same feed-forward receptive field and stimulated by the same input. Basal segments, on the other
hand, allow for the interaction among cells within the same region as such cells learn and recall
sequences. The learning in HTM involves adjusting the synaptic connections strength which is
defined by a positive scalar value called permanence. However, this process occurs in an online
fashion which enables the algorithm to learn not only the spatial features of the input, but also
the temporal correlation between them [27]. The HTM algorithm is composed of two core phases,
namely spatial pooler (SP) and temporal memory, which are discussed in the following subsections:
ACM J. Emerg. Technol. Comput. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2018.
1:4 Abdullah M. Zyarah and Dhireesha Kudithipudi
2.1 Spatial Pooler Model
In HTM, learning the spatial patterns in sequential data is performed by the SP. When an input is
presented to the network, it gets encoded into a set of sparsely distributed active mini-columns using
a combination of competitive Hebbian learning rules and homeostasis [7]. The sparse activation
of mini-columns represents the core feature that grants HTM algorithm appealing properties,
such as distinguishing the common features between inputs [11], learning sequences, and making
simultaneous predictions [1]. Generally, each mini-column is connected to a unique subset of
the input space using a set of proximal synaptic connections. When the synapses are active and
connected to a reasonable number of active bits in the input space, the proximal dendritic segment
becomes active. The activation of the proximal dendritic segment will nominate that mini-column to
compete with its neighboring mini-columns to represent the input. By using the k-winner-take-all
computation principle, the mini-column with the most overlapped active synapses and active inputs
inhibits its neighbors and becomes active (winner). The output of the SP is a binary vector, which
represents the joint activity of all mini-columns in the HTM region in response to the current input.
This binary vector is also known as an SDR vector. The operation of the SP can be divided into three
distinct phases: initialization, overlap and inhibition, and learning, as described in Algorithm 1.
During the initialization phase (Algorithm 1, lines 2-5), which occurs only once, all the parameters
of the regions are initialized including mini-columns’ connections to the input space, synapse
permanences, and boosting factor. Let S be an nc ×nx array which holds all the synaptic connections
that linknc SPmini-columnswithnx−dimensional input space. Now, letns be themaximum number
of potential synapses associated with each mini-column and is defined by the non-zero elements
in ®s (®s is a row vector in S) whose indexes are generated by a pseudo-random number generator.
Similarly, let ρ be a nc × nx array that describes the permanence of the potential synapses in S ,
where the permanence values are randomly initialized with a uniform distribution. After initializing
the synaptic connections, the boosting factor for each mini-column is defined to be a scalar value of
one. The initialization phase is followed by the overlap and inhibition phase (lines 7-11) in which the
feed-forward input is collectively represented by a subset of active mini-columns, namely winning
mini-columns. The selection of winning mini-columns occurs after determining the activation level
of each mini-column, called overlap score (α ). The mini-columns’ overlap scores for a given region
is computed by counting each mini-column’s active synapses that associate with active bits in the
input space. Mathematically, it is achieved by performing a dot product operation between the
feed-forward input vector (®x ) and the active synapses vector as in line 9, where the active synapses
vector is the result of an element-wise multiplication (denoted as ⊙) between S and ρ∗. ®b, here,
denotes the boosting factor that regulates mini-column activities. ®ρ∗ is a permanence binary vector
to indicate the status of each potential synapse, where ‘1’ indicates a connected synapse and ‘0’ an
unconnected synapse. Upon the completion of computing the overlap scores, each mini-column
overlap score gets evaluated by comparing it to a threshold, known asminOverlap (Oth ) (line 10).
The resulting vector ( ®eα ) is an indicator vector representing the nominated mini-columns with high
overlap scores. The nominated mini-columns compete against each other with a radius defined by
ξ to represent the feed-forward input. Based on the mini-column overlap scores and desired level
of sparsity (η), nw number of mini-columns will be selected to represent the input, as shown in line
11, where kmax is a function that implements k-winner-take-all which returns the top nw elements
within ξ . After determining the winning mini-columns (®Λ), the learning phase (lines 12-15) starts
to update the permanence values of the mini-columns’ synapses as necessary, i.e. only the synapses
of the active mini-columns are updated. The approach followed in updating the permanence of
the synapses is based on the Hebbian rule [16]. The rule implies that the connection of synapses
ACM J. Emerg. Technol. Comput. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2018.
Neuromemrisitive Architecture of HTM with On-Device Learning and Neurogenesis 1:5
ALGORITHM 1: HTM-Spatial Pooling
Input: ®x ∈ Rnx{0,1} , where ®x ⊂ X and X ∈ R
nx×nm
{0,1} ; /* nm: Number of input vectors */
Output: ®w ∈ Rnc{0,1} ; /* nc: Number of columns */
1 # Initialization:
2 Sind ∼ rand.pseudo, where Sind ∈ Nnc×ns{1,nx } ; /* nx: Input vector length */
3 S[Sind ] = 1,where S and ρ ∈ Rnc×nx ; /* ns: Number of proximal connections */
4 ρ[Sind ] ∼ rand.uniform[0,1] ;
5 ®b ∈ Rnc , where ∀ b[j] = 1;
6 repeat
7 # Overlap and Inhibition:
8 ρ∗ = I(ρ≥Pth ) ;
9 ®α = ®b ⊙ [(S ⊙ ρ∗) · ®xT ] ;
10 ®eα = I( ®α ≥ Oth ) ;
11 ®Λ = kmax( ®eα ,η, ξ ) ;
12 # Learning:
13 if learning = Enable then
14 ∆ρ = ®ΛT ⊙ S ⊙ ρ∗ ⊙ λ®x − P−;
15 ®b = e−γ (a¯(t )−<a(t )>) ;
16 end
17 until t > nm ;
to active bits must be strengthened, increase their permanence by P+, while the connection of
synapses to inactive bits will be weakened, decrease their permanence by P−, as in line 14, where
∆ρ j is the change in the permanence array for all mini-columns given an input ®x , and λ denotes
the sum of P+ and P−. After adjusting the synapses permanence, the boosting factor is updated
to regulated the activities of the mini-columns, as in line 15, where a¯(t) and <a¯(t)> contain each
mini-column time-averaged activity level and its activity level with respect to its neighbor, and γ is
a positive constant controlling the adaptation pace [7].
2.2 Temporal Memory Model
The main role of the temporal memory is learning sequences and making predictions for future
inputs. The cells of the winning mini-columns are involved in this process. The active cells of
the winning mini-columns form lateral synaptic connections with the prior active cells, such that
cells can anticipate their active state by just examining the distal segments. The number of distal
segments that a cell may have depends on the number of distinct patterns that can be predicted. The
more distal segments a cell has, the more connections it can have with other cells and thereby more
patterns can be predicted. The operation of the temporal memory essentially involves activating
the cells of the winning mini-columns to model the input patterns within the context, predicting
the future cellular activities, and updating the distal synaptic permanences. As the scope of this
paper focuses on hardware implementation of the SP and its SDR classifier, the aforementioned
operation will not be discussed. However, a detailed description of the temporal memory and its
implementation can be found in [44].
ACM J. Emerg. Technol. Comput. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2018.
1:6 Abdullah M. Zyarah and Dhireesha Kudithipudi
3 DESIGN METHODOLOGY
In spite of the fact that the HTM network, in theory, has several hierarchical levels, this aspect has
not yet been studied throughly. This work is therefore confined to study and implement only one
level/region in HTM. Using one region is equivalent to implementing only the primary sensory
region of the neocortex. In the following subsection, modeling of every aspect of the region will be
discussed.
3.1 HTM Synapse Modeling
HTM cells have a large number of synaptic connections allowing them to detect the pattern of
activities occurring in the input space and within the region. Each synaptic level of growth is
defined by its permanence value. Typically, the permanence value ranges between 0-1, where 0
indicates the absence of the synaptic connection with a likelihood to form one and 1 indicates the
full growth of the synaptic connection [13]. When the permanence value exceeds the threshold,
the synapse provides a low-impedance path to the input and vice-versa when the permanence
value is below the threshold. However, HTM synapses are binary in nature in the sense that if two
synapses permanence exceed the threshold, they exhibit the same properties regardless of their
connection strength. While this is the case, the synapse with the highest permanence is harder to
forget. In this research, memristor devices are chosen to emulate the synaptic connections in HTM.
A memristor is a two-terminal synapse-like nanoscale resistive memory. Its term was coined by
Leon Chua in 1971 [5] and the device received rekindled interest when it fabricated by HP labs in
2008 [35]. The device exhibits properties such as low-energy consumption [28], small footprint,
high integration density [18], and non-volatility [2]. These features make it an ideal candidate to
model the synaptic connections in neuromorphic chips.
The VTEAM memristor model described in [20] is used for this research. The device, essentially, is
described with two variables:w andD which define the state variable of the device and its thickness.
Changing the state of the device, i.e. its resistance value (Rmem), is considered to have an analog
nature. Thus, it is gradual and bounded between the memristor’s high resistance state (HRS ≡ Rof f )
and low resistance state (LRS ≡ Ron). The change in the memristor is a function of the voltage
applied across the device or the current through it. This work mainly focuses on the voltage driven
memristors whose resistance change can be described by Equation (1) [35] and Equation (2) [20].
Rmem =
w
D
× Ron + (1 − w
D
) × Rof f (1)
∆w
∆t
=

kof f .
(
v(t )
vof f
− 1
)αof f
. fof f (w), 0 < vof f < v
0, von < v < vof f
kon .
(
v(t )
von
− 1
)αon
. fon(w), v < von < 0
(2)
where kof f , kon , αof f , and αon are constants, vof f and von are the memristor threshold voltages,
and fon and fof f describe the device window function. In order to use the memristor device to
emulate the HTM synaptic connections, we need to have a memristor that manifests a slight drift
when it moves from the boundary toward the mid-point of the device, and as it approaches the
mid-point, the drift should be accelerated. In 2016, Brivio et al. proposed a physical memristor that
demonstrates properties, which to some extent, match HTM synapse requirements [3]. However,
ACM J. Emerg. Technol. Comput. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2018.
Neuromemrisitive Architecture of HTM with On-Device Learning and Neurogenesis 1:7
0.0 0.2 0.4 0.6 0.8 1.0
W/D
0.0
0.2
0.4
0.6
0.8
1.0
W
in
do
w
 fu
nc
ti
on
foff Window
fon Window
(a)
0.0 0.2 0.4 0.6 0.8 1.0 1.2
Voltage [v]
0
1
2
3
4
5
C
ur
re
nt
 [
uA
]
(b)
1.2 1.0 0.8 0.6 0.4 0.2 0.0
Voltage [v]
30
20
10
0
C
ur
re
nt
 [
uA
]
(c)
0.0 0.2 0.4 0.6 0.8 1.0
W/D
0.0
0.2
0.4
0.6
0.8
1.0
W
in
do
w
 fu
nc
ti
on
k=0.6
k=0.7
k=0.8
k=0.9
k=1.0
(d)
0.0 0.2 0.4 0.6 0.8 1.0
W/D
0.0
0.2
0.4
0.6
0.8
1.0
W
in
do
w
 fu
nc
ti
on
off=0.2
on=0.8
on=0.7
on=0.6
(e)
0.0 0.5 1.0 1.5 2.0 2.5
Time [ms]
1.0
0.5
0.0
0.5
1.0
1.5
v(
t)
 [
v]
W/D=0
W/D=1
Input
W/D
(f)
Fig. 2. (a) Characteristic curves of the proposed Z-window function to model the HTM synapses behavior.
(b) and (c) Hysteresis characteristic curves of the memristor while driving it with a sine wave signal biased
with positive and negative DC offset, respectively. (d) A plot shows the linkage to the linear drift model
and scalability features, where the arrows to the right refer to fof f and the ones to left are for fon . (e) Non-
symmetrical behavior of fof f and fon . (f) Modulating the resistance of a memristor device equipped with the
Z-window function. The change in memristor resistance is limited to the regions where |v(t)| > |von |,vof f ,
shaded in light green and red.
modeling this device for circuit simulation requires a special window function so that it exhibits the
aforementioned properties. To the best of our knowledge, there is no memristor window function
that captures this exponential attribute of HTM synapses. Thus, we developed a window function,
called Z-window function, derived from the mathematical formulation of the sigmoid function.
The Z-window function has built-in control parameters for adjusting its characteristics and it takes
into account the memristor device boundary conditions. Furthermore, it possesses all the attributes
of an effective window function such as circumventing the boundary lock problem, providing a
linkage with linear dopant drift model, scaling the window function upward and downward [29, 40],
and modeling the non-symmetrical behavior of some memristor devices. The proposed window
function is given in Equation (3), where τ ,δ , k , and P1 are constants that control the slope of the
window function, sliding level (over the x-axis), scalability, and falling slope as it approaches either
ends of device terminal, respectively. The subscript r denotes the on and o f f subscript of the
window function. s(v) is a sign function used to make the window function not only depends on
the normalized state variable (WD ) but also on the voltage across the device and in this case the
boundary lock problem is avoided. Figure 2 illustrates the window function characteristic curves
and its hysteresis2 as simulated in Cadence using Verilog-A memristor model.
1Nominal parameters used to achieve most of the plots in Figure 2 are: τ=15, δ=0.5, k=1, and p=0.01.
2A sine wave signal has an amplitude of ±1.2v and frequency of 20kHz is used to achieve the hysteresis plots.
ACM J. Emerg. Technol. Comput. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2018.
1:8 Abdullah M. Zyarah and Dhireesha Kudithipudi
fr (w) =
k[s(v) − wD (−1)s(v)+1]p
1 + eτ (wD −δr )(−1)s (v )
(3)
s(v) =
{
1, 0 < vof f < v
0, v < von < 0
(4)
3.2 Receptive Field
The receptive field (RF) defines a sub-region in the input space to which a mini-column’s proximal
synaptic connections are tapped. This section discusses the various approaches of realizing the
RFs of the SP mini-columns. It also highlights the advantages of each approach, its constraints and
feasibility in realizing a large-scale neuromorphic chip for the HTM algorithm:
3.2.1 Memristive Crossbar. The memristive crossbar is mainly composed of perpendicular metal
nanowires sandwiching memory elements modeled by memristors [23]. The memristive crossbar
offers several advantages such as enabling the integration of a large number of memory elements
within a compact area and allowing highly-parallel vector-matrix computations. As most neural
networks are dominated by vector-matrix multiplications, this makes the memristive crossbar a
natural fit for such networks. However, the memristor crossbar structure is really beneficial for
densely connected neural networks. When it comes to sparsely connected networks such as HTM,
using the crossbar would only be possible by randomly disconnecting devices or setting them to
a high impedance state 3. Although both these approaches may result in a sparsely connected
crossbar, it is still inefficient modeling. This is because disconnecting devices requires a special
burning process, while setting them to a high impedance will not result in perfect current blocking.
Having said that, there is a research group that has explored the high impedance method to fulfill
a part of HTM’s requirements [36]. The authors suggest using a crossbar in which each column
models an HTM mini-column and the rows represent the mini-column’s synaptic connections
which are connected to the input space. For a given crossbar, the adjacent columns have to maintain
a certain level of overlap in the input space. Figure 3-(a) shows an example of adjacent columns
with two proximal connections each, connected to 4x1 input space (a slice of the presented 4x4
image). Here, it can be noted that the mini-columns C1 and C2 share the input x2 but not x3. In
spite of the fact that this method results in partially sparse connections and it enables high-speed
computation in the SP, it has several limitations. The first of which is the limited range in the
overlap that can be achieved among the neighboring mini-columns because more overlap space
implies more unused regions in the crossbar (called the "dark-spot" in the rest of the paper). Second,
it leads to current sneak path as the memristors cannot be programmed to zero conductance. Lastly,
it lacks the reconfigurability and it makes online learning, which is the most important feature in
HTM, more challenging as it requires a training circuit with feedback.
The other possible approach to achieve sparely connected crossbars is based on changing its
structure. Instead of using the regular perpendicular cross connections, a regional space to each
column is defined such that its connections can be tapped, as shown in Figure 3-(b). However, such
3There is another approach proposed in [6] to map a sparse matrix to the crossbar. It is based on decomposing the sparse
matrix into a small sub-blocks mapped separately to the crossbar. The sub-blocks with all-zero elements are excluded
from the mapping process and therefore reduce the crossbar size. However, such a process requires continuous matrix
manipulation and is infeasible for networks with on-chip training. Thus, this approach is not considered here.
ACM J. Emerg. Technol. Comput. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2018.
Neuromemrisitive Architecture of HTM with On-Device Learning and Neurogenesis 1:9
Column Memristor
C1 C2 C3 C1 C3 C4C2
Blocking Memristor
(a) (b)
t
Bi
na
riz
ed
 in
pu
t i
m
ag
e x1
x2
x3
x4
x1
x2
x3
x4
Fig. 3. Mini-column receptive fields modeled by a sparsely connected memristor crossbar implemented using
(a) blocking memristor (b) predefined mini-columns regional connections.
an approach may have its own challenges during the fabrication process and the same current
sneak path issue.
In general, the biggest challenge of adapting the crossbar approach in order to establish the receptive
field of each mini-column is the integration between the HTM region and the input space. Using
crossbar structure in the ways described above involves establishing hundreds of connections to the
input space. This makes HTM architecture over-dominated by the interconnects which eventually
lead to undesired noise, scaling limitations, and more power consumption. Furthermore, these
connections are rigid in nature and lacks reconfigurability, which is an essential feature to develop
an HTM network on chip.
3.2.2 Dynamic Memristive Crossbar. The principle concept of this approach is based on using a
linear feedback shift register (LFSR) and a memristor crossbar as a single entity to enable crossbar
end-terminal reconfigurability4. Due to the fact that the columns in the crossbar share the rows, a
full reconfigurability can only be achieved when the columns are separated to be one-dimensional
arrays, where each column models a mini-column in HTM. Each column is assigned its own
dedicated LFSR which is initialized by the mini-column index in the HTM region. The RF that is
generated by the LFSR can either be local or global. In the global RF, all the registers of the LFSR,
shown in Figure 4-(a), are used to generate random numbers such that the entire input space can be
seen by the mini-columns. Given a mini-column, ns number of potential synapses can be generated
by its LFSR to link it with ns locations in the input space. In the case of the local RF, the LFSR
registers are used in a partial manner. Some of them will be used to generate the random numbers
whereas the rest are dedicated to provide address shifting. Figure 4-(b) illustrates the concept of
the partially used LFSR. The registers with a colored base represent the registers that will generate
the synapses address while the rest are used for shifting. For instance, if an 8-bit LFSR is loaded
with a seed of 200, random integer numbers ranged between 192-207 can be achieved if only the 4
least significant bits (LSB) of the LFSR are used.
It turns out that this approach of generating the RF of HTM mini-columns is more expensive in
terms of resource utilization and latency in comparison to the rigid memristive crossbar discussed
previously in 3.2.1. It is, however, more realistic when it comes to scalability because there is no
restriction related to the crossbar size, or the number of interconnects being used. Furthermore, on
4This is similar to the concept of using synthetic synapses which we proposed in [43], but here we apply it to crossbar
structure.
ACM J. Emerg. Technol. Comput. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2018.
1:10 Abdullah M. Zyarah and Dhireesha Kudithipudi
Dn D3 D2 D1 D0
Encoder
Dn D3 D2 D1 D0
Encoder
(a) (b)
+
0000101010101 0101001010101
Fig. 4. (a) LFSR used to generate a global RF. (b) LFSR with partially used registers (red-base) to generate the
local RF.
one hand, it satisfies an essential requirement for HTM-SPwhich involves providing a reconfigurable
interconnect that enables implementing topologies of HTM RF, both local and global. On the other
hand, it facilitates the communication of the HTM network with the environment and reduces the
physical interconnects.
3.3 Homeostasis and Neurogenesis Plasticity Mechanisms
Homeostasis is an essential mechanism in biologically inspired networks. It prevents neurons from
being hyperactive through regulating its threshold of generating a somatic action potential [41],
namedminOverlap in HTM theory. The concept of homeostasis in HTM does not involve regulating
theminOverlap directly. Rather, it implies exciting the action potential of relatively low active
neurons through multiplying the action potential by a positive scalar value called a boosting factor.
This results in an effect similar to that of regulating theminOverlap value. The boosting in SP is
used to ensure equal likelihood for mini-columns to represent the spatial inputs in SDR form. It is
applied through stimulating the mini-columns that have not been active over a predefined time
period, i.e not frequently active with respect to its neighboring mini-columns. Consequently, low
active mini-columns can have better chance of representing the feed-forward input in future.
It turns out that using the boosting mechanism is impactful when there is a uniform statistical
distribution of information in the input space. Unfortunately, this requirement is not guaranteed
especially for visual applications unless a custom encoder is used to process all the inputs. An
example of a non-uniform distribution of information in the input space would be the usage of
MNIST images. Such non-uniformity in the input space make several mini-columns rarely active or
completely inactive. Even the use of boosting here would not cut down the number of inactive mini-
columns. Therefore, as a possible solution to overcome this issue, this paper suggests applying the
neurogenesis mechanism to HTM. Neurogenesis is a structural plasticity mechanism that suggests
’dead’ neurons be replaced with ’new’ neurons to enhance network computational capabilities [33].
Just as in homeostasis, neurogenesis can be applied via tracking the recent mini-column activities
over a predefined period of time and comparing it with respect to its neighbor. The mini-columns
that were not active frequently are considered ’dead’ neurons and should be replaced with new
ones. For a given ’dead’ neuron, this is achieved by replacing its connections with new randomly
initialized connections that are connected to different locations in the input space. Hence, the mini-
columns proximal connections will start shifting toward the most active regions in the input space
while maintaining a low number of connections to rarely active regions. Figure 5 demonstrates the
influence of using neurogenesis on the synaptic connections density in the input space. It can be
seen that when the activity in the input space is mediated in the mid-region and neurogenesis is
ACM J. Emerg. Technol. Comput. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2018.
Neuromemrisitive Architecture of HTM with On-Device Learning and Neurogenesis 1:11
disabled, the connections on the sides are not involved in any computations leading to form non-
robust sparse representations. In contrast, when neurogenesis is enabled, the synaptic connections
start to move toward the most active spots in the input space and form better representations.
0
10
20
30
Sy
na
ps
es
 d
en
sit
y
0 50 100 150 200 250
Input space index
0
2000
4000
6000
Ac
tiv
ity
 le
ve
l
(a)
0
10
20
30
Sy
na
ps
es
 d
en
sit
y
0 50 100 150 200 250
Input space index
0
2000
4000
6000
Ac
tiv
ity
 le
ve
l
(b)
Fig. 5. The density of the potential synapses as linked to an input space with activity centered to the middle
region when (a) the neurogenesis mechanism is disabled (b) Neurogenesis mechanism is enabled.
Implementing the neurogenesis mechanism in hardware presents several challenges due to the lack
of reconfigurability in interconnects which models the synaptic connections. Thus, we adopted
the concept of synthetic synapses [43] to enable the reconfigurability in the interconnects and
neurogenesis in HTM neuromorphic system. The synthetic synapse concept involves generating
the synaptic connections by using an LFSR rather than using rigid connections (just as in dynamical
memristive crossbar). By using this, when a given jth mini-column is ’dead’ and replaced by jthnew
’new’ mini-column, all the connections of jth will be removed and replaced by new connections
assigned to different locations in the input space and the strength of the new connections are again,
randomly initialized.
4 SYSTEM DESIGN AND IMPLEMENTATION
The high-level architecture of the memristive HTM-SP along with the SDR classifier is shown
in Figure 6-(a). The SP architecture comprises a set of mini-columns to spatially process the feed-
forward input and a main control unit (MCU) to enable the mini-columns to interact with the
ambient environment. When the input is presented to the network, the MCU relays it to the
mini-columns such that each mini-column’s active proximal connections will be identified and
counted. The mini-columns with active proximal segments compete against each other and the
top X% mini-columns that receive most of the input are activated. The selection of the top X%
mini-columns is performed using a voltage-mode winner-take-all (WTA) circuit. After identifying
the best mini-columns that represent the input, the learning process starts. The learning process
occurs in an online fashion which grants the network the ability to adapt continuously to the input
changes. Learning is governed by a Hebbian rule and involves modulating the connections for
the active columns only. Upon the completion of the learning, the output of the SP is generated
and then passed to an SDR classifier when the network is used for classification applications (see
Figure 6-(b)). In the following subsections more details about the implementation of the SP and the
SDR classifier will be provided:
ACM J. Emerg. Technol. Comput. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2018.
1:12 Abdullah M. Zyarah and Dhireesha Kudithipudi
SD
R
 C
la
ss
ifi
er
C1
C2
Cnc
Winner-take-all
circuit
Rs1 Rsm
H
TM
 te
st
be
nc
h
M
ai
n 
co
nt
ro
l u
ni
t (
M
C
U
)
-Tr -Tr -Tr
Columns controller
R
ow
s 
co
nt
ro
lle
r
+Tr
+Tr
+Tr
Identify class label
HTM-SP
TrEn
Vs1
Vsny
HTM testbench
HTM-SP
SDR Classifier
Input 
Image 
SP
-S
D
R
Reconstructed 
image 
0 1 9 (a) (b)
Fig. 6. (a) The high-level architecture of the proposed design, HTM-SP for sparse representation of the
feed-forward input, the SDR classifier to recognize the SDR representations, and training circuity to enable
the learning for both SP and SDR classifier. (b) High-level diagram demonstrating the input images after
being processed by each stage in the network.
4.1 HTM Spatial Pooler
The SP is essentially composed of an MCU and mini-columns. As aforementioned, the MCU
bridges the mini-columns with either the sensory input or other regions in the hierarchy and the
mini-columns are the units where the main SP computations occur. In the next subsection, the
mini-column circuit, its training circuity, and the WTA unit are discussed in more details.
4.1.1 Mini-Column. Figure 7 depicts the architecture of the SP mini-column. The mini-column
is modeled by three units named: peripheral unit, proximal unit, and WTA cell. The peripheral unit
models the part of the mini-column in which the proximal connections are generated and connected
to the input space. The proximal unit and WTA cell hold the proximal connection permanences
and a contesting unit that enables each mini-column to compete with its neighbors for the input
representation, respectively. The input to the mini-column is generated by the network encoder
(modeled by the network testbench). The encoder task, in this work, is confined to binarizing the
images and slicing them into small patches to minimize data movement and the required storage
units. Sequentially, each patch is presented to the mini-column and stored into an Addr_Reд.
Meanwhile, the LFSR generates a random number indexing the observe patterns activities in the
feed-forward inputs.
Given an input image of size 32x32, it is sequentially fetched to the network in the form of patches,
where each patch is a 32-bit row vector. When the input patch is stored in the Add_Reд and the
LFSR generates an address for a location in the received patch, a matching score is stored in a
synapses register which is modeled by ns × 1 serial-in-parallel-out shift register. Once all inputs
are received, the output of the synapses register is fetched to the word-line of ns × 1 memristive
crossbar where the proximal synapse permanences are stored. The input voltages to the crossbar
will be converted in form of current through the memristor and the output is collected at the
crossbar bit-line. The output of the crossbar which modulates the mini-column overlap score to
current is then boosted. Boosting is done via the usage of a sense memristor (Ms ). The boosting
ACM J. Emerg. Technol. Comput. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2018.
Neuromemrisitive Architecture of HTM with On-Device Learning and Neurogenesis 1:13
D Q
Q
D Q
Q
D Q
Q
Mpns
Ms
VDD 
V
Decoder
=
En
3
Addr_Reg
LFSR
Local_CU
Packet_CNT
WTA  cellProximal unitPeripheral unit
From
MCU
32
Mp0
T1 T2
T3 T4
T5
Ic
TrEn
Λ[j]
Vx
SIPO_Reg
clk
α
Fig. 7. The circuit diagram of a mini-column in an HTM region. It consists of a peripheral unit in which the
proximal connections are generated and connected to input space, a proximal unit to store the connections
strength, and a WTA cell to enable the mini-columns to compete for input representation.
factor is inversely proportional to Ms (≡ 1дs ) conductance as such decreasing дs value leads to
increase the boosting factor and vice versa. The output at this point, Vα j , is given by Equation (5):
Vα j =
ns∑
i=1
дi Vi
дs +
ns∑
i=1
дi
(5)
Vα j ≈
1
дs
ns∑
i=1
дi Vi , if
дs
ns∑
i=1
дi
>> 1 (6)
where дi (≡ ρi, j ) indicates the conductance of ith memristor and Vi is the ith input voltage. Vα j
(≡ α j ) denotes the jth mini-column overlap score. Upon the completion of computing the overlap
score, its value, which is sampled by the sense memristor, is then presented to a WTA circuit. The
WTA performs a kmax5 operation on Vx j followed by a thresholding, to generate the final jth
mini-column output, (Λj ), as given in Equation (7).
Λj =
{
1, Vx j > Vth , where Vx j = f (Vα j )
0, Otherwise
(7)
The minimum input to the WTA circuit should be no less than 0.2v so that a cell is activated. This
requirement implicitly realizes the concept ofminOverlap in HTMwhich implies that mini-columns
overlap score should be large enough to enable it for competing against other mini-columns to
represent the input. The output of the WTA cell indicates the mini-columns status, where logic ’1’
refers to a winner. Selecting the winners is followed by the learning process in which each winning
mini-column synapses are adjusted in response to the stimulated feed-forward input and according
to Hebbian learning rules.
4.1.2 Winner-take-all Circuit. The WTA cells are utilized as apart of the mini-columns circuit to
select the winners in each local (or global) cluster and in the SDR classifier to identify the winning
class labels. Figure 8-(a) depicts the WTA circuit which models a simple local competitive algorithm
which is naturally imposed through Kirchhoff current law (KCL). Each branch in the circuit has an
5The function f (≡ kmax ) is computed in the next winner-take-all, section (4.1.2).
ACM J. Emerg. Technol. Comput. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2018.
1:14 Abdullah M. Zyarah and Dhireesha Kudithipudi
NMOS transistor (T1) to capture the input signal (= Vα in the mini-column circuit and Vs in the
SDR classifier) of one competitor. The competitors interact with each other through the shared
point Vc . When inputs are presented to the circuit, the potential of Vc follows the input with the
highest voltage and turns off all the other transistors. The cell conveying most of the bias current,
Ic , is identified as a winner. Given that all the transistors operate in subthreshold regime, applying
an input voltage VG j at the gate of the transistor in the jth branch results in a current Ij , which can
be approximated by Equation (8) [31]:
Ij = Io (W
L
) e
VGSj
nUT (8)
where Io is the zero-bias current for the given device, WL is the transistor channel width to length
ratio, UT and n indicate the thermal voltage and the subthreshold slope coefficient, respectively.
For the given circuit with nk branches, according to KCL, the branches’ current should sum up to
Ic , as given by Equation (9). By using Equation (8) and Equation (9), we can solve for the current
flowing in each branch as shown in Equation (10), which is identical to the softmax function.
Ic =
nk∑
j=1
Ij (9)
Ij = Ic
e
VGj
nUT
nk∑
j=1
e
VGj
nUT
(10)
Recall that the output of the SP is a voltage and is represented in a binary sparse form. Thus,
we designed the WTA to be a voltage mode circuit. In order to maintain the same normalized
exponential relationship between the input and output (described in Equation (10)), the current in
each branch is sent to a current comparator via a current mirror formed by T3 and T4, as shown in
Figure 8. The mirrored current is compared to a fixed reference current resulting in a voltage drop
across the point Vx j , which can be calculated using Equation (11):
Vx j =
1
λ5
[ 2Ai I3
β5(VGS5 −Vth5 )2
− 1
]
(11)
where λ is the channel-length modulation, β is the transconductance parameter, Ai and Vth de-
note the current mirror gain between T3 and T4 and transistor threshold voltage. By substituting
Equation (10) in Equation (11), the output node, Vx j , is calculated in Equation (12):
Vx j = ψ5Ic
e
VG5j
nUT
nk∑
j=1
e
VGj
nUT
− 1
λ5
(12)
Due to the fact that ψ5 is approximately constant and is given by 2Aiλ5β5(VGS5−Vth5)2 , Equation (12)
indicates that the output voltageVx j for branch j has a normalized exponential relationship with the
input VG j . Such relation has a unique benefit for WTA circuit because it maximizes the difference
between the inputs. It generously rewards the input with the highest value and punishes the losing
ones. Most of the power consumption is dominated by the winning cells which are low in number
compared to losing cells.
ACM J. Emerg. Technol. Comput. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2018.
Neuromemrisitive Architecture of HTM with On-Device Learning and Neurogenesis 1:15
Im
VDD
Ic
T1 
Vc
Vx
T4 T3 
T5
Tf
WTA  
cell
Vnk
Vy
nk
Vy
V1
(a) (b)
Fig. 8. (a) WTA circuit (nk number of cells) with local excitatory feedback., (b) A waveform diagram demon-
strating the WTA circuit operation while presenting three sine waves with 75° phase shift between them as
inputs. The input with maximum amplitude for a given period is shaded with light gray.
It is important to notice here that unlike most other WTA circuits, in literature [19, 22, 30], all
outputs are buffered to provide enough driving capabilities when transmitting signals across long
distances. Also, few of the previous WTA circuits are endowed with a hysteresis mechanism to
increase network stability and prevents the selection of a potential winner unless they are strong.
Due to the fact that the hysteresis is achieved via a local excitatory feedback, some of these circuits
require a reset process to any competition as in [10]. In the proposed WTA circuit, the hysteresis
characteristic is introduced via the positive feedback formed by the transistor Tf . Additionally,
having a current comparator improves the stability further as it imposes a threshold current that
needs to be crossed to switch cells status. The other advantage of using the current comparator is
that it enables more than one winner, which is a desirable feature especially in HTM as it allows
controlling the network output sparsity level.
4.1.3 Mini-column Learning Circuit. The learning in HTM is performed in an online fashion and
it involves modulating the synaptic permanence of the winning mini-columns only. As aforemen-
tioned, the proximal synaptic connections of each mini-column are emulated by one-dimensional
memristive array. Therefore, the training here can be performed simultaneously. By using a mod-
ified Ziksa unit [45], training each mini-column synaptic connections can be performed in two
clock cycles. After computing the mini-column overlap scores, the synaptic connections that were
connected to active bits in the input space has their D-Flip-Flop (DFF) set to high and vice-versa for
the synapses connected to inactive bits. All the DFF outputs are buffered with a modified NOT gate
that generates a logical level output during the normal operation and a training voltage during
the learning phase. When the TrEn signal is generated (active-low), the positive terminals of the
memristors will be connected toVT r if the output of the DFF is high andGND otherwise. The other
terminal of the memristor will be controlled by Tr1 and Tr2. During the first cycle of training, Tr1
is set to ON by Tune+. If DFF output is low, this causes a voltage drop across the memristors that
needs to be adjusted to exceed the threshold leading to an increase in its resistance. During the
second clock cycle, the same procedure will be applied but in the opposite manner6.
6The training circuit is verified while considering memristor device variability for resistance range and threshold voltage.
Variability with normal distribution has been considered with mean defined by device parameters and standard deviation
(STD) equals to 10% of the mean for resistance and 5% of the mean for threshold voltage.
ACM J. Emerg. Technol. Comput. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2018.
1:16 Abdullah M. Zyarah and Dhireesha Kudithipudi
D Q
Q
D Q
Q
D Q
Q
VTR VDD
TrEn
To 
Memristor 
VTR
Tune+
Tune-
DFF output buffer
V , to WTA
Circuit
Tr1
Tr2
Ms
Mpns
SIPO_Reg
TrEn
From
Peripheral unit
α
Q0
Q1
M0
M1
(a) (b)
Fig. 9. (a) The training circuit of the proximal synaptic connections in an HTM mini-column, (b) A waveform
diagram demonstrating the operation of the training circuit during the testing period (shaded in light gray)
and training period.
The downside of using the inverters of DFFs in conjunction with Ziksa unit is that the network will
suffer from the sneak path issue especially during the learning phase. However, this issue can be
overcome by buffering the output of DFF with a tri-state buffer rather than an inverter gate. Using
a tri-state buffer allows the memristors that are not involved in the training process 7 to be floating
such that it does not draw any current. Figure 10 covers the possible scenarios for the sneak paths
when a NOT and tri-state buffers are used.
VTR
VTR
VTR
VTR
VTR
VTR
VTR VTR
OFF ON
F
VTR
VTR
ON
OFF
ON OFF
F
F
ON
OFF
Sneak current Training current F: Floating terminal
NOT_Buffer (a) NOT_Buffer (b) Tri_Buffer (a) Tri_Buffer (b)
Fig. 10. The possible scenarios for the current sneak paths when a DFF is buffered with (a) A NOT gate, (b) A
Tri-state buffer to drive the proximal connection memristors.
4.2 SDR classifier
The SDR classifier recognizes the SDR combinations as generated by the SP and generates the
predicted class label. It is built using a softmax circuit (≡WTA circuit shown in Figure 8) with
ny number of units, where ny denotes the number of class labels that needs to be recognized. All
the SDR classifier units are interconnected with the SP mini-columns in a dense manner through
weighted connections,ϖ ∈ Rnc×ny . Initially, the weighted connections of the classifier are randomly
initialized and then tuned according to the delta rule, given in Equation (13), where ϖ j,k is the
7The memristors that are not involved in the training are those that need to be decremented during the first clock cycles or
those that need to be incremented in the second clock cycles of training.
ACM J. Emerg. Technol. Comput. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2018.
Neuromemrisitive Architecture of HTM with On-Device Learning and Neurogenesis 1:17
weight of the connection between the kth classifier unit and jth SP mini-column, σ is the learning
rate. ®Λj denotes the jth mini-column output, and y∗k and yk are the predicated and expected class
labels, respectively. It is important to mention here that the delta rule can be applied using Ziksa
training circuitry as described in [45].
∆ϖk, j = σ × ®Λj × (y∗k − yk ) (13)
y∗k =
e
®Λ× ®ϖi
ny∑
i=1
e ®Λ× ®ϖi
(14)
5 EXPERIMENTAL SETUP
5.1 Device Parameters
The Verilog-A memristor model, VTEAM [20] is mapped to the physical memristor device by
Brivio [3]. The device resistance range is opted to fulfill the design constraints and to ensure proper
operation. The low resistance state (LRS ≡ Ron) is chosen to be 200kΩ and the high resistance
state (HRS ≡ Rof f ) is 5MΩ such that, sufficient amount of input current causes a voltage drop
of ≈0.85v8 across the sense memristor. This range of memristor resistance also minimizes the
power consumption of the crossbar array. Given the technology node and the supply rail, from the
mini-column circuit (Figure 7), the sense memristor range can be estimated as in Equation (15)9.
Here, the minimum value of the sense memristor is chosen such that it does not bring Vα to more
than 0.3v even when the inputs are set to high. On the contrary, activating three-quarter of the
proximal connections is enough to bring the voltage drop across the sense memristor to maximum
(≈0.85v).
Ms =
∑ns
i=1 Mi[ ns∑
i=1
Vmi
Vα
− 1
] (15)
Table 1. The device parameters used to simulate the SP and its SDR classifier.
Parameter Value
Proximal memristor range 200kΩ - 5MΩ
Memristor threshold ±1v
High-Frequency clock 50 MHz
Training voltage ≈1.2 v
Sense memristor range 20kΩ-80kΩ
8This is when a mini-column overlap score exceedsminOver lap
9Parameter to findMsmin : дi=2e-7
1
Ω , ns=32,Vs=0.2v, ∀Vmi = 0.9v . Parameters to findMsmax : дi=5e-6 1Ω , ns=24,Vs=0.9v,∀Vmi = 0.9v
ACM J. Emerg. Technol. Comput. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2018.
1:18 Abdullah M. Zyarah and Dhireesha Kudithipudi
Recall that the proposed HTM-SP network offers online learning. However, in order to test the
system for generalization, the learning might be turned off. Since a threshold voltage memristor
model is adapted, a threshold voltage of ±1v is used to disable the device undesired adjustment
when the learning is disabled. Upon the completion of setting the mini-columns’ parameters, the
system parameters are defined such that system offers real-time data processing, online learning,
and minimum power consumption. The system parameters are reported in Table 1.
5.2 SP Network Setup
In order to set the optimal network parameters for the utilized verification benchmarks, the
particle swarm optimization (PSO) [9] algorithm is used. The algorithm is integrated with the
software module of the SP and the search space is defined within a range that meets the hardware
constraints. The search space of the optimal hyper-parameters is observed by using 50 particles
randomly initialized within the predefined range, and the algorithm runs over 100 iterations. The
optimization is applied only for the SP network and the evaluation of the SP for any given hyper-
parameters is performed using the SDR classifier, the highest accuracy of which represents the
optimal point. For three separated runs and different benchmarks, we run PSO to get the optimal
hyper-parameters that result in the highest recognition accuracy. The hyper-parameters that are
included in the search space are: the number of winning mini-columns which impacts the SP
sparsity level,minOverlap which influences the sparsity level and SP noise robustness, permanence
parameters which determine the learning and forgetting rate, and the proximal segment size which
controls each mini-column overlap level with the input space. Table 2 lists the hyper-parameters
search space and the optimal values for each benchmark.
Table 2. HTM-SP parameters for the suites benchmarks.
Parameter Range MNIST YaleFaces
Number of winning mini-columns 5-40 40 16
MinOverlap 2-25 3 20
Permanence threshold 0-0.8 0.52 0.5
Permanence increment (P+) 0-0.2 0.01 0.1
Permanence decrement (P−) 0-0.2 -0.01 -0.15
Proximal segment size 10-500 32 250
6 SPATIAL POOLER EVALUATION METRICS
The performance of the SP is evaluated using the metrics defined in [7], the sparseness and entropy,
which reflect the sparsity level of encoding and efficient use of the available resources (mini-
columns) in the encoding process. The dataset used during the evaluation is suggested in [7] and
consists of a set of random vectors with sparsity level varies between 2% to 20% (it is called random
dataset in the rest of the paper). In the following subsections, each evaluation metric is discussed in
detail.
ACM J. Emerg. Technol. Comput. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2018.
Neuromemrisitive Architecture of HTM with On-Device Learning and Neurogenesis 1:19
6.1 Sparseness
Sparseness (η) defines the activity of the mini-columns overtime to ensure the fixed sparsity level in
the SDR produced by the SP. Given a SP SDR output vector (®Λ), its sparsity level can be measured
by dividing the Hamming weight of ®Λ, | ®Λ|1, by the mini-columns count (nc ) in the HTM region,
and as given in Equation (16)
η =
| ®Λ|1
nc
× 100 (16)
0 50 100 150 200
Number of Input vectors
5
10
15
20
S
pa
rs
it
y 
le
ve
l [
%
]
SP input
G-SP output
SP output
(a)
0.00 0.25 0.50 0.75 1.00 1.25
Entropy bar
0.00
0.05
0.10
0.15
0.20
0.25
0.30
E
nt
ro
py
 [
bi
ts
]
0.139 0.128 0.132 0.123 0.128
Ideal
SP_tr Entropy
SP_ts Entropy
(b)
Fig. 11. (a) Sparsity level of the input samples and their corresponding SP representations, (b) The average
entropy across all the mini-columns for the random dataset when the learning is enabled and disabled.
Figure 11-(a) illustrates the sparsity level for 200 input samples before and after being treated
by the SP. It can be observed that the wide variability in the sparsity level in the input has been
significantly reduced to almost ≈2%. However, due to the fact that during this experiment the
minOverlap is set to 4, having 2% of the sparsity level is not guaranteed as each mini-column
should have at least 4 active proximal connections to get involved in the input representation. This
issue can be noticed in the negative spikes of the green line which is an indicator of a degradation
in the sparsity level of SP output (below 2%). After adding the neurogenesis mechanism, this issue is
actually reduced as seen in orange line (G-SP). This is because the mini-columns that were inactive
are replaced with new ones with a higher likelihood to be active.
6.2 Entropy
The entropy metric quantifies whether the SP uses all the mini-columns in the region or not. This
metric can be computed by summing up the binary entropy function of each mini-column in the
HTM region and as given by Equation (17) [7], where E is the mean SP entropy and P(aj ) indicates
the average activation frequency of the jth mini-column across nm inputs. Figure 11-(b) illustrates
the average entropy10 of the SP for the random dataset when the learning is enabled and disabled.
It can be noticed that enabling the training in the network leads to improve the entropy from 0.123
bits/mini-column to 0.128 bits/mini-column (the maximum possible entropy that can be achieved
for the same network setup is 0.139 bits/mini-column). This means that the fraction of mini-columns
that are not involved in representing the input before learning became much more active after
training. Furthermore, when the neurogenesis mechanism is enabled, further improvement in
10The mean entropy increases when the mini-columns are equally activated due to the SP fixed sparsity constraints [7].
ACM J. Emerg. Technol. Comput. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2018.
1:20 Abdullah M. Zyarah and Dhireesha Kudithipudi
the entropy has been achieved for both training and testing results (bars with black boxes). This
attributes to the continuous mini-columns renewing as a function of their activities.
E =
nc∑
j=1
−P(aj )loд2P(aj ) − (1 − P(aj ))loд2(1 − P(aj ))
nc
(17)
P(aj ) = 1
nm
nm∑
t=1
Λtj (18)
7 EXPERIMENTAL RESULTS
7.1 Image Recognition
SP performance is evaluated on the image recognition task which is conducted using several
benchmarks including MNIST11 and Yale faces12 datasets. For MNIST, all the images are resized
from their original size, 28x28, to 32x32 pixels. Then, all images are binarized by thresholding prior
to introduce them to SP. The same process is applied for Yale faces, but here the images are cropped
using the face detection Open-CV python library prior to the resizing. The binarization, here,
is performed using adaptive thresholding to preserve most image details during the conversion
process.
In separate experiments, the data is introduced to the SP as training and testing sets. When the
training set is introduced, the SP learns the feed-forward input (images) in an unsupervised fashion.
Then, its output is relayed to an SDR classifier implemented by the winner-take-all circuit. The SDR
classifier, here, is trained in a supervised fashion using the delta rule. Then, the learning is disabled
in both SP and SDR classifier and the testing set is presented to the network. The results, shown
in Table 3, demonstrate that the network is able to classify the SDR representation generated by the
SP with a testing accuracy of 90.33% for MNIST. In case of Yale faces, due to the limited available
training samples, the same training set is presented to the network several times and the resulting
accuracy for testing, averaged over 10 runs, is 86.86%.
In an attempt to compare our results with previous implementations of the SP, for MNIST, it is
found that although our network is smaller in size, it still offers a comparable accuracy to other
implementations. In case of our previous work, in which 100 mini-columns are used, the high
accuracy mainly attributes to the use of high-performance classifier, support vector machine (SVM).
When it comes to other sparsity classifiers such as the locally competitive algorithm (LCA), SP+SDR
classifier still outperform this classifier with ≈ 0.33%. For Yale faces, SP+SDR classifier outperforms
the smooth-marginal fisher analysis (S-MFA) and offers higher average accuracy than the SP
implementation in [17] which did not consider the entire dataset during the training and testing.
11MNIST is the standard benchmark for hand-written images. It has grayscale images of 28x28 pixels associated with 10
classes for numbers from 0 to 9. The images are split into 60,000 training examples and 10,000 testing examples.
12Yale faces dataset contains 165 grayscale images corresponding to 15 subjects, 11 images each. The images are taken under
different conditions and variations including illumination effects, facial expression, etc. In this work, the set is randomly
split into training and testing examples, where the training examples contain 8 samples from each subject and 3 samples are
used for testing.
ACM J. Emerg. Technol. Comput. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2018.
Neuromemrisitive Architecture of HTM with On-Device Learning and Neurogenesis 1:21
Table 3. Summary of image recognition accuracy for various datasets using HTM-SP and other algorithms.
Work No. of columns Classifier Accuracy (%) ± STD
M
NI
ST
F-HTM [34] 784 SP + SVM (Linear kernel) 91.98
Memristive-LCA [38] 300 LCA 90.0
Digital-HTM [44] 100 SP + SVM (RBF kernel) 91.16
Crossbar HTM [36]c 1024 SP+X ≈90.5
This work 484 SP+SDR classifier 90.33 ± 0.17d
Ya
le
Fa
ce
s Memristor HTM [17]b - SP+XOR classifier 86.67
Smooth-MFA[4]a - S-MFA 81.1
This work 1024 SP+SDR classifier 86.86 ± 3.82d
a Software implementation.
b In [17], the SP parameters to achieve the aforementioned accuracy are not included. Also, the reported
average accuracy is when the network is tested separately only on emotions, light conditions, facial
expressions portions of the dataset.
c In [36], 95% classification accuracy is reported for using 4096 mini-columns, but it is not mentioned if this
is for a hardware implementation. Furthermore, the authors did not mention the type of classifier used
with the SP, for this reason, we do denote it by X.
d The high-level simulation model that we developed to model HTM network has accounted for memristor
device variability and cycle-to-cycle write variation. The device variability here is confined to device
resistance range which is emulated as a variation in the weight range, while the cycle-to-cycle write
variation is modeled by adding noise to the learning rule.
7.2 Noise Robustness
In order to quantify the noise robustness of the SP+SDR classifier for image recognition applications,
two experiments are performed. The first of which involves classifying MNIST images in the
presence of noise and the second one involves interrupting the training process by injecting
random SDRs. For the first experiment, the SP+SDR classifier is trained on clean training MNIST
images and tested with noisy test images. The noise here is added by flipping the image pixels
randomly. The noise level is defined by the percentage of the flipped pixels in an image. For a
noise level ranging between 0% to 10%, both the SDR classifier (≡ softmax classifier) and SP+SDR
classifier are tested separately. Figure 12 demonstrates the drop in recognition accuracy as both
classifiers are tested with corrupted MNIST images with various noise levels. It can be observed
that the SP+SDR classifier was able to handle the noise with a graceful degradation in accuracy in
comparison to the SDR classifier which its accuracy dropped to 37.7%, when 10% noise is added to
the images.
In the second experiment, the SP is used to generate the SDR representations of MNIST training
and testing sets. Then, the SDRs representations are presented to the SDR classifier during the
training and testing processes, where one cycle of training and testing is considered as one epoch.
MNIST SDRs are presented to the SDR classifier for 100 epochs mediated by noise injection. The
noise here consists of a set of random SDR vectors with sparsity level similar to that in MNIST SDR
vectors as generated by the SP. 10,000 noise vectors are generated and injected in parts after the
SDR classifier settles to a reasonable accuracy level (≈ 90.38). Between epochs 50-55, random 1000
SDR vectors are presented to the SDR classifier, between epochs 65-75, 5000 random SDR vectors
are used, and between epochs 80-95, 10,000 random vectors are used. Figure 12-(b) illustrates the
drop in recognition accuracy when the data streams are replaced by a stream of noisy vectors and
ACM J. Emerg. Technol. Comput. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2018.
1:22 Abdullah M. Zyarah and Dhireesha Kudithipudi
0 2 4 6 8 10
Noise level [%]
30
40
50
60
70
80
90
100
R
ec
og
ni
ti
on
 A
cc
ur
ac
y
SDR classifier only
SP + SDR classifier
(a)
0 20 40 60 80 100
Epoch
0
20
40
60
80
100
R
ec
og
ni
ti
on
 A
cc
ur
ac
y
   Input:     
 Random Noise
Input: MNIST   
Training Accuracy
Testing Accuracy
(b)
Fig. 12. (a) Recognition accuracy of MNIST dataset classified with SDR classifier and SP+SDR classifier in
the presence of a noise level range between 0% and 10%. (b) Noise robustness of the SP+SDR classifier when
presenting MNIST dataset as a stream of data mediated by noisy information.
the fast recovery after the noisy vectors are removed. The fast recovery, here, attributes to training
the SDR classifier on sparse inputs which makes the likelihood of adjusting the critical connections
(i.e. weights) less likely. This, consequently, shows that the degradation in classifier performance,
even after removing the noise vectors, is almost negligible.
7.3 Power Consumption and Area
Figure 13-(a) shows the total power consumption13 of the SP and the SDR classifier during the
training and testing processes when the network is used to recognize MNIST digits, running
at 50MHz. For a single iteration14, initially, the power consumption resides at 18.72mW until
each mini-column activates its corresponding proximal unit. During the training, the proximal
connections are driven by the training voltage (1.2v) rather than the testing voltage (0.9v) to adjust
the strength of the connections leading to greater power consumption. However, since the use of
a proximal unit takes 2 clock cycles, we do not see this abrupt increase in the power last long. It
is important to mention here that the SP and SDR classifier are working simultaneously and in a
pipelined fashion. Thus, when the SDR classifier is used for inference, there is around 0.994mW of
power consumption which then degrades to 0.121mW during the training as only one crossbar
column is trained in 2 clock cycles. Finally, when the SP communicates with the testbench according
to the hand-shake protocol to receive a new patch, the network experiences a sudden drop in
power (last few cycles) as most system units are disabled. In Figure 13-(b), the total average power
consumption of each unit in the SP design is illustrated. From here, we can see that the power
13The power consumption of the proposed design is evaluated in Cadence (analog blocks) and Synopsys (digital blocks) for
65nm technology node. The digital units are integrated with a testbench to fetch the training images to the network and
their loads are emulated by D-FFs. Initially, we run the circuit for 1ms to capture the circuit switching activity. Then, we use
Synopsys tools to measure the power consumption of the digital units. In case of the analog blocks, a crossbar network and
mini-column training circuitry with memristor proximal connections serve as a testbed. During the testing, we consider an
input voltage of 0.9v, whereas 1.2v is used during the training. It is important to mention here that while measuring the
power consumption, all the memristors are replaced by resistors and we consider the worst case scenario, (all crossbar
memristors are set to low resistance state). Then, the power consumption is estimated using Cadence ADE tools by covering
all the input combinations and averaging the results. For the area estimation, we used Synopsys tools to estimate the area of
the digital units and the analog units area is estimated from the physical layout designed in Cadence.
14Iteration in this context refers to the time starting from fetching an image to the network until it gets recognized. It also
includes the training time of the SP and the classifier.
ACM J. Emerg. Technol. Comput. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2018.
Neuromemrisitive Architecture of HTM with On-Device Learning and Neurogenesis 1:23
consumption of the mini-columns is dominating the network. It can be reduced further by lowering
the operating frequency and if memristors with higher resistance are used. Finally, the network
(SP+SDR classifier) setup used to classify the MNIST dataset consumes an average total power of
18.47mW with an area estimate of 0.513mm2.
0.00 0.25 0.50 0.75 1.00 1.25
Time in us
0
10
20
30
40
50
60
Po
w
er
 [
m
W
]
Proximal 
 unit is on 
      Testing 
 SDR classifier
SP_Tr
SP_Ts
SDR classifier
(a)
Mini-
col.(train)
Mini-
col.(test)
MCU WTA*
Unit
0.05
0.10
0.15
0.20
Po
w
er
 [
m
W
]
0.131
0.115
0.0264 0.0349
(b)
Fig. 13. (a) The average total power consumption of the SDR classifier and SP during training and testing
when the network is used to classify MNIST dataset (nc=484, ns=32). (b) Average total power consumption of
each unit in the SP (WTA size = 1000 cells).
8 CONCLUSIONS
This paper proposes a memristor-based architecture for the HTM spatial pooler and its SDR classifier
for mobile devices and energy constrained platforms. The proposed design enables high-speed
computations, low power consumption, and reconfigurability, all in a single entity that has the
capability to recognize images even in the presence of noise. The proposed design is implemented
using 65nm technology node and verified using various datasets including MNIST (accuracy 90.33%)
and Yale faces (accuracy = 86.86%). It is found that the network exhibits a strong robustness to
noise especially from the classifier side as it is trained with SDR representations. Furthermore,
during the power consumption analysis, it is observed that the power consumption while using
the proximal unit is approximately tripled. However, limiting the use of this unit to 2 clock cycles
significantly reduces the overall network power consumption to 18.47mW.
ACKNOWLEDGMENTS
The authors would like to thank the members of the Neuromorphic AI research Lab at RIT for their
support and critical feedback. The authors also would like to thank the reviewers for their time
and extensive feedback to enhance the quality of the paper.
REFERENCES
[1] Subutai Ahmad and Jeff Hawkins. 2015. Properties of Sparse Distributed Representations and their Application to
Hierarchical Temporal Memory. arXiv preprint arXiv:1503.07469 (2015).
[2] Julien Borghetti, Gregory S Snider, Philip J Kuekes, J Joshua Yang, Duncan R Stewart, and R Stanley Williams. 2010.
Memristive switches enable stateful logic operations via material implication. Nature 464, 7290 (2010), 873–876.
[3] S Brivio, E Covi, A Serb, T Prodromakis, M Fanciulli, and S Spiga. 2016. Experimental study of gradual/abrupt dynamics
of HfO2-based memristive devices. Applied Physics Letters 109, 13 (2016), 133504.
ACM J. Emerg. Technol. Comput. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2018.
1:24 Abdullah M. Zyarah and Dhireesha Kudithipudi
[4] Deng Cai, Xiaofei He, Yuxiao Hu, Jiawei Han, and Thomas Huang. 2007. Learning a spatially smooth subspace for face
recognition. In Computer Vision and Pattern Recognition, 2007. CVPR’07. IEEE Conference on. IEEE, 1–7.
[5] Leon Chua. 1971. Memristor-the missing circuit element. IEEE Transactions on circuit theory 18, 5 (1971), 507–519.
[6] Jianwei Cui and Qinru Qiu. 2016. Towards memristor based accelerator for sparse matrix vector multiplication. In
Circuits and Systems (ISCAS), 2016 IEEE International Symposium on. IEEE, 121–124.
[7] Yuwei Cui, Subutai Ahmad, and Jeff Hawkins. 2017. The HTM Spatial Pooler −A Neocortical Algorithm for Online
Sparse Distributed Coding. Frontiers in Computational Neuroscience 11 (2017).
[8] Yuwei Cui, Chetan Surpur, Subutai Ahmad, and Jeff Hawkins. 2016. A comparative study of HTM and other neural
network models for online sequence learning with streaming data. In Neural Networks (IJCNN), 2016 International
Joint Conference on. IEEE, 1530–1538.
[9] Russell Eberhart and James Kennedy. 1995. A new optimizer using particle swarm theory. InMicro Machine and Human
Science, 1995. MHS’95., Proceedings of the Sixth International Symposium on. IEEE, 39–43.
[10] Alexander Fish, Vadim Milrud, and Orly Yadid-Pecht. 2005. High-speed and high-precision current winner-take-all
circuit. IEEE Transactions on Circuits and Systems II: Express Briefs 52, 3 (2005), 131–135.
[11] Peter Földiak. 1990. Forming sparse representations by local anti-Hebbian learning. Biological cybernetics 64, 2 (1990),
165–170.
[12] Dileep George and Jeff Hawkins. 2009. Towards a mathematical theory of cortical micro-circuits. PLoS computational
biology 5, 10 (2009), e1000532.
[13] Jeff Hawkins and Subutai Ahmad. 2016. Why neurons have thousands of synapses, a theory of sequence memory in
neocortex. Frontiers in neural circuits 10 (2016), 23.
[14] Jeff Hawkins and Sandra Blakeslee. 2004. On Intelligence: How a New Understanding of the Brain will lead to Truly
Intelligent Machines. New York: Henry Holt & Co (2004).
[15] Jeff Hawkins, Dileep George, and Jamie Niemasik. 2009. Sequence memory for prediction, inference and behaviour.
Philosophical Transactions of the Royal Society of London B: Biological Sciences 364, 1521 (2009), 1203–1209.
[16] Donald Hebb. 1988. 0.(1949) The Organization of Behavior. Wiley, New York.
[17] Alex Pappachen James, Irina Fedorova, Timur Ibrayev, and Dhireesha Kudithipudi. 2017. HTM Spatial Pooler With
Memristor Crossbar Circuits for Sparse Biometric Recognition. IEEE Transactions on Biomedical Circuits and Systems
(2017).
[18] Sung Hyun Jo, Ting Chang, Idongesit Ebong, Bhavitavya B Bhadviya, Pinaki Mazumder, and Wei Lu. 2010. Nanoscale
memristor device as synapse in neuromorphic systems. Nano letters 10, 4 (2010), 1297–1301.
[19] Tomasz Kulej and Fabian Khateb. 2017. Sub 0.5-V bulk-driven winner take all circuit based on a new voltage follower.
Analog Integrated Circuits and Signal Processing 90, 3 (2017), 687–691.
[20] Shahar Kvatinsky, Misbah Ramadan, Eby G Friedman, and Avinoam Kolodny. 2015. VTEAM: A general model for
voltage-controlled memristors. IEEE Transactions on Circuits and Systems II: Express Briefs 62, 8 (2015), 786–790.
[21] Alexander Lavin and Subutai Ahmad. 2015. Evaluating Real-Time Anomaly Detection Algorithms–The Numenta
Anomaly Benchmark. In Machine Learning and Applications (ICMLA), 2015 IEEE 14th International Conference on. IEEE,
38–44.
[22] John Lazzaro, Sylvie Ryckebusch, Misha Anne Mahowald, and Caver A Mead. 1989. Winner-take-all networks of O (n)
complexity. In Advances in neural information processing systems. 703–711.
[23] Wei Lu, Kuk-Hwan Kim, Ting Chang, and Siddharth Gaba. 2011. Two-terminal resistive switches (memristors) for
memory and logic applications. In Proceedings of the 16th Asia and South Pacific Design Automation Conference. IEEE
Press, 217–223.
[24] Wim JC Melis and Michitaka Kameyama. 2009. A study of the different uses of colour channels for traffic sign
recognition on hierarchical temporal memory. In Innovative Computing, Information and Control (ICICIC), 2009 Fourth
International Conference on. IEEE, 111–114.
[25] Numenta. 2014. The Science of Anomaly Detection (How HTM Enables Anomaly Detection in Streaming Data).
[26] Daniel E Padilla, Russell Brinkworth, and Mark D McDonnell. 2013. Performance of a hierarchical temporal memory
network in noisy sequence learning. In Computational Intelligence and Cybernetics (CYBERNETICSCOM), 2013 IEEE
International Conference on. IEEE, 45–51.
[27] Daniel E Padilla-Baez. 2015. Analysis and Spiking Implementation of the Hierarchical Temporal Memory Model for
Pattern and Sequence Recognition. Ph.D. Dissertation. University of South Australia.
[28] Mirko Prezioso, Farnood Merrikh-Bayat, BD Hoskins, GC Adam, Konstantin K Likharev, and Dmitri B Strukov. 2015.
Training and operation of an integrated neuromorphic network based on metal-oxide memristors. Nature 521, 7550
(2015), 61–64.
[29] Themistoklis Prodromakis, Boon Pin Peh, Christos Papavassiliou, and Christofer Toumazou. 2011. A versatile memristor
model with nonlinear dopant kinetics. IEEE transactions on electron devices 58, 9 (2011), 3099–3105.
ACM J. Emerg. Technol. Comput. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2018.
Neuromemrisitive Architecture of HTM with On-Device Learning and Neurogenesis 1:25
[30] Shubha Ramakrishnan and Jennifer Hasler. 2014. Vector-matrix multiply and winner-take-all as an analog classifier.
IEEE Transactions on Very Large Scale Integration (VLSI) Systems 22, 2 (2014), 353–361.
[31] Behzad Razavi. 2017. Design of Analog CMOS Integrated Circuits. McGraw-Hill.
[32] Kenneth L Rice, Tarek M Taha, and Christopher N Vutsinas. 2008. Hardware acceleration of image recognition through
a visual cortex model. Optics & Laser Technology 40, 6 (2008), 795–802.
[33] Nicholas Soures, AM Zyarah, K Carlson, JB Aimone, and D Kudithipudi. [n. d.]. How Neural Plasticity Boosts
Performance of Spiking Neural Networks. ([n. d.]).
[34] Lennard Streat, Dhireesha Kudithipudi, and Kevin Gomez. 2016. Non-volatile hierarchical temporal memory: Hardware
for spatial pooling. arXiv preprint arXiv:1611.02792 (2016).
[35] Dmitri B Strukov, Gregory S Snider, Duncan R Stewart, and R Stanley Williams. 2008. The missing memristor found.
nature 453, 7191 (2008), 80.
[36] Son Ngoc Truong, Khoa Van Pham, and Kyeong-Sik Min. 2018. Spatial-Pooling Memristor Crossbar Converting
Sensory Information to Sparse Distributed Representation of Cortical Neurons. IEEE Transactions on Nanotechnology
17, 3 (2018), 482–491.
[37] Pavan Vyas and Mazad Zaveri. 2013. VERILOG IMPLEMENTATION OF A NODE OF HIERARCHICAL TEMPORAL
MEMORY. Asian Journal of Computer Science & Information Technology 3, 7 (2013).
[38] Walt Woods, Jens Bürger, and Christof Teuscher. 2015. Synaptic weight states in a locally competitive algorithm for
neuromorphic memristive hardware. IEEE Transactions on Nanotechnology 14, 6 (2015), 945–953.
[39] Jianguo Xing, Tao Wang, Yang Leng, and Jun Fu. 2012. A bio-inspired olfactory model using hierarchical temporal
memory. In Biomedical Engineering and Informatics (BMEI), 2012 5th International Conference on. IEEE, 923–927.
[40] Jinxiang Zha, He Huang, and Yujie Liu. 2016. A novel window function for memristor model with application in
programming analog circuits. IEEE Transactions on Circuits and Systems II: Express Briefs 63, 5 (2016), 423–427.
[41] Wei Zhang and David J Linden. 2003. The other side of the engram: experience-driven changes in neuronal intrinsic
excitability. Nature Reviews Neuroscience 4, 11 (2003), 885.
[42] Abdullah M Zyarah. 2015. Design and analysis of a reconfigurable hierarchical temporal memory architecture. Rochester
Institute of Technology.
[43] Abdullah M Zyarah and Dhireesha Kudithipudi. 2015. Reconfigurable hardware architecture of the spatial pooler for
hierarchical temporal memory. In System-on-Chip Conference (SOCC), 2015 28th IEEE International. IEEE, 143–153.
[44] Abdullah M Zyarah and Dhireesha Kudithipudi. 2018. Neuromorphic Architecture for the Hierarchical Temporal
Memory. IEEE Transactions on Emerging Topics in Computational Intelligence 2, 5 (2018), xx–138.
[45] Abdullah M Zyarah, Nicholas Soures, Lydia Hays, Robin B Jacobs-Gedrim, Sapan Agarwal, Matthew Marinella, and
Dhireesha Kudithipudi. 2017. Ziksa: On-chip learning accelerator with memristor crossbars for multilevel neural
networks. In Circuits and Systems (ISCAS), 2017 IEEE International Symposium on. IEEE, 1–4.
ACM J. Emerg. Technol. Comput. Syst., Vol. 1, No. 1, Article 1. Publication date: January 2018.
