The International Journal of Parallel, Emergent and Distributed Systems
Vol. 00, No. 00, Month 2016, 1–17
RESEARCH ARTICLE
Thermodynamic-RAM Technology Stack
M. Alexander Nugent and Timothy W. Molter∗
Knowm Inc., Santa Fe, NM, USA
∗Corresponding author. Email: tim@knowm.org
(Received 00 Month 200x; in final form 00 Month 200x)
We introduce a technology stack, or specification, describing the multiple levels of abstraction and specialization needed to implement a neuromorphic processor (NPU) based on the previously-described concept of AHaH Computing and integrate it into today's digital computing systems. The general-purpose NPU implementation described here is called Thermodynamic-RAM (kT-RAM) and is just one of many possible architectures, each with varying advantages and trade-offs. Bringing us closer to brain-like neural computation, kT-RAM will provide a general-purpose adaptive hardware resource to existing computing platforms, enabling fast and low-power machine learning capabilities that are currently hampered by the separation of memory and processing, a.k.a. the von Neumann bottleneck. Because understanding such a processor based on non-traditional principles can be difficult, presenting the various levels of the stack from the bottom up, layer by layer, makes explaining kT-RAM a much easier task. The levels of the Thermodynamic-RAM technology stack include the memristor, synapse, AHaH node, kT-RAM, instruction set, sparse spike encoding, kT-RAM emulator, and SENSE server.
Keywords: neuromorphic, memristor, artificial intelligence, machine learning
1. Introduction
Machine learning applications span a very diverse landscape. Some areas in-
clude motor control, combinatorial search and optimization, clustering, prediction,
anomaly detection, classification, regression, natural language processing, planning
and inference. A common thread is that a system learns the patterns and structure
of the data in its environment, builds a model, and uses that model to make pre-
dictions of subsequent events and take action. The models which emerge contain
hundreds to trillions of continuously adaptive parameters. Human brains contain
on the order of $10^{15}$ adaptive synapses. How the adaptive weights are exactly im-
plemented in an algorithm varies, and established methods include support vector
machines, decision trees, artificial neural networks and deep learning, to name a few
[1]. Intuition tells us learning and modeling the environment is a valid approach
in general, as the biological brain also appears to operate in this manner. The
unfortunate limitation with the algorithmic approach, however, is that it runs on
traditional digital hardware. In such a computer, calculations and memory updates
must necessarily be performed in different physical locations, often separated by a
significant distance. The power required to adapt parameters grows impractically
large as the number of parameters increases owing to the tremendous energy con-
sumed shuttling digital bits back and forth. In a biological brain (and all of nature),
the processor and memory are the same physical substrate and many computations
and memory adaptations are performed in parallel. Recent progress has been made
with multi-core processors and specialized parallel processing hardware like GP-
GPUs [2] and FPGAs [3], but for machine learning applications that intend to
achieve the ultra-low power dissipation of biological nervous systems, it is a dead
end approach [4]. The low-power solution to machine learning occurs when the
memory-processor distance goes to zero, and this can only be achieved through
intrinsically adaptive hardware, such as memristors.
Given the success of recent advancements in machine learning algorithms com-
bined with the hardware power dilemma, an immense pressure exists for the de-
velopment of neuromorphic computer hardware. The Human Brain Project and the
BRAIN Initiative with funding of over EUR 1.190 billion and USD 3 billion respec-
tively partly aim to reverse engineer the brain in order to build brain-like hardware
[5, 6]. DARPA’s recent SyNAPSE program funded two large American tech companies, IBM and HP, as well as HRL Laboratories, and aimed to develop a new type of cognitive computer similar in form and function to a mammalian brain.
The recent Nanotechnology-Inspired Grand Challenge for Future Computing in the
United States [7] was formed to “Create a new type of computer that can proactively interpret and learn from data, solve unfamiliar problems using what it has learned, and operate with the energy efficiency of the human brain.” CogniMem is
commercializing a k-nearest neighbor application specific integrated circuit (ASIC)
for pattern classification, a common machine learning task found in diverse appli-
cations [8]. Stanford’s Neurogrid, a computer board using mixed digital and analog
computation to simulate a network, is yet another approach at neuromorphic hard-
ware [9]. Manchester University’s SpiNNaker is another hardware platform utilizing
parallel cores to simulate biologically realistic spiking neural networks[10]. IBM’s
neurosynaptic core and TrueNorth cognitive computing system resulted from the
SyNAPSE program [11]. All these platforms have yet to prove utility along the
path towards mass adoption, and none have yet solved the foundational problem
of memory-process separation.
More rigorous theoretical frameworks are also being developed for the neuromor-
phic computing field. Recently, Traversa and Di Ventra have introduced the idea of ‘universal memcomputing machines’, general-purpose computing machines that have the same computational power as a non-deterministic Universal Turing Machine, exhibiting intrinsic parallelization and functional polymorphism [12]. Their
system and other similar proposals employ a relatively new electronic component,
the memristor, whose instantaneous state is a function of its past states. In other
words, it has memory, and like a biological synapse, it can be used as a subcom-
ponent for computation while at the same time storing a unit of data. A previous
study by Thomas demonstrated that memristors are better suited than traditional CMOS electronics for implementing neuromorphic hardware [13].
Our attempt to develop neuromorphic hardware takes a unique approach in-
spired by life, and more generally, natural self-organization. We call the theoretical
result of our efforts ‘AHaH Computing’ and have previously provided a thorough
and rigorous quantitative description [14]. Rather than trying to reverse engineer
the brain or transfer existing machine learning algorithms to new hardware and
blindly hope to end up with an elegant power efficient chip, AHaH computing
was designed from the beginning with a few key constraints: (1) must result in a
hardware solution where memory and computation are combined, (2) must enable
most or all machine learning applications, (3) must be simple enough to build chips with existing manufacturing technology and to emulate with existing computational platforms for verification of methods, and (4) must be understandable and adoptable by application developers across all manufacturing sectors. This initial motivation led
us to utilize physics to create a technological framework for a neuromorphic processor
satisfying the above constraints.
In trying to understand how nature computes, we stumbled upon a fundamental
structure found not only in the brain but also almost everywhere one looks - a
self-organizing energy-dissipating fractal. We find it in rivers, trees, lightning and
fungus, but we also find it deep within us. The air that we breathe is coupled to
our blood through thousands of bifurcating flow channels that form our lungs. Our
brain is coupled to our blood through thousands of bifurcating flow channels that
form our arteries and veins. The neurons in our brains are built of thousands of
bifurcating flow channels that form our axons and dendrites. At all scales of organi-
zation we see the same fractal built from the same simple building block: a simple
structure formed of competing energy dissipation pathways. We call this build-
ing block ‘nature’s Transistor’, as it appears to represent a foundational adaptive
building block from which higher-order self-organized structures are built, much
like the transistor is a building block for modern computing.
When multiple conduction pathways compete to dissipate energy through an
adaptive container, the container will adapt in a particular way that leads to the
maximization of energy dissipation. We call this mechanism the Anti-Hebbian and
Hebbian (AHaH) plasticity rule. It is computationally universal, but perhaps more
importantly and interestingly, it also leads to general-purpose solutions in ma-
chine learning. Because the AHaH rule describes a physical process, we can create
efficient and dense analog AHaH synaptic circuits with memristive components.
One version of these mixed signal (digital and analog) circuits forms a generic
adaptive computing resource we call Thermodynamic Random Access Memory or
Thermodynamic-RAM, described herein. Thermodynamics is the branch of physics
that describes the temporal evolution of matter as it flows from ordered to disor-
dered states, and nature’s Transistor is an energy-dissipation flow structure, hence
‘thermodynamic’.
In neural systems, the algorithm is specified by two things: the network topology
and the plasticity of the interconnections or synapses. Any general-purpose neural
processor must contend with the problem that hard-wired neural topology will
restrict the available neural algorithms that can be run on the processor. It is also
crucial that the NPU interface merge easily with modern methods of computing.
A ‘Random Access Synapse’ structure satisfies these constraints.
Thermodynamic-RAM is the first attempt at realizing a working neuromorphic
processor implementing the theory of AHaH computing. While several alternative
designs, such as dual crossbars, are feasible and may offer specific advantages over
others, this first design aims to be a general computing substrate geared towards
reconfigurable network topologies and the entire spectrum of the machine learning
application space. In the following sections, we break down the entire design specifi-
cation into various levels from ideal memristors to integrating the finished product
into existing technology. Defining the individual levels of this ‘technology stack’
helps to introduce the technology step by step and group the necessary pieces into
tasks with focused objectives. This allows for separate groups to specialize at one
or more levels of the stack where their strengths and interests exist. Improvements
at various levels can propagate throughout the whole technology ecosystem, from
materials to markets, without any single participant having to bridge the whole
stack. In a way, the technology stack is an industry specification.
Figure 1. Our generalized memristor model captures both the memory and exponential diode characteristics via metastable switches (MSS) and parallel Schottky diodes, and provides an excellent model for a wide range of memristive devices. Here we show a hysteresis plot for an Ag-chalcogenide device from Boise State University along with a fitted model.
2. The Thermodynamic-RAM Technology Stack
2.1 The Memristor – Metastable Switch Collection
Many memristive materials have recently been reported [15–20], and the trend
continues. New designs and materials are being used to create a diverse range of
devices. Memristor models are also being developed and incrementally improved
upon [21–25]. Our generalized metastable switch (MSS) memristor model aims to capture the behavior of memristors at a level of abstraction sufficient to enable efficient circuit simulations while simultaneously describing as wide a range of devices as possible [14]. An MSS
is an idealized two-state element that switches probabilistically between its two
states as a function mainly of the applied time-voltage integral. The MSS model
describes a memristor as a collection of MSSs evolving in time, which captures
important device behavior such as hysteresis in response to an oscillating excitation
and incremental conductance change in response to voltage pulses. The MSS model
can be made more complex to account for failure modes, for example by making
the MSS state potentials temporally variable. Multiple MSS models with different
state potentials can be combined in parallel or series to model increasingly more
complex state systems.
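To make the switching picture concrete, here is a minimal sketch of a metastable-switch collection update; the functional form and parameters are our illustrative choices, not the calibrated model of [14]:

    import math, random

    def mss_step(n_on, n_total, v, dt, vt=0.026, tau=1e-4):
        """Advance a collection of metastable switches by one time step dt.
        Each OFF switch hops to ON with probability p_up and each ON switch
        hops to OFF with probability p_down; a positive applied voltage v
        biases switching toward the ON state, so the time-voltage history,
        not the instantaneous voltage alone, sets the device state."""
        p_up = (dt / tau) / (1.0 + math.exp(-v / vt))    # OFF -> ON
        p_down = (dt / tau) / (1.0 + math.exp(v / vt))   # ON -> OFF
        n_off = n_total - n_on
        flips_up = sum(random.random() < p_up for _ in range(n_off))
        flips_down = sum(random.random() < p_down for _ in range(n_on))
        return n_on + flips_up - flips_down

    # Repeated positive pulses incrementally raise the ON fraction (conductance).
    n_on = 100
    for _ in range(50):
        n_on = mss_step(n_on, n_total=1000, v=0.3, dt=1e-6)
    print(n_on / 1000.0)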
In our semi-empirical model, the total current through the device comes from both a memory-dependent (MSS) current component, $I_m$, and a Schottky diode current, $I_s$, in parallel:
$$ I = \phi\, I_m(V, t) + (1 - \phi)\, I_s(V), \qquad (1) $$
where φ ∈ [0, 1]. A value of φ = 1 represents a device that contains no Schottky
diode effects. Tuning φ more towards zero gradually introduces more diode current
while scaling back the memory component. The Schottky diode effect accounts for
the exponential behavior found in many memristor devices made of sandwiched
layers of metal and semiconductor material and allows for the accurate modeling
of that effect, which the MSS component cannot capture alone.
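A rough numerical sketch of Eq. (1) is given below; the single state variable, the linear MSS conductance interpolation, and the ideal-diode form of the Schottky term are simplifying assumptions of ours, not the fitted model:

    import math

    def device_current(v, x, g_on=1e-3, g_off=1e-5, phi=0.8, i_s=1e-9, n_vt=0.05):
        """Total device current I = phi * Im(V, x) + (1 - phi) * Is(V).
        x in [0, 1] is the fraction of metastable switches in the ON state,
        so the memory branch interpolates between g_off and g_on; the second
        branch is a simple exponential (Schottky-like) diode current."""
        i_m = (x * g_on + (1.0 - x) * g_off) * v         # memory (MSS) branch
        i_d = i_s * (math.exp(v / n_vt) - 1.0)           # diode branch
        return phi * i_m + (1.0 - phi) * i_d

    # phi = 1 would give a purely memristive response; phi < 1 mixes in the
    # exponential diode behavior seen in many metal/semiconductor stacks.
    print(device_current(0.3, x=0.4))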
Thermodynamic-RAM is not constrained to just one particular memristive de-
vice; any memristive device can be used as long as it meets the following criteria: (1)
it is incremental and (2) its state change is voltage dependent. The ideal device for neuromemristive processors would have low thresholds of adaptation (<0.2 V, to reduce power losses during learning, which scale as $V^2/R$), an on-state resistance of ∼10 kΩ or greater (to reduce static current losses, $P = V^2/R$), high dynamic range (to increase synaptic weight resolution), durability (to increase the life of the chip), the capability of incremental operation with very short pulse widths (to increase learning precision and reduce energy, $E = P\,\Delta t$), and long retention times of a week or more (to reduce loss of trained state). However, even devices that deviate considerably from these parameters will be useful in more specific applications. For example, short retention times on the order of seconds are perfectly compatible with combinatorial optimizers.
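To put rough numbers on these targets (our own back-of-the-envelope figures, assuming operation at the 0.2 V threshold with a 100 ns pulse, neither of which is prescribed above), a single active memristor at the minimum recommended on-state resistance dissipates

$$ P = \frac{V^2}{R} = \frac{(0.2\,\mathrm{V})^2}{10\,\mathrm{k\Omega}} = 4\,\mu\mathrm{W}, \qquad E = P\,\Delta t = 4\,\mu\mathrm{W} \times 100\,\mathrm{ns} = 0.4\,\mathrm{pJ} $$

per pulse; raising the on-state resistance or shortening the pulse reduces this proportionally.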
We have previously shown that our generalized MSS model for memristors accu-
rately models four potential memristor candidates [14] for Thermodynamic-RAM,
and we have incorporated the model into our circuit simulation and machine learn-
ing benchmarking software. Hysteresis data from a recent Ag-chalcogenide memristor device fabricated by Kris Campbell at Boise State University, along with the fitted model, is shown in Figure 1. The model provides common ground from which the diversity of de-
vices can be compared and incorporated into the technology stack. By modeling
a device with the MSS model, a material scientist can evaluate its utility across
real-world benchmarks via software emulators and gain valuable insight into which
memristive properties are, and are not, useful in the application space.
2.2 The Synapse – Competing Memristors
As a variable conductance device, a memristor is an adaptive energy-dissipating
pathway. As current flows through it, its internal state changes and heat is ex-
changed to the surrounding environment. When two adaptive energy-dissipating
pathways compete for conduction resources, a nature’s transistor will emerge. Two
competing memristors thus form a synapse as shown in Figure 2. We see this
building block for self-organized structures throughout nature, for example in ar-
teries, veins, lungs, neurons, leaves, branches, roots, lightning, rivers and mycelium
networks. We observe that in all cases there is a particle that flows through com-
petitive energy-dissipating assemblies. The particle is either directly a carrier of
free energy dissipation or else it appears to gate access, like a key to a lock, to free
energy dissipation of the units in the collective. Some examples of these particles
include water in plants, ATP in cells, blood in bodies, neurotrophins in brains, and
money in economies. In the cases of whirlpools, hurricanes, tornadoes and convec-
tion currents we note that although the final structure does not appear to be built
of competitive structures, it is the result of a competitive process with one winner;
namely, the spin or rotation.
Figure 2. A) A self-organizing energy-dissipating fractal, found throughout nature, is composed of a simple repeating structure formed of competing energy dissipation pathways. B) The simple bifurcating dissipative pathway is what we call nature’s transistor or synapse. C) A differential pair of memristors provides a means for implementing a synapse in our electronics.

The circuits capable of achieving AHaH plasticity can be broadly categorized by the electrode configuration that forms the synapse as well as how the input
activation (current) is converted to a feedback voltage that drives unsupervised
anti-Hebbian learning [26, 27]. Synaptic currents can be converted to a feedback
voltage statically (resistors or memristors), dynamically (capacitors), or actively
(operational amplifiers). Each configuration requires unique circuitry to drive the
electrodes so as to achieve AHaH plasticity, and multiple driving methods exist.
Non-Polar, Polar, and Bipolar memristors can be used. These are defined as follows:
• Non-Polar: Application of both positive and negative voltage bias induces only
increase or only decrease in conductance. Thermodynamic decay is used to
change the conductance in the other direction. Examples include the dielec-
trophoretic aggregation of conductive nanoparticles in colloidal suspension [28].
• Polar: Application of voltage bias enables incremental conductance change in one direction, but all-or-nothing change in the opposite direction. An example of this includes phase-change memory [29].
• Bipolar: Application of positive and negative voltage bias enables incremental conductance increase and decrease. An example of this includes Self-Directed Channel (SDC) memristors [15].

Figure 3. An AHaH node is made up of n synapses sharing a common output electrode, y. The memristor pair synapse and the AHaH node are analogous to a biological synapse and neuron, respectively. In Thermodynamic-RAM, the number of input synapses can be configured via software and several AHaH nodes can be connected together to form any desired network topology by a technique called temporal partitioning.
All of these devices can be used as adaptive dissipation pathways and, via a spe-
cialized circuit, be made to compete for conduction resources. Hence, a large num-
ber of AHaH circuits exist. Herein, a ‘2-1’ two-phase circuit configuration with polar
memristors is introduced because of its compactness and because it is amenable to
simple mathematical analysis [14].
2.3 The AHaH Node – Collections of Synapses
An AHaH node is formed when a collective of synapses are coupled to a common
readout line. Through spike encoding and temporal multiplexing, an AHaH node is
capable of being partitioned into smaller functional AHaH nodes. An AHaH node
provides a simple but computationally universal (and extremely useful) adaptation
resource.
The functional objective of the AHaH node shown in Figure 3 is to produce an
analog output signal on electrode y, given an arbitrary spike input of length n with
k active inputs and n− k inactive (floating) inputs. The circuit consists of one or
more memristor pairs (synapses) sharing a common electrode labeled y. Switches
gating access to a driving voltage are labeled with an S. The individual switches
for spike inputs of the AHaH node are labeled $S_0, S_1, \cdots, S_n$. The driving voltage
source for supervised and unsupervised learning is labeled F . The subscript values
a and b indicate the positive and negative dissipative pathways, respectively.
Similar to a binary perceptron [30], an AHaH node is able to linearly separate
two classes of samples represented as n-dimensional input vectors. The output is
the sum of the products of all inputs and weights plus a bias. Unlike a perceptron,
an AHaH node is not an algorithm but an adaptive hardware construct, i.e. a
physical adapting circuit where the output is an analog value representing the sum
or integration of currents of physical synapse circuits capable of Anti-Hebbian and
Hebbian plasticity. The synaptic weight is the difference in conductance between
the differential memristor pair.
During the read phase, switches Sa and Sb connect voltage sources +V and −V
respectively for all k active inputs. Inactive S inputs are left floating. The combined
conductance of the active inputs produces an output voltage on electrode y. This
analog signal contains useful confidence information and can be digitized via the
sgn() function to either a logical 1 or a 0, if desired. The read example in Figure 3
shows a simple case with one active synapse, V = 1 V, $M_a$ = 1 mS and $M_b$ = 0.1 mS. The resulting voltage divider circuit produces a voltage $V_y$, which in this case is 0.818 V. The act of reading also decays the synaptic weight slightly towards zero.
This is the anti-Hebbian part of the AHaH rule.
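Restating the figure’s read example explicitly, the output is set by a conductance divider between the +V and −V rails:

$$ V_y = V\,\frac{M_a - M_b}{M_a + M_b} = 1\,\mathrm{V} \times \frac{1.0\,\mathrm{mS} - 0.1\,\mathrm{mS}}{1.0\,\mathrm{mS} + 0.1\,\mathrm{mS}} \approx 0.818\,\mathrm{V}. $$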
During the write phase, voltage source F is set to either $V_y^{\mathrm{write}} = V\,\mathrm{sgn}(V_y^{\mathrm{read}})$ (unsupervised) or $V_y^{\mathrm{write}} = V\,\mathrm{sgn}(s)$ (supervised), where s is an externally applied teaching signal. The polarities of the driving voltages gated by the switches S are inverted to −V and +V. The polarity switch causes all active memristors to
be driven to a less conductive state, counteracting the read phase. If this dynamic
counteraction did not take place, the memristors would quickly saturate into their
maximally conductive states, rendering the synapses useless. The write example in
Figure 3 extends the simple read example, where a 1 V teaching signal is applied at node y. This applied voltage will cause a −2 V drop across $M_a$, driving it into
a less conductive state. The actual ending conductance value will depend on the
time the voltage is applied. This is the Hebbian part of the AHaH rule.
A more intuitive explanation of the above feedback cycle is that “the winning
pathway is rewarded by not getting decayed.” Each synapse can be thought of
as two competing energy dissipating pathways (positive or negative evaluations)
that are building structure (differential conductance). We may apply reinforcing
Hebbian feedback by (1) allowing the winning pathway to dissipate more energy or
(2) forcing the decay of the losing pathway. If we chose method (1) then we must at
some future time ensure that we decay the conductance before device saturation is
reached. If we chose method (2) then we achieve both decay and reinforcement at
the same time. Method (2) is faster while method (1) is more energy efficient. The
lowest energy solution is to use natural decay rather than forced decay, but this
introduces complexities associated with matching the decay rate to the particular
processing task.
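The following toy simulation sketches method (2) on a single differential synapse; the decay constants and update form are arbitrary illustrations of the read-decay/write-reinforce cycle, not the device-level circuit behavior:

    def ahah_cycle(ga, gb, v=1.0, read_decay=0.01, write_decay=0.05):
        """One read/write cycle on a toy differential synapse (ga, gb).
        Read: evaluate the conductance divider and decay both pathways
        slightly toward zero (anti-Hebbian). Write: decay the losing
        pathway further, so the winner is 'rewarded by not being decayed'
        (Hebbian, method 2 above)."""
        vy = v * (ga - gb) / (ga + gb)           # read output
        ga *= (1.0 - read_decay)                 # anti-Hebbian read decay
        gb *= (1.0 - read_decay)
        if vy >= 0:
            gb *= (1.0 - write_decay)            # punish the losing pathway
        else:
            ga *= (1.0 - write_decay)
        return ga, gb, vy

    ga, gb = 1.0e-3, 0.9e-3
    for _ in range(20):
        ga, gb, vy = ahah_cycle(ga, gb)
    print(vy)   # the small initial imbalance is reinforced over cycles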
2.4 Sparse Spike Encoding – Information Encoding
A spike stream is the means by which real-world data is asynchronously fed into
kT-RAM. Its biological counterpart would be the bundles of axons of the nervous
system which carry sensed information from sensing organs to and around the cor-
tex. A sparse spike stream interface is the only option with kT-RAM, and it is used
for all machine learning applications from robotic control to clustering to classifi-
cation. This trait enables an application developer to leverage their knowledge and
experience using kT-RAM in one domain and transfer it over to another. Spikes can
directly address core synapses. The synaptic core address can thus be given by the
sum of the AHaH node’s core partition index and the spike ID, which are both just
integers in the spike space. Spikes enable kT-Core partitioning and multiplexing,
which in turn enables arbitrary AHaH node sizes and hence very flexible network
topologies. Sparse spike encoding is also very energy and bandwidth efficient and
has been shown to produce state-of-the-art results on numerous benchmarks. We choose
spikes because they work, and we are attempting to engineer a useful computing
substrate. The fact that the spike encoding appears to match biology is of course
curious, but ultimately not important to our objectives.
Figure 4. A spike-based system such as kT-RAM requires spike encoders (sensors), spike streams (wire bundles), spike channels (a wire), spike space (number of wires), spike patterns (active spike channels) and finally spikes (the state of being active). A spike encoding is, surprisingly, nothing more than a list of encoders that directly address synapses on a kT-RAM core.

A collection of n synapses belongs to a neuron (AHaH node), each with an associated weight: $\{w_0, w_1, \cdots, w_n\}$. A subset of the synapses in an AHaH node can be activated by some input spike pattern, and the total neural activation is the
voltage of the H-Tree, which can be read out on the common electrode, y, by the AHaH Controller. For many input patterns, x is a sparse spiking representation,
meaning that only a small subset of the spike channels are activated out of the
spike space, and when they are, they are of value 1. So for a neuron with 16 inputs,
one possible sparse-spike pattern would look like: x = {1000001000000000}. Since
two of the 16 possible inputs are active (spiking), we say that it has a sparsity of
2/16 or 12.5%. Since most of the inputs are zero, we can write this spike pattern in
a much more efficient way by just listing the index of the inputs that are spiking:
x = {0, 6}.
We call x a ‘spike set’ or ‘spike pattern’ or sometimes just ‘spikes’. The ‘spike
space’ is the total number of ‘spike channels’, in this case 16. In some problems
such as inference or text classification the spike space can get all the way up to
250,000 or more. A good way to picture it is as a big bundle of wires, where the
total number of wires is the spike space and the set of wires active at any given time
is the spike pattern. We call this bundle of wires and the information contained in
it the ‘spike stream’. The algorithms or hardware that convert data into a sparse-
spiking representation are called ‘spike encoders’. Your eyes, ears and nose are
examples of spike encoders. A visual representation of this can be seen in Figure 4.
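As a minimal illustration (not Knowm’s API; the function and variable names are ours), a dense binary input can be converted to the sparse index form used above:

    def to_spike_pattern(dense):
        """Convert a dense binary vector into a sparse spike pattern:
        the sorted list of indices (spike IDs) of the active channels."""
        return [i for i, bit in enumerate(dense) if bit]

    # 16-channel spike space with channels 0 and 6 active
    dense = [1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
    spikes = to_spike_pattern(dense)      # [0, 6]
    sparsity = len(spikes) / len(dense)   # 0.125, i.e. 12.5%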
2.5 kT-RAM – AHaH Nodes with RAM Interface
As previously stated, the particular design of kT-RAM presented in this paper
prioritizes flexibility and general utility above anything else, much in the same way
that a CPU is designed for general purpose use. This particular design builds upon
commodity RAM using its form factor and the row and column address space map-
ping to specific bit cells.

Figure 5. A) Spikes (integers) in a spike pattern (integer set) are used to address synaptic elements in a core, which become selectively coupled to drive circuitry (AHaH Controller). B) During the read and write phases, the activated synapses (memristor pairs) are coupled to the triple H-tree electrodes A, B and y. C) By coupling several cores together, via either analog or digital methods, large collections of cores (kT-RAM) can be created and specialized for tasks such as high-dimension inference (analog coupling) or compositional learning (digital coupling). D) kT-RAM can borrow from existing RAM architecture to integrate into existing digital computing platforms.

Modifying RAM to create a kT-RAM core requires the following steps: (1) removal of the RAM reading circuitry, (2) minor design modifi-
cations of the RAM cells, (3) the addition of memristive synapses to the RAM cells,
(4) addition of H-Tree circuitry connecting the synapses, and (5) addition of driving and output sensing circuitry - the ‘AHaH controller’. Multiple kT-RAM cores
can be manufactured and connected to each other on the same die (Figure 5 C).
Leveraging existing techniques and experience of foundries capable of producing
commodity RAM as well as using three to five generation-old processing facilities
will make the prototyping and manufacturing of kT-RAM relatively inexpensive.
Even the final packaging of kT-RAM modules (Figure 5 D) can leverage existing
commodity hardware infrastructure.
Figure 5A and B show what kT-RAM would look like with its triple H-Tree
sensing node connecting all the underlying synapses located at each cell in the
RAM array. The 3 fractal binary trees shown (although from above it looks like one single H-tree, there are 3 stacked on top of each other) are the AHaH node’s
output electrode, y, as well as the driving voltage sources A and B connected via
the switches as shown in Figure 3. The switches are activated via the incoming
spikes (integers) and couple the individual synapses (memristor pairs) to the triple
H-tree in preparation for the read and write cycles. All other synapses in the H-
tree are left floating i.e. not connected to the triple H-tree. While at first glance it
appears like this architecture leads to one giant AHaH node per chip or core, the
core can be partitioned into smaller AHaH nodes of arbitrary size by temporally
partitioning sub portions of the tree. In other words, so long as it is guaranteed
that synapses assigned to a particular AHaH node partition are never co-activated
with other partitions, these ‘virtual’ AHaH nodes can co-exist on the same physical
core. This allows us to effectively exploit the extreme speed of modern electronics.
Any desired network topology linking AHaH nodes together can be achieved easily
through a kT-RAM/CPU/RAM pairing. Software enforces the constraints, while
the hardware remains flexible.
Through temporal partitioning combined with spike encoding, AHaH nodes can
be allocated with as few as one or as many synapses as the application requires and
can be connected to create any network topology. This flexibility is possible because
of a RAM interface with addressable rows and columns. Crossbar architectures, in
addition to sneak-path issues, introduce a restrictive topology. While this is good
for specialized applications, one cannot build a general-purpose machine learning
substrate from an intrinsically restricted topology. Cores can be electrically coupled
to form a larger combined core. The number of cores, and the way in which they
are addressed and accessed will vary across implementations so as to be optimized
for end use applications. AHaH node sizes can therefore vary from one synapse
to the size of the kT-RAM chip, while digital coupling could extend the maximal
size to ‘the cloud’, limited only by the kT-Cores’ intrinsic adaptation rates and
chip-to-chip communication.
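A minimal sketch of how temporal partitioning might be book-kept in software is shown below; the class, method names and addressing scheme are our illustration of the ‘partition index + spike ID’ idea from Section 2.4, not the actual driver:

    class KTCorePartitioner:
        """Allocate virtual AHaH nodes on a single physical kT-Core by
        assigning each node a contiguous block of synapse addresses."""

        def __init__(self, core_size):
            self.core_size = core_size
            self.next_free = 0
            self.partitions = {}          # node id -> base address

        def allocate(self, node_id, num_synapses):
            if self.next_free + num_synapses > self.core_size:
                raise MemoryError("kT-Core is full")
            self.partitions[node_id] = self.next_free
            self.next_free += num_synapses
            return self.partitions[node_id]

        def synapse_addresses(self, node_id, spike_pattern):
            # Core address = node's partition base + spike ID, so two nodes
            # are never co-activated as long as spike IDs stay in range.
            base = self.partitions[node_id]
            return [base + spike for spike in spike_pattern]

    core = KTCorePartitioner(core_size=65536)
    core.allocate("label_A", num_synapses=784)
    core.allocate("label_B", num_synapses=784)
    print(core.synapse_addresses("label_B", [0, 6]))   # [784, 790]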
2.6 kT-RAM Instruction Set
Thermodynamic RAM performs an analog sum of currents and adapts physically,
eliminating the need to compute and write memory updates. One can theoreti-
cally exploit the kT-RAM instruction set (Table 1) however they wish. However,
to prevent weight saturation, one must pair ‘forward’ instructions with ‘reverse’
instructions, although not necessarily one right after the other. For example, a
forward-read operation FF should be followed by a reverse operation (RF , RH,
RL, RZ, RA or RU) and vice versa. The only way to extract state information is
to leave the feedback voltage floating, and thus there are two possible read instruc-
tions: FF and RF . There is no such thing as a ‘non-destructive read’ operation in
kT-RAM, as this property is governed by the underlying physics of the memristive
elements, which can differ. Every memory access results in weight adaptation, al-
though it should be noted that operating under the adaptation threshold of some
devices may result in negligible changes. By understanding how the Anti-Hebbian
and Hebbian plasticity works (AHaH computing), we can exploit weight adapta-
tions to create, among other things, ‘self-healing hardware’. The act of accessing
the information actually repairs and heals it.
Figure 6 contains pseudo code demonstrating how to construct a multi-label
online classifier by loading spikes and executing instructions in the kT-RAM in-
Table 1. kT-RAM Instruction Set

Instruction   Name                         Feedback Voltage (F)
FF            Forward-Float                None/Floating
FH            Forward-High                 −V
FL            Forward-Low                  +V
FU            Forward-Unsupervised         −V if y ≥ 0, else +V
FA            Forward-Anti-Unsupervised    +V if y ≥ 0, else −V
FZ            Forward-Zero                 0
RF            Reverse-Float                None/Floating
RH            Reverse-High                 −V
RL            Reverse-Low                  +V
RU            Reverse-Unsupervised         −V if y ≥ 0, else +V
RA            Reverse-Anti-Unsupervised    +V if y ≥ 0, else −V
RZ            Reverse-Zero                 0
1: procedure Classify(active spikes set S, truth labels set L)
2:     for Each AHaH Node N do
3:         KTRAM.loadSpikes(N, S)
4:         y ← KTRAM.execute(N, FF)        ▷ forward read
5:         if supervised then
6:             if N ∈ L then
7:                 KTRAM.execute(N, RH)
8:             else if y ≥ 0 then          ▷ false positive
9:                 KTRAM.execute(N, RL)
10:            else                        ▷ true negative
11:                KTRAM.execute(N, RF)
12:            end if
13:        else                            ▷ unsupervised
14:            KTRAM.execute(N, RF)
15:        end if
16:    end for
17: end procedure
Figure 6. A multi-label online linear classifier with confidence estimation can be easily constructed via
calls to the kT-RAM instruction set.
struction set. The network topology of the classifier is simply N AHaH nodes with
M synapses, where N is the number of labels being classified and M is the number
of unique spikes in the entire spike stream space. The active spikes S, a subset of
M , is loaded onto each AHaH node, and the execute method returns the voltage on
the AHaH node’s output electrode, y. Although all the AHaH nodes may exist on
the same physical chip and share the same output electrode, temporal partitioning,
as described above, allows for a virtual separation of AHaH nodes. Note that the
execute method actually receives two instructions at the same time in the implementation of the instruction set, in order to enforce executing ‘reverse’ instructions together with ‘forward’ instructions as discussed above. For the rare case where only one instruction should be executed, for example an ‘FF’ read instruction, a special “do nothing” instruction is defined: ‘XX’.
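A rough Python rendering of Figure 6 against a hypothetical emulator API is given below (the class and method names are ours; the real emulator’s interface may differ), showing the forward/reverse instruction pairing and the ‘XX’ do-nothing placeholder:

    def classify(ktram, node_ids, spikes, labels=None):
        """One pass of the multi-label classifier from Figure 6.
        `ktram` is assumed to expose load_spikes(node, spikes) and
        execute(node, instruction_pair) -> y (a float); 'XX' is the
        do-nothing instruction so a lone read or write can be issued."""
        activations = {}
        for node in node_ids:
            ktram.load_spikes(node, spikes)
            y = ktram.execute(node, ("FF", "XX"))     # forward read only
            if labels is None:
                reverse = "RF"                         # unsupervised decay
            elif node in labels:
                reverse = "RH"                         # reinforce true label
            elif y >= 0:
                reverse = "RL"                         # punish false positive
            else:
                reverse = "RF"                         # true negative
            ktram.execute(node, ("XX", reverse))       # paired reverse op
            activations[node] = y
        return activations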
The Modified National Institute of Standards and Technology (MNIST) database [31] is a classic dataset in the machine learning community. It consists of 60,000 training and 10,000 test samples of handwritten digits, each containing a digit 0 to 9 (10 classes). The 28 x 28 pixel grayscale images have been preprocessed to size-normalize and center the digits. This basic benchmark is commonly used in the machine learning community to test new ideas and algorithms, and numerous results are published. Running the pseudo-code shown in Figure 6 on the MNIST benchmark dataset with our kT-RAM emulator produces an accuracy of 92.1%, congruent with state-of-the-art linear classifier algorithms. In
order to improve on this result, additional machine learning techniques need to be
employed beyond linear classification.
2.7 kT-RAM Emulator – Cross-platform Universality
Thermodynamic-RAM is designed to plug into existing computing architectures
easily. The envisioned hardware format is congruent with standard RAM chips
and RAM modules and would plug into a motherboard in a variety of different
ways. In general there are two main categories of integration. First, kT-RAM can
be tightly coupled with the CPU, on the CPU die itself or connected via the north
bridge. In this case, the instruction set of the CPU would have to be modified
to accommodate the new capabilities of kT-RAM. Second, kT-RAM can be loosely coupled as a peripheral device, either via the PCI bus, the LPC bus, or via cables or ports to the south bridge. In these cases, no modification to the CPU’s instruction set would be necessary, as the interfacing would be implemented over the generic plug-in points of the south bridge. As in the case with other peripheral devices,
a device driver would need to be developed. Additional integration configurations
are also possible.
Given the envisioned hardware integration, kT-RAM simply becomes an addi-
tional resource that software developers have access to via an API. In the meantime,
kT-RAM is implemented as an emulator running on von Neumann architecture (for
more machine learning benchmarks see [14]), but the API will remain the same.
Later, when the new NPU is available, it will replace the emulator, and existing
programs will not need to be rewritten to benefit from the accelerated capabilities
offered by the hardware. In any case, kT-RAM operates asynchronously. As new
spike streams arrive, the driver in control of kT-RAM is responsible for activating
the correct synapses and providing the AHaH controller with an instruction pair
for each AHaH node. The returned activation value can then be passed back to the
program and used as needed.
Emulators allow developers to commence application development while remaining competitive with existing machine learning approaches. In other words, we
can build a market for kT-RAM across all existing computing platforms while we
simultaneously build the next generation of kT-RAM hardware. kT-RAM software
emulators for both memristive circuit validation and near-term application develop-
ment on digital computers have already been developed and deployed commercially
on real-world client problems. Our current digital kT-Core emulators appear highly efficient on commodity hardware compared to existing methods in terms of performance, energy and memory usage, and a thorough comparison is planned for the near future. Thermodynamic-RAM is not a ‘ten year technology’
nor is it ‘bleeding edge’. Rather, it is already solving real-world machine learning
problems on existing digital platforms.
2.8 SENSE Server – Plug-and-Play Machine Learning Apps
While a machine learning application developer using the kT-RAM emulator would have full control over the design of the application and could use kT-RAM to its full potential, she would be required to understand the instruction set and underlying mechanics of kT-RAM and AHaH Computing. This level of development is
analogous to writing assembly code or using a very low-level programming library.
To assist in the rapid development of applications based on kT-RAM, we have
developed a top-level server-based framework. We call it ‘Scalable and Extensible
Neural Sensing Engine’ or ‘SENSE server’ for short. The SENSE server contains
higher level pre-built machine learning modules, standard spike encoders, buffers,
spike stream joiners and other miscellaneous building blocks, which can be con-
figured by the developer for a unique machine learning application. This level of
development is analogous to an SQL server like MySQL, where you provide a con-
figuration file to specify its behavior. Like the MySQL server, the SENSE server
runs as a daemon service, waiting for asynchronous interactions from the outside
world. In the case of the SENSE server, it is waiting for incoming spikes flowing in
over the configured spike streams. To install and run the SENSE server on Linux,
you would run a command in a terminal such as ‘sudo apt-get install knowm-sense’
followed by ‘start knowm-sense myconfig.yml’, where ‘myconfig.yml’ would be the
custom configuration file defining the ‘netlist’ and parameter settings of the par-
ticular machine learning application. The SENSE server can be run on commodity
computer hardware, robotic platforms or mobile devices with a Linux or *nix-based
operating system. The SENSE server also has built-in support for seamless clus-
tering for parallelization of high throughput machine learning applications such as
vision and audio processing.
3. Conclusion
In this paper, we have introduced Thermodynamic-RAM and a technology stack,
a specification or blueprint, for a future industry enabled by AHaH Computing.
kT-RAM is a particular design that prioritizes flexibility and general utility above
anything else, much in the same way that a CPU is designed for general purpose use.
The flexibility offered by this design allows for a single architecture that can be used
for the entire range of machine learning applications given their unique network
topologies. Much like the cortex integrates signals from different sensing organs
via a common ‘protocol’, the sparse spike encoding interface of kT-RAM allows for
a well defined way to integrate environmental data asynchronously. Conveniently,
the sparse spike encoding interface is a perfect bridge between digital systems
and neuromorphic hardware. Just as modern computing is based on the concept
of the bit and quantum computing is based on the concept of the qubit, AHaH
computing is built from the ahbit. AHaH attractor states are a reflection of the
underlying statistics (history) of the applied data stream. It is both the collection of
physical synapses and also the structure of the information that is being processed
that together result in an AHaH attractor state. Hence, an ahbit is what results
when we couple information to energy dissipation. Our kT-RAM design borrows
heavily from commodity RAM, using its form factor to leverage
today’s chip manufacturing resources. The RAM module packaging and concise
instruction set will allow for easy integration into existing computing platforms such
as commodity personal computers, smart phones and super computers. Our kT-
RAM emulator allows us to develop applications, demonstrate utility, and justify a
large investment into future chip development. When chips are available, existing
applications using the emulator API will not have to be rewritten in order to
take advantage of new hardware acceleration capabilities. The topmost level of
the kT-RAM technology stack is the SENSE server, a framework for configuring
a custom machine learning application, based on a ‘netlist’ of pre-built machine
learning modules, standard spike encoders, buffers, spike stream joiners and other
miscellaneous building blocks.
4. Future Work
At the core of the adaptive power problem is the energy wasted during memory-
processor communication. The ultimate solution to the problem entails finding
ways to let memory configure itself, and AHaH computing is a conceptual frame-
work for understanding how this can be accomplished. Thermodynamic-RAM is
an adaptive physical hardware resource for providing AHaH plasticity and hence a
substrate from which AHaH computing is possible. In previous work, we have shown
demonstrations of universal logic, clustering, classification, prediction, robotic ac-
tuation and combinatorial optimization benchmarks using AHaH computing, and
we have successfully mapped all these functions to the kT-RAM instruction set and
emulator. Efficient emulation has already been demonstrated on commodity von
Neumann hardware, and a path ahead towards neuromorphic chips has been de-
fined. Along the way, the emulator will be ported to co-processors like GP-GPUs
and FPGAs to further improve speed and power efficiency with available hard-
ware. Progress is being made independently at various levels, but a coordinated
and focused effort by multiple participants is needed to bridge the full technology
stack.
Acknowledgment
The authors would like to thank the Air Force Research Labs in Rome, NY for their support under the SBIR/STTR programs AF10-BT31 and AF121-049. The authors also thank Kristy A. Campbell from Boise State University for graciously providing us with memristor device data.
References
[1] S. Marsland, Machine learning: an algorithmic perspective, CRC press, 2015.
[2] W. Xiong, J. Droppo, X. Huang, F. Seide, M. Seltzer, A. Stolcke, D. Yu, and
G. Zweig, Achieving human parity in conversational speech recognition, arXiv
preprint arXiv:1610.05256 (2016).
[3] G. Lacey, G.W. Taylor, and S. Areibi, Deep learning on fpgas: Past, present,
and future, arXiv preprint arXiv:1602.04283 (2016).
[4] T. Potok, C. Schuman, R. Patton, and H. Li, Neuromorphic computing architectures, models, and applications (2016).
[5] T. Hampton, European-led project strives to simulate the human brain, JAMA
311 (2014), pp. 1598–1600.
[6] T.R. Insel, S.C. Landis, and F.S. Collins, The NIH brain initiative, Science
340 (2013), pp. 687–688.
[7] Y. Chen, Q. Wu, S. Basu, J.J. Candelaria, D. Hammerstrom, D. Mountain,
V. Narayanan, R.E. Pino, and T. Potok, Nanotechnology-inspired future com-
puting, challenges and opportunities.
[8] B. McCormick, Applying cognitive memory to cybersecurity, in Network Sci-
ence and Cybersecurity, Springer, 2014, pp. 63–73.
[9] B.V. Benjamin, P. Gao, E. McQuinn, S. Choudhary, A.R. Chandrasekaran, J.
Bussat, R. Alvarez-Icaza, J. Arthur, P. Merolla, and K. Boahen, Neurogrid: A
mixed-analog-digital multichip system for large-scale neural simulations, Pro-
ceedings of the IEEE 102 (2014), pp. 699–716.
[10] J. Navaridas, S. Furber, J. Garside, X. Jin, M. Khan, D. Lester, M. Luján, J.
Miguel-Alonso, E. Painkras, C. Patterson, et al., SpiNNaker: Fault tolerance
in a power-and area-constrained large-scale neuromimetic architecture, Parallel
Computing 39 (2013), pp. 693–708.
[11] S.K. Esser, A. Andreopoulos, R. Appuswamy, P. Datta, D. Barch, A. Amir,
J. Arthur, A. Cassidy, M. Flickner, P. Merolla, et al., Cognitive computing
systems: Algorithms and applications for networks of neurosynaptic cores, in
Neural Networks (IJCNN), The 2013 International Joint Conference on, 2013,
pp. 1–10.
[12] F.L. Traversa and M. Di Ventra, Universal memcomputing machines, arXiv
preprint arXiv:1405.0931 (2014).
[13] A. Thomas, Memristor-based neural networks, Journal of Physics D: Applied
Physics 46 (2013), p. 093001.
[14] M.A. Nugent and T.W. Molter, AHaH computing: from metastable switches to attractors to machine learning, PLoS ONE 9 (2014), p. e85175.
[15] K.A. Campbell, Self-directed channel memristor for high temperature opera-
tion, arXiv preprint arXiv:1608.05357 (2016).
[16] A.S. Oblea, A. Timilsina, D. Moore, and K.A. Campbell, Silver chalcogenide
based memristor devices, in Proc. 2010 IEEE International Joint Conference
on Neural Networks (IJCNN), 2010, pp. 1–3.
[17] Y. Yang, P. Sheridan, and W. Lu, Complementary resistive switching in tanta-
lum oxide-based resistive memory devices, Applied Physics Letters 100 (2012),
p. 203112.
[18] I. Valov and M.N. Kozicki, Cation-based resistance change memory, Journal
of Physics D: Applied Physics 46 (2013), p. 074005.
[19] T. Hasegawa, A. Nayak, T. Ohno, K. Terabe, T. Tsuruoka, J.K. Gimzewski,
and M. Aono, Memristive operations demonstrated by gap-type atomic
switches, Applied Physics A 102 (2011), pp. 811–815.
[20] B.L. Jackson, B. Rajendran, G.S. Corrado, M. Breitwisch, G.W. Burr, R.
Cheek, K. Gopalakrishnan, S. Raoux, C.T. Rettner, A. Padilla, A.G. Schrott,
R.S. Shenoy, B.N. Kurdi, C.H. Lam, and D.S. Modha, Nanoscale electronic
synapses using phase change devices, ACM Journal on Emerging Technologies
in Computing Systems (JETC) 9 (2013), p. 12.
[21] S. Choi, S. Ambrogio, S. Balatti, F. Nardi, and D. Ielmini, Resistance drift
model for conductive-bridge (CB) RAM by filament surface relaxation, in
Proc. 2012 IEEE 4th International Memory Workshop (IMW), 2012, pp. 1–4.
[22] S. Menzel, U. Bottger, and R. Waser, Simulation of multilevel switching in
electrochemical metallization memory cells, Journal of Applied Physics 111
(2012), p. 014501.
[23] T. Chang, S.H. Jo, K.H. Kim, P. Sheridan, S. Gaba, and W. Lu, Synaptic
behaviors and modeling of a metal oxide memristive device, Applied Physics
A 102 (2011), pp. 857–863.
[24] P. Sheridan, K.H. Kim, S. Gaba, T. Chang, L. Chen, and W. Lu, Device and
SPICE modeling of RRAM devices, Nanoscale 3 (2011), pp. 3833–3840.
[25] D. Biolek, Z. Biolek, and V. Biolkova, SPICE modeling of memristive, mem-
capacitative and meminductive systems, in Proc. 2009 IEEE European Con-
ference on Circuit Theory and Design (ECCTD), 2009, pp. 249–252.
[26] A. Nugent, Universal logic gate utilizing nanotechnology (2008), US Patent
7,420,396.
[27] A. Nugent, Methodology for the configuration and repair of unreliable switching
elements (2009).
[28] A.D. Wissner-Gross, Dielectrophoretic architectures (2009), pp. 155–173.
[29] G.W. Burr et al., Neuromorphic computing using non-volatile memory,
Advances in Physics: X 2 (2017), pp. 89–124.
[30] F. Rosenblatt, The perceptron: A probabilistic model for information storage and organization in the brain, Psychological Review 65 (1958), p. 386.
[31] Y. LeCun and C. Cortes, The MNIST database of handwritten digits (1998).
