This work explores the feasibility of specialized hardware implementing the
Cortical Learning Algorithm (CLA) in order to fully exploit its inherent
advantages. This algorithm, which is inspired by the current understanding of
the mammalian neocortex, is the basis of the Hierarchical Temporal Memory
(HTM). In contrast to other machine learning (ML) approaches, the structure is
not application dependent and relies on fully unsupervised continuous learning.
We hypothesize that a hardware implementation will be able not only to extend
the already practical uses of these ideas to broader scenarios but also to
exploit the hardware-friendly CLA characteristics. The proposed architecture
will enable a degree of scalability that is unfeasible for software solutions
and will fully capitalize on two of the many CLA advantages: low computational
requirements and reduced storage utilization. Compared to a state-of-the-art
CLA software implementation, it could improve performance by 4 orders of
magnitude and energy efficiency by up to 8 orders of magnitude. To this end, we
propose to use a packet-switched network. The paper addresses the fundamental
issues of such an approach and proposes solutions that allow it to scale. We
analyze cost and performance using well-known architecture techniques and
tools. The results obtained suggest that even with
CMOS technology, under constrained cost, it might be possible to implement a
large-scale system. We found that the proposed solutions save 90% of the
original communication costs when running either synthetic or realistic
workloads.

Comment: Submitted for publication