98 research outputs found
Hardware optimizations of dense binary hyperdimensional computing: Rematerialization of hypervectors, binarized bundling, and combinational associative memory
Brain-inspired hyperdimensional (HD) computing models neural activity patterns of the very size of the brain's circuits with points of a hyperdimensional space, that is, with hypervectors. Hypervectors are D-dimensional (pseudo)random vectors with independent and identically distributed (i.i.d.) components constituting ultra-wide holographic words: D = 10,000 bits, for instance. At its very core, HD computing manipulates a set of seed hypervectors to build composite hypervectors representing objects of interest. It demands memory optimizations with simple operations for an efficient hardware realization. In this article, we propose hardware techniques for optimizations of HD computing, in a synthesizable open-source VHDL library, to enable co-located implementation of both learning and classification tasks on only a small portion of Xilinx UltraScale FPGAs: (1) We propose simple logical operations to rematerialize the hypervectors on the fly rather than loading them from memory. These operations massively reduce the memory footprint by directly computing the composite hypervectors whose individual seed hypervectors do not need to be stored in memory. (2) Bundling a series of hypervectors over time requires a multibit counter for every hypervector component. We instead propose a binarized back-to-back bundling without requiring any counters. This truly enables on-chip learning with minimal resources, as every hypervector component remains binary over the course of training, avoiding the multibit components otherwise required. (3) For every classification event, an associative memory is in charge of finding the closest match between a set of learned hypervectors and a query hypervector by using a distance metric. The cost of this operator is proportional to the hypervector dimension (D), and hence it may take O(D) cycles per classification event. 
Accordingly, we significantly improve the throughput of classification by proposing associative memories that steadily reduce the latency of classification to the extreme of a single cycle. (4) We perform a design space exploration incorporating the proposed techniques on FPGAs for a wearable biosignal processing application as a case study. Our techniques achieve up to 2.39× area saving, or 2,337× throughput improvement. The Pareto-optimal HD architecture is mapped on only 18,340 configurable logic blocks (CLBs) to learn and classify five hand gestures using four electromyography sensors.
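The associative-memory lookup described in point (3) can be illustrated in software. The following is only a minimal sketch, assuming dense binary hypervectors stored as Python integers and Hamming distance as the distance metric; the function names and the gesture labels are hypothetical and are not taken from the authors' VHDL library.

```python
import random

D = 10_000  # hypervector dimension, matching the example in the abstract


def random_hypervector(rng, d=D):
    """Dense binary hypervector: a Python int holding d pseudo-random bits."""
    return rng.getrandbits(d)


def hamming(a, b):
    """Number of bit positions in which two hypervectors differ."""
    return bin(a ^ b).count("1")


def flip_bits(hv, positions):
    """Return a copy of hv with the given bit positions inverted (noise model)."""
    for p in positions:
        hv ^= 1 << p
    return hv


def classify(associative_memory, query):
    """Return the label of the stored hypervector closest to the query."""
    return min(
        associative_memory,
        key=lambda label: hamming(associative_memory[label], query),
    )
```

Unrelated random hypervectors differ in roughly D/2 positions, so a query that keeps most bits of one stored prototype stays far closer to it than to any other entry. This software loop touches all D components per comparison, which is the O(D) cost the abstract refers to.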
PULP-HD: Accelerating Brain-Inspired High-Dimensional Computing on a Parallel Ultra-Low Power Platform
Computing with high-dimensional (HD) vectors, also referred to as
hypervectors, is a brain-inspired alternative to computing with
scalars. Key properties of HD computing include a well-defined set of
arithmetic operations on hypervectors, generality, scalability, robustness,
fast learning, and ubiquitous parallel operations. HD computing is about
manipulating and comparing large patterns (binary hypervectors with 10,000
dimensions), making its efficient realization on minimalistic ultra-low-power
platforms challenging. This paper describes HD computing's acceleration and its
optimization of memory accesses and operations on a silicon prototype of the
PULPv3 4-core platform (1.5 mm², 2 mW), surpassing the state-of-the-art
classification accuracy (on average 92.4%) with simultaneous 3.7×
end-to-end speed-up and 2× energy saving compared to its single-core
execution. We further explore the scalability of our accelerator by increasing
the number of inputs and classification window on a new generation of the PULP
architecture featuring bit-manipulation instruction extensions and a larger
number of cores (8). These together enable a near-ideal speed-up of 18.4×
compared to the single-core PULPv3.
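The abstract mentions a set of arithmetic operations on hypervectors without listing them. As a rough illustration, two operations commonly used with binary hypervectors are binding by component-wise XOR and bundling by component-wise majority vote. The sketch below assumes those two choices and uses hypothetical function names; it does not reproduce the PULP-HD kernels.

```python
import random

D = 10_000  # hypervector dimension, as in the abstract


def hamming(a, b):
    """Number of differing bit positions between two hypervectors."""
    return bin(a ^ b).count("1")


def bind(a, b):
    """Binding as component-wise XOR; applying it twice with b restores a."""
    return a ^ b


def bundle(hypervectors, d=D):
    """Bundling as component-wise majority vote over an odd number of inputs."""
    if len(hypervectors) % 2 == 0:
        raise ValueError("use an odd number of hypervectors to avoid ties")
    threshold = len(hypervectors) // 2
    result = 0
    for i in range(d):
        ones = sum((hv >> i) & 1 for hv in hypervectors)
        if ones > threshold:
            result |= 1 << i
    return result
```

A bundle of three random hypervectors differs from each input in about D/4 positions, while it differs from an unrelated hypervector in about D/2, which is what makes the bundle usable as a class prototype.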
Integer Sparse Distributed Memory and Modular Composite Representation
Challenging AI applications, such as cognitive architectures, natural language understanding, and visual object recognition share some basic operations including pattern recognition, sequence learning, clustering, and association of related data. Both the representations used and the structure of a system significantly influence which tasks and problems are most readily supported. A memory model and a representation that facilitate these basic tasks would greatly improve the performance of these challenging AI applications. Sparse Distributed Memory (SDM), based on large binary vectors, has several desirable properties: auto-associativity, content addressability, distributed storage, and robustness over noisy inputs, which would facilitate the implementation of challenging AI applications. Here I introduce two variations on the original SDM, the Extended SDM and the Integer SDM, that significantly improve these desirable properties, as well as a new form of reduced description representation named MCR. Extended SDM, which uses word vectors of larger size than address vectors, enhances its hetero-associativity, improving the storage of sequences of vectors, as well as of other data structures. A novel sequence learning mechanism is introduced, and several experiments demonstrate the capacity and sequence learning capability of this memory. Integer SDM uses modular integer vectors rather than binary vectors, improving the representation capabilities of the memory and its noise robustness. Several experiments show its capacity and noise robustness. Theoretical analyses of its capacity and fidelity are also presented. A reduced description represents a whole hierarchy using a single high-dimensional vector, which can recover individual items and directly be used for complex calculations and procedures, such as making analogies. Furthermore, the hierarchy can be reconstructed from the single vector. 
Modular Composite Representation (MCR), a new reduced description model for the representation used in challenging AI applications, provides an attractive tradeoff between expressiveness and simplicity of operations. A theoretical analysis of its noise robustness, several experiments, and comparisons with similar models are presented. My implementations of these memories include an object-oriented version using a RAM cache, a version for distributed and multi-threading execution, and a GPU version for fast vector processing.
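For orientation, the original binary SDM that this work extends can be sketched in a few lines. The code below is a minimal sketch of the classic counter-based design (random hard locations, activation within a Hamming radius, per-bit counters), not of the Extended SDM, Integer SDM, or MCR proposed here. The class name and the default sizes are assumptions chosen to keep the example small.

```python
import random


class SparseDistributedMemory:
    """Classic binary SDM: hard locations with one counter per word bit."""

    def __init__(self, n_bits=256, n_locations=2000, radius=115, seed=0):
        rng = random.Random(seed)
        self.n_bits = n_bits
        self.radius = radius
        # Fixed random addresses of the hard locations, stored as ints.
        self.addresses = [rng.getrandbits(n_bits) for _ in range(n_locations)]
        self.counters = [[0] * n_bits for _ in range(n_locations)]

    def _active(self, address):
        """Indices of hard locations within the Hamming radius of address."""
        return [
            i
            for i, a in enumerate(self.addresses)
            if bin(a ^ address).count("1") <= self.radius
        ]

    def write(self, address, word):
        """Add the word to every active location: +1 for a 1 bit, -1 for a 0 bit."""
        for i in self._active(address):
            row = self.counters[i]
            for bit in range(self.n_bits):
                row[bit] += 1 if (word >> bit) & 1 else -1

    def read(self, address):
        """Sum the counters of active locations and threshold each bit at zero."""
        sums = [0] * self.n_bits
        for i in self._active(address):
            row = self.counters[i]
            for bit in range(self.n_bits):
                sums[bit] += row[bit]
        word = 0
        for bit, total in enumerate(sums):
            if total > 0:
                word |= 1 << bit
        return word
```

Writing a pattern at its own address and then reading from a slightly corrupted copy of that address illustrates the auto-associativity and noise robustness named in the abstract: the corrupted address still activates many of the locations that received the pattern.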
QubitHD: A Stochastic Acceleration Method for HD Computing-Based Machine Learning
Machine Learning algorithms based on Brain-inspired Hyperdimensional (HD)
computing imitate cognition by exploiting statistical properties of
high-dimensional vector spaces. It is a promising solution for achieving high
energy-efficiency in different machine learning tasks, such as classification,
semi-supervised learning and clustering. A weakness of existing HD
computing-based ML algorithms is the fact that they have to be binarized for
achieving very high energy-efficiency. At the same time, binarized models reach
lower classification accuracies. To solve the problem of the trade-off between
energy-efficiency and classification accuracy, we propose the QubitHD
algorithm. It stochastically binarizes HD-based algorithms, while maintaining
comparable classification accuracies to their non-binarized counterparts. The
FPGA implementation of QubitHD provides a 65% improvement in terms of
energy-efficiency, and a 95% improvement in terms of the training time, as
compared to state-of-the-art HD-based ML algorithms. It also outperforms
state-of-the-art low-cost classifiers (like Binarized Neural Networks) in terms
of speed and energy-efficiency by an order of magnitude during training and
inference.
Comment: 8 pages, 7 figures, 3 tables
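The abstract does not spell out how QubitHD binarizes a model. As a generic illustration of what stochastic binarization means, the sketch below maps each real-valued component to +1 or -1 with a probability that depends on its value, so the expected output equals the input. The rule, the assumed value range of [-1, 1], and the function names are assumptions for illustration and should not be read as the QubitHD algorithm.

```python
import random


def stochastic_binarize(vector, rng):
    """Map each component x (clamped to [-1, 1]) to +1 with probability
    (x + 1) / 2 and to -1 otherwise, so the expected output equals x."""
    out = []
    for x in vector:
        x = max(-1.0, min(1.0, x))
        p_plus = (x + 1.0) / 2.0
        out.append(1 if rng.random() < p_plus else -1)
    return out


def deterministic_binarize(vector):
    """Plain sign binarization, shown for comparison: magnitude is lost."""
    return [1 if x >= 0 else -1 for x in vector]
```

With sign binarization, components of 0.1 and 0.9 both become +1. The stochastic variant preserves their difference on average, which is one way a binarized model can stay closer to its non-binarized counterpart.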
QHD: A brain-inspired hyperdimensional reinforcement learning algorithm
Reinforcement Learning (RL) has opened up new opportunities to solve a wide
range of complex decision-making tasks. However, modern RL algorithms, e.g.,
Deep Q-Learning, are based on deep neural networks, imposing high computational
costs when running on edge devices. In this paper, we propose QHD, a
hyperdimensional reinforcement learning algorithm that mimics brain properties toward
robust and real-time learning. QHD relies on a lightweight brain-inspired model
to learn an optimal policy in an unknown environment. We first develop a novel
mathematical foundation and encoding module that maps state-action space into
high-dimensional space. We accordingly develop a hyperdimensional regression
model to approximate the Q-value function. The QHD-powered agent makes
decisions by comparing Q-values of each possible action. We evaluate the effect
of the different RL training batch sizes and local memory capacity on the QHD
quality of learning. Our QHD is also capable of online learning with tiny local
memory capacity, which can be as small as the training batch size. QHD provides
real-time learning by further decreasing the memory capacity and the batch
size. This makes QHD suitable for highly-efficient reinforcement learning in
the edge environment, where it is crucial to support online and real-time
learning. Our solution also supports a small experience replay batch size that
provides 12.3 times speedup compared to DQN while ensuring minimal quality
loss. Our evaluation shows QHD capability for real-time learning, providing
34.6 times speedup and significantly better quality of learning than
state-of-the-art deep RL algorithms.
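The idea of approximating Q-values with a regression model in high-dimensional space, and then acting by comparing the Q-values of each action, can be sketched as follows. This is a simplified illustration under stated assumptions: states are discrete and each is encoded by a fixed random bipolar hypervector, there is one model hypervector per action, and the update is a plain delta rule. The class name, the encoding, and the parameters are hypothetical and do not reproduce the QHD encoder or training loop.

```python
import random

D = 2000  # hypervector dimension (assumed; kept small for a quick example)


def random_bipolar(rng, d=D):
    """Random hypervector with components in {-1, +1}."""
    return [1 if rng.random() < 0.5 else -1 for _ in range(d)]


class HDQModel:
    """One model hypervector per action; Q(s, a) = <M_a, H_s> / D."""

    def __init__(self, n_states, n_actions, lr=0.5, seed=0):
        rng = random.Random(seed)
        # Assumed encoding: a fixed random hypervector for each discrete state.
        self.state_hvs = [random_bipolar(rng) for _ in range(n_states)]
        self.models = [[0.0] * D for _ in range(n_actions)]
        self.lr = lr

    def q_value(self, state, action):
        h = self.state_hvs[state]
        m = self.models[action]
        return sum(mi * hi for mi, hi in zip(m, h)) / D

    def update(self, state, action, target):
        """Delta rule: move Q(state, action) toward the target value."""
        error = target - self.q_value(state, action)
        h = self.state_hvs[state]
        m = self.models[action]
        for i in range(D):
            m[i] += self.lr * error * h[i]

    def best_action(self, state):
        """Greedy decision: compare the Q-values of all actions."""
        return max(range(len(self.models)), key=lambda a: self.q_value(state, a))
```

Because a bipolar hypervector has squared norm D, each update closes a fraction `lr` of the gap between the current estimate and the target for that state. In a full agent the target would come from the reward and the next state's best Q-value.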