Mixed Order Hyper-Networks for Function Approximation and Optimisation
Many systems take inputs, which can be measured and sometimes controlled, and produce outputs, which can also be measured and which depend on the inputs. Taking numerous measurements from such systems produces data, which may be used either to model the system, with the goal of predicting the output associated with a given input (function approximation, or regression), or to find the input settings required to produce a desired output (optimisation, or search). Approximating or optimising a function is central to the field of computational intelligence.
There are many existing methods for performing regression and optimisation based on samples of data, but they all have limitations. Multilayer perceptrons (MLPs) are universal approximators, but they suffer from the black box problem, which means that their structure and the function they implement are opaque to the user. They also tend to become trapped in local minima or on large plateaux in the error function during learning. A regression method with a structure that allows models to be compared, human knowledge to be extracted, optimisation searches to be guided and model complexity to be controlled is desirable. This thesis presents such a method.
This thesis presents a single framework for both regression and optimisation: the mixed order hyper network (MOHN). A MOHN implements a function f:{-1,1}^n ->R to arbitrary precision. The structure of a MOHN makes the ways in which input variables interact to determine the function output explicit, which allows human insights and complexity control that are very difficult in neural networks with hidden units. The explicit structure representation also allows efficient algorithms for searching for an input pattern that leads to a desired output. A number of learning rules for estimating the weights based on a sample of data are presented along with a heuristic method for choosing which connections to include in a model. Several methods for searching a MOHN for inputs that lead to a desired output are compared.
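As a rough illustration of what an explicit structure over input interactions can look like, the sketch below evaluates a small network on an input from {-1,1}^n. It assumes that each weight multiplies the product of the inputs it connects, which is a common form for high order networks; the abstract does not state the formula, and the function name `mohn_output` and the example weights are hypothetical.

```python
from math import prod

def mohn_output(weights, x):
    """Weighted sum over connections; each term is the weight times the
    product of the inputs that the connection joins (assumed form)."""
    return sum(w * prod(x[i] for i in subset) for subset, w in weights.items())

# Hypothetical structure: keys are tuples of input indices, () acts as a bias.
weights = {
    (): 0.5,          # zero order (bias)
    (0,): 1.0,        # first order weight on input 0
    (1, 2): -2.0,     # second order weight joining inputs 1 and 2
    (0, 1, 2): 0.25,  # third order weight joining all three inputs
}

x = (1, -1, 1)  # one input pattern from {-1, 1}^3
y = mohn_output(weights, x)
print(y)  # 3.25
```

Because every term names the inputs it depends on, the size and sign of each weight can be read directly as the strength of that interaction, which is the kind of transparency the abstract contrasts with hidden units.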
Experiments compare a MOHN to an MLP on regression tasks. The MOHN is found to achieve a comparable level of accuracy to an MLP but suffers less from local minima in the error function and shows less variance across multiple training trials. It is also easier to interpret and to combine into an ensemble. The trade-off between the fit of a model to its training data and its fit to an independent set of test data is shown to be easier to control in a MOHN than in an MLP.
A MOHN is also compared to a number of existing optimisation methods, including estimation of distribution algorithms, genetic algorithms and simulated annealing. The MOHN is able to find optimal solutions in far fewer function evaluations than these methods on tasks selected from the literature.
Benchmarking Hebbian learning rules for associative memory
Associative memory or content addressable memory is an important component
function in computer science and information processing and is a key concept in
cognitive and computational brain science. Many different neural network
architectures and learning rules have been proposed to model associative memory
of the brain while investigating key functions like pattern completion and
rivalry, noise reduction, and storage capacity. A less investigated but
important function is prototype extraction where the training set comprises
pattern instances generated by distorting prototype patterns and the task of
the trained network is to recall the correct prototype pattern given a new
instance. In this paper we characterize these different aspects of associative
memory performance and benchmark six different learning rules on storage
capacity and prototype extraction. We consider only models with Hebbian
plasticity that operate on sparse distributed representations with unit
activities in the interval [0,1]. We evaluate both non-modular and modular
network architectures and compare performance when trained and tested on
different kinds of sparse random binary pattern sets, including correlated
ones. We show that covariance learning has a robust but low storage capacity
under these conditions and that the Bayesian Confidence Propagation learning
rule (BCPNN) is superior with a good margin in all cases except one, reaching a
three times higher composite score than the second best learning rule tested.
Comment: 24 pages, 9 figures
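For readers unfamiliar with the covariance rule mentioned above, the following is a minimal sketch of one textbook form of it, applied to pattern completion on sparse binary patterns. It is not the benchmark code from the paper: the function names, the non-overlapping toy patterns, and the k-winners-take-all recall step are all assumptions made for illustration.

```python
import numpy as np

def train_covariance(patterns):
    """Covariance Hebbian rule (assumed textbook form): sum of outer
    products of mean-centred patterns, with self-connections removed."""
    activity = patterns.mean()            # average fraction of active units
    centred = patterns - activity
    W = centred.T @ centred
    np.fill_diagonal(W, 0.0)
    return W

def recall(W, cue, k):
    """One-step recall: keep the k units with the largest support."""
    support = W @ cue
    winners = np.argsort(support)[-k:]
    out = np.zeros_like(cue)
    out[winners] = 1.0
    return out

# Toy data (hypothetical): 3 non-overlapping patterns, 4 active units of 40.
N, k, P = 40, 4, 3
patterns = np.zeros((P, N))
for mu in range(P):
    patterns[mu, mu * k:(mu + 1) * k] = 1.0

W = train_covariance(patterns)

cue = patterns[0].copy()
cue[:2] = 0.0                             # delete half of the active units
completed = recall(W, cue, k)
print(np.array_equal(completed, patterns[0]))  # True
```

The toy patterns are deliberately easy; storage capacity and prototype extraction, which the paper benchmarks, would need random, overlapping and distorted patterns.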
Full-Stack Optimization for CAM-Only DNN Inference
The accuracy of neural networks has greatly improved across various domains
over the past years. Their ever-increasing complexity, however, leads to
prohibitively high energy demands and latency in von Neumann systems. Several
computing-in-memory (CIM) systems have recently been proposed to overcome this,
but trade-offs involving accuracy, hardware reliability, and scalability for
large models remain a challenge. Additionally, for some CIM designs, the
activation movement still requires considerable time and energy. This paper
explores the combination of algorithmic optimizations for ternary weight neural
networks and associative processors (APs) implemented using racetrack memory
(RTM). We propose a novel compilation flow to optimize convolutions on APs by
reducing their arithmetic intensity. By leveraging the benefits of RTM-based
APs, this approach substantially reduces data transfers within the memory while
addressing accuracy, energy efficiency, and reliability concerns. Concretely,
our solution improves the energy efficiency of ResNet-18 inference on ImageNet
by 7.5x compared to crossbar in-memory accelerators while retaining software
accuracy.
Comment: To be presented at DATE2
Structure Discovery in Mixed Order Hyper Networks
Background: Mixed Order Hyper Networks (MOHNs) are a type of neural network in which the interactions between inputs are modelled explicitly by weights that can connect any number of neurons. Such networks have a human readability that networks with hidden units lack. They can be used for regression, classification or as content addressable memories and have been shown to be useful as fitness function models in constraint satisfaction tasks. They are fast to train and, when their structure is fixed, do not suffer from local minima in the cost function during training. Their main drawback, however, is that the correct structure (which neurons to connect with weights) must be discovered from data, and an exhaustive search is not possible for networks with more than around 30 inputs.
Results: This paper presents an algorithm designed to discover a set of weights that satisfies the joint constraints of low training error and a parsimonious model. The combined structure discovery and weight learning process was found to be faster and more accurate than training an MLP, and to have lower variance.
Conclusions: There are a number of advantages to using higher order weights rather than hidden units in a neural network, but discovering the correct structure for those weights can be challenging. With the method proposed in this paper, the use of high order networks becomes tractable.
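The remark about local minima can be illustrated with a small sketch. It assumes the network output is a weighted sum of products of the connected inputs, so that the model is linear in its weights once the structure is fixed and a squared error cost is convex. Ordinary least squares is used here as a stand-in; it is not necessarily one of the learning rules used in the paper, and the helper name `design_matrix` and the target function are hypothetical.

```python
import numpy as np
from itertools import product

def design_matrix(X, structure):
    """One column per connection: the product of the inputs it joins.
    The empty connection () gives a column of ones (bias)."""
    return np.column_stack([np.prod(X[:, list(s)], axis=1) for s in structure])

# All 8 input patterns from {-1, 1}^3 and a hypothetical target function.
X = np.array(list(product([-1, 1], repeat=3)))
y = 1 + 2 * X[:, 0] - 3 * X[:, 1] * X[:, 2]

# Fixed structure: bias, input 0 alone, and the pair (1, 2).
structure = [(), (0,), (1, 2)]
A = design_matrix(X, structure)

# Linear in the weights, so least squares reaches the global optimum.
w, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(w, 6))  # weights close to [1, 2, -3]
```

The hard part described in the abstract is choosing `structure` itself: with n inputs there are 2^n candidate connections, which is why exhaustive search stops being feasible at around 30 inputs.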
Holographic Generative Memory: Neurally Inspired One-Shot Learning with Memory Augmented Neural Networks
Humans quickly parse and categorize stimuli by combining perceptual information and previously learned knowledge. We are capable of learning new information quickly with only a few observations, and sometimes even a single observation. This one-shot learning (OSL) capability is still very difficult to realize in machine learning models. Novelty is commonly thought to be the primary driver for OSL. However, neuroscience literature shows that biological OSL mechanisms are guided by uncertainty, rather than novelty, motivating us to explore this idea for machine learning.
In this work, we investigate OSL for neural networks using more robust compositional knowledge representations and a biologically inspired uncertainty mechanism to modulate the rate of learning. We introduce several new neural network models that combine Holographic Reduced Representation (HRR) and Variational Autoencoders. Extending these new models culminates in the Holographic Generative Memory (HGMEM) model.
HGMEM is a novel unsupervised memory augmented neural network. It offers solutions to many of the practical drawbacks associated with HRRs while also providing storage, recall, and generation of latent compositional knowledge representations. Uncertainty is measured as a native part of HGMEM operation by applying trained probabilistic dropout to fully-connected layers. During training, the learning rate is modulated using these uncertainty measurements in a manner inspired by our motivating neuroscience mechanism for OSL. Model performance is demonstrated on several image datasets with experiments that reflect our theoretical approach.
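As background for the compositional representations mentioned above, the sketch below shows the standard binding and approximate unbinding operations of Holographic Reduced Representations, using circular convolution and circular correlation computed with FFTs. It illustrates plain HRRs only, not the HGMEM model; the vector dimension, the random vectors, and the function names are assumptions chosen for the example.

```python
import numpy as np

def bind(a, b):
    """Circular convolution of two vectors, computed in the Fourier domain."""
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=len(a))

def unbind(trace, a):
    """Circular correlation: approximate inverse of binding with a."""
    return np.fft.irfft(np.fft.rfft(trace) * np.conj(np.fft.rfft(a)), n=len(trace))

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
d = 1024  # assumed dimension; elements drawn with variance 1/d
role, filler, other = (rng.normal(0.0, 1.0 / np.sqrt(d), d) for _ in range(3))

trace = bind(role, filler)      # store the filler under the role
decoded = unbind(trace, role)   # noisy reconstruction of the filler

# The decoded vector should resemble the stored filler far more than an
# unrelated vector; exact values depend on the random draw.
print(cosine(decoded, filler), cosine(decoded, other))
```

The reconstruction is noisy by construction, which is one of the practical drawbacks of HRRs that usually calls for a clean-up step against known vectors.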