4 research outputs found

    MIMONets: Multiple-Input-Multiple-Output Neural Networks Exploiting Computation in Superposition

    With the advent of deep learning, progressively larger neural networks have been designed to solve complex tasks. We take advantage of these capacity-rich models to lower the cost of inference by exploiting computation in superposition. To reduce the computational burden per input, we propose Multiple-Input-Multiple-Output Neural Networks (MIMONets), capable of handling many inputs at once. MIMONets augment various deep neural network architectures with variable binding mechanisms to represent an arbitrary number of inputs in a compositional data structure via fixed-width distributed representations. Accordingly, MIMONets adapt nonlinear neural transformations to process the data structure holistically, leading to a speedup nearly proportional to the number of superposed input items in the data structure. After processing in superposition, an unbinding mechanism recovers each transformed input of interest. MIMONets also provide a dynamic trade-off between accuracy and throughput by instantaneous on-demand switching among a set of accuracy-throughput operating points, all within a single set of fixed parameters. We apply the MIMONets concept to both CNN and Transformer architectures, resulting in MIMOConv and MIMOFormer, respectively. Empirical evaluations show that MIMOConv achieves about a 2-4x speedup at an accuracy delta within [+0.68, -3.18]% compared to WideResNet CNNs on CIFAR-10 and CIFAR-100. Similarly, MIMOFormer can handle 2-4 inputs at once while maintaining a high average accuracy within a [-1.07, -3.43]% delta on the Long Range Arena benchmark. Finally, we provide mathematical bounds on the interference between superposition channels in MIMOFormer. Our code is available at https://github.com/IBM/multiple-input-multiple-output-nets.
    Comment: accepted in NeurIPS 202
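The binding-based superposition described above can be sketched with Holographic Reduced Representations. The toy example below is a hypothetical illustration, not the authors' implementation (the unitary key construction and the dimensionality are assumptions): three inputs are bound to random keys via circular convolution, summed into one fixed-width vector, and one input is then approximately recovered by unbinding.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4096  # assumed dimensionality of the distributed representation

def unitary_key():
    # random key whose Fourier components all have unit magnitude, so
    # circular correlation exactly inverts binding with this key
    phases = rng.uniform(0, 2 * np.pi, D // 2 + 1)
    phases[0] = phases[-1] = 0.0          # keep the key real-valued
    return np.fft.irfft(np.exp(1j * phases), n=D)

def bind(key, x):      # circular convolution (binding)
    return np.fft.irfft(np.fft.rfft(key) * np.fft.rfft(x), n=D)

def unbind(s, key):    # circular correlation (unbinding)
    return np.fft.irfft(np.fft.rfft(s) * np.conj(np.fft.rfft(key)), n=D)

xs = [rng.standard_normal(D) / np.sqrt(D) for _ in range(3)]
keys = [unitary_key() for _ in range(3)]

# superpose all three bound inputs in a single fixed-width vector
s = sum(bind(k, x) for k, x in zip(keys, xs))

# unbinding channel 1 recovers x1 plus crosstalk noise from the others
x1_hat = unbind(s, keys[1])
cos = lambda u, v: float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
cos_true = cos(x1_hat, xs[1])    # well above chance
cos_other = cos(x1_hat, xs[0])   # near zero
print(f"similarity to x1: {cos_true:.2f}, to x0: {cos_other:.2f}")
```

In a MIMONet the middle step would be a full nonlinear network applied once to `s` rather than a no-op, which is where the near-proportional speedup comes from.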

    Analogical Mapping with Sparse Distributed Memory: A Simple Model that Learns to Generalize from Examples

    Abstract: We present a computational model for the analogical mapping of compositional structures that combines two existing ideas known as holistic mapping vectors and sparse distributed memory. The model enables integration of structural and semantic constraints when learning mappings of the type x_i → y_i and computing analogies x_j → y_j for novel inputs x_j. The model has a one-shot learning process, is randomly initialized, and has three exogenous parameters: the dimensionality D of the representations, the memory size S, and the probability v for activation of the memory. After learning three examples, the model generalizes correctly to novel examples. We find minima in the probability of generalization error for certain values of v, S, and the number of different mapping examples learned. These results indicate that the optimal size of the memory scales with the number of different mapping examples learned and that the sparseness of the memory is important. The optimal dimensionality of binary representations is of the order 10^4, which is consistent with a known analytical estimate and with the synapse count of most cortical neurons. We demonstrate that the model can learn analogical mappings of generic two-place relationships, and we calculate the error probabilities for recall and generalization.
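A simplified sketch of the holistic mapping-vector half of this model (the sparse distributed memory component is omitted, and the hidden transformation, bipolar encoding, and example counts are illustrative assumptions): mappings x_i → y_i that share a common underlying transformation are learned as a bundle of elementwise-product bindings, and the bundled mapping vector then generalizes to a novel x_j.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 10_000   # binary dimensionality of order 10^4, as in the abstract

rand = lambda: rng.choice([-1, 1], size=D)

def bundle(vs):
    # elementwise majority vote; random tie-breaking keeps vectors bipolar
    s = np.sum(vs, axis=0)
    return np.where(s == 0, rand(), np.sign(s)).astype(int)

def sim(u, v):
    return float(u @ v) / D      # cosine similarity for bipolar vectors

T = rand()                                   # hidden common transformation
xs = [rand() for _ in range(3)]              # three training inputs
# each y_i is T*x_i corrupted by bundling with item-specific noise
ys = [bundle([T * x, rand()]) for x in xs]

# holistic mapping vector: bundle the bindings x_i * y_i
# (elementwise product is its own inverse for bipolar vectors)
m = bundle([x * y for x, y in zip(xs, ys)])

# generalize to a novel input never seen during learning
x_new = rand()
y_new = bundle([T * x_new, rand()])
mapped = m * x_new

sim_target = sim(mapped, y_new)   # well above chance
sim_chance = sim(mapped, rand())  # near zero
print(f"similarity to target: {sim_target:.2f}, to random: {sim_chance:.2f}")
```

Bundling three noisy bindings denoises the estimate of the shared transformation, which is the sense in which three examples suffice for generalization in this sketch.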

    Delta rhythms as a substrate for holographic processing in sleep and wakefulness

    PhD Thesis. We initially considered the theoretical properties and benefits of so-called holographic processing in a specific type of computational problem implied by theories of synaptic rescaling processes in the biological wake-sleep cycle. This raised two fundamental questions that we attempted to answer with an experimental in vitro electrophysiological approach. We developed a comprehensive experimental paradigm based on a pharmacological model of the wake-sleep-associated delta rhythm, measured with a Utah micro-electrode array at the interface between primary and associational areas in the rodent neocortex. We first verified that our in vitro delta rhythm model possessed two key features found in both in vivo rodent and human studies of synaptic rescaling processes in sleep: first, that prior local synaptic potentiation in wake leads to increased local delta power in subsequent sleep; second, that neural firing patterns observed prior to sleep are reactivated in sleep. By reproducing these findings we confirmed that our model is arguably an adequate medium for further study of the putative sleep-related synaptic rescaling process. In addition, we found important differences between neural units that reactivated or deactivated during delta: differences in cell type based on unit spike shape, in prior firing rate, and in prior spike-train-to-local-field-potential coherence. Taken together, these results suggested a mechanistic chain of explanation for the two observed properties and set the neurobiological framework for further, more computationally driven analysis.
    Using the above experimental and theoretical substrate, we developed a new method for analyzing micro-electrode array data. The method generalizes to the electromagnetic case a well-known technique for processing acoustic microphone array data. This allowed calculation of the instantaneous spatial energy flow and dissipation in the neocortical areas under the array, and of the spatial energy source density, in analogy to well-known current source density analysis. We then refocused our investigation on the two theoretical questions that we hoped to answer experimentally: whether the state of the neocortex during a delta rhythm can be described by ergodic statistics, which we assessed by analyzing the spectral properties of energy dissipation as a signature of the state of the dynamical system; and, more exploratively, what spatiotemporal interactions occur across and along neocortical layers and areas during a delta rhythm, as implied by energy flow patterns. We found that the in vitro rodent neocortex does not conform to ergodic statistics during a pharmacologically driven delta or gamma rhythm. We also found a delta-period-locked pattern of energy flow across and along layers and areas, which doubled the processing cycle relative to the fundamental delta rhythm, tentatively suggesting a reciprocal, two-stage information processing hierarchy similar to a stochastic Helmholtz machine with a wake-sleep training algorithm. Further, the complex-valued energy flow might suggest an improvement to the Helmholtz machine concept: generalizing the complex-valued weights of the stochastic network to higher-dimensional multivectors of a geometric algebra with a metric particularly suited to holographic processes. Finally, preliminary attempts were made to implement and characterize the above network dynamics in silico. We found that a qubit-valued network does not allow fully holographic processes, but tentatively suggest that an ebit-valued network may display two key properties of general holographic processing.

    Learning the systematic transformation of holographic reduced representations

    Holographic Reduced Representation is a representational scheme that allows variable-sized structures to be represented in a distributed manner. It has been shown that these compositional structures can be transformed holistically; however, the transformation vector had to be constructed by hand. In this paper we present two methods for learning the holistic transformation of Holographic Reduced Representations from examples. We show that the acquired knowledge generalises to structures containing unseen elements and to structures more complex than the training examples. These generalisations require a degree of systematicity that, to our knowledge, has not yet been achieved by other comparable methods.
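A transformation vector like the one discussed above can be estimated from example pairs rather than constructed by hand. The sketch below uses one simple closed-form estimator (an illustrative assumption, not necessarily either of the paper's two learning methods): given HRR pairs y_i = t ⊛ x_i, each circular correlation y_i (*) x_i is t plus crosstalk noise, so averaging over examples sharpens the estimate, which then applies holistically to an unseen input.

```python
import numpy as np

rng = np.random.default_rng(3)
D = 1024   # assumed vector dimensionality

def cconv(a, b):   # circular convolution (HRR binding)
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=D)

def ccorr(a, b):   # circular correlation (approximate unbinding)
    return np.fft.irfft(np.fft.rfft(a) * np.conj(np.fft.rfft(b)), n=D)

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

t_true = rng.standard_normal(D) / np.sqrt(D)   # hidden transformation vector
xs = [rng.standard_normal(D) / np.sqrt(D) for _ in range(5)]
ys = [cconv(t_true, x) for x in xs]            # training pairs (x_i, y_i)

# estimate t by averaging the correlations y_i (*) x_i; each term equals
# t plus crosstalk noise, so the mean converges toward t
t_hat = np.mean([ccorr(y, x) for x, y in zip(xs, ys)], axis=0)

# apply the learned transformation holistically to an unseen input
x_new = rng.standard_normal(D) / np.sqrt(D)
quality = cos(cconv(t_hat, x_new), cconv(t_true, x_new))  # close to 1
print(f"cosine with true transformed output: {quality:.2f}")
```

Gradient-based learning, as studied in work of this kind, can further suppress the crosstalk term that this one-shot average leaves behind.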