1,796 research outputs found

    On the Existence and Design of the Best Stack Filter Based Associative Memory

    The associative memory of a stack filter is defined to be the set of root signals of that filter. If the root sets of two stack filters both contain a desired set of patterns, but one filter's root set is smaller than the other's, then the filter with the smaller root set is said to be better for that set of patterns. Any filter whose root set contains the specified set of patterns and has the smallest possible number of roots is said to be a best filter. The configuration of the family of best filters is described via a graphical approach which specifies an upper and a lower bound for the subset of possible best filters which are furthest from the sets of type-1 and type-2 stack filters. Knowledge of this configuration leads to an algorithm which can produce a near-best filter. This new method of constructing associative memories does not require the desired set of patterns to be independent, and it can construct a much better filter than the methods in [1]
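    The definitions of "root set" and "better filter" above can be illustrated with a brute-force sketch. It is not the paper's graphical construction. It assumes binary signals only, a window of width 3, and end-sample replication at the borders; the names `apply_filter`, `root_set` and `is_better` are hypothetical helpers introduced for illustration.

```python
from itertools import product

def apply_filter(signal, rule):
    """Slide a width-3 window over a binary signal, replicating the end samples."""
    padded = [signal[0]] + list(signal) + [signal[-1]]
    return tuple(rule(padded[i], padded[i + 1], padded[i + 2])
                 for i in range(len(signal)))

def median3(a, b, c):
    # Majority vote: the binary median, a positive Boolean function.
    return 1 if a + b + c >= 2 else 0

def identity(a, b, c):
    # Passes the centre sample through unchanged, so every signal is a root.
    return b

def root_set(rule, length):
    """All binary signals of the given length that the filter leaves unchanged."""
    return {s for s in product((0, 1), repeat=length)
            if apply_filter(s, rule) == s}

def is_better(rule_a, rule_b, patterns, length):
    """True if both root sets contain the patterns and rule_a has fewer roots."""
    roots_a, roots_b = root_set(rule_a, length), root_set(rule_b, length)
    wanted = set(patterns)
    return wanted <= roots_a and wanted <= roots_b and len(roots_a) < len(roots_b)

patterns = [(0, 0, 1, 1), (1, 1, 0, 0)]
print(len(root_set(median3, 4)), len(root_set(identity, 4)))  # 10 16
print(is_better(median3, identity, patterns, 4))              # True
```

    Both filters store the two desired patterns as roots, but the median filter does so with 10 roots instead of 16, so it is the better associative memory for this pattern set in the sense defined above.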

    The number and probability of canalizing functions

    Canalizing functions have important applications in physics and biology. For example, they represent a mechanism capable of stabilizing chaotic behavior in Boolean network models of discrete dynamical systems. When comparing the class of canalizing functions to other classes of functions with respect to their evolutionary plausibility as emergent control rules in genetic regulatory systems, it is informative to know the number of canalizing functions with a given number of input variables. This is also important in the context of using the class of canalizing functions as a constraint during the inference of genetic networks from gene expression data. To this end, we derive an exact formula for the number of canalizing Boolean functions of n variables. We also derive a formula for the probability that a random Boolean function is canalizing for any given bias p of taking the value 1. In addition, we consider the number and probability of Boolean functions that are canalizing for exactly k variables. Finally, we provide an algorithm for randomly generating canalizing functions with a given bias p and any number of variables, which is needed for Monte Carlo simulations of Boolean networks
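    For small n, the count of canalizing functions can be checked by exhaustive enumeration. The sketch below is not the paper's closed-form formula. It assumes the common definition that f is canalizing if fixing some input x_i to some value a forces the output to a single value, and it assumes the convention that constant functions count as canalizing. The function names are hypothetical.

```python
from itertools import product

def is_canalizing(table, n):
    """table[k] is f(x) for the k-th input in itertools.product order."""
    inputs = list(product((0, 1), repeat=n))
    for i in range(n):
        for a in (0, 1):
            # Outputs observed on the half of the inputs where x_i == a.
            outputs = {table[k] for k, x in enumerate(inputs) if x[i] == a}
            if len(outputs) == 1:
                return True
    return False

def count_canalizing(n):
    """Enumerate all 2**(2**n) truth tables; only feasible for small n."""
    return sum(is_canalizing(t, n) for t in product((0, 1), repeat=2 ** n))

for n in (1, 2, 3):
    print(n, count_canalizing(n))
# 1 4
# 2 14
# 3 120
```

    For n = 2 the only non-canalizing functions are XOR and its negation, which is why 14 of the 16 functions are counted.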

    Holographic Generative Memory: Neurally Inspired One-Shot Learning with Memory Augmented Neural Networks

    Humans quickly parse and categorize stimuli by combining perceptual information and previously learned knowledge. We are capable of learning new information quickly with only a few observations, and sometimes even a single observation. This one-shot learning (OSL) capability is still very difficult to realize in machine learning models. Novelty is commonly thought to be the primary driver for OSL. However, neuroscience literature shows that biological OSL mechanisms are guided by uncertainty, rather than novelty, motivating us to explore this idea for machine learning. In this work, we investigate OSL for neural networks using more robust compositional knowledge representations and a biologically inspired uncertainty mechanism to modulate the rate of learning. We introduce several new neural network models that combine Holographic Reduced Representation (HRR) and Variational Autoencoders. Extending these new models culminates in the Holographic Generative Memory (HGMEM) model. HGMEM is a novel unsupervised memory augmented neural network. It offers solutions to many of the practical drawbacks associated with HRRs while also providing storage, recall, and generation of latent compositional knowledge representations. Uncertainty is measured as a native part of HGMEM operation by applying trained probabilistic dropout to fully-connected layers. During training, the learning rate is modulated using these uncertainty measurements in a manner inspired by our motivating neuroscience mechanism for OSL. Model performance is demonstrated on several image datasets with experiments that reflect our theoretical approach
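    As background for the Holographic Reduced Representations mentioned above, the following is a minimal sketch of standard HRR binding and unbinding (circular convolution and circular correlation). It is not the HGMEM model and involves no neural network; the dimension, the seed and all function names are assumptions chosen for illustration.

```python
import math
import random

def random_vector(n, rng):
    # Elements drawn i.i.d. from N(0, 1/n), the usual HRR choice.
    return [rng.gauss(0.0, 1.0 / math.sqrt(n)) for _ in range(n)]

def bind(a, b):
    """Circular convolution: c[k] = sum_j a[j] * b[(k - j) mod n]."""
    n = len(a)
    return [sum(a[j] * b[(k - j) % n] for j in range(n)) for k in range(n)]

def unbind(a, c):
    """Circular correlation: an approximate inverse of bind(a, .)."""
    n = len(a)
    return [sum(a[j] * c[(j + k) % n] for j in range(n)) for k in range(n)]

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / math.sqrt(sum(x * x for x in u) * sum(y * y for y in v))

rng = random.Random(0)
n = 256
role, filler, distractor = (random_vector(n, rng) for _ in range(3))

trace = bind(role, filler)      # store the (role, filler) pair in one vector
decoded = unbind(role, trace)   # noisy reconstruction of the filler

# The decoded vector should be far closer to the stored filler than to an
# unrelated vector; exact values depend on the seed and dimension.
print(cosine(decoded, filler), cosine(decoded, distractor))
```

    Recall is only approximate, so in practice the decoded vector is usually compared against a set of known items and the closest one is taken as the answer.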

    Analyzing Transformer Dynamics as Movement through Embedding Space

    Transformer language models exhibit intelligent behaviors such as understanding natural language, recognizing patterns, acquiring knowledge, reasoning, planning, reflecting and using tools. This paper explores how their underlying mechanics give rise to intelligent behaviors. We adopt a systems approach to analyze Transformers in detail and develop a mathematical framework that frames their dynamics as movement through embedding space. This novel perspective provides a principled way of thinking about the problem and reveals important insights related to the emergence of intelligence: 1. At its core the Transformer is an Embedding Space walker, mapping intelligent behavior to trajectories in this vector space. 2. At each step of the walk, it composes context into a single composite vector whose location in Embedding Space defines the next step. 3. No learning actually occurs during decoding; in-context learning and generalization are simply the result of different contexts composing into different vectors. 4. Ultimately the knowledge, intelligence and skills exhibited by the model are embodied in the organization of vectors in Embedding Space rather than in specific neurons or layers. These abilities are properties of this organization. 5. Attention's contribution boils down to the association-bias it lends to vector composition, which in turn influences the aforementioned organization. However, more investigation is needed to ascertain its significance. 6. The entire model is composed from two principal operations: data-independent filtering and data-dependent aggregation. This generalization unifies Transformers with other sequence models and across modalities. Building upon this foundation, we formalize and test a semantic space theory which posits that embedding vectors represent semantic concepts, and find some evidence of its validity

    Using machine learning techniques to evaluate multicore soft error reliability

    Virtual platform frameworks have been extended to allow earlier soft error analysis of more realistic multicore systems (i.e., real software stacks, state-of-the-art ISAs). The high observability and simulation performance of the underlying frameworks make it possible to generate and collect more error- and failure-related data, considering complex software stack configurations, in a reasonable time. When dealing with sizeable failure-related data sets obtained from multiple fault campaigns, it is essential to filter out parameters (i.e., features) without a direct relationship with the system soft error analysis. In this regard, this paper proposes the use of supervised and unsupervised machine learning techniques, aiming to eliminate non-relevant information as well as identify the correlation between fault injection results and application and platform characteristics. This novel approach provides engineers with appropriate means to investigate new and more efficient fault mitigation techniques. The underlying approach is validated with an extensive data set gathered from more than 1.2 million fault injections, comprising several benchmarks, a Linux OS and parallelization libraries (e.g., MPI, OpenMP), as well as through a realistic automotive case study

    Empirical study of parallel LRU simulation algorithms

    This paper reports on the performance of five parallel algorithms for simulating a fully associative cache operating under the LRU (Least-Recently-Used) replacement policy. Three of the algorithms are SIMD, and are implemented on the MasPar MP-2 architecture. Two other algorithms are parallelizations of an efficient serial algorithm on the Intel Paragon. One SIMD algorithm is quite simple, but its cost is linear in the cache size. The two other SIMD algorithms are more complex, but have costs that are independent of the cache size. Both the second and third SIMD algorithms compute all stack distances; the second SIMD algorithm is completely general, whereas the third SIMD algorithm presumes and takes advantage of bounds on the range of reference tags. Both MIMD algorithms implemented on the Paragon are general and compute all stack distances; they differ in one step that may affect their respective scalability. We assess the strengths and weaknesses of these algorithms as a function of problem size and characteristics, and compare their performance on traces derived from execution of three SPEC benchmark programs
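    To make "stack distance" concrete, here is a naive serial sketch; it is not one of the five parallel algorithms evaluated in the paper, and it makes no attempt at efficiency. The trace and the function names are made up for illustration. A reference's stack distance is its 1-based depth in the LRU stack just before the access, and a fully associative LRU cache with C lines hits exactly when that distance is at most C.

```python
def stack_distances(trace):
    """Return the LRU stack distance of each reference (None on first use)."""
    stack = []        # most recently used tag at index 0
    distances = []
    for tag in trace:
        if tag in stack:
            depth = stack.index(tag) + 1   # 1-based depth before this access
            stack.remove(tag)
        else:
            depth = None                   # cold miss: never seen before
        stack.insert(0, tag)               # the tag becomes most recently used
        distances.append(depth)
    return distances

def hits(distances, cache_size):
    """Hit count for a fully associative LRU cache with cache_size lines."""
    return sum(1 for d in distances if d is not None and d <= cache_size)

trace = ["a", "b", "a", "c", "b", "b"]
dist = stack_distances(trace)
print(dist)                                # [None, None, 2, None, 3, 1]
print([hits(dist, c) for c in (1, 2, 3)])  # [1, 2, 3]
```

    One pass over the trace yields the hit counts for every cache size at once, which is why computing all stack distances is the quantity of interest.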