14 research outputs found

    A Modern Primer on Processing in Memory

    Modern computing systems are overwhelmingly designed to move data to computation. This design choice goes directly against at least three key trends in computing that cause performance, scalability and energy bottlenecks: (1) data access is a key bottleneck as many important applications are increasingly data-intensive, and memory bandwidth and energy do not scale well, (2) energy consumption is a key limiter in almost all computing platforms, especially server and mobile systems, (3) data movement, especially off-chip to on-chip, is very expensive in terms of bandwidth, energy and latency, much more so than computation. These trends are felt especially severely in the data-intensive server and energy-constrained mobile systems of today. At the same time, conventional memory technology is facing many technology scaling challenges in terms of reliability, energy, and performance. As a result, memory system architects are open to organizing memory in different ways and making it more intelligent, at the expense of higher cost. The emergence of 3D-stacked memory plus logic, the adoption of error correcting codes inside the latest DRAM chips, the proliferation of different main memory standards and chips specialized for different purposes (e.g., graphics, low-power, high bandwidth, low latency), and the necessity of designing new solutions to serious reliability and security issues, such as the RowHammer phenomenon, are evidence of this trend. This chapter discusses recent research that aims to practically enable computation close to data, an approach we call processing-in-memory (PIM). PIM places computation mechanisms in or near where the data is stored (i.e., inside the memory chips, in the logic layer of 3D-stacked memory, or in the memory controllers), so that data movement between the computation units and memory is reduced or eliminated. Comment: arXiv admin note: substantial text overlap with arXiv:1903.0398

    Co-designing reliability and performance for datacenter memory

    Memory is one of the key components that affects the reliability and performance of datacenter servers. Memory in today’s servers is organized and shared in several ways to provide the most performant and efficient access to data. For example, cache hierarchies in multi-core chips reduce access latency, non-uniform memory access (NUMA) in multi-socket servers improves scalability, and disaggregation increases memory capacity. In all these organizations, hardware coherence protocols are used to maintain memory consistency of this shared memory and implicitly move data to the requesting cores. This thesis aims to provide fault tolerance against newer models of failure in the organization of memory in datacenter servers. While designing for improved reliability, this thesis explores solutions that can also enhance the performance of applications. The solutions build on modern coherence protocols to achieve these properties. First, we observe that DRAM memory system failure rates have increased, demanding stronger forms of memory reliability. To combat this, the thesis proposes Dvé, a hardware-driven replication mechanism where data blocks are replicated across two different memory controllers in a cache-coherent NUMA system. Data blocks are accompanied by a code with strong error detection capabilities so that when an error is detected, correction is performed using the replica. Dvé’s organization offers two independent points of access to data, which enables: (a) strong error correction that can recover from a range of faults affecting any of the components in the memory and (b) higher performance by providing another nearer point of memory access. Dvé’s coherent replication keeps the replicas in sync for reliability and also provides coherent access to read replicas during fault-free operation for improved performance. Dvé can flexibly provide these benefits on demand at runtime.
Next, we observe that the coherence protocol itself needs to be hardened against failures. Memory in datacenter servers is being disaggregated from the compute servers into dedicated memory servers, driven by standards like CXL. CXL specifies the coherence protocol semantics for compute servers to access and cache data from a shared region in the disaggregated memory. However, the CXL specification lacks the requisite level of fault tolerance necessary to operate at an inter-server scale within the datacenter. Compute servers can fail or be unresponsive in the datacenter, and therefore it is important that the coherence protocol remain available in the presence of such failures. The thesis proposes Āpta, a CXL-based, shared disaggregated memory system for keeping the cached data consistent without compromising availability in the face of compute server failures. Āpta architects a high-performance, fault-tolerant, object-granular memory server that significantly improves performance for stateless function-as-a-service (FaaS) datacenter applications.
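The detect-then-correct-from-replica idea that the abstract above attributes to Dvé can be illustrated with a small software sketch. This is only a toy model under stated assumptions, not the thesis's hardware design: the class `ReplicatedMemory`, its methods, and the use of CRC32 as the detection code are all hypothetical stand-ins, and two Python dictionaries play the role of the two memory controllers.

```python
import zlib


class ReplicatedMemory:
    """Toy model: every block lives on two 'controllers', each copy with a CRC32."""

    def __init__(self):
        # Two dictionaries stand in for two independent memory controllers.
        self.controllers = [{}, {}]

    def write(self, addr, data):
        # Keep both replicas in sync on every write.
        for store in self.controllers:
            store[addr] = (bytes(data), zlib.crc32(data))

    def read(self, addr, prefer=0):
        # Try the preferred (e.g. nearer) replica first, then the other one.
        failed = []
        for i in (prefer, 1 - prefer):
            data, code = self.controllers[i][addr]
            if zlib.crc32(data) == code:
                # Detection passed: repair any copy that failed earlier.
                for j in failed:
                    self.controllers[j][addr] = (data, code)
                return data
            failed.append(i)
        raise IOError("both replicas failed the integrity check")

    def flip_bit(self, controller, addr):
        # Fault injection for the demo: flip one bit, leave the stored code alone.
        data, code = self.controllers[controller][addr]
        self.controllers[controller][addr] = (bytes([data[0] ^ 1]) + data[1:], code)


mem = ReplicatedMemory()
mem.write(0x40, b"cache line")
mem.flip_bit(0, 0x40)                  # corrupt the preferred replica
recovered = mem.read(0x40, prefer=0)   # falls back to replica 1 and repairs replica 0
```

The `prefer` argument loosely mirrors the abstract's point that a second copy can also serve as a nearer point of access during fault-free operation; the coherence protocol that keeps real replicas consistent is not modeled here.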

    AI/ML Algorithms and Applications in VLSI Design and Technology

    An evident challenge ahead for the integrated circuit (IC) industry in the nanometer regime is the investigation and development of methods that can reduce the design complexity ensuing from growing process variations and curtail the turnaround time of chip manufacturing. Conventional methodologies employed for such tasks are largely manual and thus time-consuming and resource-intensive. In contrast, the unique learning strategies of artificial intelligence (AI) provide numerous exciting automated approaches for handling complex and data-intensive tasks in very-large-scale integration (VLSI) design and testing. Employing AI and machine learning (ML) algorithms in VLSI design and manufacturing reduces the time and effort for understanding and processing the data within and across different abstraction levels via automated learning algorithms. This, in turn, improves the IC yield and reduces the manufacturing turnaround time. This paper thoroughly reviews the AI/ML automated approaches introduced in the past towards VLSI design and manufacturing. Moreover, we discuss the scope of AI/ML applications in the future at various abstraction levels to revolutionize the field of VLSI design, aiming for high-speed, highly intelligent, and efficient implementations.

    On the application of neural networks to symbol systems.

    While for many years two alternative approaches to building intelligent systems, symbolic AI and neural networks, have each demonstrated specific advantages and also revealed specific weaknesses, in recent years a number of researchers have sought methods of combining the two into a unified methodology which embodies the benefits of each while attenuating the disadvantages. This work sets out to identify the key ideas from each discipline and combine them into an architecture which would be practically scalable for very large network applications. The architecture is based on a relational database structure and forms the environment for an investigation into the necessary properties of a symbol encoding which will permit the single-presentation learning of patterns and associations, the development of categories and features leading to robust generalisation and the seamless integration of a range of memory persistencies from short to long term. It is argued that if, as proposed by many proponents of symbolic AI, the symbol encoding must be causally related to its syntactic meaning, then it must also be mutable as the network learns and grows, adapting to the growing complexity of the relationships in which it is instantiated. Furthermore, it is argued that in order to create an efficient and coherent memory structure, the symbolic encoding itself must have an underlying structure which is not accessible symbolically; this structure would provide the framework permitting structurally sensitive processes to act upon symbols without explicit reference to their content. Such a structure must dictate how new symbols are created during normal operation. The network implementation proposed is based on K-from-N codes, which are shown to possess a number of desirable qualities and are well matched to the requirements of the symbol encoding.
Several networks are developed and analysed to exploit these codes, based around a recurrent version of the non-holographic associative memory of Willshaw et al. The simplest network is shown to have properties similar to those of a Hopfield network, but the storage capacity is shown to be greater, though at a cost of lower signal-to-noise ratio. Subsequent network additions break each K-from-N pattern into L subsets, each using D-from-N coding, creating cyclic patterns of period L. This step increases the capacity still further but at a cost of lower signal-to-noise ratio. The use of the network in associating pairs of input patterns with any given output pattern, an architectural requirement, is verified. The use of complex synaptic junctions is investigated as a means to increase storage capacity, to address the stability-plasticity dilemma and to implement the hierarchical aspects of the symbol encoding defined in the architecture. A wide range of options is developed which allow a number of key global parameters to be traded off. One scheme is analysed and simulated. A final section examines some of the elements that need to be added to our current understanding of neural network-based reasoning systems to make general purpose intelligent systems possible. It is argued that the sections of this work represent pieces of the whole in this regard and that their integration will provide a sound basis for making such systems a reality.
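For readers unfamiliar with the building blocks named in the abstract above, the following is a minimal sketch of a Willshaw-style binary associative memory used recurrently with K-from-N codes. It assumes the textbook formulation (binary synapses set by OR-ing co-active pairs, recall by K-winners-take-all) and is not the thesis's own network; the class `WillshawMemory`, its methods, and the tie-breaking rule are illustrative choices.

```python
class WillshawMemory:
    """Binary auto-associative memory over K-from-N patterns (sets of K unit indices)."""

    def __init__(self, n, k):
        self.n, self.k = n, k
        # Binary synapse matrix: w[i][j] == 1 once units i and j were co-active.
        self.w = [[0] * n for _ in range(n)]

    def store(self, pattern):
        # Single-presentation learning: switch on every synapse between active units.
        for i in pattern:
            for j in pattern:
                self.w[i][j] = 1

    def recall(self, cue, steps=1):
        # Recurrent recall: feed the K winners back in as the next cue.
        state = set(cue)
        for _ in range(steps):
            sums = [sum(self.w[i][j] for j in state) for i in range(self.n)]
            # K-winners-take-all; ties are broken in favour of the lower index.
            winners = sorted(range(self.n), key=lambda i: (-sums[i], i))[: self.k]
            state = set(winners)
        return frozenset(state)


memory = WillshawMemory(n=16, k=4)
for stored in ({0, 1, 2, 3}, {4, 5, 6, 7}, {2, 3, 8, 9}):
    memory.store(stored)

completed_a = memory.recall({0, 1})      # partial cue from the first pattern
completed_b = memory.recall({8, 9})      # partial cue from the third pattern
```

Because every stored pattern has exactly K active units, recall can pick the K most strongly driven units instead of tuning a threshold. A cue shared by two stored patterns (such as `{2, 3}` here) is ambiguous, which is one face of the capacity versus signal-to-noise trade-off the abstract mentions.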

    Computational Neural Learning Formalisms for Perceptual Manipulation: Singularity Interaction Dynamics Model.

    This dissertation addresses a fundamental problem in computational AI--developing a class of massively parallel, neural algorithms for learning robustly, and in real-time, complex nonlinear transformations from representative exemplars. Provision of such a capability is at the core of many real-life problems in robotics, signal processing and control. The concepts of terminal attractors in dynamical systems theory and adjoint operators in nonlinear sensitivity theory are exploited to provide a firm mathematical foundation for learning such mappings with dynamical neural networks, while achieving a dramatic reduction in the overall computational costs. Further, we derive an efficient methodology for handling a multiplicity of application-specific constraints during run-time, that precludes additional retraining or disturbing the synaptic structure of the learned network. The scalability of proposed theoretical models to large-scale embodiments in neural hardware is analyzed. Neurodynamical parameters, e.g., decay constants, response gains, etc., are systematically analyzed to understand their implications on network scalability, convergence, throughput and fault tolerance, during both concurrent simulations and implementation in concurrently asynchronous VLSI, optical and opto-electronic hardware. Dynamical diagnostics, e.g., Lyapunov exponents, are used to formally characterize the widely observed dynamical instability in neural networks as emergent computational chaos. Using contracting operators and nonconstructive theorems from fixed point theory, we rigorously derive necessary and sufficient conditions for eliminating all oscillatory and chaotic behavior in additive-type networks. Extensive benchmarking experiments are conducted with arbitrarily large neural networks (over 100 million interconnects) to verify the methodological robustness of our network conditioning formalisms.
Finally, we provide insight for exploiting our proposed repertoire of neural learning formalisms in addressing a fundamental problem in robotics--manipulation controller design for robots operating in unpredictable environments. Using some recent results in task analysis and dynamic modeling, we develop the Perceptual Manipulation Architecture. The architecture, conceptualized within a perceptual framework, is shown to be well beyond the state-of-the-art model-directed robotics. For a stronger physical interpretation of its implications, our discussions are embedded in the context of a novel systems' concept for automated space operations.

    Community-driven & Work-integrated Creation, Use and Evolution of Ontological Knowledge Structures
