4,367 research outputs found

    Algorithm-Directed Crash Consistence in Non-Volatile Memory for HPC

    Full text link
    Fault tolerance is one of the major design goals for HPC. The emergence of non-volatile memories (NVM) provides a solution to build fault tolerant HPC. Data in NVM-based main memory are not lost when the system crashes because of the non-volatility nature of NVM. However, because of volatile caches, data must be logged and explicitly flushed from caches into NVM to ensure consistence and correctness before crashes, which can cause large runtime overhead. In this paper, we introduce an algorithm-based method to establish crash consistence in NVM for HPC applications. We slightly extend application data structures or sparsely flush cache blocks, which introduce ignorable runtime overhead. Such extension or cache flushing allows us to use algorithm knowledge to \textit{reason} data consistence or correct inconsistent data when the application crashes. We demonstrate the effectiveness of our method for three algorithms, including an iterative solver, dense matrix multiplication, and Monte-Carlo simulation. Based on comprehensive performance evaluation on a variety of test environments, we demonstrate that our approach has very small runtime overhead (at most 8.2\% and less than 3\% in most cases), much smaller than that of traditional checkpoint, while having the same or less recomputation cost.Comment: 12 page

    Integer Sparse Distributed Memory and Modular Composite Representation

    Get PDF
    Challenging AI applications, such as cognitive architectures, natural language understanding, and visual object recognition share some basic operations including pattern recognition, sequence learning, clustering, and association of related data. Both the representations used and the structure of a system significantly influence which tasks and problems are most readily supported. A memory model and a representation that facilitate these basic tasks would greatly improve the performance of these challenging AI applications.Sparse Distributed Memory (SDM), based on large binary vectors, has several desirable properties: auto-associativity, content addressability, distributed storage, robustness over noisy inputs that would facilitate the implementation of challenging AI applications. Here I introduce two variations on the original SDM, the Extended SDM and the Integer SDM, that significantly improve these desirable properties, as well as a new form of reduced description representation named MCR.Extended SDM, which uses word vectors of larger size than address vectors, enhances its hetero-associativity, improving the storage of sequences of vectors, as well as of other data structures. A novel sequence learning mechanism is introduced, and several experiments demonstrate the capacity and sequence learning capability of this memory.Integer SDM uses modular integer vectors rather than binary vectors, improving the representation capabilities of the memory and its noise robustness. Several experiments show its capacity and noise robustness. Theoretical analyses of its capacity and fidelity are also presented.A reduced description represents a whole hierarchy using a single high-dimensional vector, which can recover individual items and directly be used for complex calculations and procedures, such as making analogies. Furthermore, the hierarchy can be reconstructed from the single vector. Modular Composite Representation (MCR), a new reduced description model for the representation used in challenging AI applications, provides an attractive tradeoff between expressiveness and simplicity of operations. A theoretical analysis of its noise robustness, several experiments, and comparisons with similar models are presented.My implementations of these memories include an object oriented version using a RAM cache, a version for distributed and multi-threading execution, and a GPU version for fast vector processing

    The edge cloud: A holistic view of communication, computation and caching

    Get PDF
    The evolution of communication networks shows a clear shift of focus from just improving the communications aspects to enabling new important services, from Industry 4.0 to automated driving, virtual/augmented reality, Internet of Things (IoT), and so on. This trend is evident in the roadmap planned for the deployment of the fifth generation (5G) communication networks. This ambitious goal requires a paradigm shift towards a vision that looks at communication, computation and caching (3C) resources as three components of a single holistic system. The further step is to bring these 3C resources closer to the mobile user, at the edge of the network, to enable very low latency and high reliability services. The scope of this chapter is to show that signal processing techniques can play a key role in this new vision. In particular, we motivate the joint optimization of 3C resources. Then we show how graph-based representations can play a key role in building effective learning methods and devising innovative resource allocation techniques.Comment: to appear in the book "Cooperative and Graph Signal Pocessing: Principles and Applications", P. Djuric and C. Richard Eds., Academic Press, Elsevier, 201

    Macroservers: An Execution Model for DRAM Processor-In-Memory Arrays

    Get PDF
    The emergence of semiconductor fabrication technology allowing a tight coupling between high-density DRAM and CMOS logic on the same chip has led to the important new class of Processor-In-Memory (PIM) architectures. Newer developments provide powerful parallel processing capabilities on the chip, exploiting the facility to load wide words in single memory accesses and supporting complex address manipulations in the memory. Furthermore, large arrays of PIMs can be arranged into a massively parallel architecture. In this report, we describe an object-based programming model based on the notion of a macroserver. Macroservers encapsulate a set of variables and methods; threads, spawned by the activation of methods, operate asynchronously on the variables' state space. Data distributions provide a mechanism for mapping large data structures across the memory region of a macroserver, while work distributions allow explicit control of bindings between threads and data. Both data and work distributuions are first-class objects of the model, supporting the dynamic management of data and threads in memory. This offers the flexibility required for fully exploiting the processing power and memory bandwidth of a PIM array, in particular for irregular and adaptive applications. Thread synchronization is based on atomic methods, condition variables, and futures. A special type of lightweight macroserver allows the formulation of flexible scheduling strategies for the access to resources, using a monitor-like mechanism

    Hyperspectral Unmixing Overview: Geometrical, Statistical, and Sparse Regression-Based Approaches

    Get PDF
    Imaging spectrometers measure electromagnetic energy scattered in their instantaneous field view in hundreds or thousands of spectral channels with higher spectral resolution than multispectral cameras. Imaging spectrometers are therefore often referred to as hyperspectral cameras (HSCs). Higher spectral resolution enables material identification via spectroscopic analysis, which facilitates countless applications that require identifying materials in scenarios unsuitable for classical spectroscopic analysis. Due to low spatial resolution of HSCs, microscopic material mixing, and multiple scattering, spectra measured by HSCs are mixtures of spectra of materials in a scene. Thus, accurate estimation requires unmixing. Pixels are assumed to be mixtures of a few materials, called endmembers. Unmixing involves estimating all or some of: the number of endmembers, their spectral signatures, and their abundances at each pixel. Unmixing is a challenging, ill-posed inverse problem because of model inaccuracies, observation noise, environmental conditions, endmember variability, and data set size. Researchers have devised and investigated many models searching for robust, stable, tractable, and accurate unmixing algorithms. This paper presents an overview of unmixing methods from the time of Keshava and Mustard's unmixing tutorial [1] to the present. Mixing models are first discussed. Signal-subspace, geometrical, statistical, sparsity-based, and spatial-contextual unmixing algorithms are described. Mathematical problems and potential solutions are described. Algorithm characteristics are illustrated experimentally.Comment: This work has been accepted for publication in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensin
    corecore