
    NASTRAN computer resource management for the matrix decomposition modules

    Detailed computer resource measurements of the NASTRAN matrix decomposition spill logic were made using a software input/output monitor. These measurements showed that, in general, job cost can be reduced by avoiding spill. The results indicated that job cost can be minimized by using dynamic memory management. A prototype memory management system is being implemented and evaluated for the CDC Cyber computer.

    SuperNeurons: Dynamic GPU Memory Management for Training Deep Neural Networks

    Going deeper and wider in neural architectures improves accuracy, while the limited GPU DRAM places an undesired restriction on the network design domain. Deep Learning (DL) practitioners either need to change to less desirable network architectures, or nontrivially dissect a network across multiple GPUs. These distract DL practitioners from concentrating on their original machine learning tasks. We present SuperNeurons: a dynamic GPU memory scheduling runtime that enables network training far beyond the GPU DRAM capacity. SuperNeurons features three memory optimizations, Liveness Analysis, Unified Tensor Pool, and Cost-Aware Recomputation; together they effectively reduce the network-wide peak memory usage down to the maximal memory usage among layers. We also address the performance issues in these memory-saving techniques. Given the limited GPU DRAM, SuperNeurons not only provisions the necessary memory for training, but also dynamically allocates memory for convolution workspaces to achieve high performance. Evaluations against Caffe, Torch, MXNet and TensorFlow demonstrate that SuperNeurons trains networks at least 3.2432x deeper than current frameworks with leading performance. In particular, SuperNeurons can train ResNet2500, which has 10^4 basic network layers, on a 12GB K40c. Comment: PPoPP '18: 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
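    The liveness-analysis idea above can be illustrated with a minimal sketch: if each tensor is freed after its last consumer runs, peak memory drops from the sum of all tensor sizes toward the maximal per-layer working set. The layer/tensor representation below is hypothetical, not the SuperNeurons implementation.

    ```python
    # Sketch of liveness-based tensor freeing (assumed tensor names/sizes;
    # not the SuperNeurons runtime).

    def peak_memory(layers, naive=False):
        """layers: list of (produced_tensor, size, consumed_tensors)."""
        # Last layer index at which each tensor is produced or consumed.
        last_use = {}
        for i, (out, _, ins) in enumerate(layers):
            last_use[out] = i
            for t in ins:
                last_use[t] = i

        live, current, peak = {}, 0, 0
        for i, (out, size, ins) in enumerate(layers):
            live[out] = size
            current += size
            peak = max(peak, current)
            if not naive:
                # Free every tensor whose last use is this layer.
                for t in [t for t in live if last_use[t] == i]:
                    current -= live.pop(t)
        return peak

    net = [("a", 4, []), ("b", 8, ["a"]), ("c", 2, ["b"]), ("d", 6, ["c"])]
    print(peak_memory(net, naive=True))   # 20: all tensors kept live
    print(peak_memory(net))               # 12: freed after last use
    ```

    In a real forward/backward pass the analysis must also keep tensors needed for backpropagation, which is where the recomputation trade-off enters.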

    Benchmarking Memory Management Capabilities within ROOT-Sim

    In parallel discrete event simulation techniques, the simulation model is partitioned into objects that concurrently execute events on different CPUs and/or multiple CPU cores. In such a context, run-time support for logical time synchronization across the different simulation objects plays a central role in determining the effectiveness of the specific parallel simulation environment. In this paper we present an experimental evaluation of the memory management capabilities offered by the ROme OpTimistic Simulator (ROOT-Sim). This is an open source parallel simulation environment transparently supporting optimistic synchronization via recoverability (based on incremental log/restore techniques) of any type of memory operation affecting the state of simulation objects, i.e., memory allocation, deallocation and update operations. The experimental study is based on a synthetic benchmark which mimics different read/write patterns inside the dynamic memory map associated with the state of simulation objects. This allows a sensitivity analysis of the time and space effects of the memory management subsystem while varying the type and the locality of the accesses associated with event processing.
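    The incremental log/restore mechanism described above can be sketched as follows: before an event updates part of an object's state, the old value is logged, and a rollback undoes the writes of all causally invalid events in reverse order. The dict-based state and method names are illustrative assumptions, not ROOT-Sim's actual allocator-level implementation.

    ```python
    # Minimal sketch of incremental log/restore for optimistic rollback
    # (assumed dict-based object state; not ROOT-Sim internals).

    class SimObject:
        def __init__(self, state):
            self.state = dict(state)
            self.log = []            # (event_no, key, old_value) entries

        def write(self, event_no, key, value):
            # Record the before-image only for the keys an event touches,
            # instead of checkpointing the whole state.
            self.log.append((event_no, key, self.state.get(key)))
            self.state[key] = value

        def rollback(self, event_no):
            # Undo the writes of all events >= event_no, newest first.
            while self.log and self.log[-1][0] >= event_no:
                _, key, old = self.log.pop()
                if old is None:
                    self.state.pop(key, None)
                else:
                    self.state[key] = old

    obj = SimObject({"x": 1})
    obj.write(10, "x", 2)
    obj.write(11, "y", 7)
    obj.rollback(11)          # e.g. a straggler message at event 11
    print(obj.state)          # {'x': 2}
    ```

    Logging only the touched keys is what makes the scheme "incremental": rollback cost scales with the writes performed after the restoration point, not with total state size.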

    DYNAMIC MEMORY MANAGEMENT WITH REDUCED FRAGMENTATION USING THE BEST-FIT APPROACH

    This disclosure relates to the field of dynamic memory management in general. The disclosed idea uses the best-fit approach with balanced trees whose nodes are sorted by key values corresponding to the sizes of free memory portions. Also disclosed is a method to efficiently coalesce freed memory. The idea addresses the disadvantages of the sequential-search mechanism for finding available free space and of the first-fit approach used in current flat-memory-based allocators that follow the [1] approach. The current mechanism for dynamic memory management in most systems performs a sequential search for all operations, which leads to a worst-case time complexity of O(N), and it follows the first-fit approach of allocating the first available free space for any request, which leads to fragmentation issues.
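    The best-fit-with-coalescing idea can be sketched compactly: keep free blocks ordered by size so the smallest sufficient block is found by an ordered search, and merge a freed block with adjacent free neighbours before reinserting it. The sketch below uses a sorted list with binary search to stand in for the balanced tree (a tree would also make insertion and removal O(log N)); class and method names are illustrative, not from the disclosure.

    ```python
    import bisect

    # Best-fit free-list sketch: blocks kept sorted by (size, addr) so the
    # smallest block large enough is located by binary search rather than
    # a sequential scan.

    class BestFit:
        def __init__(self, size):
            self.free = [(size, 0)]              # (size, addr), sorted by size

        def alloc(self, n):
            i = bisect.bisect_left(self.free, (n, -1))
            if i == len(self.free):
                return None                      # no block large enough
            size, addr = self.free.pop(i)        # best (smallest) fit
            if size > n:                         # reinsert the remainder
                bisect.insort(self.free, (size - n, addr + n))
            return addr

        def release(self, addr, n):
            # Coalesce with adjacent free neighbours before reinserting.
            for size, a in list(self.free):
                if a + size == addr:             # left neighbour
                    self.free.remove((size, a)); addr, n = a, n + size
                elif addr + n == a:              # right neighbour
                    self.free.remove((size, a)); n += size
            bisect.insort(self.free, (n, addr))

    h = BestFit(100)
    a = h.alloc(30)      # returns addr 0
    b = h.alloc(20)      # returns addr 30
    h.release(a, 30)
    h.release(b, 20)     # coalesces back into one 100-byte block
    print(h.free)        # [(100, 0)]
    ```

    Choosing the smallest sufficient block leaves larger blocks intact for larger requests, which combined with coalescing is what reduces fragmentation relative to first-fit.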