
    NASTRAN computer resource management for the matrix decomposition modules

    Detailed computer resource measurements of the NASTRAN matrix decomposition spill logic were made using a software input/output monitor. These measurements showed that, in general, job cost can be reduced by avoiding spill. The results indicated that job cost can be minimized by using dynamic memory management. A prototype memory management system is being implemented and evaluated for the CDC Cyber computer.

    SuperNeurons: Dynamic GPU Memory Management for Training Deep Neural Networks

    Going deeper and wider in neural architectures improves accuracy, while the limited GPU DRAM places an undesired restriction on the network design domain. Deep Learning (DL) practitioners either need to change to less desirable network architectures, or nontrivially dissect a network across multiple GPUs. These distract DL practitioners from concentrating on their original machine learning tasks. We present SuperNeurons: a dynamic GPU memory scheduling runtime that enables network training far beyond the GPU DRAM capacity. SuperNeurons features three memory optimizations, Liveness Analysis, Unified Tensor Pool, and Cost-Aware Recomputation; together they effectively reduce the network-wide peak memory usage down to the maximal memory usage among layers. We also address the performance issues in these memory-saving techniques. Given the limited GPU DRAM, SuperNeurons not only provisions the necessary memory for training, but also dynamically allocates memory for convolution workspaces to achieve high performance. Evaluations against Caffe, Torch, MXNet and TensorFlow demonstrate that SuperNeurons trains networks at least 3.2432x deeper than current frameworks with leading performance. In particular, SuperNeurons can train ResNet2500, which has 10^4 basic network layers, on a 12GB K40c. Comment: PPoPP '18: 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
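    The liveness-analysis idea above can be illustrated with a minimal sketch: if each tensor is freed after its last consumer runs, peak memory drops from the sum of all tensor sizes toward the maximal per-layer working set. The layer/tensor representation below is hypothetical, not the SuperNeurons implementation.

    ```python
    # Sketch of liveness-based tensor freeing (assumed tensor names/sizes;
    # not the SuperNeurons runtime).

    def peak_memory(layers, naive=False):
        """layers: list of (produced_tensor, size, consumed_tensors)."""
        # Last layer index at which each tensor is produced or consumed.
        last_use = {}
        for i, (out, _, ins) in enumerate(layers):
            last_use[out] = i
            for t in ins:
                last_use[t] = i

        live, current, peak = {}, 0, 0
        for i, (out, size, ins) in enumerate(layers):
            live[out] = size
            current += size
            peak = max(peak, current)
            if not naive:
                # Free every tensor whose last use is this layer.
                for t in [t for t in live if last_use[t] == i]:
                    current -= live.pop(t)
        return peak

    net = [("a", 4, []), ("b", 8, ["a"]), ("c", 2, ["b"]), ("d", 6, ["c"])]
    print(peak_memory(net, naive=True))   # 20: all tensors kept live
    print(peak_memory(net))               # 12: freed after last use
    ```

    In a real forward/backward pass the analysis must also keep tensors needed for backpropagation, which is where the recomputation trade-off enters.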

    Benchmarking Memory Management Capabilities within ROOT-Sim

    In parallel discrete event simulation techniques, the simulation model is partitioned into objects that concurrently execute events on different CPUs and/or multiple CPU cores. In such a context, run-time support for logical time synchronization across the different simulation objects plays a central role in determining the effectiveness of the specific parallel simulation environment. In this paper we present an experimental evaluation of the memory management capabilities offered by the ROme OpTimistic Simulator (ROOT-Sim). This is an open source parallel simulation environment transparently supporting optimistic synchronization via recoverability (based on incremental log/restore techniques) of any type of memory operation affecting the state of simulation objects, i.e., memory allocation, deallocation and update operations. The experimental study is based on a synthetic benchmark which mimics different read/write patterns inside the dynamic memory map associated with the state of simulation objects. This allows a sensitivity analysis of the time and space effects of the memory management subsystem while varying the type and the locality of the accesses associated with event processing.
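    The incremental log/restore mechanism described above can be sketched as follows: before an event updates part of an object's state, the old value is logged, and a rollback undoes the writes of all causally invalid events in reverse order. The dict-based state and method names are illustrative assumptions, not ROOT-Sim's actual allocator-level implementation.

    ```python
    # Minimal sketch of incremental log/restore for optimistic rollback
    # (assumed dict-based object state; not ROOT-Sim internals).

    class SimObject:
        def __init__(self, state):
            self.state = dict(state)
            self.log = []            # (event_no, key, old_value) entries

        def write(self, event_no, key, value):
            # Record the before-image only for the keys an event touches,
            # instead of checkpointing the whole state.
            self.log.append((event_no, key, self.state.get(key)))
            self.state[key] = value

        def rollback(self, event_no):
            # Undo the writes of all events >= event_no, newest first.
            while self.log and self.log[-1][0] >= event_no:
                _, key, old = self.log.pop()
                if old is None:
                    self.state.pop(key, None)
                else:
                    self.state[key] = old

    obj = SimObject({"x": 1})
    obj.write(10, "x", 2)
    obj.write(11, "y", 7)
    obj.rollback(11)          # e.g. a straggler message at event 11
    print(obj.state)          # {'x': 2}
    ```

    Logging only the touched keys is what makes the scheme "incremental": rollback cost scales with the writes performed after the restoration point, not with total state size.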

    DYNAMIC MEMORY MANAGEMENT WITH REDUCED FRAGMENTATION USING THE BEST-FIT APPROACH

    This disclosure relates to the field of dynamic memory management in general. The disclosed idea uses the best-fit approach with balanced trees whose nodes are sorted by key values corresponding to the sizes of free memory portions. Also disclosed is a method to efficiently coalesce freed memory. The idea addresses the disadvantages of the sequential-search mechanism for finding available free space and of the first-fit approach used in current flat-memory-based allocators that follow the [1] approach. The current mechanism for dynamic memory management in most systems performs a sequential search for all operations, which leads to a worst-case time complexity of O(N), and it follows the first-fit approach of allocating the first available free space for any request, which leads to fragmentation issues.
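    The best-fit-with-coalescing idea can be sketched compactly: keep free blocks ordered by size so the smallest sufficient block is found by an ordered search, and merge a freed block with adjacent free neighbours before reinserting it. The sketch below uses a sorted list with binary search to stand in for the balanced tree (a tree would also make insertion and removal O(log N)); class and method names are illustrative, not from the disclosure.

    ```python
    import bisect

    # Best-fit free-list sketch: blocks kept sorted by (size, addr) so the
    # smallest block large enough is located by binary search rather than
    # a sequential scan.

    class BestFit:
        def __init__(self, size):
            self.free = [(size, 0)]              # (size, addr), sorted by size

        def alloc(self, n):
            i = bisect.bisect_left(self.free, (n, -1))
            if i == len(self.free):
                return None                      # no block large enough
            size, addr = self.free.pop(i)        # best (smallest) fit
            if size > n:                         # reinsert the remainder
                bisect.insort(self.free, (size - n, addr + n))
            return addr

        def release(self, addr, n):
            # Coalesce with adjacent free neighbours before reinserting.
            for size, a in list(self.free):
                if a + size == addr:             # left neighbour
                    self.free.remove((size, a)); addr, n = a, n + size
                elif addr + n == a:              # right neighbour
                    self.free.remove((size, a)); n += size
            bisect.insort(self.free, (n, addr))

    h = BestFit(100)
    a = h.alloc(30)      # returns addr 0
    b = h.alloc(20)      # returns addr 30
    h.release(a, 30)
    h.release(b, 20)     # coalesces back into one 100-byte block
    print(h.free)        # [(100, 0)]
    ```

    Choosing the smallest sufficient block leaves larger blocks intact for larger requests, which combined with coalescing is what reduces fragmentation relative to first-fit.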