1,793 research outputs found

Effect of virtual memory on efficient solution of two model problems

Author: Lambiotte J. J., Jr.

Computers with a virtual memory architecture allow programs to be written as if they were small enough to be contained in memory. Two types of problems are investigated to show that this luxury can lead to quite inefficient performance if the programmer does not take the characteristics of the operating system into account when developing the program. The two problems considered are the solution of a large system of simultaneous linear equations by Gaussian elimination and a model three-dimensional finite-difference problem. Runs on the Control Data STAR-100 computer demonstrate the inefficiency of programming the problems in the manner one would naturally choose if the problems were indeed small enough to be contained in memory. Program redesigns are presented which achieve large improvements in performance through changes in the computational procedure and the data base arrangement.
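
The effect described in this abstract can be illustrated with a toy simulation. The sketch below is not taken from the report: it assumes a square matrix stored row by row, a least-recently-used page replacement policy, and hypothetical sizes (`n`, `page_size`, `resident_pages`) chosen only so that one matrix row fills exactly one page. It counts page faults for a row-wise and a column-wise sweep over the same data.

```python
from collections import OrderedDict


def count_page_faults(addresses, page_size, resident_pages):
    """Count page faults for a sequence of element addresses under LRU replacement."""
    resident = OrderedDict()  # page number -> True, ordered from least to most recently used
    faults = 0
    for addr in addresses:
        page = addr // page_size
        if page in resident:
            resident.move_to_end(page)  # mark the page as most recently used
        else:
            faults += 1
            resident[page] = True
            if len(resident) > resident_pages:
                resident.popitem(last=False)  # evict the least recently used page
    return faults


def sweep_addresses(n, by_rows):
    """Addresses touched when sweeping an n x n matrix stored in row-major order."""
    if by_rows:
        return [i * n + j for i in range(n) for j in range(n)]
    return [i * n + j for j in range(n) for i in range(n)]


# Hypothetical sizes: 64 x 64 matrix, 64 elements per page, 8 pages resident.
n, page_size, resident_pages = 64, 64, 8
faults_by_rows = count_page_faults(sweep_addresses(n, True), page_size, resident_pages)
faults_by_cols = count_page_faults(sweep_addresses(n, False), page_size, resident_pages)
print(faults_by_rows, faults_by_cols)
```

Under these assumptions the row-wise sweep faults once per page, while the column-wise sweep faults on every access, because each column touches more pages than can stay resident. Real paging behavior depends on the machine and operating system, so the counts are only indicative.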

NASA Technical Reports Server

A vectorization of the Hess McDonnell Douglas potential flow program NUED for the STAR-100 computer

Author: Boney L. R.
Smith R. E., Jr.

The computer program NUED for analyzing potential flow about arbitrary three-dimensional lifting bodies using the panel method was modified to use vector operations and run on the STAR-100 computer. A high speed of computation and the ability to approximate the body surface with a large number of panels are characteristics of NUEDV. The new program shows that vector operations can be readily implemented in programs of this type to increase the computational speed on the STAR-100 computer. The virtual memory architecture of the STAR-100 facilitates the use of large numbers of panels to approximate the body surface.

NASA Technical Reports Server

vDNN: Virtualized Deep Neural Networks for Scalable, Memory-Efficient Neural Network Design

Author: Clemons Jason
Gimelshein Natalia
Keckler Stephen W.
Rhu Minsoo
Zulfiqar Arslan
Publication date: 28/07/2016

The most widely used machine learning frameworks require users to carefully tune their memory usage so that the deep neural network (DNN) fits into the DRAM capacity of a GPU. This restriction hampers a researcher's flexibility to study different machine learning algorithms, forcing them to either use a less desirable network architecture or parallelize the processing across multiple GPUs. We propose a runtime memory manager that virtualizes the memory usage of DNNs such that both GPU and CPU memory can simultaneously be utilized for training larger DNNs. Our virtualized DNN (vDNN) reduces the average GPU memory usage of AlexNet by up to 89%, OverFeat by 91%, and GoogLeNet by 95%, a significant reduction in the memory requirements of DNNs. Similar experiments on VGG-16, one of the deepest and most memory-hungry DNNs to date, demonstrate the memory efficiency of our proposal. vDNN enables VGG-16 with batch size 256 (requiring 28 GB of memory) to be trained on a single NVIDIA Titan X GPU card containing 12 GB of memory, with 18% performance loss compared to a hypothetical, oracular GPU with enough memory to hold the entire DNN. Comment: Published as a conference paper at the 49th IEEE/ACM International Symposium on Microarchitecture (MICRO-49), 201
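
The general idea of using CPU memory as a backing store for GPU training can be illustrated with a toy accounting model. The sketch below is not the vDNN algorithm: it assumes a purely sequential network, ignores weights and workspace, and assumes that with offloading only the input and output feature maps of the layer currently being processed are resident on the GPU. The function names and the feature-map sizes are hypothetical.

```python
def baseline_peak(feature_map_sizes):
    """Peak GPU memory when every feature map stays resident until backpropagation."""
    return sum(feature_map_sizes)


def offloaded_peak(feature_map_sizes):
    """Peak GPU memory under the toy assumption that only the current layer's
    input and output feature maps are resident; all others live in CPU memory."""
    pairs = zip(feature_map_sizes, feature_map_sizes[1:])
    return max(inp + out for inp, out in pairs)


# Hypothetical feature-map sizes in GB for a six-stage sequential network.
sizes_gb = [4, 8, 8, 4, 2, 1]
peak_without_offload = baseline_peak(sizes_gb)
peak_with_offload = offloaded_peak(sizes_gb)
print(peak_without_offload, peak_with_offload)
```

In this model the peak drops from the sum of all feature maps to the largest adjacent pair. The cost of the transfers between GPU and CPU memory, which the abstract reports as a performance loss, is not modeled here.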

arXiv.org e-Print Archive

Crossref

Pohang University of Science and Technology (POSTECH)

MOSS, an evaluation of software engineering techniques

Author: Bounds J. R.
Pruitt J. L.

An evaluation of the software engineering techniques used for the development of a Modular Operating System (MOSS) is described. MOSS is a general-purpose real-time operating system which was developed for the Concept Verification Test (CVT) program. Each of the software engineering techniques is described and evaluated based on the experience of the MOSS project. Recommendations for the use of these techniques on future software projects are also given.

NASA Technical Reports Server

FASTCUDA: Open Source FPGA Accelerator & Hardware-Software Codesign Toolset for CUDA Kernels

Author: de la Torre E.
Lavagno L.
Lazarescu M.
Mavroidis I.
Papaefstathiou Ioannis (http://users.isc.tuc.gr/~ipapaefstathiou)
Schafer F.
Publication venue: IEEE / Institute of Electrical and Electronics Engineers Incorporated
Publication date: 01/01/2012

Using FPGAs as hardware accelerators that communicate with a central CPU is becoming common practice in the embedded design world, but there is not yet a standard methodology and toolset to facilitate this path. On the other hand, languages such as CUDA and OpenCL provide standard development environments for Graphics Processing Unit (GPU) programming. FASTCUDA is a platform that provides the necessary software toolset, hardware architecture, and design methodology to efficiently adapt the CUDA approach into a new FPGA design flow. With FASTCUDA, the CUDA kernels of a CUDA-based application are partitioned into two groups with minimal user intervention: those that are compiled and executed in parallel software, and those that are synthesized and implemented in hardware. A modern low-power FPGA can provide the processing power (via numerous embedded micro-CPUs) and the logic capacity for both the software and hardware implementations of the CUDA kernels. This paper describes the system requirements and the architectural decisions behind the FASTCUDA approach.

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Institutional Repository of the Technical University of Crete

The design of a microprocessor with an object oriented architecture

Author: van Hamersveld F.P.
Publication date: 01/01/1992

Repository TU/e

Pure OAI Repository