ReSHAPE: A Framework for Dynamic Resizing and Scheduling of Homogeneous Applications in a Parallel Environment

Author: Ribbens Calvin J.
Sudarsan Rajesh
Publication date: 01/01/2007

Applications in science and engineering often require huge computational resources to solve problems within a reasonable time frame. Parallel supercomputers provide the computational infrastructure for solving such problems. A traditional application scheduler running on a parallel cluster supports only static scheduling, where the number of processors allocated to an application remains fixed for the lifetime of the job. Because job arrival times are unpredictable and resource requirements vary, static scheduling can leave system resources idle and thereby decrease overall system throughput. In this paper we present a prototype framework called ReSHAPE, which supports dynamic resizing of parallel MPI applications executed on distributed memory platforms. The framework includes a scheduler that supports resizing of applications, an API that enables applications to interact with the scheduler, and a library that makes resizing viable. Applications executed using the ReSHAPE scheduler framework can expand to take advantage of additional free processors or shrink to accommodate a high-priority application, without being suspended. In our research, we have mainly focused on structured applications that have two-dimensional data arrays distributed across a two-dimensional processor grid. The resize library includes algorithms for processor selection and processor mapping. Experimental results show that the ReSHAPE framework can improve individual job turn-around time and overall system throughput. Comment: 15 pages, 10 figures, 5 tables. Submitted to International Conference on Parallel Processing (ICPP'07)

arXiv.org e-Print Archive

Computer Science Technical Reports @Virginia Tech

CiteSeerX

Characterizing Deep-Learning I/O Workloads in TensorFlow

Author: Chien Steven W. D.
Herman Pawel
Laure Erwin
Markidis Stefano
Narasimhamurthy Sai
Santos Luis
Sishtla Chaitanya Prasad
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 06/10/2018

The performance of Deep-Learning (DL) computing frameworks relies on the performance of data ingestion and checkpointing. During training, a considerably high number of relatively small files are first loaded and pre-processed on CPUs and then moved to the accelerator for computation. In addition, checkpointing and restart operations are carried out to allow DL computing frameworks to restart quickly from a checkpoint. Because of this, I/O affects the performance of DL applications. In this work, we characterize the I/O performance and scaling of TensorFlow, an open-source programming framework developed by Google and specifically designed for solving DL problems. To measure TensorFlow I/O performance, we first design a micro-benchmark to measure TensorFlow reads, and then use a TensorFlow mini-application based on AlexNet to measure the performance cost of I/O and checkpointing in TensorFlow. To improve the checkpointing performance, we design and implement a burst buffer. We find that increasing the number of threads increases TensorFlow bandwidth by a maximum of 2.3x and 7.8x on our benchmark environments. The use of the TensorFlow prefetcher results in a complete overlap of computation on the accelerator and the input pipeline on the CPU, eliminating the effective cost of I/O on the overall performance. Using a burst buffer to checkpoint to fast, small-capacity storage and asynchronously copy the checkpoints to slower, large-capacity storage resulted in a performance improvement of 2.6x with respect to checkpointing directly to slower storage on our benchmark environment. Comment: Accepted for publication at pdsw-DISCS 201
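
The burst-buffer idea in this abstract (write the checkpoint to fast storage, then copy it to slower storage asynchronously) can be sketched in a few lines. This is only an illustrative sketch, not the authors' implementation: the function names `checkpoint_via_burst_buffer` and `demo`, the file name, and the use of a background thread with two temporary directories standing in for the fast and slow tiers are all assumptions made for the example.

```python
import shutil
import tempfile
import threading
from pathlib import Path


def checkpoint_via_burst_buffer(data: bytes, name: str,
                                fast_dir: Path, slow_dir: Path) -> threading.Thread:
    """Write a checkpoint to fast storage, then drain it to slow storage in the background.

    Hypothetical helper: the directories stand in for the two storage tiers.
    """
    fast_path = fast_dir / name
    fast_path.write_bytes(data)  # the blocking write touches only the fast tier

    def drain() -> None:
        # Copy under a temporary name first so readers never see a partial file.
        tmp_path = slow_dir / (name + ".part")
        shutil.copyfile(fast_path, tmp_path)
        tmp_path.replace(slow_dir / name)

    worker = threading.Thread(target=drain, daemon=True)
    worker.start()  # the caller could resume computation at this point
    return worker


def demo() -> bytes:
    """Checkpoint a dummy payload and read it back from the slow tier."""
    with tempfile.TemporaryDirectory() as fast, tempfile.TemporaryDirectory() as slow:
        worker = checkpoint_via_burst_buffer(
            b"weights-step-100", "ckpt-100.bin", Path(fast), Path(slow)
        )
        worker.join()  # wait for the background copy before reading
        return (Path(slow) / "ckpt-100.bin").read_bytes()
```

In this sketch the caller blocks only for the write to the fast directory; the copy to the slow directory overlaps with whatever the caller does before `join()`.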

arXiv.org e-Print Archive

Crossref

Exploiting parallel computing with limited program changes using a network of microcomputers

Author: Rogers J. L., Jr.
Sobieszczanski-Sobieski J.

Network computing and multiprocessor computers are two discernible trends in parallel processing. The computational behavior of an iterative distributed process in which some subtasks are completed later than others because of an imbalance in computational requirements is of significant interest. The effects of asynchronous processing were studied. A small existing program was converted to perform finite element analysis by distributing substructure analysis over a network of four Apple IIe microcomputers connected to a shared disk, simulating a parallel computer. The substructure analysis uses an iterative, fully stressed, structural resizing procedure. A framework of beams divided into three substructures is used as the finite element model. The effects of asynchronous processing on the convergence of the design variables are determined by not resizing particular substructures on various iterations.

NASA Technical Reports Server

Boosting Multi-Core Reachability Performance with Shared Hash Tables

Author: Laarman Alfons
van de Pol Jaco
Weber Michael
Publication date: 01/01/2010

This paper focuses on data structures for multi-core reachability, which is a key component in model checking algorithms and other verification methods. A cornerstone of an efficient solution is the storage of visited states. In related work, static partitioning of the state space was combined with thread-local storage and resulted in reasonable speedups, but left open whether improvements are possible. In this paper, we present a scaling solution for shared state storage which is based on a lockless hash table implementation. The solution is specifically designed for the cache architecture of modern CPUs. Because model checking algorithms impose loose requirements on the hash table operations, their design can be streamlined substantially compared to related work on lockless hash tables. Still, an implementation of the hash table presented here has dozens of sensitive performance parameters (bucket size, cache line size, data layout, probing sequence, etc.). We analyzed their impact and compared the resulting speedups with related tools. Our implementation outperforms two state-of-the-art multi-core model checkers (SPIN and DiVinE) by a substantial margin, while placing fewer constraints on the load balancing and search algorithms. Comment: preliminary report
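
To make the role of visited-state storage in reachability concrete, here is a minimal single-threaded sketch: a fixed-size open-addressing table with linear probing, used by a breadth-first search. It is not the paper's lockless design. The names `VisitedTable`, `find_or_put`, and `reachable` are hypothetical, states are assumed to be plain integers, and the atomic slot claim that a concurrent version would need is only marked in a comment.

```python
from collections import deque
from typing import Callable, Iterable, List, Optional


class VisitedTable:
    """Fixed-size open-addressing set with linear probing (single-threaded sketch)."""

    def __init__(self, capacity: int) -> None:
        self.slots: List[Optional[int]] = [None] * capacity

    def find_or_put(self, state: int) -> bool:
        """Return True if state was already stored; otherwise store it and return False."""
        capacity = len(self.slots)
        start = hash(state) % capacity
        for offset in range(capacity):
            i = (start + offset) % capacity
            if self.slots[i] is None:
                # A concurrent version would have to claim this slot atomically.
                self.slots[i] = state
                return False
            if self.slots[i] == state:
                return True
        raise MemoryError("visited table is full")


def reachable(initial: int,
              successors: Callable[[int], Iterable[int]],
              capacity: int = 64) -> int:
    """Count the states reachable from initial using breadth-first search."""
    visited = VisitedTable(capacity)
    visited.find_or_put(initial)
    queue = deque([initial])
    count = 1
    while queue:
        state = queue.popleft()
        for nxt in successors(state):
            if not visited.find_or_put(nxt):  # newly discovered state
                queue.append(nxt)
                count += 1
    return count
```

Merging lookup and insertion into one operation means the search needs only a single table call per successor, which is one way a table can be kept simple when states are only ever added and never deleted.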

arXiv.org e-Print Archive

CiteSeerX

University of Twente Research Information

Web-based user interface prototyping and simulation

Author: Faria Paulo Cristiano Pinheiro
Publication date: 04/04/2014

PVSio-web is a platform for the simulation and prototyping of user interfaces that has been developed by researchers at Queen Mary University of London. The platform aims to reduce barriers to the use of PVS by users unfamiliar with formal methods. Its main features are creating, opening and saving projects, loading images and creating widget areas over them, and, not least, editing PVS files. File editing is limited to one file per project, and the platform has no image editing features. PVSio-web can therefore be improved by implementing support for editing multiple files and for editing images (such as cropping and resizing). The interaction areas can also be improved to enhance the quality of the prototype, by adjusting the precision of the dimensions and positioning of the area relative to the image. This dissertation describes the improvements made to the editing of files, images and interaction areas in PVSio-web, in order to increase its quality and optimize its use in a real environment.

Universidade do Minho: RepositoriUM

Operating-system support for distributed multimedia

Author: Leslie Ian M.
Mcauley Derek
Mullender Sape J.
Publication venue: University of Twente, Faculty of Computer Science
Publication date: 01/01/1993

Multimedia applications place new demands upon processors, networks and operating systems. While some network designers, through ATM for example, have considered revolutionary approaches to supporting multimedia, the same cannot be said for operating systems designers. Most work is evolutionary in nature, attempting to identify additional features that can be added to existing systems to support multimedia. Here we describe the Pegasus project's attempt to build an integrated hardware and operating system environment from the ground up specifically targeted towards multimedia.

CiteSeerX

University of Twente Research Information