Search CORE

2,310 research outputs found

ReSHAPE: A Framework for Dynamic Resizing and Scheduling of Homogeneous Applications in a Parallel Environment

Author: Ribbens Calvin J.
Sudarsan Rajesh
Publication venue
Publication date: 01/01/2007
Field of study

Applications in science and engineering often require huge computational resources for solving problems within a reasonable time frame. Parallel supercomputers provide the computational infrastructure for solving such problems. A traditional application scheduler running on a parallel cluster only supports static scheduling where the number of processors allocated to an application remains fixed throughout the lifetime of execution of the job. Due to the unpredictability in job arrival times and varying resource requirements, static scheduling can result in idle system resources thereby decreasing the overall system throughput. In this paper we present a prototype framework called ReSHAPE, which supports dynamic resizing of parallel MPI applications executed on distributed memory platforms. The framework includes a scheduler that supports resizing of applications, an API to enable applications to interact with the scheduler, and a library that makes resizing viable. Applications executed using the ReSHAPE scheduler framework can expand to take advantage of additional free processors or can shrink to accommodate a high priority application, without getting suspended. In our research, we have mainly focused on structured applications that have two-dimensional data arrays distributed across a two-dimensional processor grid. The resize library includes algorithms for processor selection and processor mapping. Experimental results show that the ReSHAPE framework can improve individual job turn-around time and overall system throughput.Comment: 15 pages, 10 figures, 5 tables Submitted to International Conference on Parallel Processing (ICPP'07

arXiv.org e-Print Archive

Computer Science Technical Reports @Virginia Tech

CiteSeerX

Enhancing the performance of malleable MPI applications by using performance-aware dynamic reconfiguration

Author: Carretero Jesús
Marinescu Maria-Cristina
Martín Gonzalo
Singh David E.
Publication venue: 'Elsevier BV'
Publication date: 01/01/2015
Field of study

The work in this paper focuses on providing malleability to MPI applications by using a novel performance-aware dynamic reconfiguration technique. This paper describes the design and implementation of Flex-MPI, an MPI library extension which can automatically monitor and predict the performance of applications, balance and redistribute the workload, and reconfigure the application at runtime by changing the number of processes. Unlike existent approaches, our reconfiguring policy is guided by user-defined performance criteria. We focus on iterative SPMD programs, a class of applications with critical mass within the scientific community. Extensive experiments show that Flex-MPI can improve the performance, parallel efficiency, and cost-efficiency of MPI programs with a minimal effort from the programmer.This work has been partially supported by the Spanish Ministry of Economy and Competitiveness under the project TIN2013- 41350-P, Scalable Data Management Techniques for High-End Computing Systems, and EU under the COST Program Action IC1305, Network for Sustainable Ultrascale Computing (NESUS)Peer ReviewedPostprint (author's final draft

Migration energy aware reconfigurations of virtual network function instances in NFV architectures

Author: Ammar Mostafa
Eramo Vincenzo
Lavacca FRANCESCO GIACINTO
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017
Field of study

Network function virtualization (NFV) is a new network architecture framework that implements network functions in software running on a pool of shared commodity servers. NFV can provide the infrastructure flexibility and agility needed to successfully compete in today's evolving communications landscape. Any service is represented by a service function chain (SFC) that is a set of VNFs to be executed according to a given order. The running of VNFs needs the instantiation of VNF instances (VNFIs) that are software modules executed on virtual machines. This paper deals with the migration problem of the VNFIs needed in the low traffic periods to turn OFF servers and consequently to save energy consumption. Though the consolidation allows for energy saving, it has also negative effects as the quality of service degradation or the energy consumption needed for moving the memories associated to the VNFI to be migrated. We focus on cold migration in which virtual machines are redundant and suspended before performing migration. We propose a migration policy that determines when and where to migrate VNFI in response to changes to SFC request intensity. The objective is to minimize the total energy consumption given by the sum of the consolidation and migration energies. We formulate the energy aware VNFI migration problem and after proving that it is NP-hard, we propose a heuristic based on the Viterbi algorithm able to determine the migration policy with low computational complexity. The results obtained by the proposed heuristic show how the introduced policy allows for a reduction of the migration energy and consequently lower total energy consumption with respect to the traditional policies. The energy saving can be on the order of 40% with respect to a policy in which migration is not performed

Archivio della ricerca- Università di Roma La Sapienza

Optimizing Irregular Communication with Neighborhood Collectives and Locality-Aware Parallelism

Author: Bienz Amanda
Collom Gerald
Li Rui Peng
Publication venue
Publication date: 02/06/2023
Field of study

Irregular communication often limits both the performance and scalability of parallel applications. Typically, applications individually implement irregular messages using point-to-point communications, and any optimizations are added directly into the application. As a result, these optimizations lack portability. There is no easy way to optimize point-to-point messages within MPI, as the interface for single messages provides no information on the collection of all communication to be performed. However, the persistent neighbor collective API, released in the MPI 4 standard, provides an interface for portable optimizations of irregular communication within MPI libraries. This paper presents methods for optimizing irregular communication within neighborhood collectives, analyzes the impact of replacing point-to-point communication in existing codebases such as Hypre BoomerAMG with neighborhood collectives, and finally shows an up to 1.32x speedup on sparse matrix-vector multiplication within a BoomerAMG solve through the use of our optimized neighbor collectives. The authors analyze multiple implementations of neighborhood collectives, including a standard implementation, which simply wraps standard point-to-point communication, as well as multiple implementations of locality-aware aggregation. All optimizations are available in an open-source codebase, MPI Advance, which sits on top of MPI, allowing for optimizations to be added into existing codebases regardless of the system MPI install

arXiv.org e-Print Archive

Load sharing for optimistic parallel simulations on multicore machines

Author: PELLEGRINI ALESSANDRO
QUAGLIA Francesco
VITALI Roberto
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2012
Field of study

Parallel Discrete Event Simulation (PDES) is based on the partitioning of the simulation model into distinct Logical Processes (LPs), each one modeling a portion of the entire system, which are allowed to execute simulation events concurrently. This allows exploiting parallel computing architectures to speedup model execution, and to make very large models tractable. In this article we cope with the optimistic approach to PDES, where LPs are allowed to concurrently process their events in a speculative fashion, and rollback/ recovery techniques are used to guarantee state consistency in case of causality violations along the speculative execution path. Particularly, we present an innovative load sharing approach targeted at optimizing resource usage for fruitful simulation work when running an optimistic PDES environment on top of multi-processor/multi-core machines. Beyond providing the load sharing model, we also define a load sharing oriented architectural scheme, based on a symmetric multi-threaded organization of the simulation platform. Finally, we present a real implementation of the load sharing architecture within the open source ROme OpTimistic Simulator (ROOT-Sim) package. Experimental data for an assessment of both viability and effectiveness of our proposal are presented as well. Copyright is held by author/owner(s)

Archivio della ricerca- Università di Roma La Sapienza