Enhancing the performance of malleable MPI applications by using performance-aware dynamic reconfiguration
The work in this paper focuses on providing malleability to MPI applications through a novel performance-aware dynamic reconfiguration technique. This paper describes the design and implementation of Flex-MPI, an MPI library extension that can automatically monitor and predict the performance of applications, balance and redistribute the workload, and reconfigure the application at runtime by changing the number of processes. Unlike existing approaches, our reconfiguration policy is guided by user-defined performance criteria. We focus on iterative SPMD programs, a class of applications widely used within the scientific community. Extensive experiments show that Flex-MPI can improve the performance, parallel efficiency, and cost-efficiency of MPI programs with minimal effort from the programmer. This work has been partially supported by the Spanish Ministry of Economy and Competitiveness under the project TIN2013-41350-P, Scalable Data Management Techniques for High-End Computing Systems, and by the EU under the COST Program Action IC1305, Network for Sustainable Ultrascale Computing (NESUS).
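The reconfiguration decision described above can be sketched with a small example. This is an illustration only, not Flex-MPI's actual policy: the function name and the idealized linear-speedup model are assumptions introduced here to show how a user-defined performance target can drive the choice of process count.

```python
import math

def target_process_count(current_procs, measured_iter_time,
                         target_iter_time, max_procs):
    """Estimate the process count needed to meet a user-defined
    iteration-time target, assuming iteration time scales inversely
    with the number of processes (an idealized model)."""
    if measured_iter_time <= 0 or target_iter_time <= 0:
        raise ValueError("times must be positive")
    # Total work per iteration under the idealized model: time * procs.
    work = measured_iter_time * current_procs
    needed = math.ceil(work / target_iter_time)
    # Never drop below one process or exceed the available resources.
    return max(1, min(needed, max_procs))
```

A runtime system built on this idea would compare `measured_iter_time` against the user's target each sampling interval and spawn or retire processes when the estimate diverges from the current count.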
dReDBox: Materializing a full-stack rack-scale system prototype of a next-generation disaggregated datacenter
Current datacenters are based on server machines, whose mainboard and hardware components form the baseline, monolithic building block that the rest of the system software, middleware and application stack are built upon. This leads to the following limitations: (a) resource proportionality of a multi-tray system is bounded by the basic building block (mainboard), (b) resource allocation to processes or virtual machines (VMs) is bounded by the available resources within the boundary of the mainboard, leading to spare resource fragmentation and inefficiencies, and (c) upgrades must be applied to each and every server even when only a specific component needs to be upgraded. The dRedBox project (Disaggregated Recursive Datacentre-in-a-Box) addresses the above limitations and proposes a next-generation, low-power, across-form-factor datacenter architecture, departing from the paradigm of the mainboard-as-a-unit and enabling the creation of function-block-as-a-unit. Hardware-level disaggregation and software-defined wiring of resources are supported by a full-fledged Type-1 hypervisor that can execute commodity virtual machines, which communicate over a low-latency and high-throughput software-defined optical network. To evaluate its novel approach, dRedBox will demonstrate application execution in the domains of network functions virtualization, infrastructure analytics, and real-time video surveillance. This work has been supported in part by EU H2020 ICT project dRedBox, contract #687632.
Context-awareness for mobile sensing: a survey and future directions
The evolution of smartphones, together with their increasing computational power, has empowered developers to create innovative context-aware applications for recognizing user-related social and cognitive activities in any situation and at any location. Awareness of context gives applications the capability of being conscious of the physical environment or situation around mobile device users, allowing network services to respond proactively and intelligently based on such awareness. The key idea behind context-aware applications is to encourage users to collect, analyze and share local sensory knowledge for large-scale community use by creating a smart network. The desired network is capable of making autonomous logical decisions to actuate environmental objects and to assist individuals. However, many open challenges remain, most of which arise because the middleware services provided on mobile devices have limited resources in terms of power, memory and bandwidth. It therefore becomes critically important to study how these drawbacks can be addressed and resolved, and at the same time to better understand the opportunities for the research community to contribute to context-awareness. To this end, this paper surveys the literature over the period 1991-2014, from the emerging concepts to applications of context-awareness on mobile platforms, providing up-to-date research results and future research directions. Moreover, it points out the challenges faced in this regard and proposes possible solutions.
SLURM Support for Remote GPU Virtualization: Implementation and Performance Study
© 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. SLURM is a resource manager that can be leveraged to share a collection of heterogeneous resources among the jobs in execution in a cluster. However, SLURM is not designed to handle resources such as graphics processing units (GPUs). Concretely, although SLURM can use a generic resource plugin (GRes) to manage GPUs, with this solution the hardware accelerators can only be accessed by the job that is in execution on the node to which the GPU is attached. This is a serious constraint for remote GPU virtualization technologies, which aim at providing user-transparent access to all GPUs in the cluster, independently of the location of the node where the application is running with respect to the GPU node. In this work we introduce a new type of device in SLURM, "rgpu", in order to gain access from any application node to any GPU node in the cluster, using rCUDA as the remote GPU virtualization solution. With this new scheduling mechanism, a user can access any number of GPUs, as SLURM schedules the tasks taking into account all the graphics accelerators available in the complete cluster. We present experimental results that show the benefits of this new approach in terms of increased flexibility for the job scheduler. The researchers at UPV were supported by the Generalitat Valenciana under Grant PROMETEOII/2013/009 of the PROMETEO program phase II.
Researchers at UJI were supported by MINECO, by FEDER funds under Grant TIN2011-23283, and by the Fundación Caixa-Castelló Bancaixa (Grant P11B2013-21). Iserte Agut, S.; Castelló Gimeno, A.; Mayo Gual, R.; Quintana Ortí, E.S.; Silla Jiménez, F.; Duato Marín, J.F.; Reaño González, C.... (2014). SLURM Support for Remote GPU Virtualization: Implementation and Performance Study. In: Computer Architecture and High Performance Computing (SBAC-PAD), 2014 IEEE 26th International Symposium on. IEEE. 318-325. https://doi.org/10.1109/SBAC-PAD.2014.49
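A job requesting the new device type might be submitted with a batch script along the following lines. This is a hypothetical sketch: only the "rgpu" device name comes from the paper, and the exact directive syntax is an assumption modeled on SLURM's standard `--gres` option.

```shell
#!/bin/bash
# Hypothetical job script: request 4 remote GPUs anywhere in the
# cluster via the "rgpu" generic resource (syntax is illustrative).
#SBATCH --job-name=rcuda-job
#SBATCH --ntasks=8
#SBATCH --gres=rgpu:4
# rCUDA intercepts CUDA calls and forwards them to the remote GPU
# nodes that the scheduler assigned to this job.
srun ./cuda_application
```

Under this model the scheduler can place the job on any compute node, since the GPUs no longer need to be physically attached to the node running the application.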
Easing parallel programming on heterogeneous systems
The most common way to run HPC (High Performance Computing) applications in reasonable execution times and in a scalable manner is to use parallel computing systems. The current trend in HPC systems is to include, in the same execution machine, several computing devices of different types and architectures.
However, their use imposes specific challenges on the programmer. A programmer must be an expert in the existing tools and abstractions for distributed memory, in the programming models for shared-memory systems, and in the programming models specific to each type of co-processor, in order to create hybrid programs that can efficiently exploit all the capabilities of the machine.
Currently, all these problems must be solved by the programmer, making the programming of a heterogeneous machine a real challenge.
This thesis addresses several of the main problems related to the parallel programming of highly heterogeneous and distributed systems. It presents proposals that solve problems ranging from the creation of codes that are portable across different types of devices, accelerators, and architectures, while still achieving maximum efficiency, to the problems that arise in distributed-memory systems related to communications and the partitioning of data structures. Departamento de Informática (Arquitectura y Tecnología de Computadores, Ciencias de la Computación e Inteligencia Artificial, Lenguajes y Sistemas Informáticos). Doctorado en Informática.
Optimization techniques for adaptability in MPI applications
The first version of MPI (Message Passing Interface) was released in 1994. At that time, scientific applications for HPC (High Performance Computing) were characterized by a static execution environment. These applications usually had regular computation and communication patterns, operated on dense data structures accessed with good data locality, and ran on homogeneous computing platforms. For these reasons, MPI has become the de facto standard for developing scientific parallel applications for HPC during the last decades.
In recent years scientific applications have evolved in order to cope with several challenges posed by different fields of engineering, economics and medicine among others. These challenges include large amounts of data stored in irregular and sparse data structures with poor data locality to be processed in parallel (big data), algorithms with irregular computation and communication patterns, and heterogeneous computing platforms (grid, cloud and heterogeneous cluster).
On the other hand, over the last years MPI has introduced relevant improvements and new features in order to meet the requirements of dynamic execution environments. Some of them include asynchronous non-blocking communications, collective I/O routines, and the dynamic process management interface introduced in MPI 2.0. The dynamic process management interface allows the application to spawn new processes at runtime and to communicate with them. However, this feature has some technical limitations that still make the implementation of malleable MPI applications a challenge.
This thesis proposes FLEX-MPI, a runtime system that extends the functionalities of the MPI standard library and features optimization techniques for adaptability of MPI applications to dynamic execution environments. These techniques can significantly improve the performance and scalability of scientific applications and the overall efficiency of the HPC system on which they run. Specifically, FLEX-MPI focuses on dynamic load balancing and performance-aware malleability for parallel applications. The main goal of the design and implementation of the adaptability techniques is to efficiently execute MPI applications on a wide range of HPC platforms ranging from small to large-scale systems.
Dynamic load balancing allows FLEX-MPI to adapt the workload assignments at runtime to the performance of the computing elements that execute the parallel application. On the other hand, performance-aware malleability leverages the dynamic process management interface of MPI to change the number of processes of the application at runtime. This feature makes it possible to improve the performance of applications that exhibit irregular computation patterns and execute on computing systems with dynamic availability of resources. One of the main features of these techniques is that they require neither user intervention nor prior knowledge of the underlying hardware.
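The dynamic load balancing idea above can be sketched as follows. This is a minimal illustration of the general technique, not FLEX-MPI's actual algorithm: the function name and the proportional-sharing rule are assumptions, showing how the rows of an iterative SPMD computation could be redistributed according to each process's measured throughput.

```python
def balance_workload(total_rows, throughputs):
    """Split total_rows among processes in proportion to each
    process's measured throughput (rows/second), so that faster
    processes receive more work.  Leftover rows from rounding go
    to the processes with the largest fractional shares."""
    if total_rows < 0 or not throughputs or min(throughputs) <= 0:
        raise ValueError("invalid input")
    total_tp = sum(throughputs)
    shares = [total_rows * tp / total_tp for tp in throughputs]
    counts = [int(s) for s in shares]
    # Hand out the remainder by largest fractional part first.
    leftover = total_rows - sum(counts)
    order = sorted(range(len(shares)),
                   key=lambda i: shares[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts
```

In a real runtime the throughputs would be sampled during execution, and the new partition would trigger a redistribution of data among processes before the next iteration.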
We have validated and evaluated the performance of the adaptability techniques with three parallel MPI benchmarks and different execution environments with homogeneous and heterogeneous cluster configurations. The results show that FLEX-MPI significantly improves the performance of applications when running with the support of dynamic load balancing and malleability, along with a substantial enhancement of their scalability and an improvement of the overall system efficiency.
Programa Oficial de Doctorado en Ciencia y Tecnología Informática. Thesis committee: Francisco Fernández Rivera (chair); Florín Daniel Isaila (secretary); María Santos Pérez Hernández (member).