230 research outputs found

    The role of concurrency in an evolutionary view of programming abstractions

    In this paper we examine how concurrency has been embodied in mainstream programming languages. In particular, we borrow the language of evolutionary biology to discuss major historical landmarks and crucial concepts that shaped the development of programming languages. We examine the general development process, occasionally delving into individual languages, trying to uncover evolutionary lineages related to specific programming traits. We focus mainly on concurrency, discussing the different abstraction levels involved in present-day concurrent programming and emphasizing that they correspond to different levels of explanation. We then comment on the role of theoretical research in the quest for suitable programming abstractions, recalling the importance of periodically changing the working framework and the way of looking at problems. This paper is not meant to be a survey of modern mainstream programming languages: it would be very incomplete in that sense. It aims instead to make a number of observations and connect them under an evolutionary perspective, in order to arrive at a unifying, but not simplistic, view of how programming languages develop.
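    The point about abstraction levels can be made concrete with a small sketch. The C++ below (the values and lambdas are purely illustrative) contrasts two of the levels the paper alludes to: explicit threads with shared state and manual locking, versus futures that hide thread management entirely.

    ```cpp
    // Two abstraction levels for the same computation: explicit threads
    // with shared state (low level) versus futures (high level).
    // Build with: g++ -std=c++17 -pthread example.cpp
    #include <future>
    #include <iostream>
    #include <mutex>
    #include <thread>

    int main() {
        // Low level: explicit thread, shared variable, manual locking.
        int shared_sum = 0;
        std::mutex m;
        std::thread t([&] {
            std::lock_guard<std::mutex> lock(m);
            shared_sum += 21;
        });
        t.join();

        // High level: a future hides thread creation and synchronization.
        std::future<int> f = std::async(std::launch::async, [] { return 21; });

        std::cout << shared_sum + f.get() << '\n';  // prints 42
        return 0;
    }
    ```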

    Performance evaluation of an open distributed platform for realistic traffic generation

    Network researchers have dedicated a notable part of their efforts to modeling traffic and to implementing efficient traffic generators. There is a strong demand for traffic generators capable of reproducing realistic traffic patterns according to theoretical models while sustaining high performance. This work presents an open distributed platform for traffic generation that we call the Distributed Internet Traffic Generator (D-ITG), capable of producing traffic at the packet level (network, transport, and application layer) and of accurately replicating appropriate stochastic processes for both the inter-departure time (IDT) and packet size (PS) random variables. We implemented two distributed versions of the generator. In the first, a log server records the information transmitted by senders and receivers, with communication based on either TCP or UDP. In the second, senders and receivers use the MPI library. We present a complete performance comparison of the centralized version and the two distributed versions of D-ITG.
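    As an illustration of the packet-level generation the abstract describes, the sketch below draws inter-departure times and packet sizes from random variables and sends UDP datagrams accordingly. This is not D-ITG code: the exponential and uniform distributions, the target address and port, and the packet count are all assumptions made for the example.

    ```cpp
    // Minimal sketch of packet-level traffic generation: inter-departure
    // times (IDT) drawn from an exponential distribution, packet sizes (PS)
    // from a uniform one. POSIX sockets; the receiver endpoint is hypothetical.
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <unistd.h>
    #include <chrono>
    #include <random>
    #include <thread>
    #include <vector>

    int main() {
        int sock = socket(AF_INET, SOCK_DGRAM, 0);
        sockaddr_in dst{};
        dst.sin_family = AF_INET;
        dst.sin_port = htons(9000);                  // hypothetical receiver port
        inet_pton(AF_INET, "127.0.0.1", &dst.sin_addr);

        std::mt19937 rng(std::random_device{}());
        std::exponential_distribution<double> idt(100.0);  // mean 10 ms between packets
        std::uniform_int_distribution<int> ps(64, 1400);   // payload size in bytes

        for (int i = 0; i < 1000; ++i) {
            std::vector<char> payload(ps(rng), 0);         // PS sample
            sendto(sock, payload.data(), payload.size(), 0,
                   reinterpret_cast<sockaddr*>(&dst), sizeof(dst));
            // IDT sample, interpreted as seconds.
            std::this_thread::sleep_for(std::chrono::duration<double>(idt(rng)));
        }
        close(sock);
        return 0;
    }
    ```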

    Resilient gossip-inspired all-reduce algorithms for high-performance computing - Potential, limitations, and open questions

    We investigate the usefulness of gossip-based reduction algorithms in a high-performance computing (HPC) context. We compare them to state-of-the-art deterministic parallel reduction algorithms in terms of fault tolerance and resilience against silent data corruption (SDC), as well as in terms of performance and scalability. New gossip-based reduction algorithms are proposed which significantly improve the state of the art in resilience against SDC. Moreover, a new gossip-inspired reduction algorithm is proposed which promises much more competitive runtime performance in an HPC context than classical gossip-based algorithms, in particular for low accuracy requirements. This work has been partially funded by the Spanish Ministry of Science and Innovation [contract TIN2015-65316]; by the Government of Catalonia [contracts 2014-SGR-1051, 2014-SGR-1272]; by the RoMoL ERC Advanced Grant [grant number GA 321253]; and by the Vienna Science and Technology Fund (WWTF) through project ICT15-113.
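    A common starting point for gossip-based reductions of this kind is the push-sum protocol, sketched below as a single-process simulation (the node count, round count, and initial values are arbitrary). Each node repeatedly halves its (sum, weight) pair and pushes one half to a random peer; every local ratio sum/weight converges to the global average without any deterministic reduction tree, which is what gives gossip its resilience.

    ```cpp
    // Single-process simulation of the push-sum gossip protocol.
    // Mass conservation: each round redistributes (sum, weight) pairs,
    // so the global totals are preserved and sum[i]/weight[i] -> mean.
    #include <iostream>
    #include <random>
    #include <vector>

    int main() {
        const int n = 8, rounds = 30;
        std::vector<double> sum(n), weight(n, 1.0);
        for (int i = 0; i < n; ++i) sum[i] = i + 1;   // values 1..8, true mean 4.5

        std::mt19937 rng(42);
        std::uniform_int_distribution<int> peer(0, n - 1);

        for (int r = 0; r < rounds; ++r) {
            std::vector<double> new_sum(n, 0.0), new_weight(n, 0.0);
            for (int i = 0; i < n; ++i) {
                int j = peer(rng);                     // random gossip target
                new_sum[i] += sum[i] / 2;  new_weight[i] += weight[i] / 2;
                new_sum[j] += sum[i] / 2;  new_weight[j] += weight[i] / 2;
            }
            sum = new_sum; weight = new_weight;
        }
        for (int i = 0; i < n; ++i)
            std::cout << "node " << i << " estimate " << sum[i] / weight[i] << '\n';
        return 0;
    }
    ```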

    High-Performance Message Passing over generic Ethernet Hardware with Open-MX

    In the last decade, cluster computing has become the most popular high-performance computing architecture. Although numerous technological innovations have been proposed to improve the interconnection of nodes, many clusters still rely on commodity Ethernet hardware to implement message passing within parallel applications. We present Open-MX, an open-source message passing stack over generic Ethernet. It offers the same abilities as the specialized Myrinet Express stack, without requiring dedicated support from the networking hardware. Open-MX works transparently with the most popular MPI implementations through its MX interface compatibility. It also enables interoperability between hosts running the specialized MX stack and generic Ethernet hosts. We detail how Open-MX copes with the inherent limitations of Ethernet hardware to satisfy the requirements of message passing by applying an innovative copy offload model. Combined with careful tuning of the fabric and of the MX wire protocol, Open-MX achieves better performance than TCP implementations, especially on 10 Gbit/s hardware.
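    Because Open-MX slots in underneath existing MPI implementations, applications remain ordinary MPI code. A minimal ping-pong like the sketch below is the usual way to measure the message-passing latency such a stack is evaluated on; the message size and iteration count here are arbitrary choices, not values from the paper.

    ```cpp
    // Minimal MPI ping-pong latency microbenchmark.
    // Build with mpicxx; run with two ranks: mpirun -n 2 ./pingpong
    #include <mpi.h>
    #include <cstdio>

    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        char buf[64] = {0};
        const int iters = 1000;
        double t0 = MPI_Wtime();
        for (int i = 0; i < iters; ++i) {
            if (rank == 0) {
                MPI_Send(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        if (rank == 0)
            std::printf("round-trip latency: %.1f us\n",
                        (MPI_Wtime() - t0) / iters * 1e6);
        MPI_Finalize();
        return 0;
    }
    ```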

    Location Management in a Mobile Object Runtime Environment


    Methodology for malleable applications on distributed memory systems

    The dominant programming approach for scientific and industrial computing on clusters is MPI+X. While there are a variety of approaches within the node, denoted by the "X", the Message Passing Interface (MPI) is the standard for programming multiple nodes with distributed memory. This thesis argues that the OmpSs-2 tasking model can be extended beyond the node to naturally support distributed memory, with three benefits. First, at small to medium scale the tasking model is a simpler and more productive alternative to MPI: it eliminates the need to distribute the data explicitly and convert all dependencies into explicit message passing, and it avoids the complexity of hybrid programming with MPI+X. Second, the ability to offload parts of the computation among the nodes enables the runtime to automatically balance the loads in a full-scale MPI+X program. This approach does not require a cost model, and it can transparently balance the computational loads across the whole program, on all its nodes. Third, because the runtime handles all low-level aspects of data distribution and communication, it can change the resource allocation dynamically, in a way that is transparent to the application.

    This thesis describes the design, development, and evaluation of OmpSs-2@Cluster, a programming model and runtime system that extends the OmpSs-2 model to allow a virtually unmodified OmpSs-2 program to run across multiple distributed-memory nodes. For well-balanced applications it provides performance similar to MPI+OpenMP on up to 16 nodes, and it improves performance by up to 2x for irregular and unbalanced applications like Cholesky factorization.

    This work also extended OmpSs-2@Cluster for interoperability with MPI and the Barcelona Supercomputing Center (BSC)'s state-of-the-art Dynamic Load Balance (DLB) library, in order to dynamically balance MPI+OmpSs-2 applications by transparently offloading tasks among nodes. This approach reduces the execution time of a microscale solid mechanics application by 46% on 64 nodes, and on a synthetic benchmark it is within 10% of perfect load balancing on up to 8 nodes.

    Finally, the runtime was extended to transparently support malleability for pure OmpSs-2@Cluster programs and to interoperate with the Resource Management System (RMS). The only change to the application is an explicit call to an API function to control the addition or removal of nodes. We additionally provide the runtime with the ability to semi-transparently save and recover part of the application's state to perform checkpoint and restart. This feature hides the complexity of data redistribution and parallel I/O from the user while allowing the program to recover and continue previous executions. Our work is a starting point for future research on fault tolerance.

    In summary, OmpSs-2@Cluster expands the OmpSs-2 programming model to encompass distributed-memory clusters. It allows an existing OmpSs-2 program, with few if any changes, to run across multiple nodes. OmpSs-2@Cluster supports transparent multi-node dynamic load balancing for MPI+OmpSs-2 programs and enables semi-transparent malleability for OmpSs-2@Cluster programs. The runtime system has a high level of stability and performance, and it opens several avenues for future work.
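    To give a flavor of the programming model, the sketch below uses OmpSs-2-style task annotations, in which tasks declare their data accesses and the runtime derives the dependency graph; under OmpSs-2@Cluster the same annotations are what let the runtime offload tasks to other nodes and move data transparently. The array sizes and the computation are illustrative only, and the code assumes the OmpSs-2 toolchain (without it, the pragmas are ignored and the program runs sequentially).

    ```cpp
    // Sketch of OmpSs-2 tasking: each task declares what it reads (in),
    // writes (out), or updates (inout) using array sections [start;length],
    // and the runtime orders tasks through those declared accesses.
    #include <cstdio>

    const int N = 4, BS = 256;
    double block[N][BS];

    int main() {
        for (int i = 0; i < N; ++i) {
            // Each task produces one block.
            #pragma oss task out(block[i][0;BS])
            for (int j = 0; j < BS; ++j) block[i][j] = i;
        }
        for (int i = 1; i < N; ++i) {
            // Reads the previous block, updates the current one; the
            // declared accesses serialize these tasks into a chain.
            #pragma oss task in(block[i-1][0;BS]) inout(block[i][0;BS])
            for (int j = 0; j < BS; ++j) block[i][j] += block[i - 1][j];
        }
        #pragma oss taskwait
        std::printf("block[%d][0] = %f\n", N - 1, block[N - 1][0]);
        return 0;
    }
    ```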

    C6 – Middleware
