23 research outputs found

    DIET : new developments and recent results

    Get PDF
    Among existing grid middleware approaches, one simple, powerful, and flexibleapproach consists of using servers available in different administrative domainsthrough the classic client-server or Remote Procedure Call (RPC) paradigm.Network Enabled Servers (NES) implement this model also called GridRPC.Clients submit computation requests to a scheduler whose goal is to find aserver available on the grid. The aim of this paper is to give an overview of anNES middleware developed in the GRAAL team called DIET and to describerecent developments. DIET (Distributed Interactive Engineering Toolbox) is ahierarchical set of components used for the development of applications basedon computational servers on the grid.Parmi les intergiciels de grilles existants, une approche simple, flexible et performante consiste a utiliser des serveurs disponibles dans des domaines administratifs différents à travers le paradigme classique de l’appel de procédure àdistance (RPC). Les environnements de ce type, connus sous le terme de Network Enabled Servers, implémentent ce modèle appelé GridRPC. Des clientssoumettent des requêtes de calcul à un ordonnanceur dont le but consiste àtrouver un serveur disponible sur la grille.Le but de cet article est de donner un tour d’horizon d’un intergiciel développédans le projet GRAAL appelé DIET 1. DIET (Distributed Interactive Engineering Toolbox) est un ensemble hiérarchique de composants utilisés pour ledéveloppement d’applications basées sur des serveurs de calcul sur la grille

    Use of A Network Enabled Server System for a Sparse Linear Algebra Grid Application

    Get PDF
    Solving systems of linear equations is one of the key operations in linear algebra. Many different algorithms are available in that purpose. These algorithms require a very accurate tuning to minimise runtime and memory consumption. The TLSE project provides, on one hand, a scenario-driven expert site to help users choose the right algorithm according to their problem and tune accurately this algorithm, and, on the other hand, a test-bed for experts in order to compare algorithms and define scenarios for the expert site. Both features require to run the available solvers a large number of times with many different values for the control parameters (and maybe with many different architectures). Currently, only the grid can provide enough computing power for this kind of application. The DIET middleware is the GRID backbone for TLSE. It manages the solver services and their scheduling in a scalable way.La résolution de systèmes linéaires creux est une opération clé en algèbre linéaire. Beaucoup d’algorithmes sont utilisés pour cela, qui dépendent de nombreux paramètres, afin d’offrir une robustesse, une performance et une consommation mémoire optimales. Le projet GRID-TLSE fournit d’une part, un site d’expertise basé sur l’utilisation de scénarios pour aider les utilisateurs à choisir l’algorithme qui convient le mieux à leur problème ainsi que les paramètres associés; et d’autre part, un environnement pour les experts du domaine leur permettant de comparer efficacement des algorithmes et de définir dynamiquement de nouveaux scénarios d’utilisation. Ces fonctionnalités nécessitent de pouvoir exécuter les logiciels de résolution disponibles un grand nombre de fois,avec beaucoup de valeurs différentes des paramètres de contrôle (et éventuellement sur plusieurs architectures de machines). Actuellement, seule la grille peut fournir la puissance de calcul pour ce type d’applications. L’intergiciel DIETest utilisé pour gérer la grille, les différents services, et leur ordonnancement efficace

    funcX: A Federated Function Serving Fabric for Science

    Full text link
    Exploding data volumes and velocities, new computational methods and platforms, and ubiquitous connectivity demand new approaches to computation in the sciences. These new approaches must enable computation to be mobile, so that, for example, it can occur near data, be triggered by events (e.g., arrival of new data), be offloaded to specialized accelerators, or run remotely where resources are available. They also require new design approaches in which monolithic applications can be decomposed into smaller components, that may in turn be executed separately and on the most suitable resources. To address these needs we present funcX---a distributed function as a service (FaaS) platform that enables flexible, scalable, and high performance remote function execution. funcX's endpoint software can transform existing clouds, clusters, and supercomputers into function serving systems, while funcX's cloud-hosted service provides transparent, secure, and reliable function execution across a federated ecosystem of endpoints. We motivate the need for funcX with several scientific case studies, present our prototype design and implementation, show optimizations that deliver throughput in excess of 1 million functions per second, and demonstrate, via experiments on two supercomputers, that funcX can scale to more than more than 130000 concurrent workers.Comment: Accepted to ACM Symposium on High-Performance Parallel and Distributed Computing (HPDC 2020). arXiv admin note: substantial text overlap with arXiv:1908.0490

    Methods and design issues for next generation network-aware applications

    Get PDF
    Networks are becoming an essential component of modern cyberinfrastructure and this work describes methods of designing distributed applications for high-speed networks to improve application scalability, performance and capabilities. As the amount of data generated by scientific applications continues to grow, to be able to handle and process it, applications should be designed to use parallel, distributed resources and high-speed networks. For scalable application design developers should move away from the current component-based approach and implement instead an integrated, non-layered architecture where applications can use specialized low-level interfaces. The main focus of this research is on interactive, collaborative visualization of large datasets. This work describes how a visualization application can be improved through using distributed resources and high-speed network links to interactively visualize tens of gigabytes of data and handle terabyte datasets while maintaining high quality. The application supports interactive frame rates, high resolution, collaborative visualization and sustains remote I/O bandwidths of several Gbps (up to 30 times faster than local I/O). Motivated by the distributed visualization application, this work also researches remote data access systems. Because wide-area networks may have a high latency, the remote I/O system uses an architecture that effectively hides latency. Five remote data access architectures are analyzed and the results show that an architecture that combines bulk and pipeline processing is the best solution for high-throughput remote data access. The resulting system, also supporting high-speed transport protocols and configurable remote operations, is up to 400 times faster than a comparable existing remote data access system. Transport protocols are compared to understand which protocol can best utilize high-speed network connections, concluding that a rate-based protocol is the best solution, being 8 times faster than standard TCP. An HD-based remote teaching application experiment is conducted, illustrating the potential of network-aware applications in a production environment. Future research areas are presented, with emphasis on network-aware optimization, execution and deployment scenarios

    GRID superscalar: a programming model for the Grid

    Get PDF
    Durant els darrers anys el Grid ha sorgit com una nova plataforma per la computació distribuïda. La tecnologia Gris permet unir diferents recursos de diferents dominis administratius i formar un superordinador virtual amb tots ells. Molts grups de recerca han dedicat els seus esforços a desenvolupar un conjunt de serveis bàsics per oferir un middleware de Grid: una capa que permet l'ús del Grid. De tota manera, utilitzar aquests serveis no és una tasca fácil per molts usuaris finals, cosa que empitjora si l'expertesa d'aquests usuaris no està relacionada amb la informàtica.Això té una influència negativa a l'hora de que la comunitat científica adopti la tecnologia Grid. Es veu com una tecnologia potent però molt difícil de fer servir. Per facilitar l'ús del Grid és necessària una capa extra que amagui la complexitat d'aquest i permeti als usuaris programar o portar les seves aplicacions de manera senzilla.Existeixen moltes propostes d'eines de programació pel Grid. En aquesta tesi fem un resum d'algunes d'elles, i podem veure que existeixen eines conscients i no-conscients del Grid (es programen especificant o no els detalls del Grid, respectivament). A més, molt poques d'aquestes eines poden explotar el paral·lelisme implícit de l'aplicació, i en la majoria d'elles, l'usuari ha de definir aquest paral·lelisme de manera explícita. Una altra característica que considerem important és si es basen en llenguatges de programació molt populars (com C++ o Java), cosa que facilita l'adopció per part dels usuaris finals.En aquesta tesi, el nostre objectiu principal ha estat crear un model de programació pel Grid basat en la programació seqüencial i els llenguatges més coneguts de la programació imperativa, capaç d'explotar el paral·lelisme implícit de les aplicacions i d'accelerar-les fent servir els recursos del Grid de manera concurrent. A més, com el Grid és de naturalesa distribuïda, heterogènia i dinàmica i degut també a que el nombre de recursos que pot formar un Grid pot ser molt gran, la probabilitat de que es produeixi una errada durant l'execució d'una aplicació és elevada. Per tant, un altre dels nostres objectius ha estat tractar qualsevol tipus d'error que pugui sorgir durant l'execució d'una aplicació de manera automàtica (ja siguin errors relacionats amb l'aplicació o amb el Grid). GRID superscalar (GRIDSs), la principal contribució d'aquesta tesi, és un model de programació que assoleix elsobjectius mencionats proporcionant una interfície molt petita i simple i un entorn d'execució que és capaç d'executar en paral·lel el codi proporcionat fent servir el Grid. La nostra interfície de programació permet a un usuari programar una aplicació no-conscient del Grid, amb llenguatges imperatius coneguts i populars (com C/C++, Java, Perl o Shell script) i de manera seqüencial, per tant dóna un pas important per ajudar als usuaris a adoptar la tecnologia Grid.Hem aplicat el nostre coneixement de l'arquitectura de computadors i el disseny de microprocessadors a l'entorn d'execució de GRIDSs. Tal com es fa a un processador superescalar, l'entorn d'execució de GRIDSs és capaç de realitzar un anàlisi de dependències entre les tasques que formen l'aplicació, i d'aplicar tècniques de renombrament per incrementar el seu paral·lelisme. GRIDSs genera automàticament a partir del codi principal de l'usuari un graf que descriu les dependències de dades en l'aplicació. També presentem casos d'ús reals del model de programació en els camps de la química computacional i la bioinformàtica, que demostren que els nostres objectius han estat assolits.Finalment, hem estudiat l'aplicació de diferents tècniques per detectar i tractar fallades: checkpoint, reintent i replicació de tasques. La nostra proposta és proporcionar un entorn capaç de tractar qualsevol tipus d'errors, de manera transparent a l'usuari sempre que sigui possible. El principal avantatge d'implementar aquests mecanismos al nivell del model de programació és que el coneixement a nivell de l'aplicació pot ser explotat per crear dinàmicament una estratègia de tolerància a fallades per cada aplicació, i evitar introduir sobrecàrrega en entorns lliures d'errors.During last years, the Grid has emerged as a new platform for distributed computing. The Grid technology allows joining different resources from different administrative domains and forming a virtual supercomputer with all of them.Many research groups have dedicated their efforts to develop a set of basic services to offer a Grid middleware: a layer that enables the use of the Grid. Anyway, using these services is not an easy task for many end users, even more if their expertise is not related to computer science. This has a negative influence in the adoption of the Grid technology by the scientific community. They see it as a powerful technology but very difficult to exploit. In order to ease the way the Grid must be used, there is a need for an extra layer which hides all the complexity of the Grid, and allows users to program or port their applications in an easy way.There has been many proposals of programming tools for the Grid. In this thesis we give an overview on some of them, and we can see that there exist both Grid-aware and Grid-unaware environments (programmed with or without specifying details of the Grid respectively). Besides, very few existing tools can exploit the implicit parallelism of the application and in the majority of them, the user must define the parallelism explicitly. Another important feature we consider is if they are based in widely used programming languages (as C++ or Java), so the adoption is easier for end users.In this thesis, our main objective has been to create a programming model for the Grid based on sequential programming and well-known imperative programming languages, able to exploit the implicit parallelism of applications and to speed them up by using the Grid resources concurrently. Moreover, because the Grid has a distributed, heterogeneous and dynamic nature and also because the number of resources that form a Grid can be very big, the probability that an error arises during an application's execution is big. Thus, another of our objectives has been to automatically deal with any type of errors which may arise during the execution of the application (application related or Grid related).GRID superscalar (GRIDSs), the main contribution of this thesis, is a programming model that achieves these mentioned objectives by providing a very small and simple interface and a runtime that is able to execute in parallel the code provided using the Grid. Our programming interface allows a user to program a Grid-unaware application with already known and popular imperative languages (such as C/C++, Java, Perl or Shell script) and in a sequential fashion, therefore giving an important step to assist end users in the adoption of the Grid technology.We have applied our knowledge from computer architecture and microprocessor design to the GRIDSs runtime. As it is done in a superscalar processor, the GRIDSs runtime system is able to perform a data dependence analysis between the tasks that form an application, and to apply renaming techniques in order to increase its parallelism. GRIDSs generates automatically from user's main code a graph describing the data dependencies in the application.We present real use cases of the programming model in the fields of computational chemistry and bioinformatics, which demonstrate that our objectives have been achieved.Finally, we have studied the application of several fault detection and treatment techniques: checkpointing, task retry and task replication. Our proposal is to provide an environment able to deal with all types of failures, transparently for the user whenever possible. The main advantage in implementing these mechanisms at the programming model level is that application-level knowledge can be exploited in order to dynamically create a fault tolerance strategy for each application, and avoiding to introduce overhead in error-free environments

    Programming and parallelising applications for distributed infrastructures

    Get PDF
    The last decade has witnessed unprecedented changes in parallel and distributed infrastructures. Due to the diminished gains in processor performance from increasing clock frequency, manufacturers have moved from uniprocessor architectures to multicores; as a result, clusters of computers have incorporated such new CPU designs. Furthermore, the ever-growing need of scienti c applications for computing and storage capabilities has motivated the appearance of grids: geographically-distributed, multi-domain infrastructures based on sharing of resources to accomplish large and complex tasks. More recently, clouds have emerged by combining virtualisation technologies, service-orientation and business models to deliver IT resources on demand over the Internet. The size and complexity of these new infrastructures poses a challenge for programmers to exploit them. On the one hand, some of the di culties are inherent to concurrent and distributed programming themselves, e.g. dealing with thread creation and synchronisation, messaging, data partitioning and transfer, etc. On the other hand, other issues are related to the singularities of each scenario, like the heterogeneity of Grid middleware and resources or the risk of vendor lock-in when writing an application for a particular Cloud provider. In the face of such a challenge, programming productivity - understood as a tradeo between programmability and performance - has become crucial for software developers. There is a strong need for high-productivity programming models and languages, which should provide simple means for writing parallel and distributed applications that can run on current infrastructures without sacri cing performance. In that sense, this thesis contributes with Java StarSs, a programming model and runtime system for developing and parallelising Java applications on distributed infrastructures. The model has two key features: first, the user programs in a fully-sequential standard-Java fashion - no parallel construct, API call or pragma must be included in the application code; second, it is completely infrastructure-unaware, i.e. programs do not contain any details about deployment or resource management, so that the same application can run in di erent infrastructures with no changes. The only requirement for the user is to select the application tasks, which are the model's unit of parallelism. Tasks can be either regular Java methods or web service operations, and they can handle any data type supported by the Java language, namely les, objects, arrays and primitives. For the sake of simplicity of the model, Java StarSs shifts the burden of parallelisation from the programmer to the runtime system. The runtime is responsible from modifying the original application to make it create asynchronous tasks and synchronise data accesses from the main program. Moreover, the implicit inter-task concurrency is automatically found as the application executes, thanks to a data dependency detection mechanism that integrates all the Java data types. This thesis provides a fairly comprehensive evaluation of Java StarSs on three di erent distributed scenarios: Grid, Cluster and Cloud. For each of them, a runtime system was designed and implemented to exploit their particular characteristics as well as to address their issues, while keeping the infrastructure unawareness of the programming model. The evaluation compares Java StarSs against state-of-the-art solutions, both in terms of programmability and performance, and demonstrates how the model can bring remarkable productivity to programmers of parallel distributed applications

    Research and development of accounting system in grid environment

    Get PDF
    The Grid has been recognised as the next-generation distributed computing paradigm by seamlessly integrating heterogeneous resources across administrative domains as a single virtual system. There are an increasing number of scientific and business projects that employ Grid computing technologies for large-scale resource sharing and collaborations. Early adoptions of Grid computing technologies have custom middleware implemented to bridge gaps between heterogeneous computing backbones. These custom solutions form the basis to the emerging Open Grid Service Architecture (OGSA), which aims at addressing common concerns of Grid systems by defining a set of interoperable and reusable Grid services. One of common concerns as defined in OGSA is the Grid accounting service. The main objective of the Grid accounting service is to ensure resources to be shared within a Grid environment in an accountable manner by metering and logging accurate resource usage information. This thesis discusses the origins and fundamentals of Grid computing and accounting service in the context of OGSA profile. A prototype was developed and evaluated based on OGSA accounting-related standards enabling sharing accounting data in a multi-Grid environment, the World-wide Large Hadron Collider Grid (WLCG). Based on this prototype and lessons learned, a generic middleware solution was also implemented as a toolkit that eases migration of existing accounting system to be standard compatible.EThOS - Electronic Theses Online ServiceEngineering and Physical Sciences Research Council (EPSRC)Stanford UniversityGBUnited Kingdo

    On the construction of decentralised service-oriented orchestration systems

    Get PDF
    Modern science relies on workflow technology to capture, process, and analyse data obtained from scientific instruments. Scientific workflows are precise descriptions of experiments in which multiple computational tasks are coordinated based on the dataflows between them. Orchestrating scientific workflows presents a significant research challenge: they are typically executed in a manner such that all data pass through a centralised computer server known as the engine, which causes unnecessary network traffic that leads to a performance bottleneck. These workflows are commonly composed of services that perform computation over geographically distributed resources, and involve the management of dataflows between them. Centralised orchestration is clearly not a scalable approach for coordinating services dispersed across distant geographical locations. This thesis presents a scalable decentralised service-oriented orchestration system that relies on a high-level data coordination language for the specification and execution of workflows. This system’s architecture consists of distributed engines, each of which is responsible for executing part of the overall workflow. It exploits parallelism in the workflow by decomposing it into smaller sub-workflows, and determines the most appropriate engines to execute them using computation placement analysis. This permits the workflow logic to be distributed closer to the services providing the data for execution, which reduces the overall data transfer in the workflow and improves its execution time. This thesis provides an evaluation of the presented system which concludes that decentralised orchestration provides scalability benefits over centralised orchestration, and improves the overall performance of executing a service-oriented workflow

    GWpilot: Enabling multi-level scheduling in distributed infrastructures with GridWay and pilot jobs

    Get PDF
    Current systems based on pilot jobs are not exploiting all the scheduling advantages that the technique offers, or they lack compatibility or adaptability. To overcome the limitations or drawbacks in existing approaches, this study presents a different general-purpose pilot system, GWpilot. This system provides individual users or institutions with a more easy-to-use, easy-toinstall, scalable, extendable, flexible and adjustable framework to efficiently run legacy applications. The framework is based on the GridWay meta-scheduler and incorporates the powerful features of this system, such as standard interfaces, fair-share policies, ranking, migration, accounting and compatibility with diverse infrastructures. GWpilot goes beyond establishing simple network overlays to overcome the waiting times in remote queues or to improve the reliability in task production. It properly tackles the characterisation problem in current infrastructures, allowing users to arbitrarily incorporate customised monitoring of resources and their running applications into the system. This functionality allows the new framework to implement innovative scheduling algorithms that accomplish the computational needs of a wide range of calculations faster and more efficiently. The system can also be easily stacked under other software layers, such as self-schedulers. The advanced techniques included by default in the framework result in significant performance improvements even when very short tasks are scheduled
    corecore