541 research outputs found

    Designing and Handling Failure issues in a Structured Overlay Network Based Grid

    Get PDF
    Grid computing is the computing paradigm that is concerned with coordinated resource sharing and problem solving in dynamic, autonomous multi-institutional virtual organizations. Data exchange and service allocation between virtual organizations are challenging problems in the field of Grid computing, due to the decentralization of Grid systems. The resource management in a Grid system ensures efficiency and usability. The required efficiency and usability of Grid systems can be achieved by building a decentralized multi-virtual Grid system. In this thesis we present a decentralized multi-virtual resource management framework in which the system is divided into virtual organizations, each controlled by a broker. An overlay network of brokers is responsible for global resource management and managing the allocation of services. We address two main issues for both local and global resource management: 1) decentralized allocation of tasks to suitable nodes to achieve both local and global load balancing; and 2) handling of both regular and broker failures. Experimental results verify that the system achieves dependable performance with various loads of services and broker failures

    Meta-scheduling Issues in Interoperable HPCs, Grids and Clouds

    Get PDF
    Over the last years, interoperability among resources has been emerged as one of the most challenging research topics. However, the commonality of the complexity of the architectures (e.g., heterogeneity) and the targets that each computational paradigm including HPC, grids and clouds aims to achieve (e.g., flexibility) remain the same. This is to efficiently orchestrate resources in a distributed computing fashion by bridging the gap among local and remote participants. Initially, this is closely related with the scheduling concept which is one of the most important issues for designing a cooperative resource management system, especially in large scale settings such as in grids and clouds. Within this context, meta-scheduling offers additional functionalities in the area of interoperable resource management, this is because of its great agility to handle sudden variations and dynamic situations in user demands. Accordingly, the case of inter-infrastructures, including InterCloud, entitle that the decentralised meta-scheduling scheme overcome issues like consolidated administration management, bottleneck and local information exposition. In this work, we detail the fundamental issues for developing an effective interoperable meta-scheduler for e-infrastructures in general and InterCloud in particular. Finally, we describe a simulation and experimental configuration based on real grid workload traces to demonstrate the interoperable setting as well as provide experimental results as part of a strategic plan for integrating future meta-schedulers

    Resource discovery for distributed computing systems: A comprehensive survey

    Get PDF
    Large-scale distributed computing environments provide a vast amount of heterogeneous computing resources from different sources for resource sharing and distributed computing. Discovering appropriate resources in such environments is a challenge which involves several different subjects. In this paper, we provide an investigation on the current state of resource discovery protocols, mechanisms, and platforms for large-scale distributed environments, focusing on the design aspects. We classify all related aspects, general steps, and requirements to construct a novel resource discovery solution in three categories consisting of structures, methods, and issues. Accordingly, we review the literature, analyzing various aspects for each category

    Task-based Runtime Optimizations Towards High Performance Computing Applications

    Get PDF
    The last decades have witnessed a rapid improvement of computational capabilities in high-performance computing (HPC) platforms thanks to hardware technology scaling. HPC architectures benefit from mainstream advances on the hardware with many-core systems, deep hierarchical memory subsystem, non-uniform memory access, and an ever-increasing gap between computational power and memory bandwidth. This has necessitated continuous adaptations across the software stack to maintain high hardware utilization. In this HPC landscape of potentially million-way parallelism, task-based programming models associated with dynamic runtime systems are becoming more popular, which fosters developers’ productivity at extreme scale by abstracting the underlying hardware complexity. In this context, this dissertation highlights how a software bundle powered by a task-based programming model can address the heterogeneous workloads engendered by HPC applications., i.e., data redistribution, geospatial modeling and 3D unstructured mesh deformation here. Data redistribution aims to reshuffle data to optimize some objective for an algorithm, whose objective can be multi-dimensional, such as improving computational load balance or decreasing communication volume or cost, with the ultimate goal of increasing the efficiency and therefore reducing the time-to-solution for the algorithm. Geostatistical modeling, one of the prime motivating applications for exascale computing, is a technique for predicting desired quantities from geographically distributed data, based on statistical models and optimization of parameters. Meshing the deformable contour of moving 3D bodies is an expensive operation that can cause huge computational challenges in fluid-structure interaction (FSI) applications. Therefore, in this dissertation, Redistribute-PaRSEC, ExaGeoStat-PaRSEC and HiCMA-PaRSEC are proposed to efficiently tackle these HPC applications respectively at extreme scale, and they are evaluated on multiple HPC clusters, including AMD-based, Intel-based, Arm-based CPU systems and IBM-based multi-GPU system. This multidisciplinary work emphasizes the need for runtime systems to go beyond their primary responsibility of task scheduling on massively parallel hardware system for servicing the next-generation scientific applications
    corecore