57 research outputs found

    Un environnement pour le calcul intensif pair à pair

    The peer-to-peer (P2P) concept has seen major developments in recent years in file sharing, video streaming, and distributed databases. Advances in parallelism within microprocessor architectures and in high-speed networks make it possible to consider new applications such as distributed high performance computing. However, implementing this type of application on P2P networks raises numerous challenges, including machine heterogeneity, scalability, and robustness. In addition, existing transport protocols such as TCP and UDP are not well suited to this type of application. This thesis presents a decentralized and robust environment for implementing high performance computing applications on peer-to-peer networks. We are interested in applications in numerical simulation and optimization that rely on task-parallel models and are solved via parallel or distributed iterative algorithms. Unlike existing solutions, our environment allows frequent direct communications between peers. The environment is built on a self-adaptive communication protocol that can reconfigure itself dynamically, choosing the most appropriate communication mode between peers according to algorithmic choices made at the application level or elements of context, such as topology, at the network level. We present and analyze experimental results obtained on several testbeds, including GRID'5000 and PlanetLab, for the obstacle problem and nonlinear network flow problems.

    Enhancing reliability with Latin Square redundancy on desktop grids.

    Computational grids are some of the largest computer systems in existence today. Unfortunately they are also, in many cases, the least reliable. This research examines the use of redundancy with permutation as a method of improving reliability in computational grid applications. Three primary avenues are explored: development of a new redundancy model, the Replication and Permutation Paradigm (RPP) for computational grids; development of grid simulation software for testing RPP against other redundancy methods; and, finally, running a program on a live grid using RPP. An important part of RPP involves distributing data and tasks across the grid in Latin Square fashion. Two theorems and subsequent proofs regarding Latin Squares are developed. The theorems describe the changing position of symbols between the rows of a standard Latin Square. When a symbol is missing because a column is removed, the theorems provide a basis for determining the next row and column where the missing symbol can be found. Interesting in their own right, the theorems have implications for redundancy. In terms of the redundancy model, the theorems allow one to state the maximum makespan in the face of missing computational hosts when using Latin Square redundancy. The simulator software was developed and used to compare different data and task distribution schemes on a simulated grid. The software clearly showed the advantage of running RPP, which resulted in faster completion times in the face of computational host failures. The Latin Square method also degrades gracefully: jobs still complete under massive node failure, at the cost of increased makespan. Finally, an Inductive Logic Program (ILP) for pharmacophore search was executed, using a Latin Square redundancy methodology, on a Condor grid in the Dahlem Lab at the University of Louisville Speed School of Engineering. All jobs completed, even in the face of large numbers of randomly generated computational host failures.
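
    The Latin Square distribution described in this abstract can be illustrated with a minimal sketch. It assumes a cyclic Latin square in which, in round r, host c runs task (r + c) mod n; the function names and the round-by-round scheduling model are illustrative assumptions, not the RPP implementation from the thesis.

```python
def cyclic_latin_square(n):
    """Return an n x n cyclic Latin square: entry [r][c] = (r + c) % n."""
    return [[(r + c) % n for c in range(n)] for r in range(n)]


def rounds_to_complete(n, failed_hosts):
    """Count the rounds (rows) needed until every task has run on a live host.

    Assumed model: in round r, host c runs task square[r][c], and a
    failed host (a removed column) never produces a result.
    Returns None if no host is alive.
    """
    square = cyclic_latin_square(n)
    live = [c for c in range(n) if c not in failed_hosts]
    if not live:
        return None
    done = set()
    for r, row in enumerate(square):
        # Collect the tasks finished by the surviving hosts in this round.
        done.update(row[c] for c in live)
        if len(done) == n:
            return r + 1
    return None  # not reached while at least one host is alive


if __name__ == "__main__":
    print(rounds_to_complete(4, set()))      # no failures
    print(rounds_to_complete(4, {1}))        # one failed host
    print(rounds_to_complete(4, {0, 1, 2}))  # a single surviving host
```

    Because every column of a Latin square contains every symbol, a single surviving host still finishes all n tasks within n rounds in this model, which matches the abstract's point that jobs complete under heavy failure at the cost of a longer makespan.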

    The Architecture of the XtreemOS Grid Checkpointing Service

    The EU-funded XtreemOS project implements a grid operating system (OS) transparently exploiting distributed resources through the SAGA and POSIX interfaces. XtreemOS uses an integrated grid checkpointing service (XtreemGCP) for implementing migration and fault tolerance. Checkpointing and restarting applications in a grid requires saving and restoring applications in a distributed heterogeneous environment. The latter may span millions of grid nodes using different system-specific checkpointers, each saving and restoring application and kernel data structures on a grid node. In this paper we present the architecture of the XtreemGCP service, which integrates existing checkpointing solutions. Our architecture is open to support different checkpointing strategies that can be adapted according to evolving failure situations or changing application requirements. We propose to bridge the gap between grid semantics and system-specific checkpointers by introducing a common kernel checkpointer API that allows using different checkpointers in a uniform way. Furthermore, we discuss other grid-related checkpointing issues including resource conflicts during restart, security, and checkpoint file management. Although this paper presents a solution within the XtreemOS context, it can also be applied to any other grid middleware or distributed OS.

    Decentralized Orchestration of Open Services- Achieving High Scalability and Reliability with Continuation-Passing Messaging

    The papers of this thesis are not available in Munin. Paper I: Yu, W., Haque, A. A. M.: “Decentralised web-services orchestration with continuation-passing messaging”. Available in International Journal of Web and Grid Services 2011, 7(3):304–330. Paper II: Haque, A. A. M., Yu, W.: “Peer-to-peer orchestration of web mashups”. Available in International Journal of Adaptive, Resilient and Autonomic Systems 2014, 5(3):40-60. Paper V: Haque, A. A. M., Yu, W.: “Decentralized and reliable orchestration of open services”. In: Service Computation 2014. International Academy, Research and Industry Association (IARIA) 2014 ISBN 978-1-61208-337-7. An ever-increasing number of web applications are providing open services to a wide range of applications. Whilst traditional centralized approaches to services orchestration are successful for enterprise service-oriented systems, they are subject to serious limitations for orchestrating the wider range of open services. Dealing with these limitations calls for decentralized approaches. However, decentralized approaches are themselves faced with a number of challenges, including the possible loss of dynamic run-time states that are spread over the distributed environment. This thesis presents a fully decentralized approach to orchestration of open services. Our flow-aware dynamic replication scheme supports exception handling, tolerates failures of orchestration agents, and recovers from failure situations. During execution, open services are conducted by a network of orchestration agents which collectively orchestrate open services using continuation-passing messaging. Our performance study showed that decentralized orchestration improves the scalability and enhances the reliability of open services. Our orchestration approach has a clear performance advantage over traditional centralized orchestration as well as over the current practice of web mashups, where application servers themselves conduct the execution of the composition of open web services. Finally, our empirical study presents the overhead of the replication approach for services orchestration.

    DECENTRALIZED RESOURCE ORCHESTRATION FOR HETEROGENEOUS GRIDS

    Modern desktop machines now use multi-core CPUs to enable improved performance. However, achieving high performance on multi-core machines without optimized software support is still difficult even in a single machine, because contention for shared resources can make it hard to exploit multiple computing resources efficiently. Moreover, more diverse and heterogeneous hardware platforms (e.g. general-purpose GPU and Cell processors) have emerged and begun to impact grid computing. Given that heterogeneity and diversity are now a major trend going forward, grid computing must support these environmental changes. In this dissertation, I design and evaluate a decentralized resource management scheme to exploit heterogeneous multiple computing resources effectively. I suggest resource management algorithms that can efficiently utilize a diverse computational environment, including multiple symmetric computing entities and heterogeneous multi-computing entities, and achieve good load-balancing and high total system throughput. Moreover, I propose expressive resource description techniques to accommodate more heterogeneous environments, allowing incoming jobs with complex requirements to be matched to available resources. First, I develop decentralized resource management frameworks and job scheduling schemes to exploit multi-core nodes in peer-to-peer grids. I present two new load-balancing schemes that explicitly account for resource sharing and contention across multiple cores within a single machine, and propose a simple performance prediction model that can represent a continuum of resource sharing among cores of a CPU. Second, I provide scalable resource discovery and load balancing techniques to accommodate nodes with many types of computing elements, such as multi-core CPUs and GPUs, in a peer-to-peer grid architecture. 
My scheme takes into account diverse aspects of heterogeneous nodes to maximize overall system throughput as well as minimize messaging costs without sacrificing the failure resilience provided by an underlying peer-to-peer overlay network. Finally, I propose an expressive resource discovery method to support multi-attribute, range-based job constraints. The common approach of using simple attribute indexes does not suffice, as range-based constraints may be satisfied by more than a single value. I design a compact ID-based representation for resource characteristics, and integrate this representation into the decentralized resource discovery framework. By extensive experimental results via simulation, I show that my schemes can match heterogeneous jobs to heterogeneous resources both effectively (good matches are found, load is balanced), and efficiently (the new functionality imposes little overhead)
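
    The multi-attribute, range-based matching mentioned in this abstract can be sketched as follows. This is a centralized toy under stated assumptions: resources are plain dictionaries, constraints are inclusive (lo, hi) ranges, and ties are broken by a hypothetical "load" field. It does not reproduce the dissertation's compact ID-based representation or its decentralized discovery framework.

```python
def satisfies(resource, constraints):
    """True if every constrained attribute of the resource lies in [lo, hi]."""
    return all(
        attr in resource and lo <= resource[attr] <= hi
        for attr, (lo, hi) in constraints.items()
    )


def match_job(resources, constraints):
    """Return the name of the least-loaded resource meeting all constraints.

    Returns None when no resource qualifies. The "load" key is an
    assumed attribute used only for load balancing in this sketch.
    """
    candidates = [
        (res.get("load", 0.0), name)
        for name, res in resources.items()
        if satisfies(res, constraints)
    ]
    return min(candidates)[1] if candidates else None


if __name__ == "__main__":
    # Hypothetical nodes and attributes, for illustration only.
    nodes = {
        "node-a": {"cores": 4, "mem_gb": 8, "gpus": 0, "load": 0.2},
        "node-b": {"cores": 8, "mem_gb": 32, "gpus": 1, "load": 0.7},
        "node-c": {"cores": 16, "mem_gb": 64, "gpus": 2, "load": 0.4},
    }
    job = {"cores": (8, 64), "gpus": (1, 4)}
    print(match_job(nodes, job))
```

    A range such as cores in [8, 64] is met by several distinct values, which is why an exact-value attribute index alone does not answer the query.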

    Context Aware Service Oriented Computing in Mobile Ad Hoc Networks

    These days we witness a major shift towards small, mobile devices, capable of wireless communication. Their communication capabilities enable them to form mobile ad hoc networks and share resources and capabilities. Service Oriented Computing (SOC) is a new emerging paradigm for distributed computing that has evolved from object-oriented and component-oriented computing to enable applications distributed within and across organizational boundaries. Services are autonomous computational elements that can be described, published, discovered, and orchestrated for the purpose of developing applications. The application of the SOC model to mobile devices provides a loosely coupled model for distributed processing in a resource-poor and highly dynamic environment. Cooperation in a mobile ad hoc environment depends on the fundamental capability of hosts to communicate with each other. Peer-to-peer interactions among hosts within communication range allow such interactions but limit the scope of interactions to a local region. Routing algorithms for mobile ad hoc networks extend the scope of interactions to cover all hosts transitively connected over multi-hop routes. Additional contextual information, e.g., knowledge about the movement of hosts in physical space, can help extend the boundaries of interactions beyond the limits of an island of connectivity. To help separate concerns specific to different layers, a coordination model between the routing layer and the SOC layer provides abstractions that mask the details characteristic to the network layer from the distributed computing semantics above. This thesis explores some of the opportunities and challenges raised by applying the SOC paradigm to mobile computing in ad hoc networks. It investigates the implications of disconnections on service advertising and discovery mechanisms. It addresses issues related to code migration in addition to physical host movement. 
It also investigates some of the security concerns in ad hoc networking service provision. It presents a novel routing algorithm for mobile ad hoc networks and a novel coordination model that addresses space and time explicitly

    Trusted community : a novel multiagent organisation for open distributed systems

    [no abstract]

    Design and optimization of optical grids and clouds
