
    Correlated Resource Models of Internet End Hosts

    Understanding and modeling the resources of Internet end hosts is essential for the design of desktop software and Internet-distributed applications. In this paper we develop a correlated resource model of Internet end hosts based on real trace data taken from the SETI@home project. This data covers a 5-year period with statistics for 2.7 million hosts. The resource model is based on statistical analysis of host computational power, memory, and storage, as well as how these resources change over time and the correlations between them. We find that resources with few discrete values (core count, memory) are well modeled by exponential laws governing the change of relative resource quantities over time. Resources with a continuous range of values are well modeled with either correlated normal distributions (processor speed for integer operations and floating point operations) or log-normal distributions (available disk space). We validate the models and show their utility by applying them to a resource allocation problem for Internet-distributed applications, and demonstrate their value over other models. We also make our trace data and our tool for automatically generating realistic Internet end hosts publicly available.
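    To make the model structure concrete, here is a minimal sketch of generating synthetic hosts, assuming (1) each discrete core count has a hypothetical exponential law a*exp(b*t) for its relative quantity, normalized into shares, (2) integer and floating-point speeds are a correlated normal pair, and (3) free disk space is log-normal. All parameter values and names (INT_MEAN, RHO, core_count_shares, sample_host, ...) are illustrative assumptions, not the fitted SETI@home values or the authors' released tool.

```python
import math
import random

# Hypothetical parameters for illustration only; the fitted values from the
# SETI@home traces are not given in this abstract.
INT_MEAN, INT_SD = 4000.0, 1200.0   # integer speed (arbitrary units)
FP_MEAN, FP_SD = 2000.0, 600.0      # floating-point speed (arbitrary units)
RHO = 0.7                           # assumed int/fp correlation
DISK_MU, DISK_SIGMA = 3.5, 1.0      # log-normal parameters of free disk (GB)


def core_count_shares(t_years, laws):
    """Relative share of each core count at time t.

    `laws` maps a core count to (a, b), an assumed exponential law a*exp(b*t)
    for that value's relative quantity; the weights are normalized to sum to 1.
    """
    raw = {cores: a * math.exp(b * t_years) for cores, (a, b) in laws.items()}
    total = sum(raw.values())
    return {cores: w / total for cores, w in raw.items()}


def sample_host(rng, t_years, laws):
    """Draw one synthetic host under the assumptions above."""
    shares = core_count_shares(t_years, laws)
    cores = rng.choices(list(shares), weights=list(shares.values()))[0]
    # Correlated normal pair via the 2x2 Cholesky factor of [[1, RHO], [RHO, 1]].
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    z_fp = RHO * z1 + math.sqrt(1.0 - RHO * RHO) * z2
    return {
        "cores": cores,
        "int_speed": max(0.0, INT_MEAN + INT_SD * z1),   # clip rare negatives
        "fp_speed": max(0.0, FP_MEAN + FP_SD * z_fp),
        "disk_gb": rng.lognormvariate(DISK_MU, DISK_SIGMA),
    }
```

    For example, with laws = {1: (0.6, -0.3), 2: (0.3, 0.1), 4: (0.1, 0.4)}, calling sample_host(random.Random(42), 2.0, laws) repeatedly yields a population in which multi-core hosts have grown relative to single-core ones, while int/fp speeds stay correlated. Normalizing the exponential weights keeps the shares a valid distribution at every t.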

    On Correlated Availability in Internet Distributed Systems

    As computer networks rapidly increase in size and speed, Internet-distributed systems such as P2P, volunteer computing, and Grid systems are increasingly common. A precise and accurate characterization of Internet resources is important for the design and evaluation of such Internet-distributed systems, yet our picture of the Internet landscape remains incomplete. To improve this picture, we measure and characterize the time dynamics of availability in a large-scale Internet-distributed system with over 110,000 hosts. Our characterization focuses on identifying patterns of correlated availability. We determine scalable and accurate clustering techniques and distance metrics for automatically detecting significant availability patterns. By means of clustering, we identify groups of resources with correlated availability that exhibit similar time effects. We then show how these correlated clusters of resources can be used to improve resource management for parallel applications in the context of volunteer computing.
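    As an illustration of clustering hosts by availability pattern, the sketch below represents each host as a hypothetical 24-element vector (1 if usually available in that hour of the day) and groups hosts with k-means. The choice of k-means, squared Euclidean distance, and farthest-point initialization are assumptions made for the sketch; the abstract does not say which clustering techniques or distance metrics the study selected.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two availability vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [list(vectors[0])]
    while len(centers) < k:
        # Next center: the vector farthest from all centers chosen so far.
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centers))
        centers.append(list(farthest))
    labels = None
    for _ in range(iters):
        # Assignment step: each host joins its nearest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centers[c])) for v in vectors]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

    With office-hours hosts and overnight hosts mixed together, kmeans(vectors, 2) separates the two groups; each resulting center is an average daily availability profile that a scheduler could use to co-allocate hosts likely to be up at the same time.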

    Decision Model for Cloud Computing under SLA Constraints

    With the recent introduction of Spot Instances in the Amazon Elastic Compute Cloud (EC2), users can bid for resources and thus control the balance between reliability and monetary cost. A critical challenge is to determine bid prices that minimize monetary costs for a user while meeting Service Level Agreement (SLA) constraints (for example, sufficient resource availability to complete a computation within a desired deadline). We propose a probabilistic model for the optimization of monetary costs, performance, and reliability, given user and application requirements and dynamic conditions. Using real instance price traces and workload models, we evaluate our model and demonstrate how users should bid optimally on Spot Instances to reach different objectives with desired levels of confidence.
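    The core trade-off (a higher bid buys more availability at a higher expected price) can be sketched with a deliberately simplified model, which is not the paper's probabilistic model: availability for a bid is estimated as the fraction of trace hours whose price is at or below the bid, the user is assumed to pay the market price (not the bid) for each hour the instance runs, and the job is assumed feasible when availability * deadline_hours >= work_hours. Checkpointing overhead, confidence levels, and correlation between hours are ignored; the function name choose_bid is hypothetical.

```python
def choose_bid(prices, work_hours, deadline_hours, candidate_bids):
    """Return (bid, availability, expected_cost) for the cheapest feasible bid.

    prices: historical hourly spot prices (assumed representative of the future).
    Returns None if no candidate bid meets the deadline under this model.
    """
    for bid in sorted(candidate_bids):
        paid = [p for p in prices if p <= bid]  # hours the instance would have run
        if not paid:
            continue
        availability = len(paid) / len(prices)
        # Expected useful hours before the deadline must cover the work.
        if availability * deadline_hours >= work_hours:
            # Assumed billing: market price for each running hour.
            expected_cost = work_hours * sum(paid) / len(paid)
            return bid, availability, expected_cost
    return None
```

    Because the mean price paid can only grow as the bid rises, the lowest feasible bid is also the cheapest one in this model, which is why the loop scans bids in ascending order. A tighter deadline forces a higher bid, mirroring the reliability-versus-cost balance described above.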

    Intermediate QoS Prototype for the EDGI Infrastructure

    This document provides the first deliverable of JRA2 for the FP7 European Desktop Grid Initiative (EDGI) project. It was produced by the INRIA, SZTAKI, LAL/IN2P3 and University of Coimbra teams. It describes the achievements and results of the JRA2 tasks "Advanced QoS Scheduler and Oracle" and "Support In Science Gateway", which concern quality of service (QoS) for the EDGI European desktop grid infrastructure. Hybrid Distributed Computing Infrastructures (DCIs) allow users to combine Grids, Desktop Grids, Clouds, etc. to obtain large computing capabilities; the EDGI infrastructure is one such DCI. The document presents the SpeQuloS framework, which provides QoS for applications executed on the EDGI infrastructure, and introduces the EDGI QoS portal, which gives EDGI users user-friendly, integrated access to QoS features. We first introduce new results from task JRA2.1, which collected and analyzed batch executions on Desktop Grids. We then present the advanced Cloud scheduling and Oracle strategies designed within the SpeQuloS framework (task JRA2.2) and demonstrate their efficiency through simulation-based performance evaluation. Next, we introduce the Credit System architecture and the QoS user portal, developed as part of JRA2 Support In Science Gateway (task JRA2.3). Finally, we conclude and provide references to the JRA2 outputs.

    Characterizing Result Errors in Internet Desktop Grids

    Desktop grids use the free resources in Intranet and Internet environments for large-scale computation and storage. While desktop grids offer a high return on investment, one critical issue is the validation of results returned by participating hosts. Several mechanisms for result validation have been proposed, but the characteristics of the errors themselves are poorly understood. To study error rates, we implemented and deployed a desktop grid application across several thousand hosts distributed over the Internet, and then analyzed the results to give a quantitative, empirical characterization of error rates. We find that in practice, errors are widespread across hosts but occur relatively infrequently. Moreover, error rates tend to be neither stationary over time nor correlated between hosts. In light of these characterization results, we evaluate state-of-the-art error detection mechanisms and describe the trade-offs of each. Finally, based on our empirical results, we conduct a benefit analysis of a recently proposed error detection mechanism tailored for long-running applications. This mechanism is based on digests of intermediate checkpoints, and we show in theory and simulation that its relative benefit over the state of the art is as high as 45%.
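    The idea behind checkpoint digests can be illustrated with a toy example: two replicas of a long-running job each report a SHA-256 digest of every intermediate checkpoint, so a silent error can be caught at the first checkpoint where the digests disagree rather than only after the whole run. The toy computation, the simulated bit flip, and the helper names (run_replica, first_divergence) are hypothetical; this is a sketch of digest comparison under those assumptions, not the exact mechanism evaluated in the paper.

```python
import hashlib

P = 2 ** 61 - 1  # modulus for the toy computation (a Mersenne prime)


def checkpoint_digests(states):
    """SHA-256 digest of each serialized intermediate checkpoint."""
    return [hashlib.sha256(s).hexdigest() for s in states]


def run_replica(steps, checkpoint_every, fault_step=None):
    """Toy long-running computation; optionally flip one bit of state at fault_step."""
    state = 0
    checkpoints = []
    for step in range(1, steps + 1):
        state = (state * 31 + step) % P
        if step == fault_step:
            state ^= 1  # simulated silent error
        if step % checkpoint_every == 0:
            checkpoints.append(state.to_bytes(8, "big"))
    return checkpoint_digests(checkpoints)


def first_divergence(digests_a, digests_b):
    """Index of the first checkpoint whose digests disagree, or None."""
    for i, (a, b) in enumerate(zip(digests_a, digests_b)):
        if a != b:
            return i
    return None
```

    With checkpoints every 100 steps and a fault injected at step 250, the replicas disagree at the third checkpoint (index 2, step 300), so a server could stop or reassign the faulty work after 30% of the run. Only 32-byte digests need to be sent, not the full checkpoint state.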

    Mining for Availability Models in Large-Scale Distributed Systems: A Case Study of SETI@home

    In the age of cloud, Grid, P2P, and volunteer distributed computing, large-scale systems with tens of thousands of unreliable hosts are increasingly common. Invariably, these systems are composed of heterogeneous hosts whose individual availability often exhibits different statistical properties (for example, stationary versus non-stationary behaviour) and fits different models (for example, Exponential, Weibull, or Pareto probability distributions). In this paper, we describe an effective method for discovering subsets of hosts whose availability has similar statistical properties and can be modelled with similar probability distributions. We apply this method to about 230,000 host availability traces obtained from a real large-scale Internet-distributed system, namely SETI@home. We find that about 34% of hosts exhibit availability that is a truly random process, and that these hosts can often be modelled accurately with a few distinct distributions from different families. We believe that this characterization is fundamental to the design of stochastic scheduling algorithms for large-scale systems where host availability is uncertain.
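    Deciding whether a host's availability is "a truly random process" requires some randomness test. As one common choice (an assumption here; the abstract does not name the tests used), the sketch below applies the Wald-Wolfowitz runs test to a sequence of availability interval lengths: values are labeled above or below the median, and the number of runs is compared with its expected value under randomness. The function names are hypothetical.

```python
import math
import statistics


def runs_test_z(samples):
    """Wald-Wolfowitz runs test (above/below median); returns the z-score."""
    med = statistics.median(samples)
    signs = [x > med for x in samples if x != med]  # drop ties with the median
    n1 = sum(signs)
    n2 = len(signs) - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("need values on both sides of the median")
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    n = n1 + n2
    mean = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1))
    return (runs - mean) / math.sqrt(var)


def looks_random(samples, z_crit=1.96):
    """True if the runs test does not reject randomness at roughly the 5% level."""
    return abs(runs_test_z(samples)) < z_crit
```

    A steadily growing sequence (too few runs) or a strictly alternating one (too many runs) fails the test, while a sequence with about the expected number of runs passes. Under this sketch, only hosts that pass would then be candidates for fitting Exponential, Weibull, or Pareto distributions to their interval lengths.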

    DSL-Lab: a Low-power Lightweight Platform to Experiment on Domestic Broadband Internet

    This article presents the design and construction of DSL-Lab, a platform for experimenting with distributed computing over the domestic broadband Internet. Experimental platforms such as PlanetLab and Grid'5000 are promising methodological approaches to studying distributed systems. However, both platforms focus on high-end services and network deployments available only on a restricted part of the Internet, which leaves researchers unable to experiment in conditions close to those of a typical domestic Internet connection. DSL-Lab complements PlanetLab and Grid'5000 by enabling experiments with distributed computing in an environment closer to the Internet as it appears to applications running on end-user PCs. DSL-Lab is a set of 40 low-power, low-noise nodes hosted by participants and connected through the participants' xDSL or cable Internet access. The objective is to provide a validation and experimentation platform for new protocols, services, simulators and emulators for these systems. In this paper, we report on the software design (security, resource allocation, power management) as well as on the first experiments conducted.

    DSL-Lab: a Platform to Experiment on Domestic Broadband Internet

    This report presents the design and construction of DSL-Lab, a platform for distributed computing and peer-to-peer experiments over the domestic broadband Internet. Experimental platforms such as PlanetLab and Grid'5000 are promising methodological approaches for studying distributed systems. However, both platforms focus on high-end services and network deployments on only a restricted part of the Internet, and as such they do not reproduce the experimental conditions of residential broadband networks. DSL-Lab is composed of 40 low-power, noiseless nodes hosted by participants and connected through the participants' xDSL or cable Internet access. The objective is twofold: 1) to provide accurate and customized measurements of availability, activity and performance in order to characterize and tune models of such resources; 2) to provide an experimental platform for new protocols, services and applications, as well as a validation tool for simulators and emulators targeting these systems. In this report, we describe the software infrastructure (security, resource allocation, power management) as well as the first results and experiments achieved.