17 research outputs found

    Build-and-test workloads for grid middleware: Problem, analysis, and applications

    No full text
    The Grid promise is starting to materialize today: large-scale, multi-site infrastructures have grown to assist the work of scientists from all around the world. This tremendous growth can be sustained and continued only through a higher quality of the middleware, in terms of deployability and of correct functionality. A potential solution to this problem is the adoption of industry practices for middleware building and testing. However, it is unclear what good build-and-test environments for grid middleware should look like, and how to use them efficiently. In this work we address both of these problems. First, we study the characteristics of the NMI build-and-test environment, which handles millions of testing tasks annually for major grid middleware such as Condor, Globus, VDT, and gLite. Through the analysis of a system-wide trace covering the past two years, we find the main characteristics of the workload, as well as the performance of the system under load. Second, we propose mechanisms for more efficient test management and operation, and for resource provisioning and evaluation. Notably, we propose a generic test optimization technique that reduces the test time by 95% while achieving 93% of the maximum accuracy, under real conditions.
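    The abstract does not describe the optimization technique itself. Purely as a hedged illustration of the trade-off it reports (much less test time at a small loss of detection accuracy), the sketch below prioritizes test configurations by historical failure rate per second of runtime and keeps only those that fit a time budget. The CSV layout, column names, and function are hypothetical, not part of the NMI environment.

    ```python
    import csv

    def prioritize_tests(history_csv, time_budget_s):
        """Select test configurations to run within a time budget.

        Hypothetical input: a CSV with columns test_id, avg_runtime_s, failure_rate.
        Tests that historically fail more often per second of runtime run first,
        so most defects are still caught when only part of the suite fits.
        """
        with open(history_csv, newline="") as f:
            rows = list(csv.DictReader(f))

        # Rank by expected failures detected per second of test time.
        rows.sort(key=lambda r: float(r["failure_rate"]) / float(r["avg_runtime_s"]),
                  reverse=True)

        selected, used = [], 0.0
        for r in rows:
            runtime = float(r["avg_runtime_s"])
            if used + runtime > time_budget_s:
                continue
            selected.append(r["test_id"])
            used += runtime
        return selected, used
    ```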

    GrenchMark: A Framework For . . .

    No full text

    C-Meter: A Framework for Performance Analysis of Computing Clouds, IEEE/ACM Symposium on Cluster Computing and the Grid

    No full text
    Cloud computing has emerged as a new technology that provides large amounts of computing and data storage capacity to its users, with a promise of increased scalability, high availability, and reduced administration and maintenance costs. As the use of cloud computing environments increases, it becomes crucial to understand their performance. It is therefore of great importance to assess the performance of computing clouds in terms of various metrics, such as the overhead of acquiring and releasing virtual computing resources and other virtualization and network communication overheads. To address these issues, we have designed and implemented C-Meter, a portable, extensible, and easy-to-use framework for generating and submitting test workloads to computing clouds. In this paper, we first state the requirements for frameworks that assess the performance of computing clouds. Then, we present the architecture of the C-Meter framework and discuss several cloud resource management alternatives. Finally, we present our early experiences with C-Meter on Amazon EC2. We show how C-Meter can be used for assessing the overhead of acquiring and releasing virtual computing resources, for comparing different configurations, and for evaluating different scheduling algorithms.
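    C-Meter itself is not reproduced here. As a minimal sketch of how the acquisition and release overhead mentioned in the abstract could be measured on Amazon EC2, the snippet below times a single run/terminate cycle using boto3. The AMI ID and instance type are placeholders; this is not C-Meter's implementation.

    ```python
    import time
    import boto3  # assumed available; not part of the C-Meter framework

    def measure_acquire_release(ami_id="ami-00000000", instance_type="t3.micro"):
        """Time how long acquiring (reaching 'running') and releasing (reaching
        'terminated') one EC2 instance takes. Returns (acquire_s, release_s)."""
        ec2 = boto3.client("ec2")

        t0 = time.time()
        resp = ec2.run_instances(ImageId=ami_id, InstanceType=instance_type,
                                 MinCount=1, MaxCount=1)
        instance_id = resp["Instances"][0]["InstanceId"]
        ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
        acquire_s = time.time() - t0

        t1 = time.time()
        ec2.terminate_instances(InstanceIds=[instance_id])
        ec2.get_waiter("instance_terminated").wait(InstanceIds=[instance_id])
        release_s = time.time() - t1
        return acquire_s, release_s
    ```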

    2Fast: Collaborative Downloads in P2P Networks

    No full text

    DGSim: Comparing grid resource management architectures through trace-based simulation

    No full text
    Many advances in grid resource management are still required to realize the grid computing vision of the integration of a world-wide computing infrastructure for scientific use. The pressure for advances is increased by the fast evolution of single, large clusters, which are the primary technological alternative to grids. However, advances in grid resource management cannot be achieved without an appropriate toolbox, of which simulation environments form an essential part. The current grid simulation environments still lack important workload and system modeling features, as well as research productivity features such as automated experiment setup and management. In this paper we address these issues through the design and a reference implementation of DGSim, a framework for simulating grid resource management architectures. DGSim introduces the concepts of grid evolution and of job selection policy, and extends towards realism the current body of knowledge on grid inter-operation, grid dynamics, and workload modeling. We also show through two real use cases how DGSim can be used to compare grid resource management architectures.
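    DGSim's own design is not shown here. As a rough sketch of what trace-based simulation of a resource management architecture looks like in miniature, the snippet below replays (arrival_time, runtime, cpus) job records from a workload trace against a single cluster with FCFS scheduling and reports the average wait time. The trace format, cluster size, and FCFS choice are assumptions for illustration only.

    ```python
    import heapq

    def simulate_fcfs(trace, total_cpus):
        """Replay a workload trace against one cluster with FCFS scheduling.

        trace: iterable of (arrival_time, runtime, cpus) tuples, sorted by arrival.
        Running jobs are kept in a heap of (finish_time, cpus) so resources are
        freed in finish order. Returns the average job wait time.
        """
        running = []          # heap of (finish_time, cpus)
        free = total_cpus
        clock = 0.0
        waits = []

        for arrival, runtime, cpus in trace:
            clock = max(clock, arrival)
            # FCFS: free resources until the head-of-line job fits.
            while free < cpus:
                finish, c = heapq.heappop(running)
                clock = max(clock, finish)
                free += c
            waits.append(clock - arrival)
            heapq.heappush(running, (clock + runtime, cpus))
            free -= cpus

        return sum(waits) / len(waits) if waits else 0.0

    # Example: three 2-CPU jobs on a 4-CPU cluster.
    print(simulate_fcfs([(0, 10, 2), (1, 5, 2), (2, 3, 2)], total_cpus=4))
    ```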

    Scheduling jobs in the cloud using on-demand and reserved instances

    No full text
    Deploying applications in leased cloud infrastructure is increasingly considered by a variety of business and service integrators. However, the challenge of selecting a leasing strategy (larger or faster instances? on-demand or reserved instances?) and of configuring it with appropriate scheduling policies is still daunting for the (potential) cloud user. In this work, we investigate leasing strategies and their policies from a broker's perspective. We propose CoH, a family of Cloud-based, online, Hybrid scheduling policies that minimize rental cost by making use of both on-demand and reserved instances. We formulate the resource provisioning and job allocation policies as Integer Programming problems. As the policies need to be executed online, we limit the time to explore the optimal solution of the integer program, compare the obtained solution with various heuristics-based policies, and then automatically pick the best one. We show, via simulation and using multiple real-world traces, that the hybrid leasing policy can obtain significantly lower cost than typical heuristics-based policies.
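    The CoH policies are formulated as Integer Programs in the paper; the pure-Python sketch below only illustrates the underlying on-demand vs. reserved trade-off by exhaustively choosing the number of reserved instances that minimizes total rental cost for a given hourly demand profile, with on-demand instances covering any overflow. Prices and the demand profile are made-up example values, and the online job-allocation part of CoH is not modeled.

    ```python
    def best_reservation(demand, reserved_hourly, on_demand_hourly):
        """Pick the number of reserved instances minimizing total rental cost.

        demand: instances needed in each hour of the planning horizon.
        Reserved instances are paid for every hour whether used or not;
        on-demand instances cover any demand above the reserved pool.
        Returns (best_reserved_count, best_cost).
        """
        horizon = len(demand)
        best = (0, float("inf"))
        for r in range(max(demand) + 1):
            cost = r * reserved_hourly * horizon
            cost += sum(max(0, d - r) * on_demand_hourly for d in demand)
            if cost < best[1]:
                best = (r, cost)
        return best

    # Hypothetical prices: reserved at $0.06/h, on-demand at $0.10/h.
    demand = [2, 5, 8, 8, 3, 1, 1, 2, 6, 9, 9, 4]
    print(best_reservation(demand, reserved_hourly=0.06, on_demand_hourly=0.10))
    ```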

    A model for space-correlated failures in large-scale distributed systems

    No full text
    Distributed systems such as grids, peer-to-peer systems, and even Internet DNS servers have grown significantly in size and complexity in the last decade. This rapid growth has allowed distributed systems to serve a large and increasing number of users, but has also made resource and system failures inevitable. Moreover, perhaps as a result of system complexity, in distributed systems a single failure can trigger several more failures within a short time span, forming a group of time-correlated failures. To eliminate or alleviate the significant effects of failures on performance and functionality, the techniques for dealing with failures require good failure models. However, not many such models are available, and the available models are valid for only a few, or even a single, distributed system. In contrast, in this work we propose a model that considers groups of time-correlated failures and is valid for many types of distributed systems. Our model includes three components: the group size, the group inter-arrival time, and the resource downtime caused by the group. To validate this model, we use failure traces corresponding to fifteen distributed systems. We find that space-correlated failures are dominant in terms of resource downtime in seven of the fifteen studied systems. For each of these seven systems, we provide a set of model parameters that can be used in research studies or for tuning distributed systems. Last, as a result of our work, six of the studied traces have been made available through the Failure Trace Archive (http://fta.inria.fr).
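    The paper fits system-specific distributions for each of the three model components; the sketch below only shows how such a model can drive a synthetic failure generator, drawing group inter-arrival times, group sizes, and per-group downtimes from placeholder distributions. The distribution families and parameters here are assumptions, not the values fitted in the paper.

    ```python
    import random

    def generate_failure_groups(n_groups, seed=0):
        """Generate synthetic space-correlated failure groups.

        Each group is described by the three components named in the abstract:
        the time since the previous group (inter-arrival), the number of
        resources failing together (size), and the downtime the group causes.
        Distribution choices and parameters are illustrative placeholders.
        """
        rng = random.Random(seed)
        t = 0.0
        groups = []
        for _ in range(n_groups):
            t += rng.lognormvariate(8.0, 1.5)            # group inter-arrival time (s)
            size = 1 + int(rng.expovariate(1.0 / 3.0))   # failures in the group
            downtime = rng.lognormvariate(6.0, 1.0)      # downtime caused by the group (s)
            groups.append({"start": t, "size": size, "downtime": downtime})
        return groups
    ```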

    Sampling bias in BitTorrent measurements

    No full text
    Real-world measurements play an important role in understanding the characteristics and in improving the operation of BitTorrent, which is currently a popular Internet application. Much like measuring the Internet, the complexity and scale of the BitTorrent network make a single, complete measurement impractical. While a large number of measurements have already employed diverse sampling techniques to study parts of the BitTorrent network, until now there has been no investigation of their sampling bias, that is, of their ability to objectively represent the characteristics of BitTorrent. In this work we present the first study of sampling bias in BitTorrent measurements. We first introduce a novel taxonomy of sources of sampling bias in BitTorrent measurements. We then investigate the sampling bias among fifteen long-term BitTorrent measurements completed between 2004 and 2009, and find that different data sources and measurement techniques can lead to significantly different measurement results. Last, we formulate three recommendations to improve the design of future BitTorrent measurements, and estimate the cost of using these recommendations in practice.
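    As a small, generic illustration of how disagreement between two measurement techniques can be quantified (not the paper's methodology), the snippet below compares the empirical distributions of the same quantity, for example per-peer download speed, observed by two different crawls, using a two-sample Kolmogorov-Smirnov test. A significant result suggests at least one sample is biased relative to the other. The example data are invented.

    ```python
    from scipy.stats import ks_2samp  # generic statistical check, not the paper's method

    def compare_samples(sample_a, sample_b, alpha=0.05):
        """Compare two measurement samples of the same quantity.

        Returns the KS statistic, the p-value, and whether the hypothesis that
        both samples come from the same distribution is rejected at level alpha.
        """
        stat, pvalue = ks_2samp(sample_a, sample_b)
        return stat, pvalue, pvalue < alpha

    # Hypothetical speeds (KB/s) seen by a tracker-based vs. a DHT-based crawl.
    print(compare_samples([120, 340, 80, 400, 210], [90, 100, 75, 130, 95]))
    ```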