27,887 research outputs found

    Discovering Job Preemptions in the Open Science Grid

    Full text link
    The Open Science Grid(OSG) is a world-wide computing system which facilitates distributed computing for scientific research. It can distribute a computationally intensive job to geo-distributed clusters and process job's tasks in parallel. For compute clusters on the OSG, physical resources may be shared between OSG and cluster's local user-submitted jobs, with local jobs preempting OSG-based ones. As a result, job preemptions occur frequently in OSG, sometimes significantly delaying job completion time. We have collected job data from OSG over a period of more than 80 days. We present an analysis of the data, characterizing the preemption patterns and different types of jobs. Based on observations, we have grouped OSG jobs into 5 categories and analyze the runtime statistics for each category. we further choose different statistical distributions to estimate probability density function of job runtime for different classes.Comment: 8 page

    Efficient Task Replication for Fast Response Times in Parallel Computation

    Full text link
    One typical use case of large-scale distributed computing in data centers is to decompose a computation job into many independent tasks and run them in parallel on different machines, sometimes known as the "embarrassingly parallel" computation. For this type of computation, one challenge is that the time to execute a task for each machine is inherently variable, and the overall response time is constrained by the execution time of the slowest machine. To address this issue, system designers introduce task replication, which sends the same task to multiple machines, and obtains result from the machine that finishes first. While task replication reduces response time, it usually increases resource usage. In this work, we propose a theoretical framework to analyze the trade-off between response time and resource usage. We show that, while in general, there is a tension between response time and resource usage, there exist scenarios where replicating tasks judiciously reduces completion time and resource usage simultaneously. Given the execution time distribution for machines, we investigate the conditions for a scheduling policy to achieve optimal performance trade-off, and propose efficient algorithms to search for optimal or near-optimal scheduling policies. Our analysis gives insights on when and why replication helps, which can be used to guide scheduler design in large-scale distributed computing systems.Comment: Extended version of the 2-page paper accepted to ACM SIGMETRICS 201

    D-SPACE4Cloud: A Design Tool for Big Data Applications

    Get PDF
    The last years have seen a steep rise in data generation worldwide, with the development and widespread adoption of several software projects targeting the Big Data paradigm. Many companies currently engage in Big Data analytics as part of their core business activities, nonetheless there are no tools and techniques to support the design of the underlying hardware configuration backing such systems. In particular, the focus in this report is set on Cloud deployed clusters, which represent a cost-effective alternative to on premises installations. We propose a novel tool implementing a battery of optimization and prediction techniques integrated so as to efficiently assess several alternative resource configurations, in order to determine the minimum cost cluster deployment satisfying QoS constraints. Further, the experimental campaign conducted on real systems shows the validity and relevance of the proposed method

    A Taxonomy for Management and Optimization of Multiple Resources in Edge Computing

    Full text link
    Edge computing is promoted to meet increasing performance needs of data-driven services using computational and storage resources close to the end devices, at the edge of the current network. To achieve higher performance in this new paradigm one has to consider how to combine the efficiency of resource usage at all three layers of architecture: end devices, edge devices, and the cloud. While cloud capacity is elastically extendable, end devices and edge devices are to various degrees resource-constrained. Hence, an efficient resource management is essential to make edge computing a reality. In this work, we first present terminology and architectures to characterize current works within the field of edge computing. Then, we review a wide range of recent articles and categorize relevant aspects in terms of 4 perspectives: resource type, resource management objective, resource location, and resource use. This taxonomy and the ensuing analysis is used to identify some gaps in the existing research. Among several research gaps, we found that research is less prevalent on data, storage, and energy as a resource, and less extensive towards the estimation, discovery and sharing objectives. As for resource types, the most well-studied resources are computation and communication resources. Our analysis shows that resource management at the edge requires a deeper understanding of how methods applied at different levels and geared towards different resource types interact. Specifically, the impact of mobility and collaboration schemes requiring incentives are expected to be different in edge architectures compared to the classic cloud solutions. Finally, we find that fewer works are dedicated to the study of non-functional properties or to quantifying the footprint of resource management techniques, including edge-specific means of migrating data and services.Comment: Accepted in the Special Issue Mobile Edge Computing of the Wireless Communications and Mobile Computing journa
    • …
    corecore