28 research outputs found

    Energy Efficient Scheduling of MapReduce over Big Data

    Get PDF
    The majority of large-scale data-intensive applications run by data centers are based on MapReduce or its open-source implementation, Hadoop. Such applications run on large clusters that require substantial amounts of energy, making energy costs an appreciable fraction of the data centers' overall costs. Reducing the energy consumed when executing each MapReduce job is therefore a critical concern for data centers. In this paper, we propose a framework for improving the energy efficiency of MapReduce applications while satisfying the Service Level Agreement (SLA). We first model the problem of energy-aware scheduling of a single MapReduce job as an Integer Program. We then propose two algorithms, a MapReduce scheduling algorithm and a load scheduling algorithm, that find assignments of map and reduce tasks to machine slots so as to minimize the energy consumed when executing the application. Energy-aware configuration and scheduling improve the energy efficiency of MapReduce clusters and thus help reduce the service costs of data centers.
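
    The energy-aware assignment the abstract describes can be framed as a small integer program: binary variables pick, for each map or reduce task, the machine slot it runs on, and the objective sums the energy of the chosen assignments subject to SLA-style constraints. The sketch below, using the PuLP modelling library, only illustrates that idea; the task set, slot set, energy and runtime figures, and the single deadline constraint are invented placeholders, not the paper's actual formulation.

```python
# Minimal sketch of an energy-aware task-to-slot assignment ILP.
# All data below is an illustrative placeholder, not the paper's model.
import pulp

tasks = ["m1", "m2", "r1"]            # map/reduce tasks (hypothetical)
slots = ["s1", "s2"]                  # machine slots (hypothetical)
energy = {("m1", "s1"): 5, ("m1", "s2"): 7,   # energy to run task t on slot s
          ("m2", "s1"): 6, ("m2", "s2"): 4,
          ("r1", "s1"): 9, ("r1", "s2"): 8}
runtime = {("m1", "s1"): 3, ("m1", "s2"): 2,  # runtime of task t on slot s
           ("m2", "s1"): 4, ("m2", "s2"): 5,
           ("r1", "s1"): 6, ("r1", "s2"): 7}
deadline = 10                          # simplified per-slot SLA deadline

prob = pulp.LpProblem("energy_aware_mapreduce", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", (tasks, slots), cat="Binary")

# Objective: total energy of the chosen assignment.
prob += pulp.lpSum(energy[t, s] * x[t][s] for t in tasks for s in slots)

# Each task runs on exactly one slot.
for t in tasks:
    prob += pulp.lpSum(x[t][s] for s in slots) == 1

# Simplified SLA constraint: work assigned to a slot fits within the deadline.
for s in slots:
    prob += pulp.lpSum(runtime[t, s] * x[t][s] for t in tasks) <= deadline

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for t in tasks:
    for s in slots:
        if pulp.value(x[t][s]) > 0.5:
            print(f"{t} -> {s}")
```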

    WLCG Authorisation from X.509 to Tokens

    Full text link
    The WLCG Authorisation Working Group was formed in July 2017 with the objective to understand and meet the needs of a future-looking Authentication and Authorisation Infrastructure (AAI) for WLCG experiments. Much has changed since the early 2000s when X.509 certificates presented the most suitable choice for authorisation within the grid; progress in token based authorisation and identity federation has provided an interesting alternative with notable advantages in usability and compatibility with external (commercial) partners. The need for interoperability in this new model is paramount as infrastructures and research communities become increasingly interdependent. Over the past two years, the working group has made significant steps towards identifying a system to meet the technical needs highlighted by the community during staged requirements gathering activities. Enhancement work has been possible thanks to externally funded projects, allowing existing AAI solutions to be adapted to our needs. A cornerstone of the infrastructure is the reliance on a common token schema in line with evolving standards and best practices, allowing for maximum compatibility and easy cooperation with peer infrastructures and services. We present the work of the group and an analysis of the anticipated changes in authorisation model by moving from X.509 to token based authorisation. A concrete example of token integration in Rucio is presented. Comment: 8 pages, 3 figures, to appear in the proceedings of CHEP 201
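
    To make the token-based model concrete, the sketch below shows how a storage service might verify an incoming bearer token and check a capability-style scope before granting access. It is a minimal illustration assuming the PyJWT library; the issuer, audience and scope strings are placeholders rather than the exact values mandated by the WLCG common token profile.

```python
# Minimal sketch of validating a bearer token on the service side with PyJWT.
# The issuer, audience and scope strings are illustrative placeholders.
import jwt  # PyJWT

def check_token(token: str, issuer_public_key: str) -> dict:
    # jwt.decode verifies the signature, expiry, audience and issuer claims.
    claims = jwt.decode(
        token,
        issuer_public_key,
        algorithms=["RS256"],
        audience="https://storage.example.org",  # hypothetical service audience
        issuer="https://iam.example.org",         # hypothetical token issuer
    )
    # Capability-style authorisation: require a read scope before serving data.
    scopes = claims.get("scope", "").split()
    if not any(s.startswith("storage.read") for s in scopes):
        raise PermissionError("token lacks a storage.read capability")
    return claims
```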

    Third-party transfers in WLCG using HTTP

    Full text link
    Since its earliest days, the Worldwide LHC Computing Grid (WLCG) has relied on GridFTP to transfer data between sites. The announcement that Globus is dropping support of its open source Globus Toolkit (GT), which forms the basis for several FTP clients and servers, has created an opportunity to reevaluate the use of FTP. HTTP-TPC, an extension to HTTP compatible with WebDAV, has arisen as a strong contender for an alternative approach. In this paper, we describe the HTTP-TPC protocol itself, along with the current status of its support in different implementations, and the interoperability testing done within the WLCG DOMA working group's TPC activity. This protocol also provides the first real use-case for token-based authorisation for this community. We will demonstrate the benefits of such authorisation by showing how it allows HTTP-TPC to support new technologies (such as OAuth, OpenID Connect, Macaroons and SciTokens) without changing the protocol. We will also discuss the next steps for HTTP-TPC and the plans to use the protocol for WLCG transfers. Comment: 7 pages, 3 figures, to appear in the proceedings of CHEP 202
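
    For orientation, the sketch below shows roughly what a pull-mode HTTP-TPC request looks like from the orchestrating client's point of view: a WebDAV COPY sent to the destination endpoint, carrying the source URL and the credential the destination should present when it contacts the source. The header names and token placeholders are assumptions based on public descriptions of HTTP-TPC, not an authoritative client implementation.

```python
# Minimal sketch of a pull-mode third-party-copy request issued with HTTP COPY.
# Header names, URLs and tokens are assumptions for illustration only.
import requests

destination = "https://dest-se.example.org/store/run123/file.root"    # hypothetical
source      = "https://source-se.example.org/store/run123/file.root"  # hypothetical

resp = requests.request(
    "COPY",
    destination,
    headers={
        # Token authorising the request at the destination endpoint.
        "Authorization": "Bearer <destination-token>",
        # URL the destination should pull the data from.
        "Source": source,
        # Credential the destination forwards when contacting the source.
        "TransferHeaderAuthorization": "Bearer <source-token>",
    },
    timeout=60,
)
resp.raise_for_status()
# The endpoint typically streams progress markers back in the response body.
print(resp.text)
```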

    Simulating the ATLAS Distributed Data Management System

    No full text
    The ATLAS Distributed Data Management system organizes more than 90PB of physics data across more than 100 sites globally. Over 14 million files are transferred daily with strongly varying usage patterns. For performance and scalability reasons it is imperative to adapt and improve the data management system continuously. Therefore future system modifications in hardware, software, as well as policy, need to be evaluated to accomplish the intended results and to avoid unwanted side effects. Due to the complexity of large-scale distributed systems this evaluation process is primarily based on expert knowledge, as conventional evaluation methods are inadequate. However, this error-prone process lacks quantitative estimations and leads to inaccuracy as well as incorrect conclusions. In this work we present a novel, full-scale simulation framework. This modular simulator is able to accurately model the ATLAS Distributed Data Management system. The design and architecture of the component-based software is presented and discussed. The evaluation is based on the comparison with historical workloads and concentrates on the accuracy of the simulation framework. Our results show that we can accurately model the distributed data management system within 80%.
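
    A full-scale simulator of this kind is typically built around a discrete-event core: a time-ordered event queue driven by pluggable components that replay historical workloads. The skeleton below illustrates that general structure only; the component, event and parameter names are invented and do not reflect the actual design of the ATLAS DDM simulator.

```python
# Minimal discrete-event simulation skeleton: a time-ordered event queue plus
# a pluggable component that replays historical transfer requests.
import heapq
from dataclasses import dataclass, field
from typing import Callable

@dataclass(order=True)
class Event:
    time: float
    action: Callable = field(compare=False)

class Simulator:
    def __init__(self):
        self.queue = []
        self.now = 0.0

    def schedule(self, delay, action):
        heapq.heappush(self.queue, Event(self.now + delay, action))

    def run(self, until):
        while self.queue and self.queue[0].time <= until:
            event = heapq.heappop(self.queue)
            self.now = event.time
            event.action()

# Example component: replay historical transfers and model their duration.
def replay_transfers(sim, transfers, bandwidth_mb_s=100.0):
    for start, size_mb in transfers:
        def finish(size=size_mb):
            print(f"t={sim.now:.1f}s transfer of {size} MB completed")
        sim.schedule(start + size_mb / bandwidth_mb_s - sim.now, finish)

sim = Simulator()
replay_transfers(sim, [(0.0, 500), (2.0, 1200)])  # (start time, file size) samples
sim.run(until=60.0)
```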

    The ATLAS Distributed Data Management System & Databases

    No full text
    The ATLAS Distributed Data Management (DDM) System is responsible for the global management of petabytes of high energy physics data. The current system, DQ2, has a critical dependency on Relational Database Management Systems (RDBMS), like Oracle. RDBMS are well-suited to enforcing data integrity in online transaction processing applications; however, concerns have been raised about their scalability under data warehouse-like workloads. In particular, analysis of archived data or aggregation of transactional data for summary purposes is problematic. Therefore, we have evaluated new approaches to handle vast amounts of data. We have investigated a class of database technologies commonly referred to as NoSQL databases. This includes distributed filesystems, like HDFS, that support parallel execution of computational tasks on distributed data, as well as schema-less approaches via key-value stores, like HBase. In this talk we will describe our use cases in ATLAS, share our experiences with various databases used in production and present the database technologies envisaged for the next-generation DDM system, Rucio. Rucio is an evolution of the ATLAS DDM system which addresses the scalability issues observed in DQ2.
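
    As an illustration of the schema-less, key-value approach mentioned above, the sketch below stores per-access trace records under a composite row key so that summarising a dataset reduces to a prefix scan. It assumes the happybase client for HBase; the table name, column family and row-key layout are invented for illustration and are not the production schema.

```python
# Minimal sketch of a schema-less key-value layout for archived access traces.
# Table name, column family and row-key layout are illustrative assumptions.
import json
import happybase

connection = happybase.Connection("hbase.example.org")   # hypothetical host
table = connection.table("ddm_traces")                   # hypothetical table

# Composite row key (scope:name:timestamp) keeps accesses to the same dataset
# adjacent on disk, so aggregation for summaries becomes a prefix scan.
def record_access(scope, name, timestamp, payload):
    row_key = f"{scope}:{name}:{timestamp}".encode()
    table.put(row_key, {b"t:payload": json.dumps(payload).encode()})

def accesses_for(scope, name):
    prefix = f"{scope}:{name}:".encode()
    for key, data in table.scan(row_prefix=prefix):
        yield key, json.loads(data[b"t:payload"])
```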

    The ATLAS Data Management System Rucio: Supporting LHC Run-2 and beyond

    No full text
    With this contribution we present some recent developments made to Rucio, the data management system of the high-energy physics experiment ATLAS. Already managing 300 petabytes of both official and user data, Rucio has seen incremental improvements throughout LHC Run-2, and is currently laying the groundwork for HEP computing in the HL-LHC era. This contribution focuses on (a) the automations that have been put in place, such as data rebalancing or dynamic replication of user data, as well as their supporting infrastructures such as real-time networking metrics or transfer time predictions; (b) the flexible approach towards inclusion of heterogeneous storage systems, including object stores, while unifying the potential access paths using generally available tools and protocols; (c) machine learning approaches to help with transfer throughput estimation; and (d) the adoption of Rucio by two other experiments, AMS and Xenon1t. We conclude by presenting operational numbers and figures to quantify these improvements, and extrapolate the necessary changes and developments for future LHC runs.
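
    As an example of the transfer-time estimation mentioned in point (c), the sketch below fits a simple regressor on historical transfer samples and predicts the duration of a new transfer. The features, sample values and choice of model are invented placeholders and do not describe the models actually used in production.

```python
# Minimal sketch of transfer-time estimation: a regressor fitted on historical
# transfers. All features and sample values below are invented placeholders.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Hypothetical samples: [file size (GB), link RTT (ms), queued transfers]
X = np.array([[1.0, 20, 5], [4.0, 20, 5], [2.0, 150, 40], [8.0, 150, 40]])
y = np.array([12.0, 45.0, 90.0, 350.0])   # observed transfer durations (s)

model = GradientBoostingRegressor().fit(X, y)

# Predict how long a new 3 GB transfer over the slower link would take.
print(model.predict(np.array([[3.0, 150, 40]])))
```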

    C3PO - A Dynamic Data Placement Agent for ATLAS Distributed Data Management

    No full text
    This contribution introduces a new dynamic data placement agent for the ATLAS distributed data management system. This agent is designed to pre-place potentially popular data to make it more widely available. It uses data from a variety of sources, including input datasets and site workload information from the ATLAS workload management system (WMS), network metrics from different sources such as FTS and perfSONAR, historical popularity data collected through a tracer mechanism, and more. With this data it decides if, when and where to place new replicas, which the WMS can then use to distribute the workload more evenly over the available computing resources and ultimately reduce job waiting times. The new replicas are created with a short lifetime that is extended when the data is accessed, so the system behaves like a large cache. This paper gives an overview of the architecture and the final implementation of this new agent. It also includes an evaluation of different placement algorithms, comparing the transfer times and the usage of the new replicas.
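
    The placement decision and cache-like lifetime handling described above can be pictured as a small loop: score candidate sites from workload, network and popularity inputs, create a short-lived replica at the best site, and push the expiry out on every access. The sketch below is only that picture; the scoring weights, popularity threshold and the create_replica/extend_lifetime hooks are hypothetical and not C3PO's actual implementation.

```python
# Minimal sketch of a dynamic placement decision with cache-like lifetimes.
# Weights, thresholds and the create_replica/extend_lifetime hooks are invented.
import time

SHORT_LIFETIME = 7 * 24 * 3600   # seconds; hypothetical initial replica lifetime

def score_site(site, metrics):
    """Higher is better: free capacity and bandwidth help, queued work hurts."""
    m = metrics[site]
    return (0.5 * m["free_capacity"]
            + 0.3 * m["bandwidth_to_source"]
            - 0.2 * m["queued_jobs"])

def place_if_popular(dataset, popularity, metrics, create_replica):
    # Only pre-place datasets whose recent access count suggests rising demand.
    if popularity[dataset] < 10:
        return None
    best_site = max(metrics, key=lambda s: score_site(s, metrics))
    expires_at = time.time() + SHORT_LIFETIME
    create_replica(dataset, best_site, expires_at)
    return best_site

def on_access(replica, extend_lifetime):
    # Cache-like behaviour: every access pushes the expiry further out.
    extend_lifetime(replica, time.time() + SHORT_LIFETIME)
```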