Energy Efficient Scheduling of MapReduce over Big Data
The majority of large-scale data-intensive applications carried out by data centers are based on MapReduce or its open-source implementation, Hadoop. Such applications run on large clusters that consume substantial amounts of energy, making energy costs an appreciable fraction of the data centers' overall costs. Therefore, reducing the energy consumed when executing each MapReduce job is a critical concern for data centers. In this paper, we propose a framework for improving the energy efficiency of MapReduce applications while satisfying the Service Level Agreement (SLA). We first model the problem of energy-aware scheduling of a single MapReduce job as an Integer Program. We then present two algorithms, a MapReduce scheduling algorithm and a load scheduling algorithm, that find assignments of map and reduce tasks to the available machine slots in order to minimize the energy consumed when executing the application. The energy-aware configuration and scheduling improve the energy efficiency of MapReduce clusters and thus help reduce the service costs of data centers.
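The paper formulates the placement as an Integer Program; as a rough illustration of the underlying idea (not the authors' algorithm), the sketch below greedily assigns each task to the feasible machine with the lowest assumed energy cost. All names, costs, and capacities here are hypothetical.

```python
# Hypothetical greedy sketch of energy-aware task placement.
# energy[m][t] is an assumed per-machine, per-task energy cost;
# capacity[m] limits how many tasks machine m may run.

def assign_tasks(tasks, machines, energy, capacity):
    """Greedily place each task on the feasible machine with the
    lowest energy cost. A real solver would treat this as an
    Integer Program, as the paper does."""
    load = {m: 0 for m in machines}
    assignment = {}
    for t in tasks:
        candidates = [m for m in machines if load[m] < capacity[m]]
        best = min(candidates, key=lambda m: energy[m][t])
        assignment[t] = best
        load[best] += 1
    return assignment

# Toy instance: two tasks, two single-slot machines.
energy = {"m1": {"t1": 5, "t2": 9}, "m2": {"t1": 7, "t2": 4}}
assignment = assign_tasks(["t1", "t2"], ["m1", "m2"], energy,
                          {"m1": 1, "m2": 1})
```

A greedy heuristic like this gives no optimality guarantee; the Integer Program in the paper captures the same cost structure exactly.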
WLCG Authorisation from X.509 to Tokens
The WLCG Authorisation Working Group was formed in July 2017 with the
objective to understand and meet the needs of a future-looking Authentication
and Authorisation Infrastructure (AAI) for WLCG experiments. Much has changed
since the early 2000s when X.509 certificates presented the most suitable
choice for authorisation within the grid; progress in token based authorisation
and identity federation has provided an interesting alternative with notable
advantages in usability and compatibility with external (commercial) partners.
The need for interoperability in this new model is paramount as infrastructures
and research communities become increasingly interdependent. Over the past two
years, the working group has made significant steps towards identifying a
system to meet the technical needs highlighted by the community during staged
requirements gathering activities. Enhancement work has been possible thanks to
externally funded projects, allowing existing AAI solutions to be adapted to
our needs. A cornerstone of the infrastructure is the reliance on a common
token schema in line with evolving standards and best practices, allowing for
maximum compatibility and easy cooperation with peer infrastructures and
services. We present the work of the group and an analysis of the anticipated
changes in authorisation model by moving from X.509 to token based
authorisation. A concrete example of token integration in Rucio is presented.
Comment: 8 pages, 3 figures, to appear in the proceedings of CHEP 201
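To give a feel for the shift from identity-centric X.509 authorisation to capability-style tokens, the sketch below shows an illustrative JWT-like payload loosely modelled on the WLCG common token schema. The issuer URL, subject, and scope strings are hypothetical values, not taken from the working group's schema document.

```python
import time

# Illustrative bearer-token payload loosely modelled on the WLCG
# common token schema; all values here are hypothetical.
payload = {
    "iss": "https://iam.example.org/",  # issuing AAI service (made-up URL)
    "sub": "alice",                     # stable user identifier
    "exp": int(time.time()) + 3600,     # short-lived, unlike long X.509 proxies
    "scope": "storage.read:/ storage.modify:/atlas",  # capability-style scopes
}

def allows(token, operation, path):
    """Check whether a capability scope authorises an operation on a path.
    Each scope is '<operation>:<path-prefix>'."""
    for s in token["scope"].split():
        op, _, prefix = s.partition(":")
        if op == operation and path.startswith(prefix):
            return True
    return False
```

The key design change is visible in `allows`: the service checks what the token permits, rather than mapping a certificate identity to rights via local policy.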
Third-party transfers in WLCG using HTTP
Since its earliest days, the Worldwide LHC Computing Grid (WLCG) has
relied on GridFTP to transfer data between sites. The announcement that Globus
is dropping support of its open source Globus Toolkit (GT), which forms the
basis for several FTP client and servers, has created an opportunity to
reevaluate the use of FTP. HTTP-TPC, an extension to HTTP compatible with
WebDAV, has arisen as a strong contender for an alternative approach.
In this paper, we describe the HTTP-TPC protocol itself, along with the
current status of its support in different implementations, and the
interoperability testing done within the WLCG DOMA working group's TPC
activity. This protocol also provides the first real use-case for token-based
authorisation for this community. We will demonstrate the benefits of such
authorisation by showing how it allows HTTP-TPC to support new technologies
(such as OAuth, OpenID Connect, Macaroons and SciTokens) without changing the
protocol. We will also discuss the next steps for HTTP-TPC and the plans to use
the protocol for WLCG transfers.
Comment: 7 pages, 3 figures, to appear in the proceedings of CHEP 202
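As a rough sketch of what a pull-mode HTTP-TPC exchange looks like, the snippet below assembles the request line and headers for a third-party COPY, where the client asks the destination to fetch the file itself. Hostnames, paths, and the token value are hypothetical, and this is a simplified rendering of the protocol, not a complete client.

```python
# Sketch of a pull-mode HTTP-TPC request. The client sends COPY to the
# destination with a Source header; the TransferHeader* prefix marks
# headers (here, credentials) to forward on the remote leg.

def build_tpc_copy(dest_host, dest_path, source_url, token):
    """Return the raw request line and headers for a third-party COPY."""
    return "\r\n".join([
        f"COPY {dest_path} HTTP/1.1",
        f"Host: {dest_host}",
        f"Source: {source_url}",
        f"Authorization: Bearer {token}",                # authorises this request
        f"TransferHeaderAuthorization: Bearer {token}",  # forwarded to the source
        "",
        "",
    ])

request = build_tpc_copy("dest.example.org", "/store/file.root",
                         "https://src.example.org/store/file.root",
                         "TOKEN")
```

Because the credential travels as an opaque bearer header, the same COPY mechanics work whether the token is an OAuth access token, a macaroon, or a SciToken, which is why new authorisation technologies slot in without protocol changes.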
Simulating the ATLAS Distributed Data Management System
The ATLAS Distributed Data Management system organizes more than 90PB of physics data across more than 100 sites globally. Over 14 million files are transferred daily with strongly varying usage patterns. For performance and scalability reasons it is imperative to adapt and improve the data management system continuously. Therefore future system modifications in hardware, software, as well as policy, need to be evaluated to accomplish the intended results and to avoid unwanted side effects. Due to the complexity of large-scale distributed systems this evaluation process is primarily based on expert knowledge, as conventional evaluation methods are inadequate. However, this error-prone process lacks quantitative estimates and can lead to inaccuracies as well as incorrect conclusions. In this work we present a novel, full-scale simulation framework. This modular simulator is able to accurately model the ATLAS Distributed Data Management system. The design and architecture of the component-based software is presented and discussed. The evaluation is based on comparison with historical workloads and concentrates on the accuracy of the simulation framework. Our results show that we can model the distributed data management system to within 80% accuracy.
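To illustrate the kind of modelling such a simulator performs, here is a deliberately tiny sketch of simulating transfer completion times on a single serial link. The link rate and file sizes are made up for illustration; the actual framework models the full distributed system, not one link.

```python
# Toy sketch of event-driven transfer simulation: a single serial link
# processes transfers in submission order. Parameters are illustrative,
# not ATLAS values.

def simulate(transfers, rate_mb_per_s):
    """Each transfer is (submit_time_s, size_mb). Returns the completion
    time of each transfer; a transfer starts when both the link is free
    and the transfer has been submitted."""
    done = []
    clock = 0.0
    for submit, size in sorted(transfers):
        clock = max(clock, submit) + size / rate_mb_per_s
        done.append(clock)
    return done
```

Comparing such simulated completion times against historical workload logs is, in essence, how a simulator's accuracy can be quantified.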
The ATLAS Distributed Data Management System & Databases
The ATLAS Distributed Data Management (DDM) System is responsible for the global management of petabytes of high energy physics data. The current system, DQ2, has a critical dependency on Relational Database Management Systems (RDBMS), like Oracle. RDBMS are well-suited to enforcing data integrity in online transaction processing applications; however, concerns have been raised about their scalability under data warehouse-like workloads. In particular, analysis of archived data or aggregation of transactional data for summary purposes is problematic. Therefore, we have evaluated new approaches to handle vast amounts of data. We have investigated a class of database technologies commonly referred to as NoSQL databases. This includes distributed filesystems, like HDFS, that support parallel execution of computational tasks on distributed data, as well as schema-less approaches via key-value stores, like HBase. In this talk we will describe our use cases in ATLAS, share our experiences with various databases used in production and present the database technologies envisaged for the next-generation DDM system, Rucio. Rucio is an evolution of the ATLAS DDM system which addresses the scalability issues observed in DQ2.
The ATLAS Data Management System Rucio: Supporting LHC Run-2 and beyond
With this contribution we present some recent developments made to Rucio, the data management system of the High-Energy Physics Experiment ATLAS. Already managing 300 Petabytes of both official and user data, Rucio has seen incremental improvements throughout LHC Run-2, and is currently laying the groundwork for HEP computing in the HL-LHC era. This contribution focuses on (a) the automations that have been put in place, such as data rebalancing and dynamic replication of user data, as well as their supporting infrastructures, such as real-time networking metrics and transfer time predictions; (b) the flexible approach towards inclusion of heterogeneous storage systems, including object stores, while unifying the potential access paths using generally available tools and protocols; (c) machine learning approaches to help with transfer throughput estimation; and (d) the adoption of Rucio by two other experiments, AMS and Xenon1t. We conclude by presenting operational numbers and figures to quantify these improvements, and extrapolate the necessary changes and developments for future LHC runs.
C3PO - A Dynamic Data Placement Agent for ATLAS Distributed Data Management
This contribution introduces a new dynamic data placement agent for the ATLAS distributed data management system. This agent is designed to pre-place potentially popular data to make it more widely available. It uses data from a variety of sources. These include input datasets and site workload information from the ATLAS workload management system, network metrics from different sources like FTS and PerfSonar, historical popularity data collected through a tracer mechanism, and more. With this data it decides if, when, and where to place new replicas, which can then be used by the WMS to distribute the workload more evenly over available computing resources and ultimately reduce job waiting times. The new replicas are created with a short lifetime that is extended when the data is accessed, so the system behaves like a large cache. This paper gives an overview of the architecture and the final implementation of this new agent. The paper also includes an evaluation of different placement algorithms by comparing the transfer times and the new replica usage.
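The cache-like lifetime behaviour described above can be sketched very simply: a new replica starts with a short time-to-live that each access pushes further out, so unused replicas expire and can be cleaned up. Class and parameter names below are hypothetical, and the lifetimes are illustrative, not the agent's actual settings.

```python
# Minimal sketch of replica lifetimes that behave like a cache:
# placement sets a short TTL, and each access extends it.

class ReplicaCache:
    def __init__(self, initial_ttl, extension):
        self.initial_ttl = initial_ttl  # lifetime given to a new replica
        self.extension = extension      # extra lifetime granted per access
        self.expiry = {}                # replica -> expiry timestamp

    def place(self, replica, now):
        """Create a new replica with a short initial lifetime."""
        self.expiry[replica] = now + self.initial_ttl

    def access(self, replica, now):
        """Accessing a replica pushes its expiry further out."""
        current = self.expiry.get(replica, now)
        self.expiry[replica] = max(current, now) + self.extension

    def expired(self, now):
        """Replicas whose lifetime has run out and can be removed."""
        return [r for r, t in self.expiry.items() if t <= now]
```

The effect is that popular data stays resident while replicas that were pre-placed but never used age out on their own, with no explicit eviction policy needed.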