12 research outputs found

    The ATLAS EventIndex: a BigData catalogue for all ATLAS experiment events

    Full text link
    The ATLAS EventIndex system comprises the catalogue of all events collected, processed or generated by the ATLAS experiment at the CERN LHC accelerator, and all associated software tools to collect, store and query this information. ATLAS records several billion particle interactions every year of operation, processes them for analysis and generates even larger simulated data samples; a global catalogue is needed to keep track of the location of each event record and be able to search and retrieve specific events for in-depth investigations. Each EventIndex record includes summary information on the event itself and the pointers to the files containing the full event. Most components of the EventIndex system are implemented using BigData open-source tools. This paper describes the architectural choices and their evolution in time, as well as the past, current and foreseen future implementations of all EventIndex components.Comment: 21 page

    EOS workshop

    No full text
    Interfacing EOS ACLs to Linux RichACLs triggered some enhancements, and a spin-off

    Hadoop and friends - first experience at CERN with a new platform for high throughput analysis steps

    No full text
    The statistical analysis of infrastructure metrics comes with several specific challenges, including the fairly large volume of unstructured metrics from a large set of independent data sources. Hadoop and Spark provide an ideal environment in particular for the first steps of skimming rapidly through hundreds of TB of low relevance data to find and extract the much smaller data volume that is relevant for statistical analysis and modelling. This presentation will describe the new Hadoop service at CERN and the use of several of its components for high throughput data aggregation and ad-hoc pattern searches. We will describe the hardware setup used, the service structure with a small set of decoupled clusters and the first experience with co-hosting different applications and performing software upgrades. We will further detail the common infrastructure used for data extraction and preparation from continuous monitoring and database input sources

    A study of data representation in Hadoop to optimize data storage and search performance for the ATLAS EventIndex

    No full text
    International audienceThis paper reports on the activities aimed at improving the architecture and performance of the ATLAS EventIndex implementation in Hadoop. The EventIndex contains tens of billions of event records, each of which consists of ∌100 bytes, all having the same probability to be searched or counted. Data formats represent one important area for optimizing the performance and storage footprint of applications based on Hadoop. This work reports on the production usage and on tests using several data formats including Map Files, Apache Parquet, Avro, and various compression algorithms. The query engine plays also a critical role in the architecture. We report also on the use of HBase for the EventIndex, focussing on the optimizations performed in production and on the scalability tests. Additional engines that have been tested include Cloudera Impala, in particular for its SQL interface, and the optimizations for data warehouse workloads and reports

    A study of data representations in Hadoop to optimize data storage and search performance of the ATLAS EventIndex

    No full text
    This paper reports on the activities aimed at improving the architecture and performance of the ATLAS EventIndex implementation in Hadoop. The EventIndex contains tens of billions event records, each of which consisting of ~100 bytes, all having the same probability to be searched or counted. Data formats represent one important area for optimizing the performance and storage footprint of applications based on Hadoop. This work reports on the production usage and on tests using several data formats including Map Files, Apache Parquet, Avro, and various compression algorithms. The query engine plays also a critical role in the architecture. This paper reports on the use of HBase for the EventIndex, focussing on the optimizations performed in production and on the scalability tests. Additional engines that have been tested include Cloudera Impala, in particular for its SQL interface, and the optimizations for data warehouse workloads and reports

    The ATLAS Event Index: The Architecture of the Core Engine

    No full text
    International audienceThe global view of the ATLAS Event Index system has been presented in the 17th ACAT Conference. This article concentrates on the architecture of the system core component. This component handles the final stage of the event metadata import. It organizes its storage and provides a fast and feature-rich access to all information. A user is able to interrogate metadata in various ways, including by executing user-provided code on the data to make selections and to interpret the results. A wide spectrum of clients is available, from a set of Linux-like commands to an interactive graphical Web Service. The stored event metadata contain the basic description of the related events, the references to the experiment event storage and the full trigger record and can be extended with other event characteristics. Derived collections of events can be created. Such collections can be annotated and tagged with further information

    ATLAS EventIndex General Dataflow and Monitoring Infrastructure

    No full text
    The ATLAS EventIndex has been running in production since mid-2015, reliably collecting information worldwide about all produced events and storing them in a central Hadoop infrastructure at CERN. A subset of this information is copied to an Oracle relational database for fast datasets discovery, event-picking, crosschecks with other ATLAS systems and checks for event duplication. The system design and its optimization is serving event picking from requests of a few events up to scales of tens of thousand of events, and in addition, data consistency checks are performed for large production campaigns. Detecting duplicate events with a scope of physics collections has recently arisen as an important use case. This paper describes the general architecture of the project and the data flow and operation issues, which are addressed by recent developments to improve the throughput of the overall system. In this direction, the data collection system is reducing the usage of the messaging infrastructure to overcome the performance shortcomings detected during production peaks; an object storage approach is instead used to convey the event index information, and messages to signal their location and status. Recent changes in the Producer/Consumer architecture are also presented in detail, as well as the monitoring infrastructure

    The ATLAS EventIndex: architecture, design choices, deployment and first operation experience

    No full text
    The EventIndex is the complete catalogue of all ATLAS events, keeping the references to all files that contain a given event in any processing stage. It replaces the TAG database, which had been in use during LHC Run 1. For each event it contains its identifiers, the trigger pattern and the GUIDs of the files containing it. Major use cases are event picking, feeding the Event Service used on some production sites, and technical checks of the completion and consistency of processing campaigns. The system design is highly modular so that its components (data collection system, storage system based on Hadoop, query web service and interfaces to other ATLAS systems) could be developed separately and in parallel during LS1. The EventIndex is in operation for the start of LHC Run 2. This paper describes the high-level system architecture, the technical design choices and the deployment process and issues. The performance of the data collection and storage systems, as well as the query services, are also reported

    The ATLAS EventIndex: architecture, design choices, deployment and first operation experience

    No full text
    The EventIndex is the complete catalogue of all ATLAS events, keeping the references to all files that contain a given event in any processing stage. It replaces the TAG database, which had been in use during LHC Run 1. For each event it contains its identifiers, the trigger pattern and the GUIDs of the files containing it. Major use cases are event picking, feeding the Event Service used on some production sites, and technical checks of the completion and consistency of processing campaigns. The system design is highly modular so that its components (data collection system, storage system based on Hadoop, query web service and interfaces to other ATLAS systems) could be developed separately and in parallel during LS1. The EventIndex is in operation for the start of LHC Run 2. This talk describes the high level system architecture, the technical design choices and the deployment process and issues. The performance of the data collection and storage systems, as well as the query services, will be reported
    corecore