34 research outputs found

    The ATLAS EventIndex: a BigData catalogue for all ATLAS experiment events

    The ATLAS EventIndex system comprises the catalogue of all events collected, processed or generated by the ATLAS experiment at the CERN LHC accelerator, and all associated software tools to collect, store and query this information. ATLAS records several billion particle interactions every year of operation, processes them for analysis and generates even larger simulated data samples; a global catalogue is needed to keep track of the location of each event record and to be able to search and retrieve specific events for in-depth investigations. Each EventIndex record includes summary information on the event itself and the pointers to the files containing the full event. Most components of the EventIndex system are implemented using BigData open-source tools. This paper describes the architectural choices and their evolution in time, as well as the past, current and foreseen future implementations of all EventIndex components.
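
    The record structure sketched in this abstract (event summary information plus pointers to the files holding the full event) can be illustrated with a minimal Python sketch; the field names and values below are assumptions for illustration, not the actual ATLAS schema.

        from dataclasses import dataclass

        @dataclass
        class EventIndexRecord:
            """One illustrative EventIndex entry (field names are assumptions)."""
            run_number: int        # data-taking run the event belongs to
            event_number: int      # event number within that run
            trigger_summary: str   # compact summary of the trigger decisions
            file_guid: str         # GUID of the file containing the full event record
            internal_pointer: int  # position of the event inside that file

        # Retrieving an event then means resolving file_guid to a physical replica
        # and seeking to internal_pointer inside it.
        rec = EventIndexRecord(358031, 1234567890, "HLT_mu26_ivarmedium", "3f2a-guid", 4711)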

    Deployment and Operation of the ATLAS EventIndex for LHC Run 3

    The ATLAS EventIndex is the global catalogue of all ATLAS real and simulated events. During the LHC long shutdown between Run 2 (2015-2018) and Run 3 (2022-2025) all its components were substantially revised and a new system was deployed for the start of Run 3 in Spring 2022. The new core storage system, based on HBase tables with a SQL interface provided by Phoenix, allows much faster data ingestion rates and scales much better than the old one to the data rates expected for the end of Run 3 and beyond. All user interfaces were also revised, and a new command-line interface and new web services were deployed. The new system was initially populated with all existing data relative to the Run 1 and Run 2 datasets, and then put online to receive Run 3 data in real time. After extensive testing, the old system, which ran in parallel to the new one for a few months, was finally switched off in October 2022. This paper describes the new system, the migration of all existing data from the old to the new storage schemas, and the operational experience gathered so far.
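
    As a rough illustration of what an SQL layer over HBase enables, the following sketch uses the phoenixdb Python driver against a Phoenix Query Server; the server URL, table and column names are placeholders, not the real ATLAS deployment.

        import phoenixdb

        # Connect to a Phoenix Query Server (URL is a placeholder).
        conn = phoenixdb.connect("http://phoenix-query-server:8765/", autocommit=True)
        cur = conn.cursor()

        # Hypothetical event lookup: find the file GUID(s) for a given run/event pair.
        cur.execute(
            "SELECT FILE_GUID, INTERNAL_POINTER FROM EVENT_INDEX "
            "WHERE RUN_NUMBER = ? AND EVENT_NUMBER = ?",
            (358031, 1234567890),
        )
        for guid, pointer in cur.fetchall():
            print(guid, pointer)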

    ATLAS EventIndex Data Collection Supervisor and Web Interface

    The EventIndex project consists of the development and deployment of a complete catalogue of events for the ATLAS experiment [1][2] at the LHC accelerator at CERN. In 2015 the ATLAS experiment produced 12 billion real events in 1 million files, and 5 billion simulated events in 8 million files. The ATLAS EventIndex has been running in production since mid-2015, reliably collecting information worldwide about all produced events and storing it in a central Hadoop infrastructure. A subset of this information is copied to an Oracle relational database. This paper presents two components of the ATLAS EventIndex [3]: its data collection supervisor and its web interface.

    Distributed Data Collection for the ATLAS EventIndex

    The ATLAS EventIndex contains records of all events processed by ATLAS, in all processing stages. These records include the references to the files containing each event (the GUID of the file) and the internal “pointer” to each event in the file. This information is collected by all jobs that run at Tier-0 or on the Grid and process ATLAS events. Each job produces a snippet of information for each permanent output file. This information is packed and transferred to a central broker at CERN using an ActiveMQ messaging system, and then unpacked, sorted and reformatted in order to be stored and catalogued in a central Hadoop server. This contribution describes in detail the Producer/Consumer architecture used to convey this information from the running jobs through the messaging system to the Hadoop server.
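
    A minimal sketch of the producer side of such a chain, assuming the stomp.py client and STOMP access to the ActiveMQ broker; the broker address, queue name and payload fields are illustrative, not the actual ATLAS configuration.

        import json
        import stomp

        # Packed snippet of index information for one permanent output file (fields assumed).
        payload = json.dumps({
            "file_guid": "3f2a-guid",
            "events": [{"run": 358031, "event": 1234567890, "pointer": 4711}],
        })

        # Connect to the central broker over STOMP and send the snippet to a queue.
        conn = stomp.Connection([("activemq-broker.example.org", 61613)])
        conn.connect("producer", "secret", wait=True)
        conn.send(destination="/queue/eventindex.data", body=payload)
        conn.disconnect()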

    Designing Alternative Transport Methods for the Distributed Data Collection of ATLAS EventIndex Project

    One of the key and challenging tasks of the ATLAS EventIndex project is to index and catalogue all the produced events, not only at CERN but also at hundreds of worldwide grid sites, and to convey the data in real time to a central Hadoop instance at CERN. While this distributed data collection is currently operating correctly in production, there are some issues that might impose performance bottlenecks in the future, with an expected rise in the event production and reprocessing rates. In this work, we first describe the current approach, based on a messaging system that conveys the data from the sources to the central catalogue, and we identify some weaknesses of this system. Then, we study a promising alternative transport method based on an object store, presenting a performance comparison with the current approach and the architectural design changes needed to adapt the system to the next run of the ATLAS experiment at CERN.
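
    A rough sketch of the object-store alternative, assuming an S3-compatible endpoint accessed with boto3; the endpoint, credentials, bucket and key names are placeholders. The point is that the bulky index payload goes to the object store, while only a small reference travels through the messaging system.

        import json
        import boto3

        # Upload the packed index data for one output file to an S3-compatible object store.
        s3 = boto3.client("s3",
                          endpoint_url="https://objectstore.example.org",
                          aws_access_key_id="ACCESS_KEY",
                          aws_secret_access_key="SECRET_KEY")
        object_key = "eventindex/3f2a-guid.json"
        payload = {"events": [{"run": 358031, "event": 1234567890, "pointer": 4711}]}
        s3.put_object(Bucket="eventindex-staging", Key=object_key, Body=json.dumps(payload))

        # Only this small reference would then be sent via the messaging system.
        reference = json.dumps({"bucket": "eventindex-staging", "key": object_key})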

    Distributed Data Collection For Next Generation ATLAS EventIndex Project

    The ATLAS EventIndex currently runs in production in order to build a complete catalogue of events for experiments with large amounts of data. The current approach is to index all final produced data files at the CERN Tier-0 and at hundreds of grid sites, with a distributed data collection architecture that uses Object Stores to temporarily maintain the conveyed information, with references to them sent via a Messaging System. The final backend of all the indexed data is a central Hadoop infrastructure at CERN; an Oracle relational database is used for faster access to a subset of this information. In the future of ATLAS, the event, rather than the file, should become the atomic information unit for metadata, in order to accommodate future data processing and storage technologies. Files will no longer be static quantities: they may dynamically aggregate data and allow event-level granularity of processing in heavily parallel computing environments. This also simplifies the handling of loss and/or extension of data. In this sense the EventIndex will evolve towards a generalized event WhiteBoard, with the ability to build collections and virtual datasets for end users. This paper describes the current Distributed Data Collection Architecture of the ATLAS EventIndex project, with details of the Producer, Consumer and Supervisor entities, and of the protocol and information temporarily stored in the ObjectStore. It also shows the data flow rates and performance achieved since the new approach, with the Object Store as a temporary store, was put in production in July 2017. We review the challenges imposed by the expected increasing rates, which will reach 35 billion new real events per year in Run 3 and 100 billion new real events per year in Run 4. For simulated events the numbers are even higher, with 100 billion events/year in Run 3 and 300 billion events/year in Run 4.
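
    To put the quoted yearly figures in perspective, a back-of-the-envelope conversion into average ingestion rates (assuming uniform production over the year, so real peaks would be substantially higher):

        SECONDS_PER_YEAR = 365 * 24 * 3600  # about 3.15e7 s

        for label, events_per_year in [
            ("Run 3, real", 35e9), ("Run 4, real", 100e9),
            ("Run 3, simulated", 100e9), ("Run 4, simulated", 300e9),
        ]:
            # Average rate the distributed collection system must sustain.
            print(f"{label}: ~{events_per_year / SECONDS_PER_YEAR:,.0f} events/s")

        # Output ranges from ~1,110 events/s (Run 3, real) to ~9,513 events/s (Run 4, simulated).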

    Distributed Data Collection for the Next Generation ATLAS EventIndex Project

    The ATLAS EventIndex currently runs in production in order to build a complete catalogue of events for experiments with large amounts of data. The current approach is to index all final produced data files at the CERN Tier-0 and at hundreds of grid sites, with a distributed data collection architecture that uses Object Stores to temporarily maintain the conveyed information, with references to them sent via a Messaging System. The final backend of all the indexed data is a central Hadoop infrastructure at CERN; an Oracle relational database is used for faster access to a subset of this information. In the future of ATLAS, the event, rather than the file, should become the atomic information unit for metadata, in order to accommodate future data processing and storage technologies. Files will no longer be static quantities: they may dynamically aggregate data and allow event-level granularity of processing in heavily parallel computing environments. This also simplifies the handling of loss and/or extension of data. In this sense the EventIndex may evolve towards a generalized whiteboard, with the ability to build collections and virtual datasets for end users. This paper describes the current Distributed Data Collection Architecture of the ATLAS EventIndex project, with details of the Producer, Consumer and Supervisor entities, and of the protocol and information temporarily stored in the ObjectStore. It also shows the data flow rates and performance achieved since the new approach, with the Object Store as a temporary store, was put in production in July 2017. We review the challenges imposed by the expected increasing rates, which will reach 35 billion new real events per year in Run 3 and 100 billion new real events per year in Run 4. For simulated events the numbers are even higher, with 100 billion events/year in Run 3 and 300 billion events/year in Run 4. We also outline the challenges we face in order to accommodate future use cases in the EventIndex.
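
    The whiteboard idea of building user-defined collections can likewise be illustrated with a minimal sketch: a virtual dataset is just a named set of event references, with no new copy of the event data; the class and field names are assumptions.

        from dataclasses import dataclass, field
        from typing import List

        @dataclass
        class EventRef:
            """Pointer to one event, sufficient to retrieve it from its file (illustrative)."""
            file_guid: str
            internal_pointer: int

        @dataclass
        class VirtualDataset:
            """A named collection of event references, built by selection rather than copying."""
            name: str
            events: List[EventRef] = field(default_factory=list)

        # Example: a user-level collection assembled from EventIndex query results.
        vds = VirtualDataset("my_dimuon_candidates")
        vds.events.append(EventRef("3f2a-guid", 4711))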

    ATLAS EventIndex General Dataflow and Monitoring Infrastructure

    The ATLAS EventIndex has been running in production since mid-2015, reliably collecting information worldwide about all produced events and storing it in a central Hadoop infrastructure at CERN. A subset of this information is copied to an Oracle relational database for fast dataset discovery, event picking, cross-checks with other ATLAS systems and checks for event duplication. The system design and its optimization serve event-picking requests ranging from a few events up to tens of thousands of events; in addition, data consistency checks are performed for large production campaigns. Detecting duplicate events within the scope of physics collections has recently arisen as an important use case. This paper describes the general architecture of the project and the data flow and operation issues, which are addressed by recent developments to improve the throughput of the overall system. In this direction, the data collection system is reducing the usage of the messaging infrastructure to overcome the performance shortcomings detected during production peaks; an object storage approach is instead used to convey the event index information, with messages only signalling its location and status. Recent changes in the Producer/Consumer architecture are also presented in detail, as well as the monitoring infrastructure.
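
    A duplicate-event check of the kind mentioned here reduces to grouping on the event identifiers. The sketch below, using the cx_Oracle driver against a hypothetical table in the Oracle copy, only illustrates the shape of such a query; the connection string, table and column names are not the ATLAS schema.

        import cx_Oracle

        # Connection string and schema are placeholders.
        conn = cx_Oracle.connect("reader/secret@eventindex-db")
        cur = conn.cursor()

        # Events appearing more than once within one physics collection are duplicates.
        cur.execute("""
            SELECT run_number, event_number, COUNT(*) AS n
              FROM event_index
             WHERE collection = :coll
             GROUP BY run_number, event_number
            HAVING COUNT(*) > 1
        """, coll="physics_Main")
        for run, event, n in cur:
            print(f"duplicate: run {run}, event {event}, {n} copies")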

    A prototype for the evolution of ATLAS EventIndex based on Apache Kudu storage

    The ATLAS EventIndex has been in operation since the beginning of LHC Run 2 in 2015. Like all software projects, its components have been constantly evolving and improving in performance. The main data store in Hadoop, based on MapFiles and HBase, can work for the rest of Run 2, but new solutions are being explored for the future. Kudu offers an interesting environment, with a mixture of BigData and relational database features, which looks promising at the design level. This environment is used to build a prototype to measure the scaling capabilities as functions of data input rates, total data volumes and data query and retrieval rates. In these proceedings we report on the selected data schemas and on the current performance measurements obtained with the Kudu prototype.
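
    For a sense of what a Kudu data schema looks like at the API level, here is a minimal sketch with the kudu-python client; the master address, table name, columns and partitioning are assumptions for illustration, not the schemas actually evaluated in the paper.

        import kudu
        from kudu.client import Partitioning

        # Connect to a Kudu master (address is a placeholder).
        client = kudu.connect(host="kudu-master.example.org", port=7051)

        # Illustrative event-index table, keyed by run and event number.
        builder = kudu.schema_builder()
        builder.add_column("run_number", kudu.int32, nullable=False)
        builder.add_column("event_number", kudu.int64, nullable=False)
        builder.add_column("file_guid", kudu.string)
        builder.add_column("internal_pointer", kudu.int64)
        builder.set_primary_keys(["run_number", "event_number"])
        schema = builder.build()

        # Hash-partition on the key so ingestion spreads across tablet servers.
        partitioning = Partitioning().add_hash_partitions(
            column_names=["run_number", "event_number"], num_buckets=16)
        client.create_table("event_index_prototype", schema, partitioning)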

    A prototype for the evolution of ATLAS EventIndex based on Apache Kudu storage

    The ATLAS EventIndex has been in operation since the beginning of LHC Run 2 in 2015. Like all software projects, its components have been constantly evolving and improving in performance. The main data store in Hadoop, based on MapFiles and HBase, can work for the rest of Run 2, but new solutions are being explored for the future. Kudu offers an interesting environment, with a mixture of BigData and relational database features, which looks promising at the design level. This environment is used to build a prototype to measure the scaling capabilities as functions of data input rates, total data volumes and data query and retrieval rates. In these proceedings we report on the selected data schemas and on the current performance measurements obtained with the Kudu prototype.