12 research outputs found

The ATLAS EventIndex: a BigData catalogue for all ATLAS experiment events

Author: Aleksandrov Igor
Alexandrov Evgeny
Baranowski Zbigniew
Barberis Dario
Canali Luca
Casani Alvaro Fernandez
Cherepanova Elizaveta
de la Hoz Santiago Gonzalez
Dimitrov Gancho
Favareto Andrea
Gallas Elizabeth J.
Hrivnac Julius
Iakovlev Alexander
Kazymov Andrei
Mineev Mikhail
Montoro Carlos Garcia
Perez Miguel Villaplana
Prokoshin Fedor
Rybkin Grigori
Salt Jose
Sanchez Javier
Sorokoletov Roman
Toebbicke Rainer
Vasileva Petya
Yuan Ruijun
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 12/03/2023

The ATLAS EventIndex system comprises the catalogue of all events collected, processed or generated by the ATLAS experiment at the CERN LHC accelerator, and all associated software tools to collect, store and query this information. ATLAS records several billion particle interactions every year of operation, processes them for analysis and generates even larger simulated data samples; a global catalogue is needed to keep track of the location of each event record and be able to search and retrieve specific events for in-depth investigations. Each EventIndex record includes summary information on the event itself and the pointers to the files containing the full event. Most components of the EventIndex system are implemented using BigData open-source tools. This paper describes the architectural choices and their evolution in time, as well as the past, current and foreseen future implementations of all EventIndex components. Comment: 21 page

arXiv.org e-Print Archive

EOS workshop

Author: Toebbicke Rainer
Publication date: 01/01/2019

Interfacing EOS ACLs to Linux RichACLs triggered some enhancements, and a spin-off

CERN Document Server

Hadoop and friends - first experience at CERN with a new platform for high throughput analysis steps

Author: Duellmann Dirk
Menichetti Luca
Surdy Kacper
Toebbicke Rainer
Publication venue: 'IOP Publishing'
Publication date: 01/01/2017

The statistical analysis of infrastructure metrics comes with several specific challenges, including the fairly large volume of unstructured metrics from a large set of independent data sources. Hadoop and Spark provide an ideal environment, in particular for the first steps of skimming rapidly through hundreds of TB of low-relevance data to find and extract the much smaller data volume that is relevant for statistical analysis and modelling. This presentation will describe the new Hadoop service at CERN and the use of several of its components for high-throughput data aggregation and ad-hoc pattern searches. We will describe the hardware setup used, the service structure with a small set of decoupled clusters and the first experience with co-hosting different applications and performing software upgrades. We will further detail the common infrastructure used for data extraction and preparation from continuous monitoring and database input sources.

CERN Document Server

A study of data representation in Hadoop to optimize data storage and search performance for the ATLAS EventIndex

Author: Baranowski Zbigniew
Barberis Dario
Canali Luca
Hrivnac Julius
Toebbicke Rainer
Publication venue: 'IOP Publishing'
Publication date: 10/10/2016

This paper reports on the activities aimed at improving the architecture and performance of the ATLAS EventIndex implementation in Hadoop. The EventIndex contains tens of billions of event records, each of which consists of ∼100 bytes, all having the same probability to be searched or counted. Data formats represent one important area for optimizing the performance and storage footprint of applications based on Hadoop. This work reports on the production usage and on tests using several data formats including Map Files, Apache Parquet, Avro, and various compression algorithms. The query engine also plays a critical role in the architecture. We also report on the use of HBase for the EventIndex, focussing on the optimizations performed in production and on the scalability tests. Additional engines that have been tested include Cloudera Impala, in particular for its SQL interface, and the optimizations for data warehouse workloads and reports.

HAL-IN2P3

Crossref

CERN Document Server

A study of data representations in Hadoop to optimize data storage and search performance of the ATLAS EventIndex

Author: Baranowski Zbigniew
Barberis Dario
Canali Luca
Hrivnac Julius
Toebbicke Rainer
Publication date: 21/09/2016

This paper reports on the activities aimed at improving the architecture and performance of the ATLAS EventIndex implementation in Hadoop. The EventIndex contains tens of billions of event records, each of which consists of ~100 bytes, all having the same probability to be searched or counted. Data formats represent one important area for optimizing the performance and storage footprint of applications based on Hadoop. This work reports on the production usage and on tests using several data formats including Map Files, Apache Parquet, Avro, and various compression algorithms. The query engine also plays a critical role in the architecture. This paper reports on the use of HBase for the EventIndex, focussing on the optimizations performed in production and on the scalability tests. Additional engines that have been tested include Cloudera Impala, in particular for its SQL interface, and the optimizations for data warehouse workloads and reports.

CERN Document Server

The ATLAS Event Index: The Architecture of the Core Engine

Author: Baranowski Zbigniew
Barberis Dario
Favareto Andrea
Hřivnáč Julius
Prokoshin Fedor
Rybkin Grigori
Toebbicke Rainer
Yuan Ruijun
Publication venue: 'IOP Publishing'
Publication date: 16/08/2017

The global view of the ATLAS Event Index system was presented at the 17th ACAT Conference. This article concentrates on the architecture of the system core component. This component handles the final stage of the event metadata import. It organizes its storage and provides fast and feature-rich access to all information. A user is able to interrogate metadata in various ways, including by executing user-provided code on the data to make selections and to interpret the results. A wide spectrum of clients is available, from a set of Linux-like commands to an interactive graphical Web Service. The stored event metadata contain the basic description of the related events, the references to the experiment event storage and the full trigger record, and can be extended with other event characteristics. Derived collections of events can be created. Such collections can be annotated and tagged with further information.

HAL-IN2P3

CERN Document Server

Hal-Diderot

ATLAS EventIndex General Dataflow and Monitoring Infrastructure

Author: Barberis Dario
Favareto Andrea
Fernandez Casani Alvaro
Garcia Montoro Carlos
Gonzalez de la Hoz Santiago
Hrivnac Julius
Prokoshin Fedor
Salt Jose
Sanchez Javier
Toebbicke Rainer
Yuan Ruijun
Publication date: 24/09/2016

The ATLAS EventIndex has been running in production since mid-2015, reliably collecting information worldwide about all produced events and storing them in a central Hadoop infrastructure at CERN. A subset of this information is copied to an Oracle relational database for fast dataset discovery, event picking, crosschecks with other ATLAS systems and checks for event duplication. The system design and its optimization serve event-picking requests ranging from a few events up to tens of thousands of events; in addition, data consistency checks are performed for large production campaigns. Detecting duplicate events within the scope of physics collections has recently arisen as an important use case. This paper describes the general architecture of the project and the data flow and operation issues, which are addressed by recent developments to improve the throughput of the overall system. In this direction, the data collection system is reducing the usage of the messaging infrastructure to overcome the performance shortcomings detected during production peaks; an object storage approach is used instead to convey the event index information, with messages signalling its location and status. Recent changes in the Producer/Consumer architecture are also presented in detail, as well as the monitoring infrastructure.

CERN Document Server

The ATLAS Event Index: The Architecture of the Core Engine

Author: Andrea Favareto
Barberis D
Dario Barberis
Fedor Prokoshin
Grigori Rybkin
Julius Hřivnáč
Ormancey E
Rainer Toebbicke
Ruijun Yuan
The ATLAS Collaboration
Zbigniew Baranowski
Publication venue: 'IOP Publishing'

Crossref

The ATLAS EventIndex: architecture, design choices, deployment and first operation experience

Author: Barberis Dario
Cardenas Zarate Simon Ernesto
Cranshaw Jack
Favareto Andrea
Fernandez Casani Alvaro
Gallas Elizabeth
Glasman Claudia
Gonzalez de la Hoz Santiago
Hrivnac Julius
Malon David
Prokoshin Fedor
Salt José
Sánchez Javier
Toebbicke Rainer
Yuan Ruijun
Publication date: 08/05/2015

The EventIndex is the complete catalogue of all ATLAS events, keeping the references to all files that contain a given event in any processing stage. It replaces the TAG database, which had been in use during LHC Run 1. For each event it contains its identifiers, the trigger pattern and the GUIDs of the files containing it. Major use cases are event picking, feeding the Event Service used on some production sites, and technical checks of the completion and consistency of processing campaigns. The system design is highly modular so that its components (data collection system, storage system based on Hadoop, query web service and interfaces to other ATLAS systems) could be developed separately and in parallel during LS1. The EventIndex is in operation for the start of LHC Run 2. This paper describes the high-level system architecture, the technical design choices and the deployment process and issues. The performance of the data collection and storage systems, as well as of the query services, is also reported.

CERN Document Server

The ATLAS EventIndex: architecture, design choices, deployment and first operation experience

Author: Barberis Dario
Cardenas Zarate Simon Ernesto
Cranshaw Jack
Favareto Andrea
Fernandez Casani Alvaro
Gallas Elizabeth
Glasman Claudia
Gonzalez de la Hoz Santiago
Hrivnac Julius
Malon David
Prokoshin Fedor
Rainer Toebbicke
Salt José
Sánchez Javier
Yuan Ruijun
Publication date: 23/03/2015

The EventIndex is the complete catalogue of all ATLAS events, keeping the references to all files that contain a given event in any processing stage. It replaces the TAG database, which had been in use during LHC Run 1. For each event it contains its identifiers, the trigger pattern and the GUIDs of the files containing it. Major use cases are event picking, feeding the Event Service used on some production sites, and technical checks of the completion and consistency of processing campaigns. The system design is highly modular so that its components (data collection system, storage system based on Hadoop, query web service and interfaces to other ATLAS systems) could be developed separately and in parallel during LS1. The EventIndex is in operation for the start of LHC Run 2. This talk describes the high-level system architecture, the technical design choices and the deployment process and issues. The performance of the data collection and storage systems, as well as of the query services, will be reported.

CERN Document Server