35 research outputs found

    The ATLAS EventIndex: a BigData catalogue for all ATLAS experiment events

    The ATLAS EventIndex system comprises the catalogue of all events collected, processed or generated by the ATLAS experiment at the CERN LHC accelerator, and all associated software tools to collect, store and query this information. ATLAS records several billion particle interactions every year of operation, processes them for analysis and generates even larger simulated data samples; a global catalogue is needed to keep track of the location of each event record and be able to search and retrieve specific events for in-depth investigations. Each EventIndex record includes summary information on the event itself and the pointers to the files containing the full event. Most components of the EventIndex system are implemented using BigData open-source tools. This paper describes the architectural choices and their evolution over time, as well as the past, current and foreseen future implementations of all EventIndex components.
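
    As a purely illustrative aside: if an EventIndex record were held in a relational table (the production system actually uses BigData stores, and every name and number below is an assumption rather than the real schema), the record structure and lookup described above might look like this in SQL:

        -- Hypothetical relational sketch of an EventIndex record: summary
        -- data about the event plus a pointer to the file holding it.
        CREATE TABLE event_index (
          run_number   NUMBER       NOT NULL,  -- data-taking run
          event_number NUMBER       NOT NULL,  -- event within the run
          lumi_block   NUMBER,                 -- summary information (illustrative)
          trigger_mask RAW(32),                -- summary information (illustrative)
          file_guid    VARCHAR2(36) NOT NULL,  -- pointer to the file with the full event
          CONSTRAINT event_index_pk
            PRIMARY KEY (run_number, event_number, file_guid)
        );

        -- Retrieve the location of one specific event (placeholder numbers):
        SELECT file_guid
          FROM event_index
         WHERE run_number = 358031
           AND event_number = 1234567;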

    A Roadmap for HEP Software and Computing R&D for the 2020s

    Particle physics has an ambitious and broad experimental programme for the coming decades. This programme requires large investments in detector hardware, either to build new facilities and experiments, or to upgrade existing ones. Similarly, it requires commensurate investment in the R&D of software to acquire, manage, process, and analyse the sheer amounts of data to be recorded. In planning for the HL-LHC in particular, it is critical that all of the collaborating stakeholders agree on the software goals and priorities, and that the efforts complement each other. In this spirit, this white paper describes the R&D activities required to prepare for this software upgrade.

    IT Lightning Talks: session #18

    How can a new database system that manages billions of rows with relatively short lifetimes (weeks to months) be designed? The system should resemble a Whiteboard:
    - data, grouped by given properties into data collections, are added when requested
    - data collections are consumed by the requestor
    - when no longer needed, data collections are removed by a "Whiteboard sponge" process
    I will show which options I explored and which one is potentially the best in terms of efficiency.
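
    A minimal sketch of one way such a whiteboard could be built with Oracle interval partitioning; the table, column and partition names are assumptions for illustration, not the design evaluated in the talk:

        -- One partition per day is created automatically as data arrive.
        CREATE TABLE whiteboard (
          collection_id NUMBER NOT NULL,   -- grouping property (illustrative)
          added_on      DATE   NOT NULL,
          payload       VARCHAR2(4000)
        )
        PARTITION BY RANGE (added_on)
        INTERVAL (NUMTODSINTERVAL(1, 'DAY'))
        (PARTITION p_first VALUES LESS THAN (DATE '2024-01-01'));

        -- "Whiteboard sponge": removing a consumed collection is a cheap
        -- partition drop rather than a bulk DELETE; a DBMS_SCHEDULER job
        -- would loop over all partitions older than the retention window.
        ALTER TABLE whiteboard
          DROP PARTITION FOR (DATE '2024-01-02') UPDATE INDEXES;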

    The one-million table partitions challenge in an ATLAS experiment DB application @ CERN

    The presentation describes the challenges in designing a new DB system in the ATLAS experiment at CERN that needs to manage billions of rows with relatively short lifetimes (weeks to months). For that system an approach of data grouping organised in table partitions might be appropriate. Over the past years in ATLAS we have made use of several partitioning techniques. With the broadly used range partitioning and its extension, automatic interval partitioning, we add our own logic in PL/SQL procedures and scheduler jobs to sustain data sliding windows in order to enforce various data retention policies. We also make use of list, reference and virtual-column-based partitioning. One of our tables in the production DB has 74000+ list partitions (180+ billion rows); another has 20000+ list sub-partitions. However, for the first time we challenge ourselves with the one-million partitions limit per table in the database. The available options, and the one most suitable for our new use case, will be presented to the audience.
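
    One conceivable option, sketched here under the assumption of Oracle 12.2 or later (where the per-table limit is 1048575 partitions and list partitions can be created automatically); all names are purely illustrative and this is not necessarily the choice presented:

        -- Automatic list partitioning: a new partition appears transparently
        -- for every new group value, so the schema scales towards the
        -- one-million-partitions limit without manual DDL.
        CREATE TABLE grouped_data (
          group_id NUMBER NOT NULL,   -- grouping property (illustrative)
          payload  VARCHAR2(4000)
        )
        PARTITION BY LIST (group_id) AUTOMATIC
        (PARTITION p_seed VALUES (0));

        INSERT INTO grouped_data VALUES (42, 'first row of group 42');

        -- Retiring an expired group is a metadata operation, not a bulk DELETE:
        ALTER TABLE grouped_data DROP PARTITION FOR (42) UPDATE INDEXES;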

    The importance of having an appropriate relational data segmentation in ATLAS

    In this paper we describe specific technical solutions put in place in various database applications of the ATLAS experiment at the LHC, where we make use of several partitioning techniques available in Oracle 11g. With the broadly used range partitioning and its option of automatic interval partitioning, we add our own logic in PL/SQL procedures and scheduler jobs to sustain data sliding windows in order to enforce various data retention policies. We also make use of the new Oracle 11g reference partitioning in the Nightly Build System to achieve uniform data segmentation. However, the most challenging issue was segmenting the data of the new ATLAS Distributed Data Management system (Rucio), which resulted in tens of thousands of list-type partitions and sub-partitions. Partition and sub-partition management, index strategy, statistics gathering and query execution plan stability are important factors when choosing an appropriate physical model for the application data management. The knowledge accumulated so far, together with an analysis of new Oracle 12c features that could be beneficial, will be shared with the audience.
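
    For illustration, a minimal sketch of the reference partitioning technique mentioned above; the tables are loosely modelled on a nightly build system, and all names are assumptions rather than the actual schema:

        -- Parent table, range partitioned by date.
        CREATE TABLE nightly_builds (
          build_id   NUMBER PRIMARY KEY,
          build_date DATE NOT NULL
        )
        PARTITION BY RANGE (build_date)
        (PARTITION p_2015h1 VALUES LESS THAN (DATE '2015-07-01'),
         PARTITION p_2015h2 VALUES LESS THAN (DATE '2016-01-01'));

        -- Child table: PARTITION BY REFERENCE stores each result in the
        -- partition of its parent build, giving uniform segmentation
        -- without duplicating the partitioning key in the child.
        CREATE TABLE build_results (
          result_id NUMBER PRIMARY KEY,
          build_id  NUMBER NOT NULL,
          status    VARCHAR2(20),
          CONSTRAINT build_results_fk FOREIGN KEY (build_id)
            REFERENCES nightly_builds (build_id)
        )
        PARTITION BY REFERENCE (build_results_fk);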

    The importance of applying an appropriate data partitioning

    This presentation describes specific technical solutions put in place in various database applications of the ATLAS experiment at the LHC, where we make use of several partitioning techniques available in Oracle 11g. With the broadly used range partitioning and its option of automatic interval partitioning, we add our own logic in PL/SQL procedures and scheduler jobs to sustain data sliding windows in order to enforce various data retention policies. We also make use of the new Oracle 11g reference partitioning in the ATLAS Nightly Build System to achieve uniform data segmentation. However, the most challenging task was segmenting the data of the new ATLAS Distributed Data Management system, which resulted in tens of thousands of list-type partitions and sub-partitions. Partition and sub-partition management, index strategy, statistics gathering and query execution plan stability are important factors when choosing an appropriate physical model for the application data management. The knowledge accumulated so far will be shared with the audience.

    The importance of having an appropriate data segmentation (partitioning)

    This presentation shows real-life examples from database applications of the ATLAS experiment at the LHC, where we make use of many of the partitioning techniques available in Oracle 11g. With the broadly used range partitioning and its option of automatic interval partitioning, we add our own logic in PL/SQL to sustain data sliding windows in order to enforce various data retention policies. We also make use of reference partitioning in some use cases; however, the most challenging task was segmenting the data of a large bookkeeping system, which resulted in tens of thousands of list partitions and list sub-partitions. Partition and sub-partition management, index strategy, statistics gathering and query execution plan stability are important factors when choosing a data management model appropriate for the use case. The experience gained with all of these will be shared with the audience.
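
    For illustration, a made-up miniature of the list/sub-list layout described above (the real bookkeeping system has tens of thousands of such segments; every name here is an assumption):

        -- Composite list-list partitioning: rows are segmented first by
        -- scope, then sub-partitioned by account.
        CREATE TABLE bookkeeping_entries (
          scope   VARCHAR2(25)  NOT NULL,
          account VARCHAR2(25)  NOT NULL,
          name    VARCHAR2(255) NOT NULL
        )
        PARTITION BY LIST (scope)
        SUBPARTITION BY LIST (account)
        (PARTITION p_data VALUES ('data')
           (SUBPARTITION sp_data_prod VALUES ('prod'),
            SUBPARTITION sp_data_user VALUES ('user')),
         PARTITION p_mc VALUES ('mc')
           (SUBPARTITION sp_mc_prod VALUES ('prod'),
            SUBPARTITION sp_mc_user VALUES ('user')));

        -- With both keys constrained, the optimizer prunes to exactly one
        -- sub-partition, which helps keep execution plans stable.
        SELECT COUNT(*) FROM bookkeeping_entries
         WHERE scope = 'data' AND account = 'prod';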

    Evaluation of the ATLAS model for remote access to database resident information for LHC Run 3

    The ATLAS model for remote access to database resident information relies upon a limited set of dedicated and distributed Oracle database repositories complemented with the deployment of Frontier system infrastructure on the WLCG. ATLAS clients with network access can get the database information they need dynamically by submitting requests to a squid server in the Frontier network, which provides results from its cache or passes new requests along the network to launchpads co-located at one of the Oracle sites (the master Oracle database at CERN or one of the Tier 1 Oracle database replicas). Since the beginning of LHC Run 1, the system has evolved in terms of client, squid, and launchpad optimizations, but the distribution model has remained fundamentally unchanged. On the whole, the system has been broadly successful in providing data to clients with relatively few disruptions: thanks to its overall redundancy, it has kept working even while individual site databases were down. At the same time, its quantitative performance characteristics, such as the global throughput of the system, the load distribution between sites, and the constituent interactions that make up the whole, were largely unknown. More recently, information has been collected from launchpad and squid logs into an Elastic Search repository, which has enabled a wide variety of studies of the system. This presentation will describe dedicated studies of the data collected in Elastic Search over the previous year to evaluate the efficacy of the distribution model. Specifically, we will quantify the advantages that the redundancy of the system offers, as well as related aspects such as the geographical dependence of the wait times clients see before getting a response to their requests. These studies are essential so that during LS2 (the long shutdown between LHC Run 2 and Run 3) we can adapt the system in preparation for the expected increase in load in the ramp-up to Run 3 operations.