
    The archive solution for distributed workflow management agents of the CMS experiment at LHC

    The CMS experiment at the CERN LHC developed the Workflow Management Archive system to persistently store unstructured framework job report documents produced by distributed workflow management agents. In this paper we present its architecture, implementation, deployment, and integration with the CMS and CERN computing infrastructures, such as the central HDFS and Hadoop Spark cluster. The system leverages modern technologies, such as a document-oriented database and the Hadoop ecosystem, to provide the flexibility needed to reliably process, store, and aggregate O(1M) documents on a daily basis. We describe the data transformation, the short- and long-term storage layers, and the query language, along with the aggregation pipeline developed to visualize various performance metrics and assist CMS data operators in assessing the performance of the CMS computing system.
    Comment: This is a pre-print of an article published in Computing and Software for Big Science. The final authenticated version is available online at: https://doi.org/10.1007/s41781-018-0005-
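    A minimal sketch of the kind of aggregation such a pipeline performs, assuming PySpark over job report documents stored as JSON on HDFS; the path and the field names ("site", "cpu_time", "exit_code") are illustrative placeholders, not the actual WMArchive schema.

```python
# Sketch only: aggregating framework job report (FWJR) documents on HDFS with PySpark.
# The HDFS path and field names are assumptions for illustration.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("fwjr-aggregation").getOrCreate()

# Read one day's worth of FWJR documents from the long-term HDFS storage layer.
fwjr = spark.read.json("hdfs:///path/to/fwjr/2018/01/01/*.json")

# Aggregate simple per-site performance metrics for operator dashboards.
metrics = (
    fwjr.groupBy("site")
        .agg(
            F.count("*").alias("n_jobs"),
            F.avg("cpu_time").alias("avg_cpu_time"),
            F.sum(F.when(F.col("exit_code") != 0, 1).otherwise(0)).alias("n_failed"),
        )
)

metrics.show()
```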

    Gaining insight from large data volumes with ease

    Efficient handling of large data volumes has become a necessity in today's world. It is driven by the desire to gain more insight from the data and a better understanding of user trends, which can be transformed into economic incentives (profits, cost reduction, and various optimizations of data workflows and pipelines). In this paper, we discuss how modern technologies are transforming well-established patterns in HEP communities. New data insight can be achieved by embracing Big Data tools for a variety of use cases, from analytics and monitoring to training Machine Learning models on a terabyte scale. We provide concrete examples within the context of the CMS experiment where Big Data tools already play, or will play, a significant role in daily operations.
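    As a hedged illustration of one use case named above, the sketch below trains a classifier with Spark MLlib directly on distributed data instead of collecting it to a single node; the Parquet path and column names ("n_events", "wall_time", "read_bytes", "failed") are assumptions, not a CMS dataset.

```python
# Sketch only: distributed model training on large tabular data with Spark MLlib.
# Path and column names are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("cms-bigdata-sketch").getOrCreate()

# Monitoring records previously landed on HDFS in a columnar format.
records = spark.read.parquet("hdfs:///path/to/monitoring/records")

# Assemble a feature vector and fit a simple classifier across the cluster.
assembler = VectorAssembler(
    inputCols=["n_events", "wall_time", "read_bytes"], outputCol="features"
)
model = LogisticRegression(featuresCol="features", labelCol="failed").fit(
    assembler.transform(records)
)

print(model.coefficients)
```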

    Towards Provenance and Traceability in CRISTAL for HEP

    This paper discusses the CRISTAL object lifecycle management system and its use in provenance data management and the traceability of system events. The software was initially used to capture the construction and calibration of the CMS ECAL detector at CERN for later use by physicists in their data analysis. Further uses of CRISTAL in different projects (CMS, neuGRID, and N4U) are presented as examples of its flexible data model. From these examples, applications are drawn for the High Energy Physics domain, and some initial ideas for its use in data preservation in HEP are outlined in detail in this paper. Investigations are currently underway to gauge the feasibility of using the N4U Analysis Service, or a derivative of it, to address the requirements of data and analysis logging and provenance capture within the HEP long-term data analysis environment.
    Comment: 5 pages and 1 figure. 20th International Conference on Computing in High Energy and Nuclear Physics (CHEP13), 14-18 October 2013, Amsterdam, Netherlands. To appear in Journal of Physics: Conference Series
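    For illustration only, a minimal data structure of the kind a provenance and traceability record might take; this is not CRISTAL's data model, and all field names are assumptions.

```python
# Sketch only: a plain provenance/traceability record for a single analysis or system event.
# Field names are illustrative assumptions, not the CRISTAL schema.
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ProvenanceRecord:
    """Trace of a single analysis or system event."""
    item_id: str                      # object whose lifecycle is being tracked
    activity: str                     # e.g. "calibration", "reconstruction"
    agent: str                        # user or service that performed the step
    inputs: list[str] = field(default_factory=list)   # identifiers of inputs used
    outputs: list[str] = field(default_factory=list)  # identifiers of produced outputs
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


record = ProvenanceRecord(
    item_id="ECAL-module-0042",
    activity="calibration",
    agent="physicist@cern.ch",
    inputs=["raw-run-123"],
    outputs=["calib-constants-v1"],
)
print(record)
```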

    Automatic log analysis with NLP for the CMS workflow handling

    The central Monte Carlo production of the CMS experiment utilizes the WLCG infrastructure and manages thousands of tasks daily, each comprising up to thousands of jobs. The distributed computing system is bound to sustain a certain rate of failures of various types, which are currently handled by computing operators a posteriori. Within the context of computing operations and operational intelligence, we propose a Machine Learning technique to learn from the operators with a view to reducing the operational workload and delays. This work continues CMS efforts in operational intelligence to reach accurate predictions with Machine Learning. We present an approach that treats the log files of the workflows as regular text in order to leverage modern techniques from Natural Language Processing (NLP). In general, log files contain a substantial amount of text that is not human language. Therefore, different log parsing approaches are studied in order to map the log files' words to high-dimensional vectors. These vectors are then used as a feature space to train a model that predicts the action the operator has to take. This approach has the advantage that the information in the log files is extracted automatically and the format of the logs can be arbitrary. In this work the performance of the log file analysis with NLP is presented and compared to previous approaches.
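    A minimal sketch of the general approach described above, not the paper's exact pipeline: treat each workflow log as plain text, vectorize it (TF-IDF here, whereas the paper studies several log-parsing and embedding schemes), and train a classifier that predicts the operator action. The file layout and the label file are assumptions for illustration.

```python
# Sketch only: log files as text -> TF-IDF vectors -> classifier predicting operator action.
# "logs/*.log" and "labels.txt" (one label per log, same order) are assumed inputs.
from pathlib import Path
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

logs = [p.read_text(errors="ignore") for p in sorted(Path("logs").glob("*.log"))]
actions = Path("labels.txt").read_text().splitlines()  # e.g. "acdc", "clone", "kill"

X_train, X_test, y_train, y_test = train_test_split(logs, actions, test_size=0.2)

model = make_pipeline(
    TfidfVectorizer(max_features=50_000, token_pattern=r"\S+"),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)
print("accuracy:", model.score(X_test, y_test))
```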

    3rd EGEE User Forum

    We have organized this book as a sequence of chapters, each associated with an application or technical theme and introduced by an overview of its contents and a summary of the main conclusions drawn from the Forum on that topic. The first chapter gathers all the plenary session keynote addresses; following this there is a sequence of chapters covering the application-flavoured sessions, followed by chapters with a Computer Science and Grid Technology flavour. The final chapter covers the large number of practical demonstrations and posters exhibited at the Forum. Much of the work presented has a direct link to specific areas of Science, so we have created a Science Index, presented below. In addition, at the end of this book, we provide a complete list of the institutes and countries involved in the User Forum.

    Grid Virtualization Engine: Design, Implementation, and Evaluation
