
    Disk storage management for LHCb based on Data Popularity estimator

    This paper presents an algorithm that provides recommendations for optimizing LHCb data storage. The LHCb data storage system is a hybrid system: all datasets are kept as archives on magnetic tape, and the most popular datasets are also kept on disk. The algorithm takes the dataset usage history and metadata (size, type, configuration, etc.) and generates a recommendation report. This article presents how we use machine learning algorithms to predict future data popularity. Using these predictions, it is possible to estimate which datasets should be removed from disk. We use regression algorithms and time series analysis to find the optimal number of replicas for datasets that are kept on disk. Based on the popularity predictions and the replica-number optimization, the algorithm minimizes a loss function to find the optimal data distribution. The loss function encodes all requirements for data distribution in the storage system. We demonstrate how our algorithm helps to save disk space and to reduce waiting times for jobs using this data.
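
    The abstract does not spell out the loss function, so the sketch below is only a toy illustration of the general idea: given predicted popularity and dataset sizes (both hypothetical here), search over disk replica counts to trade job waiting time against disk capacity. All names and penalty weights are assumptions.

        import numpy as np
        from itertools import product

        # Hypothetical inputs: predicted future accesses and sizes per dataset.
        popularity = np.array([120.0, 3.0, 45.0, 0.2])
        sizes_gb = np.array([500.0, 2000.0, 800.0, 1500.0])
        disk_capacity_gb = 4000.0

        def loss(replicas):
            # Jobs wait longer when popular data has few disk replicas;
            # datasets with zero disk replicas must be recalled from tape.
            on_disk = replicas > 0
            wait = np.sum(popularity[on_disk] / replicas[on_disk])
            wait += 10.0 * np.sum(popularity[~on_disk])  # assumed tape penalty
            # Hard requirement: the chosen replicas must fit on disk.
            used = np.sum(replicas * sizes_gb)
            return wait + (np.inf if used > disk_capacity_gb else 0.0)

        # Exhaustive search over small replica counts (toy scale only).
        best = min(product(range(4), repeat=len(sizes_gb)),
                   key=lambda r: loss(np.array(r)))
        print("replica counts:", best)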

    GRID Storage Optimization in Transparent and User-Friendly Way for LHCb Datasets

    The LHCb collaboration is one of the four major experiments at the Large Hadron Collider at CERN. Many petabytes of data are produced by the detectors and by Monte-Carlo simulations. The LHCb Grid interware, LHCbDIRAC, is used to make the data available to all collaboration members around the world, and the data is replicated to Grid sites in different locations. However, the Grid disk storage is limited and does not allow keeping a replica of every file at all sites, so it is essential to optimize the number of replicas to achieve better Grid performance. In this study, we present a new data replication and distribution strategy based on data popularity prediction. The popularity prediction is based on the data access history and metadata, and uses machine learning techniques and time series analysis methods.
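
    As a concrete, hedged illustration of popularity prediction from access history (the paper's exact features and models are not given in the abstract), one can build simple time-series features per dataset and fit a regression model; everything below, from the Poisson toy history to the feature choices, is an assumption.

        import numpy as np
        from sklearn.ensemble import GradientBoostingRegressor

        rng = np.random.default_rng(0)
        # Toy access history: weekly access counts for 200 datasets, 26 weeks.
        history = rng.poisson(lam=5.0, size=(200, 26)).astype(float)

        def features(h):
            # Per-dataset time-series features: overall mean, recent mean,
            # linear trend, and last observed value.
            weeks = np.arange(h.shape[1])
            slope = np.polyfit(weeks, h.T, deg=1)[0]
            return np.column_stack([h.mean(axis=1), h[:, -4:].mean(axis=1),
                                    slope, h[:, -1]])

        X = features(history[:, :-1])   # features from all but the last week
        y = history[:, -1]              # target: next week's access count
        model = GradientBoostingRegressor().fit(X, y)
        predicted_popularity = model.predict(X)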

    Generalization of Change-Point Detection in Time Series Data Based on Direct Density Ratio Estimation

    The goal of change-point detection is to discover changes in the distribution of time series data. Some of the state-of-the-art approaches to change-point detection are based on direct density ratio estimation. In this work we show how existing algorithms can be generalized using various binary classification and regression models. In particular, we show that gradient boosting over decision trees and neural networks can be used for this purpose. The algorithms are tested on several synthetic and real-world datasets. The results show that the proposed methods outperform the classical RuLSIF algorithm. A discussion of cases where the proposed algorithms have advantages over existing methods is also provided.
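
    The core trick behind the classifier-based generalization can be shown in a few lines: train a binary classifier to separate samples before and after a candidate point, convert its probabilities into a density ratio estimate, and use a divergence estimate as the change score. The sketch below is a minimal illustration with assumed window sizes and a gradient boosting classifier standing in for the models studied in the paper.

        import numpy as np
        from sklearn.ensemble import GradientBoostingClassifier

        rng = np.random.default_rng(1)
        # Toy series with a variance change at t = 500.
        x = np.concatenate([rng.normal(0, 1, 500), rng.normal(0, 3, 500)])

        def change_score(t, width=100, lags=5):
            # Lag-vector embedding of the series around candidate point t.
            emb = np.array([x[i - lags:i] for i in range(t - width, t + width)])
            labels = np.array([0] * width + [1] * width)  # before vs after t
            clf = GradientBoostingClassifier().fit(emb, labels)
            p = clf.predict_proba(emb)[:, 1].clip(1e-3, 1 - 1e-3)
            ratio = p / (1 - p)   # classifier output as a density ratio estimate
            # Plug-in KL-type divergence between the two windows.
            return float(np.mean(np.log(ratio[labels == 1])))

        ts = list(range(200, 801, 50))
        scores = [change_score(t) for t in ts]
        print("most likely change point near t =", ts[int(np.argmax(scores))])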

    Toward an understanding of the properties of neural network approaches for supernovae light curve approximation

    Modern time-domain photometric surveys collect a large number of observations of various astronomical objects, and the coming era of large-scale surveys will provide even more information. Most of these objects have never received a spectroscopic follow-up, which is especially crucial for transients, e.g. supernovae. In such cases, observed light curves can offer an affordable alternative. Time series are actively used for photometric classification and characterization, such as peak and luminosity decline estimation. However, the collected time series are multidimensional, irregularly sampled, contain outliers, and do not have well-defined systematic uncertainties. Machine learning methods help extract useful information from the available data in the most efficient way. We consider several light curve approximation methods based on neural networks: Multilayer Perceptrons, Bayesian Neural Networks, and Normalizing Flows, applied to the observations of a single light curve. Tests using both the simulated PLAsTiCC and real Zwicky Transient Facility data samples demonstrate that even a few observations are enough to fit the networks and achieve better approximation quality than other state-of-the-art methods. We show that the methods described in this work have better computational complexity and work faster than Gaussian Processes. We analyze the performance of the approximation techniques for filling gaps in light curve observations, and show that the use of an appropriate technique increases the accuracy of peak finding and supernova classification. In addition, the results of the study are organized in the Fulu Python library, available on GitHub, which can be easily used by the community.
    Comment: Submitted to MNRAS. 14 pages, 6 figures, 9 tables.
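
    The Fulu library's exact interface is not described in the abstract, so the sketch below uses a plain scikit-learn Multilayer Perceptron to show the underlying idea: fit a small network to the irregular observations of one object (time and passband in, flux out), then evaluate it on a dense grid to fill the gaps and locate the peak. The toy light curve and all hyperparameters are assumptions.

        import numpy as np
        from sklearn.neural_network import MLPRegressor

        rng = np.random.default_rng(2)
        # Toy light curve: a noisy peak observed irregularly in two passbands.
        t = np.sort(rng.uniform(0, 50, 40))
        band = rng.integers(0, 2, 40)
        flux = np.exp(-0.5 * ((t - 25) / 5) ** 2) * (1 + 0.3 * band)
        flux += rng.normal(0, 0.03, 40)

        X = np.column_stack([t, band])
        mlp = MLPRegressor(hidden_layer_sizes=(20, 10), max_iter=5000,
                           random_state=0).fit(X, flux)

        # Fill the gaps: evaluate the fitted curve on a dense grid per band.
        grid = np.linspace(0, 50, 200)
        dense = {b: mlp.predict(np.column_stack([grid, np.full_like(grid, b)]))
                 for b in (0, 1)}
        peak_time = grid[np.argmax(dense[1])]
        print("estimated peak time:", peak_time)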

    Stokes inversion techniques with neural networks: analysis of uncertainty in parameter estimation

    Magnetic fields are responsible for a multitude of solar phenomena, including such destructive events as solar flares and coronal mass ejections, and the number of such events rises as we approach the peak of the 11-year solar cycle, in approximately 2025. High-precision spectropolarimetric observations are necessary to understand the variability of the Sun. The field of quantitative inference of magnetic field vectors and related solar atmospheric parameters from such observations has long been investigated, and in recent years very sophisticated inversion codes for spectropolarimetric observations have been developed. Over the past two decades, neural networks have been shown to be a fast and accurate alternative to classical inversion methods. However, most of these codes yield only point estimates of the parameters, so the ambiguities, degeneracies, and uncertainties of each parameter remain unquantified. In this paper, we provide end-to-end inversion codes based on the simple Milne-Eddington model of the stellar atmosphere and deep neural networks that yield both parameter estimates and their uncertainty intervals. The proposed framework is designed in such a way that it can be expanded and adapted to other atmospheric models or combinations of them, and additional information can be incorporated directly into the model. We demonstrate that the proposed architecture provides highly accurate results, including reliable uncertainty estimation, even in the multidimensional case. The models are tested using simulated and real data samples.
    Comment: 17 pages with 7 figures and 3 tables, submitted to Solar Physics.
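
    One common way to make a neural network return an uncertainty interval alongside each parameter estimate is a heteroscedastic Gaussian output head trained with a negative log-likelihood loss. The paper's actual architecture is not detailed in the abstract; the following is a minimal sketch of that generic technique, with a random synthetic dataset standing in for Milne-Eddington forward-modelled Stokes profiles.

        import torch
        from torch import nn

        # Toy inversion: map a Stokes profile (n_wl wavelength samples) to one
        # atmospheric parameter plus its predictive uncertainty.
        n_wl = 56
        net = nn.Sequential(nn.Linear(n_wl, 64), nn.ReLU(),
                            nn.Linear(64, 2))   # outputs: mean, log-variance

        def nll(pred, target):
            mu, log_var = pred[:, 0], pred[:, 1]
            # Gaussian negative log-likelihood: the network learns both the
            # estimate and how uncertain that estimate is.
            return (0.5 * (log_var + (target - mu) ** 2 / log_var.exp())).mean()

        # Synthetic data standing in for forward-modelled training profiles.
        X = torch.randn(1024, n_wl)
        y = X[:, :8].sum(dim=1)   # hypothetical "true" parameter
        opt = torch.optim.Adam(net.parameters(), lr=1e-3)
        for _ in range(200):
            opt.zero_grad()
            nll(net(X), y).backward()
            opt.step()

        with torch.no_grad():
            out = net(X[:1])
            mu, sigma = out[0, 0], out[0, 1].exp().sqrt()
            print(f"estimate {mu.item():.2f} +/- {sigma.item():.2f}")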

    TrackML high-energy physics tracking challenge on Kaggle

    The High-Luminosity LHC (HL-LHC) is expected to reach unprecedented collision intensities, which in turn will greatly increase the complexity of tracking within the event reconstruction. To reach out to computer science specialists, a tracking machine learning challenge (TrackML) was set up on Kaggle by a team of ATLAS, CMS, and LHCb tracking experts and computer scientists, building on the experience of the successful Higgs Machine Learning challenge in 2014. A training dataset based on a simulation of a generic HL-LHC experiment tracker has been created, listing for each event the measured 3D points and the list of 3D points associated with each true track. The participants in the challenge must find the tracks in the test dataset, that is, build the list of 3D points belonging to each track. The emphasis is on exposing innovative approaches rather than hyper-optimizing known approaches. A metric reflecting the accuracy of a model at finding the associations that matter most to physics analysis makes it possible to select good candidates to augment or replace existing algorithms.
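
    The official TrackML metric is more elaborate than the abstract states, so the snippet below only illustrates the kind of association score it describes: a reconstructed track is credited when a double majority of hits agree between the track and a true particle. The hit arrays and id conventions here are invented for the example.

        import numpy as np

        # Hypothetical per-hit arrays: true particle id (0 = noise) and the
        # id of the reconstructed track each hit was assigned to.
        truth = np.array([1, 1, 1, 2, 2, 2, 2, 0, 0])
        reco = np.array([7, 7, 7, 7, 5, 5, 5, 5, 9])

        def majority_score(truth, reco):
            good = 0
            for track in np.unique(reco):
                hits = truth[reco == track]
                particle, count = max(((p, np.sum(hits == p))
                                       for p in np.unique(hits) if p != 0),
                                      key=lambda pc: pc[1], default=(0, 0))
                # Double majority: most of the track's hits belong to the
                # particle, and most of the particle's hits are on the track.
                if count > len(hits) / 2 and count > np.sum(truth == particle) / 2:
                    good += count
            return good / np.sum(truth != 0)

        print(majority_score(truth, reco))   # fraction of correctly grouped hits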

    Track reconstruction at LHC as a collaborative data challenge use case with RAMP

    Charged particle track reconstruction is a major component of data processing in high-energy physics experiments such as those at the Large Hadron Collider (LHC), and is foreseen to become more and more challenging with higher collision rates. A simplified two-dimensional version of the track reconstruction problem was set up on a collaborative platform, RAMP, so that developers could prototype and test new ideas. A small-scale competition was held during the Connecting The Dots / Intelligent Trackers 2017 (CTDWIT 2017) workshop. Despite the short time scale, a number of different approaches were developed and compared using a single score metric, which was kept generic enough to summarize performance in terms of both efficiency and fake rate.
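
    As a back-of-the-envelope illustration of the two quantities such a score must balance (the competition's actual metric is not given here), efficiency counts the true tracks that were found, while the fake rate counts reconstructed tracks that match nothing; the numbers below are invented.

        # Hypothetical counts from matching reconstructed tracks to truth.
        n_true_tracks = 100
        n_reco_tracks = 95
        n_matched = 88   # reconstructed tracks matched to a true track

        efficiency = n_matched / n_true_tracks
        fake_rate = (n_reco_tracks - n_matched) / n_reco_tracks
        # One generic way to fold both into a single score: reward efficiency,
        # penalize fakes.
        score = efficiency * (1 - fake_rate)
        print(f"eff={efficiency:.2f} fake={fake_rate:.3f} score={score:.3f}")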