
    Disk storage management for LHCb based on Data Popularity estimator

    This paper presents an algorithm that provides recommendations for optimizing LHCb data storage. The LHCb data storage system is a hybrid system: all datasets are kept as archives on magnetic tape, and the most popular datasets are also kept on disk. The algorithm takes the dataset usage history and metadata (size, type, configuration, etc.) and generates a recommendation report. This article presents how we use machine learning algorithms to predict future data popularity. Using these predictions, it is possible to estimate which datasets should be removed from disk. We use regression algorithms and time series analysis to find the optimal number of replicas for datasets that are kept on disk. Based on the popularity predictions and the replica-number optimization, the algorithm minimizes a loss function to find the optimal data distribution. The loss function encodes all requirements for data distribution in the storage system. We demonstrate how our algorithm helps to save disk space and to reduce waiting times for jobs using this data.
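
    The abstract does not spell out the loss function, so the sketch below is only a toy illustration of the general idea: given predicted popularity and dataset sizes (both hypothetical here), search over disk replica counts to trade job waiting time against disk capacity. All names and penalty weights are assumptions.

        import numpy as np
        from itertools import product

        # Hypothetical inputs: predicted future accesses and sizes per dataset.
        popularity = np.array([120.0, 3.0, 45.0, 0.2])
        sizes_gb = np.array([500.0, 2000.0, 800.0, 1500.0])
        disk_capacity_gb = 4000.0

        def loss(replicas):
            # Jobs wait longer when popular data has few disk replicas;
            # datasets with zero disk replicas must be recalled from tape.
            on_disk = replicas > 0
            wait = np.sum(popularity[on_disk] / replicas[on_disk])
            wait += 10.0 * np.sum(popularity[~on_disk])  # assumed tape penalty
            # Hard requirement: the chosen replicas must fit on disk.
            used = np.sum(replicas * sizes_gb)
            return wait + (np.inf if used > disk_capacity_gb else 0.0)

        # Exhaustive search over small replica counts (toy scale only).
        best = min(product(range(4), repeat=len(sizes_gb)),
                   key=lambda r: loss(np.array(r)))
        print("replica counts:", best)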

    GRID Storage Optimization in Transparent and User-Friendly Way for LHCb Datasets

    The LHCb collaboration is one of the four major experiments at the Large Hadron Collider at CERN. Many petabytes of data are produced by the detectors and by Monte-Carlo simulations. The LHCb Grid interware, LHCbDIRAC, is used to make the data available to all collaboration members around the world, and the data is replicated to Grid sites in different locations. However, the Grid disk storage is limited and does not allow keeping a replica of every file at all sites, so it is essential to optimize the number of replicas to achieve better Grid performance. In this study, we present a new data replication and distribution strategy based on data popularity prediction. The popularity prediction is based on the data access history and metadata, and uses machine learning techniques and time series analysis methods.
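
    As a concrete, hedged illustration of popularity prediction from access history (the paper's exact features and models are not given in the abstract), one can build simple time-series features per dataset and fit a regression model; everything below, from the Poisson toy history to the feature choices, is an assumption.

        import numpy as np
        from sklearn.ensemble import GradientBoostingRegressor

        rng = np.random.default_rng(0)
        # Toy access history: weekly access counts for 200 datasets, 26 weeks.
        history = rng.poisson(lam=5.0, size=(200, 26)).astype(float)

        def features(h):
            # Per-dataset time-series features: overall mean, recent mean,
            # linear trend, and last observed value.
            weeks = np.arange(h.shape[1])
            slope = np.polyfit(weeks, h.T, deg=1)[0]
            return np.column_stack([h.mean(axis=1), h[:, -4:].mean(axis=1),
                                    slope, h[:, -1]])

        X = features(history[:, :-1])   # features from all but the last week
        y = history[:, -1]              # target: next week's access count
        model = GradientBoostingRegressor().fit(X, y)
        predicted_popularity = model.predict(X)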

    Generalization of Change-Point Detection in Time Series Data Based on Direct Density Ratio Estimation

    The goal of change-point detection is to discover changes in the distribution of time series data. Some of the state-of-the-art approaches to change-point detection are based on direct density ratio estimation. In this work we show how existing algorithms can be generalized using various binary classification and regression models. In particular, we show that gradient boosting over decision trees and neural networks can be used for this purpose. The algorithms are tested on several synthetic and real-world datasets. The results show that the proposed methods outperform the classical RuLSIF algorithm. A discussion of cases where the proposed algorithms have advantages over existing methods is also provided.
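
    The core trick behind the classifier-based generalization can be shown in a few lines: train a binary classifier to separate samples before and after a candidate point, convert its probabilities into a density ratio estimate, and use a divergence estimate as the change score. The sketch below is a minimal illustration with assumed window sizes and a gradient boosting classifier standing in for the models studied in the paper.

        import numpy as np
        from sklearn.ensemble import GradientBoostingClassifier

        rng = np.random.default_rng(1)
        # Toy series with a variance change at t = 500.
        x = np.concatenate([rng.normal(0, 1, 500), rng.normal(0, 3, 500)])

        def change_score(t, width=100, lags=5):
            # Lag-vector embedding of the series around candidate point t.
            emb = np.array([x[i - lags:i] for i in range(t - width, t + width)])
            labels = np.array([0] * width + [1] * width)  # before vs after t
            clf = GradientBoostingClassifier().fit(emb, labels)
            p = clf.predict_proba(emb)[:, 1].clip(1e-3, 1 - 1e-3)
            ratio = p / (1 - p)   # classifier output as a density ratio estimate
            # Plug-in KL-type divergence between the two windows.
            return float(np.mean(np.log(ratio[labels == 1])))

        ts = list(range(200, 801, 50))
        scores = [change_score(t) for t in ts]
        print("most likely change point near t =", ts[int(np.argmax(scores))])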

    Toward an understanding of the properties of neural network approaches for supernovae light curve approximation

    Modern time-domain photometric surveys collect a large number of observations of various astronomical objects, and the coming era of large-scale surveys will provide even more information. Most of these objects have never received a spectroscopic follow-up, which is especially crucial for transients, e.g. supernovae. In such cases, observed light curves can offer an affordable alternative. Time series are actively used for photometric classification and characterization, such as peak and luminosity decline estimation. However, the collected time series are multidimensional, irregularly sampled, contain outliers, and do not have well-defined systematic uncertainties. Machine learning methods help extract useful information from the available data in the most efficient way. We consider several light curve approximation methods based on neural networks: Multilayer Perceptrons, Bayesian Neural Networks, and Normalizing Flows, applied to the observations of a single light curve. Tests using both the simulated PLAsTiCC and real Zwicky Transient Facility data samples demonstrate that even a few observations are enough to fit the networks and achieve better approximation quality than other state-of-the-art methods. We show that the methods described in this work have better computational complexity and work faster than Gaussian Processes. We analyze the performance of the approximation techniques for filling gaps in light curve observations, and show that the use of an appropriate technique increases the accuracy of peak finding and supernova classification. In addition, the results of the study are organized in the Fulu Python library, available on GitHub, which can be easily used by the community.
    Comment: Submitted to MNRAS. 14 pages, 6 figures, 9 tables.
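
    The Fulu library's exact interface is not described in the abstract, so the sketch below uses a plain scikit-learn Multilayer Perceptron to show the underlying idea: fit a small network to the irregular observations of one object (time and passband in, flux out), then evaluate it on a dense grid to fill the gaps and locate the peak. The toy light curve and all hyperparameters are assumptions.

        import numpy as np
        from sklearn.neural_network import MLPRegressor

        rng = np.random.default_rng(2)
        # Toy light curve: a noisy peak observed irregularly in two passbands.
        t = np.sort(rng.uniform(0, 50, 40))
        band = rng.integers(0, 2, 40)
        flux = np.exp(-0.5 * ((t - 25) / 5) ** 2) * (1 + 0.3 * band)
        flux += rng.normal(0, 0.03, 40)

        X = np.column_stack([t, band])
        mlp = MLPRegressor(hidden_layer_sizes=(20, 10), max_iter=5000,
                           random_state=0).fit(X, flux)

        # Fill the gaps: evaluate the fitted curve on a dense grid per band.
        grid = np.linspace(0, 50, 200)
        dense = {b: mlp.predict(np.column_stack([grid, np.full_like(grid, b)]))
                 for b in (0, 1)}
        peak_time = grid[np.argmax(dense[1])]
        print("estimated peak time:", peak_time)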

    Stokes inversion techniques with neural networks: analysis of uncertainty in parameter estimation

    Magnetic fields are responsible for a multitude of solar phenomena, including such destructive events as solar flares and coronal mass ejections, and the number of such events rises as we approach the peak of the 11-year solar cycle, in approximately 2025. High-precision spectropolarimetric observations are necessary to understand the variability of the Sun. The field of quantitative inference of magnetic field vectors and related solar atmospheric parameters from such observations has long been investigated, and in recent years very sophisticated inversion codes for spectropolarimetric observations have been developed. Over the past two decades, neural networks have been shown to be a fast and accurate alternative to classical inversion methods. However, most of these codes yield only point estimates of the parameters, so the ambiguities, degeneracies, and uncertainties of each parameter remain unquantified. In this paper, we provide end-to-end inversion codes based on the simple Milne-Eddington model of the stellar atmosphere and deep neural networks that yield both parameter estimates and their uncertainty intervals. The proposed framework is designed in such a way that it can be expanded and adapted to other atmospheric models or combinations of them, and additional information can be incorporated directly into the model. We demonstrate that the proposed architecture provides highly accurate results, including reliable uncertainty estimation, even in the multidimensional case. The models are tested using simulated and real data samples.
    Comment: 17 pages with 7 figures and 3 tables, submitted to Solar Physics.
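
    One common way to make a neural network return an uncertainty interval alongside each parameter estimate is a heteroscedastic Gaussian output head trained with a negative log-likelihood loss. The paper's actual architecture is not detailed in the abstract; the following is a minimal sketch of that generic technique, with a random synthetic dataset standing in for Milne-Eddington forward-modelled Stokes profiles.

        import torch
        from torch import nn

        # Toy inversion: map a Stokes profile (n_wl wavelength samples) to one
        # atmospheric parameter plus its predictive uncertainty.
        n_wl = 56
        net = nn.Sequential(nn.Linear(n_wl, 64), nn.ReLU(),
                            nn.Linear(64, 2))   # outputs: mean, log-variance

        def nll(pred, target):
            mu, log_var = pred[:, 0], pred[:, 1]
            # Gaussian negative log-likelihood: the network learns both the
            # estimate and how uncertain that estimate is.
            return (0.5 * (log_var + (target - mu) ** 2 / log_var.exp())).mean()

        # Synthetic data standing in for forward-modelled training profiles.
        X = torch.randn(1024, n_wl)
        y = X[:, :8].sum(dim=1)   # hypothetical "true" parameter
        opt = torch.optim.Adam(net.parameters(), lr=1e-3)
        for _ in range(200):
            opt.zero_grad()
            nll(net(X), y).backward()
            opt.step()

        with torch.no_grad():
            out = net(X[:1])
            mu, sigma = out[0, 0], out[0, 1].exp().sqrt()
            print(f"estimate {mu.item():.2f} +/- {sigma.item():.2f}")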

    TrackML high-energy physics tracking challenge on Kaggle

    The High-Luminosity LHC (HL-LHC) is expected to reach unprecedented collision intensities, which in turn will greatly increase the complexity of tracking within the event reconstruction. To reach out to computer science specialists, a tracking machine learning challenge (TrackML) was set up on Kaggle by a team of ATLAS, CMS, and LHCb tracking experts and computer scientists, building on the experience of the successful Higgs Machine Learning challenge in 2014. A training dataset based on a simulation of a generic HL-LHC experiment tracker has been created, listing for each event the measured 3D points and the list of 3D points associated with each true track. The participants in the challenge must find the tracks in the test dataset, that is, build the list of 3D points belonging to each track. The emphasis is on exposing innovative approaches rather than hyper-optimizing known approaches. A metric reflecting the accuracy of a model at finding the associations that matter most to physics analysis makes it possible to select good candidates to augment or replace existing algorithms.
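
    The official TrackML metric is more elaborate than the abstract states, so the snippet below only illustrates the kind of association score it describes: a reconstructed track is credited when a double majority of hits agree between the track and a true particle. The hit arrays and id conventions here are invented for the example.

        import numpy as np

        # Hypothetical per-hit arrays: true particle id (0 = noise) and the
        # id of the reconstructed track each hit was assigned to.
        truth = np.array([1, 1, 1, 2, 2, 2, 2, 0, 0])
        reco = np.array([7, 7, 7, 7, 5, 5, 5, 5, 9])

        def majority_score(truth, reco):
            good = 0
            for track in np.unique(reco):
                hits = truth[reco == track]
                particle, count = max(((p, np.sum(hits == p))
                                       for p in np.unique(hits) if p != 0),
                                      key=lambda pc: pc[1], default=(0, 0))
                # Double majority: most of the track's hits belong to the
                # particle, and most of the particle's hits are on the track.
                if count > len(hits) / 2 and count > np.sum(truth == particle) / 2:
                    good += count
            return good / np.sum(truth != 0)

        print(majority_score(truth, reco))   # fraction of correctly grouped hits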

    Track reconstruction at LHC as a collaborative data challenge use case with RAMP

    Charged particle track reconstruction is a major component of data processing in high-energy physics experiments such as those at the Large Hadron Collider (LHC), and is foreseen to become more and more challenging with higher collision rates. A simplified two-dimensional version of the track reconstruction problem was set up on a collaborative platform, RAMP, so that developers could prototype and test new ideas. A small-scale competition was held during the Connecting The Dots / Intelligent Trackers 2017 (CTDWIT 2017) workshop. Despite the short time scale, a number of different approaches were developed and compared using a single score metric, which was kept generic enough to summarize performance in terms of both efficiency and fake rate.
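
    As a back-of-the-envelope illustration of the two quantities such a score must balance (the competition's actual metric is not given here), efficiency counts the true tracks that were found, while the fake rate counts reconstructed tracks that match nothing; the numbers below are invented.

        # Hypothetical counts from matching reconstructed tracks to truth.
        n_true_tracks = 100
        n_reco_tracks = 95
        n_matched = 88   # reconstructed tracks matched to a true track

        efficiency = n_matched / n_true_tracks
        fake_rate = (n_reco_tracks - n_matched) / n_reco_tracks
        # One generic way to fold both into a single score: reward efficiency,
        # penalize fakes.
        score = efficiency * (1 - fake_rate)
        print(f"eff={efficiency:.2f} fake={fake_rate:.3f} score={score:.3f}")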