Disk storage management for LHCb based on Data Popularity estimator
This paper presents an algorithm providing recommendations for optimizing the
LHCb data storage. The LHCb data storage system is a hybrid system. All
datasets are kept as archives on magnetic tapes. The most popular datasets are
kept on disks. The algorithm takes the dataset usage history and metadata
(size, type, configuration etc.) to generate a recommendation report. This
article presents how we use machine learning algorithms to predict future data
popularity. Using these predictions it is possible to estimate which datasets
should be removed from disk. We use regression algorithms and time series
analysis to find the optimal number of replicas for datasets that are kept on
disk. Based on the data popularity and the number of replicas optimization, the
algorithm minimizes a loss function to find the optimal data distribution. The
loss function represents all requirements for data distribution in the data
storage system. We demonstrate how our algorithm helps to save disk space and
to reduce waiting times for jobs using this data.
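The abstract does not give the loss function explicitly, but the core idea of keeping the most popular datasets on limited disk can be sketched as follows. This is a minimal greedy stand-in, assuming a ranking by predicted popularity per unit size under a disk budget; the field names and numbers are invented for illustration and are not LHCb's actual optimization.

```python
# Illustrative sketch (not LHCb's actual loss function): keep the
# datasets with the best predicted popularity-to-size ratio on disk
# until the capacity budget is exhausted; the rest stay on tape.

def select_for_disk(datasets, capacity_tb):
    """Greedy selection of datasets to keep on disk."""
    ranked = sorted(datasets,
                    key=lambda d: d["popularity"] / d["size_tb"],
                    reverse=True)
    on_disk, used = [], 0.0
    for d in ranked:
        if used + d["size_tb"] <= capacity_tb:
            on_disk.append(d["name"])
            used += d["size_tb"]
    return on_disk  # everything else remains archived on tape

datasets = [
    {"name": "A", "size_tb": 4.0, "popularity": 0.9},
    {"name": "B", "size_tb": 1.0, "popularity": 0.5},
    {"name": "C", "size_tb": 6.0, "popularity": 0.1},
]
print(select_for_disk(datasets, capacity_tb=5.0))  # → ['B', 'A']
```

A real loss function would also encode replica counts and site constraints; the greedy ratio rule above only illustrates the popularity-versus-size trade-off.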
GRID Storage Optimization in Transparent and User-Friendly Way for LHCb Datasets
The LHCb collaboration is one of the four major experiments at the Large
Hadron Collider at CERN. Many petabytes of data are produced by the detectors
and Monte-Carlo simulations. The LHCb Grid interware LHCbDIRAC is used to make
data available to all collaboration members around the world. The data is
replicated to Grid sites in different locations. However, Grid disk storage is limited and does not allow keeping a replica of every file at all sites. It is therefore essential to optimize the number of replicas to achieve better Grid performance.
In this study, we present a new data replication and distribution strategy based on data popularity prediction. The prediction is based on the data access history and metadata, and uses machine learning techniques and time series analysis methods.
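The abstract names machine learning and time series methods without specifying them; as a stand-in, the pipeline from access history to a replica count can be sketched with a simple exponentially weighted forecast. The smoothing constant, the per-access scaling, and the replica bounds below are all hypothetical.

```python
# Hedged sketch: forecast next-period dataset popularity from its
# access-history time series, then map the forecast to a replica
# count. The EWMA here stands in for the ML / time-series models
# mentioned in the abstract; all parameters are illustrative.

def forecast_popularity(weekly_accesses, alpha=0.5):
    """Exponentially weighted moving average (recent weeks weigh more)."""
    level = weekly_accesses[0]
    for x in weekly_accesses[1:]:
        level = alpha * x + (1 - alpha) * level
    return level

def n_replicas(predicted, low=1, high=4, per_access=0.1):
    """Map predicted popularity to a replica count, clipped to [low, high]."""
    return max(low, min(high, round(predicted * per_access)))

history = [2, 4, 8, 16, 30]       # accesses per week, growing
pred = forecast_popularity(history)
print(pred, n_replicas(pred))     # → 20.375 2
```

A growing access trend yields a higher forecast and hence more replicas; a dataset falling out of use decays toward the minimum replica count.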
Numerical optimization for Artificial Retina Algorithm
High-energy physics experiments rely on reconstruction of the trajectories of
particles produced at the interaction point. This is a challenging task,
especially in the high-track-multiplicity environment generated by p-p collisions at LHC energies. A typical event includes hundreds of signal examples (interesting decays) and a significant amount of noise (uninteresting examples).
This work describes a modification of the Artificial Retina algorithm for
fast track finding: numerical optimization methods were adopted for fast local
track search. This approach allows for considerable reduction of the total
computational time per event. Test results on a simplified simulated model of the LHCb VELO (VErtex LOcator) detector are presented. This approach is also well suited to parallel implementations such as GPGPU computing, which looks very attractive in the context of upcoming detector upgrades.
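The retina idea is to give each track-parameter hypothesis a response built from Gaussian-weighted hit contributions and to find its local maxima; the modification above replaces a fine parameter scan with numerical optimization. A minimal one-parameter sketch, assuming straight tracks y = m·x and using golden-section search as the optimizer (the paper's actual method and detector geometry are more involved):

```python
import math

# Minimal retina-style sketch for straight tracks y = m * x: each hit
# contributes a Gaussian weight around a candidate slope, and a
# numerical optimizer (golden-section search) refines the slope
# instead of scanning a fine grid. Parameters are illustrative.

def retina_response(m, hits, sigma=0.1):
    return sum(math.exp(-((y - m * x) ** 2) / (2 * sigma ** 2))
               for x, y in hits)

def refine_slope(hits, lo, hi, tol=1e-6):
    """Golden-section search for the slope maximizing the response."""
    phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - phi * (b - a), a + phi * (b - a)
    while b - a > tol:
        if retina_response(c, hits) > retina_response(d, hits):
            b, d = d, c
            c = b - phi * (b - a)
        else:
            a, c = c, d
            d = a + phi * (b - a)
    return (a + b) / 2

hits = [(1.0, 0.5), (2.0, 1.0), (3.0, 1.5)]  # hits on the line y = 0.5 x
print(round(refine_slope(hits, 0.0, 1.0), 3))  # → 0.5
```

The speed-up comes from the optimizer evaluating the response only O(log(1/tol)) times per local search, rather than once per grid cell; independent local searches are also what makes the method map well onto GPGPU parallelism.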
Event Index - an LHCb Event Search System
During LHC Run 1, the LHCb experiment recorded a very large number of collision
events. This paper describes Event Index - an event search system. Its primary
function is to quickly select subsets of events from a combination of
conditions, such as the estimated decay channel or number of hits in a
subdetector. Event Index is essentially Apache Lucene optimized for read-only
indexes distributed over independent shards on independent nodes. (Report for the proceedings of the CHEP-2015 conference.)
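The kind of query Event Index serves can be illustrated with a toy inverted index: each selection condition maps to a postings set of event identifiers, and an AND query intersects the postings. Apache Lucene does this at scale over sharded read-only indexes; the events, field names, and bucketing below are invented for illustration.

```python
from functools import reduce

# Toy inverted index: term -> set of matching event ids. An AND query
# intersects the postings sets, which is essentially what Lucene does
# under the hood. All events and fields here are invented.

events = {
    1: {"channel": "B2KK",   "velo_hits": 40},
    2: {"channel": "B2pipi", "velo_hits": 55},
    3: {"channel": "B2KK",   "velo_hits": 60},
}

index = {}
for eid, meta in events.items():
    index.setdefault(("channel", meta["channel"]), set()).add(eid)
    bucket = "high" if meta["velo_hits"] >= 50 else "low"
    index.setdefault(("velo_hits", bucket), set()).add(eid)

def query(*terms):
    """Return event ids matching ALL the given terms (AND query)."""
    return reduce(set.intersection, (index.get(t, set()) for t in terms))

print(query(("channel", "B2KK"), ("velo_hits", "high")))  # → {3}
```

Because the postings sets are read-only, they can be split across independent shards and intersected per shard, which is the distribution strategy the abstract describes.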
Generalization of Change-Point Detection in Time Series Data Based on Direct Density Ratio Estimation
The goal of change-point detection is to discover changes in the distribution of time series data. Some state-of-the-art approaches to change-point detection are based on direct density ratio estimation. In this work we show how existing algorithms can be generalized using various binary classification and regression models. In particular, we show that Gradient Boosting over Decision Trees and Neural Networks can be used for this purpose. The algorithms are tested on several synthetic and real-world datasets. The results show that the proposed methods outperform the classical RuLSIF algorithm. A discussion of cases where the proposed algorithms have advantages over existing methods is also provided.
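The classifier-based generalization can be sketched as follows: train a binary classifier to separate a "past" window from a "current" window; its probability output estimates the density ratio between the two windows, and a large mean log-ratio on the current window signals a change. The tiny 1-D logistic regression below stands in for the boosted trees and neural networks discussed in the abstract; window sizes and learning parameters are illustrative.

```python
import math, random

# Sketch of classifier-based change-point scoring. A logistic
# regression separating past (label 0) from current (label 1)
# samples yields p(x); p/(1-p) estimates the density ratio, and the
# mean log-ratio on the current window is the change score.

def change_score(past, current, steps=500, lr=0.1):
    xs = past + current
    ys = [0] * len(past) + [1] * len(current)
    w, b = 0.0, 0.0
    for _ in range(steps):                 # plain gradient descent on log-loss
        gw = gb = 0.0
        for x, y in zip(xs, ys):
            p = 1 / (1 + math.exp(-(w * x + b)))
            gw += (p - y) * x
            gb += (p - y)
        w -= lr * gw / len(xs)
        b -= lr * gb / len(xs)

    def log_ratio(x):                      # estimated log density ratio
        p = 1 / (1 + math.exp(-(w * x + b)))
        p = min(max(p, 1e-12), 1 - 1e-12)  # guard against log(0)
        return math.log(p / (1 - p))

    return sum(log_ratio(x) for x in current) / len(current)

random.seed(0)
steady = [random.gauss(0, 1) for _ in range(100)]
shifted = [random.gauss(3, 1) for _ in range(100)]
# no change -> score near 0; mean shift -> large positive score
print(change_score(steady[:50], steady[50:])
      < change_score(steady[50:], shifted[:50]))  # → True
```

Swapping the logistic regression for gradient-boosted trees or a neural network changes only the classifier; the density-ratio scoring stays the same, which is exactly the generalization the abstract claims.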
Cherenkov Detectors Fast Simulation Using Neural Networks
We propose a way to simulate Cherenkov detector response using a generative
adversarial neural network to bypass low-level details. This network is trained
to reproduce high level features of the simulated detector events based on
input observables of incident particles. This allows a dramatic increase in simulation speed. We demonstrate that this approach provides simulation precision consistent with the baseline and discuss possible implications of these results. (In proceedings of the 10th International Workshop on Ring Imaging Cherenkov Detectors.)
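The adversarial objective behind this kind of fast simulation can be written down compactly: a discriminator is trained to tell detailed-simulation events from generated ones, while the generator is trained to fool it. The functions below express the standard GAN losses; the toy discriminator and the sample values are invented stand-ins, not the paper's networks or data.

```python
import math

# Standard GAN losses, written as plain functions. D maps an event
# feature to a probability of being "real" (detailed simulation);
# the toy D and samples below are invented for illustration only.

def discriminator_loss(D, real, fake):
    """Binary cross-entropy: push D(real) -> 1 and D(fake) -> 0."""
    return (-sum(math.log(D(x)) for x in real) / len(real)
            - sum(math.log(1 - D(x)) for x in fake) / len(fake))

def generator_loss(D, fake):
    """Non-saturating generator loss: push D(fake) toward 1."""
    return -sum(math.log(D(x)) for x in fake) / len(fake)

# Toy check: a discriminator that is confident and right has low loss,
# and the generator's loss is then high (it is being caught).
D = lambda x: 0.9 if x > 0.5 else 0.1   # "real" features cluster above 0.5
real, fake = [0.8, 0.9], [0.1, 0.2]
print(round(discriminator_loss(D, real, fake), 3))  # → 0.211
print(round(generator_loss(D, fake), 3))            # → 2.303
```

Training alternates gradient steps on the two losses; once the generator's samples reproduce the high-level feature distribution, it replaces the slow detailed simulation at inference time.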