5 research outputs found

    Disk storage management for LHCb based on Data Popularity estimator

    Full text link
    This paper presents an algorithm providing recommendations for optimizing the LHCb data storage. The LHCb data storage system is a hybrid system. All datasets are kept as archives on magnetic tapes. The most popular datasets are kept on disks. The algorithm takes the dataset usage history and metadata (size, type, configuration etc.) to generate a recommendation report. This article presents how we use machine learning algorithms to predict future data popularity. Using these predictions it is possible to estimate which datasets should be removed from disk. We use regression algorithms and time series analysis to find the optimal number of replicas for datasets that are kept on disk. Based on the data popularity and the number of replicas optimization, the algorithm minimizes a loss function to find the optimal data distribution. The loss function represents all requirements for data distribution in the data storage system. We demonstrate how our algorithm helps to save disk space and to reduce waiting times for jobs using this data

    GRID Storage Optimization in Transparent and User-Friendly Way for LHCb Datasets

    Full text link
    The LHCb collaboration is one of the four major experiments at the Large Hadron Collider at CERN. Many petabytes of data are produced by the detectors and Monte-Carlo simulations. The LHCb Grid interware LHCbDIRAC is used to make data available to all collaboration members around the world. The data is replicated to the Grid sites in different locations. However the Grid disk storage is limited and does not allow keeping replicas of each file at all sites. Thus it is essential to optimize number of replicas to achieve a better Grid performance. In this study, we present a new approach of data replication and distribution strategy based on data popularity prediction. The popularity is performed based on the data access history and metadata, and uses machine learning techniques and time series analysis methods

    Machine learning at the energy and intensity frontiers of particle physics

    Get PDF
    Our knowledge of the fundamental particles of nature and their interactions is summarized by the standard model of particle physics. Advancing our understanding in this field has required experiments that operate at ever higher energies and intensities, which produce extremely large and information-rich data samples. The use of machine-learning techniques is revolutionizing how we interpret these data samples, greatly increasing the discovery potential of present and future experiments. Here we summarize the challenges and opportunities that come with the use of machine learning at the frontiers of particle physics
    corecore