15 research outputs found

    A New Visual Analytics toolkit for ATLAS metadata

    No full text
    The ATLAS experiment at the LHC has a complex, heterogeneous, distributed computing infrastructure, which is used to process and analyse exabytes of data. Metadata are collected and stored at all stages of physics analysis and data processing. All metadata can be divided into operational metadata, used for quasi on-line monitoring, and archival metadata, used to study the systems’ behaviour over a given period of time (i.e., long-term data analysis). Ensuring the stability and efficient functioning of complex, large-scale systems such as those in ATLAS computing requires sophisticated monitoring tools, and long-term analysis of monitoring data becomes as important as the monitoring itself. Archival metadata, containing many metrics (hardware and software environment descriptions, network state, application parameters, user account data, errors) accumulated over more than a decade, can be successfully processed by various machine learning (ML) algorithms for classification, clustering and dimensionality reduction. However, ML data analysis, despite its widespread use, is not without shortcomings: the underlying algorithms are usually treated as “black boxes”, as there are no effective techniques for understanding their internal mechanisms, and the involvement of domain experts in the ML analysis process is very limited. As a result, the data analysis suffers from a lack of human supervision. Moreover, conclusions reached by the algorithms with high accuracy sometimes make no sense with respect to the real data model. In this work we will demonstrate how interactive data visualization can be applied to extend routine ML data analysis methods. Visualization allows human spatial thinking to be used actively to identify new tendencies and patterns in the collected data, avoiding the need to struggle with instrumental analytics tools. The architecture and the interface prototype of a visual analytics platform (VAP) for multidimensional analysis of ATLAS computing metadata will be presented. The general data processing and visualization methods of the VAP prototype will be implemented and tested on a slice of ATLAS job metadata. As a result, a web interface will provide interactive visual clustering of ATLAS jobs and a search for non-trivial behaviour and its possible causes. Furthermore, we will demonstrate a prototype of dynamic interactive visualization, making it possible to observe the changing cluster structure at different points in time.
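
    The clustering-and-visualization workflow sketched in this abstract can be illustrated with a short, hedged example. The sketch below (using pandas, scikit-learn and Plotly) reduces a hypothetical slice of job metadata to two dimensions, clusters it and writes an interactive scatter plot; the file name, column names and the choice of PCA and k-means are illustrative assumptions, not the VAP's actual implementation.

```python
# Illustrative sketch only: cluster a hypothetical slice of ATLAS job metadata
# and render it as an interactive 2-D scatter plot. Column names, the input
# file and the PCA/k-means choice are assumptions, not the VAP internals.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
import plotly.express as px

jobs = pd.read_csv("atlas_jobs_slice.csv")  # hypothetical metadata slice
features = ["wall_time", "cpu_time", "input_size", "output_size", "n_events"]

X = StandardScaler().fit_transform(jobs[features])

# Project to 2-D so every job can be drawn in an interactive scatter plot.
coords = PCA(n_components=2).fit_transform(X)
jobs["pc1"], jobs["pc2"] = coords[:, 0], coords[:, 1]

# Cluster in the full feature space; colours expose groups of jobs with
# similar behaviour, including potential outliers.
jobs["cluster"] = KMeans(n_clusters=6, n_init=10, random_state=0).fit_predict(X).astype(str)

fig = px.scatter(jobs, x="pc1", y="pc2", color="cluster",
                 hover_data=["wall_time", "cpu_time"],
                 title="Interactive clustering of an ATLAS job-metadata slice")
fig.write_html("jobs_clusters.html")  # open in a browser to explore interactively
```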

    High Energy Physics Data Popularity : ATLAS Datasets Popularity Case Study

    No full text
    The amount of scientific data generated by the LHC experiments has reached the exabyte scale. These data are transferred, processed and analyzed in hundreds of computing centers. The popularity of data among individual physicists and university groups has become one of the key factors in efficient data management and processing. Data popularity was actively used by the experiments during LHC Run-1 and Run-2 for central data processing, allowing data placement policies to be optimized and the workload to be spread more evenly over the existing computing resources. Besides central data processing, the LHC experiments provide storage and computing resources for physics analysis to thousands of users. Taking into account the significant increase in data volume and processing time after the collider upgrade for the High Luminosity runs (2027-2036), intelligent data placement based on data access patterns becomes even more crucial than at the beginning of the LHC. In this study we provide a detailed exploration of data popularity using ATLAS data samples. In addition, we analyze the geolocations of the computing sites where the data were processed and the locations of the home institutes of users carrying out physics analysis. Cartographic visualization based on these data allows the existing data placement to be correlated with physics needs, providing a better understanding of data utilization by different categories of user tasks.
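
    As a rough illustration of the popularity metrics discussed here, the sketch below counts accesses and distinct users per dataset, and monthly accesses per computing site, from a hypothetical access log; the field names and the log format are assumptions and do not reflect the actual ATLAS accounting schema.

```python
# Illustrative sketch: derive simple data-popularity metrics from a
# hypothetical access log. Field names (dataset, site, user, date) are
# assumptions and do not reflect the actual ATLAS accounting schema.
import pandas as pd

accesses = pd.read_csv("dataset_access_log.csv", parse_dates=["date"])

# Popularity as the number of accesses and of distinct users per dataset.
popularity = (accesses.groupby("dataset")
              .agg(n_accesses=("date", "size"), n_users=("user", "nunique"))
              .sort_values("n_accesses", ascending=False))

# Monthly access counts per computing site, e.g. as input for a map overlay
# correlating data placement with where the data are actually used.
by_site = (accesses
           .groupby([accesses["date"].dt.to_period("M"), "site"])
           .size()
           .rename("n_accesses")
           .reset_index())

print(popularity.head(10))
print(by_site.head())
```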

    A New Visual Analytics Toolkit for ATLAS Computing Metadata

    No full text
    The ATLAS experiment at the Large Hadron Collider has a complex, heterogeneous, distributed computing infrastructure, which is used to process and analyse exabytes of data. Metadata are collected and stored at all stages of data processing and physics analysis. All metadata can be divided into operational metadata, used for quasi on-line monitoring, and archival metadata, used to study the behaviour of the corresponding systems over a given period of time (i.e., long-term data analysis). Ensuring the stability and efficient functioning of complex, large-scale systems such as those in ATLAS Computing requires sophisticated monitoring tools, and long-term analysis of monitoring data becomes as important as the monitoring itself. Archival metadata, which contain many metrics (hardware and software environment descriptions, network states, application parameters, errors) accumulated over more than a decade, can be successfully processed by various machine learning (ML) algorithms for classification, clustering and dimensionality reduction. However, ML data analysis, despite its widespread use, is not without shortcomings: the underlying algorithms are usually treated as “black boxes”, as there are no effective techniques for understanding their internal mechanisms, and the involvement of domain experts in the ML analysis process is very limited. As a result, the data analysis suffers from a lack of human supervision. Moreover, conclusions reached by the algorithms with high accuracy sometimes make no sense with respect to the real data model. In this work we will demonstrate how interactive data visualization can be applied to extend routine ML data analysis methods. Visualization allows human spatial thinking to be used actively to identify new tendencies and patterns in the collected data, avoiding the need to struggle with instrumental analytics tools. The architecture and the corresponding prototype of the Interactive Visual Explorer (InVEx), a visual analytics toolkit for multidimensional analysis of ATLAS computing metadata, will be presented. The web-application part of the prototype provides interactive visual clustering of ATLAS computing jobs and a search for non-trivial job behaviour and its possible causes.
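
    A minimal sketch of how a web-application part might serve clustered job metadata to an interactive front end is shown below; the Flask endpoint, file name, feature columns and clustering call are illustrative assumptions and do not describe the actual InVEx implementation.

```python
# Illustrative sketch of a web endpoint serving clustered job metadata as JSON
# to an interactive front end. The route, file name and feature columns are
# assumptions; this is not the actual InVEx architecture.
import pandas as pd
from flask import Flask, request
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

app = Flask(__name__)
jobs = pd.read_csv("atlas_jobs_slice.csv")  # hypothetical metadata slice
FEATURES = ["wall_time", "cpu_time", "input_size", "n_events"]

@app.route("/clusters")
def clusters():
    # The front end can re-request the clustering interactively with a different k.
    k = int(request.args.get("k", 6))
    X = StandardScaler().fit_transform(jobs[FEATURES])
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    payload = jobs[FEATURES].assign(cluster=labels)
    # pandas' to_json handles numpy dtypes; the client renders the records.
    return app.response_class(payload.to_json(orient="records"),
                              mimetype="application/json")

if __name__ == "__main__":
    app.run(debug=True)
```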

    Evaluation of the Level-of-Detail Generator for Visual Analysis of the ATLAS Computing Metadata

    No full text
    The ATLAS experiment at the LHC processes, analyses and stores vast amounts of data, which are either recorded by the detector or simulated worldwide using Monte Carlo methods. ATLAS Computing metadata are generated at very high rates and in large volumes. The need to analyze this metadata is constantly increasing, since the heterogeneous, distributed and dynamically changing computing infrastructure requires sophisticated optimization decisions made by humans and/or by machines. Visual analytics is one of the methods facilitating the analysis of massive amounts of data (structured, semi-structured, and unstructured), leveraging human judgement by means of interactive visual representations. Given the huge number of ATLAS computing jobs that need to be visualized simultaneously for error investigations or other optimization processes, the resources of the client application responsible for such visualization may reach their limits. Data objects that share similar feature values can be represented and visualized as a single group, so that the initial large data sample is represented at different levels of detail; this approach also avoids client overload. In this paper we evaluate implementations of a k-means-based Level-of-Detail generator method applied to the metadata of ATLAS jobs. This method is used in the visual analytics application InVEx (Interactive Visual Explorer), which is under development and is based on 3-dimensional interactive visualization of multidimensional data.
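
    The general idea of a k-means-based level-of-detail generator can be sketched as follows: jobs with similar feature values are collapsed into a single group record (centroid plus member count), so the client only has to render the groups. The feature names, file name and output format below are assumptions, not the InVEx implementation.

```python
# Illustrative sketch of a k-means-based level-of-detail (LOD) generator:
# jobs with similar feature values are collapsed into one group record
# (centroid + member count) so the visualization client renders far fewer
# objects. Feature names and output format are assumptions, not InVEx code.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def build_lod(jobs: pd.DataFrame, features: list, n_groups: int) -> pd.DataFrame:
    """Collapse individual jobs into n_groups level-of-detail records."""
    X = StandardScaler().fit_transform(jobs[features])
    labels = KMeans(n_clusters=n_groups, n_init=10, random_state=0).fit_predict(X)
    grouped = jobs[features].assign(group=labels)
    # One record per group: the centroid in original units plus the number of
    # jobs it represents (e.g. used to scale the marker size on the client).
    lod = grouped.groupby("group").agg(
        **{f: (f, "mean") for f in features}, n_jobs=(features[0], "size"))
    return lod.reset_index()

jobs = pd.read_csv("atlas_jobs_slice.csv")  # hypothetical metadata slice
for k in (20, 100, 500):
    lod = build_lod(jobs, ["wall_time", "cpu_time", "input_size"], n_groups=k)
    print(f"{len(lod)} groups represent {len(jobs)} jobs at this level of detail")
```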