
    Grid Collector: An Event Catalog with Automated File Management

    High Energy Nuclear Physics (HENP) experiments such as STAR at BNL and ATLAS at CERN produce large amounts of data that are stored as files on mass storage systems in computer centers. In these files, the basic unit of data is an event. Analysis is typically performed on a selected set of events. The files containing these events have to be located, copied from mass storage systems to disks before analysis, and removed when no longer needed. These file management tasks are tedious and time consuming. Typically, all events contained in the files are read into memory before a selection is made. Since the time to read the events dominates the overall execution time, reading unwanted events needlessly increases the analysis time. The Grid Collector is a set of software modules that work together to address these two issues. It automates the file management tasks and provides "direct" access to the selected events for analysis. It is currently integrated with the STAR analysis framework. Users can select events based on tags, such as "production date between March 10 and 20, and the number of charged tracks > 100." The Grid Collector locates the files containing relevant events, transfers the files across the Grid if necessary, and delivers the events to the analysis code through the familiar iterators. Although there have been research efforts to address the file management issues, the Grid Collector is unique in that it addresses the event access issue together with the file management issues. This makes it more useful to a wide variety of users.
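    The tag-based selection described above can be illustrated with a minimal sketch. It is not the Grid Collector's implementation: the `EventTag` record and its fields, the file names, the `read_event` callback, and the year used in the example query are all hypothetical. The sketch shows the general idea: the query is evaluated over the tag catalog alone, matches are grouped by file so that only files containing selected events need to be staged, and an iterator then yields just the selected events.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, Iterable, Iterator, List


@dataclass(frozen=True)
class EventTag:
    """Hypothetical tag record: where an event lives plus a few tag values."""
    file_name: str
    event_index: int
    production_date: date
    n_charged_tracks: int


def select_events(catalog: Iterable[EventTag],
                  predicate: Callable[[EventTag], bool]) -> Dict[str, List[int]]:
    """Evaluate the query on tags only (no event data is read) and group the
    matching events by file, so only files with hits need to be staged."""
    by_file: Dict[str, List[int]] = defaultdict(list)
    for tag in catalog:
        if predicate(tag):
            by_file[tag.file_name].append(tag.event_index)
    return {name: sorted(indices) for name, indices in by_file.items()}


def iterate_selected(by_file: Dict[str, List[int]],
                     read_event: Callable[[str, int], object]) -> Iterator[object]:
    """Yield only the selected events, file by file. `read_event` is a
    hypothetical callback that reads one event from an already staged file."""
    for name in sorted(by_file):
        for index in by_file[name]:
            yield read_event(name, index)


def example_query(tag: EventTag) -> bool:
    """The example query from the abstract; the year 2004 is an arbitrary choice."""
    return (date(2004, 3, 10) <= tag.production_date <= date(2004, 3, 20)
            and tag.n_charged_tracks > 100)
```

    Files that hold no matching event never appear in the result of `select_events`, so they are neither transferred nor opened.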

    Unraveling Diffusion in Fusion Plasma: A Case Study of In Situ Processing and Particle Sorting

    This work introduces an in situ processing capability to study a particular diffusion process in magnetic confinement fusion. This diffusion process involves plasma particles that are likely to escape confinement. Such particles carry a significant amount of energy from the burning plasma inside the tokamak to the divertor, damaging the divertor plate. This study requires in situ processing because of the fast-changing nature of the particle diffusion process. However, the in situ processing approach is challenging because the amount of data to be retained for the diffusion calculations increases over time, unlike other in situ processing cases where the amount of data to be processed is constant over time. Here we report our preliminary efforts to control the memory usage while ensuring the necessary analysis tasks are completed in a timely manner. Compared with an earlier naive attempt to compute the same diffusion displacements directly in the simulation code, this in situ version reduces the memory used for particle information by nearly 60% and the computation time by about 20%.
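    One ingredient of a displacement calculation, matching particles across time steps, can be sketched under assumptions the abstract does not state: each particle carries a unique integer ID, displacement is measured in a 2-D (r, z) plane from a reference time step, and particle records arrive in arbitrary order at each step. Sorting both snapshots by ID lets them be paired in a single linear merge pass. The record layout and function names are hypothetical and are not taken from the paper. Any reference snapshot kept for later comparison stays in memory, which is one way the retained data can grow over time.

```python
from typing import List, Tuple

# Hypothetical particle record: (particle_id, r, z).
Particle = Tuple[int, float, float]


def squared_displacements(reference: List[Particle],
                          current: List[Particle]) -> List[Tuple[int, float]]:
    """Pair particles across two snapshots by ID and return (id, squared
    displacement) for every particle present in both snapshots."""
    ref = sorted(reference)  # tuples sort by ID first; IDs are assumed unique
    cur = sorted(current)
    out: List[Tuple[int, float]] = []
    i = j = 0
    while i < len(ref) and j < len(cur):
        ref_id, cur_id = ref[i][0], cur[j][0]
        if ref_id == cur_id:
            dr = cur[j][1] - ref[i][1]
            dz = cur[j][2] - ref[i][2]
            out.append((ref_id, dr * dr + dz * dz))
            i += 1
            j += 1
        elif ref_id < cur_id:
            i += 1  # tracked particle missing from the current snapshot
        else:
            j += 1  # particle that was not tracked at the reference step
    return out


def mean_squared_displacement(pairs: List[Tuple[int, float]]) -> float:
    """Average squared displacement over the matched particles (0.0 if none)."""
    return sum(d for _, d in pairs) / len(pairs) if pairs else 0.0
```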

    Fast Change Point Detection for Electricity Market Analysis

    Electricity is a vital part of our daily life; therefore, it is important to avoid irregularities such as the California Electricity Crisis of 2000 and 2001. In this work, we seek to predict anomalies using advanced machine learning algorithms, more specifically a Change Point Detection (CPD) algorithm applied to electricity prices during the California Electricity Crisis. Such algorithms are effective but computationally expensive when applied to large amounts of data. To address this challenge, we accelerate the Gaussian Process (GP) for 1-dimensional time series data. Since GP is at the core of many statistical learning techniques, this improvement could benefit many algorithms. In the specific Change Point Detection algorithm used in this study, we reduce the overall computational complexity from O(n^5) to O(n^2), where the amortized cost of solving a GP problem is O(1). Our efficient algorithm makes it possible to compute the Change Points using the hourly price data from the California Electricity Crisis. By comparing the detected Change Points with known events, we show that the Change Point Detection algorithm is indeed effective in detecting signals preceding major events.
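    To make the cost argument concrete, the sketch below scores every candidate split of a 1-D series by the summed GP log marginal likelihoods of the two segments, one simple way to use a GP for single change point detection. It is a naive baseline, not the accelerated algorithm from this work, and the specific CPD algorithm used in the study is not reproduced here. The squared-exponential kernel, its hyperparameters, the per-segment mean removal, and the single-split scan are assumptions chosen for illustration; pure Python keeps it dependency-free. Each candidate split needs fresh O(n^3) Cholesky factorizations, so the scan costs O(n^4) overall, which is the kind of repeated GP solving an accelerated GP aims to make cheap.

```python
import math
from typing import List


def rbf_covariance(n: int, variance: float = 1.0, length_scale: float = 2.0,
                   noise: float = 0.01) -> List[List[float]]:
    """Squared-exponential covariance on the grid 0..n-1 plus i.i.d. noise."""
    return [[variance * math.exp(-((i - j) ** 2) / (2.0 * length_scale ** 2))
             + (noise if i == j else 0.0) for j in range(n)] for i in range(n)]


def cholesky(a: List[List[float]]) -> List[List[float]]:
    """Return lower-triangular L with L L^T == a (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def segment_log_likelihood(ys: List[float]) -> float:
    """GP log marginal likelihood of one segment after removing its mean."""
    n = len(ys)
    mean = sum(ys) / n
    r = [y - mean for y in ys]
    L = cholesky(rbf_covariance(n))
    # Forward substitution L z = r gives r^T K^{-1} r = z^T z.
    z: List[float] = []
    for i in range(n):
        z.append((r[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return (-0.5 * sum(v * v for v in z) - 0.5 * log_det
            - 0.5 * n * math.log(2.0 * math.pi))


def best_change_point(ys: List[float]) -> int:
    """Naive scan: score each split t into ys[:t] and ys[t:], return the best t."""
    scores = {t: segment_log_likelihood(ys[:t]) + segment_log_likelihood(ys[t:])
              for t in range(1, len(ys))}
    return max(scores, key=scores.get)
```

    For a series that jumps from 0 to 5 halfway through, `best_change_point([0.0] * 10 + [5.0] * 10)` returns 10, the index of the first post-jump value: only that split leaves no step inside either segment.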

    HDF5 As a Vehicle for in Transit Data Movement

    For in transit processing, one of the fundamental challenges is the efficient movement of data from producers to consumers. Exploiting the flexibility offered by the SENSEI generic in situ framework, we have developed a number of different in transit data transport mechanisms. In this work, we focus on the transport mechanism that leverages the HDF5 parallel I/O library and investigate its performance characteristics. For in transit use cases at scale on HPC platforms, one might expect that a data transport mechanism using faster layers of the storage hierarchy, such as DRAM, would always outperform a transport using slower layers, such as NVRAM-based persistent storage presented as a distributed file system. However, our test results show that the transport using NVRAM is competitive with the transport that uses socket-based data movement across varying levels of producer and consumer concurrency.
    Published in: Proceedings of the Workshop on In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization
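    The idea of swapping data transport mechanisms behind a common interface can be illustrated with a small, dependency-free sketch. It uses neither SENSEI nor HDF5: the `Transport` protocol, both classes, and the file naming are hypothetical, `pickle` files stand in for the HDF5 path, and producer and consumer run in one process rather than on separate nodes. One transport hands data over in memory, the other goes through the file system, and the same producer/consumer loop runs unchanged on either.

```python
import os
import pickle
import tempfile
from typing import Any, Dict, List, Protocol


class Transport(Protocol):
    """Hypothetical interface shared by all transports."""

    def put(self, step: int, data: Any) -> None: ...

    def get(self, step: int) -> Any: ...


class InMemoryTransport:
    """Stand-in for a memory-based path: objects are handed over directly."""

    def __init__(self) -> None:
        self._steps: Dict[int, Any] = {}

    def put(self, step: int, data: Any) -> None:
        self._steps[step] = data

    def get(self, step: int) -> Any:
        return self._steps.pop(step)


class FileTransport:
    """Stand-in for a storage-backed path; pickle replaces HDF5 here."""

    def __init__(self, directory: str) -> None:
        self.directory = directory

    def _path(self, step: int) -> str:
        return os.path.join(self.directory, f"step_{step:06d}.pkl")

    def put(self, step: int, data: Any) -> None:
        tmp = self._path(step) + ".tmp"
        with open(tmp, "wb") as f:
            pickle.dump(data, f)
        # Rename after writing so a reader never sees a half-written file.
        os.replace(tmp, self._path(step))

    def get(self, step: int) -> Any:
        with open(self._path(step), "rb") as f:
            return pickle.load(f)


def run(transport: Transport, steps: int) -> List[float]:
    """Producer writes each step's array; consumer reads it back and reduces it."""
    results = []
    for step in range(steps):
        transport.put(step, [float(step)] * 4)
        results.append(sum(transport.get(step)))
    return results


def compare_transports(steps: int = 3):
    """Run the same workload through both transports in a temporary directory."""
    with tempfile.TemporaryDirectory() as d:
        return run(InMemoryTransport(), steps), run(FileTransport(d), steps)
```

    Because `run` only depends on `put` and `get`, a performance comparison like the one described above reduces to timing the same loop with different transport objects.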

    Testing VPIN on Big Data: Response to "Reflecting on the VPIN Dispute"

    Grid Collector: An Event Catalog
