22 research outputs found

Grid collector: an event catalog with automated file management

Author: Gu Junmin
Shoshani Arie
Sim Alexander
Wu Kesheng
Zhang Wei-Ming
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2003

High Energy Nuclear Physics (HENP) experiments such as STAR at BNL and ATLAS at CERN produce large amounts of data that are stored as files on mass storage systems in computer centers. In these files, the basic unit of data is an event. Analysis is typically performed on a selected set of events. The files containing these events have to be located, copied from mass storage systems to disks before analysis, and removed when no longer needed. These file management tasks are tedious and time consuming. Typically, all events contained in the files are read into memory before a selection is made. Since the time to read the events dominates the overall execution time, reading unwanted events needlessly increases the analysis time. The Grid Collector is a set of software modules that work together to address these two issues. It automates the file management tasks and provides "direct" access to the selected events for analyses. It is currently integrated with the STAR analysis framework. Users can select events based on tags, such as "production date between March 10 and 20, and the number of charged tracks > 100." The Grid Collector locates the files containing relevant events, transfers the files across the Grid if necessary, and delivers the events to the analysis code through the familiar iterators. Although there have been some research efforts to address the file management issues, the Grid Collector is unique in that it addresses the event access issue together with the file management issues. This makes it more useful to a large variety of users.
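
The tag-based selection described in this abstract can be illustrated with a minimal Python sketch. It is not the Grid Collector's actual interface: the EventTag record, the select_events and files_needed helpers, the file names, and the year 2003 are all hypothetical, chosen only to show how a tag query (production date between March 10 and 20, charged tracks > 100) narrows a catalog down to the events to read and the files to fetch.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class EventTag:
    """Hypothetical per-event tag record; the field names are assumptions."""
    event_id: int
    file_name: str
    production_date: date
    charged_tracks: int


def select_events(catalog, start, end, min_tracks):
    """Keep tags produced within [start, end] that have more than min_tracks charged tracks."""
    return [
        tag for tag in catalog
        if start <= tag.production_date <= end and tag.charged_tracks > min_tracks
    ]


def files_needed(selected):
    """Group the selected event ids by the file that holds them."""
    by_file = {}
    for tag in selected:
        by_file.setdefault(tag.file_name, []).append(tag.event_id)
    return by_file


# Toy catalog with made-up values, for illustration only.
catalog = [
    EventTag(1, "run_a.dat", date(2003, 3, 9), 150),
    EventTag(2, "run_a.dat", date(2003, 3, 12), 120),
    EventTag(3, "run_b.dat", date(2003, 3, 15), 80),
    EventTag(4, "run_b.dat", date(2003, 3, 18), 101),
    EventTag(5, "run_c.dat", date(2003, 3, 25), 300),
]

selected = select_events(catalog, date(2003, 3, 10), date(2003, 3, 20), 100)
plan = files_needed(selected)
print(plan)
```

In this toy catalog only two of the three files contain matching events, so a file manager would need to stage only those two, and the analysis code would read only the listed events from each.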

Crossref

eScholarship - University of California

UNT Digital Library

Unraveling Diffusion in Fusion Plasma: A Case Study of In Situ Processing and Particle Sorting

Author: Chang C. S.
Choi Jong
Churchill R. Michael
Gu Junmin
Klasky Scott
Ku Seung-Hoe
Lin Paul
Podhorszki Norbert
Wu Kesheng
Publication date: 02/11/2023

This work starts an in situ processing capability to study a certain diffusion process in magnetic confinement fusion. This diffusion process involves plasma particles that are likely to escape confinement. Such particles carry a significant amount of energy from the burning plasma inside the tokamak to the divertor and damage the divertor plate. This study requires in situ processing because of the fast-changing nature of the particle diffusion process. However, the in situ processing approach is challenging because the amount of data to be retained for the diffusion calculations increases over time, unlike in other in situ processing cases where the amount of data to be processed is constant over time. Here we report our preliminary efforts to control the memory usage while ensuring the necessary analysis tasks are completed in a timely manner. Compared with an earlier naive attempt to directly compute the same diffusion displacements in the simulation code, this in situ version reduces the memory usage from particle information by nearly 60% and computation time by about 20%.

arXiv.org e-Print Archive

Fast Change Point Detection for Electricity Market Analysis

Author: Choi Jaesik
Gu Ming
Gu William
Simon Horst
Wu Kesheng
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 06/10/2013

Electricity is a vital part of our daily life; therefore it is important to avoid irregularities such as the California Electricity Crisis of 2000 and 2001. In this work, we seek to predict anomalies using advanced machine learning algorithms, more specifically a Change Point Detection (CPD) algorithm on the electricity prices during the California Electricity Crisis. Such algorithms are effective, but computationally expensive when applied to a large amount of data. To address this challenge, we accelerate the Gaussian Process (GP) for 1-dimensional time series data. Since GP is at the core of many statistical learning techniques, this improvement could benefit many algorithms. In the specific Change Point Detection algorithm used in this study, we reduce the overall computational complexity from O(n^5) to O(n^2), where the amortized cost of solving a GP problem is O(1). Our efficient algorithm makes it possible to compute the Change Points using the hourly price data during the California Electricity Crisis. By comparing the detected Change Points with known events, we show that the Change Point Detection algorithm is indeed effective in detecting signals preceding major events.

ScholarWorks@UNIST

HDF5 As a Vehicle for in Transit Data Movement

Author: Bethel E Wes
Gu Junmin
Loring Burlen
Wu Kesheng
Publication venue: eScholarship, University of California
Publication date: 01/01/2019

For in transit processing, one of the fundamental challenges is the efficient movement of data from producers to consumers. Exploiting the flexibility offered by the SENSEI generic in situ framework, we have developed a number of different in transit data transport mechanisms. In this work, we focus on the transport mechanism that leverages the HDF5 parallel I/O library, and investigate the performance characteristics of this transport mechanism. For in transit use cases at scale on HPC platforms, one might expect that an in transit data transport mechanism that uses faster layers of the storage hierarchy, such as DRAM memory, would always outperform a transport that uses slower layers of the storage hierarchy, such as an NVRAM-based persistent storage presented as a distributed file system. However, our test results show that the performance of the transport using NVRAM is competitive with the transport that uses socket-based data movement across varying levels of producer and consumer concurrency.

Published in: Proceedings of the Workshop on In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization

Crossref

eScholarship - University of California

Photolysis of ((3-(Trimethylsilyl)propoxy)phenyl)phenyliodonium Salts in the Presence of 1-Naphthol and 1-Methoxynaphthalene

Author: Douglas C. Neckers
Haiyan Gu
Kesheng Feng
Wenqin Zhang
Publication venue: 'American Chemical Society (ACS)'

Crossref

Testing VPIN on Big Data – Response to 'Reflecting on the VPIN Dispute'

Author: Bethel Wes
Gu Ming
Leinweber David
Ruebel Oliver
Wu Kesheng
Publication venue: 'Office of Scientific and Technical Information (OSTI)'
Publication date: 30/08/2013

Crossref

eScholarship - University of California

Testing VPIN on Big Data – Response to 'Reflecting on the VPIN Dispute'

Author: Bethel Wes
Gu Ming
Leinweber David
Ruebel Oliver
Wu Kesheng
Publication venue: eScholarship, University of California
Publication date: 30/08/2013

eScholarship - University of California

Grid Collector: Facilitating Efficient Selective Access from Data Grids

Author: Gu Junmin
Lauret Jerome
Poskanzer Arthur M.
Shoshani Arie
Sim Alexander
Wu Kesheng
Zhang Wei-Ming
Publication venue: eScholarship, University of California
Publication date: 17/05/2005

The Grid Collector is a system that facilitates the effective analysis and spontaneous exploration of scientific data. It combines an efficient indexing technology with a Grid file management technology to speed up common analysis jobs on high-energy physics data and to enable some previously impractical analysis jobs. To analyze a set of high-energy collision events, one typically specifies the files containing the events of interest, reads all the events in the files, and filters out unwanted ones. Since most analysis jobs filter out a significant number of events, a considerable amount of time is wasted reading the unwanted events. The Grid Collector removes this inefficiency by allowing users to specify more precisely which events are of interest and to read only the selected events. This speeds up most analysis jobs. In existing analysis frameworks, the responsibility of bringing files from tertiary storage or remote sites to local disks falls on the users. This forces most analysis jobs to be performed at centralized computer facilities where commonly used files are kept on large shared file systems. The Grid Collector automates file management tasks and eliminates the labor-intensive manual file transfers. This makes it much easier to perform analyses that require data files on tertiary storage and at remote sites. It also makes more computer resources available for analysis jobs since they are no longer bound to the centralized facilities.

eScholarship - University of California

UNT Digital Library

Grid Collector: An Event Catalog With Automated File Management

Author: Arie Shoshani
Alexander Sim
Junmin Gu
Kesheng Wu
Wei-ming Zhang

High Energy Nuclear Physics (HENP) experiments such as STAR at BNL and ATLAS at CERN produce large amounts of data that are stored as files on mass storage systems in computer centers. In these files, the basic unit of data is an event. Analysis is typically performed on a selected set of events. The files containing these events have to be located, copied from mass storage systems to disks before analysis, and removed when no longer needed. These file management tasks are tedious and time consuming. Typically, all events contained in the files are read into memory before a selection is made. Since the time to read the events dominates the overall execution time, reading unwanted events needlessly increases the analysis time. The Grid Collector is a set of software modules that work together to address these two issues. It automates the file management tasks and provides "direct" access to the selected events for analyses. It is currently integrated with the STAR analysis framework. Users can select events based on tags, such as "production date between March 10 and 20, and the number of charged tracks > 100." The Grid Collector locates the files containing relevant events, transfers the files across the Grid if necessary, and delivers the events to the analysis code through the familiar iterators. Although there have been some research efforts to address the file management issues, the Grid Collector is unique in that it addresses the event access issue together with the file management issues. This makes it more useful to a large variety of users.

CiteSeerX