research

Investigating Access Performance of Long Time Series with Restructured Big Model Data

Abstract

Data sets generated by models are substantially increasing in volume, due to increases in spatial and temporal resolution, and the number of output variables. Many users wish to download subsetted data in preferred data formats and structures, as it is getting increasingly difficult to handle the original full-size data files. For example, application research users such as those involved with wind or solar energy, or extreme weather events are likely only interested in daily or hourly model data at a single point (or for a small area) for a long time period, and prefer to have the data downloaded in a single file. With native model file structures, such as hourly data from NASA Modern-Era Retrospective analysis for Research and Applications Version-2 (MERRA-2), it may take over 10 hours for the extraction of parameters-of-interest at a single point for 30 years. The NASA Goddard Earth Sciences Data and Information Services Center (GES DISC) is exploring methods to address this particular user need. One approach is to create value-added data by reconstructing the data files. Taking MERRA-2 data as an example, we have tested converting hourly data from one-day-per-file into different data cubes, such as one-month, or one-year. Performance is compared for reading local data files and accessing data through interoperable services, such as OPeNDAP. Results show that, compared to the original file structure, the new data cubes offer much better performance for accessing long time series. We have noticed that performance is associated with the cube size and structure, the compression method, and how the data are accessed. An optimized data cube structure will not only improve data access, but also may enable better online analysis service

    Similar works