Management of large geospatial datasets

Lau, Ka Hin

Authors: Ka Hin Lau
Publication date: 15 May 2022
Publisher: UiT The Arctic University of Norway

Abstract

In large simulations, such as predicting the movement of ocean particles, simulation executions are often related because they share one or more inputs. As the number of simulations grows, it becomes harder for the users who run them to keep track of them all, and storage space is wasted when the same input files are copied multiple times. This thesis describes a system that collects data from previous simulations and lets users search for the data they need to run the next simulation. The system also identifies files that were already used in previous simulations, so users can refer to an existing file instead of copying it into a new simulation folder. Among the simulations executed in our current environment, the system identifies around 11% of input files as shared between simulations. The conclusion is that the system reduces the effort and time users need to find input files from previous simulations when they set up a new simulation. It also saves storage space on the computing cluster where the simulations run by identifying duplicated data.
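The abstract does not say how the system recognises that two simulations used the same file. One common approach, shown here only as a minimal sketch and not as the thesis's method, is to group files by a hash of their content. The function names `file_digest` and `find_shared_inputs`, the choice of SHA-256, and the one-folder-per-simulation layout are all assumptions made for illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large input files are never loaded whole.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_shared_inputs(simulation_dirs):
    """Group files under the given folders by content digest.

    Hypothetical helper: returns a dict mapping digest -> list of paths,
    keeping only content that appears at more than one path.
    """
    by_digest = defaultdict(list)
    for sim_dir in simulation_dirs:
        for path in sorted(Path(sim_dir).rglob("*")):
            if path.is_file():
                by_digest[file_digest(path)].append(path)
    return {d: paths for d, paths in by_digest.items() if len(paths) > 1}
```

Calling `find_shared_inputs(["sim_a", "sim_b"])` on two hypothetical simulation folders returns the groups of byte-identical files, which a user could then reference from a single location instead of copying.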

Available Versions

Munin - Open Research Archive

oai:munin.uit.no:10037/25914

Last time updated on 05/11/2022