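A Study on Earth Science Data Generation through Numerical Modeling and Machine Learning Based on a Cloud Computing Environment (클라우드 컴퓨팅 환경기반에서 수치 모델링과 머신러닝을 통한 지구과학 자료생성에 관한 연구)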

Author: 정광욱
Publication venue: Seoul National University Graduate School (서울대학교 대학원)
Publication date: 01/08/2022

Doctoral dissertation, Seoul National University Graduate School, College of Natural Sciences, School of Earth and Environmental Sciences, August 2022. Advisor: 조양기.

To investigate changes and phenomena on Earth, many scientists use results from high-resolution numerical models or develop and apply machine-learning prediction models trained on observed data. As information technology advances, practical methods are needed for generating earth science data through local and global high-resolution numerical modeling and through machine learning. This study proposes that data generation and processing based on high-resolution earth science numerical models and machine-learning prediction models can be implemented effectively in a cloud environment. To verify the reproducibility and portability of high-resolution numerical ocean modeling on cloud computing, I simulated and analyzed the performance of a numerical ocean model at various resolutions over model domains including the Northwest Pacific Ocean, the East Sea, and the Yellow Sea. Containerization made it possible to adapt to changes across diverse infrastructure environments and to achieve computational reproducibility effectively. To verify machine-learning-based data generation, subsurface temperature data were augmented with generative models, producing large datasets for training a model that predicts the vertical temperature distribution in the ocean; augmentation was needed because observed profiles are scarce relative to satellite data. HYCOM datasets were used alongside the observations for performance comparison, and the distribution of the augmented data was similar to that of the input data. An ensemble combining stand-alone predictive models outperformed the prediction model based only on the existing observed data. Data synthesis required large amounts of computational resources and was performed in a cloud-based graphics processing unit (GPU) environment. High-resolution numerical ocean model simulation, predictive model development, and the data generation method can improve predictive capabilities in ocean science. The cloud-based numerical modeling and generative models used in this study can be applied broadly across the earth sciences.

Table of contents:
1. General Introduction
2. Performance of numerical ocean modeling on cloud computing
2.1. Introduction
2.2. Cloud Computing
2.2.1. Cloud computing overview
2.2.2. Commercial cloud computing services
2.3. Numerical model for performance analysis of commercial clouds
2.3.1. High Performance Linpack Benchmark
2.3.2. Benchmark Sustainable Memory Bandwidth and Memory Latency
2.3.3. Numerical Ocean Model
2.3.4. Deployment of Numerical Ocean Model and Benchmark Packages on Cloud Clusters
2.4. Simulation results
2.4.1. Benchmark simulation
2.4.2. Ocean model simulation
2.5. Analysis of ROMS performance on commercial clouds
2.5.1. Performance of ROMS according to H/W resources
2.5.2. Performance of ROMS according to grid size
2.6. Summary
3. Reproducibility of numerical ocean model on the cloud computing
3.1. Introduction
3.2. Containerization of numerical ocean model
3.2.1. Container virtualization
3.2.2. Container-based architecture for HPC
3.2.3. Container-based architecture for hybrid cloud
3.3. Materials and Methods
3.3.1. Comparison of traditional and container based HPC cluster workflows
3.3.2. Model domain and datasets for numerical simulation
3.3.3. Building the container image and registration in the repository
3.3.4. Configuring a numeric model execution cluster
3.4. Results and Discussion
3.4.1. Reproducibility
3.4.2. Portability and Performance
3.5. Conclusions
4. Generative models for the prediction of ocean temperature profile
4.1. Introduction
4.2. Materials and Methods
4.2.1. Model domain and datasets for predicting the subsurface temperature
4.2.2. Model architecture for predicting the subsurface temperature
4.2.3. Neural network generative models
4.2.4. Prediction Models
4.2.5. Accuracy
4.3. Results and Discussion
4.3.1. Data Generation
4.3.2. Ensemble Prediction
4.3.3. Limitations of this study and future works
4.4. Conclusion
5. Summary and conclusion
6. References
7. Abstract (in Korean)

SNU Open Repository and Archive
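
The dissertation abstract reports that an ensemble of stand-alone predictive models beat a model based only on observed data, but it does not say how the models were combined. The sketch below assumes the simplest option, an equal-weight average of each model's depth-wise temperature prediction, and shows how RMSE against an observed profile can be compared for individual models and the ensemble. The model functions, values, and names are hypothetical.

```python
from math import sqrt
from statistics import fmean
from typing import Callable, List, Sequence

# Hypothetical aliases: a "profile" is temperature at a fixed set of depth levels.
Profile = List[float]
Predictor = Callable[[Sequence[float]], Profile]


def ensemble_predict(models: Sequence[Predictor], features: Sequence[float]) -> Profile:
    """Equal-weight average of the depth-wise predictions of stand-alone models."""
    if not models:
        raise ValueError("need at least one model")
    predictions = [model(features) for model in models]
    n_levels = len(predictions[0])
    if any(len(p) != n_levels for p in predictions):
        raise ValueError("all models must predict the same number of depth levels")
    # Average level by level across the models.
    return [fmean(p[k] for p in predictions) for k in range(n_levels)]


def rmse(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square error between two equal-length profiles."""
    if len(predicted) != len(observed):
        raise ValueError("profiles must have equal length")
    return sqrt(fmean((p - o) ** 2 for p, o in zip(predicted, observed)))


if __name__ == "__main__":
    # Two toy models with opposite biases; all values are made up for illustration.
    def warm_model(f: Sequence[float]) -> Profile:
        return [f[0] + 1.0, f[0] - 4.0, f[0] - 9.0]

    def cold_model(f: Sequence[float]) -> Profile:
        return [f[0] - 1.0, f[0] - 6.0, f[0] - 11.0]

    observed = [20.0, 15.0, 10.0]
    combined = ensemble_predict([warm_model, cold_model], [20.0])
    print(combined, rmse(combined, observed))
```

Averaging only helps when the members' errors partly cancel, as the opposite biases do in this toy case; weighted or learned combinations are common alternatives.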

Efficacy of Feedforward and LSTM Neural Networks at Predicting and Gap Filling Coastal Ocean Timeseries: Oxygen, Nutrients, and Temperature

Author: Contractor S
Roughan M
Publication venue: Frontiers
Publication date: 03/05/2021

Ocean data timeseries are vital for a diverse range of stakeholders (ranging from government, to industry, to academia) to underpin research, support decision making, and identify environmental change. However, continuous monitoring and observation of ocean variables is difficult and expensive. Moreover, since oceans are vast, observations are typically sparse in spatial and temporal resolution. In addition, the hostile ocean environment creates challenges for collecting and maintaining data sets, such as instrument malfunctions and servicing, often resulting in temporal gaps of varying lengths. Neural networks (NNs) have proven effective in many diverse big data applications, but few oceanographic applications have been tested using modern frameworks and architectures. Therefore, here we demonstrate a “proof of concept” neural network application using a popular “off-the-shelf” framework called “TensorFlow” to predict subsurface ocean variables, including dissolved oxygen, nutrient (nitrate, phosphate, and silicate) concentrations, and temperature timeseries, and show how these models can be used successfully for gap-filling data products. We achieved a final prediction accuracy of over 96% for oxygen and temperature, and mean squared errors (MSE) of 2.63, 0.0099, and 0.78 for nitrates, phosphates, and silicates, respectively. The temperature gap-filling was done with an innovative contextual Long Short-Term Memory (LSTM) NN that uses data before and after the gap as separate feature variables. We also demonstrate the application of a novel dropout-based approach to approximate the Bayesian uncertainty of these temperature predictions. This Bayesian uncertainty is represented in the form of 100 Monte Carlo dropout estimates of the two longest gaps in the temperature timeseries from a model with 25% dropout in the input and recurrent LSTM connections.
Throughout the study, we present the NN training process, including the tuning of the large number of NN hyperparameters, which could pose a barrier to uptake among researchers and other oceanographic data users. Our models can be scaled up and applied operationally to provide consistent, gap-free data to all data users, thus encouraging data uptake for data-based decision making.

UNSWorks
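
The abstract describes approximating Bayesian uncertainty with 100 Monte Carlo dropout estimates from a model with 25% dropout. The sketch below illustrates that general technique on a deliberately simple linear model instead of the authors' contextual LSTM (an assumption made to keep it short): dropout stays active at prediction time, each forward pass draws a fresh random mask, and the spread of the resulting predictions serves as the uncertainty estimate. The weights, inputs, and function names are hypothetical.

```python
import random
from statistics import fmean, pstdev
from typing import List, Sequence, Tuple


def dropout(values: Sequence[float], rate: float, rng: random.Random) -> List[float]:
    """Inverted dropout: zero each value with probability `rate`, rescale the rest."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def mc_dropout_predict(
    weights: Sequence[float],
    bias: float,
    inputs: Sequence[float],
    rate: float = 0.25,
    n_samples: int = 100,
    seed: int = 0,
) -> Tuple[float, float]:
    """Keep dropout switched on at prediction time and summarise the samples.

    Returns the mean prediction and its standard deviation; the latter is
    used as an approximate (Bayesian) uncertainty estimate.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have equal length")
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        x = dropout(inputs, rate, rng)  # a new random mask on every pass
        samples.append(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return fmean(samples), pstdev(samples)
```

For example, `mc_dropout_predict([0.5, 0.3, 0.2], 1.0, [14.2, 13.8, 13.1])` returns a mean and a spread for one hypothetical gap value. In the paper, dropout is also applied to the recurrent LSTM connections, which this linear sketch does not model.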

An Artificial Neural Network to Infer the Mediterranean 3D Chlorophyll-a and Temperature Fields from Remote Sensing Observations

Author: Buongiorno Nardelli Bruno
Marullo Salvatore
Sammartino Michela
Santoleri Rosalia
Publication date: 17/12/2020

Remote sensing data provide a huge number of sea surface observations but cannot give direct information on deeper ocean layers, which can only be provided by sparse in situ data. The combination of measurements collected by satellite and in situ sensors represents one of the most effective strategies to improve our knowledge of the interior structure of ocean ecosystems. In this work, we describe a Multi-Layer Perceptron (MLP) network designed to reconstruct the 3D fields of ocean temperature and chlorophyll-a concentration, two variables of primary importance for many upper-ocean bio-physical processes. Artificial neural networks can efficiently model possible non-linear relationships among input variables, so the choice of predictors is crucial to building an accurate model. Here, concurrent temperature and chlorophyll-a in situ profiles and several different combinations of satellite-derived surface predictors are used to identify the optimal model configuration, focusing on the Mediterranean Sea. The lowest errors are obtained when the inputs are surface chlorophyll-a, temperature, altimeter-derived absolute dynamic topography, and surface geostrophic velocity components. Network training and test validations give comparable results and significantly improve on Mediterranean climatological data (MEDATLAS). 3D fields are then also reconstructed from full-basin 2D satellite monthly climatologies (1998–2015), and the resulting 3D seasonal patterns are analyzed. The method accurately infers the vertical shape of temperature and chlorophyll-a profiles and their spatial and temporal variability. It thus represents an effective tool to overcome in situ data sparseness and the limits of satellite observations, and it is potentially suitable for the initialization and validation of bio-geophysical models.

Open Access Repository
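
The abstract names the inputs of the best configuration (surface chlorophyll-a, temperature, absolute dynamic topography, and the two geostrophic velocity components) but not the network's layer sizes or activations. The forward pass below is a minimal sketch of how an MLP maps such surface predictors to a vertical profile, assuming one tanh hidden layer and a linear output layer with one unit per depth level. Training, input normalization, and the paper's actual architecture are omitted, and all names are hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence

Matrix = Sequence[Sequence[float]]


def dense(
    x: Sequence[float],
    weights: Matrix,
    biases: Sequence[float],
    activation: Optional[Callable[[float], float]] = None,
) -> List[float]:
    """Fully connected layer: one weight row and one bias per output unit."""
    if any(len(row) != len(x) for row in weights) or len(weights) != len(biases):
        raise ValueError("layer shapes do not match the input")
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out


def mlp_profile(
    predictors: Sequence[float],
    w_hidden: Matrix,
    b_hidden: Sequence[float],
    w_out: Matrix,
    b_out: Sequence[float],
) -> List[float]:
    """Map surface predictors to one value per depth level.

    Hidden layer uses tanh; the output layer is linear so it can produce
    unbounded values such as temperature or (log) chlorophyll-a.
    """
    hidden = dense(predictors, w_hidden, b_hidden, math.tanh)
    return dense(hidden, w_out, b_out)
```

With five predictors, `w_hidden` needs five columns; the number of rows in `w_out` sets how many depth levels the reconstructed profile has.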

Subsurface temperature estimation of mesoscale eddies in the Northwest Pacific Ocean from satellite observations using a residual multi-channel attention convolution network

Author: Anmin Zhang
Hao Zhang
Jiayi Liu
Shuai Liu
Yicheng Liu
Publication venue: Frontiers Media S.A.
Publication date: 01/06/2024

Mesoscale eddies are prevalent oceanic circulation phenomena that significantly influence many aspects of the marine environment in the Northwest Pacific Ocean, including energy transfer, material transport, and ecosystem dynamics. However, because vertical observations are sparse, understanding of the three-dimensional temperature structure of individual mesoscale eddies remains limited. In recent years, using surface remote sensing observations to estimate subsurface temperature anomalies has become crucial for understanding the intricate multi-dimensional dynamic processes in the ocean. This paper therefore proposes an eddy residual multi-channel attention convolution network (ERCACN) with an adaptive threshold and designs combinations of surface features to estimate the eddy subsurface temperature anomaly (ESTA). By combining the results with climatological temperature, thermal structures with 46 levels down to 1000 m are obtained at daily temporal resolution and 0.25° spatial resolution. Validation against independent Argo profiles from 2016 to 2017 shows that combining multiple surface variables outperforms univariate methods and that the ERCACN model outperforms other approaches. Overall, with an 8% error deemed acceptable, the ERCACN model estimates ESTA with a precision of 88.08%. This method offers a new perspective for estimating other essential oceanic variables, contributing to a better understanding of the global climate system.

Directory of Open Access Journals
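
Two steps in this abstract are simple enough to sketch. The estimated eddy subsurface temperature anomaly is combined with climatological temperature to obtain full thermal structures (assumed here to be a level-by-level sum), and precision is reported as the share of estimates within an 8% error. The abstract does not define that error, so the relative-error criterion below is an assumption, not the paper's metric; the function names are hypothetical.

```python
from typing import List, Sequence


def reconstruct_temperature(
    climatology: Sequence[float], anomaly: Sequence[float]
) -> List[float]:
    """Full temperature profile = climatological profile + estimated anomaly."""
    if len(climatology) != len(anomaly):
        raise ValueError("profiles must have the same number of levels")
    return [c + a for c, a in zip(climatology, anomaly)]


def tolerance_precision(
    estimates: Sequence[float], references: Sequence[float], tol: float = 0.08
) -> float:
    """Fraction of estimates within a relative error `tol` of the reference.

    Assumed definition: an estimate counts as correct when
    |estimate - reference| <= tol * |reference|.
    """
    if not references or len(estimates) != len(references):
        raise ValueError("need equal-length, non-empty inputs")
    hits = sum(abs(e - r) <= tol * abs(r) for e, r in zip(estimates, references))
    return hits / len(references)
```

A relative criterion like this is strict for anomalies close to zero (a zero reference is matched only exactly), which is one reason a paper might normalize errors differently.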

Spatial-Temporal Data Mining for Ocean Science: Data, Methodologies, and Opportunities

Author: Cao Jiannong
Guan Jihong
Li Hui
Li Wengen
Wang Shuyu
Yang Hanchen
Zhou Shuigeng
Publication date: 20/07/2023

With the increasing amount of spatial-temporal (ST) ocean data, numerous spatial-temporal data mining (STDM) studies have been conducted to address various oceanic issues, e.g., climate forecasting and disaster warning. Compared with typical ST data (e.g., traffic data), ST ocean data is more complicated and has some unique characteristics, e.g., diverse regionality and high sparsity. These characteristics make it difficult to design and train STDM models. Unfortunately, an overview of these studies is still missing, which hinders computer scientists from identifying research issues in ocean science and discourages ocean scientists from applying advanced STDM techniques. To remedy this situation, we provide a comprehensive survey of existing STDM studies in ocean science. Concretely, we first summarize the widely used ST ocean datasets and identify their unique characteristics. Then, typical ST ocean data quality enhancement techniques are discussed. Next, we classify existing STDM studies for the ocean into four types of tasks, i.e., prediction, event detection, pattern mining, and anomaly detection, and elaborate on the techniques for these tasks. Finally, promising research opportunities are highlighted. This survey will help scientists from both computer science and ocean science better understand the fundamental concepts, key techniques, and open challenges of STDM in ocean science.

arXiv.org e-Print Archive

Study on prediction of SST and SSS in Southern Ocean by multi-layers ConvLSTM model（多層ConvLSTMモデルによる南極海の海面水温および海面塩分の予測に関する研究）

Author: 魏梓桐
Publication venue: Tokyo University of Marine Science and Technology (東京海洋大学)
Publication date: 18/06/2021

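Master's thesis, Tokyo University of Marine Science and Technology (東京海洋大学), academic year 2020 (March 2021). Master of Marine Resources and Environment, degree no. 3523. Advisor: 北出裕二郎. Full text published: 2021-06-21.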

Open Access Collection of International and Scholarly Papers

Convolutional GRU Network for Seasonal Prediction of the El Niño-Southern Oscillation

Author: Ammons Savana
Hur Vera Mikyoung
Sriver Ryan L.
Wang Lingda
Zhao Zhizhen
Publication date: 17/06/2023

Predicting sea surface temperature (SST) within the El Niño-Southern Oscillation (ENSO) region has been extensively studied because of its significant influence on global temperature and precipitation patterns. Statistical models such as the linear inverse model (LIM), analog forecasting (AF), and recurrent neural networks (RNNs) have been widely used for ENSO prediction, offering flexibility and relatively low computational expense compared with large dynamical models. However, these models either struggle to capture spatial patterns in SST variability or rely on linear dynamics. Here we present a modified Convolutional Gated Recurrent Unit (ConvGRU) network for the ENSO region spatio-temporal sequence prediction problem, with Niño 3.4 index prediction as a downstream task. The proposed ConvGRU network, with an encoder-decoder sequence-to-sequence structure, takes historical SST maps of the Pacific region as input and generates future SST maps for subsequent months within the ENSO region. To evaluate the performance of the ConvGRU network, we trained and tested it using data from multiple large climate models. The results demonstrate that the ConvGRU network significantly improves the predictability of the Niño 3.4 index compared with LIM, AF, and RNN. This improvement is evidenced by an extended useful prediction range, higher Pearson correlation, and lower root-mean-square error. The proposed model holds promise for improving our understanding and prediction of the ENSO phenomenon and can be broadly applicable to other weather and climate prediction scenarios with spatial patterns and teleconnections.
Comment: 13 pages, 7 figures

arXiv.org e-Print Archive
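
The abstract treats Niño 3.4 index prediction as a downstream task on the predicted SST maps. A minimal sketch of that step, assuming the network outputs SST anomaly maps on a regular latitude-longitude grid: the index is the area-mean anomaly over 5°S to 5°N, 170°W to 120°W. The sketch uses an unweighted mean (cosine-latitude weighting changes little this close to the equator) and leaves out any running mean an operational index may apply. It does not reproduce the ConvGRU itself, and the function name and grid layout are hypothetical.

```python
from statistics import fmean
from typing import Sequence


def nino34_index(
    anomaly: Sequence[Sequence[float]],
    lats: Sequence[float],
    lons: Sequence[float],
) -> float:
    """Unweighted mean SST anomaly over the Nino 3.4 box (5S-5N, 170W-120W).

    anomaly[i][j] is the SST anomaly at latitude lats[i], longitude lons[j].
    Longitudes may use either the -180..180 or the 0..360 convention.
    """
    values = [
        anomaly[i][j]
        for i, lat in enumerate(lats)
        if -5.0 <= lat <= 5.0
        for j, lon in enumerate(lons)
        if 190.0 <= lon % 360.0 <= 240.0  # 170W-120W expressed in degrees east
    ]
    if not values:
        raise ValueError("grid does not cover the Nino 3.4 region")
    return fmean(values)
```

Applied to each predicted monthly map, this yields a Niño 3.4 forecast series that can then be scored against the reference index with Pearson correlation and RMSE, the metrics the abstract reports.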

Machine Learning for Earth Systems Modeling, Analysis and Predictability

Author: Passarella Linsey
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/08/2022

Artificial intelligence (AI) and machine learning (ML) methods and applications have been continuously explored in many areas of scientific research. While these methods have led to many advances in climate science, there remains room for growth, especially in Earth system modeling, analysis, and predictability. Because of their high computational expense and the large volumes of complex data they produce, earth system models (ESMs) offer abundant potential for using ML techniques both to enhance our understanding of the climate system and to improve the performance of ESMs themselves. Here I demonstrate three specific areas of development using ML: statistical downscaling, predictability using non-linear latent spaces, and emulation of complex parameterizations. These three areas of research illustrate the ability of innovative ML methods to advance our understanding of climate systems through ESMs. In Aim 1, I present a first application of a fast super-resolution convolutional neural network (FSRCNN)-based approach for downscaling ESM simulations. We adapt the FSRCNN to improve reconstruction of ESM data and term the result FSRCNN-ESM. We find that FSRCNN-ESM outperforms FSRCNN and other super-resolution methods in reconstructing high-resolution images, producing finer spatial-scale features with better accuracy for surface temperature, surface radiative fluxes, and precipitation. In Aim 2, I construct a novel Multi-Input Multi-Output Autoencoder-decoder (MIMO-AE) in an application of multi-task learning (MTL) to capture the non-linear relationship between Southern California precipitation (SC-PRECIP) and tropical Pacific Ocean sea surface temperature (TP-SST) on monthly time scales. I find that the MIMO-AE index provides enhanced predictability of SC-PRECIP for lead times of up to four months compared with the Niño 3.4 index and the El Niño Southern Oscillation Longitudinal Index.
I also use an MTL method to extend a convolutional long short-term memory (conv-LSTM) network to predict the Niño 3.4 index by including multiple input variables known to be associated with ENSO, namely sea level pressure (SLP), outgoing longwave radiation (OLR), and surface zonal winds (U). In Aim 3, I demonstrate the capability of deep neural networks (DNNs) to learn computationally expensive parameterizations in ESMs. This study develops a DNN to replace the full radiation model in the E3SM

University of Tennessee, Knoxville: Trace
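
The abstract does not say how low- and high-resolution training pairs for FSRCNN-ESM were built. A common recipe in super-resolution downscaling, assumed here, is to coarsen a high-resolution field by block averaging and train the network to recover the original. The sketch below builds one such (input, target) pair; it does not implement FSRCNN, and the names are hypothetical.

```python
from statistics import fmean
from typing import List, Sequence, Tuple

Field = List[List[float]]


def block_average(field: Sequence[Sequence[float]], factor: int) -> Field:
    """Coarsen a 2-D field by averaging non-overlapping factor x factor blocks."""
    rows, cols = len(field), len(field[0])
    if factor < 1 or rows % factor or cols % factor:
        raise ValueError("field dimensions must be divisible by factor")
    return [
        [
            fmean(field[i + di][j + dj] for di in range(factor) for dj in range(factor))
            for j in range(0, cols, factor)
        ]
        for i in range(0, rows, factor)
    ]


def make_training_pair(
    high_res: Sequence[Sequence[float]], factor: int
) -> Tuple[Field, Field]:
    """Return a (low-resolution input, high-resolution target) pair."""
    target = [list(row) for row in high_res]  # copy so the pair owns its data
    return block_average(high_res, factor), target
```

Block averaging conserves the area mean of each coarse cell, which suits fluxes and precipitation; a model trained on such synthetic pairs may still behave differently on genuinely coarse ESM output.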