Coastal pH variability reconstructed through machine learning in the Balearic Sea

Abstract

[Description of methods used for collection/generation of data] Data were acquired at both stations using a SAMI-pH sensor (Sunburst Sensors LLC) attached at 1 m depth in the Bay of Palma and at 4 m depth in Cabrera. The sensors measured pH on the total scale (pHT) hourly, since December 2018 in the Bay of Palma and since November 2019 in Cabrera. The sensor precision and accuracy are < 0.001 pH and ± 0.003 pH units, respectively. The sensors were maintained monthly, including data download and surface cleaning. Temperature and salinity for the Cabrera mooring line were obtained starting in November 2019 with a CT SBE37 (Sea-Bird Scientific©). The accuracy of the CT is ± 0.002 °C for temperature and ± 0.003 mS cm−1 for conductivity. Additionally, oxygen data from an SBE 63 (Sea-Bird Scientific©) sensor attached to the CT in Cabrera were used. The accuracy of the SBE 63 oxygen sensor is ± 2%.
[Methods for processing the data] Once the data (available at https://doi.org/XXX/DigitalCSIC/XXX) were validated, several processing steps were performed to ensure an optimal training process for the neural network models. First, all time series were re-sampled to a daily frequency by averaging the data points. Afterwards, a standard feature-scaling procedure (min-max normalization) was applied to every feature (temperature, salinity and oxygen) and to pHT. Finally, we built our training and validation sets as tensors with dimensions (batchsize, windowsize, features), where batchsize is the number of examples to train per iteration, windowsize is the number of past and future points considered, and features is the number of features used to predict the target series. Temperature values below T = 12.5 °C were discarded as sensor outliers, since they fall outside the normal range in the study area.
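The processing steps described above (outlier removal, daily averaging, min-max normalization and windowing) can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function names, the synthetic data, the odd window size centred on the target day, and the choice to discard cold outliers before averaging are all assumptions made for illustration. The first axis of the resulting array holds all available examples; splitting it into batches of size batchsize is left to the training loop.

```python
import numpy as np

T_MIN = 12.5  # °C, outlier threshold stated in the text


def daily_mean(hours, values):
    """Average hourly rows of `values` (n, k) into one row per day."""
    days = np.asarray(hours) // 24
    uniq, inverse = np.unique(days, return_inverse=True)
    sums = np.zeros((uniq.size, values.shape[1]))
    np.add.at(sums, inverse, values)           # accumulate rows per day
    counts = np.bincount(inverse, minlength=uniq.size)
    return uniq, sums / counts[:, None]


def min_max_scale(x):
    """Rescale each column to [0, 1]; assumes no column is constant."""
    lo, hi = x.min(axis=0), x.max(axis=0)
    return (x - lo) / (hi - lo)


def make_windows(features, target, window_size):
    """Build (examples, window_size, n_features) inputs centred on each target."""
    if window_size % 2 == 0:
        raise ValueError("this sketch assumes an odd window_size")
    half = window_size // 2
    centres = range(half, len(features) - half)
    X = np.stack([features[t - half:t + half + 1] for t in centres])
    y = np.array([target[t] for t in centres])
    return X, y


# Synthetic hourly record (10 days); columns: temperature, salinity, oxygen, pHT.
rng = np.random.default_rng(0)
hours = np.arange(24 * 10)
temperature = 13.0 + 10.0 * rng.random(hours.size)
temperature[[5, 100]] = 11.0                   # two artificial cold outliers
salinity = 37.0 + rng.random(hours.size)
oxygen = 200.0 + 50.0 * rng.random(hours.size)
ph_t = 8.0 + 0.1 * rng.random(hours.size)
data = np.column_stack([temperature, salinity, oxygen, ph_t])

keep = data[:, 0] >= T_MIN                     # discard rows below 12.5 °C
_, daily = daily_mean(hours[keep], data[keep])
scaled = min_max_scale(daily)
X, y = make_windows(scaled[:, :3], scaled[:, 3], window_size=5)
print(X.shape, y.shape)
```

Centring the window on the target day is one way to read "past and future points"; it gives a bidirectional model context on both sides of each reconstructed value.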
A Bidirectional Long Short-Term Memory (BD-LSTM) neural network was selected as the best architecture to reconstruct the pHT time series, showing no signs of overfitting and achieving less than 1% error in both the training and validation sets. Data from the Bay of Palma were used to select the best neural network architecture. The code and data used to determine the best neural network architecture can be found in a GitHub repository mentioned in the context information.
Funding for this work was provided by the projects RTI2018-095441-B-C21, RTI2018-095441-B-C22 (SuMaEco) and Grant MDM-2017-0711 (María de Maeztu Excellence Unit) funded by MCIN/AEI/10.13039/501100011033 and by the “ERDF A way of making Europe”, the BBVA Foundation project Posi-COIN and the Balearic Islands Government projects AAEE111/2017 and SEPPO (2018). SF was supported by a “Margalida Comas” postdoctoral scholarship, also from the Balearic Islands Government. FFP was supported by the BOCATS2 (PID2019-104279GB-C21) project funded by MCIN/AEI/10.13039/501100011033. This work is a contribution to CSIC’s Thematic Interdisciplinary Platform PTI WATER:iOS (https://pti-waterios.csic.es/).
Peer reviewed
