13 research outputs found

    Evaluation of object storage technologies for climate data storage and analysis

    Data analysis in earth science has been dominated by the download-analyze model, in which analysts first download the desired dataset from a remote server to their local workstation or their institution's HPC infrastructure and then perform the analysis. Over time, the size and variety of datasets have increased exponentially and new data analysis techniques have appeared, bringing new requirements for the systems that store the datasets and for the analysis tools. In the climate community, the dominant dataset format is netCDF, which has incorporated new functionality for more efficient storage and access, such as the HDF5 format and its chunking technique, which allows netCDF files to be accessed in parallel through parallel file systems. Data access has also benefited from protocols such as DAP, which allow access to only the required subset of a remote dataset. In recent years, cloud computing, and more specifically object storage, has emerged as an alternative for both storing and analyzing climate data, which has encouraged the development of new storage and access specifications and libraries, such as Zarr. Object storage works by assigning an alphanumeric identifier (hash id) to an arbitrary block of bytes (blob), combined with REST-style APIs. The purpose of this work is to evaluate the benefits and efficiency of these new technologies and specifications relative to the existing stack, for both data storage and data access for analysis. Máster en Ciencia de Datos (Master's in Data Science).
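
    The object-storage idea described in this abstract (a hash id assigned to an arbitrary blob, with a dataset split into chunks that can be fetched independently) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `BlobStore` class, the `split_into_chunks` helper, the choice of SHA-256 as the identifier, and the in-memory dictionary standing in for a remote REST service. It does not reproduce the key layout of Zarr or of any particular object-storage product.

```python
import hashlib


class BlobStore:
    """Toy in-memory object store (hypothetical): blobs are keyed by a hash of their bytes."""

    def __init__(self):
        self._objects = {}

    def put(self, blob: bytes) -> str:
        # The identifier is the SHA-256 hex digest of the content (an assumption).
        key = hashlib.sha256(blob).hexdigest()
        self._objects[key] = blob
        return key

    def get(self, key: str) -> bytes:
        # A real service would expose this as an HTTP GET on the object's URL.
        return self._objects[key]


def split_into_chunks(data: bytes, chunk_size: int) -> list:
    """Cut a byte string into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


store = BlobStore()
data = bytes(range(10))

# Store each chunk as its own object and remember the keys in order.
chunk_keys = [store.put(chunk) for chunk in split_into_chunks(data, 4)]

# Read back only the second chunk, without touching the rest of the dataset.
second_chunk = store.get(chunk_keys[1])

# Reassemble the full dataset from its chunks.
restored = b"".join(store.get(key) for key in chunk_keys)
```

    Storing each chunk as a separate object is what lets a client request only the subset it needs, which is the contrast with the download-analyze model drawn in the abstract.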

    Downscaling multi-model climate projection ensembles with deep learning (DeepESD): contribution to CORDEX EUR-44

    Deep learning (DL) has recently emerged as an innovative tool to downscale climate variables from large-scale atmospheric fields under the perfect-prognosis (PP) approach. Different convolutional neural networks (CNNs) have been applied under present-day conditions with promising results, but little is known about their suitability for extrapolating to future climate change conditions. Here, we analyze this problem from a multi-model perspective, developing and evaluating an ensemble of CNN-based downscaled projections (hereafter DeepESD) for temperature and precipitation over the European EUR-44i (0.5°) domain, based on eight global circulation models (GCMs) from the Coupled Model Intercomparison Project Phase 5 (CMIP5). To our knowledge, this is the first time that CNNs have been used to produce downscaled multi-model ensembles based on the perfect-prognosis approach, allowing us to quantify inter-model uncertainty in climate change signals. The results are compared with those of an EUR-44 ensemble of regional climate models (RCMs), showing that DeepESD reduces distributional biases in the historical period. Moreover, the resulting climate change signals are broadly comparable to those obtained with the RCMs, with similar spatial structures. As for the uncertainty of the climate change signal (measured by the inter-model spread), DeepESD preserves the uncertainty for temperature and yields a reduced uncertainty for precipitation. To facilitate further studies of this downscaling approach, we follow FAIR principles and make the code (a Jupyter notebook) and the DeepESD dataset publicly available. In particular, DeepESD is published at the Earth System Grid Federation (ESGF) as the first continent-wide PP dataset contributing to CORDEX (EUR-44). This research has been supported by the Spanish Government (MCIN/AEI/10.13039/501100011033) through project CORDyS (grant no. PID2020-116595RB-I00).

    Implementation of FAIR principles in the IPCC: the WGI AR6 Atlas repository

    The Sixth Assessment Report (AR6) of the Intergovernmental Panel on Climate Change (IPCC) has adopted the FAIR Guiding Principles. We present the Atlas chapter of Working Group I (WGI) as a test case. We describe the application of the FAIR principles in the Atlas, the challenges faced during its implementation, and those that remain for the future. We introduce the open source repository resulting from this process, including code (e.g., annotated Jupyter notebooks), data provenance, and some aggregated datasets used in some figures in the Atlas chapter and its interactive companion (the Interactive Atlas), open to scrutiny by the scientific community and the general public. We describe the informal pilot review conducted on this repository to gather recommendations that led to significant improvements. Finally, a working example illustrates the re-use of the repository resources to produce customized regional information, extending the Interactive Atlas products and running the code interactively in a web browser using Jupyter notebooks. Peer reviewed.

    The worldwide C3S CORDEX grand ensemble: A major contribution to assess regional climate change in the IPCC AR6 Atlas

    Peer reviewed. The collaboration between the Coordinated Regional Climate Downscaling Experiment (CORDEX) and the Earth System Grid Federation (ESGF) provides open access to an unprecedented ensemble of Regional Climate Model (RCM) simulations, across the 14 CORDEX continental-scale domains, with global coverage. These simulations have been used as a new line of evidence to assess regional climate projections in the latest contribution of the Working Group I (WGI) to the IPCC Sixth Assessment Report (AR6), particularly in the regional chapters and the Atlas. Here, we present the work done in the framework of the Copernicus Climate Change Service (C3S) to assemble a consistent worldwide CORDEX grand ensemble, aligned with the deadlines and activities of IPCC AR6. This work addressed the uneven and heterogeneous availability of CORDEX ESGF data by supporting publication in CORDEX domains with few archived simulations and performing quality control. It also addressed the lack of comprehensive documentation by compiling information from all contributing regional models, allowing for an informed use of data. In addition to presenting the worldwide CORDEX dataset, we assess here its consistency for precipitation and temperature by comparing climate change signals in regions with overlapping CORDEX domains, obtaining overall coincident regional climate change signals. The C3S CORDEX dataset has been used for the assessment of regional climate change in the IPCC AR6 (and for the interactive Atlas) and is available through the Copernicus Climate Data Store (CDS).

    Viajeros, pernoctaciones y estancia media segun procedencia

    No full text
    Practical exercise for Ciclo de Vida de los Datos (Data Life Cycle).

    The Worldwide C3S CORDEX Grand Ensemble: A Major Contribution to Assess Regional Climate Change in the IPCC AR6 Atlas

    The collaboration between the Coordinated Regional Climate Downscaling Experiment (CORDEX) and the Earth System Grid Federation (ESGF) provides open access to an unprecedented ensemble of regional climate model (RCM) simulations, across the 14 CORDEX continental-scale domains, with global coverage. These simulations have been used as a new line of evidence to assess regional climate projections in the latest contribution of the Working Group I (WGI) to the IPCC Sixth Assessment Report (AR6), particularly in the regional chapters and the Atlas. Here, we present the work done in the framework of the Copernicus Climate Change Service (C3S) to assemble a consistent worldwide CORDEX grand ensemble, aligned with the deadlines and activities of IPCC AR6. This work addressed the uneven and heterogeneous availability of CORDEX ESGF data by supporting publication in CORDEX domains with few archived simulations and performing quality control. It also addressed the lack of comprehensive documentation by compiling information from all contributing regional models, allowing for an informed use of data. In addition to presenting the worldwide CORDEX dataset, we assess here its consistency for precipitation and temperature by comparing climate change signals in regions with overlapping CORDEX domains, obtaining overall coincident regional climate change signals. The C3S CORDEX dataset has been used for the assessment of regional climate change in the IPCC AR6 (and for the interactive Atlas) and is available through the Copernicus Climate Data Store (CDS). ISSN: 0003-0007; ISSN: 1520-047

    Implementation of FAIR principles in the IPCC: the WGI AR6 Atlas repository

    No full text
    The Sixth Assessment Report (AR6) of the Intergovernmental Panel on Climate Change (IPCC) has adopted the FAIR Guiding Principles. We present the Atlas chapter of Working Group I (WGI) as a test case. We describe the application of the FAIR principles in the Atlas, the challenges faced during its implementation, and those that remain for the future. We introduce the open source repository resulting from this process, including code (e.g., annotated Jupyter notebooks), data provenance, and some aggregated datasets used in some figures in the Atlas chapter and its interactive companion (the Interactive Atlas), open to scrutiny by the scientific community and the general public. We describe the informal pilot review conducted on this repository to gather recommendations that led to significant improvements. Finally, a working example illustrates the re-use of the repository resources to produce customized regional information, extending the Interactive Atlas products and running the code interactively in a web browser using Jupyter notebooks. We acknowledge partial funding from projects ATLAS (PID2019-111481RB-I00), funded by MCIN/AEI/10.13039/501100011033, and IS-ENES3, which is funded by the European Union’s H2020 programme under grant agreement No 824084. We also acknowledge the World Climate Research Programme’s Working Group on Coupled Modelling and Working Group on Regional Climate, responsible for CMIP and CORDEX, respectively. We also thank the climate modeling groups for producing and making available their model output, as described in the data-source folder of the repository. We also acknowledge the Earth System Grid Federation infrastructure, an international effort led by the U.S. Department of Energy’s Program for Climate Model Diagnosis and Intercomparison, the European Network for Earth System Modelling and other partners in the Global Organisation for Earth System Science Portals (GO-ESSP). The opinions expressed are those of the author(s) only and should not be considered as representative of the European Commission’s official position. JF and ASC acknowledge support from the CORDyS project (PID2020-116595RB-I00) funded by MCIN/AEI/10.13039/501100011033. JM acknowledges support from MDM-2017-0765 funded by MCIN/AEI/10.13039/501100011033. JBM acknowledges support from Universidad de Cantabria and Consejería de Universidades, Igualdad, Cultura y Deporte del Gobierno de Cantabria via the project “instrumentación y ciencia de datos para sondear la naturaleza del universo”. Finally, we want to thank the reviewers participating in the Atlas FAIR review described in the paper, and the editor and the two anonymous referees for their work and constructive comments, which helped us to improve the manuscript.

    An update of IPCC climate reference regions for subcontinental analysis of climate model data: definition and aggregated datasets

    Several sets of reference regions have been used in the literature for the regional synthesis of observed and modelled climate and climate change information. A popular example is the series of reference regions used in the Intergovernmental Panel on Climate Change (IPCC) Special Report on Managing the Risks of Extreme Events and Disasters to Advance Climate Adaptation (SREX). The SREX regions were slightly modified for the Fifth Assessment Report of the IPCC and used for reporting subcontinental observed and projected changes over a reduced number (33) of climatologically consistent regions encompassing a representative number of grid boxes. These regions are intended to allow analysis of atmospheric data over broad land or ocean regions and have been used as the basis for several popular spatially aggregated datasets, such as the Seasonal Mean Temperature and Precipitation in IPCC Regions for CMIP5 dataset. We present an updated version of the reference regions for the analysis of new observed and simulated datasets (including CMIP6) which offer an opportunity for refinement due to the higher atmospheric model resolution. As a result, the number of land and ocean regions is increased to 46 and 15, respectively, better representing consistent regional climate features. The paper describes the rationale for the definition of the new regions and analyses their homogeneity. The regions are defined as polygons and are provided as coordinates and a shapefile together with companion R and Python notebooks to illustrate their use in practical problems (e.g. calculating regional averages). We also describe the generation of a new dataset with monthly temperature and precipitation, spatially aggregated in the new regions, currently for CMIP5 and CMIP6, to be extended to other datasets in the future (including observations). The use of these reference regions, dataset and code is illustrated through a worked example using scatter plots to offer guidance on the likely range of future climate change at the scale of the reference regions. The regions, datasets and code (R and Python notebooks) are freely available at the ATLAS GitHub repository: https://github.com/SantanderMetGroup/ATLAS (last access: 24 August 2020), https://doi.org/10.5281/zenodo.3998463 (Iturbide et al., 2020). This research has been supported by the Spanish National Plan for Scientific and Technical Research and Innovation (project PID2019-111481RB-I00 and María de Maeztu excellence programme projects MdM-2017-0765 and MdM-2017-0714), FCT MCTES financial support to CESAM (UIDP/50017/2020+UIDB/50017/2020), and the Basque Government BERC 2018–2021 programme.
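
    The kind of regional average mentioned in this abstract (aggregating gridded values inside a region defined as a polygon) could look roughly like the sketch below. It is not taken from the ATLAS notebooks: the function names, the planar ray-casting test that treats longitude and latitude as flat coordinates, and the cosine-of-latitude area weighting are all assumptions chosen for illustration, and the square "region" and grid values are made up.

```python
import math


def point_in_polygon(lon, lat, polygon):
    """Ray-casting test on a list of (lon, lat) vertices, treated as planar coordinates."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def regional_average(lons, lats, values, polygon):
    """Average values[j][i] over grid points inside the polygon, weighted by cos(latitude)."""
    total = 0.0
    weight_sum = 0.0
    for j, lat in enumerate(lats):
        weight = math.cos(math.radians(lat))
        for i, lon in enumerate(lons):
            if point_in_polygon(lon, lat, polygon):
                total += weight * values[j][i]
                weight_sum += weight
    if weight_sum == 0.0:
        raise ValueError("no grid points fall inside the polygon")
    return total / weight_sum


# Made-up square region and a tiny 2 x 3 grid; the third column lies outside the region.
region = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
lons = [2.5, 7.5, 12.5]
lats = [2.5, 7.5]
field = [[1.0, 1.0, 100.0],
         [3.0, 3.0, 100.0]]

average = regional_average(lons, lats, field, region)
```

    The cosine weighting approximates the shrinking area of grid cells towards the poles on a regular longitude-latitude grid; for real regions, a geospatial library that reads the published shapefile would be the more robust choice.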