6 research outputs found

    Earth system data cubes unravel global multivariate dynamics

    Understanding Earth system dynamics in light of ongoing human intervention and dependency remains a major scientific challenge. The unprecedented availability of data streams describing different facets of the Earth now offers fundamentally new avenues to address this quest. However, several practical hurdles, especially the lack of data interoperability, limit the joint potential of these data streams. Today, many initiatives within and beyond the Earth system sciences are exploring new approaches to overcome these hurdles and meet the growing interdisciplinary need for data-intensive research; using data cubes is one promising avenue. Here, we introduce the concept of Earth system data cubes and how to operate on them in a formal way. The idea is that treating multiple data dimensions, such as spatial, temporal, variable, frequency, and other grids alike, allows effective application of user-defined functions to co-interpret Earth observations and/or model-data integration. An implementation of this concept combines analysis-ready data cubes with a suitable analytic interface. In three case studies, we demonstrate how the concept and its implementation facilitate the execution of complex workflows for research across multiple variables, and spatial and temporal scales: (1) summary statistics for ecosystem and climate dynamics; (2) intrinsic dimensionality analysis on multiple timescales; and (3) model-data integration. We discuss the emerging perspectives for investigating global interacting and coupled phenomena in observed or simulated data. In particular, we see many emerging perspectives of this approach for interpreting large-scale model ensembles. The latest developments in machine learning, causal inference, and model-data integration can be seamlessly implemented in the proposed framework, supporting rapid progress in data-intensive research across disciplinary boundaries. © 2020 Institute of Electrical and Electronics Engineers Inc. All rights reserved.

    Statistical analysis of systematic differences in the calculated pollutant concentrations of the models ECMWF/CAMS (regional reanalysis) and Polyphemus/ DLR

    Over the last two decades, air pollution has come to be viewed as a very serious issue due to the development of infrastructure all over the world. Environmental stressors such as air temperature, radiation, humidity, wind, noise, pollens, and air pollutants (e.g., O3, NO2, PM10, PM2.5) can affect human health in a variety of ways. With the Copernicus Atmosphere Monitoring Service (CAMS) and the air quality in-situ measurements from the European Environment Agency (EEA), a wealth of data of unprecedented quality and spatiotemporal resolution is available. These data are supplemented by high-resolution spatiotemporal numerical models, such as chemical transport models, for a comprehensive description of environmental conditions. Their advantages are constant coverage and high spatial and temporal resolution. However, it is very important to assess model performance and comparability with in-situ or satellite observations. The main focus of this paper is to compare the outputs of the CAMS Europe Air Quality Reanalysis and the chemical transport model POLYPHEMUS/DLR with in-situ measurements (station data). The scope is to assess the discrepancies for the different chemical species and to provide statistical indicators such as Mean Bias, FGE, and RMSE, along with trend analysis and correction weights, describing the different characteristics of the models. In addition, a machine learning approach was applied as an exploratory task, with the goal of predicting concentrations at in-situ stations and identifying the influence of each parameter considered by the Polyphemus model. The results show that the Polyphemus/DLR model overestimates NO2, PM2.5, and PM10 and underestimates O3 concentrations in urban and rural areas over the time window considered (June 2016 to December 2018). CAMS outputs, especially for PM10 and PM2.5, deviate from station observations even though the outputs are corrected using EEA air quality station datasets. Overall, parameters such as surface temperature, boundary layer height, and season were found to play a major role in both urban and rural regions. There are also significant changes in the influence of some parameters depending on location. This comparison study will help to understand model performance (overestimation and underestimation) for each of the pollutants and help to select modelled data for health and air pollution-related research in the future.

    Multidimensional arrays for analysing geoscientific data

    © 2018 by the authors. Geographic data is growing in size and variety, which calls for big data management tools and analysis methods. To efficiently integrate information from high dimensional data, this paper explicitly proposes array-based modeling. A large portion of Earth observations and model simulations are naturally arrays once digitalized. This paper discusses the challenges in using arrays such as the discretization of continuous spatiotemporal phenomena, irregular dimensions, regridding, high-dimensional data analysis, and large-scale data management. We define categories and applications of typical array operations, compare their implementation in open-source software, and demonstrate dimension reduction and array regridding in study cases using Landsat and MODIS imagery. It turns out that arrays are a convenient data structure for representing and analysing many spatiotemporal phenomena. Although the array model simplifies data organization, array properties like the meaning of grid cell values are rarely being made explicit in practice
