A metadata reporting framework (FRAMES) for synthesis of ecohydrological observations
Metadata describe the ancillary information needed for data preservation and independent interpretation, comparison across heterogeneous datasets, and quality assessment and quality control (QA/QC). Environmental observations are vastly diverse in type and structure, can be taken across a wide range of spatiotemporal scales in a variety of measurement settings and approaches, and saved in multiple formats. Thus, well-organized, consistent metadata are required to produce usable data products from diverse environmental observations collected across field sites. However, existing metadata reporting protocols do not support the complex data synthesis and model-data integration needs of interdisciplinary earth system research. We developed a metadata reporting framework (FRAMES) to enable management and synthesis of observational data that are essential for advancing a predictive understanding of earth systems. FRAMES utilizes best practices for data and metadata organization, enabling consistent data reporting and compatibility with a variety of standardized data protocols. We used an iterative scientist-centered design process to develop FRAMES, resulting in a data reporting format that incorporates existing field practices to maximize data-entry efficiency. Thus, FRAMES has a modular organization that streamlines metadata reporting and can be expanded to incorporate additional data types. With FRAMES's multi-scale measurement position hierarchy, data can be reported at observed spatial resolutions and then easily aggregated and linked across measurement types to support model-data integration. FRAMES is in early use by both data originators (persons generating data) and consumers (persons using data and metadata). In this paper, we describe FRAMES, identify lessons learned, and discuss areas of future development.
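The multi-scale measurement position hierarchy can be pictured as a path from coarse to fine positions, with each value stored at the finest level at which it was observed. The following is a minimal sketch under stated assumptions, not FRAMES's actual schema: the level names (site, plot, pit), the position labels and the soil-moisture values are all hypothetical, and plain averaging stands in for whatever aggregation a given variable actually requires.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical observations: each value is tagged with its full position path,
# ordered from coarse to fine: (site, plot, pit). Values are made-up volumetric
# soil moisture in percent; none of these names come from the FRAMES spec.
observations = [
    (("SiteA", "Plot1", "Pit1"), 20.0),
    (("SiteA", "Plot1", "Pit2"), 24.0),
    (("SiteA", "Plot2", "Pit1"), 31.0),
    (("SiteB", "Plot1", "Pit1"), 18.0),
]


def aggregate(obs, depth):
    """Average all values whose position paths share the first `depth` levels."""
    groups = defaultdict(list)
    for path, value in obs:
        groups[path[:depth]].append(value)
    return {prefix: mean(values) for prefix, values in groups.items()}


plot_means = aggregate(observations, depth=2)  # keys like ("SiteA", "Plot1")
site_means = aggregate(observations, depth=1)  # keys like ("SiteA",)
print(plot_means)
print(site_means)
```

Because every value keeps its finest-resolution path, the same records can be rolled up to any coarser level without being re-entered.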
COSORE: A community database for continuous soil respiration and other soil–atmosphere greenhouse gas flux data
Globally, soils store two to three times as much carbon as currently resides in the atmosphere, and it is critical to understand how soil greenhouse gas (GHG) emissions and uptake will respond to ongoing climate change. In particular, the soil-to-atmosphere CO2 flux, commonly though imprecisely termed soil respiration (RS), is one of the largest carbon fluxes in the Earth system. An increasing number of high-frequency RS measurements (typically from an automated system with hourly sampling) have been made over the last two decades; an increasing number of methane measurements are being made with such systems as well. Such high-frequency data are an invaluable resource for understanding GHG fluxes but lack a central database or repository. Here we describe the lightweight, open-source COSORE (COntinuous SOil REspiration) database and software, which focuses on automated, continuous and long-term GHG flux datasets and is intended to serve as a community resource for earth sciences, climate change syntheses and model evaluation. Contributed datasets are mapped to a single, consistent standard, with metadata on contributors, geographic location, measurement conditions and ancillary data. The design emphasizes the importance of reproducibility, scientific transparency and open access to data. While oriented towards continuously measured RS, the database design accommodates other soil–atmosphere measurements (e.g. ecosystem respiration, chamber-measured net ecosystem exchange, methane fluxes) as well as experimental treatments (heterotrophic only, etc.). We give brief examples of the types of analyses possible using this new community resource and describe its accompanying R software package.
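Mapping contributed datasets to a single, consistent standard typically means renaming columns and converting units record by record. The sketch below assumes a hypothetical contributor file and hypothetical standard column names; it is written in Python purely for illustration and does not reproduce COSORE's own R package or its actual column schema.

```python
def f_to_c(deg_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0


def identity(x):
    return x


# Hypothetical mapping for one contributor: raw column -> (standard name, converter).
MAPPING = {
    "time": ("timestamp", identity),
    "chamber": ("port", int),
    "Rs": ("flux_co2_umol_m2_s", float),
    "soilT_F": ("soil_temp_5cm_c", f_to_c),
}


def standardize(record, mapping=MAPPING):
    """Rename and convert one raw record; columns without a mapping are dropped."""
    out = {}
    for raw_name, value in record.items():
        if raw_name in mapping:
            std_name, convert = mapping[raw_name]
            out[std_name] = convert(value)
    return out


raw = {"time": "2019-06-01 12:00", "chamber": "3", "Rs": "4.2", "soilT_F": 68.0, "notes": "ok"}
print(standardize(raw))
```

Keeping one small mapping per contributed dataset, rather than editing the raw files, leaves the original data untouched and makes each conversion step reviewable, which fits the emphasis on reproducibility.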
The FLUXNET2015 dataset and the ONEFlux processing pipeline for eddy covariance data
The FLUXNET2015 dataset provides ecosystem-scale data on CO2, water, and energy exchange between the biosphere and the atmosphere, and other meteorological and biological measurements, from 212 sites around the globe (over 1500 site-years, up to and including year 2014). These sites, independently managed and operated, voluntarily contributed their data to create global datasets. Data were quality controlled and processed using uniform methods, to improve consistency and intercomparability across sites. The dataset is already being used in a number of applications, including ecophysiology studies, remote sensing studies, and development of ecosystem and Earth system models. FLUXNET2015 includes derived-data products, such as gap-filled time series, ecosystem respiration and photosynthetic uptake estimates, estimation of uncertainties, and metadata about the measurements, presented for the first time in this paper. In addition, 206 of these sites are for the first time distributed under a Creative Commons (CC-BY 4.0) license. This paper details this enhanced dataset and the processing methods, now made available as open-source code, making the dataset more accessible, transparent, and reproducible.
Effects of sample design and landscape features on a measure of environmental heterogeneity
Environmental heterogeneity, an important influence on organisms and ecological processes, can be quantified by the variance of an environmental characteristic over all locations within a study extent. However, on landscapes with autocorrelation and gradient patterns, estimating this variance from a sample of locations may lead to errors that cannot be corrected with statistical techniques. We analytically derived the relative expected sampling error of sample designs on landscapes with particular gradient pattern and autocorrelation features. We applied this closed-form approach to temperature observations from an existing study. The expected heterogeneity differed, both in magnitude and direction, among sample designs over the study site's likely range of autocorrelation and gradient features. We conducted a simulation study to understand the effects of (i) landscape variability and (ii) design variability on average sampling error. On 10 000 simulated landscapes with varying gradient and autocorrelation features, we compared estimates of variance from a variety of structured and random sample designs. While gradient patterns and autocorrelation cause large errors for some designs, others yield near-zero average sampling error. Sample location spacing is a key factor in sample design performance. Random designs have a larger range of possible sampling errors than structured designs due to the potential for sample arrangements that over- and under-sample certain areas of the landscape. When implementing a new sample design to quantify environmental heterogeneity via variance, we recommend using a simple structured design with appropriate sample spacing. For existing designs, we recommend calculating the relative expected sampling error via our analytical derivation.
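The comparison between structured and random designs can be illustrated with a small Monte Carlo experiment. This is a simplified sketch, not the authors' analytical derivation or their simulation: the landscape is a hypothetical grid with a linear gradient along x plus independent noise (autocorrelation is omitted), heterogeneity is the plug-in variance over all cells, and the grid size, slope, noise level and 5-cell spacing are arbitrary choices. Sampling error is the sample variance minus the full-landscape variance.

```python
import random
import statistics

# All parameters are hypothetical choices for illustration.
N = 30          # landscape is an N x N grid of cells
SLOPE = 0.1     # linear gradient along x, in value units per cell
NOISE_SD = 0.3  # spatially independent noise (autocorrelation omitted)
SPACING = 5     # structured design: one sample every SPACING cells in x and y


def make_landscape(rng):
    """Return grid[y][x] = SLOPE * x + Gaussian noise."""
    return [[SLOPE * x + rng.gauss(0.0, NOISE_SD) for x in range(N)] for _ in range(N)]


def plugin_variance(values):
    """Population (plug-in) variance: mean squared deviation from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def structured_sample(grid):
    """Regular grid of sample points, offset by half a spacing from the edge."""
    start = SPACING // 2
    return [grid[y][x] for y in range(start, N, SPACING) for x in range(start, N, SPACING)]


def random_sample(grid, k, rng):
    """k distinct cells chosen uniformly at random."""
    return [grid[c // N][c % N] for c in rng.sample(range(N * N), k)]


def simulate(reps=300, seed=1):
    """Return {design: (mean sampling error, std of sampling error)}."""
    rng = random.Random(seed)
    errors = {"structured": [], "random": []}
    for _ in range(reps):
        grid = make_landscape(rng)
        truth = plugin_variance([v for row in grid for v in row])
        s = structured_sample(grid)
        r = random_sample(grid, len(s), rng)  # same sample size as structured
        errors["structured"].append(plugin_variance(s) - truth)
        errors["random"].append(plugin_variance(r) - truth)
    return {d: (statistics.mean(e), statistics.pstdev(e)) for d, e in errors.items()}


for design, (bias, spread) in simulate().items():
    print(f"{design:>10}: mean error {bias:+.4f}, spread {spread:.4f}")
```

With a pure gradient, the regular grid always spans the full range of x, so its estimate varies only because of the noise; a random sample of the same size can by chance cluster toward one end of the gradient, which widens its spread of errors, in line with the abstract's observation about random designs.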
Challenges in Building an End-to-End System for Acquisition, Management, and Integration of Diverse Data From Sensor Networks in Watersheds: Lessons From a Mountainous Community Observatory in East River, Colorado
The U.S. Department of Energy's Watershed Function Scientific Focus Area (SFA), centered in the East River, Colorado, generates diverse datasets including hydrological, geological, geochemical, geophysical, ecological, microbiological and remote sensing data. The project has deployed extensive field infrastructure involving hundreds of sensors that measure highly diverse phenomena (e.g. stream and groundwater hydrology, water quality, soil moisture, weather) across the watershed. Data from the sensor network are telemetered and automatically ingested into a queryable database. The data are subsequently quality checked, integrated with the United States Geological Survey's stream monitoring network using a custom data integration broker, and published to a portal with interactive visualizations. The resulting data products are used in a variety of scientific modeling and analytical efforts. This paper describes the SFA's end-to-end infrastructure and services that support the generation of integrated datasets from a watershed sensor network. The development and maintenance of this infrastructure present a suite of challenges, from practical field logistics to complex data processing, which are addressed through various solutions. In particular, the SFA adopts a holistic view of data collection, assessment and integration, which dramatically improves the products generated and enables a co-design approach wherein data collection is informed by model results and vice versa.
Funding: U.S. Department of Energy (DOE), Watershed Function Scientific Focus Area, Office of Science, Office of Biological and Environmental Research [DE-AC02-05CH11231]; National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility [DE-AC02-05CH11231]; Environmental Systems Science Data Infrastructure for a Virtual Ecosystem (ESS-DIVE) [DE-AC02-05CH11231]; [DE-SC0009732]; [DE-SC0018447].
Open access journal. This item from the UA Faculty Publications collection is made available by the University of Arizona with support from the University of Arizona Libraries.
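Automated quality checks on telemetered sensor data often begin with simple plausibility tests. The sketch below shows two generic ones, a fixed range check and a step (spike) check against the last accepted value. It is not the SFA's documented QC procedure; the function name, thresholds and soil-moisture readings are hypothetical.

```python
def qc_flags(readings, lower, upper, max_step):
    """Flag each (timestamp, value) reading as 'ok', 'range' or 'spike'.

    'range': value outside [lower, upper].
    'spike': in range, but differs from the last 'ok' value by more than max_step.
    Only 'ok' values update the reference, so an isolated spike does not cause the
    following normal reading to be flagged; a genuine level shift, however, would
    keep being flagged until reviewed (a deliberately conservative trade-off).
    """
    flags = []
    last_ok = None
    for _timestamp, value in readings:
        if not lower <= value <= upper:
            flags.append("range")
        elif last_ok is not None and abs(value - last_ok) > max_step:
            flags.append("spike")
        else:
            flags.append("ok")
            last_ok = value
    return flags


# Hypothetical volumetric soil moisture (m3/m3) at 15-minute intervals.
readings = [
    ("2020-07-01T00:00", 0.21),
    ("2020-07-01T00:15", 0.22),
    ("2020-07-01T00:30", 0.45),
    ("2020-07-01T00:45", 0.23),
    ("2020-07-01T01:00", -0.50),
    ("2020-07-01T01:15", 0.24),
]
print(qc_flags(readings, lower=0.0, upper=0.6, max_step=0.1))
# ['ok', 'ok', 'spike', 'ok', 'range', 'ok']
```

Flagging rather than deleting values keeps the raw record intact, so later reviewers can overturn an automated decision.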
AmeriFlux BASE data pipeline to support network growth and data sharing
AmeriFlux is a network of research sites that measure carbon, water, and energy fluxes between ecosystems and the atmosphere using the eddy covariance technique to study a variety of Earth science questions. AmeriFlux's diversity of ecosystems, instruments, and data-processing routines creates challenges for data standardization, quality assurance, and sharing across the network. To address these challenges, the AmeriFlux Management Project (AMP) designed and implemented the BASE data-processing pipeline. The pipeline begins with data uploaded by the site teams, followed by the AMP team's quality assurance and quality control (QA/QC), ingestion of site metadata, and publication of the BASE data product. The semi-automated pipeline enables us to keep pace with the rapid growth of the network. As of 2022, the AmeriFlux BASE data product contains 3,130 site-years of data from 444 sites, with standardized units and variable names for more than 60 common variables, representing the largest long-term data repository for flux-met data in the world. The standardized, quality-ensured data product facilitates multisite comparisons, model evaluations, and data syntheses.
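Part of the QA/QC step can be automated as structural checks on each upload before scientific review. The sketch below assumes AmeriFlux-style conventions (half-hourly records, a TIMESTAMP_START column in YYYYMMDDHHMM format, and -9999 as the missing-value code), with the variable names FC and TA and their values used illustratively. It shows what such checks might look like, not the AMP team's actual pipeline code; `check_upload` is a hypothetical helper.

```python
from datetime import datetime, timedelta

MISSING = -9999               # assumed missing-value code
STEP = timedelta(minutes=30)  # assumed half-hourly resolution


def check_upload(rows):
    """Return (indices of rows whose timestamp does not follow the previous one
    by exactly STEP, {variable: count of MISSING values})."""
    irregular = []
    missing = {}
    previous = None
    for i, row in enumerate(rows):
        start = datetime.strptime(row["TIMESTAMP_START"], "%Y%m%d%H%M")
        if previous is not None and start - previous != STEP:
            irregular.append(i)
        previous = start
        for name, value in row.items():
            if name != "TIMESTAMP_START":
                missing[name] = missing.get(name, 0) + int(value == MISSING)
    return irregular, missing


# Hypothetical three-row upload with one skipped half hour and two missing values.
rows = [
    {"TIMESTAMP_START": "202201010000", "FC": 2.1, "TA": -3.2},
    {"TIMESTAMP_START": "202201010030", "FC": MISSING, "TA": -3.4},
    {"TIMESTAMP_START": "202201010130", "FC": 1.8, "TA": MISSING},
]
print(check_upload(rows))  # ([2], {'FC': 1, 'TA': 1})
```

Cheap format checks like these can run on every upload automatically, leaving the slower, site-specific scientific review to people, which is one way a semi-automated pipeline keeps pace with network growth.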