A Global Repository for Planet-Sized Experiments and Observations
Working across U.S. federal agencies, international agencies, and multiple worldwide data centers, and spanning seven international network organizations, the Earth System Grid Federation (ESGF) allows users to access, analyze, and visualize data using a globally federated collection of networks, computers, and software. Its architecture employs a system of geographically distributed peer nodes that are independently administered yet united by common federation protocols and application programming interfaces (APIs). The full ESGF infrastructure has now been adopted by multiple Earth science projects and allows access to petabytes of geophysical data, including output from the Coupled Model Intercomparison Project (CMIP), which is used by the Intergovernmental Panel on Climate Change assessment reports. Data served by ESGF include not only model output (i.e., CMIP simulation runs) but also observational data from satellites and instruments, reanalyses, and generated images. Metadata summarize basic information about the data for fast and easy data discovery.

This work was supported by the U.S. Department of Energy Office of Science/Office of Biological and Environmental Research under Contract DE-AC52-07NA27344 at Lawrence Livermore National Laboratory. VB is supported by the Cooperative Institute for Climate Science, Princeton University, under Award NA08OAR4320752 from the National Oceanic and Atmospheric Administration, U.S. Department of Commerce. Part of this work was undertaken with the assistance of resources from the National Computational Infrastructure (NCI), which is supported by the Australian Government. Part of this activity was performed on behalf of the Jet Propulsion Laboratory, California Institute of Technology, under a contract with NASA. Part of this activity was performed on behalf of the Goddard Space Flight Center, under a contract with NASA. This work was also supported by the ANR Convergence project (Grant Agreement ANR-13-MONU-0008) and the FP7 IS-ENES2 project (Grant Agreement 312979).
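The federation protocols and APIs mentioned above include a RESTful search interface exposed by ESGF index nodes. As a minimal sketch, the snippet below composes a faceted search URL; the node hostname and facet names are illustrative examples, and actual deployments and facet vocabularies vary by project:

```python
from urllib.parse import urlencode

# Illustrative ESGF index node; other federated nodes expose the same API.
SEARCH_BASE = "https://esgf-node.llnl.gov/esg-search/search"

def build_search_url(**facets):
    """Build a federated-search URL constraining the given facets."""
    # format/limit are common query parameters; facets narrow the result set.
    params = {"format": "application/solr+json", "limit": 10}
    params.update(facets)
    return SEARCH_BASE + "?" + urlencode(sorted(params.items()))

url = build_search_url(project="CMIP6", variable="tas", frequency="mon")
```

Because every peer node speaks the same protocol, the same query can be issued against any index node in the federation.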
Implementation of FAIR principles in the IPCC: the WGI AR6 Atlas repository
The Sixth Assessment Report (AR6) of the Intergovernmental Panel on Climate Change (IPCC) has adopted the FAIR Guiding Principles. We present the Atlas chapter of Working Group I (WGI) as a test case. We describe the application of the FAIR principles in the Atlas, the challenges faced during its implementation, and those that remain for the future. We introduce the open-source repository resulting from this process, including code (e.g., annotated Jupyter notebooks), data provenance, and aggregated datasets used in several figures in the Atlas chapter and its interactive companion (the Interactive Atlas), open to scrutiny by the scientific community and the general public. We describe the informal pilot review conducted on this repository to gather recommendations that led to significant improvements. Finally, a working example illustrates the re-use of the repository resources to produce customized regional information, extending the Interactive Atlas products and running the code interactively in a web browser using Jupyter notebooks.
Robustness and uncertainties in global multivariate wind-wave climate projections
Understanding climate-driven impacts on the multivariate global wind-wave climate is paramount to effective offshore and coastal climate adaptation planning. However, the use of single-method ensembles and variations arising from different methodologies has resulted in unquantified uncertainty amongst existing global wave climate projections. Here, assessing the first coherent, community-driven, multi-method ensemble of global wave climate projections, we demonstrate widespread ocean regions with robust changes in annual mean significant wave height and mean wave period of 5–15% and shifts in mean wave direction of 5–15°, under a high-emission scenario. Approximately 50% of the world's coastline is at risk from wave climate change, with ~40% revealing robust changes in at least two variables. Furthermore, we find that uncertainty in current projections is dominated by climate-model-driven uncertainty, and that single-method modelling studies are unable to capture up to ~50% of the total associated uncertainty.
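One common way to flag "robust" changes in a multi-model ensemble is sign agreement: a change counts as robust where a large fraction of members agree on the sign of the ensemble-mean change. The sketch below illustrates that general criterion with invented numbers; it is not the paper's exact robustness metric or data:

```python
import numpy as np

def robust_change(ens, agreement=0.8):
    """Flag locations where at least `agreement` of ensemble members
    agree on the sign of the ensemble-mean change."""
    mean = ens.mean(axis=0)                    # ensemble-mean change
    same_sign = np.sign(ens) == np.sign(mean)  # does each member match the mean's sign?
    frac = same_sign.mean(axis=0)              # fraction of agreeing members
    return mean, frac >= agreement

# Toy data: rows = ensemble members, columns = locations; values are
# invented percentage changes in annual-mean significant wave height.
ens = np.array([
    [ 9.0,  1.0],
    [ 7.5, -2.0],
    [11.0,  0.5],
    [ 8.2, -1.5],
    [10.1,  3.0],
])
mean_change, robust = robust_change(ens)
# Location 0: all members positive -> robust; location 1: mixed signs -> not.
```

Applied per grid cell and per variable (height, period, direction), this kind of mask is what delineates the "robust change" regions described above.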
Persistent Identifier Practice for Big Data Management at NCI
The National Computational Infrastructure (NCI) manages over 10 PB of research data, which is co-located with the high-performance computer (Raijin) and a 3,000-core HPC-class OpenStack cloud system (Tenjin). In support of this integrated High Performance Computing/High Performance Data (HPC/HPD) infrastructure, NCI's data management practices include building catalogues, DOI minting, data curation, data publishing, and data delivery through a variety of data services. The metadata catalogues, DOIs, THREDDS services, and vocabularies all use different Uniform Resource Locator (URL) styles. A Persistent IDentifier (PID) service provides an important utility for managing URLs in a consistent, controlled, and monitored manner to support the robustness of our national 'Big Data' infrastructure. In this paper we demonstrate NCI's approach of utilising its PID Service to consistently manage persistent identifiers across various applications.
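The core idea of such a PID service is a level of indirection: a stable identifier maps to the current landing URL, so endpoints can move without breaking published links. The toy registry below illustrates that pattern only; it is a hypothetical sketch, not NCI's actual PID Service implementation:

```python
class PidRegistry:
    """Hypothetical in-memory PID registry: stable IDs -> current URLs."""

    def __init__(self):
        self._table = {}

    def register(self, pid, url):
        """Mint a PID pointing at a landing URL."""
        self._table[pid] = url

    def update(self, pid, new_url):
        """Repoint an existing PID when the underlying service URL changes."""
        if pid not in self._table:
            raise KeyError(pid)
        self._table[pid] = new_url

    def resolve(self, pid):
        """Return the current URL (an HTTP service would redirect here)."""
        return self._table[pid]

reg = PidRegistry()
reg.register("pid/dataset/abc123", "https://thredds.example.org/catalog/abc123.html")
# The catalogue moves, but the published PID keeps working:
reg.update("pid/dataset/abc123", "https://data.example.org/abc123")
```

In production, resolution happens via HTTP redirects and the mapping table is monitored, which is what makes the URLs "consistent, controlled and monitored" in practice.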
Global wave hindcast with Australian and Pacific Island Focus: From past to present
Wind-wave hindcast data have many applications, including climatology assessments for renewable energy projects, maritime engineering design, event-based impact assessments, and the generation of boundary conditions for further downscaling. Here, we present a global wave hindcast with nested high-resolution grids for the Exclusive Economic Zones of Australia and southwest Pacific Island Countries, which is extended monthly. The model employs strategic methods to incorporate the effects of subgrid-scale features such as small islands and islets. Various bulk wave parameters are available hourly from January 1979 to the present, along with the full wave spectra at a set of 3,683 predetermined points distributed globally.
Forging a path to a better normal for conferences and collaboration
The 2020 COVID-19 pandemic forced a string of conference cancellations, causing many organizers to shift meetings online with mixed success. Seizing the opportunity, a group of researchers came together to rethink how the conference experience, and collaboration in general, can be improved in a more virtual-centric future.
The Australian Geoscience Data Cube - foundations and lessons learned
The Australian Geoscience Data Cube (AGDC) aims to realise the full potential of Earth observation data holdings by addressing the Big Data challenges of volume, velocity, and variety that otherwise limit the usefulness of Earth observation data. There have been several iterations, and AGDC version 2 is a major advance on previous work. The foundations and core components of the data cube are: (1) data preparation, including geometric and spectral radiometric corrections to Earth observation data to produce standardised surface reflectance measurements that support time-series analysis, and collection management systems which track the provenance of each data cube product and formalise re-processing decisions; (2) the software environment used to manage and interact with the data, including a minimal relational model that uses 'not-only-SQL' to simplify the process of adding new datasets to the data cube, or to simply 'reference' external datasets; and (3) the supporting, integrated high-performance computing and high-performance data (HPC-HPD) environment provided by the Australian National Computational Infrastructure, which supports both large-scale analysis within the NCI and direct access to data using standards-based web services. A growing number of exemplars demonstrate that the data cube approach allows analysts to extract rich new information from Earth observation time series, including through new methods that draw on the full spatial and temporal coverage of the Earth observation archives. To enable easy uptake of the AGDC, and to facilitate future cooperative development, our code is developed under the open-source Apache License, Version 2.0. This open-source approach is enabling other organisations, including the Committee on Earth Observation Satellites (CEOS), to explore the use of similar data cubes in developing countries.

This work was funded by the Australian Government through Geoscience Australia and the NCI. Funding for the supporting HPC-HPD infrastructure at NCI came from the Australian Government Department of Education, through the National Collaborative Research Infrastructure Strategy (NCRIS) and Education Investment Fund (EIF) Super Science Initiatives via the NCI, Research Data Storage Infrastructure (RDSI), and Research Data Services (RDS) projects (particularly the A1.1 National Earth Systems Science Data Services Domain).
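The "minimal relational model that uses 'not-only-SQL'" idea can be illustrated by indexing each dataset as a row whose metadata is stored as a whole JSON document, so new dataset types need no schema migration. This is a deliberately tiny sketch of the pattern, not the real AGDC index, which is considerably richer:

```python
import json
import sqlite3

# In-memory index: one relational table, with per-dataset metadata kept
# as an opaque JSON document rather than decomposed into many columns.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE dataset (id TEXT PRIMARY KEY, metadata TEXT)")

def add_dataset(ds_id, metadata):
    """Index a dataset by storing its metadata document verbatim."""
    db.execute("INSERT INTO dataset VALUES (?, ?)", (ds_id, json.dumps(metadata)))

def find_by_product(product):
    """Filter on a field inside the JSON document (done in Python here;
    a JSON-aware SQL extension could push this into the query itself)."""
    rows = db.execute("SELECT metadata FROM dataset").fetchall()
    docs = [json.loads(m) for (m,) in rows]
    return [d for d in docs if d.get("product") == product]

add_dataset("ls8-0001", {"product": "ls8_nbar", "time": "2016-01-01"})
add_dataset("ls7-0001", {"product": "ls7_nbar", "time": "2015-06-01"})
hits = find_by_product("ls8_nbar")
```

Adding a dataset with entirely new metadata fields requires no `ALTER TABLE`, which is what makes ingesting (or merely 'referencing') heterogeneous external datasets cheap.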
Strategic Roadmap for the Earth System Grid Federation
This article describes the Earth System Grid Federation (ESGF) mission and an international integration strategy for data, database, and computational architecture, and stable infrastructure highlighted by the authors (the ESGF Executive Committee). These highlights are key developments needed over the next five to seven years in response to large-scale national and international climate community projects that depend on ESGF for success. Quality assurance and baseline performance, from laptop to high-performance computing, characterize available and potential data streams and strategies; these are required for interactive data collections to remedy gaps in handling enormous international federated climate data archives. Appropriate cyber security ensures protection of data according to project requirements while still allowing access and portability for different ESGF and individual groups and users. The article concludes with a timeline and plan for developing interoperable tools that take ESGF from a federated database archive to a robust virtual laboratory.
The NCI High Performance Computing and High Performance Data Platform to Support the Analysis of Petascale Environmental Data Collections
The National Computational Infrastructure (NCI) at the Australian National University (ANU) has co-located a priority set of over 10 PetaBytes (PB) of national data collections within an HPC research facility. The facility provides an integrated high-performance computational and storage platform, or High Performance Data (HPD) platform, to serve and analyse massive amounts of data across the spectrum of environmental collections, in particular from the climate, environmental, and geoscientific domains. The data are managed in concert with government agencies, major academic research communities, and collaborating overseas organisations. By co-locating the vast data collections with high-performance computing environments and harmonising these large, valuable data assets, new opportunities have arisen for data-intensive interdisciplinary science at scales and resolutions not hitherto possible.