
    Developing an e-Research infrastructure for Australian earth sciences: the NCRIS 5.13 AuScope Grid

    On 27 November 2006 the Minister for Education, Science and Training announced that, under the National Collaborative Research Infrastructure Strategy (NCRIS), $42.8 million would go to the Australian Earth sciences to help build an integrated national infrastructure system called AuScope. A key element of AuScope is the AuScope Grid, which comprises an Earth Science Data Grid and a Compute Grid. Combined, the two provide a distributed computational e-Research infrastructure that will enable the construction of a dynamic, updateable 4D Australian Earth Model. The goal of the AuScope Compute Grid is to facilitate quantitative geoscience analysis by providing an infrastructure and tools for advanced data mining, simulation and computational modelling. The AuScope Earth Science Data Grid is a proposed national geoscience data network that aims to use international standards to allow real-time access to data, information and knowledge stored in distributed repositories across academia, industry and government. The Data Grid will also be built on ‘end-to-end’ science principles (also known as open access principles), whereby access will be provided to the highly processed information and knowledge as well as to the original raw data and the processing programs used to generate the results.
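
    The abstract does not name the specific standards envisaged for the Data Grid, but OGC web services such as the Web Feature Service (WFS) are one common way to expose distributed geoscience repositories for real-time, machine-readable access. The sketch below illustrates that general pattern only; the endpoint URL and feature type are hypothetical placeholders, not actual AuScope Grid services.

        # Illustrative sketch of a standards-based query against a (hypothetical)
        # OGC Web Feature Service. Endpoint and typeName are placeholders.
        import requests

        WFS_ENDPOINT = "https://example.org/geoserver/wfs"   # hypothetical endpoint

        params = {
            "service": "WFS",
            "version": "1.1.0",
            "request": "GetFeature",
            "typeName": "gsml:Borehole",      # example GeoSciML feature type
            "maxFeatures": 10,
            "outputFormat": "text/xml; subtype=gml/3.1.1",
        }

        response = requests.get(WFS_ENDPOINT, params=params, timeout=30)
        response.raise_for_status()
        print(response.text[:500])            # GML describing the returned features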

    Guest Editorial: Special issue Rescuing Legacy data for Future Science

    Research and discovery in the natural sciences, particularly for documenting changes to our planet, is empowered by gathering, mining, and reusing observational data. However, much of the data required, particularly data from the pre-digital era, are no longer accessible to science. The data are hidden away in investigators’ desks as printed paper records, are no longer readable because they sit on deteriorating or outdated media, and are not documented in a way that makes them re-usable. Special initiatives are required to rescue and preserve such data so that they can contribute to the scientific debates of today and of the future. Data rescue efforts are key to making accessible data resources that are at risk of being lost forever when researchers retire or die, or when data formats or storage media become obsolete and unreadable.

    Versioning data is about more than revisions: A conceptual framework and proposed principles

    A dataset, small or big, is often changed to correct errors, apply new algorithms, or add new data (e.g., as part of a time series). In addition, datasets might be bundled into collections, distributed in different encodings or mirrored onto different platforms. All these differences between versions of a dataset need to be understood by researchers who want to cite the exact version of the dataset that was used to underpin their research. Failing to do so reduces the reproducibility of research results. Ambiguous identification of datasets also impacts researchers and data centres, who are unable to gain recognition and credit for their contributions to the collection, creation, curation and publication of individual datasets. Although the means to identify datasets using persistent identifiers have been in place for more than a decade, systematic data versioning practices are currently not available. In this work, we analysed 39 use cases and current practices of data versioning across 33 organisations. We noticed that the term ‘version’ was used in a very general sense, extending beyond the more common understanding of ‘version’ as referring primarily to revisions and replacements. Using concepts developed in software versioning and the Functional Requirements for Bibliographic Records (FRBR) as a conceptual framework, we developed six foundational principles for the versioning of datasets: Revision, Release, Granularity, Manifestation, Provenance and Citation. These six principles provide a high-level framework for guiding consistent data versioning practice, and can also serve as guidance for data centres or data providers when setting up their own data revision and versioning protocols and procedures.
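
    The paper does not prescribe a particular metadata encoding, so the following is only a minimal, hypothetical sketch of how the six principles might be captured as version metadata for a single dataset; every field name below is an assumption made for illustration, not the authors' schema.

        # Hypothetical sketch of dataset version metadata organised around the six
        # principles named in the abstract (Revision, Release, Granularity,
        # Manifestation, Provenance, Citation). Field names are illustrative only.
        from dataclasses import dataclass, field

        @dataclass
        class DatasetVersion:
            dataset_id: str        # persistent identifier of this exact version (placeholder format)
            release: str           # publicly announced release, e.g. a yearly time-series extension
            revision: int          # counter for corrections/re-processing within a release
            granularity: str       # level the identifier applies to: "collection", "dataset", "file"
            manifestation: str     # encoding or packaging, e.g. "netCDF-4" vs a "CSV mirror"
            derived_from: list = field(default_factory=list)  # provenance: earlier version identifiers
            citation: str = ""     # recommended citation text for this specific version

        v2_1 = DatasetVersion(
            dataset_id="doi:10.xxxx/example.v2.1",      # placeholder identifier
            release="2.0 (2023 time-series extension)",
            revision=1,                                  # first correction applied to release 2.0
            granularity="dataset",
            manifestation="netCDF-4",
            derived_from=["doi:10.xxxx/example.v2.0"],
            citation="Example Data Centre (2023): Example dataset, version 2.1.",
        )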

    23 Things Physical Samples

    Physical samples are a basic element for reference, study, and experimentation in research. The 23 Things for Physical Samples aims to provide a reference overview of resources centered on the management and sharing of data on material samples. The output focuses on existing work, recent developments, recommended practices, and community initiatives. The 23 resources are related to the following categories: 1) a general introduction, 2) persistent identifiers, 3) metadata, 4) citing samples, 5) data licensing and ownership, 6) tools, 7) repositories, and 8) communities of practice.

    Integrating data and analysis technologies within leading environmental research infrastructures: Challenges and approaches

    When researchers analyze data, it typically requires significant effort in data preparation to make the data analysis-ready. This often involves cleaning, pre-processing, harmonizing, or integrating data from one or multiple sources and placing them into a computational environment in a form suitable for analysis. Research infrastructures and their data repositories host data and make them available to researchers, but rarely offer a computational environment for data analysis. Published data are often persistently identified, but such identifiers resolve to landing pages that must be (manually) navigated to identify how data are accessed. This navigation is typically challenging or impossible for machines. This paper surveys existing approaches for improving environmental data access to facilitate more rapid data analyses in computational environments, and thus contribute to a more seamless integration of data and analysis. By analysing current state-of-the-art approaches and solutions being implemented by world-leading environmental research infrastructures (RIs), we highlight the existing practices to interface data repositories with computational environments and the challenges moving forward. We found that while the level of standardization has improved during recent years, it is still challenging for machines to discover and access data based on persistent identifiers. This is problematic with regard to the emerging requirements for FAIR (Findable, Accessible, Interoperable, and Reusable) data in general, and for the seamless integration of data and analysis in particular. There are a number of promising approaches that would improve the state of the art. A key approach presented here involves software libraries that streamline reading data and metadata into computational environments. We describe this approach in detail for two research infrastructures. We argue that the development and maintenance of specialized libraries for each RI, across the range of programming languages used in data analysis, does not scale well. Based on this observation, we propose a set of established standards and web practices that, if implemented by environmental research infrastructures, will enable the development of RI- and programming-language-independent software libraries with much reduced effort for library implementation and maintenance, as well as considerably lower learning requirements for users. To catalyse such advancement, we propose a roadmap and key action points for technology harmonization among RIs that, we argue, will build the foundation for efficient and effective integration of data and analysis. This work was supported by the European Union’s Horizon 2020 research and innovation program under grant agreements No. 824068 (ENVRI-FAIR project) and No. 831558 (FAIRsFAIR project). NEON is a project sponsored by the National Science Foundation (NSF) and managed under cooperative support agreement (EF-1029808) to Battelle.
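
    As a concrete illustration of machine-actionable access through persistent identifiers, DOIs registered with DataCite support HTTP content negotiation, so structured metadata can be fetched directly instead of scraping a landing page. The sketch below shows that general pattern; it is not the specific library interface described in the paper, and the DOI is a placeholder.

        # Minimal sketch: retrieving machine-readable metadata for a dataset DOI via
        # content negotiation on doi.org, rather than navigating its landing page.
        # The DOI below is a placeholder, not a dataset from the paper.
        import requests

        doi = "10.5281/zzzzzz"                                          # placeholder dataset DOI
        headers = {"Accept": "application/vnd.datacite.datacite+json"}  # ask for DataCite JSON

        resp = requests.get(f"https://doi.org/{doi}", headers=headers, timeout=30)
        resp.raise_for_status()
        metadata = resp.json()

        # DataCite JSON carries titles, creators, publication year and related
        # identifiers that a client library could use to locate the data files.
        print(metadata.get("titles"), metadata.get("publicationYear"))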

    Call to action for global access to and harmonization of quality information of individual earth science datasets

    Knowledge about the quality of data and metadata is important to support informed decisions on the (re)use of individual datasets and is an essential part of the ecosystem that supports open science. Quality assessments reflect the reliability and usability of data. They need to be consistently curated, fully traceable, and adequately documented, as these qualities are crucial for sound decision- and policy-making efforts that rely on data. Quality assessments also need to be consistently represented and readily integrated across systems and tools to allow for improved sharing of quality information at the dataset level for individual quality attributes or dimensions. Although the need for assessing the quality of data and associated information is well recognized, methodologies for an evaluation framework and for the presentation of the resulting quality information to end users may not have been comprehensively addressed within and across disciplines. Global interdisciplinary domain experts have come together to systematically explore the needs, challenges and impacts of consistently curating and representing quality information through the entire lifecycle of a dataset. This paper describes the findings of that effort, argues for the importance of sharing dataset quality information, calls for community action to develop practical guidelines, and outlines community recommendations for developing such guidelines. Practical guidelines will allow for global access to and harmonization of quality information at the level of individual Earth science datasets, which in turn will support open science.
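
    Because the practical guidelines called for here are still to be developed, the structure below is purely a hypothetical sketch of how quality information might be recorded for an individual dataset, attribute by attribute, with traceability to each assessment; none of the field names come from the paper.

        # Hypothetical sketch of per-dataset quality information, recorded separately
        # for individual quality attributes/dimensions and traceable to the assessment
        # that produced it. All field names are illustrative assumptions.
        quality_record = {
            "dataset_id": "doi:10.xxxx/placeholder",        # placeholder identifier
            "assessments": [
                {
                    "dimension": "completeness",             # quality attribute or dimension
                    "result": 0.97,                          # e.g. fraction of expected records present
                    "method": "automated coverage check",
                    "assessed_by": "Example data centre",
                    "assessed_on": "2024-05-01",
                    "evidence": "https://example.org/qa-report/123",  # traceable documentation
                },
                {
                    "dimension": "lineage documentation",
                    "result": "complete",
                    "method": "manual curator review",
                    "assessed_by": "Example data centre",
                    "assessed_on": "2024-05-01",
                },
            ],
        }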

    Ocean FAIR Data Services

    Well-founded data management systems are of vital importance for ocean observing systems, as they ensure that essential data are not only collected but also retained and made accessible for analysis and application by current and future users. Effective data management requires collaboration across activities including observations; metadata and data assembly; quality assurance and control (QA/QC); data publication that enables local and interoperable discovery and access; and secure archiving that guarantees long-term preservation. To achieve this, data should be findable, accessible, interoperable, and reusable (FAIR). Here, we outline how these principles apply to ocean data and illustrate them with a few examples. In recent decades, ocean data managers, in close collaboration with international organizations, have played an active role in improving environmental data standardization, accessibility, and interoperability through different projects, enhancing access to observation data at all stages of the data life cycle and fostering the development of integrated services targeted to research, regulatory, and operational users. As ocean observing systems evolve and an increasing number of autonomous platforms and sensors are deployed, the volume and variety of data increase dramatically. For instance, there are more than 70 data catalogs that contain metadata records for the polar oceans, a situation that puts comprehensive data discovery beyond the capacity of most researchers. To better serve research, operational, and commercial users, more efficient turnaround of quality data, in known formats and made available through web services, is necessary. In particular, automation of data workflows will be critical to reducing friction throughout the data value chain. Adhering to the FAIR principles, with free, timely, and unrestricted access to ocean observation data, is beneficial for the originators, has obvious benefits for users, and is an essential foundation for the development of new services made possible by big data technologies.
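
    To make the point about web-service delivery concrete, the sketch below shows one widely used pattern for programmatic access to ocean observations: an ERDDAP tabledap request returning CSV straight into an analysis environment. The server URL, dataset identifier, variables and time constraints are all placeholders, not services referenced in the paper.

        # Illustrative sketch of fetching ocean observation data from an ERDDAP-style
        # web service as CSV. Server, dataset ID, variables and constraints are
        # placeholders chosen for the example.
        import pandas as pd

        server = "https://example.org/erddap"            # placeholder ERDDAP server
        dataset_id = "example_sst_observations"          # placeholder dataset identifier
        query = (
            "time,latitude,longitude,sea_surface_temperature"
            "&time>=2024-01-01T00:00:00Z&time<2024-01-02T00:00:00Z"
        )
        url = f"{server}/tabledap/{dataset_id}.csv?{query}"

        # ERDDAP CSV responses include a units row beneath the header, hence skiprows=[1].
        df = pd.read_csv(url, skiprows=[1])
        print(df.head())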