
    Fixity checking a large climate data archive

    It is important not to rely on internal checking mechanisms in hardware systems, as these are fallible like everything else. Corruption is rare on the current JASMIN storage system, but future systems need to be monitored closely if we are to have confidence in their fixity.
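Fixity checking of this kind is typically done by recomputing cryptographic checksums and comparing them with a previously recorded manifest. A minimal sketch in Python (the function names and manifest layout are illustrative, not the actual JASMIN tooling):

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large archive files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def check_fixity(manifest, hash_func=sha256_of):
    """Compare recorded checksums against freshly computed ones.

    manifest: dict mapping file path -> expected hex digest.
    Returns the list of paths whose content no longer matches.
    """
    return [path for path, expected in manifest.items()
            if hash_func(path) != expected]
```

Because the digest is recomputed from the bytes on disk, this catches silent corruption that hardware-level checks may miss.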

    A meteorological data journal - Overlay Journal Infrastructure for Meteorological Sciences (OJIMS)

    A poster describing a meteorological data journal. The OJIMS project is investigating the idea of data publication in the atmospheric sciences. It is creating the required mechanics for overlay journals and examining their long-term sustainability. The project has also set up a document repository for the atmospheric sciences, which will preserve a range of documents relevant to atmospheric science, including journal papers, technical reports, and images.

    Twenty Years of Data Management in the British Atmospheric Data Centre

    The British Atmospheric Data Centre (BADC) has existed in its present form for 20 years, having been formally created in 1994. It evolved from the GDF (Geophysical Data Facility), a SERC (Science and Engineering Research Council) facility, as a result of research council reform in which NERC (Natural Environment Research Council) extended its remit to cover atmospheric data below 10km altitude. With that change the BADC took on data from many other atmospheric sources and started interacting with NERC research programmes. The BADC has now hit early adulthood. Prompted by this milestone, we examine in this paper whether the data centre is creaking at the seams or is looking forward to the prime of its life, gliding effortlessly into the future. Which parts of it are bullet-proof and which parts are held together with double-sided sticky tape? Can we expect to see it in its present form in another twenty years’ time? To answer these questions, we examine the interfaces, technology, processes and organisation used in the provision of data centre services by looking at three snapshots in time, 1994, 2004 and 2014, using metrics and reports from the time to compare and contrast the services offered by the BADC. The repository landscape has changed massively over this period and has moved the focus for technology and development as the broader community followed emerging trends, standards and ways of working. The incorporation of these new ideas has been both a blessing and a curse, providing the data centre staff with plenty of challenges and opportunities. We also discuss key data centre functions, including: data discovery, data access, ingestion, data management planning, preservation plans, agreements/licences and data policy, storage and server technology, organisation and funding, and user management. We conclude that the data centre will probably still exist in some form in 2024 and that it will most likely still be reliant on a file system. However, the technology delivering this service will change, and the host organisation and funding routes may vary.

    Making data a first class scientific output : data citation and publication by NERC's Environmental Data Centres

    The NERC Science Information Strategy Data Citation and Publication project aims to develop and formalise a method for citing and publishing the datasets stored in its environmental data centres. It is believed that this will act as an incentive for scientists, who often invest a great deal of effort in creating datasets, to submit their data to a suitable data repository where it can be properly archived and curated. Data citation and publication will also provide a mechanism for data producers to receive credit for their work, thereby encouraging them to share their data more freely.

    An Information Management Framework for Environmental Digital Twins (IMFe) as a concept and pilot

    Environmental science is concerned with assessing the impacts of changing environmental conditions upon the state of the natural world. Environmental Digital Twins (EDT) are a new technology that enables environmental change scenarios for real systems to be modelled and their impacts visualised. They will be particularly effective at delivering understanding of these impacts on the natural environment to non-specialist stakeholders. The UK Natural Environment Research Council (NERC) recently published its first digital strategy, which sets out a vision for digitally enabled environmental science for the next decade. This strategy places data and digital technologies at the heart of UK environmental science. EDT have been made possible by the emergence of increasingly large, diverse, static data sources, networks of dynamic environmental data from sensor networks, and time-variant process modelling. Once combined with visualisation capabilities, these provide the basis of the digital twin technologies that will enable the environmental science community to make a step-change in understanding of the environment. Components may be developed separately by a network but can be combined to improve understanding, provided development follows agreed standards to facilitate data exchange and integration. Replicating the behaviours of environmental systems is inevitably a multi-disciplinary activity. To enable this, an information management framework for Environmental Digital Twins (IMFe) is needed that establishes the components for effective information management within and across the EDT ecosystem. This must enable secure, resilient interoperability of data and act as a reference point to facilitate data use in line with security, legal, commercial, privacy and other relevant concerns.
    We present recommendations for developing an IMFe, including the application of concepts such as an asset commons and a balanced approach to standards, to facilitate minimum interoperability requirements between twins while iteratively implementing an IMFe. Achieving this requires components to be developed that follow agreed standards to ensure that information can be trusted by the user, and that they are semantically interoperable so data can be shared. A digital Asset Register will be defined to provide access to and enable linking of such components. This previously conceptual project has now been enhanced into the Pilot IMFe project, aiming to define the architectures, technologies, standards and hardware infrastructure to develop a fully functional environmental digital twin. During the project lifespan this will be tested by the construction of a pilot EDT for the Haig Fras Marine Conservation Zone (MCZ) that both enables testing of the proposed IMFe concepts and provides a clear demonstration of the power of EDT to monitor and scenario-test a complex environmental system for the benefit of stakeholders.

    The BADC-CSV Format: Meeting user and metadata requirements

    The 2007 British Atmospheric Data Centre (BADC) Users Survey examined the skill base of the BADC’s user community. Results indicated that a large proportion of users were familiar with data held in ASCII formats such as comma-separated values (CSV), and that there was a high degree of familiarity with spreadsheet programmes (e.g. Excel) for data analysis purposes. These results, combined with the experience of BADC staff in dealing with user enquiries and assisting data suppliers in preparing data for submission, and the metadata requirements of the BADC, highlighted the need for a new ASCII format. The BADC-CSV format adheres to metadata conventions covered by the NASA-Ames and netCDF formats, the CF and Dublin Core metadata conventions, the ISO19115 standard, and the metadata requirements of the BADC and its sister data centres within the Natural Environment Research Council (NERC). The format meets end user and data supplier requirements by being a native format for spreadsheet software as well as other commonly used data production and analysis tools (e.g. IDL, MATLAB). This paper presents the requirements for the format resulting from the 2007 user survey and data centre requirements, describes the structure of the format, and demonstrates the format through short examples. Finally, ongoing work to further develop the format is discussed.
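The abstract describes an ASCII format that carries its metadata alongside the data. The sketch below parses a file in that general shape, with a metadata block followed by a delimited data section; the sample content and section keywords are illustrative assumptions for demonstration, not the published BADC-CSV specification:

```python
import csv
import io

# Illustrative sample: metadata rows first (label, scope, values...), then a
# "data" section holding the column values. Layout is an assumption, not spec.
SAMPLE = """\
Conventions,G,BADC-CSV,1
title,G,surface air temperature
long_name,1,air temperature,degC
data
1
12.3
12.7
end data
"""

def parse(text):
    """Split a metadata-headed ASCII file into (metadata rows, data rows)."""
    metadata, data, in_data = [], [], False
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        if row[0] == "data":
            in_data = True
        elif row[0] == "end data":
            in_data = False
        elif in_data:
            data.append(row)
        else:
            metadata.append(row)
    return metadata, data
```

Because the metadata travels in the same file as the values, a spreadsheet user sees both at once, while a script can separate them mechanically as above.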

    Linking data and publications in the environmental sciences: CLADDIER project workshop, Chilworth, Southampton, UK 15th May 2007

    This CLADDIER (Citation, Location, And Deposition in Discipline & Institutional Repositories) workshop provided a key opportunity to hear about progress in the project and also, most importantly, to discuss fundamental next steps to further the linking of citations from source data to publications databases. The emphasis is on the environmental sciences. The University of Southampton and STFC institutional repositories are exemplars of institutional repositories in the UK. The British Atmospheric Data Centre (BADC), likewise, is an exemplar of a discipline-based data archive. As part of the project, a demonstration system linking publications held in the institutional repositories with data holdings in the BADC has been set up to explore the issues involved.
    Programme:
    10.00 Welcome and Introduction - Jessie Hey, University of Southampton
    10.15 The CLADDIER vision - Bryan Lawrence, Project Director; CLADDIER project fundamentals - Sam Pepler, Project Manager
    11.00 Coffee break
    11.30 Publication and citation in practice: Data publication issues and methods - Catherine Jones, STFC; A data and publication discovery service - Brian Matthews, STFC. Other initiatives: Citing Geospatial Data - Guy McGarva, EDINA National Data Centre; Data and Publications: A view from Chemistry - Simon Coles, EPSRC UK National Crystallography Service
    1.00 Lunch
    2.00 A view from NERC - Mark Thorley, NERC
    2.15 Breakout groups: writing citations; peer review processes for data; publication and data interaction in the future
    3.15 Tea break
    3.45 Breakout feedback
    4.15 Summing up - a view from the bridge - Bryan Lawrence
    4.30 Close of workshop

    CCMVal Archive at BADC

    There are many good reasons to keep data, and here are three of the best: 1) re-use, 2) re-purposing, 3) citation. This presentation is all about why you were asked to provide CCMVal data to the BADC in CF-compliant NetCDF, and how this self-describing standard enables the BADC to fulfil its role as custodian of the CCMVal data archive. RE-USE: Providing data in CF-compliant NetCDF allows your data to be shared with other scientists now and in the future. Right now, the CF NetCDF common format is facilitating the quick verification of data from different CCMVal models with the use of standard diagnostic tools. Twenty years from now the data will still be understood because it uses the self-describing CF NetCDF standard. RE-PURPOSING: Using CF standard names to describe the CCMVal variables will enable scientists from other disciplines to make use of the CCMVal data when it is eventually made available to them. The CF standard names facilitate data discovery through vocabulary servers, which allow users to find data without needing to know the exact names of variables. Such value-added services satisfy the increasing expectation from funders that science data can be used by different research communities. CITATION: CCMVal data is an ideal candidate for publication in the new data journals because it uses the CF NetCDF standard. Once the CCMVal data has been verified and moved to the archive at the BADC, our data scientists can help with the process of publishing the data so that it can be cited.
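The re-purposing argument, that shared CF standard names let users discover variables without knowing model-local names, can be illustrated with a toy vocabulary lookup. The model catalogues below are invented for the example; `air_temperature` is a genuine CF standard name, but the local variable layouts are assumptions:

```python
# Two hypothetical models expose the same physical quantity under different
# local names; the shared CF standard_name is what makes both findable.
CATALOGUE = {
    "model_a": {"ta": "air_temperature", "ua": "eastward_wind"},
    "model_b": {"TEMP": "air_temperature", "O3": "mole_fraction_of_ozone_in_air"},
}

def find_by_standard_name(catalogue, standard_name):
    """Return (model, local variable name) pairs matching a CF standard name."""
    return [(model, var)
            for model, variables in catalogue.items()
            for var, cf_name in variables.items()
            if cf_name == standard_name]
```

A vocabulary server does essentially this lookup at scale: a user asks for `air_temperature` and finds both `ta` and `TEMP` without ever seeing either model's naming convention.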