Storing and manipulating environmental big data with JASMIN
JASMIN is a super-data-cluster designed to provide a high-performance, high-volume data analysis environment for the UK environmental science community. Thus far JASMIN has been used primarily by the atmospheric science and earth observation communities, both to support their direct scientific workflow and to support the curation of data products in the STFC Centre for Environmental Data Archival (CEDA). Initial JASMIN configuration and first experiences are reported here, and useful improvements in scientific workflow are presented. It is clear from the explosive growth in stored data and use that there was pent-up demand for a suitable big-data analysis environment. This demand is not yet satisfied, in part because JASMIN does not yet have enough compute, the storage is fully allocated, and not all software needs are met. Plans to address these constraints are introduced.
Developing an open data portal for the ESA climate change initiative
We introduce the rationale for, and architecture of, the European Space Agency Climate Change Initiative (CCI) Open Data Portal (http://cci.esa.int/data/). The Open Data Portal hosts a set of richly diverse datasets, the 13 "Essential Climate Variables" from the CCI programme, in a consistent and harmonised form, and provides a single point of access for the (>100 TB) data for broad dissemination to an international user community. These data have been produced by a range of different institutions and vary in both scientific and spatio-temporal characteristics. This heterogeneity of the data, together with the range of services to be supported, presented significant technical challenges.
An iterative development methodology was key to tackling these challenges: the system developed exploits a workflow which takes data that conforms to the CCI data specification, ingests it into a managed archive and uses both manual and automatically generated metadata to support data discovery, browse, and delivery services. It utilises both Earth System Grid Federation (ESGF) data nodes and the Open Geospatial Consortium Catalogue Service for the Web (OGC-CSW) interface, serving data into both the ESGF and the Global Earth Observation System of Systems (GEOSS). A key part of the system is a new vocabulary server, populated with CCI-specific terms and relationships, which integrates the OGC-CSW and ESGF search services and was developed as part of a dialogue between domain scientists and linked-data specialists. These services have enabled the development of a unified user interface for graphical search and visualisation: the CCI Open Data Portal Web Presence.
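The catalogue side of such a system is exposed through the standard OGC-CSW interface described above. As a hedged illustration of what a client-side discovery query looks like (the endpoint and search keyword below are placeholders, not the portal's actual service URL), a keyword search can be expressed as a CSW 2.0.2 GetRecords request using standard key-value-pair parameters:

```python
from urllib.parse import urlencode

def csw_getrecords_url(base_url, keyword, max_records=10):
    """Build a CSW 2.0.2 GetRecords request URL (KVP encoding).

    Parameter names follow the OGC CSW 2.0.2 specification; the
    constraint is a CQL text filter over the AnyText queryable.
    """
    params = {
        "service": "CSW",
        "version": "2.0.2",
        "request": "GetRecords",
        "typeNames": "csw:Record",
        "resultType": "results",
        "elementSetName": "summary",
        "constraintLanguage": "CQL_TEXT",
        "constraint": f"AnyText LIKE '%{keyword}%'",
        "maxRecords": str(max_records),
    }
    return base_url + "?" + urlencode(params)

# Hypothetical endpoint used purely for illustration.
url = csw_getrecords_url("https://csw.example.org/csw", "sea surface temperature")
print(url)
```

In a production client one would typically use a library such as OWSLib rather than hand-building URLs, but the KVP form above makes the moving parts of a catalogue query explicit.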
A weighting method to improve habitat association analysis: tested on British carabids
Analysis of species' habitat associations is important for biodiversity conservation and spatial ecology. The original phi coefficient of association is a simple method that gives both positive and negative associations of individual species with habitats. The method originates in assessing the association of plant species with habitats, sampled by quadrats. Using this method for mobile animals creates problems as records often have imprecise locations, and would require either using only records related to a single habitat or arbitrarily choosing a single habitat to assign.
We propose and test a new weighted version of the index that retains more records, which improves association estimates and allows assessment of more species. It weights habitats that lie within the area covered by the species record with their certainty level, in our case study, the proportion of the grid cell covered by that habitat.
We used carabid beetle data from the National Biodiversity Network atlas and CEH Land Cover Map 2015 across Great Britain to compare the original method with the weighted version. We used presence-only data, assigning species absences using a threshold based on the number of other species found at a location, and conducted a sensitivity analysis of this threshold. Qualitative descriptions of habitat associations were used as independent validation data.
The weighted index allowed the analysis of 52 additional species (19% more) and gave results with as few as 50 records. For the species we could analyse using both indices, the weighted index explained 70% of the qualitative validation data compared to 68% for the original, indicating no accuracy loss.
The weighted phi coefficient of association provides an improved method for habitat analysis, giving information on preferred and avoided habitats for mobile species with limited records, and can be used in modelling and analysis that directs conservation policy and practice.
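The core idea, weighting each habitat overlapping a record's footprint by its certainty (here, its proportion of the grid cell) rather than assigning one habitat per record, can be sketched as a weighted 2x2 contingency table feeding the standard phi formula. This is a minimal illustrative sketch, not the authors' code; the function name, data layout and example weights are assumptions:

```python
import math

def weighted_phi(records, habitat):
    """Weighted phi coefficient of association for one habitat.

    `records` is a list of (present, habitat_weights) pairs, where
    `present` says whether the species was recorded there, and
    `habitat_weights` maps each habitat overlapping the record's
    location to its certainty weight (e.g. proportion of grid cell).
    """
    a = b = c = d = 0.0  # weighted 2x2 contingency table
    for present, weights in records:
        w_in = weights.get(habitat, 0.0)       # weight inside the habitat
        w_out = sum(weights.values()) - w_in   # weight in other habitats
        if present:
            a += w_in   # present, in habitat
            b += w_out  # present, elsewhere
        else:
            c += w_in   # absent, in habitat
            d += w_out  # absent, elsewhere
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0

# Toy example: two presence and two pseudo-absence records, each with
# fractional habitat cover summing to 1 over its grid cell.
records = [
    (True,  {"woodland": 1.0}),
    (True,  {"woodland": 0.5, "grassland": 0.5}),
    (False, {"grassland": 1.0}),
    (False, {"woodland": 0.2, "grassland": 0.8}),
]
print(weighted_phi(records, "woodland"))   # positive: preferred habitat
print(weighted_phi(records, "grassland"))  # negative: avoided habitat
```

Because every overlapping habitat contributes its fraction of the record, no record has to be discarded or arbitrarily assigned, which is what lets the weighted index analyse species with few records.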
Twenty Years of Data Management in the British Atmospheric Data Centre
The British Atmospheric Data Centre (BADC) has existed in its present form for 20 years, having been formally created in 1994. It evolved from the GDF (Geophysical Data Facility), a SERC (Science and Engineering Research Council) facility, as a result of research council reform in which NERC (Natural Environment Research Council) extended its remit to cover atmospheric data below 10 km altitude. With that change the BADC took on data from many other atmospheric sources and started interacting with NERC research programmes. The BADC has now hit early adulthood. Prompted by this milestone, we examine in this paper whether the data centre is creaking at the seams or is looking forward to the prime of its life, gliding effortlessly into the future. Which parts of it are bullet proof and which parts are held together with double-sided sticky tape? Can we expect to see it in its present form in another twenty years' time? To answer these questions, we examine the interfaces, technology, processes and organisation used in the provision of data centre services by looking at three snapshots in time, 1994, 2004 and 2014, using metrics and reports from the time to compare and contrast the services provided by the BADC. The repository landscape has changed massively over this period and has moved the focus for technology and development as the broader community followed emerging trends, standards and ways of working. The incorporation of these new ideas has been both a blessing and a curse, providing the data centre staff with plenty of challenges and opportunities. We also discuss key data centre functions including: data discovery, data access, ingestion, data management planning, preservation plans, agreements/licences and data policy, storage and server technology, organisation and funding, and user management. We conclude that the data centre will probably still exist in some form in 2024 and that it will most likely still be reliant on a file system. However, the technology delivering this service will change, and the host organisation and funding routes may vary.
Organising a collaborative online hackathon for cutting-edge climate research
The 2021 Met Office Climate Data Challenge hackathon series provided a valuable opportunity to learn best practice from running online hackathons shaped by the challenges facing climate data science in the wake of the COVID-19 pandemic. In particular, the University of Bristol CMIP6 Data Hackathon, with over 100 participants from the United Kingdom, highlights the advantages of participating in such events as well as lessons learned. A suggested methodology to structure, plan, promote and ensure the longevity of hackathon outputs is described, supporting smoother running of future events.
Cloud Computing for Climate Modelling: Evaluation, Challenges and Benefits
Cloud computing is a mature technology that has already shown benefits for a wide range of academic research domains that, in turn, utilize a wide range of application design models. In this paper, we discuss the use of cloud computing as a tool to improve the range of resources available for climate science, presenting the evaluation of two different climate models. Each was customized in a different way to run in public cloud computing environments (hereafter cloud computing) provided by three different public vendors: Amazon, Google and Microsoft. The adaptations and procedures necessary to run the models in these environments are described. The computational performance and cost of each model within this new type of environment are discussed, and an assessment is given in qualitative terms. Finally, we discuss how cloud computing can be used for geoscientific modelling, including issues related to the allocation of resources by funding bodies. We also discuss problems related to computing security, reliability and scientific reproducibility.