1,104 research outputs found

    Towards Exascale Scientific Metadata Management

    Full text link
    Advances in technology and computing hardware are enabling scientists from all areas of science to produce massive amounts of data using large-scale simulations or observational facilities. In this era of data deluge, effective coordination between the data production and the analysis phases hinges on the availability of metadata that describe the scientific datasets. Existing workflow engines have been capturing a limited form of metadata to provide provenance information about the identity and lineage of the data. However, much of the data produced by simulations, experiments, and analyses still need to be annotated manually in an ad hoc manner by domain scientists. Systematic and transparent acquisition of rich metadata becomes a crucial prerequisite to sustain and accelerate the pace of scientific innovation. Yet, ubiquitous and domain-agnostic metadata management infrastructure that can meet the demands of extreme-scale science is notable by its absence. To address this gap in scientific data management research and practice, we present our vision for an integrated approach that (1) automatically captures and manipulates information-rich metadata while the data is being produced or analyzed and (2) stores metadata within each dataset to permeate metadata-oblivious processes and to query metadata through established and standardized data access interfaces. We motivate the need for the proposed integrated approach using applications from plasma physics, climate modeling and neuroscience, and then discuss research challenges and possible solutions

    The lifecycle of provenance metadata and its associated challenges and opportunities

    Full text link
    This chapter outlines some of the challenges and opportunities associated with adopting provenance principles and standards in a variety of disciplines, including data publication and reuse, and information sciences

    A Nine Month Progress Report on an Investigation into Mechanisms for Improving Triple Store Performance

    No full text
    This report considers the requirement for fast, efficient, and scalable triple stores as part of the effort to produce the Semantic Web. It summarises relevant information in the major background field of Database Management Systems (DBMS), and provides an overview of the techniques currently in use amongst the triple store community. The report concludes that for individuals and organisations to be willing to provide large amounts of information as openly-accessible nodes on the Semantic Web, storage and querying of the data must be cheaper and faster than it is currently. Experiences from the DBMS field can be used to maximise triple store performance, and suggestions are provided for lines of investigation in areas of storage, indexing, and query optimisation. Finally, work packages are provided describing expected timetables for further study of these topics

    Distributed storage and queryng techniques for a semantic web of scientific workflow provenance

    Get PDF
    In scientific workflow environments, scientists depend on provenance, which records the history of an experiment. Resource Description Framework is frequently used to represent provenance based on vocabularies such as the Open Provenance Model. For complex scientific workflows that generate large amounts of RDF triples, single-machine provenance management becomes inadequate over time. In this thesis, we research how HBase capabilities can be leveraged for distributed storage and querying of provenance data represented in RDF. We architect the ProvBase system that incorporates an HBase/Hadoop backend, propose a storage schema to hold provenance triples, and design querying algorithms to evaluate SPARQL queries in the system. We conduct an experimental study to show the feasibility of our approach

    Estimating Fire Weather Indices via Semantic Reasoning over Wireless Sensor Network Data Streams

    Full text link
    Wildfires are frequent, devastating events in Australia that regularly cause significant loss of life and widespread property damage. Fire weather indices are a widely-adopted method for measuring fire danger and they play a significant role in issuing bushfire warnings and in anticipating demand for bushfire management resources. Existing systems that calculate fire weather indices are limited due to low spatial and temporal resolution. Localized wireless sensor networks, on the other hand, gather continuous sensor data measuring variables such as air temperature, relative humidity, rainfall and wind speed at high resolutions. However, using wireless sensor networks to estimate fire weather indices is a challenge due to data quality issues, lack of standard data formats and lack of agreement on thresholds and methods for calculating fire weather indices. Within the scope of this paper, we propose a standardized approach to calculating Fire Weather Indices (a.k.a. fire danger ratings) and overcome a number of the challenges by applying Semantic Web Technologies to the processing of data streams from a wireless sensor network deployed in the Springbrook region of South East Queensland. This paper describes the underlying ontologies, the semantic reasoning and the Semantic Fire Weather Index (SFWI) system that we have developed to enable domain experts to specify and adapt rules for calculating Fire Weather Indices. We also describe the Web-based mapping interface that we have developed, that enables users to improve their understanding of how fire weather indices vary over time within a particular region.Finally, we discuss our evaluation results that indicate that the proposed system outperforms state-of-the-art techniques in terms of accuracy, precision and query performance.Comment: 20pages, 12 figure

    Smart Environmental Data Infrastructures: Bridging the Gap between Earth Sciences and Citizens

    Get PDF
    The monitoring and forecasting of environmental conditions is a task to which much effort and resources are devoted by the scientific community and relevant authorities. Representative examples arise in meteorology, oceanography, and environmental engineering. As a consequence, high volumes of data are generated, which include data generated by earth observation systems and different kinds of models. Specific data models, formats, vocabularies and data access infrastructures have been developed and are currently being used by the scientific community. Due to this, discovering, accessing and analyzing environmental datasets requires very specific skills, which is an important barrier for their reuse in many other application domains. This paper reviews earth science data representation and access standards and technologies, and identifies the main challenges to overcome in order to enable their integration in semantic open data infrastructures. This would allow non-scientific information technology practitioners to devise new end-user solutions for citizen problems in new application domainsThis research was co-funded by (i) the TRAFAIR project (2017-EU-IA-0167), co-financed by the Connecting Europe Facility of the European Union, (ii) the RADAR-ON-RAIA project (0461_RADAR_ON_RAIA_1_E) co-financed by the European Regional Development Fund (ERDF) through the Iterreg V-A Spain-Portugal program (POCTEP) 2014-2020, and (iii) the Consellería de Educación, Universidade e Formación Profesional of the regional government of Galicia (Spain), through the support for research groups with growth potential (ED431B 2018/28)S

    Unlocking the potential of public sector information with Semantic Web technology

    Get PDF
    Governments often hold very rich data and whilst much of this information is published and available for re-use by others, it is often trapped by poor data structures, locked up in legacy data formats or in fragmented databases. One of the great benefits that Semantic Web (SW) technology offers is facilitating the large scale integration and sharing of distributed data sources. At the heart of information policy in the UK, the Office of Public Sector Information (OPSI) is the part of the UK government charged with enabling the greater re-use of public sector information. This paper describes the actions, findings, and lessons learnt from a pilot study, involving several parts of government and the public sector. The aim was to show to government how they can adopt SW technology for the dissemination, sharing and use of its data

    Provenance-aware knowledge representation: A survey of data models and contextualized knowledge graphs

    Get PDF
    Expressing machine-interpretable statements in the form of subject-predicate-object triples is a well-established practice for capturing semantics of structured data. However, the standard used for representing these triples, RDF, inherently lacks the mechanism to attach provenance data, which would be crucial to make automatically generated and/or processed data authoritative. This paper is a critical review of data models, annotation frameworks, knowledge organization systems, serialization syntaxes, and algebras that enable provenance-aware RDF statements. The various approaches are assessed in terms of standard compliance, formal semantics, tuple type, vocabulary term usage, blank nodes, provenance granularity, and scalability. This can be used to advance existing solutions and help implementers to select the most suitable approach (or a combination of approaches) for their applications. Moreover, the analysis of the mechanisms and their limitations highlighted in this paper can serve as the basis for novel approaches in RDF-powered applications with increasing provenance needs
    corecore