
    Provenance in Data Interoperability for Multi-Sensor Intercomparison

    As our inventory of Earth science data sets grows, the ability to compare, merge and fuse multiple datasets grows in importance. This requires a deeper data interoperability than we have now. Efforts such as the Open Geospatial Consortium and OPeNDAP (Open-source Project for a Network Data Access Protocol) have broken down format barriers to interoperability; the next challenge is the semantic aspects of the data. Consider the issues when satellite data are merged, cross-calibrated, validated, inter-compared and fused. We must match up data sets that are related, yet different in significant ways: the phenomenon being measured, measurement technique, location in space-time or quality of the measurements. If subtle distinctions between similar measurements are not clear to the user, results can be meaningless or lead to an incorrect interpretation of the data. Most of these distinctions trace to how the data came to be: sensors, processing and quality assessment. For example, monthly averages of satellite-based aerosol measurements often show significant discrepancies, which might be due to differences in spatio-temporal aggregation, sampling issues, sensor biases, algorithm differences or calibration issues. Provenance information must be captured in a semantic framework that allows data inter-use tools to incorporate it and aid in the interpretation of comparison or merged products. Semantic web technology allows us to encode our knowledge of measurement characteristics, phenomena measured, space-time representation, and data quality attributes in a well-structured, machine-readable ontology and rulesets. An analysis tool can use this knowledge to show users the provenance-related distinctions between two variables, advising on options for further data processing and analysis. An additional problem for workflows distributed across heterogeneous systems is retrieval and transport of provenance. Provenance may be either embedded within the data payload, or transmitted from server to client via an out-of-band mechanism. The out-of-band mechanism is more flexible in the richness of provenance information that can be accommodated, but it relies on a persistent framework and can be difficult for legacy clients to use. We are prototyping the embedded model, incorporating provenance within metadata objects in the data payload. Thus, it always remains with the data. The downside is a limit to the size of provenance metadata that we can include, an issue that will eventually need resolution to encompass the richness of provenance information required for data intercomparison and merging.
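
    As a minimal sketch of the embedded model the abstract describes (provenance carried inside the data payload itself), the Python snippet below writes provenance as attributes of a netCDF variable. The file name, attribute names and values are illustrative assumptions, not the authors' actual schema.

    # Embedded-provenance sketch: provenance travels as metadata
    # attributes inside the data file, so it always stays with the data.
    from netCDF4 import Dataset
    import numpy as np

    with Dataset("aerosol_monthly.nc", "w") as nc:  # hypothetical file
        nc.createDimension("lat", 180)
        nc.createDimension("lon", 360)
        aod = nc.createVariable("aerosol_optical_depth", "f4", ("lat", "lon"))
        aod[:] = np.zeros((180, 360), dtype="f4")

        # Provenance as variable-level attributes, covering the three
        # sources of distinctions named above: sensor, processing and
        # quality assessment. All values are hypothetical examples.
        aod.sensor = "MODIS-Terra"
        aod.processing_algorithm = "Dark Target v5.2"
        aod.spatiotemporal_aggregation = "monthly mean, 1x1 degree"
        aod.quality_flag_policy = "QA >= 2 retained"

    The embedded approach keeps provenance inseparable from the data at the cost of size limits on the metadata, exactly the trade-off the abstract notes.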

    Best Practices for Publishing, Retrieving, and Using Spatial Data on the Web

    Data owners are creating an ever richer set of information resources online, and these are being used for more and more applications. With the rapid growth of connected embedded devices, GPS-enabled mobile devices, and organizations that publish location-based data (e.g., weather and traffic services) and maps and other geospatial information (e.g., GIS and open maps), spatial data on the Web is becoming ubiquitous and voluminous. However, the heterogeneity of the available spatial data, as well as challenges particular to spatial data, makes it difficult for data users, web applications and services to discover, interpret and use the information in large and distributed web systems. This paper summarizes some of the efforts undertaken in the joint W3C/OGC Working Group on Spatial Data on the Web, in particular the effort to describe best practices for publishing spatial data on the Web. It presents the set of principles that guide the selection of these best practices, describes best practices that enable publishing, discovering and retrieving (querying) this type of data on the Web, and identifies areas where a best practice has not yet emerged.
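
    As a hedged illustration of one practice in this vein (publishing a spatial "thing" at a stable HTTP URI in a web-friendly encoding such as GeoJSON, RFC 7946), the sketch below builds such a feature in Python. The URI and properties are assumptions for illustration, not examples taken from the paper.

    # GeoJSON sketch: a spatial feature identified by a stable,
    # dereferenceable URI, serialized in a format web clients understand.
    import json

    feature = {
        "type": "Feature",
        "id": "https://example.org/id/weather-station/42",  # hypothetical URI
        "geometry": {
            "type": "Point",
            "coordinates": [5.3811, 52.1561],  # lon, lat in WGS 84, per GeoJSON
        },
        "properties": {
            "name": "Example weather station",
            "observes": "air_temperature",
        },
    }

    print(json.dumps(feature, indent=2))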

    Semantic Array Programming for Environmental Modelling: Application of the Mastrave Library

    Environmental datasets grow in size and specialization, while models designed for the local scale are often unsuitable at regional or continental scale. At regional scale, data are usually available as georeferenced collections of spatially distributed yet semantically atomic information. Complex data intrinsically oblige modellers to manipulate nontrivial information structures. For example, multi-dimensional arrays of time series may be composed of slices of raster spatial matrices, one per time step, while heterogeneous collections of uneven arrays are common when dealing with data such as precipitation events; these structures may require integration across several spatial scales, projections and temporal extents. Interestingly, it may be far more difficult to implement such complexity in practice than to describe it conceptually: some modelling generalizations have more to do with abstraction than with an explosion of lines of code. Many environmental modelling algorithms are composed of chains of data transformations or trees of domain-specific sub-algorithms. Expressing them concisely, without having to attend to the enormous set of spatio-temporal details, is highly advisable in both the mathematical formulation and the implementation. The semantic array programming paradigm is exemplified here as a powerful conceptual and practical tool (with the free software library Mastrave) for easing scalability and semantic integration in environmental modelling. Array programming (AP) is widely used for its computational effectiveness but often underexploited for reducing the gap between mathematical notation and algorithm implementation, i.e. for promoting arrays (vectors, matrices, tensors) as atomic quantities with extremely compact manipulating operators. A coherent array-based mathematical description of models can simplify the prototyping of complex algorithms while moving mathematical reasoning directly into the source code, thanks to its substantial size reduction, where the mathematical description is expressed in a completely formalized and reproducible way. The proposed paradigm suggests complementing the characteristic weak typing of AP with semantics, both by composing generalized modular sub-models and via array-oriented, and thus concise, constraints. Use of the Mastrave library is exemplified with a regional-scale benchmark application: local-average invariant (LAI) downscaling of climate raster data. Unnecessary errors frequently introduced by non-LAI upsampling are shown to be easily detected and removed when the scientific modelling practice is terse enough to let mathematical reasoning and model coding merge together.
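
    As a hedged NumPy sketch of the local-average invariant idea (not the Mastrave library's actual interface, which the abstract does not detail), the snippet below expands each coarse raster cell into a k-by-k block of fine cells whose mean equals the original coarse value, and verifies that invariant with a semantic-style postcondition in the spirit of the paradigm.

    # LAI upsampling sketch: block means of the fine raster must equal
    # the coarse cell values; the assertion acts as a semantic constraint.
    import numpy as np

    def lai_upsample(coarse, k):
        """Upsample a 2-D raster by factor k, preserving local averages."""
        fine = np.kron(coarse, np.ones((k, k)))
        # Postcondition: each k-by-k block averages to its coarse value.
        blocks = fine.reshape(coarse.shape[0], k, coarse.shape[1], k)
        assert np.allclose(blocks.mean(axis=(1, 3)), coarse)
        return fine

    coarse = np.array([[1.0, 2.0],
                       [3.0, 4.0]])
    fine = lai_upsample(coarse, k=3)
    print(fine.shape)  # (6, 6); every 3x3 block averages to its coarse value

    A non-LAI method (e.g., smoothing interpolation) would generally violate the assertion, which is how such errors become easy to detect.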