    Introducing fuzzy trust for managing belief conflict over semantic web data

    When different human experts interpret Semantic Web data, each may arrive at different and conflicting ideas of what a concept means and how it relates to other concepts. Software agents that operate on the Semantic Web face similar scenarios, in which the interpretations of the Semantic Web data describing heterogeneous sources become contradictory. One such application area of the Semantic Web is ontology mapping, where different similarities have to be combined into a more reliable and coherent view; this view can easily become unreliable if the conflicting beliefs in similarities are not managed effectively between the different agents. In this paper we propose a solution for managing this conflict by introducing trust between the mapping agents, based on the fuzzy voting model.
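
As a rough illustration of the voting idea (not the paper's actual model), the sketch below maps each agent's disagreement with a similarity score to fuzzy trust levels and picks the level with the strongest overall support; the membership functions and thresholds are invented for the example.

# Illustrative sketch only: fuzzy "trust" votes over conflicting similarity scores.
# Membership functions and the aggregation rule are assumptions, not the paper's model.

def triangular(x, a, b, c):
    """Triangular fuzzy membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def trust_votes(own_score, other_score):
    """Map the disagreement between two agents' similarity scores to fuzzy trust levels."""
    disagreement = abs(own_score - other_score)
    return {
        "high":   triangular(disagreement, -0.01, 0.0, 0.3),
        "medium": triangular(disagreement, 0.1, 0.35, 0.6),
        "low":    triangular(disagreement, 0.4, 1.0, 1.01),
    }

def aggregate(votes):
    """Pick the trust level with the strongest overall support across agents."""
    totals = {}
    for vote in votes:
        for level, degree in vote.items():
            totals[level] = totals.get(level, 0.0) + degree
    return max(totals, key=totals.get)

# Agent A believes the similarity is 0.9; the other agents report 0.85, 0.4 and 0.88.
votes = [trust_votes(0.9, s) for s in (0.85, 0.4, 0.88)]
print(aggregate(votes))  # -> "high" under these illustrative membership functions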

    iFuice - Information Fusion utilizing Instance Correspondences and Peer Mappings

    We present a new approach to information fusion of web data sources. It is based on peer-to-peer mappings between sources and utilizes correspondences between their instances. Such correspondences are already available between many sources, e.g. in the form of web links, and help combine the information about specific objects and support high-quality data fusion. Sources and mappings relate to a domain model to support a semantically focused information fusion. The iFuice architecture incorporates a mapping mediator offering both interactive and script-driven, workflow-like access to the sources and their mappings. The script programmer can use powerful generic operators to execute and manipulate mappings and their results. The paper motivates the new approach and outlines the architecture and its main components, in particular the domain model, the source and mapping model, and the script operators and their usage.
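
The sketch below illustrates the general idea of composing and fusing instance correspondences in plain Python; the operator names, data layout and example identifiers are assumptions and are much simpler than iFuice's actual script operators and mapping mediator.

# Illustrative sketch of composing instance correspondences between sources.
# The operator names and data layout are assumptions; iFuice's actual script
# operators and mapping mediator are richer than this.

def compose(mapping_ab, mapping_bc):
    """Chain two instance-level mappings A->B and B->C into A->C."""
    result = set()
    for a, b in mapping_ab:
        for b2, c in mapping_bc:
            if b == b2:
                result.add((a, c))
    return result

def fuse(instances_a, instances_c, mapping_ac):
    """Merge attribute values of corresponding instances from two sources."""
    fused = {}
    for a, c in mapping_ac:
        record = dict(instances_a.get(a, {}))
        record.update(instances_c.get(c, {}))
        fused[(a, c)] = record
    return fused

# Hypothetical example: publication IDs linked via an author homepage (source B).
dblp_to_homepage = {("dblp:123", "hp:p1")}
homepage_to_acm = {("hp:p1", "acm:987")}
dblp_to_acm = compose(dblp_to_homepage, homepage_to_acm)

dblp = {"dblp:123": {"title": "Example paper", "year": 2005}}
acm = {"acm:987": {"citations": 42}}
print(fuse(dblp, acm, dblp_to_acm))  # one fused record combining both sources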

    Open legacy soil survey data in Brazil: geospatial data quality and how to improve it

    Spatial soil data applications require sound geospatial data, including coordinates and a coordinate reference system. However, when it comes to legacy soil data, these are frequently missing or incorrect. This paper assesses the quality of the geospatial data of legacy soil observations in Brazil, and evaluates geospatial data sources (survey reports, maps, spatial data infrastructures, web mapping services) and expert knowledge as means to fix inconsistencies. The analyses included several consistency checks performed on 6,195 observations from the Brazilian Soil Information System. The positional accuracy of the geospatial data sources was estimated to obtain an indication of their suitability for fixing inconsistencies. The coordinates of 20 soil observations, estimated using the web mapping service, were validated against the true coordinates measured in the field. Overall, inconsistencies of different types and magnitudes were found in half of the observations, causing mild to severe misplacements. The involuntary substitution of symbols and numeric characters of similar appearance when recording geospatial data was the most common typing mistake. Among the geospatial data sources, the web mapping service was the most useful, due to operational advantages and a lower positional error (~6 m). However, the quality of the description of the observation location controls the accuracy of the estimated coordinates. Thus, the error of coordinates estimated using the web mapping service ranged between 30 and 1,000 m, equivalent to coordinates recorded with arc-second to arc-minute precision, respectively. Under this scenario, feedback from soil survey experts is crucial to improving the quality of geospatial data.
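
The sketch below shows the kind of simple consistency checks the abstract refers to, e.g. coordinates falling outside the country or a dropped minus sign; the bounding box is an approximation of Brazil's extent and the checks are illustrative, not the paper's exact procedure.

# Illustrative sketch of simple geospatial consistency checks on legacy soil
# observations. The bounding box is an approximation of Brazil's extent and the
# checks are examples of the kind performed, not the paper's exact procedure.

# Approximate bounding box of Brazil (decimal degrees, WGS84).
LAT_MIN, LAT_MAX = -34.0, 5.5
LON_MIN, LON_MAX = -74.0, -34.0

def check_observation(lat, lon):
    """Return a list of detected inconsistencies for one observation."""
    issues = []
    if lat is None or lon is None:
        issues.append("missing coordinates")
        return issues
    if not (LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX):
        issues.append("outside Brazil's bounding box")
    if lat > 0 and LAT_MIN <= -lat <= LAT_MAX:
        issues.append("possible missing minus sign on latitude")
    if lon > 0 and LON_MIN <= -lon <= LON_MAX:
        issues.append("possible missing minus sign on longitude")
    return issues

print(check_observation(22.9, -47.1))   # latitude sign probably dropped
print(check_observation(-22.9, -47.1))  # no issues detected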

    Efficient Feedback Collection for Pay-as-you-go Source Selection

    Technical developments, such as the web of data and web data extraction, combined with policy developments such as those relating to open government or open science, are leading to the availability of increasing numbers of data sources. Given these physical sources, it is also possible to create further virtual sources that integrate, aggregate or summarise the data from the original sources. As a result, there is a plethora of data sources, of which a small subset may be able to provide the information required to support a task. The number and rate of change of the available sources is likely to make manual source selection and curation by experts impractical for many applications, leading to the need to pursue a pay-as-you-go approach, in which crowds or data consumers annotate results based on their correctness or suitability, with the resulting annotations used to inform, e.g., source selection algorithms. However, for pay-as-you-go feedback collection to be cost-effective, it may be necessary to judiciously select the data items on which feedback is to be obtained. This paper describes OLBP (Ordering and Labelling By Precision), a heuristics-based approach to targeting data items for feedback to support mapping and source selection tasks, where users express their preferences in terms of the trade-off between precision and recall. The proposed approach is evaluated on two different scenarios: mapping selection with synthetic data, and source selection with real data produced by web data extraction. The results demonstrate a significant reduction in the amount of feedback required to reach user-provided objectives when using OLBP.
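
A minimal sketch of the general idea of precision-ordered feedback collection follows; the precision estimates, the stopping rule and the feedback oracle are illustrative assumptions and do not reproduce OLBP's actual heuristics.

# Illustrative sketch of precision-ordered feedback collection. The precision
# estimates, the stopping rule and the oracle are assumptions; OLBP's actual
# heuristics are described in the paper.

def collect_feedback(candidates, oracle, target_precision):
    """Label candidates in descending order of estimated precision, stopping
    once the labelled set can no longer meet the user's precision target."""
    # candidates: list of (item, estimated_precision) pairs
    ordered = sorted(candidates, key=lambda c: c[1], reverse=True)
    selected, correct, labelled = [], 0, 0
    for item, _ in ordered:
        is_correct = oracle(item)          # crowd / data-consumer feedback
        labelled += 1
        if is_correct:
            correct += 1
            selected.append(item)
        if correct / labelled < target_precision:
            break                          # further labelling is unlikely to pay off
    return selected, labelled

# Toy oracle: sources s1-s3 are actually useful, s4-s5 are not.
truth = {"s1": True, "s2": True, "s3": True, "s4": False, "s5": False}
candidates = [("s1", 0.9), ("s4", 0.8), ("s2", 0.7), ("s5", 0.4), ("s3", 0.3)]
kept, feedback_used = collect_feedback(candidates, truth.get, target_precision=0.6)
print(kept, feedback_used)  # sources kept and amount of feedback spent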

    Towards Semi-automatic Generation of R2R Mappings

    Translating data from linked data sources to the vocabulary expected by a linked data application requires a large number of mappings and can involve many structural transformations as well as complex property value transformations. The R2R mapping language is a language based on SPARQL for publishing expressive mappings on the web. However, the specification of R2R mappings is not an easy task. This paper therefore proposes the use of mapping patterns to semi-automatically generate R2R mappings between RDF vocabularies. In this paper, we first specify a mapping language with a high level of abstraction to transform data from a source ontology to a target ontology vocabulary. Second, we introduce the proposed mapping patterns. Finally, we present a method to semi-automatically generate R2R mappings using the mapping patterns.
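
As a rough illustration of the pattern idea, the sketch below instantiates a simple "rename property" pattern into a SPARQL CONSTRUCT query; the paper's patterns produce R2R mappings, whose exact syntax is not reproduced here, and the pattern template and IRIs are assumptions.

# Illustrative sketch of instantiating a mapping pattern. A simple
# "rename property" pattern is filled in and emitted as a SPARQL CONSTRUCT
# query; the actual patterns target the R2R mapping language itself, whose
# syntax is not reproduced here.

RENAME_PROPERTY_PATTERN = """
CONSTRUCT {{ ?s <{target}> ?o . }}
WHERE     {{ ?s <{source}> ?o . }}
""".strip()

def instantiate_rename(source_property, target_property):
    """Fill the rename-property pattern with concrete source/target IRIs."""
    return RENAME_PROPERTY_PATTERN.format(source=source_property,
                                          target=target_property)

print(instantiate_rename("http://xmlns.com/foaf/0.1/name",
                         "http://example.org/target#fullName"))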

    Taming Web Sources with Minute-Made Wrappers

    The Web has become a major conduit to information repositories of all kinds. Today, more than 80% of the information published on the Web is generated by underlying databases, and this proportion keeps increasing. In some cases, database access is only granted through a Web gateway, using forms as a query language and HTML as a display vehicle. In order to permit inter-operation (between Web sources and legacy databases, or among Web sources themselves), there is a strong need for Web wrappers. Web wrappers share some of the characteristics of standard database wrappers, but usually the underlying data sources offer very limited query capabilities, and the structure of the result (due to HTML shortcomings) might be loose and unstable. To overcome these problems, we divide the architecture of our Web wrappers into three components: (1) fetching the document, (2) extracting the information from its HTML formatting, and (3) mapping the information into a structure that can be used by applications (such as mediators).
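
A minimal sketch of the three wrapper components described above (fetching, extraction, mapping) follows; the URL, page layout and regular expression are invented for the example, whereas a real wrapper would be tailored to a specific source.

# Illustrative sketch of the three wrapper components: fetch, extract, map.
# The URL, HTML structure and regular expression are assumptions made up for
# the example; a real wrapper would be generated for a specific source.
import re
import urllib.request

def fetch(url):
    """Component 1: retrieve the HTML document behind the Web gateway."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")

def extract(html):
    """Component 2: pull the raw values out of the HTML formatting."""
    # Hypothetical layout: each result row is "<tr><td>title</td><td>year</td></tr>".
    return re.findall(r"<tr><td>(.*?)</td><td>(\d{4})</td></tr>", html)

def map_records(rows):
    """Component 3: map the values into a structure usable by applications."""
    return [{"title": title, "year": int(year)} for title, year in rows]

if __name__ == "__main__":
    html = fetch("http://example.org/search?q=wrappers")  # hypothetical source
    print(map_records(extract(html)))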

    Exposing WikiPathways as Linked Open Data

    Biology has become a data intensive science. Discovery of new biological facts increasingly relies on the ability to find and match appropriate biological data, for instance for the functional annotation of genes of interest or the identification of pathways affected by over-expressed genes. Functional and pathway information about genes and proteins is typically distributed over a variety of databases and the literature.

Pathways are a convenient, easy-to-interpret way to describe known biological interactions. WikiPathways provides community-curated pathways. WikiPathways users integrate their knowledge with facts from the literature and biological databases. The curated pathway is then reviewed and possibly corrected or enriched. Different tools (e.g. Pathvisio and Cytoscape) support the integration of WikiPathways knowledge for additional tasks, such as integration with personal data sets.

Data from WikiPathways is also increasingly used for advanced analyses in which it is integrated or compared with other data. Currently, integration with data from different biological sources is mostly done manually. This can be a very time-consuming task, because the curator often first needs to find the available resources, learn about their specific content and qualities, and then spend a lot of time technically combining them.

Semantic Web and Linked Data technologies eliminate the barriers between database silos by relying on a set of standards and best practices for representing and describing data. The architecture of the Semantic Web relies on the architecture of the web itself for integrating and mapping uniform resource identifiers (URIs), coupled with basic inference mechanisms to enable matching concepts and properties across data sources. Semantic Web and Linked Data technologies are increasingly being applied successfully as integration engines for linking biological elements.

Exposing WikiPathways content as Linked Open Data on the Semantic Web enables rapid, semi-automated integration with the growing number of biological resources available from the Linked Open Data cloud; it also allows very fast queries of WikiPathways itself.

We have harmonised WikiPathways content according to a selected set of vocabularies (Biopax, CHEMBL, etc.) common to resources already available as Linked Open Data.
WikiPathways content is now available as Linked Open Data for dynamic querying through a SPARQL endpoint: http://semantics.bigcat.unimaas.nl:8000/sparql
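
As an illustration, the following Python sketch queries the endpoint with SPARQLWrapper; the query is deliberately generic (resources with an rdfs:label) because the exact WikiPathways vocabulary is not detailed above, and the endpoint URL is the one given in the text and may have moved since.

# Illustrative sketch of querying the WikiPathways SPARQL endpoint from Python.
# The query is deliberately generic because the exact WikiPathways vocabulary
# is not detailed above; the endpoint URL is the one given in the text and may
# have moved since.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("http://semantics.bigcat.unimaas.nl:8000/sparql")
endpoint.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?resource ?label
    WHERE { ?resource rdfs:label ?label . }
    LIMIT 10
""")
endpoint.setReturnFormat(JSON)

for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["resource"]["value"], row["label"]["value"])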

    Piazza: Data Management Infrastructure for Semantic Web Applications

    The Semantic Web envisions a World Wide Web in which data is described with rich semantics and applications can pose complex queries. To this point, researchers have defined new languages for specifying meanings for concepts and developed techniques for reasoning about them, using RDF as the data model. To flourish, the Semantic Web needs to be able to accommodate the huge amounts of existing data and the applications operating on them. To achieve this, we are faced with two problems. First, most of the world's data is available not in RDF but in XML; XML and the applications consuming it rely not only on the domain structure of the data, but also on its document structure. Hence, to provide interoperability between such sources, we must map between both their domain structures and their document structures. Second, data management practitioners often prefer to exchange data through local point-to-point data translations, rather than mapping to common mediated schemas or ontologies. This paper describes the Piazza system, which addresses these challenges. Piazza offers a language for mediating between data sources on the Semantic Web, which maps both the domain structure and the document structure. Piazza also enables interoperation of XML data with RDF data that is accompanied by rich OWL ontologies. Mappings in Piazza are provided at a local scale between small sets of nodes, and our query answering algorithm is able to chain sets of mappings together to obtain relevant data from across the Piazza network. We also describe an implemented scenario in Piazza and the lessons we learned from it.
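
The sketch below illustrates the idea of chaining local point-to-point mappings to move data between peers; mappings are reduced to simple attribute renamings composed along a breadth-first path, which is far simpler than Piazza's actual XML/RDF mappings and query answering algorithm.

# Illustrative sketch of chaining point-to-point mappings across a network of
# peers. Mappings are reduced to attribute renamings and composed along a path
# found by breadth-first search; Piazza's real mappings relate XML/RDF document
# and domain structures and are far more expressive.
from collections import deque

# Hypothetical peers and pairwise mappings (source attribute -> target attribute).
mappings = {
    ("peerA", "peerB"): {"author": "creator"},
    ("peerB", "peerC"): {"creator": "dc_creator"},
}

def find_chain(source, target):
    """BFS over the mapping graph to find a path of composable mappings."""
    queue, seen = deque([(source, [])]), {source}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        for (a, b), m in mappings.items():
            if a == node and b not in seen:
                seen.add(b)
                queue.append((b, path + [m]))
    return None

def translate(record, chain):
    """Apply each mapping in the chain to rename a record's attributes."""
    for m in chain:
        record = {m.get(k, k): v for k, v in record.items()}
    return record

chain = find_chain("peerA", "peerC")
print(translate({"author": "Doe"}, chain))  # {'dc_creator': 'Doe'}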