
    Co-evolution of RDF Datasets

    Linked Data initiatives have fostered the publication of a large number of RDF datasets in the Linked Open Data (LOD) cloud. They have also driven the development of query processing infrastructures that access these data in a federated fashion. However, several experimental studies have shown that the availability of LOD datasets cannot always be ensured, so RDF data replication is required to build reliable federated query frameworks. Replication enhances data availability, but it requires synchronization and conflict resolution when both replicas and source datasets are allowed to change over time. In other words, co-evolution management must be provided to ensure consistency. In this paper, we tackle the problem of RDF data co-evolution and devise an approach for conflict resolution during the co-evolution of RDF datasets. Our approach is property-oriented and exploits semantics about RDF properties during co-evolution management. We evaluate the quality of our approach empirically in different scenarios on the DBpedia-live dataset. Experimental results suggest that the proposed techniques have a positive impact on the quality of data in source datasets and replicas. Comment: 18 pages, 4 figures, Accepted in ICWE, 201
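
    The following is a minimal sketch of what property-oriented conflict resolution could look like. It assumes that each dataset is a set of (subject, property, object) string triples. It also assumes that conflicts are resolved per (subject, property) pair by a policy chosen for that property. The policies `prefer_source` and `union`, the `resolve` function and the `ex:` terms are hypothetical illustrations. They are not the paper's actual policies or API.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]
# Hypothetical policy type: given the source's and the replica's objects for
# one (subject, property) pair, return the reconciled set of objects.
Policy = Callable[[Set[str], Set[str]], Set[str]]


def prefer_source(src: Set[str], rep: Set[str]) -> Set[str]:
    """Suited to single-valued properties: the source wins if it has a value."""
    return set(src) if src else set(rep)


def union(src: Set[str], rep: Set[str]) -> Set[str]:
    """Suited to multi-valued properties: keep values from both sides."""
    return set(src) | set(rep)


def group(triples: Iterable[Triple]) -> Dict[Tuple[str, str], Set[str]]:
    """Index triples by (subject, property)."""
    index: Dict[Tuple[str, str], Set[str]] = {}
    for s, p, o in triples:
        index.setdefault((s, p), set()).add(o)
    return index


def resolve(source: Iterable[Triple], replica: Iterable[Triple],
            policies: Dict[str, Policy], default: Policy = union) -> Set[Triple]:
    """Merge two versions of a dataset, choosing a policy per property."""
    src, rep = group(source), group(replica)
    merged: Set[Triple] = set()
    for s, p in src.keys() | rep.keys():
        policy = policies.get(p, default)
        objects = policy(src.get((s, p), set()), rep.get((s, p), set()))
        merged |= {(s, p, o) for o in objects}
    return merged


source = {("ex:s", "ex:pop", "100"), ("ex:s", "rdfs:label", "A")}
replica = {("ex:s", "ex:pop", "200"), ("ex:s", "rdfs:label", "B")}
print(sorted(resolve(source, replica, {"ex:pop": prefer_source})))
```

    In this example, `ex:pop` is treated as single-valued, so the source value wins. `rdfs:label` has no specific policy and falls back to `union`, which keeps both labels.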

    Collaborative Open Data versioning: a pragmatic approach using Linked Data

    Most Open Government Data initiatives are centralised and unidirectional (i.e., they release data dumps in CSV or PDF format). Hence, for non-trivial applications, reusers make copies of the government datasets and curate their local copies. This situation is not optimal, as it leads to duplicated effort and reduces the possibility of sharing improvements. To make published open data more useful, several authors have recommended using standard formats and data versioning. Here we focus on publishing versioned open Linked Data (i.e., in RDF format). Such data allow one party to annotate data released independently by another party, which reduces the need to duplicate entire datasets. We first describe a pipeline that opens up data from legacy databases in RDF format. We then argue that RDF is suitable for implementing a scalable feedback channel, and we investigate what steps are needed to implement a distributed RDF versioning system in production.
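
    One way to picture the versioning idea is to store each revision as a changeset of added and removed triples on top of a published base release. A reuser's correction can then be shared as a small delta instead of a full copy of the dataset. This is a minimal sketch under that assumption. `Changeset`, `VersionedGraph` and the example triples are hypothetical and do not describe the pipeline or the system investigated in the paper.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass(frozen=True)
class Changeset:
    """A hypothetical delta: triples to add and to remove, plus who made it."""
    added: FrozenSet[Triple] = frozenset()
    removed: FrozenSet[Triple] = frozenset()
    author: str = ""


@dataclass
class VersionedGraph:
    """A base release plus an ordered history of changesets."""
    base: FrozenSet[Triple]
    history: List[Changeset] = field(default_factory=list)

    def commit(self, changeset: Changeset) -> int:
        """Append a changeset and return the new version number."""
        self.history.append(changeset)
        return len(self.history)

    def checkout(self, version: int) -> Set[Triple]:
        """Rebuild a version by replaying the first `version` changesets."""
        graph = set(self.base)
        for cs in self.history[:version]:
            graph -= cs.removed
            graph |= cs.added
        return graph


release = frozenset({("ex:school1", "ex:name", "Old Name")})
repo = VersionedGraph(base=release)
fix = Changeset(added=frozenset({("ex:school1", "ex:name", "New Name")}),
                removed=frozenset({("ex:school1", "ex:name", "Old Name")}),
                author="reuser")
v1 = repo.commit(fix)
print(repo.checkout(0), repo.checkout(v1))
```

    Any version can be rebuilt from the base release and its deltas. A reuser therefore only needs to publish the changeset rather than a modified copy of the whole dataset.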

    Interest-based RDF Update Propagation

    Many LOD datasets, such as DBpedia and LinkedGeoData, are voluminous and serve large numbers of requests from diverse applications. Many data products and services rely on full or partial local replicas of LOD datasets to ensure faster querying and processing. While such replicas enhance the flexibility of information sharing and integration infrastructures, they also introduce data duplication, with all its undesirable consequences. Because the original, authoritative datasets evolve, keeping replicas consistent and up to date requires frequent, costly replacements. In this paper, we introduce an approach for interest-based RDF update propagation, which propagates only the interesting parts of updates from the source to the target dataset. Effectively, this enables remote applications to 'subscribe' to relevant datasets and consistently reflect the necessary changes locally. They no longer need to frequently replace the entire dataset (or a relevant subset). Our approach is based on a formal definition of graph-pattern-based interest expressions, which are used to filter the interesting parts of updates from the source. We implement the approach in the iRap framework and perform a comprehensive evaluation based on DBpedia Live updates to confirm the validity and value of our approach. Comment: 16 pages, Keywords: Change Propagation, Dataset Dynamics, Linked Data, Replication
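
    The following is a simplified sketch of interest-based filtering. It assumes that an interest expression is a list of triple patterns in which terms starting with `?` are variables, and that a changed triple is interesting if it matches at least one pattern. The interest expressions described above are graph-pattern-based. This sketch deliberately ignores joins across patterns. The functions `match`, `interesting` and `propagate` are hypothetical and are not the iRap API.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[str, str, str]  # terms starting with "?" are variables


def match(pattern: Pattern, triple: Triple) -> Optional[Dict[str, str]]:
    """Return variable bindings if `triple` matches `pattern`, else None."""
    binding: Dict[str, str] = {}
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):
            # A repeated variable must bind to the same term each time.
            if binding.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return binding


def interesting(changes: Iterable[Triple], interest: List[Pattern]) -> Set[Triple]:
    """Keep only changed triples that match some pattern of the interest."""
    return {t for t in changes if any(match(p, t) is not None for p in interest)}


def propagate(target: Set[Triple], added: Set[Triple], removed: Set[Triple],
              interest: List[Pattern]) -> Set[Triple]:
    """Apply only the interesting part of a changeset to a copy of the target."""
    result = set(target)
    result -= interesting(removed, interest)
    result |= interesting(added, interest)
    return result


interest = [("?c", "rdf:type", "dbo:City"), ("?c", "dbo:country", "?k")]
replica = {("ex:B", "dbo:country", "ex:X"), ("ex:Z", "ex:mayor", "ex:Q")}
added = {("ex:A", "rdf:type", "dbo:City"), ("ex:A", "ex:mayor", "ex:M")}
removed = {("ex:B", "dbo:country", "ex:X")}
print(sorted(propagate(replica, added, removed, interest)))
```

    In this example, the added `ex:mayor` triple is not propagated because no pattern of the interest covers it. The removal of the `dbo:country` triple is propagated.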

    Methodology for Conflict Detection and Resolution in Semantic Revision Control Systems

    Revision control mechanisms are a crucial part of information systems for keeping track of changes. Revision control is also one of the key requirements for the industrial application of technologies like Linked Data, which make it possible to integrate data from different systems and domains in a semantic information space. A corresponding semantic revision control system must offer the same functionality as established systems (e.g., Git or Subversion). There is also a need for branching, to enable parallel work on, or concurrent access to, the same data. This directly introduces the requirement of supporting merges. This paper presents an approach that makes it possible to merge branches and to detect inconsistencies before the merged revision is created. We use a structural analysis of triple differences, treating the triple as the smallest unit of comparison between branches. The detected differences can be aggregated into high-level changes, which is an essential step towards semantic merging. We implemented our approach as a prototypical extension of the revision control system R43ples as a proof of concept.
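
    To make the triple-level comparison concrete, here is a minimal three-way diff sketch in which each branch is compared with the common ancestor revision. As a hypothetical conflict rule, a conflict is reported when both branches changed the same (subject, predicate) pair but ended with different objects. The names `diff`, `find_conflicts` and `merge` are illustrative and are not R43ples functions. The sketch also stops at the triple level and does not aggregate differences into high-level changes.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]
Graph = Set[Triple]


def diff(base: Graph, branch: Graph) -> Tuple[Graph, Graph]:
    """Triple-level difference of a branch against the common ancestor."""
    return branch - base, base - branch  # (added, deleted)


def touched(added: Graph, deleted: Graph) -> Set[Tuple[str, str]]:
    """(subject, predicate) pairs affected by a diff."""
    return {(s, p) for s, p, _ in added | deleted}


def objects(graph: Graph, s: str, p: str) -> Set[str]:
    return {o for s2, p2, o in graph if s2 == s and p2 == p}


def find_conflicts(base: Graph, a: Graph, b: Graph) -> Set[Tuple[str, str]]:
    """Pairs changed in both branches that end up with different objects."""
    both = touched(*diff(base, a)) & touched(*diff(base, b))
    return {(s, p) for s, p in both if objects(a, s, p) != objects(b, s, p)}


def merge(base: Graph, a: Graph, b: Graph) -> Graph:
    """Refuse to merge if conflicts exist; otherwise apply both diffs to base."""
    conflicts = find_conflicts(base, a, b)
    if conflicts:
        raise ValueError(f"conflicting changes: {sorted(conflicts)}")
    add_a, del_a = diff(base, a)
    add_b, del_b = diff(base, b)
    return (base - del_a - del_b) | add_a | add_b


base = {("ex:s", "ex:p", "1"), ("ex:s", "ex:q", "x")}
branch_a = {("ex:s", "ex:p", "2"), ("ex:s", "ex:q", "x")}
branch_b = {("ex:s", "ex:p", "3"), ("ex:s", "ex:q", "x")}
branch_c = {("ex:s", "ex:p", "1"), ("ex:s", "ex:q", "y")}
print(find_conflicts(base, branch_a, branch_b))
print(sorted(merge(base, branch_a, branch_c)))
```

    The conflict check runs before any merged graph is built. This mirrors the idea of detecting inconsistencies before the merged revision is created.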

    Software Architecture by Component Selection


    Col-Graph: Towards Writable and Scalable Linked Open Data

    Linked Open Data faces severe issues of scalability, availability and data quality. These issues are observed by data consumers performing federated queries: SPARQL endpoints do not respond, and results can be wrong or out of date. If a data consumer finds an error, how can she fix it? This raises the issue of the writability of Linked Data. In this paper, we devise an extension of the federation of Linked Data to data consumers. A data consumer can make partial copies of different datasets and make them available through a SPARQL endpoint. A data consumer can also update her local copy and share updates with data providers and other consumers. Update sharing improves general data quality, and replicated data creates opportunities for federated query engines to improve availability. However, when updates occur in an uncontrolled way, consistency issues arise. In this paper, we define fragments as SPARQL CONSTRUCT queries and propose a correctness criterion for maintaining these fragments incrementally without re-evaluating the query. We define a coordination-free protocol based on counting triple derivations and on provenance. We analyze the theoretical complexity of the protocol in time, space and traffic. Experimental results suggest that our approach scales.
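
    The role of derivation counting can be illustrated with a small sketch. Each triple of a fragment keeps the number of derivations that currently support it. Insertions and deletions arrive as +1 and -1 deltas, and a triple is visible while its count is positive. Integer addition commutes, so two replicas that receive the same deltas in different orders end in the same state. That is the intuition behind a coordination-free protocol. `CountingFragment` is a hypothetical name. The sketch also leaves out the provenance information and the CONSTRUCT-query evaluation that the abstract mentions.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]
Delta = Tuple[Triple, int]  # +1 for a new derivation, -1 for a retracted one


class CountingFragment:
    """A replica of a fragment that tracks how many derivations support each triple."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def apply(self, deltas: Iterable[Delta]) -> None:
        for triple, delta in deltas:
            # Counts may dip below zero temporarily if a deletion arrives
            # before the matching insertion; the final sum is order-independent.
            self.counts[triple] += delta

    def triples(self) -> Set[Triple]:
        """Triples that currently have at least one supporting derivation."""
        return {t for t, c in self.counts.items() if c > 0}


t = ("ex:s", "ex:p", "ex:o")
# The same triple derived from two sources, then retracted from one of them.
deltas = [(t, +1), (t, +1), (t, -1)]
r1, r2 = CountingFragment(), CountingFragment()
r1.apply(deltas)
r2.apply(reversed(deltas))
print(r1.triples() == r2.triples() == {t})
```

    Because the triple still has one remaining derivation, retracting it from a single source does not remove it from either replica.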