15 research outputs found

    Clustered TDB: A Clustered Triple Store for Jena

    This paper describes the design of Clustered TDB, a clustered triple store designed to store and query very large quantities of Resource Description Framework (RDF) data. It presents an evaluation of an initial prototype, showing that Clustered TDB offers excellent scaling characteristics with respect to load times and query throughput. Design decisions are justified in the context of a literature review on Database Management System (DBMS) and RDF store clustering, and it is shown that many techniques created during the course of DBMS research are applicable to the problem of storing RDF data.

    Ingestion Pipeline for RDF

    Keywords: ingestion pipeline, validation of RDF, inferencing, large RDF datasets. In this report we present the design and implementation of an ingestion pipeline for RDF datasets. Our definition of ingestion subsumes validation and inferencing. The proposed design performs these tasks without loading the data into memory. Several reasoners and Lint-like validators are available for RDF, but they require the data to be held in memory, which makes them infeasible for large data sets (~10 million triples). Our approach enables us to process such data sets. The pipeline validates data-specific information constraints by making certain closed-world assumptions and provides elementary inferencing support. We illustrate the system by processing large data sets (~10 million triples) from the Lehigh University Benchmark. We highlight the errors the system is capable of handling by writing our own ontology for an educational institute, together with data that contains errors.

    Information Infrastructure Laboratory

    This paper reports on some initial work on a NetAPI for accessing and updating RDF data over the web. The NetAPI includes actions for conditional extraction or update of RDF data, actions for model upload and download, and the ability to enquire about the capabilities of a hosting server. An initial experimental system is described, which partially implements these ideas within the Jena RDF toolkit.