SWAN: A Distributed Knowledge Infrastructure for Alzheimer Disease Research
SWAN – a Semantic Web Application in Neuromedicine – is a project to develop an effective, integrated scientific knowledge infrastructure for the Alzheimer disease (AD) research community, using the energy and self-organization of that community, enabled by Semantic Web technology. This infrastructure may later be deployed for research communities in other neuromedical disorders. SWAN incorporates the full biomedical research knowledge lifecycle in its ontological model, including support for personal data organization, hypothesis generation, experimentation, laboratory data organization, and digital pre-publication collaboration. Community, laboratory, and personal digital resources may all be organized and interconnected using SWAN's common semantic framework.
Clustered TDB: A Clustered Triple Store for Jena
This paper describes the design of Clustered TDB, a clustered triple store designed to store and query very large quantities of Resource Description Framework (RDF) data. It presents an evaluation of an initial prototype, showing that Clustered TDB offers excellent scaling characteristics with respect to load times and query throughput. Design decisions are justified in the context of a literature review on Database Management System (DBMS) and RDF store clustering, and it is shown that many techniques created during the course of DBMS research are applicable to the problem of storing RDF data.
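The abstract does not detail how triples are distributed across the cluster; as a hedged illustration of one standard DBMS-derived technique it alludes to, the sketch below hash-partitions triples by subject so that all triples about a resource land on the same node. The node count and helper names are assumptions, not the paper's design.

```python
import hashlib

NUM_NODES = 4  # illustrative cluster size, not from the paper

def node_for(subject: str) -> int:
    """Pick a node by a stable hash of the subject URI, so the same
    subject always routes to the same node."""
    digest = hashlib.md5(subject.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_NODES

def partition(triples):
    """Group (s, p, o) triples into per-node buckets by subject hash."""
    buckets = {n: [] for n in range(NUM_NODES)}
    for s, p, o in triples:
        buckets[node_for(s)].append((s, p, o))
    return buckets
```

With this placement, subject-centric lookups touch a single node, which is one way a clustered store can keep query throughput high as data volume grows.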
Ingestion Pipeline for RDF
Keywords: ingestion pipeline, validation of RDF, inferencing, large RDF datasets.
In this report we present the design and implementation of an ingestion pipeline for RDF datasets. Our definition of ingestion subsumes validation and inferencing. The proposed design performs these tasks without loading the data in memory. Several reasoners and Lint-like validators are available for RDF, but they require the data to be present in memory, which makes them infeasible for large data sets (~10 million triples). Our approach enables us to process such large data sets. The pipeline validates data-specific information constraints by making certain closed-world assumptions and provides elementary inferencing support. We illustrate the system by processing large data sets (~10 million triples) from the Lehigh University Benchmark. We highlight the errors the system is capable of handling by writing our own ontology for an educational institute, together with data containing deliberate errors.
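The streaming idea in the abstract, validating and inferencing one triple at a time so the full data set never sits in memory, can be sketched as follows. The simplified N-Triples regex, the rule set, and the function names are illustrative assumptions, not the report's actual implementation.

```python
import re

# Minimal N-Triples-style line: subject, predicate, object, terminating dot.
# Real N-Triples grammar is richer; this is a simplified illustration.
TRIPLE_RE = re.compile(r'^(\S+)\s+(\S+)\s+(.+?)\s*\.\s*$')

RDF_TYPE = '<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>'

def stream_triples(lines):
    """Yield (s, p, o) tuples line by line, raising on malformed input,
    so validation needs only one line in memory at a time."""
    for lineno, line in enumerate(lines, 1):
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blanks and comments
        m = TRIPLE_RE.match(line)
        if not m:
            raise ValueError(f"malformed triple on line {lineno}: {line!r}")
        yield m.groups()

def infer_types(triples, subclass_of):
    """Elementary rdfs:subClassOf inferencing in one streaming pass:
    whenever s rdf:type C and C is a subclass of D, also emit s rdf:type D.
    Only the (small) schema map is held in memory, not the data."""
    for s, p, o in triples:
        yield s, p, o
        if p == RDF_TYPE and o in subclass_of:
            yield s, RDF_TYPE, subclass_of[o]
```

Because both stages are generators, they compose into a pipeline whose memory footprint is independent of the number of triples, which is the property the report claims for data sets of ~10 million triples.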
Information Infrastructure Laboratory
This paper reports on some initial work on a NetAPI for accessing and updating RDF data over the web. The NetAPI includes actions for conditional extraction or update of RDF data, actions for model upload and download, and the ability to enquire about the capabilities of a hosting server. An initial experimental system is described which partially implements these ideas within the Jena RDF toolkit.
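The paper describes the NetAPI's actions only at a high level; as a hedged illustration, conditional extraction can be thought of as matching a triple pattern against a store, the kind of operation such an API might expose over HTTP. The pattern convention (`None` as a wildcard) and the function name are assumptions for this sketch.

```python
def extract(triples, pattern):
    """Return the triples matching an (s, p, o) pattern, where None in
    any position acts as a wildcard matching anything."""
    ps, pp, po = pattern
    return [t for t in triples
            if (ps is None or t[0] == ps)
            and (pp is None or t[1] == pp)
            and (po is None or t[2] == po)]
```

A server could evaluate such a pattern from request parameters and return the matching subgraph, leaving the full model on the host; model download is then just the degenerate all-wildcard pattern.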