2,378 research outputs found

    A Peer-to-Peer Architecture for e-Science

    Towards interoperability in heterogeneous database systems

    Distributed heterogeneous databases consist of systems which differ physically and logically, containing different data models and data manipulation languages. Although these databases are independently created and administered, they must cooperate and interoperate. Users need to access and manipulate data from several databases, and applications may require data from a wide variety of independent databases. Therefore, a new system architecture is required to manipulate and manage distinct and multiple databases, in a transparent way, while preserving their autonomy. This report contains an extensive survey on heterogeneous databases, analysing and comparing the different aspects, concepts and approaches related to the topic. It introduces an architecture to support interoperability among heterogeneous database systems. The architecture avoids the use of a centralised structure to assist in the different phases of the interoperability process. It aims to support scalability, and to assure privacy and confidentiality of the data. The proposed architecture allows the databases to decide when to participate in the system, what type of data to share and with which other databases, thereby preserving their autonomy. The report also describes an approach to information discovery in the proposed architecture that uses neither centralised structures, such as repositories and dictionaries, nor broadcasting to all databases. It attempts to reduce the number of databases searched and to preserve the privacy of the shared data. The main idea is to visit a database that either contains the requested data or knows about another database that possibly contains this data.
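
    A minimal sketch of the referral-style discovery idea described above: each database answers a request from its own data if it can, and otherwise forwards the request to a neighbour it believes holds the data, so neither a central dictionary nor broadcasting is needed. All names (Database, hints, discover) are illustrative, not taken from the report.

    class Database:
        def __init__(self, name, local_data, hints=None):
            self.name = name
            self.local_data = local_data  # items this database shares
            self.hints = hints or {}      # item -> database believed to hold it

        def lookup(self, item):
            return item in self.local_data

    def discover(start, item, max_hops=10):
        """Follow referrals from database to database instead of broadcasting."""
        current, visited = start, []
        for _ in range(max_hops):
            visited.append(current.name)
            if current.lookup(item):
                return current.name, visited
            nxt = current.hints.get(item)
            if nxt is None or nxt.name in visited:
                break                     # no referral, or a cycle: give up
            current = nxt
        return None, visited

    # Example: db_a refers requests for "gene_expr" to db_b, which holds it.
    db_b = Database("db_b", {"gene_expr"})
    db_a = Database("db_a", set(), hints={"gene_expr": db_b})
    print(discover(db_a, "gene_expr"))    # ('db_b', ['db_a', 'db_b'])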

    Data replication and update propagation in XML P2P data management systems

    XML P2P data management systems are P2P systems that use XML as the underlying data format shared between peers in the network. These systems aim to bring the benefits of XML and P2P systems to the distributed data management field. However, P2P systems are known for their lack of central control and high degree of autonomy. Peers may leave the network at will at any time, increasing the risk of data loss. Despite this, most research in XML P2P systems focuses on novel and efficient XML indexing and retrieval techniques; mechanisms for ensuring data availability in XML P2P systems have received comparatively little attention. This project attempts to address this issue. We design an XML P2P data management framework to improve data availability. This framework includes mechanisms for widespread data replication, replica location and update propagation. It allows XML documents to be broken down into fragments. By doing so, we aim to reduce the cost of replicating data by distributing smaller XML fragments throughout the network rather than entire documents. To tackle the data replication problem, we propose a suite of selection and placement algorithms that may be interchanged to form a particular replication strategy. To support the placement of replicas anywhere in the network, we use a Fragment Location Catalogue, a global index that maintains the locations of replicas. We also propose a lazy update propagation algorithm to propagate updates to replicas. Experiments show that the data replication algorithms improve data availability in our experimental network environment. We also find that breaking XML documents into smaller pieces and replicating those instead of whole XML documents considerably reduces the replication cost, but at the price of some loss in data availability. For the update propagation tests, we find that the probability that queries return up-to-date results increases, but improvements to the algorithm are necessary to handle environments with high update rates.
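
    A rough sketch of how interchangeable selection and placement algorithms could compose into a replication strategy, together with a Fragment Location Catalogue mapping fragment ids to the peers holding replicas. The class and function names, and the popularity-based selection and random placement policies, are assumptions for illustration, not the thesis's algorithms.

    import random
    from collections import defaultdict

    class FragmentLocationCatalogue:
        """Global index: fragment id -> set of peers holding a replica."""
        def __init__(self):
            self.locations = defaultdict(set)

        def register(self, fragment_id, peer):
            self.locations[fragment_id].add(peer)

        def lookup(self, fragment_id):
            return self.locations[fragment_id]

    def select_popular(fragments, access_counts, k):
        """One possible selection algorithm: replicate the k most-accessed fragments."""
        return sorted(fragments, key=lambda f: -access_counts.get(f, 0))[:k]

    def place_random(peers, n):
        """One possible placement algorithm: pick n peers at random."""
        return random.sample(peers, n)

    def replicate(fragments, access_counts, peers, catalogue, k=2, copies=3):
        """Combine a selection and a placement algorithm into one strategy."""
        for frag in select_popular(fragments, access_counts, k):
            for peer in place_random(peers, copies):
                catalogue.register(frag, peer)

    catalogue = FragmentLocationCatalogue()
    replicate(["f1", "f2", "f3"], {"f1": 9, "f2": 4},
              ["p1", "p2", "p3", "p4"], catalogue)
    print(catalogue.lookup("f1"))          # the peers chosen to hold fragment f1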

    A network approach for managing and processing big cancer data in clouds

    Translational cancer research requires integrative analysis of multiple levels of big cancer data to identify and treat cancer. This data is decentralised, growing and continually being updated, and the content held or archived on different information sources partially overlaps, creating redundancies as well as contradictions and inconsistencies. To address these issues, we develop a data network model and technology for constructing and managing big cancer data. To support our data network approach to data processing and analysis, we employ a semantic content network approach and adopt the CELAR cloud platform. The prototype implementation shows that the CELAR cloud can satisfy the on-demand needs of various data resources for managing and processing big cancer data.

    Collaborative Open Data versioning: a pragmatic approach using Linked Data

    Most Open Government Data initiatives are centralised and unidirectional (i.e., they release data dumps in CSV or PDF format). Hence, for non-trivial applications, reusers make copies of the government datasets to curate their local data copy. This situation is not optimal, as it leads to duplication of effort and reduces the possibility of sharing improvements. To improve the usefulness of publishing open data, several authors have recommended using standard formats and data versioning. Here we focus on publishing versioned open linked data (i.e., in RDF format) because it allows one party to annotate data released independently by another party, thus reducing the need to duplicate entire datasets. After describing a pipeline to open up legacy database data in RDF format, we argue that RDF is suitable to implement a scalable feedback channel, and we investigate what steps are needed to implement a distributed RDF versioning system in production.
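
    A minimal sketch of one way such RDF versioning could work, storing each revision as a delta of added and removed triples so a reuser can annotate or amend a dataset without copying it wholesale. The delta scheme and all names below are assumptions for illustration, not the system described in the paper.

    class VersionedGraph:
        def __init__(self):
            self.deltas = []                   # list of (added, removed) triple sets

        def commit(self, added=frozenset(), removed=frozenset()):
            self.deltas.append((set(added), set(removed)))
            return len(self.deltas) - 1        # version number

        def triples_at(self, version):
            """Materialise the graph as of a given version by replaying deltas."""
            graph = set()
            for added, removed in self.deltas[:version + 1]:
                graph -= removed
                graph |= added
            return graph

    g = VersionedGraph()
    v0 = g.commit(added={("ex:roads", "ex:status", "draft")})
    v1 = g.commit(added={("ex:roads", "ex:status", "published")},
                  removed={("ex:roads", "ex:status", "draft")})
    print(g.triples_at(v0))                    # the dataset as first released
    print(g.triples_at(v1))                    # the dataset after the update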

    Peer-to-peer semantic integration of linked data

    We propose a framework for peer-based integration of linked data sets, where the semantic relationships between data at different peers are expressed through mappings. We provide the theoretical foundations for such a setting and devise an algorithm for processing graph pattern queries, discussing its complexity and scalability.
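
    An illustrative sketch of the mapping idea: a graph pattern expressed in one peer's vocabulary is rewritten, predicate by predicate, into a pattern a neighbouring peer can evaluate over its own data. The one-to-one predicate mappings and the naive evaluator are simplifications of the paper's setting, chosen to keep the example short.

    def rewrite_pattern(pattern, mapping):
        """Rewrite each (s, p, o) triple pattern using a predicate mapping."""
        return [(s, mapping.get(p, p), o) for (s, p, o) in pattern]

    def evaluate(pattern, triples):
        """Naive evaluation of triple patterns (variables start with '?')."""
        results = []
        for (s, p, o) in triples:
            for (qs, qp, qo) in pattern:
                if qp == p and (qs.startswith("?") or qs == s) \
                           and (qo.startswith("?") or qo == o):
                    results.append((s, o))
        return results

    mapping = {"peerA:author": "peerB:creator"}    # peer A's term -> peer B's term
    query = [("?doc", "peerA:author", "?who")]
    peer_b_data = [("doc1", "peerB:creator", "alice")]
    print(evaluate(rewrite_pattern(query, mapping), peer_b_data))  # [('doc1', 'alice')]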

    Schema matching in a peer-to-peer database system

    Peer-to-peer (P2P) systems are applications that allow a network of peers to share resources in a scalable and efficient manner. My research is concerned with the use of P2P systems for sharing databases. To allow data mediation between peers' databases, schema mappings need to exist: mappings between semantically equivalent attributes in different peers' schemas. Mappings can either be defined manually or found semi-automatically using a technique called schema matching. However, schema matching has not been used much in dynamic environments, such as P2P networks. Therefore, this thesis investigates how to enable effective semi-automated schema matching within a P2P network.
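
    A toy sketch of semi-automatic schema matching under the simplest possible signal, attribute-name similarity: pairs of attributes from two peers' schemas are scored, and pairs above a threshold are proposed as candidate mappings for a human to confirm. Real matchers also combine linguistic, structural and instance-level evidence; difflib is used here only for brevity.

    from difflib import SequenceMatcher

    def match_schemas(attrs_a, attrs_b, threshold=0.6):
        """Return candidate attribute mappings (a, b, score) above the threshold."""
        candidates = []
        for a in attrs_a:
            for b in attrs_b:
                score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
                if score >= threshold:
                    candidates.append((a, b, round(score, 2)))
        return sorted(candidates, key=lambda c: -c[2])

    # Candidate mappings between two toy schemas, best matches first.
    print(match_schemas(["authorName", "pubYear"], ["author", "publication_year"]))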

    Peer-based query rewriting in SPARQL for semantic integration of linked data

    In this proposal, we address the problem of ontology-based SPARQL query answering over distributed Linked Data sources, where the ontology is given by conjunctive mappings between the source schemas in a peer-to-peer fashion and by equality constraints between constants. In our setting, the data is not materialised in a single datastore: it is accessed in a distributed environment through SPARQL endpoints. We aim to achieve query answering by generating the perfect rewriting of the original query and then processing the rewritten query over the distributed SPARQL endpoints. We identify a subset of ontology constraints that enjoy the first-order rewritability property and perform a preliminary empirical evaluation taking into account such restricted constraints only. For future work, we aim to tackle the query answering problem in the general case.
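
    A highly simplified sketch of the rewrite-then-distribute idea for a single query atom: the predicate in the original query is expanded, via the mappings, into one SPARQL query per source endpoint, whose union would then be evaluated remotely. The endpoint URLs and mappings are invented for illustration, no network calls are made, and a real perfect rewriting must handle conjunctive mappings and equality constraints, which this sketch does not.

    MAPPINGS = {
        # target predicate -> [(endpoint, source predicate), ...]
        "ex:locatedIn": [
            ("http://peer1.example/sparql", "geo:containedBy"),
            ("http://peer2.example/sparql", "dbo:location"),
        ],
    }

    def rewrite_atom(predicate):
        """Generate one rewritten SPARQL query per endpoint covering the predicate."""
        for endpoint, source_pred in MAPPINGS.get(predicate, []):
            query = f"SELECT ?s ?o WHERE {{ ?s {source_pred} ?o }}"
            yield endpoint, query

    for endpoint, query in rewrite_atom("ex:locatedIn"):
        print(endpoint, "->", query)       # queries to be sent to each endpoint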