
    Distribution of the Object Oriented Databases. A Viewpoint of the MVDB Model's Methodology and Architecture

    In databases, much work has been done on extending models with advanced tools such as view technology, schema evolution support, multiple classification, role modeling, and viewpoints. Over the past years, most research dealing with multiple object representation and evolution has proposed to enrich the monolithic vision of the classical object approach, in which an object belongs to a single class hierarchy. In particular, integrating the viewpoint mechanism into the conventional object-oriented data model adds flexibility and improves the modeling power of objects. The viewpoint paradigm refers to the multiple descriptions, the distribution, and the evolution of objects. It can also make a valuable contribution to the distributed design of complex databases. The goal of this paper is to define an object data model integrating viewpoints in databases and to present a federated database architecture integrating multiple viewpoint sources following a local-as-extended-view data integration approach.
    Keywords: object-oriented data model, OQL language, LAEV data integration approach, MVDB model, federated databases, Local-As-View strategy.
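The core idea of the viewpoint mechanism described above — one object carrying several partial descriptions instead of belonging to a single class hierarchy — can be sketched minimally. This is not the MVDB model itself; the class and attribute names below are invented for illustration.

```python
# Minimal sketch of the viewpoint idea: one object, several named
# partial descriptions. Names here are invented, not from the paper.

class MultiViewObject:
    """An object whose state is partitioned across named viewpoints."""

    def __init__(self, oid):
        self.oid = oid
        self.viewpoints = {}

    def add_viewpoint(self, name, description):
        # Each viewpoint holds a partial, role-specific description.
        self.viewpoints[name] = description

    def describe(self, viewpoint):
        return self.viewpoints[viewpoint]

# The same real-world entity described from two viewpoints.
p = MultiViewObject("person-42")
p.add_viewpoint("employee", {"salary": 50000, "dept": "R&D"})
p.add_viewpoint("patient", {"blood_type": "O+"})
```

In a federated setting, each viewpoint could live in a different source database while sharing the object identifier, which is what the LAEV integration architecture coordinates.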

    UniProt in RDF: Tackling Data Integration and Distributed Annotation with the Semantic Web

    The UniProt knowledgebase (UniProtKB) is a comprehensive repository of protein sequence and annotation data. We collect information from the scientific literature and other databases and provide links to over one hundred biological resources. Such links between different databases are an important basis for data integration, but the lack of a common standard to represent and link information makes data integration an expensive business. At UniProt we have started to tackle this problem by using the Resource Description Framework ("http://www.w3.org/RDF/":http://www.w3.org/RDF/) to represent our data. RDF is a core technology for the World Wide Web Consortium's Semantic Web activities ("http://www.w3.org/2001/sw/":http://www.w3.org/2001/sw/) and is therefore well suited to work in a distributed and decentralized environment. The RDF data model represents arbitrary information as a set of simple statements of the form subject-predicate-object. To enable the linking of data on the Web, RDF requires that each resource have a (globally) unique identifier. These identifiers allow everybody to make statements about a given resource and, together with the simple structure of the RDF data model, make it easy to combine the statements made by different people (or databases) to allow queries across different datasets. RDF is thus an industry standard that can make a major contribution to solving two important problems of bioinformatics: distributed annotation and data integration.
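The subject-predicate-object model and the role of globally unique identifiers can be shown without any RDF library: a dataset is just a set of triples, and merging two independently produced datasets is a set union. The URIs and predicate names below are illustrative placeholders, not official UniProt RDF terms.

```python
# Minimal sketch of the RDF data model using plain Python tuples.
# URIs and predicates below are invented for illustration.

# Each statement is a (subject, predicate, object) triple; subjects
# are globally unique URIs, so statements from independent sources
# can be merged without ambiguity.
uniprot_data = {
    ("http://purl.example.org/protein/P12345", "hasName", "Example protein"),
    ("http://purl.example.org/protein/P12345", "seeAlso", "http://example.org/pdb/1ABC"),
}

third_party_annotations = {
    ("http://purl.example.org/protein/P12345", "annotatedWith", "GO:0005737"),
}

# Merging two datasets is just a set union -- the shared identifier
# links statements made by different databases.
merged = uniprot_data | third_party_annotations

# Query across both sources: everything known about one resource.
subject = "http://purl.example.org/protein/P12345"
facts = sorted((p, o) for (s, p, o) in merged if s == subject)
print(facts)
```

Real RDF toolkits (e.g. rdflib) provide the same union-and-query behavior over parsed RDF/XML or Turtle, plus SPARQL querying; the point here is only that globally unique identifiers make merging trivial.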

    Increasing the Efficiency of Rule-Based Expert Systems Applied on Heterogeneous Data Sources

    Nowadays, heterogeneous data sources provided by different research and innovation projects and initiatives are proliferating, and this presents huge opportunities. These developments increase the number of data sources that could be involved in decision-making for a specific purpose, but this huge heterogeneity makes the task difficult. Traditionally, expert systems try to integrate all information into a main database, but sometimes this information is not easily available, or its integration with other databases is very problematic. In this case, it is essential to establish procedures that perform a distributed integration of their metadata. This process provides a “mapping” of available information, but only at the logical level; at the physical level, the data remains distributed across several resources. In this sense, this chapter proposes a distributed rule engine extension (DREE) based on edge computing that integrates metadata provided by different heterogeneous data sources and then applies a mathematical decomposition over the antecedents of rules. The proposed rule engine increases the efficiency and capability of rule-based expert systems, making it possible to apply rules over distributed and heterogeneous data sources and increasing the size of the data sets that can be involved in the decision-making process.

    Biodiversity informatics: the challenge of linking data and the role of shared identifiers

    A major challenge facing biodiversity informatics is integrating data stored in widely distributed databases. Initial efforts have relied on taxonomic names as the shared identifier linking records in different databases. However, taxonomic names have limitations as identifiers, being neither stable nor globally unique, and the pace of molecular taxonomic and phylogenetic research means that much of the information in public sequence databases is not linked to formal taxonomic names. This review explores the use of other identifiers, such as specimen codes and GenBank accession numbers, to link otherwise disconnected facts in different databases. The structure of these links can also be exploited using the PageRank algorithm to rank the results of searches on biodiversity databases. The key to rich integration is a commitment to deploy and reuse globally unique, shared identifiers (such as DOIs and LSIDs), and the implementation of services that link those identifiers.
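The suggestion to exploit link structure with PageRank can be made concrete with a small power-iteration sketch over a graph of linked records. The graph below is invented: nodes stand for records (a specimen, a GenBank sequence, a paper DOI) connected by shared identifiers.

```python
# Hedged sketch: ranking records in a biodiversity link graph with a
# simple PageRank power iteration. The graph is invented for
# illustration; edges represent identifier-based cross-links.

links = {
    "specimen:MVZ-12345": ["genbank:AY123456"],
    "genbank:AY123456": ["doi:10.1000/example", "specimen:MVZ-12345"],
    "doi:10.1000/example": ["genbank:AY123456"],
    "paper:orphan": ["doi:10.1000/example"],
}

def pagerank(links, damping=0.85, iterations=50):
    nodes = list(links)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Base rank from random jumps, plus rank flowing along links.
        new = {n: (1 - damping) / len(nodes) for n in nodes}
        for n, outs in links.items():
            share = damping * rank[n] / len(outs)
            for m in outs:
                new[m] += share
        rank = new
    return rank

ranks = pagerank(links)
# Heavily cross-linked records rank highest in search results.
best = max(ranks, key=ranks.get)
```

Here the GenBank record, which both the specimen and the paper link to, ends up ranked highest — the intuition being that well-linked records are more likely to be what a searcher wants.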

    Schema Management for Data Integration: A Short Survey

    Schema management is a basic problem in many database application domains, such as data integration systems. Users need to access and manipulate data from several databases. In this context, in order to integrate data from distributed heterogeneous database sources, data integration systems must resolve several issues that arise in managing schemas. In this paper, we present a brief survey of schema matching, which is used to solve schema integration problems. Moreover, we propose a technique for integrating and querying distributed heterogeneous XML schemas.
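Schema matching, as surveyed in the paper above, typically combines several similarity signals between schema elements. A minimal sketch of one common signal — element-name string similarity — is shown below; the schemas and the use of `difflib` are illustrative assumptions, not the paper's technique.

```python
import difflib

# Hedged sketch: a naive element-name matcher of the kind used as one
# signal in schema-matching systems. The two schemas are invented.

schema_a = ["customer_name", "cust_address", "order_date"]
schema_b = ["name", "address", "date_of_order"]

def best_match(elem, candidates):
    """Return (similarity, candidate) for the closest candidate name."""
    scored = [
        (difflib.SequenceMatcher(None, elem, c).ratio(), c)
        for c in candidates
    ]
    return max(scored)

# Propose a correspondence for every element of schema A.
mapping = {a: best_match(a, schema_b) for a in schema_a}
```

Production matchers combine name similarity with data types, instance data, and structural context, and leave low-confidence correspondences for a human to confirm — name similarity alone produces false matches on heterogeneous schemas.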

    Secret sharing vs. encryption-based techniques for privacy preserving data mining

    Privacy preserving querying and data publishing have been studied in the context of statistical databases and statistical disclosure control. Recently, large-scale data collection and integration efforts have increased privacy concerns, which motivated data mining researchers to investigate the privacy implications of data mining and how data mining can be performed without violating privacy. In this paper, we first provide an overview of privacy preserving data mining focusing on distributed data sources, and then we compare two technologies used in privacy preserving data mining. The first technology is encryption-based and is used in earlier approaches. The second is secret sharing, which has recently been considered a more efficient approach.
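The secret-sharing approach builds on primitives like additive secret sharing, sketched below for a privacy-preserving sum: each party splits its private value into random shares, and only the aggregate is ever reconstructed. The scenario (hospitals pooling counts) and modulus are illustrative assumptions.

```python
import random

# Hedged sketch: additive secret sharing for a privacy-preserving sum.
# Each party splits its private value into random shares that sum to
# the value modulo a public prime; no single share reveals anything.

MOD = 2**31 - 1  # a public Mersenne prime

def share(value, n_parties):
    """Split `value` into n_parties additive shares modulo MOD."""
    shares = [random.randrange(MOD) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MOD)
    return shares

# Three parties (say, hospitals) each hold a private count and want
# the total without revealing their individual counts.
private_values = [120, 75, 300]
all_shares = [share(v, 3) for v in private_values]

# Computing party i receives the i-th share of every value and sums
# its column locally.
partial_sums = [sum(col) % MOD for col in zip(*all_shares)]

# Combining the partial sums reveals only the aggregate.
total = sum(partial_sums) % MOD
print(total)  # 495
```

Compared with the encryption-based approach, each party here does only cheap modular additions instead of public-key operations, which is the efficiency argument the abstract refers to.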

    An ontology approach to data integration

    The term Federated Databases refers to the data integration of distributed, autonomous and heterogeneous databases. However, a federation can also include information systems, not only databases. When integrating data, several issues must be addressed. Here, we focus on the problem of heterogeneity, more specifically on semantic heterogeneity, that is, problems related to semantically equivalent concepts or semantically related/unrelated concepts. In order to address this problem, we apply the idea of ontologies as a tool for data integration. In this paper, we explain this concept and we briefly describe a method for constructing an ontology by using a hybrid ontology approach.
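The hybrid ontology approach mentioned above — each source keeps a local ontology whose terms are defined against a shared vocabulary — can be sketched minimally. All concept names below are invented; this only illustrates how the shared vocabulary resolves semantic equivalence.

```python
# Hedged sketch of the hybrid ontology idea: local ontologies map
# source-specific concepts onto a shared vocabulary, so semantically
# equivalent concepts can be recognized across sources. All terms
# below are invented for illustration.

shared_vocabulary = {"Person", "Address", "Publication"}

# Each source's local ontology maps its own concept names to terms
# of the shared vocabulary.
local_ontology_db1 = {"Employee": "Person", "Location": "Address"}
local_ontology_db2 = {"Staff": "Person", "Paper": "Publication"}

def equivalent(concept1, onto1, concept2, onto2):
    """Two local concepts are semantically equivalent if both map to
    the same shared-vocabulary term."""
    t1, t2 = onto1.get(concept1), onto2.get(concept2)
    return t1 is not None and t1 == t2

# "Employee" in source 1 and "Staff" in source 2 both mean Person.
print(equivalent("Employee", local_ontology_db1,
                 "Staff", local_ontology_db2))  # True
```

The hybrid approach keeps sources autonomous (each evolves its local ontology independently) while the small shared vocabulary is the only artifact that must be agreed on globally.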