59 research outputs found

    Semantic Integration of heterogeneous data sources in the MOMIS Data Transformation System

    In the last twenty years, many data integration systems following a classical wrapper/mediator architecture and providing a Global Virtual Schema (a.k.a. Global Virtual View - GVV) have been proposed by the research community. The main issues faced by these approaches range from system-level heterogeneities, through structural (syntactic) heterogeneities, to heterogeneities at the semantic level. Despite the research effort, all the proposed approaches require considerable user intervention to customize and manage the data integration and reconciliation tasks. In some cases, the effort and the complexity of the task are huge, since they require the development of specific programming code. Unfortunately, because such code addresses source-specific problems, application code and solutions are rarely reusable in other domains. For this reason, the 2005 Lowell Report provided the guideline for the definition of a public benchmark for the information integration problem. The proposal, called THALIA (Test Harness for the Assessment of Legacy information Integration Approaches), focuses on how data integration systems manage syntactic and semantic heterogeneities, which are arguably the greatest technical challenges in the field. We developed a Data Transformation System (DTS) that supports data transformation functions and performs query translation in order to push query execution down to the sources. Our DTS is based on MOMIS, a mediator-based data integration system that our research group has been developing and supporting since 1999. In this paper, we show how the DTS solves all twelve queries of the THALIA benchmark by using a simple combination of declarative translation functions already available in standard SQL. We consider this a remarkable result for two reasons: first, to the best of our knowledge, no other system has provided a complete answer to the benchmark; second, our queries do not require writing any new code.
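    As an illustration of the kind of declarative translation the abstract describes (a minimal sketch, not the DTS implementation itself; the source tables and columns are invented), the snippet below reconciles a THALIA-style heterogeneity between two catalogs using only standard SQL string functions, the kind of translation a mediator can push down to each source:

    ```python
    import sqlite3

    # Two hypothetical sources exposing instructor names in different
    # formats -- a typical THALIA-style syntactic heterogeneity.
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE src_a (instructor TEXT);        -- 'Last, First'
        CREATE TABLE src_b (first TEXT, last TEXT);  -- split fields
        INSERT INTO src_a VALUES ('Smith, John');
        INSERT INTO src_b VALUES ('Ada', 'Lovelace');
    """)

    # The mediator rewrites the global query into per-source fragments
    # whose translation functions (SUBSTR, INSTR, ||) are plain standard
    # SQL, so each fragment can be executed by the source itself.
    rows = db.execute("""
        SELECT SUBSTR(instructor, INSTR(instructor, ',') + 2)
               || ' ' || SUBSTR(instructor, 1, INSTR(instructor, ',') - 1)
          FROM src_a
        UNION ALL
        SELECT first || ' ' || last FROM src_b
    """).fetchall()

    print([r[0] for r in rows])  # ['John Smith', 'Ada Lovelace']
    ```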

    Privacy-Preserving Data Integration for Health

    The digital transformation of health processes has resulted in the collection of vast amounts of health-related data that present significant potential to support medical research projects and improve the healthcare system. Many of these possibilities arise as a consequence of integrating data from different sources to create an accurate and unified representation of the underlying data and to enable detailed data analysis that is not possible through any individual source. Achieving this vision requires the collection and processing of sensitive health-related data about individuals, so privacy and confidentiality implications have to be considered. In this paper, I describe my doctoral research topic: the design and development of a novel Privacy-Preserving Data Integration (PPDI) framework which aims to effectively address the challenges and opportunities of integrating Big Health Data (BHD) while ensuring compliance with the General Data Protection Regulation (GDPR). The paper describes the planned methodology for implementing the PPDI process through the use of data pseudonymization techniques and Privacy-Preserving Record Linkage (PPRL) methods, and provides an overview of the new framework, which is based on the re-implementation of MOMIS towards a microservices architecture with added PPDI functionalities.
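    The abstract names Privacy-Preserving Record Linkage as a building block. A common PPRL technique (not necessarily the one this framework adopts) encodes identifiers into Bloom filters built from character bigrams, so parties can compare records without exchanging plaintext; a minimal sketch, with toy parameters chosen arbitrarily:

    ```python
    import hashlib

    BITS, HASHES = 64, 2  # toy sizes, far smaller than production PPRL

    def bloom(name: str) -> int:
        """Encode the character bigrams of a name into a bitmask."""
        padded = f"_{name.lower()}_"
        mask = 0
        for i in range(len(padded) - 1):
            gram = padded[i:i + 2]
            for seed in range(HASHES):
                digest = hashlib.sha256(f"{seed}:{gram}".encode()).digest()
                mask |= 1 << (int.from_bytes(digest[:4], "big") % BITS)
        return mask

    def dice(a: int, b: int) -> float:
        """Dice similarity of two bitmasks: 2|A&B| / (|A| + |B|)."""
        inter = bin(a & b).count("1")
        return 2 * inter / (bin(a).count("1") + bin(b).count("1"))

    # Each party encodes locally; only bitmasks are shared for linkage.
    print(dice(bloom("Muller"), bloom("Mueller")))  # high: likely match
    print(dice(bloom("Muller"), bloom("Garcia")))   # low: non-match
    ```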

    An approach for semantic integration of heterogeneous data sources

    Integrating data from multiple heterogeneous sources entails dealing with data distributed among information sources that can be structured, semi-structured or unstructured, and providing the user with a unified view of these data. Gathering information is thus challenging in general, and one of the main reasons is that data sources are designed to support specific applications; very often their structure is unknown to most users. Moreover, the stored data is often redundant, mixed with information only needed to support enterprise processes, and incomplete with respect to the business domain. Collecting, integrating, reconciling and efficiently extracting information from heterogeneous and autonomous data sources is regarded as a major challenge. In this paper, we present an approach for the semantic integration of heterogeneous data sources, DIF (Data Integration Framework), and a software prototype to support all aspects of a complex data integration process. The proposed approach is an ontology-based generalization of both the Global-as-View and Local-as-View approaches. In particular, to overcome problems due to semantic heterogeneity and to support interoperability with external systems, ontologies are used as a conceptual schema to represent both the data sources to be integrated and the global view.
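    To make the Global-as-View side of this spectrum concrete (the DIF approach itself is ontology-based and richer; all source shapes and names below are invented), a GAV mapping defines each global concept as a query over the local sources:

    ```python
    # Minimal Global-as-View sketch: each global concept is defined as a
    # query (here, a Python generator) over the local sources.
    crm_rows = [{"cust_name": "ACME", "country_code": "IT"}]
    erp_rows = [{"client": "Globex", "nation": "Italy"}]

    ISO = {"IT": "Italy"}  # toy value-level reconciliation table

    def global_customer():
        """GAV view: Customer(name, country) over both sources."""
        for r in crm_rows:
            yield {"name": r["cust_name"], "country": ISO[r["country_code"]]}
        for r in erp_rows:
            yield {"name": r["client"], "country": r["nation"]}

    print(list(global_customer()))
    ```

    Under Local-as-View the direction is reversed: each source is described as a view over the global schema, and queries are answered by rewriting rather than by unfolding a view definition as above.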

    Survey: Models and Prototypes of Schema Matching

    Schema matching is a critical problem in many applications that integrate data/information, achieve interoperability, or handle other cases caused by schematic heterogeneity. Schema matching has evolved from manual work on a specific domain towards new models and methods that are semi-automatic and more general, and thus better able to guide the user in generating a mapping among the elements of two schemas or ontologies. This paper summarizes a literature review of models and prototypes of schema matching from the last 25 years, describing the progress made as well as the research challenges and opportunities for new models, methods, and prototypes.
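    As a toy illustration of what a matcher produces (real systems combine many similarity cues, structural as well as lexical; the two schemas below are invented), the sketch scores attribute-name pairs with a normalized string similarity and keeps the best candidate per element:

    ```python
    from difflib import SequenceMatcher

    # Two invented schemas to be matched element by element.
    schema_a = ["customer_name", "birth_date", "zip"]
    schema_b = ["custName", "dateOfBirth", "postalCode", "phone"]

    def similarity(a: str, b: str) -> float:
        """Name similarity in [0, 1]; one cue among many in real matchers."""
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    # Greedy best match per element of schema_a, with a cut-off threshold.
    for a in schema_a:
        best = max(schema_b, key=lambda b: similarity(a, b))
        score = similarity(a, best)
        print(f"{a:15s} -> {best:12s} ({score:.2f})" if score > 0.4
              else f"{a:15s} -> (no match)")
    ```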

    Survey on Techniques for Ontology Interoperability in Semantic Web

    An ontology is a shared conceptualization of the knowledge of a particular domain, used to make semantic information explicit, and is considered a key element in Semantic Web development. Creating global web data sources is impossible because of the dynamic nature of the web; ontology interoperability instead enables the reuse of ontologies. Different domain experts and ontology engineers create different ontologies for the same or similar domains, depending on their data modeling requirements, which causes ontology heterogeneity and inconsistency problems; ontology mapping is the solution for obtaining better and more precise results. As the use of ontologies has increased, providing means of resolving semantic differences has also become very important. Papers on ontology interoperability report results on different frameworks, which makes comparing them almost impossible. Therefore, the main focus of this paper is to provide some basics of ontology interoperability, briefly introduce its different approaches, and survey the approaches that have been proposed for providing interoperability among domain ontologies along with the related techniques and tools.
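    As a small concrete example of what an ontology mapping looks like once produced (independent of any particular framework surveyed; the two hospital ontologies and their terms are invented), correspondences can themselves be published as OWL equivalence axioms that any OWL-aware consumer can exploit:

    ```python
    from rdflib import Graph, Namespace
    from rdflib.namespace import OWL

    # Two invented domain ontologies describing the same domain differently.
    A = Namespace("http://example.org/hospitalA#")
    B = Namespace("http://example.org/hospitalB#")

    g = Graph()
    # Each correspondence in the mapping is an equivalence axiom.
    g.add((A.Patient, OWL.equivalentClass, B.Subject))
    g.add((A.hasDiagnosis, OWL.equivalentProperty, B.condition))

    # An OWL reasoner can now treat the two vocabularies as one.
    for s, p, o in g:
        print(s, p, o)
    ```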

    Transparent access to relational, autonomous and distributed databases using semantic web and service oriented technologies

    As enterprises constantly grow and the need to share information across departments and business areas becomes more critical, companies are turning to integration as a method for interconnecting heterogeneous, distributed and autonomous systems. Whether the sales application needs to interface with the inventory application or the procurement application needs to connect to an auction site, it seems that any application can be made better by integrating it with other applications. Integration between applications can run into trouble because the applications may not have been designed and implemented with integration in mind. With regard to integration issues, two-tier software systems, composed of the database tier and the “front-end” tier (interface), have shown some limitations. To overcome the two-tier limitations, three-tier systems were proposed in the literature. By adding a middle tier (referred to as middleware) between the database tier and the “front-end” tier (or simply the application), three main benefits emerge. The first is that dividing software systems into three tiers increases their capability to integrate with other systems. The second is that modifications to an individual tier may be carried out without necessarily affecting the other tiers and integrated systems. The third, a consequence of the others, is that less maintenance is needed in the software system and in all integrated systems. Concerning software development in three tiers, this dissertation focuses on two emerging technologies, the Semantic Web and Service Oriented Architecture, combined with middleware. Blending these two technologies with middleware resulted in the development of the Swoat framework (Service and Semantic Web Oriented ArchiTecture) and leads to four synergic advantages: (1) it allows the creation of loosely coupled systems, decoupling the database from the “front-end” tiers and therefore reducing maintenance; (2) the database schema is transparent to the “front-end” tiers, which are aware only of the information model (or domain model) that describes what data is accessible; (3) integration with other heterogeneous systems is enabled through services provided by the middleware; (4) the service requests issued by the “front-end” tier focus on ‘what’ data is needed rather than on ‘where’ and ‘how’ to obtain it, reducing application development time.
    Supervisor: António Jorge Silva Cardos
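    A rough illustration of the decoupling described in advantages (2) and (4) (a sketch, not Swoat's actual API, which the abstract does not detail; all class, concept and table names are invented): the front end asks the middleware for a domain concept, and only the middleware knows where and how the data is stored:

    ```python
    import sqlite3

    class Middleware:
        """Hypothetical middle tier hiding the physical schema."""

        # Only the middleware maps domain concepts to the physical schema.
        _CONCEPTS = {"customer": "SELECT cust_nm, cntry_cd FROM tbl_cst_01"}

        def __init__(self, db):
            self._db = db

        def fetch(self, concept: str):
            """Service interface: 'what' (a concept), not 'where'/'how'."""
            rows = self._db.execute(self._CONCEPTS[concept]).fetchall()
            return [{"name": n, "country": c} for n, c in rows]

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE tbl_cst_01 (cust_nm TEXT, cntry_cd TEXT)")
    db.execute("INSERT INTO tbl_cst_01 VALUES ('ACME', 'PT')")

    # The front end never sees table or column names, so the database
    # schema can change without touching the application tier.
    print(Middleware(db).fetch("customer"))
    ```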

    KA-SB: from data integration to large scale reasoning

    Background: The analysis of information in the biological domain is usually focused on data from single on-line data sources. Unfortunately, studying a biological process requires access to disperse, heterogeneous, autonomous data sources. In this context, an analysis of the information is not possible without integrating such data.
    Methods: KA-SB is a querying and analysis system for end users based on combining a data integration solution with a reasoner. The tool works in two steps: 1) KOMF, the Khaos Ontology-based Mediator Framework, retrieves information from heterogeneous and distributed databases; 2) the integrated information is crystallized in a persistent, high-performance reasoner (DBOWL), where it can be further analyzed by means of querying and reasoning.
    Results: In this paper we present a novel system that combines a mediation system with the reasoning capabilities of a large-scale reasoner to provide a way of finding new knowledge and of analyzing the integrated information from different databases, which is retrieved as a set of ontology instances. The tool uses a graphical query interface that shows a graphical representation of the ontology and allows users to build queries easily by clicking on ontology concepts.
    Conclusion: Systems of this kind (based on KOMF) provide users with very large amounts of information (interpreted as ontology instances once retrieved), which cannot be managed using traditional main-memory reasoners. We propose a process for creating persistent and scalable knowledge bases from sets of OWL instances obtained by integrating heterogeneous data sources with KOMF. This process has been applied to develop a demo tool (http://khaos.uma.es/KA-SB), which uses the BioPax Level 3 ontology as the integration schema and integrates the UNIPROT, KEGG, CHEBI, BRENDA and SABIORK databases.
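    To give a flavor of querying integrated information as ontology instances (using the generic rdflib toolkit rather than DBOWL, whose API is not described here; the instance data is invented), the sketch below loads a few BioPAX-style triples and runs a SPARQL query over them:

    ```python
    from rdflib import Graph

    # A few invented BioPAX-flavoured triples standing in for the
    # instances a mediator like KOMF would return after integration.
    g = Graph()
    g.parse(data="""
        @prefix bp: <http://www.biopax.org/release/biopax-level3.owl#> .
        @prefix ex: <http://example.org/> .
        ex:p1 a bp:Protein ; bp:displayName "hexokinase" .
        ex:p2 a bp:Protein ; bp:displayName "catalase" .
    """, format="turtle")

    # Query the crystallized instances; a reasoner-backed store would
    # also return entailed facts, not just asserted ones.
    q = """
        PREFIX bp: <http://www.biopax.org/release/biopax-level3.owl#>
        SELECT ?name WHERE { ?p a bp:Protein ; bp:displayName ?name . }
    """
    for row in g.query(q):
        print(row.name)
    ```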

    Information Integration - the process of integration, evolution and versioning

    At present, many information sources are available wherever you are. Most of the time, the information needed is spread across several of those information sources, and gathering it is a tedious and time-consuming job that automation would make much easier for the user. Integration of the information sources provides a global information source in which all the information needed is present. All of these information sources also change over time, and with each change of an information source its schema can change as well. The data contained in the information source, however, cannot be converted every time, due to the huge amount of data that would have to be converted in order to conform to the most recent schema. In this report we describe the current methods for information integration, evolution and versioning. We distinguish between integration of schemas and integration of the actual data, and we also show some key issues when integrating XML data sources.
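    One common way to cope with data that cannot be rewritten on every schema change (a general pattern, not necessarily this report's proposal; the record shapes are invented) is lazy migration: each record carries the version of the schema it was written under and is upcast to the latest schema only when read:

    ```python
    # Records tagged with the schema version they were written under.
    records = [
        {"_v": 1, "name": "Ada Lovelace"},              # v1: single field
        {"_v": 2, "first": "Grace", "last": "Hopper"},  # v2: split name
    ]

    # One upcast step per schema revision; composing the steps reaches
    # the latest version without ever bulk-converting the stored data.
    def v1_to_v2(r):
        first, last = r["name"].split(" ", 1)
        return {"_v": 2, "first": first, "last": last}

    UPCASTS = {1: v1_to_v2}

    def read_latest(record, latest=2):
        while record["_v"] < latest:
            record = UPCASTS[record["_v"]](record)
        return record

    print([read_latest(r) for r in records])
    ```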