216 research outputs found

    Matcher Composition Methods for Automatic Schema Matching

    We address the problem of automating the process of deciding whether two data schema elements match (that is, refer to the same actual object or concept), and propose several methods for combining evidence computed by multiple basic matchers. One class of methods uses Bayesian networks to account for the conditional dependency between the similarity values produced by individual matchers that use the same or similar information, so as to avoid overconfidence in match probability estimates and improve the accuracy of matching. Another class of methods relies on optimization switches that mitigate this dependency in a domain-independent manner. Experimental results under several testing protocols suggest that the matching accuracy of the Bayesian composite matchers can significantly exceed that of the individual component matchers, and that careful selection of optimization switches can improve matching accuracy even further.
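    The Python sketch below illustrates the general idea of combining several matchers' similarity scores into one match probability, and why treating correlated matchers as independent overstates confidence. It is not the paper's Bayesian-network composition; the scores and weights are invented for the example.

```python
import math

def combine_scores(scores, weights=None):
    """Combine per-matcher similarity scores (0..1) into a match probability.

    Naively treating matchers as independent sums their log-odds, which
    overstates confidence when matchers use the same information; the
    optional weights let correlated matchers be down-weighted.
    Illustrative only -- not the Bayesian-network composition from the paper.
    """
    if weights is None:
        weights = [1.0] * len(scores)
    log_odds = 0.0
    for s, w in zip(scores, weights):
        s = min(max(s, 1e-6), 1 - 1e-6)      # clamp to avoid log(0)
        log_odds += w * math.log(s / (1 - s))
    return 1.0 / (1.0 + math.exp(-log_odds))

# Two name-based matchers (highly correlated) and one type-based matcher.
scores = [0.9, 0.85, 0.6]
print(combine_scores(scores))                   # independence assumption: overconfident
print(combine_scores(scores, [0.5, 0.5, 1.0]))  # correlated pair down-weighted
```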

    Schema matching in a peer-to-peer database system

    Peer-to-peer (P2P) systems are applications that allow a network of peers to share resources in a scalable and efficient manner. My research is concerned with the use of P2P systems for sharing databases. To allow data mediation between peers' databases, schema mappings need to exist: mappings between semantically equivalent attributes in different peers' schemas. Mappings can either be defined manually or found semi-automatically using a technique called schema matching. However, schema matching has not been widely used in dynamic environments such as P2P networks. This thesis therefore investigates how to enable effective semi-automated schema matching within a P2P network.

    Fusing Automatically Extracted Annotations for the Semantic Web

    This research focuses on the problem of semantic data fusion. Although various solutions have been developed in the research communities focusing on databases and formal logic, the choice of an appropriate algorithm is non-trivial because the performance of each algorithm and its optimal configuration parameters depend on the type of data to which the algorithm is applied. In order to be reusable, the fusion system must be able to select appropriate techniques and use them in combination. Moreover, because of the varying reliability of data sources and of the algorithms performing fusion subtasks, uncertainty is an inherent feature of semantically annotated data and has to be taken into account by the fusion system. Finally, schema heterogeneity can have a negative impact on fusion performance. To address these issues, we propose KnoFuss: an architecture for Semantic Web data integration based on the principles of problem-solving methods. Algorithms dealing with different fusion subtasks are represented as components of a modular architecture, and their capabilities are described formally. This allows the architecture to select appropriate methods and configure them depending on the processed data. In order to handle uncertainty, we propose a novel algorithm based on Dempster-Shafer belief propagation. KnoFuss employs this algorithm to reason about uncertain data and method results in order to refine the fused knowledge base. Tests show that these solutions lead to improved fusion performance. Finally, we addressed the problem of data fusion in the presence of schema heterogeneity. We extended the KnoFuss framework to exploit the results of automatic schema alignment tools and proposed our own schema matching algorithm aimed at facilitating data fusion in the Linked Data environment. We conducted experiments with this approach and obtained a substantial improvement in performance in comparison with public data repositories.
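    As a rough illustration of Dempster-Shafer evidence combination (not the KnoFuss belief-propagation algorithm itself), the sketch below combines two mass functions over the minimal frame {same, different} using Dempster's rule; the mass values and variable names are made up for the example.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions over the frame {'same', 'different'},
    with 'theta' standing for the whole frame (residual uncertainty),
    using Dempster's rule of combination. Illustrative sketch only."""
    hypotheses = ['same', 'different', 'theta']
    combined = {h: 0.0 for h in hypotheses}
    conflict = 0.0
    for a in hypotheses:
        for b in hypotheses:
            mass = m1[a] * m2[b]
            if a == b:
                combined[a] += mass
            elif a == 'theta':          # theta intersected with b is b
                combined[b] += mass
            elif b == 'theta':          # a intersected with theta is a
                combined[a] += mass
            else:
                conflict += mass        # 'same' vs 'different': contradictory evidence
    k = 1.0 - conflict                  # normalization constant
    return {h: v / k for h, v in combined.items()}

# Evidence from two sources about whether two instances denote the same entity.
m_source = {'same': 0.6, 'different': 0.1, 'theta': 0.3}
m_matcher = {'same': 0.7, 'different': 0.2, 'theta': 0.1}
print(dempster_combine(m_source, m_matcher))
```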

    A survey of approaches to automatic schema matching

    Schema matching is a basic problem in many database application domains, such as data integration, E-business, data warehousing, and semantic query processing. In current implementations, schema matching is typically performed manually, which has significant limitations. On the other hand, previous research papers have proposed many techniques to achieve a partial automation of the match operation for specific application domains. We present a taxonomy that covers many of these existing approaches, and we describe the approaches in some detail. In particular, we distinguish between schema-level and instance-level, element-level and structure-level, and language-based and constraint-based matchers. Based on our classification, we review some previous match implementations, indicating which part of the solution space they cover. We intend our taxonomy and review of past work to be useful when comparing different approaches to schema matching, when developing a new match algorithm, and when implementing a schema matching component.
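    To make one leaf of this taxonomy concrete, here is a minimal element-level, language-based matcher that compares attribute names after light normalization. It is an illustrative sketch, not an implementation reviewed in the survey; the attribute names and threshold are invented.

```python
import difflib

def name_similarity(a, b):
    """Element-level, language-based matcher: string similarity of
    attribute names after simple normalization."""
    norm = lambda s: s.lower().replace('_', ' ').replace('-', ' ').strip()
    return difflib.SequenceMatcher(None, norm(a), norm(b)).ratio()

def match_schemas(source_attrs, target_attrs, threshold=0.7):
    """Return candidate attribute correspondences above a similarity threshold."""
    return [(s, t, round(name_similarity(s, t), 2))
            for s in source_attrs for t in target_attrs
            if name_similarity(s, t) >= threshold]

print(match_schemas(['CustomerName', 'Phone_No'], ['cust_name', 'phone number']))
```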

    A Cooperative Approach for Composite Ontology Matching

    Ontologies have proven to be an essential element in a range of applications in which knowledge plays a key role. Resolving the semantic heterogeneity problem is crucial to allow interoperability between ontology-based systems. This makes automatic ontology matching, as an anticipated solution to semantic heterogeneity, an important research issue. Many different approaches to the matching problem have emerged from the literature. An important issue in ontology matching is to find effective ways of choosing among many techniques and their variations, and then combining their results. An innovative and promising option is to formalize the combination of matching techniques using agent-based approaches, such as cooperative negotiation and argumentation. In this thesis, a formalization of the ontology matching problem following an agent-based approach is proposed and evaluated using state-of-the-art data sets. The results show that the consensus obtained by negotiation and argumentation is not necessarily an improvement over every individual result, but represents intermediate values close to those of the best matcher. Since the best matcher may vary depending on specific differences between data sets, cooperative approaches are an advantage.
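    As a loose illustration of why agent-based consensus lands at intermediate values between matchers, the toy sketch below has matcher "agents" iteratively concede toward the group mean. It is not the negotiation or argumentation protocol formalized in the thesis; all values and parameters are hypothetical.

```python
def negotiate(proposals, concession=0.5, tolerance=0.01, max_rounds=50):
    """Toy consensus negotiation between matcher 'agents': each round, every
    agent concedes part of the distance to the current group mean until all
    proposals agree within a tolerance. Illustrative sketch only."""
    values = list(proposals)
    for _ in range(max_rounds):
        mean = sum(values) / len(values)
        if max(abs(v - mean) for v in values) <= tolerance:
            break
        values = [v + concession * (mean - v) for v in values]
    return sum(values) / len(values)

# Three matchers score the candidate correspondence (author.name, writer.fullName).
print(negotiate([0.9, 0.55, 0.7]))   # consensus is an intermediate value
```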

    Reconciling Schema Matching Networks


    Integrating and conceptualizing heterogeneous ontologies on the web


    Automated extension of biomedical ontologies

    Developing and extending a biomedical ontology is a very demanding process, particularly because biomedical knowledge is diverse, complex, and continuously changing and growing. Existing automated and semi-automated techniques are not tailored to handling the issues involved in extending biomedical ontologies. This thesis advances the state of the art in semi-automated ontology extension by presenting a framework, as well as methods and methodologies, for automating ontology extension specifically designed to address the features of biomedical ontologies. The overall strategy is based on first predicting the areas of the ontology that are in need of extension and then applying ontology learning and ontology matching techniques to extend them. A novel machine learning approach for predicting these areas based on features of past ontology versions was developed and successfully applied to the Gene Ontology. Methods and techniques were also specifically designed for matching biomedical ontologies and retrieving relevant biomedical concepts from text, and were shown to be successful in several applications.
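    A minimal sketch of the "predict areas in need of extension" step is shown below, assuming per-class features derived from past ontology versions and using scikit-learn; the actual feature set, learner, and data in the thesis may differ, and the numbers here are invented.

```python
from sklearn.ensemble import RandomForestClassifier

# Hypothetical per-class features computed from past ontology versions:
# [num_children, num_new_siblings_last_release, annotation_count, depth]
X_train = [
    [2, 0, 10, 5],
    [15, 4, 120, 3],
    [1, 0, 3, 7],
    [8, 6, 90, 4],
]
# 1 = the class was extended in the following release, 0 = it was not.
y_train = [0, 1, 0, 1]

# Train a classifier to flag classes likely to need extension (illustrative only).
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

candidate = [[12, 5, 80, 3]]          # features of a current ontology class
print(clf.predict_proba(candidate))   # estimated probability the class needs extension
```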

    2D Image Features Detector And Descriptor Selection Expert System

    Detection and description of keypoints in an image is a well-studied problem in computer vision, and methods such as SIFT, SURF, and ORB are computationally very efficient. This paper proposes a solution for a particular case study on object recognition of industrial parts based on hierarchical classification. Reducing the number of candidate instances at each step leads to better performance, which is precisely what the hierarchical classification aims to achieve. We demonstrate that this method performs better than using a single method such as ORB, SIFT, or FREAK, despite being somewhat slower.
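    For context, a minimal OpenCV sketch of ORB keypoint detection and descriptor matching is shown below; it does not reproduce the paper's expert system or hierarchical classification, and the image file names are placeholders.

```python
import cv2

# Detect ORB keypoints in two images of industrial parts and match descriptors.
# 'part_a.png' / 'part_b.png' are placeholder file names for this sketch.
img1 = cv2.imread('part_a.png', cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread('part_b.png', cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=500)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Hamming distance is appropriate for ORB's binary descriptors.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
print(f'{len(matches)} matches; best distance {matches[0].distance}')
```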