92 research outputs found

    Dealing with uncertain entities in ontology alignment using rough sets

    Get PDF
Ontology alignment facilitates the exchange of knowledge among heterogeneous data sources. Many approaches to ontology alignment use multiple similarity measures to map entities between ontologies. However, dealing with uncertain entities, for which the employed similarity measures produce conflicting results, remains a key challenge. This paper presents OARS, a rough-set based approach to ontology alignment that achieves a high degree of accuracy in situations where uncertainty arises from the conflicting results generated by different similarity measures. OARS employs a combinational approach and considers both lexical and structural similarity measures. OARS is extensively evaluated on the benchmark ontologies of the Ontology Alignment Evaluation Initiative (OAEI) 2010, and performs best in recall in comparison with a number of alignment systems while achieving comparable precision.
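
    An illustrative sketch (not the authors' OARS code): in rough-set terms, pairs on which every similarity measure agrees form the lower approximation (certain matches), while pairs with conflicting scores fall into the boundary region for further combinational treatment. The measures and the 0.7 threshold below are invented for illustration.

    def rough_set_alignment(pairs, measures, threshold=0.7):
        """pairs: list of (entity1, entity2); measures: functions mapping
        a pair to a similarity in [0, 1]."""
        lower = []     # lower approximation: every measure agrees (certain match)
        boundary = []  # boundary region: measures conflict (uncertain)
        for pair in pairs:
            scores = [m(*pair) for m in measures]
            if all(s >= threshold for s in scores):
                lower.append(pair)
            elif any(s >= threshold for s in scores):
                boundary.append((pair, scores))
            # else: all measures agree it is a non-match; discard
        return lower, boundary

    # Toy lexical and structural measures (assumptions, for demonstration only).
    lexical = lambda a, b: 1.0 if a.lower() == b.lower() else 0.0
    structural = lambda a, b: 0.8 if len(a) == len(b) else 0.2

    certain, uncertain = rough_set_alignment(
        [("Author", "author"), ("Date", "Year")], [lexical, structural])
    print(certain)    # [('Author', 'author')] -- certain match
    print(uncertain)  # [(('Date', 'Year'), [0.0, 0.8])] -- conflicting evidence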

    Ontology matching: state of the art and future challenges

    Get PDF
After years of research on ontology matching, it is reasonable to consider several questions: is the field of ontology matching still making progress? Is this progress significant enough to pursue further research? If so, what are the particularly promising directions? To answer these questions, we review the state of the art of ontology matching and analyze the results of recent ontology matching evaluations. These results show a measurable improvement in the field, though the pace of that improvement is slowing. We conjecture that significant improvements can be obtained only by addressing important challenges for ontology matching. We present such challenges, with insights on how to approach them, thereby aiming to direct research into the most promising tracks and to facilitate the progress of the field.

    Doctor of Philosophy

    Get PDF
The explosion of structured Web data (e.g., online databases, Wikipedia infoboxes) creates many opportunities for integrating and querying these data that go far beyond the simple search capabilities provided by search engines. Although much work has been devoted to data integration in the database community, the Web brings new challenges: Web scale (the large and growing volume of data) and the heterogeneity of Web data. Because there is so much data, scalable techniques that require little or no manual intervention and that are robust to noisy data are needed. In this dissertation, we propose a new and effective approach for matching Web-form interfaces and for matching multilingual Wikipedia infoboxes. As a further step toward these problems, we propose a general prudent schema-matching framework that matches a large number of schemas effectively. Our comprehensive experiments on Web-form interfaces and Wikipedia infoboxes show that it can enable on-the-fly, automatic integration of large collections of structured Web data. Another problem we address in this dissertation is schema discovery. While existing integration approaches assume that the relevant data sources and their schemas have been identified in advance, schemas are not always available for structured Web data. Approaches exist that exploit information in Wikipedia to discover entity types and their associated schemas. However, due to inconsistencies, sparseness, and noise from community contribution, these approaches are error prone and require substantial human intervention. Given the schema heterogeneity in Wikipedia infoboxes, we developed a new approach that uses the structured information available in infoboxes to cluster similar infoboxes and infer the schemata for entity types. Our approach is unsupervised and resilient to the unpredictable skew in the entity class distribution. Our experiments, using over one hundred thousand infoboxes extracted from Wikipedia, indicate that our approach is effective and produces accurate schemata for Wikipedia entities.
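
    A minimal sketch of the clustering idea the abstract describes, not the dissertation's algorithm: infoboxes are grouped by Jaccard similarity of their attribute names, and each cluster's schema is inferred from the attributes most of its members share. The thresholds and example infoboxes are assumptions.

    from collections import Counter

    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0

    def cluster_infoboxes(infoboxes, threshold=0.5):
        """infoboxes: list of attribute-name sets. Greedy single-pass clustering."""
        clusters = []  # each cluster: list of attribute sets
        for box in infoboxes:
            for cluster in clusters:
                if jaccard(box, cluster[0]) >= threshold:
                    cluster.append(box)
                    break
            else:
                clusters.append([box])
        return clusters

    def infer_schema(cluster, min_support=0.6):
        """Keep attributes present in at least min_support of the cluster."""
        counts = Counter(attr for box in cluster for attr in box)
        return {a for a, c in counts.items() if c / len(cluster) >= min_support}

    boxes = [{"name", "birth_date", "occupation"},
             {"name", "birth_date", "nationality"},
             {"title", "author", "isbn"}]
    clusters = cluster_infoboxes(boxes)
    print([infer_schema(c) for c in clusters])  # person-like vs. book-like schema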

    Constraints preserving genetic algorithm for learning fuzzy measures with an application to ontology matching

    Get PDF
Both the fuzzy measure and the fuzzy integral have been widely studied for multi-source information fusion. A number of researchers have proposed optimization techniques to learn a fuzzy measure from training data. In part, this task is difficult because the fuzzy measure can have a large number of free parameters (2^N − 2 for N sources) and many (monotonicity) constraints. In this paper, a new genetic algorithm approach to constraint-preserving optimization of the fuzzy measure is presented for the task of learning and fusing different ontology matching results. Preliminary results are presented to show the stability of the learning algorithm and its effectiveness compared to existing approaches.
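
    The fusion step this paper builds on can be sketched as a Choquet integral with respect to a fuzzy measure g, together with the monotonicity test that a constraint-preserving GA must maintain. The measure values below are invented stand-ins for the ones the paper would learn.

    from itertools import combinations

    def choquet(scores, g):
        """Choquet integral: scores maps source -> value in [0, 1];
        g maps frozensets of sources -> fuzzy measure value."""
        order = sorted(scores, key=scores.get, reverse=True)
        values = [scores[s] for s in order] + [0.0]
        return sum((values[i] - values[i + 1]) * g[frozenset(order[:i + 1])]
                   for i in range(len(order)))

    def is_monotone(g, sources):
        """The GA's constraint: g(A) <= g(B) whenever A is a subset of B."""
        for r in range(1, len(sources)):
            for A in combinations(sources, r):
                for extra in set(sources) - set(A):
                    if g[frozenset(A)] > g[frozenset(A) | {extra}]:
                        return False
        return True

    # Invented measure over three matchers; the paper would learn these values.
    g = {frozenset({"lex"}): 0.4, frozenset({"str"}): 0.3, frozenset({"inst"}): 0.2,
         frozenset({"lex", "str"}): 0.7, frozenset({"lex", "inst"}): 0.6,
         frozenset({"str", "inst"}): 0.5, frozenset({"lex", "str", "inst"}): 1.0}

    assert is_monotone(g, ["lex", "str", "inst"])
    print(choquet({"lex": 0.9, "str": 0.6, "inst": 0.5}, g))  # fused score ~0.69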

    ReShare: An Operational Ontology Framework for Research Modeling, Combining and Sharing

    Get PDF
Scientists always face difficulties dealing with disjointed information. There is a need for a standardized and robust way to represent and exchange knowledge, and ontologies have been widely used for this purpose. However, since research involves both semantics and operations, we need to conceptualize both. In this thesis, we propose ReShare as a solution to this problem. Maximizing utilization while preserving semantics is one of the main challenges when heterogeneous knowledge is combined. Therefore, operational annotations were designed to allow generic object modeling, binding, and representation. Furthermore, a test bed was developed, and preliminary results are presented to show the usefulness and robustness of our approach. Moreover, two aggregation techniques for fusing ontology matchers are investigated as initial work toward an algorithm that converts descriptive ontologies into operational ones.
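
    The two aggregation techniques are not named in the abstract; as a hedged stand-in, here are two common baselines (average and maximum fusion) over per-pair matcher scores.

    def fuse(scores_per_matcher, how="avg"):
        """scores_per_matcher: list of dicts mapping (e1, e2) -> similarity."""
        pairs = set().union(*scores_per_matcher)
        fused = {}
        for pair in pairs:
            vals = [m.get(pair, 0.0) for m in scores_per_matcher]
            fused[pair] = max(vals) if how == "max" else sum(vals) / len(vals)
        return fused

    m1 = {("Paper", "Article"): 0.8, ("Author", "Writer"): 0.6}
    m2 = {("Paper", "Article"): 0.4}
    print(fuse([m1, m2]))             # average fusion
    print(fuse([m1, m2], how="max"))  # maximum fusion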

    Matcher Composition Methods for Automatic Schema Matching

    Full text link
We address the problem of automating the process of deciding whether two data schema elements match (that is, refer to the same actual object or concept), and propose several methods for combining evidence computed by multiple basic matchers. One class of methods uses Bayesian networks to account for the conditional dependency between the similarity values produced by individual matchers that use the same or similar information, so as to avoid overconfidence in match probability estimates and improve the accuracy of matching. Another class of methods relies on optimization switches that mitigate this dependency in a domain-independent manner. Experimental results under several testing protocols suggest that the matching accuracy of the Bayesian composite matchers can significantly exceed that of the individual component matchers, and that careful selection of optimization switches can improve matching accuracy even further.
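
    An illustrative contrast, not the paper's Bayesian-network model: combining matcher scores under a naive conditional-independence assumption exhibits exactly the overconfidence the paper's networks are designed to avoid when matchers share information. The prior and the reading of similarities as likelihood ratios are assumptions.

    import math

    def naive_bayes_match(scores, prior=0.1):
        """scores: similarities in (0, 1), each treated as independent evidence."""
        log_odds = math.log(prior / (1 - prior))
        for s in scores:
            s = min(max(s, 1e-6), 1 - 1e-6)   # clamp away from 0 and 1
            log_odds += math.log(s / (1 - s))  # independence assumption here
        return 1 / (1 + math.exp(-log_odds))

    # Two matchers that largely duplicate each other's evidence push the
    # probability to an unjustified extreme under this assumption.
    print(naive_bayes_match([0.9, 0.9]))  # overconfident when matchers correlate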

    A Cooperative Approach for Composite Ontology Matching

    Get PDF
Ontologies have proven to be an essential element in a range of applications in which knowledge plays a key role. Resolving the semantic heterogeneity problem is crucial to allow interoperability between ontology-based systems, which makes automatic ontology matching, as an anticipated solution to semantic heterogeneity, an important research issue. Many different approaches to the matching problem have emerged from the literature. An important issue in ontology matching is finding effective ways to choose among many techniques and their variations, and then to combine their results. An innovative and promising option is to formalize the combination of matching techniques using agent-based approaches, such as cooperative negotiation and argumentation. In this thesis, a formalization of the ontology matching problem following an agent-based approach is proposed and evaluated using state-of-the-art data sets. The results show that the consensus obtained by negotiation and argumentation is not necessarily an improvement over every individual result, but represents intermediary values close to those of the best matcher. As the best matcher may vary with the specific differences between data sets, cooperative approaches are an advantage.
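
    A heavily simplified stand-in for the negotiation and argumentation protocols the thesis formalizes: each agent states a confidence per candidate correspondence, and the group accepts those whose mean confidence clears an agreed threshold, yielding the kind of intermediary consensus values the evaluation reports. Agent names and the threshold are invented.

    def consensus(proposals, threshold=0.5):
        """proposals: {agent: {correspondence: confidence}}."""
        candidates = set().union(*proposals.values())
        agreed = {}
        for corr in candidates:
            votes = [p.get(corr, 0.0) for p in proposals.values()]
            mean = sum(votes) / len(votes)
            if mean >= threshold:
                agreed[corr] = mean  # intermediary value between the agents
        return agreed

    proposals = {
        "lexical_agent":    {("Paper", "Article"): 0.9, ("Name", "Title"): 0.4},
        "structural_agent": {("Paper", "Article"): 0.6},
    }
    print(consensus(proposals))  # {('Paper', 'Article'): 0.75}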

    Fusing Automatically Extracted Annotations for the Semantic Web

    Get PDF
This research focuses on the problem of semantic data fusion. Although various solutions have been developed in the research communities focusing on databases and formal logic, the choice of an appropriate algorithm is non-trivial, because the performance of each algorithm and its optimal configuration parameters depend on the type of data to which the algorithm is applied. To be reusable, a fusion system must be able to select appropriate techniques and use them in combination. Moreover, because of the varying reliability of data sources and of the algorithms performing fusion subtasks, uncertainty is an inherent feature of semantically annotated data and has to be taken into account by the fusion system. Finally, schema heterogeneity can have a negative impact on fusion performance. To address these issues, we propose KnoFuss: an architecture for Semantic Web data integration based on the principles of problem-solving methods. Algorithms dealing with different fusion subtasks are represented as components of a modular architecture, and their capabilities are described formally. This allows the architecture to select appropriate methods and configure them depending on the processed data. To handle uncertainty, we propose a novel algorithm based on Dempster-Shafer belief propagation. KnoFuss employs this algorithm to reason about uncertain data and method results in order to refine the fused knowledge base. Tests show that these solutions lead to improved fusion performance. Finally, we address the problem of data fusion in the presence of schema heterogeneity: we extend the KnoFuss framework to exploit the results of automatic schema alignment tools and propose our own schema matching algorithm aimed at facilitating data fusion in the Linked Data environment. Experiments with this approach show a substantial improvement in performance in comparison with public data repositories.
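
    The building block behind the Dempster-Shafer reasoning mentioned above can be sketched as Dempster's rule of combination on the frame {match, nonmatch}. KnoFuss propagates beliefs through a network; this shows only the pairwise combination step, with invented mass values.

    def combine(m1, m2):
        """m1, m2: mass functions over frozenset subsets of {'match', 'nonmatch'}."""
        fused, conflict = {}, 0.0
        for a, va in m1.items():
            for b, vb in m2.items():
                inter = a & b
                if inter:
                    fused[inter] = fused.get(inter, 0.0) + va * vb
                else:
                    conflict += va * vb  # mass assigned to the empty set
        return {k: v / (1 - conflict) for k, v in fused.items()}  # normalize

    theta = frozenset({"match", "nonmatch"})  # total ignorance
    source1 = {frozenset({"match"}): 0.7, theta: 0.3}
    source2 = {frozenset({"match"}): 0.4, frozenset({"nonmatch"}): 0.2, theta: 0.4}
    print(combine(source1, source2))  # belief in 'match' rises to ~0.79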

    Machine Learning-Based Ontology Mapping Tool to Enable Interoperability in Coastal Sensor Networks

    Get PDF
In today’s world, ontologies are widely used for data integration tasks and for resolving information heterogeneity on the Web, because of their capability to provide explicit meaning to information. The growing need to resolve heterogeneities between different information systems within a domain of interest has led to the rapid development of individual ontologies by different organizations. These ontologies, each designed for a particular task, may represent their project's needs in a unique way. Thus, integrating distributed and heterogeneous ontologies by finding semantic correspondences between their concepts has become the key to achieving interoperability among different representations. In this thesis, an advanced instance-based ontology matching algorithm is proposed to enable data integration tasks in ocean sensor networks, whose data are highly heterogeneous in syntax, structure, and semantics. This provides a solution to the ontology mapping problem in such systems based on machine-learning and string-based methods.
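
    An illustrative instance-based matcher in the spirit the thesis describes (not its actual algorithm): two concepts are compared through the string similarity of their observed instance values, with difflib standing in for the string-based measures and invented sensor vocabulary as data.

    import difflib

    def value_similarity(v1, v2):
        return difflib.SequenceMatcher(None, v1.lower(), v2.lower()).ratio()

    def concept_similarity(instances1, instances2):
        """Average best-match similarity from each instance in the first set."""
        if not instances1 or not instances2:
            return 0.0
        best = [max(value_similarity(a, b) for b in instances2)
                for a in instances1]
        return sum(best) / len(best)

    sea_temp = ["water temperature", "sea surface temp", "temp (degC)"]
    wtemp = ["Water Temp", "surface temperature"]
    print(concept_similarity(sea_temp, wtemp))  # high score -> candidate mapping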

    Learning To Scale Up Search-Driven Data Integration

    Get PDF
A recent movement to tackle the long-standing data integration problem is a compositional and iterative approach, termed “pay-as-you-go” data integration. Under this model, the objective is to immediately support queries over “partly integrated” data, and to let the user community drive the integration of the data that relate to their actual information needs; over time, data are gradually integrated. While the pay-as-you-go vision has been well articulated for some time, only recently have we begun to understand how it can be realized in a system implementation. One branch of this effort has focused on enabling queries through keyword search-driven data integration, in which users pose queries over partly integrated data encoded as a graph, receive ranked answers generated from data and metadata that are linked at query time, and provide feedback on those answers. From this user feedback, the system learns to repair bad schema matches or record links. Many real-world issues of uncertainty and diversity in search-driven integration remain open. Such tasks require a combination of human guidance and machine learning, and the challenge is to make maximal use of limited human input. This thesis develops three methods to scale up search-driven integration by learning from expert feedback: (1) active learning techniques to repair links from small amounts of user feedback; (2) collaborative learning techniques to combine users’ conflicting feedback; and (3) debugging techniques to identify where data experts could best improve integration quality. We implement these methods within the Q System, a prototype of search-driven integration, and validate their effectiveness over real-world datasets.
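
    A minimal sketch of the active-learning idea in method (1): ask the expert about the candidate links the current model is least sure of (uncertainty sampling), then fold the answers back into training. The scorer and data are placeholders, not the Q System's implementation.

    def most_uncertain(candidates, score, k=2):
        """Pick the k candidates whose match probability is closest to 0.5."""
        return sorted(candidates, key=lambda c: abs(score(c) - 0.5))[:k]

    scores = {("a1", "b1"): 0.95, ("a2", "b2"): 0.52,
              ("a3", "b3"): 0.48, ("a4", "b4"): 0.10}
    queries = most_uncertain(scores, scores.get)
    print(queries)  # [('a2', 'b2'), ('a3', 'b3')] -> send to the expert

    # Feedback loop sketch: the expert's labels would be added to the training
    # set and the scorer retrained before the next round of queries.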