
    Exploiting general-purpose background knowledge for automated schema matching

    The schema matching task is an integral part of the data integration process and is usually its first step. Because schema matching is typically complex and time-consuming, it is still carried out largely by humans. One reason for the low degree of automation is that schemas are often defined with deep background knowledge that is not itself present within the schemas. Overcoming this missing background knowledge is a core challenge in automating the data integration process. In this dissertation, the task of matching semantic models, so-called ontologies, with the help of external background knowledge is investigated in depth in Part I. Throughout the thesis, the focus lies on large, general-purpose resources, since domain-specific resources are rarely available for most domains. Besides new knowledge resources, the thesis also explores new strategies for exploiting such resources. A technical base for the development and comparison of matching systems is presented in Part II. The framework introduced there allows for simple, modularized matcher development (with background knowledge sources) and for extensive evaluations of matching systems. Among the largest structured sources of general-purpose background knowledge are knowledge graphs, which have grown significantly in size in recent years; exploiting such graphs, however, is not trivial. In Part III, knowledge graph embeddings are explored, analyzed, and compared, and multiple improvements to existing approaches are presented. In Part IV, numerous concrete matching systems that exploit general-purpose background knowledge are presented, and exploitation strategies and resources are analyzed and compared. The dissertation closes with a perspective on real-world applications.
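    The dissertation investigates strategies rather than a single algorithm, but the core idea of background-knowledge-based matching can be illustrated compactly. Below is a minimal sketch, assuming the background resource has been reduced to a synonym lookup table (in the thesis, large general-purpose resources such as knowledge graphs play this role); the function names and sample data are illustrative, not taken from the dissertation.

```python
# Minimal sketch of background-knowledge-based schema matching.
# The background resource is reduced to a synonym table; all names
# and sample data here are illustrative, not from the dissertation.

def normalize(label: str) -> str:
    """Lowercase and strip a label so lexically equal labels match."""
    return label.strip().lower().replace("_", " ")

# Stand-in for a general-purpose background resource (e.g. a thesaurus
# or knowledge graph): maps a term to a set of synonymous terms.
background_synonyms = {
    "automobile": {"car", "motorcar"},
    "person": {"human", "individual"},
}

def synonyms(term: str) -> set[str]:
    """Look up a term's synonyms in the background resource (plus itself)."""
    return {term} | background_synonyms.get(term, set())

def match(schema_a: list[str], schema_b: list[str]) -> list[tuple[str, str]]:
    """Return label pairs that match directly or via background knowledge."""
    correspondences = []
    for a in schema_a:
        for b in schema_b:
            na, nb = normalize(a), normalize(b)
            # A correspondence holds if the normalized labels are equal
            # or their synonym sets (from background knowledge) overlap.
            if na == nb or synonyms(na) & synonyms(nb):
                correspondences.append((a, b))
    return correspondences

print(match(["Automobile", "Person"], ["Car", "Address"]))
# -> [('Automobile', 'Car')] -- found only via the background resource
```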

    Efficient Maximum A-Posteriori Inference in Markov Logic and Application in Description Logics

    The maximum a-posteriori (MAP) query in statistical relational models computes the most probable world given evidence and further knowledge about the domain. It is arguably one of the most important computational problems in this setting, since it is also used as a subroutine in weight-learning algorithms. In this thesis, we discuss an improved inference algorithm and an application of MAP queries. We focus on Markov logic (ML) as the statistical relational formalism. Markov logic combines Markov networks with first-order logic by attaching weights to first-order formulas. For inference, we improve existing work that translates MAP queries into integer linear programs (ILPs). The motivation is that existing ILP solvers are very stable and fast and can precisely estimate the quality of an intermediate solution. Specifically, we improve the translation process so that the resulting ILPs have fewer variables and fewer constraints. Our main contribution is the Cutting Plane Aggregation (CPA) approach, which leverages symmetries in ML networks and parallelizes MAP inference. Additionally, we integrate the cutting plane inference algorithm (Riedel 2008), which significantly reduces the number of groundings by solving multiple smaller ILPs instead of one large ILP. We present the new Markov logic engine RockIt, which outperforms state-of-the-art engines on standard Markov logic benchmarks. Afterwards, we apply the MAP query to description logics. Description logics (DLs) are knowledge representation formalisms more expressive than propositional logic but less expressive than first-order logic. The most popular DLs have been standardized in the ontology language OWL and are an elementary component of the Semantic Web. We combine Markov logic, which essentially follows the semantics of a log-linear model, with description logics into log-linear description logics, in which weights can be attached to any description logic axiom. Furthermore, we introduce a new query type that computes the most probable coherent world. Possible applications of log-linear description logics lie mainly in the areas of ontology learning and data integration. With our novel log-linear description logic reasoner ELog, we experimentally show that more expressivity increases quality and that exact solving strategies yield higher-quality solutions than approximate ones.
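    To make the MAP semantics concrete: a minimal brute-force sketch over a hand-grounded toy network, with invented atoms, formulas, and weights. RockIt avoids this exponential enumeration by translating the problem into an ILP; the enumeration here only illustrates what the query computes.

```python
from itertools import product

# Ground atoms of a toy Markov logic network (invented for illustration).
atoms = ["smokes_anna", "smokes_bob", "friends_anna_bob"]

# Weighted ground formulas as (weight, predicate over a world dict).
# Here: friends tend to share smoking habits, and Anna likely smokes.
formulas = [
    (1.5, lambda w: (not w["friends_anna_bob"])
                    or (w["smokes_anna"] == w["smokes_bob"])),
    (2.0, lambda w: w["smokes_anna"]),
]

def world_weight(world: dict) -> float:
    """Sum the weights of all ground formulas satisfied by the world."""
    return sum(wt for wt, f in formulas if f(world))

# MAP query by brute force: enumerate all worlds consistent with the
# evidence and keep the one with maximum total weight.
evidence = {"friends_anna_bob": True}
best_world, best_weight = None, float("-inf")
for values in product([False, True], repeat=len(atoms)):
    world = dict(zip(atoms, values))
    # Only worlds consistent with the evidence are candidates.
    if any(world[k] != v for k, v in evidence.items()):
        continue
    wt = world_weight(world)
    if wt > best_weight:
        best_world, best_weight = world, wt

print(best_world, best_weight)
# -> both smoke, weight 3.5: the most probable world given the evidence
```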

    A Knowledge-based Approach for Creating Detailed Landscape Representations by Fusing GIS Data Collections with Associated Uncertainty

    Geographic Information Systems (GIS) data for a region comes in different types and from different sources, such as aerial digitized color imagery, elevation data giving terrain height at points in the region, and feature data giving geometric information and properties of entities above and below the ground. Merging GIS data and understanding the real-world information present explicitly or implicitly in that data is a challenging task. It is often done manually by domain experts because of their superior ability to recognize patterns and to combine, reason about, and relate information. When a detailed digital representation of the region is to be created, domain experts must make best-guess decisions about each object. For example, a human would create representations of entities by looking at the data layers collectively, noting even elements that are not visible, like a covered overpass or an underwater tunnel of a certain width and length. Such detailed representations are needed for processes like visualization and 3D modeling in applications used by the military, simulation, earth-science, and gaming communities. Many of these applications increasingly use digitally synthesized visuals and require detailed digital 3D representations to be generated quickly after the necessary initial data is acquired. Our main thesis, and a significant research contribution of this work, is that the task of creating detailed representations can be automated to a very large extent using a methodology that first fuses all available GIS data sources into knowledge base (KB) assertions (instances) representing real-world objects, via a subprocess called GIS2KB. Then, through reasoning, implicit information is inferred and detailed 3D entity representations are defined by a geometry definition engine called KB2Scene. Semantic Web technologies provide the semantic inferencing system, which is extended with a data extraction framework. This framework extracts implicit property information using data- and image-analysis techniques, supports the extraction of spatial relationship values, and attaches uncertainties to inferred details. Uncertainty is recorded per property and used under Zadeh fuzzy semantics to compute a resulting uncertainty for inferred assertional axioms. This is achieved by another major contribution of our research: a unique extension of the KB ABox Realization service using KB explanation services. Previous semantics-based research in this domain has concentrated on enriching represented detail by adding artifacts such as lights, signage, and crosswalks. Previous treatments of uncertainty in assertions rely on a modified reasoner expressivity and calculus. Our work differs in that separating formal knowledge from data processing allows the fusion of heterogeneous data sources that share the same context. Imprecision is modeled as uncertainty on assertions without defining a new expressivity, as long as KB explanation services are available for the expressivity in use. In our use case, we believe this also simplifies uncertainty calculations. The uncertainties are then available at output for user decisions.
We show that the process of creating 3D visuals from GIS data sources can be made more automated, modular, and verifiable, with the knowledge base instances available for other applications as part of a common knowledge base. We define our method's components, discuss its advantages and limitations, and show sample results for the transportation domain.
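    One plausible reading of the uncertainty computation described above, offered as a sketch rather than the dissertation's exact procedure: the explanation service yields the asserted axioms that justify an inferred axiom, and Zadeh fuzzy semantics combines their per-assertion certainties with min (for conjunction) and max (across alternative derivations). All axiom names and certainty values below are invented.

```python
# Sketch of Zadeh-style uncertainty propagation for an inferred axiom.
# Assumption (ours, not spelled out in the abstract): an explanation
# service returns, for an inferred axiom, one or more justifications,
# each a set of asserted axioms that together entail it.

# Per-assertion certainties, as recorded by the data extraction step.
certainty = {
    "Bridge(b1)": 0.9,
    "crosses(b1, river1)": 0.7,
    "River(river1)": 1.0,
}

# Each justification entails the inferred axiom, e.g. "Overpass(b1)".
justifications = [
    {"Bridge(b1)", "crosses(b1, river1)", "River(river1)"},
]

def inferred_certainty(justs: list[set[str]]) -> float:
    """Zadeh semantics: a conjunction is as certain as its weakest
    conjunct (min); alternative derivations combine by max."""
    return max(min(certainty[ax] for ax in just) for just in justs)

print(inferred_certainty(justifications))  # -> 0.7, the weakest support
```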

    Folkoncept: a method to support the conceptual modeling of ontologies through knowledge acquisition from folksonomies

    In this work, we present a method called Folkoncept for supporting the conceptual modeling of ontologies, starting from knowledge acquisition based on folksonomies. The method helps the actors involved in the development process elicit terms and decide how to represent those terms in the ontology. The objective of applying Folkoncept is to reduce the knowledge acquisition bottleneck through ontology learning techniques based on folksonomies. Folkoncept covers three activities of the development process: knowledge acquisition, conceptual modeling, and evaluation, the latter being integrated into the conceptual modeling activity. For knowledge acquisition, Folkoncept handles the retrieval, representation, and enrichment of terms (tags) coming from a folksonomy that results from a social tagging process performed by the actors involved in the ontology development. In the conceptual modeling activity, Folkoncept helps the ontology designer transform folksonomy tags into elements of the ontology being developed. In the evaluation activity, the method helps ontology designers validate the new elements suggested by the ontology learning method. In addition, Folkoncept reduces the difficulty of using the OntoClean methodology by making its use transparent to the ontology designer. Folkoncept was evaluated through ontology development experiments conducted in a controlled environment by teams of ontology designers from the computing field. Some teams worked with a prototype system implementing Folkoncept; their results were compared with those of teams working without the system. The comparison used metrics showing that Folkoncept helped ontology designers develop more descriptive ontologies with fewer errors relative to the idealized taxonomies of OntoClean.
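    The abstract does not specify how tags become ontology elements. As an illustrative stand-in (a well-known co-occurrence heuristic from the folksonomy ontology-learning literature, not necessarily Folkoncept's own method), candidate subsumptions can be proposed when one tag almost always accompanies another but not vice versa; the sample data below is invented.

```python
from collections import Counter
from itertools import combinations

# Tag assignments from a social tagging process (invented sample data).
tagged_resources = [
    {"animal", "dog"},
    {"animal", "dog", "pet"},
    {"animal", "cat"},
    {"animal", "bird"},
    {"animal", "dog", "pet"},
]

tag_count = Counter(tag for tags in tagged_resources for tag in tags)
pair_count = Counter(
    pair for tags in tagged_resources for pair in combinations(sorted(tags), 2)
)

def subsumption_candidates(threshold: float = 0.8) -> list[tuple[str, str]]:
    """Propose (broader, narrower) pairs: 'narrower' co-occurs with
    'broader' in at least `threshold` of its uses, but not vice versa."""
    candidates = []
    for (a, b), together in pair_count.items():
        for broader, narrower in ((a, b), (b, a)):
            p_broader_given_narrower = together / tag_count[narrower]
            p_narrower_given_broader = together / tag_count[broader]
            if (p_broader_given_narrower >= threshold
                    and p_narrower_given_broader < threshold):
                candidates.append((broader, narrower))
    return candidates

print(subsumption_candidates())
# -> e.g. [('animal', 'dog'), ('animal', 'pet'), ('dog', 'pet'),
#          ('animal', 'cat'), ('animal', 'bird')]
```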