18 research outputs found

    Natural Language Reasoning on ALC knowledge bases using Large Language Models

    Pretrained language models have dominated natural language processing, challenging the use of knowledge representation languages to describe the world. While these languages are not expressive enough to fully cover natural language, language models have already shown great results in terms of understanding and information retrieval directly on natural language data. We explore language models’ performance at the downstream task of natural language reasoning in the description logic ALC. We generate a dataset of random ALC knowledge bases, translated into natural language, in order to assess the language models’ ability to function as question-answering systems over natural language knowledge bases.
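
    A rough sketch of how such a dataset could be generated, assuming a toy random axiom generator and template-based verbalization; the concept names, templates, and function names below are illustrative, not the authors' actual pipeline.

```python
import random

CONCEPT_NAMES = ["Bird", "Penguin", "Animal", "Flyer"]   # toy signature
ROLE_NAMES = ["hasParent", "eats"]

def random_concept(depth=2):
    """Sample a random ALC concept expression as a nested tuple."""
    if depth == 0 or random.random() < 0.4:
        return random.choice(CONCEPT_NAMES)
    op = random.choice(["not", "and", "or", "some", "all"])
    if op == "not":
        return ("not", random_concept(depth - 1))
    if op in ("and", "or"):
        return (op, random_concept(depth - 1), random_concept(depth - 1))
    return (op, random.choice(ROLE_NAMES), random_concept(depth - 1))

def verbalize(c):
    """Template-based translation of a concept into English."""
    if isinstance(c, str):
        return f"a {c}"
    if c[0] == "not":
        return f"something that is not {verbalize(c[1])}"
    if c[0] == "and":
        return f"both {verbalize(c[1])} and {verbalize(c[2])}"
    if c[0] == "or":
        return f"either {verbalize(c[1])} or {verbalize(c[2])}"
    if c[0] == "some":
        return f"something that {c[1]} {verbalize(c[2])}"
    return f"something that {c[1]} only {verbalize(c[2])}"   # the "all" case

# Render a random subsumption axiom "Penguin ⊑ C" as an English statement:
print(f"Every Penguin is {verbalize(random_concept())}.")
```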

    Belief Revision in Expressive Knowledge Representation Formalisms

    We live in an era of data and information, where an immeasurable number of discoveries, findings, events, news items, and transactions are generated every second. Governments, companies, and individuals have to employ and process all that data for knowledge-based decision-making (i.e. a decision-making process that uses predetermined criteria to measure and ensure the optimal outcome for a specific topic), which prompts them to view knowledge as a valuable resource. In this knowledge-based view, the capability to create and utilize knowledge is the key source of an organization's or individual's competitive advantage. This dynamic nature of knowledge leads us to the study of belief revision (or belief change), an area which emerged from work in philosophy and then influenced further developments in computer science and artificial intelligence. In belief revision, the AGM postulates by Alchourrón, Gärdenfors, and Makinson remain a cornerstone of research on belief change. Katsuno and Mendelzon (K&M) adopted the AGM postulates for changing belief bases and characterized AGM belief base revision in propositional logic over finite signatures. In this thesis, two research directions are considered. In the first, taking the semantic point of view, we generalize K&M's approach to the setting of (multiple) base revision in arbitrary Tarskian logics, covering all logics with a classical model-theoretic semantics and hence a wide variety of logics used in knowledge representation and beyond. Our generic formulation applies to various notions of “base”, such as belief sets, arbitrary or finite sets of sentences, or single sentences. The core result is a representation theorem showing a two-way correspondence between AGM base revision operators and certain “assignments”: functions mapping belief bases to total, yet not transitive, “preference” relations between interpretations. Alongside this, we present a companion result for the case when the AGM postulate of syntax-independence is abandoned. We also provide a characterization of all logics for which our result can be strengthened to assignments producing transitive preference relations (as in K&M's original work), giving rise to two more representation theorems for such logics, according to syntax dependence vs. independence. The second research direction explores two approaches for revising description logic knowledge bases under fixed-domain semantics, namely a model-based approach and an individual-based approach. In this logical setting, the models of a knowledge base can be enumerated and computed to produce the revision result semantically. We give a characterization of the AGM revision operator for this logic and present a concrete model-based revision approach via distances between interpretations. In addition, by weakening the knowledge base based on certain domain elements, a novel individual-based revision operator is provided as an alternative approach.
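
    For orientation, these are the standard Katsuno–Mendelzon postulates for propositional revision that the first research direction generalizes, with $\psi$ a belief base, $\mu$ the revising sentence, and $\circ$ the revision operator; (R4) is the syntax-independence postulate whose abandonment the companion result addresses.

```latex
\begin{align*}
&(\mathrm{R1})\quad \psi \circ \mu \models \mu\\
&(\mathrm{R2})\quad \text{if } \psi \wedge \mu \text{ is satisfiable, then } \psi \circ \mu \equiv \psi \wedge \mu\\
&(\mathrm{R3})\quad \text{if } \mu \text{ is satisfiable, then } \psi \circ \mu \text{ is satisfiable}\\
&(\mathrm{R4})\quad \text{if } \psi_1 \equiv \psi_2 \text{ and } \mu_1 \equiv \mu_2 \text{, then } \psi_1 \circ \mu_1 \equiv \psi_2 \circ \mu_2\\
&(\mathrm{R5})\quad (\psi \circ \mu) \wedge \varphi \models \psi \circ (\mu \wedge \varphi)\\
&(\mathrm{R6})\quad \text{if } (\psi \circ \mu) \wedge \varphi \text{ is satisfiable, then } \psi \circ (\mu \wedge \varphi) \models (\psi \circ \mu) \wedge \varphi
\end{align*}
```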

    Computational and human-based methods for knowledge discovery over knowledge graphs

    The modern world has evolved, accompanied by the huge exploitation of data and information. Daily, increasing volumes of data from various sources and in various formats are stored, making it challenging to manage and integrate them to discover new knowledge. The appropriate use of data in various sectors of society, such as education, healthcare, e-commerce, and industry, provides advantages for decision support in these areas. However, knowledge discovery becomes challenging since data may come from heterogeneous sources with important information hidden. Thus, new approaches are required that adapt to the new challenges of knowledge discovery in such heterogeneous data environments. The semantic web and knowledge graphs (KGs) are becoming increasingly relevant on the road to knowledge discovery. This thesis tackles the problem of knowledge discovery over KGs built from heterogeneous data sources. We provide a neuro-symbolic artificial intelligence system that integrates symbolic and sub-symbolic frameworks to exploit the semantics encoded in a KG and its structure. The symbolic system relies on existing approaches from deductive databases to make explicit the implicit knowledge encoded in a KG. The proposed deductive database, DSDS, can derive new statements for ego networks given an abstract target prediction; DSDS thus minimizes data sparsity in KGs. In addition, the sub-symbolic system relies on knowledge graph embedding (KGE) models. KGE models are commonly applied in the KG completion task to represent the entities of a KG in a low-dimensional vector space; however, KGE models are known to suffer from data sparsity, and the symbolic system assists in overcoming this limitation. The proposed approach discovers knowledge given a target prediction in a KG and extracts unknown implicit information related to the target prediction. As a proof of concept, we have implemented the neuro-symbolic system on top of a KG for lung cancer to predict polypharmacy treatment effectiveness. The symbolic system implements a deductive system that deduces pharmacokinetic drug-drug interactions encoded in a set of rules through a Datalog program. Additionally, the sub-symbolic system predicts treatment effectiveness using a KGE model, which preserves the KG structure. An ablation study on the components of our approach is conducted, considering state-of-the-art KGE methods. The observed results provide evidence for the benefits of the neuro-symbolic integration of our approach: the neuro-symbolic system exhibits improved results for an abstract target prediction because the symbolic system increases the prediction capacity of the sub-symbolic system. Moreover, the proposed neuro-symbolic artificial intelligence system is evaluated in Industry 4.0 (I4.0), demonstrating its effectiveness in determining relatedness among standards and analyzing their properties to detect unknown relations in the I4.0KG. The results allow us to conclude that the proposed neuro-symbolic approach improves the prediction capability of KGE models for an abstract target prediction by minimizing data sparsity in KGs.
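
    As a rough illustration of the symbolic component, the sketch below forward-chains a single Datalog-style rule for pharmacokinetic drug-drug interactions over a toy set of KG triples. The rule, predicate names, and drug names are assumptions for illustration, not the actual DSDS rules.

```python
# Toy rule: if drug X inhibits enzyme E and E metabolizes drug Y,
# derive a pharmacokinetic interaction between X and Y.
triples = {
    ("drugA", "inhibits", "CYP3A4"),
    ("CYP3A4", "metabolizes", "drugB"),
}

def chase(facts):
    """Apply the rule to a fixpoint and return the enriched triple set."""
    derived = set(facts)
    while True:
        new = {
            (x, "interactsWith", y)
            for (x, p, e) in derived if p == "inhibits"
            for (e2, q, y) in derived if q == "metabolizes" and e2 == e
        }
        if new <= derived:          # nothing new derivable: fixpoint reached
            return derived
        derived |= new

print(chase(triples) - triples)
# {('drugA', 'interactsWith', 'drugB')}
```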

    Explainable methods for knowledge graph refinement and exploration via symbolic reasoning

    Knowledge Graphs (KGs) have applications in many domains such as Finance, Manufacturing, and Healthcare. While recent efforts have created large KGs, their content is far from complete and sometimes includes invalid statements. Therefore, it is crucial to refine the constructed KGs to enhance their coverage and accuracy via KG completion and KG validation. It is also vital to provide human-comprehensible explanations for such refinements, so that humans have trust in the KG quality. Enabling KG exploration, by search and browsing, is also essential for users to understand the KG's value and limitations towards downstream applications. However, the large size of KGs makes KG exploration very challenging. While the type taxonomy of KGs is a useful asset along these lines, it remains insufficient for deep exploration. In this dissertation we tackle the aforementioned challenges of KG refinement and KG exploration by combining logical reasoning over the KG with other techniques such as KG embedding models and text mining. Through such combination, we introduce methods that provide human-understandable output. Concretely, we introduce methods to tackle KG incompleteness by learning exception-aware rules over the existing KG. Learned rules are then used in inferring missing links in the KG accurately. Furthermore, we propose a framework for constructing human-comprehensible explanations for candidate facts from both KG and text. Extracted explanations are used to ensure the validity of KG facts. Finally, to facilitate KG exploration, we introduce a method that combines KG embeddings with rule mining to compute informative entity clusters with explanations. The dissertation makes the following contributions:
    • For KG completion, we present ExRuL, a method for revising Horn rules by adding exception conditions to the rule bodies. The revised rules can infer new facts and thus close gaps in the KG. Experiments on large KGs show that the method substantially reduces errors in the inferred facts and yields user-friendly explanations (a toy sketch of exception-aware rules follows this abstract).
    • With RuLES, we present a rule-learning method based on probabilistic representations of missing facts. It iteratively extends the rules induced from a KG by combining neural KG embeddings with information from text corpora, using new rule-quality metrics during rule generation. Experiments show that RuLES substantially improves the quality of the learned rules and their predictions.
    • To support KG validation, we present ExFaKT, a framework for constructing explanations for candidate facts. Using rules, it rewrites a candidate into a set of statements that are easier to find and to validate or refute. ExFaKT's output is a set of semantic evidence for candidate facts, extracted from text corpora and from the KG. Experiments show that the rewritings significantly improve the yield and quality of the discovered explanations, which support both manual KG validation by curators and automatic validation.
    • To support KG exploration, we present ExCut, a method for computing informative entity clusters with explanations, using KG embeddings and automatically induced rules. A cluster explanation consists of a combination of relations between the entities that identifies the cluster. ExCut improves cluster quality and cluster explainability simultaneously by iteratively interleaving the learning of embeddings and rules. Experiments show that ExCut computes high-quality clusters and that the cluster explanations are informative for users.
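
    To make the idea of exception-aware rules concrete, here is a minimal sketch (not the actual ExRuL algorithm) of how adding a negated exception atom to a Horn rule's body raises its confidence on a toy KG; all facts and predicate names are invented for illustration.

```python
# Toy KG as (subject, predicate, object) facts.
kg = {
    ("alice", "livesIn", "paris"), ("alice", "speaks", "french"),
    ("bob", "livesIn", "paris"), ("bob", "speaks", "french"),
    ("carol", "livesIn", "paris"), ("carol", "type", "tourist"),
}
entities = {s for s, _, _ in kg}

def confidence(body, head):
    """Fraction of entities satisfying the body that also satisfy the head."""
    support = [e for e in entities if body(e)]
    return sum(head(e) for e in support) / len(support) if support else 0.0

lives_in_paris = lambda e: (e, "livesIn", "paris") in kg
speaks_french = lambda e: (e, "speaks", "french") in kg
is_tourist = lambda e: (e, "type", "tourist") in kg

# Plain rule: livesIn(x, paris) => speaks(x, french)
print(confidence(lives_in_paris, speaks_french))              # ~0.67
# Exception-aware rule: livesIn(x, paris), not tourist(x) => speaks(x, french)
print(confidence(lambda e: lives_in_paris(e) and not is_tourist(e),
                 speaks_french))                              # 1.0
```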

    Reasoning in Description Logic Ontologies for Privacy Management

    A rise in the number of ontologies that are integrated and distributed across numerous application systems means that users may access the ontologies with different privileges and purposes. In this situation, preserving confidential information from possible unauthorized disclosure becomes a critical requirement. For instance, in the clinical sciences, unauthorized disclosure of medical information threatens not only the system but, most importantly, the patient data. Motivated by this situation, this thesis first investigates a privacy problem, called the identity problem, which asks whether the identity of (anonymous) objects stored in Description Logic ontologies can be revealed. We then consider this problem in the context of role-based access control to ontologies and extend it to the problem of asking whether the identity belongs to a set of known individuals of cardinality smaller than a given number k. If confidential information about persons, such as their identity, their relationships, or other properties, can be deduced from an ontology, then some privacy policy is violated, and one needs to repair the ontology so that the modified version complies with the policies while preserving as much information from the original ontology as possible. The repair mechanism we provide is called gentle repair and is performed via axiom weakening, instead of the axiom deletion commonly used in classical approaches to ontology repair. However, policy compliance by itself is not enough if a possible attacker can obtain relevant information from other sources which, together with the modified ontology, still violates the privacy policies. A safety property is proposed to alleviate this issue, and we investigate it in the context of privacy-preserving ontology publishing. The main contributions of this thesis are inference procedures that solve these privacy problems, together with investigations of the complexity of the procedures and the worst-case complexity of the problems.

    Contents:
    1. Introduction
      1.1 Description Logics
      1.2 Detecting Privacy Breaches in Information Systems
      1.3 Repairing Information Systems
      1.4 Privacy-Preserving Data Publishing
      1.5 Outline and Contribution of the Thesis
    2. Preliminaries
      2.1 Description Logic ALC
        2.1.1 Reasoning in ALC Ontologies
        2.1.2 Relationship with First-Order Logic
        2.1.3 Fragments of ALC
      2.2 Description Logic EL
      2.3 The Complexity of Reasoning Problems in DLs
    3. The Identity Problem and Its Variants in Description Logic Ontologies
      3.1 The Identity Problem
        3.1.1 Description Logics with Equality Power
        3.1.2 The Complexity of the Identity Problem
      3.2 The View-Based Identity Problem
      3.3 The k-Hiding Problem
        3.3.1 Upper Bounds
        3.3.2 Lower Bound
    4. Repairing Description Logic Ontologies
      4.1 Repairing Ontologies
      4.2 Gentle Repairs
      4.3 Weakening Relations
      4.4 Weakening Relations for EL Axioms
        4.4.1 Generalizing the Right-Hand Sides of GCIs
        4.4.2 Syntactic Generalizations
      4.5 Weakening Relations for ALC Axioms
        4.5.1 Generalizations and Specializations in ALC w.r.t. Role Depth
        4.5.2 Syntactical Generalizations and Specializations in ALC
    5. Privacy-Preserving Ontology Publishing for EL Instance Stores
      5.1 Formalizing Sensitive Information in EL Instance Stores
      5.2 Computing Optimal Compliant Generalizations
      5.3 Computing Optimal Safe^{\exists} Generalizations
      5.4 Deciding Optimality^{\exists} in EL Instance Stores
      5.5 Characterizing Safety^{\forall}
      5.6 Optimal P-safe^{\forall} Generalizations
      5.7 Characterizing Safety^{\forall\exists} and Optimality^{\forall\exists}
    6. Privacy-Preserving Ontology Publishing for EL ABoxes
      6.1 Logical Entailments in EL ABoxes with Anonymous Individuals
      6.2 Anonymizing EL ABoxes
      6.3 Formalizing Sensitive Information in EL ABoxes
      6.4 Compliance and Safety for EL ABoxes
      6.5 Optimal Anonymizers
    7. Conclusion
      7.1 Main Results
      7.2 Future Work
    Bibliography
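
    As an illustration of gentle repair via axiom weakening (here the EL case of generalizing the right-hand side of a GCI, cf. Section 4.4.1), consider the following made-up example:

```latex
% Unwanted consequence: every Patient is inferred to have a specific disease.
\[ \mathit{Patient} \sqsubseteq \exists \mathit{hasDisease}.\mathit{HIV} \]
% Gentle repair: rather than deleting the axiom, weaken it by generalizing
% its right-hand side, retaining as much of the original knowledge as possible.
\[ \mathit{Patient} \sqsubseteq \exists \mathit{hasDisease}.\top \]
```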

    Application of Semantics to Solve Problems in Life Sciences

    Thesis defense date: 10 December 2018. The amount of information generated on the Web has increased in recent years. Most of this information is accessible as text, with humans being the Web's main users. However, despite all the advances in natural language processing, computers still have trouble processing this textual information. At the same time, there are application domains in which large amounts of information are being published as structured data, such as the Life Sciences. Analyzing these data is of vital importance not only for the advancement of science but also for progress in healthcare. However, the data are located in different repositories and stored in different formats, which makes their integration difficult. In this context, the Linked Data paradigm emerges as a technology built on several standards proposed by the W3C community, such as HTTP URIs and the RDF and OWL standards. Using this technology, this doctoral thesis was developed to cover the following main objectives: 1) to promote the use of Linked Data by the Life Sciences user community, 2) to facilitate the design of SPARQL queries by discovering the model underlying RDF repositories, 3) to create a collaborative environment that facilitates the consumption of Linked Data by end users, 4) to develop an algorithm that automatically discovers the OWL semantic model of an RDF repository, and 5) to develop an OWL representation of ICD-10-CM, called Dione, that offers an automatic methodology for classifying patients' diseases and its subsequent validation using an OWL reasoner.
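
    A minimal sketch of the kind of model-discovery query that objectives 2 and 4 allude to: asking a SPARQL endpoint which classes are linked by which properties, which approximates the underlying OWL model directly from the RDF data. The endpoint URL is a placeholder, and the approach is only an illustration, not the thesis's actual algorithm.

```python
from SPARQLWrapper import SPARQLWrapper, JSON

# Placeholder endpoint; any Life Sciences SPARQL endpoint would do.
endpoint = SPARQLWrapper("https://example.org/sparql")
endpoint.setReturnFormat(JSON)

# Which properties connect instances of which classes?
endpoint.setQuery("""
    SELECT DISTINCT ?subjectClass ?property ?objectClass
    WHERE {
        ?s ?property ?o .
        ?s a ?subjectClass .
        ?o a ?objectClass .
    }
    LIMIT 100
""")

for row in endpoint.query().convert()["results"]["bindings"]:
    print(row["subjectClass"]["value"],
          row["property"]["value"],
          row["objectClass"]["value"])
```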

    Models and Information Technologies for Design and Management in Complex Systems

    The proposed set of models for a decision support system for the operator-supervisor of automated control systems provides a basis for automating the process of giving the operator-supervisor informational support when deciding on ergonomic solutions. The research established that insufficient attention has been paid to management issues in the computerization of design work. To make the design process manageable, a systems analysis of the design process for technical objects was carried out, links between a design automation system and its external environment were proposed, the need for three kinds of software for the system's operation was established, a justified top-down scheme for carrying out design work was presented, an information model supporting the design process was constructed, an architectural solution for a design automation system in organizations was obtained, and the absence of an informational description of activity processes that would allow them to be tracked throughout their entire life cycle was identified. A unified model facilitates automating the work of performers with different levels of authority and ensures that they understand their assigned tasks unambiguously.

    A Language for Inconsistency-Tolerant Ontology Mapping

    Ontology alignment plays a key role in enabling interoperability among the various data sources present on the web. The same concepts often differ in meaning, sometimes only slightly, which makes it difficult to relate them; this omnipresent heterogeneity is at the core of the web. The research presented in this dissertation is driven by the goal of providing a robust ontology alignment language for the semantic web, as we show that description logic based alignment languages are not suitable for aligning ontologies. The adoption of semantic web technologies has been consistently on the rise over the past decade, and it continues to show promise. The core component of the semantic web is the set of knowledge representation languages -- mainly the W3C (World Wide Web Consortium) standards Web Ontology Language (OWL), Resource Description Framework (RDF), and Rule Interchange Format (RIF). While these languages have been designed to suit the openness and extensibility of the web, they lack certain features, which we try to address in this dissertation. One such missing feature is non-monotonicity, which would enable common sense reasoning. For example, OWL supports the open world assumption (OWA), which means that knowledge about everything is assumed to be possibly incomplete at any point in time. However, experience has shown that there are situations that require us to assume that certain parts of the knowledge base are complete; employing the closed world assumption (CWA) helps us achieve this. Circumscription is a well-known approach to the CWA, which provides closed world semantics by employing the idea of models that are minimal with respect to certain closed predicates. We provide the formal semantics of the notion of grounded circumscription, an extension of circumscription with desirable properties like decidability, and a tableaux calculus to reason over knowledge bases under grounded circumscription. Another form of common sense logic is default logic. Default logic provides a way to specify rules that, by default, hold in most cases but not necessarily in all. The classic example of such a rule is: if something is a bird, then it flies. The power of defaults comes from the logic's ability to handle exceptions to the default rules. For example, a bird will be assumed to fly by default unless it is an exception, i.e. it belongs to a class of birds that do not fly, like penguins. Interestingly, this property of defaults can be utilized to create mappings between concepts of different ontologies (knowledge bases). We provide a new semantics for the integration of defaults in description logics and show that it improves upon previously known results in the literature. In this study, we give various examples to show the utility and advantages of a default logic based ontology alignment language. We provide the semantics and decidability results of a default-based mapping language for tractable fragments of description logics (or OWL). Furthermore, we provide a proof-of-concept system and a qualitative analysis of its results compared to those of traditional mapping repair techniques.
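
    The bird example from the abstract can be written as a Reiter default, and the same pattern carries over to inconsistency-tolerant mappings between ontologies; the mapping concepts in the second rule are illustrative only.

```latex
% Reiter default: if x is a bird and it is consistent to assume that x flies,
% then conclude that x flies (exceptions, such as penguins, block the inference).
\[ \frac{\mathit{Bird}(x) \;:\; \mathit{Flies}(x)}{\mathit{Flies}(x)} \]
% The same mechanism as a mapping default between ontologies O_1 and O_2:
% map O_1's Employee to O_2's Staff unless the individual is a known exception.
\[ \frac{O_1{:}\mathit{Employee}(x) \;:\; O_2{:}\mathit{Staff}(x)}{O_2{:}\mathit{Staff}(x)} \]
```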