6 research outputs found

    Automated Knowledge Graph Completion for Natural Language Understanding: Known Paths and Future Directions

    Knowledge Graphs (KGs) are large collections of structured data that can model real-world knowledge and are important assets for the companies that employ them. KGs are usually constructed iteratively and often exhibit a sparse structure. Moreover, as knowledge evolves, KGs must be updated and completed. Many automatic methods for KG Completion (KGC) have been proposed in the literature to reduce the costs of manual maintenance. Motivated by an industrial case study aiming to enrich a KG specifically designed for Natural Language Understanding tasks, this paper presents an overview of classical and modern deep learning completion methods. In particular, we delve into Large Language Models (LLMs), the most promising deep learning architectures. We show that their applications to KGC suffer from several shortcomings: they neglect the structure of the KG and treat KGC as a classification problem. These limitations, together with the brittleness of the LLMs themselves, stress the need for KGC solutions at the interface between symbolic and neural approaches and point the way ahead for future research in intelligible corpus-based KGC.
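    A minimal sketch of the classification framing the paper critiques: a (head, relation, tail) triple is verbalized as text and scored by a fine-tuned language model. The model name, label mapping, and example triples below are illustrative assumptions, not taken from the paper.

        # Sketch of KGC framed as triple classification (the framing the survey
        # critiques). Assumes a checkpoint fine-tuned for triple plausibility;
        # "bert-base-uncased" is a stand-in, not the paper's model.
        import torch
        from transformers import AutoTokenizer, AutoModelForSequenceClassification

        MODEL = "bert-base-uncased"  # illustrative placeholder
        tokenizer = AutoTokenizer.from_pretrained(MODEL)
        model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)

        def score_triple(head: str, relation: str, tail: str) -> float:
            """Verbalize a triple and return the probability it is plausible."""
            text = f"{head} {relation.replace('_', ' ')} {tail}"
            inputs = tokenizer(text, return_tensors="pt", truncation=True)
            with torch.no_grad():
                logits = model(**inputs).logits
            return torch.softmax(logits, dim=-1)[0, 1].item()  # label 1 = plausible

        # Complete (Paris, capital_of, ?) by ranking candidate tails.
        candidates = ["France", "Italy", "Germany"]
        print(sorted(candidates, key=lambda t: score_triple("Paris", "capital_of", t), reverse=True))

    Note that nothing in this framing sees the graph's topology, which is exactly the structural blind spot the paper highlights.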

    KnowText: Auto-generated Knowledge Graphs for custom domain applications

    While industrial Knowledge Graphs enable information extraction from massive data volumes, forming the backbone of the Semantic Web, specialised, custom-designed knowledge graphs focused on enterprise-specific information are an emerging trend. We present "KnowText", an application that automatically generates custom Knowledge Graphs from unstructured text and enables fast information extraction through graph visualisation and free-text query methods designed for non-specialist users. An OWL ontology automatically extracted from the text is linked to the knowledge graph and used as a knowledge base. A basic ontological schema is provided, including 16 classes and datatype properties. The extracted facts and the OWL ontology can be downloaded and further refined. KnowText is designed for applications in business (CRM, HR, banking). A custom KG can serve for locally managing existing data, often stored as "sensitive" information or proprietary accounts that are not openly accessible on the web. KnowText deploys a custom KG from a collection of text documents and enables fast information extraction based on its graph-based visualisation and text-based query methods.
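    As a rough illustration of what such a text-to-KG pipeline involves (the abstract does not describe KnowText's actual extraction method), the sketch below pulls naive subject-verb-object triples from text with spaCy and assembles them into a graph; all sentences and names are toy examples.

        # Hedged sketch: naive SVO triple extraction feeding a graph, standing in
        # for the kind of pipeline a text-to-KG tool performs. Not KnowText's code.
        import spacy
        import networkx as nx

        nlp = spacy.load("en_core_web_sm")

        def extract_triples(text: str):
            """Yield (subject, verb-lemma, object) triples from each sentence."""
            for sent in nlp(text).sents:
                for token in sent:
                    if token.pos_ == "VERB":
                        subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
                        objects = [c for c in token.children if c.dep_ in ("dobj", "attr")]
                        for s in subjects:
                            for o in objects:
                                yield (s.text, token.lemma_, o.text)

        kg = nx.MultiDiGraph()
        for s, p, o in extract_triples("Alice manages the corporate accounts. Bob audits Alice."):
            kg.add_edge(s, o, label=p)  # nodes become entities, edges become facts
        print(list(kg.edges(data=True)))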

    An Automatic Ontology Generation Framework with An Organizational Perspective

    Ontologies are known for their powerful semantic representation of knowledge. However, ontologies cannot automatically evolve to reflect updates that occur in their respective domains. To address this limitation, researchers have called for automatic ontology generation from unstructured text corpora. Unfortunately, systems that aim to generate ontologies from unstructured text are domain-specific and require manual intervention. In addition, they suffer from uncertainty in creating concept linkages and difficulty in finding axioms for the same concept. Knowledge Graphs (KGs) have emerged as a powerful model for the dynamic representation of knowledge. However, KGs have many quality limitations and need extensive refinement. This research aims to develop a novel domain-independent automatic ontology generation framework that converts an unstructured text corpus into a domain-consistent ontological form. The framework generates KGs from the unstructured text corpus and refines and corrects them to be consistent with domain ontologies. The strength of the proposed automatically generated ontology is that it integrates the dynamic features of KGs with the quality features of ontologies.
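    One refinement step this goal implies, checking extracted triples against a domain ontology's constraints, can be sketched with rdflib as below; the namespace, axioms, and triples are hypothetical, and this is not the paper's actual pipeline.

        # Hedged sketch: accept an extracted triple only if its subject and object
        # satisfy the rdfs:domain / rdfs:range constraints of a domain ontology.
        from rdflib import Graph, Namespace, RDF, RDFS, URIRef

        EX = Namespace("http://example.org/")  # hypothetical namespace

        ontology = Graph()
        ontology.add((EX.worksFor, RDFS.domain, EX.Person))
        ontology.add((EX.worksFor, RDFS.range, EX.Organization))

        kg = Graph()
        kg.add((EX.alice, RDF.type, EX.Person))
        kg.add((EX.acme, RDF.type, EX.Organization))

        def is_consistent(s: URIRef, p: URIRef, o: URIRef) -> bool:
            """True if s and o carry every type the ontology demands for p."""
            for dom in ontology.objects(p, RDFS.domain):
                if (s, RDF.type, dom) not in kg:
                    return False
            for rng in ontology.objects(p, RDFS.range):
                if (o, RDF.type, rng) not in kg:
                    return False
            return True

        print(is_consistent(EX.alice, EX.worksFor, EX.acme))  # True
        print(is_consistent(EX.acme, EX.worksFor, EX.alice))  # False: wrong types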

    Efficient Symbolic Learning over Knowledge Graphs

    Knowledge Graphs (KGs) are repositories of structured information. Inductive Logic Programming (ILP) can be used over these KGs to mine logical rules, which can then be used to deduce new information and learn new facts. Over the years, many algorithms have been developed for this purpose, almost all of which require the complete KG to be present in main memory at some point of their execution. With the increasing size of KGs, owing to improved knowledge extraction mechanisms, applying these algorithms locally is becoming less and less feasible. Due to their sheer size, many KGs do not even fit in the memory of ordinary computing devices. These KGs can, however, be represented in RDF, making them structured and queryable through SPARQL endpoints, and thanks to software such as OpenLink's Virtuoso, they can be hosted on a server as SPARQL endpoints. In light of this, an effort was undertaken to develop an algorithm that overcomes the memory bottleneck of current logical rule mining procedures by using SPARQL endpoints. To that end, the state-of-the-art algorithm AMIE was taken as a reference to create a new algorithm that mines logical rules over these KGs by querying the SPARQL endpoints on which they are hosted, effectively overcoming the aforementioned memory bottleneck and allowing rules to be mined (and new information eventually deduced) locally.
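    The core idea can be sketched in a few lines: a rule's support and confidence are computed with COUNT queries against the endpoint, so the KG never has to be loaded into memory. The endpoint URL and the example rule below are illustrative and not taken from the thesis.

        # Hedged sketch of endpoint-backed rule measurement (AMIE-style metrics
        # computed remotely via SPARQL instead of over an in-memory KG).
        from SPARQLWrapper import SPARQLWrapper, JSON

        ENDPOINT = "https://dbpedia.org/sparql"  # any Virtuoso-style endpoint

        def count(pattern: str) -> int:
            """Run SELECT (COUNT(*) ...) over a graph pattern on the endpoint."""
            sparql = SPARQLWrapper(ENDPOINT)
            sparql.setQuery(f"SELECT (COUNT(*) AS ?n) WHERE {{ {pattern} }}")
            sparql.setReturnFormat(JSON)
            bindings = sparql.query().convert()["results"]["bindings"]
            return int(bindings[0]["n"]["value"])

        # Candidate rule: ?x birthPlace ?y  =>  ?x deathPlace ?y
        body = "?x <http://dbpedia.org/ontology/birthPlace> ?y ."
        head = "?x <http://dbpedia.org/ontology/deathPlace> ?y ."

        support = count(body + " " + head)   # instantiations where body and head hold
        body_size = count(body)              # all instantiations of the body
        print("support:", support, "confidence:", support / body_size)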

    Cross-Lingual Entity Matching for Knowledge Graphs

    Multilingual knowledge graphs (KGs), such as YAGO and DBpedia, represent entities in different languages. The task of cross-lingual entity matching is to align entities in a source language with their counterparts in target languages. In this thesis, we investigate embedding-based approaches that encode entities from multilingual KGs into the same vector space, where equivalent entities are close to each other. Specifically, we apply graph convolutional networks (GCNs) to combine multi-aspect information about entities, including their topological connections, relations, and attributes, to learn entity embeddings. To exploit the literal descriptions of entities expressed in different languages, we propose two uses of a pre-trained multilingual BERT model to bridge cross-lingual gaps. We further propose two strategies for integrating the GCN-based and BERT-based modules to boost performance. Extensive experiments on two benchmark datasets demonstrate that our method significantly outperforms existing systems. We additionally introduce a new dataset comprising 15 low-resource languages and featuring unlinkable cases, to draw closer to real-world challenges.
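    A minimal sketch of the BERT-based half of this approach: entity descriptions in two languages are embedded with multilingual BERT, and each source entity is aligned to its nearest target by cosine similarity. The GCN module, the integration strategies, and the training procedure are omitted, and the descriptions below are toy examples.

        # Hedged sketch: description-based cross-lingual matching with mBERT.
        import torch
        from transformers import AutoTokenizer, AutoModel

        tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
        model = AutoModel.from_pretrained("bert-base-multilingual-cased")

        def embed(texts):
            """Mean-pooled, L2-normalized mBERT embeddings for a list of texts."""
            batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
            with torch.no_grad():
                hidden = model(**batch).last_hidden_state
            mask = batch["attention_mask"].unsqueeze(-1)  # ignore padding tokens
            pooled = (hidden * mask).sum(1) / mask.sum(1)
            return torch.nn.functional.normalize(pooled, dim=-1)

        source = embed(["Paris is the capital of France."])           # English KG
        targets = embed(["Paris est la capitale de la France.",       # French KG
                         "Berlin ist die Hauptstadt Deutschlands."])
        similarity = source @ targets.T  # cosine similarities, rows = source entities
        print("best match index:", similarity.argmax(dim=1).item())   # expect 0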