322 research outputs found

    Natural Language Reasoning on ALC knowledge bases using Large Language Models

    Get PDF
    Τα προεκπαιδευμένα γλωσσικά μοντέλα έχουν κυριαρχήσει στην επεξεργασία φυσικής γλώσσας, αποτελώντας πρόκληση για τη χρήση γλωσσών αναπαράστασης γνώσης για την περιγραφή του κόσμου. Ενώ οι γλώσσες αυτές δεν είναι αρκετά εκφραστικές για να καλύψουν πλήρως τη φυσική γλώσσα, τα γλωσσικά μοντέλα έχουν ήδη δείξει σπουδαία αποτελέσματα όσον αφορά την κατανόηση και την ανάκτηση πληροφοριών απευθείας σε δεδομένα φυσικής γλώσσας. Διερευνούμε τις επιδόσεις των γλωσσικών μοντέλων για συλλογιστική φυσικής γλώσσας στη περιγραφική λογική ALC. Δημιουργούμε ένα σύνολο δεδομένων από τυχαίες βάσεις γνώσης ALC, μεταφρασμένες σε φυσική γλώσσα, ώστε να αξιολογήσουμε την ικανότητα των γλωσσικών μοντέλων να λειτουργούν ως συστήματα απάντησης ερωτήσεων πάνω σε βάσεις γνώσης φυσικής γλώσσας.Pretrained language models have dominated natural language processing, challenging the use of knowledge representation languages to describe the world. While these lan- guages are not expressive enough to fully cover natural language, language models have already shown great results in terms of understanding and information retrieval directly on natural language data. We explore language models’ performance at the downstream task of natural language reasoning in the description logic ALC. We generate a dataset of random ALC knowledge bases, translated in natural language, in order to assess the language models’ ability to function as question-answering systems over natural language knowledge bases

    Knowledge extraction from unstructured data

    Get PDF
    Data availability is becoming more essential, considering the current growth of web-based data. The data available on the web are represented as unstructured, semi-structured, or structured data. In order to make the web-based data available for several Natural Language Processing or Data Mining tasks, the data needs to be presented as machine-readable data in a structured format. Thus, techniques for addressing the problem of capturing knowledge from unstructured data sources are needed. Knowledge extraction methods are used by the research communities to address this problem; methods that are able to capture knowledge in a natural language text and map the extracted knowledge to existing knowledge presented in knowledge graphs (KGs). These knowledge extraction methods include Named-entity recognition, Named-entity Disambiguation, Relation Recognition, and Relation Linking. This thesis addresses the problem of extracting knowledge over unstructured data and discovering patterns in the extracted knowledge. We devise a rule-based approach for entity and relation recognition and linking. The defined approach effectively maps entities and relations within a text to their resources in a target KG. Additionally, it overcomes the challenges of recognizing and linking entities and relations to a specific KG by employing devised catalogs of linguistic and domain-specific rules that state the criteria to recognize entities in a sentence of a particular language, and a deductive database that encodes knowledge in community-maintained KGs. Moreover, we define a Neuro-symbolic approach for the tasks of knowledge extraction in encyclopedic and domain-specific domains; it combines symbolic and sub-symbolic components to overcome the challenges of entity recognition and linking and the limitation of the availability of training data while maintaining the accuracy of recognizing and linking entities. Additionally, we present a context-aware framework for unveiling semantically related posts in a corpus; it is a knowledge-driven framework that retrieves associated posts effectively. We cast the problem of unveiling semantically related posts in a corpus into the Vertex Coloring Problem. We evaluate the performance of our techniques on several benchmarks related to various domains for knowledge extraction tasks. Furthermore, we apply these methods in real-world scenarios from national and international projects. The outcomes show that our techniques are able to effectively extract knowledge encoded in unstructured data and discover patterns over the extracted knowledge presented as machine-readable data. More importantly, the evaluation results provide evidence to the effectiveness of combining the reasoning capacity of the symbolic frameworks with the power of pattern recognition and classification of sub-symbolic models

    Neural-symbolic learning for knowledge base completion

    Get PDF
    A query answering task computes the prediction scores of ground queries inferred from a Knowledge Base (KB). Traditional symbolic-based methods solve this task using ‘exact’ provers. However, they are not very scalable and difficult to apply to current large KBs. Sub-symbolic methods have recently been proposed to address this problem. They require to be trained to learn the semantics of the symbolic representation and use it to make predictions about query answering. Such predictions may rely upon unknown rules over the given KB. Not all proposed sub-symbolic systems are capable of inducing rules from the KB; and even more challenging is the learning of rules that are human interpretable. Some approaches, e.g., those based on a Neural Theorem Prover (NTP), are able to address this problem but with limited scalability and expressivity of the rules that they can induce. We take inspiration from the NTP framework and propose three sub-symbolic architectures that solve the query answering task in a scalable manner while supporting the induction of more expressive rules. Two of these architectures, called Topical NTP (TNTP) and Topic-Subdomain NTP (TSNTP), address the scalability aspect. Trained representations of predicates and constants are clustered and the soft-unification of the backward chaining proof procedure that they use is controlled by these clusters. The third architecture, called Negation-as-Failure TSNTP (NAF TSNTP), addresses the expressivity of the induced rules by supporting the learning of rules with negation-as-failure. All these architectures make use of additional hyperparameters that encourage the learning of induced rules during training. Each architecture is evaluated over benchmark datasets with increased complexity in size of the KB, number of predicates and constants present in the KB, and level of incompleteness of the KB with respect to test sets. The evaluation measures the accuracy of query answering prediction and computational time. The former uses two key metrics, AUC_PR and HITS, adopted also by existing sub-symbolic systems that solve the same task, whereas the computational time is in terms of CPU training time. The evaluation performance of our systems is compared against that of existing state-of-the-art sub-symbolic systems, showing that our approaches are indeed in most cases more accurate in solving query answering tasks, whilst being more efficient in computational time. The increased accuracy in some tasks is specifically due to the learning of more expressive rules, thus demonstrating the importance of increased expressivity in rule induction.Open Acces

    Meta-ontology fault detection

    Get PDF
    Ontology engineering is the field, within knowledge representation, concerned with using logic-based formalisms to represent knowledge, typically moderately sized knowledge bases called ontologies. How to best develop, use and maintain these ontologies has produced relatively large bodies of both formal, theoretical and methodological research. One subfield of ontology engineering is ontology debugging, and is concerned with preventing, detecting and repairing errors (or more generally pitfalls, bad practices or faults) in ontologies. Due to the logical nature of ontologies and, in particular, entailment, these faults are often both hard to prevent and detect and have far reaching consequences. This makes ontology debugging one of the principal challenges to more widespread adoption of ontologies in applications. Moreover, another important subfield in ontology engineering is that of ontology alignment: combining multiple ontologies to produce more powerful results than the simple sum of the parts. Ontology alignment further increases the issues, difficulties and challenges of ontology debugging by introducing, propagating and exacerbating faults in ontologies. A relevant aspect of the field of ontology debugging is that, due to the challenges and difficulties, research within it is usually notably constrained in its scope, focusing on particular aspects of the problem or on the application to only certain subdomains or under specific methodologies. Similarly, the approaches are often ad hoc and only related to other approaches at a conceptual level. There are no well established and widely used formalisms, definitions or benchmarks that form a foundation of the field of ontology debugging. In this thesis, I tackle the problem of ontology debugging from a more abstract than usual point of view, looking at existing literature in the field and attempting to extract common ideas and specially focussing on formulating them in a common language and under a common approach. Meta-ontology fault detection is a framework for detecting faults in ontologies that utilizes semantic fault patterns to express schematic entailments that typically indicate faults in a systematic way. The formalism that I developed to represent these patterns is called existential second-order query logic (abbreviated as ESQ logic). I further reformulated a large proportion of the ideas present in some of the existing research pieces into this framework and as patterns in ESQ logic, providing a pattern catalogue. Most of the work during my PhD has been spent in designing and implementing an algorithm to effectively automatically detect arbitrary ESQ patterns in arbitrary ontologies. The result is what we call minimal commitment resolution for ESQ logic, an extension of first-order resolution, drawing on important ideas from higher-order unification and implementing a novel approach to unification problems using dependency graphs. I have proven important theoretical properties about this algorithm such as its soundness, its termination (in a certain sense and under certain conditions) and its fairness or completeness in the enumeration of infinite spaces of solutions. Moreover, I have produced an implementation of minimal commitment resolution for ESQ logic in Haskell that has passed all unit tests and produces non-trivial results on small examples. However, attempts to apply this algorithm to examples of a more realistic size have proven unsuccessful, with computation times that exceed our tolerance levels. In this thesis, I have provided both details of the challenges faced in this regard, as well as other successful forms of qualitative evaluation of the meta-ontology fault detection approach, and discussions about both what I believe are the main causes of the computational feasibility problems, ideas on how to overcome them, and also ideas on other directions of future work that could use the results in the thesis to contribute to the production of foundational formalisms, ideas and approaches to ontology debugging that can properly combine existing constrained research. It is unclear to me whether minimal commitment resolution for ESQ logic can, in its current shape, be implemented efficiently or not, but I believe that, at the very least, the theoretical and conceptual underpinnings that I have presented in this thesis will be useful to produce more foundational results in the field

    An ontology-based approach for modelling and querying Alzheimer’s disease data

    Get PDF
    Background The recent advances in biotechnology and computer science have led to an ever-increasing availability of public biomedical data distributed in large databases worldwide. However, these data collections are far from being "standardized" so to be harmonized or even integrated, making it impossible to fully exploit the latest machine learning technologies for the analysis of data themselves. Hence, facing this huge flow of biomedical data is a challenging task for researchers and clinicians due to their complexity and high heterogeneity. This is the case of neurodegenerative diseases and the Alzheimer's Disease (AD) in whose context specialized data collections such as the one by the Alzheimer's Disease Neuroimaging Initiative (ADNI) are maintained.Methods Ontologies are controlled vocabularies that allow the semantics of data and their relationships in a given domain to be represented. They are often exploited to aid knowledge and data management in healthcare research. Computational Ontologies are the result of the combination of data management systems and traditional ontologies. Our approach is i) to define a computational ontology representing a logic-based formal conceptual model of the ADNI data collection and ii) to provide a means for populating the ontology with the actual data in the Alzheimer Disease Neuroimaging Initiative (ADNI). These two components make it possible to semantically query the ADNI database in order to support data extraction in a more intuitive manner.Results We developed: i) a detailed computational ontology for clinical multimodal datasets from the ADNI repository in order to simplify the access to these data; ii) a means for populating this ontology with the actual ADNI data. Such computational ontology immediately makes it possible to facilitate complex queries to the ADNI files, obtaining new diagnostic knowledge about Alzheimer's disease.Conclusions The proposed ontology will improve the access to the ADNI dataset, allowing queries to extract multivariate datasets to perform multidimensional and longitudinal statistical analyses. Moreover, the proposed ontology can be a candidate for supporting the design and implementation of new information systems for the collection and management of AD data and metadata, and for being a reference point for harmonizing or integrating data residing in different sources

    Building Blocks for IoT Analytics Internet-of-Things Analytics

    Get PDF
    Internet-of-Things (IoT) Analytics are an integral element of most IoT applications, as it provides the means to extract knowledge, drive actuation services and optimize decision making. IoT analytics will be a major contributor to IoT business value in the coming years, as it will enable organizations to process and fully leverage large amounts of IoT data, which are nowadays largely underutilized. The Building Blocks of IoT Analytics is devoted to the presentation the main technology building blocks that comprise advanced IoT analytics systems. It introduces IoT analytics as a special case of BigData analytics and accordingly presents leading edge technologies that can be deployed in order to successfully confront the main challenges of IoT analytics applications. Special emphasis is paid in the presentation of technologies for IoT streaming and semantic interoperability across diverse IoT streams. Furthermore, the role of cloud computing and BigData technologies in IoT analytics are presented, along with practical tools for implementing, deploying and operating non-trivial IoT applications. Along with the main building blocks of IoT analytics systems and applications, the book presents a series of practical applications, which illustrate the use of these technologies in the scope of pragmatic applications. Technical topics discussed in the book include: Cloud Computing and BigData for IoT analyticsSearching the Internet of ThingsDevelopment Tools for IoT Analytics ApplicationsIoT Analytics-as-a-ServiceSemantic Modelling and Reasoning for IoT AnalyticsIoT analytics for Smart BuildingsIoT analytics for Smart CitiesOperationalization of IoT analyticsEthical aspects of IoT analyticsThis book contains both research oriented and applied articles on IoT analytics, including several articles reflecting work undertaken in the scope of recent European Commission funded projects in the scope of the FP7 and H2020 programmes. These articles present results of these projects on IoT analytics platforms and applications. Even though several articles have been contributed by different authors, they are structured in a well thought order that facilitates the reader either to follow the evolution of the book or to focus on specific topics depending on his/her background and interest in IoT and IoT analytics technologies. The compilation of these articles in this edited volume has been largely motivated by the close collaboration of the co-authors in the scope of working groups and IoT events organized by the Internet-of-Things Research Cluster (IERC), which is currently a part of EU's Alliance for Internet of Things Innovation (AIOTI)
    corecore