252 research outputs found
Natural Language Reasoning on ALC knowledge bases using Large Language Models
Τα προεκπαιδευμένα γλωσσικά μοντέλα έχουν κυριαρχήσει στην επεξεργασία φυσικής
γλώσσας, αποτελώντας πρόκληση για τη χρήση γλωσσών αναπαράστασης γνώσης για
την περιγραφή του κόσμου. Ενώ οι γλώσσες αυτές δεν είναι αρκετά εκφραστικές για να
καλύψουν πλήρως τη φυσική γλώσσα, τα γλωσσικά μοντέλα έχουν ήδη δείξει σπουδαία
αποτελέσματα όσον αφορά την κατανόηση και την ανάκτηση πληροφοριών απευθείας σε
δεδομένα φυσικής γλώσσας. Διερευνούμε τις επιδόσεις των γλωσσικών μοντέλων για
συλλογιστική φυσικής γλώσσας στη περιγραφική λογική ALC. Δημιουργούμε ένα σύνολο
δεδομένων από τυχαίες βάσεις γνώσης ALC, μεταφρασμένες σε φυσική γλώσσα, ώστε
να αξιολογήσουμε την ικανότητα των γλωσσικών μοντέλων να λειτουργούν ως συστήματα
απάντησης ερωτήσεων πάνω σε βάσεις γνώσης φυσικής γλώσσας.Pretrained language models have dominated natural language processing, challenging
the use of knowledge representation languages to describe the world. While these lan-
guages are not expressive enough to fully cover natural language, language models have
already shown great results in terms of understanding and information retrieval directly
on natural language data. We explore language models’ performance at the downstream
task of natural language reasoning in the description logic ALC. We generate a dataset
of random ALC knowledge bases, translated in natural language, in order to assess the
language models’ ability to function as question-answering systems over natural language
knowledge bases
Knowledge extraction from unstructured data
Data availability is becoming more essential, considering the current growth of web-based data. The data available on the web are represented as unstructured, semi-structured, or structured data. In order to make the web-based data available for several Natural Language Processing or Data Mining tasks, the data needs to be presented as machine-readable data in a structured format. Thus, techniques for addressing the problem of capturing knowledge from unstructured data sources are needed. Knowledge extraction methods are used by the research communities to address this problem; methods that are able to capture knowledge in a natural language text and map the extracted knowledge to existing knowledge presented in knowledge graphs (KGs). These knowledge extraction methods include Named-entity recognition, Named-entity Disambiguation, Relation Recognition, and Relation Linking. This thesis addresses the problem of extracting knowledge over unstructured data and discovering patterns in the extracted knowledge. We devise a rule-based approach for entity and relation recognition and linking. The defined approach effectively maps entities and relations within a text to their resources in a target KG. Additionally, it overcomes the challenges of recognizing and linking entities and relations to a specific KG by employing devised catalogs of linguistic and domain-specific rules that state the criteria to recognize entities in a sentence of a particular language, and a deductive database that encodes knowledge in community-maintained KGs. Moreover, we define a Neuro-symbolic approach for the tasks of knowledge extraction in encyclopedic and domain-specific domains; it combines symbolic and sub-symbolic components to overcome the challenges of entity recognition and linking and the limitation of the availability of training data while maintaining the accuracy of recognizing and linking entities. Additionally, we present a context-aware framework for unveiling semantically related posts in a corpus; it is a knowledge-driven framework that retrieves associated posts effectively. We cast the problem of unveiling semantically related posts in a corpus into the Vertex Coloring Problem. We evaluate the performance of our techniques on several benchmarks related to various domains for knowledge extraction tasks. Furthermore, we apply these methods in real-world scenarios from national and international projects. The outcomes show that our techniques are able to effectively extract knowledge encoded in unstructured data and discover patterns over the extracted knowledge presented as machine-readable data. More importantly, the evaluation results provide evidence to the effectiveness of combining the reasoning capacity of the symbolic frameworks with the power of pattern recognition and classification of sub-symbolic models
Meta-ontology fault detection
Ontology engineering is the field, within knowledge representation, concerned with using logic-based formalisms to represent knowledge, typically moderately sized knowledge bases called ontologies. How to best develop, use and maintain these ontologies has produced relatively large bodies of both formal, theoretical and methodological research.
One subfield of ontology engineering is ontology debugging, and is concerned with preventing, detecting and repairing errors (or more generally pitfalls, bad practices or faults) in ontologies. Due to the logical nature of ontologies and, in particular, entailment, these faults are often both hard to prevent and detect and have far reaching consequences. This makes ontology debugging one of the principal challenges to more widespread adoption of ontologies in applications.
Moreover, another important subfield in ontology engineering is that of ontology alignment: combining multiple ontologies to produce more powerful results than the simple sum of the parts. Ontology alignment further increases the issues, difficulties and challenges of ontology debugging by introducing, propagating and exacerbating faults in ontologies.
A relevant aspect of the field of ontology debugging is that, due to the challenges and difficulties, research within it is usually notably constrained in its scope, focusing on particular aspects of the problem or on the application to only certain subdomains or under specific methodologies. Similarly, the approaches are often ad hoc and only related to other approaches at a conceptual level. There are no well established and widely used formalisms, definitions or benchmarks that form a foundation of the field of ontology debugging.
In this thesis, I tackle the problem of ontology debugging from a more abstract than usual point of view, looking at existing literature in the field and attempting to extract common ideas and specially focussing on formulating them in a common language and under a common approach. Meta-ontology fault detection is a framework for detecting faults in ontologies that utilizes semantic fault patterns to express schematic entailments that typically indicate faults in a systematic way. The formalism that I developed to represent these patterns is called existential second-order query logic (abbreviated as ESQ logic). I further reformulated a large proportion of the ideas present in some of the existing research pieces into this framework and as patterns in ESQ logic, providing a pattern catalogue.
Most of the work during my PhD has been spent in designing and implementing
an algorithm to effectively automatically detect arbitrary ESQ patterns in arbitrary ontologies. The result is what we call minimal commitment resolution for ESQ logic, an extension of first-order resolution, drawing on important ideas from higher-order unification and implementing a novel approach to unification problems using dependency graphs. I have proven important theoretical properties about this algorithm such as its soundness, its termination (in a certain sense and under certain conditions) and its fairness or completeness in the enumeration of infinite spaces of solutions.
Moreover, I have produced an implementation of minimal commitment resolution for ESQ logic in Haskell that has passed all unit tests and produces non-trivial results on small examples. However, attempts to apply this algorithm to examples of a more realistic size have proven unsuccessful, with computation times that exceed our tolerance levels.
In this thesis, I have provided both details of the challenges faced in this regard,
as well as other successful forms of qualitative evaluation of the meta-ontology fault detection approach, and discussions about both what I believe are the main causes of the computational feasibility problems, ideas on how to overcome them, and also ideas on other directions of future work that could use the results in the thesis to contribute to the production of foundational formalisms, ideas and approaches to ontology debugging that can properly combine existing constrained research. It is unclear to me whether minimal commitment resolution for ESQ logic can, in its current shape, be implemented efficiently or not, but I believe that, at the very least, the theoretical and conceptual underpinnings that I have presented in this thesis will be useful to produce more
foundational results in the field
Building Blocks for IoT Analytics Internet-of-Things Analytics
Internet-of-Things (IoT) Analytics are an integral element of most IoT applications, as it provides the means to extract knowledge, drive actuation services and optimize decision making. IoT analytics will be a major contributor to IoT business value in the coming years, as it will enable organizations to process and fully leverage large amounts of IoT data, which are nowadays largely underutilized. The Building Blocks of IoT Analytics is devoted to the presentation the main technology building blocks that comprise advanced IoT analytics systems. It introduces IoT analytics as a special case of BigData analytics and accordingly presents leading edge technologies that can be deployed in order to successfully confront the main challenges of IoT analytics applications. Special emphasis is paid in the presentation of technologies for IoT streaming and semantic interoperability across diverse IoT streams. Furthermore, the role of cloud computing and BigData technologies in IoT analytics are presented, along with practical tools for implementing, deploying and operating non-trivial IoT applications. Along with the main building blocks of IoT analytics systems and applications, the book presents a series of practical applications, which illustrate the use of these technologies in the scope of pragmatic applications. Technical topics discussed in the book include: Cloud Computing and BigData for IoT analyticsSearching the Internet of ThingsDevelopment Tools for IoT Analytics ApplicationsIoT Analytics-as-a-ServiceSemantic Modelling and Reasoning for IoT AnalyticsIoT analytics for Smart BuildingsIoT analytics for Smart CitiesOperationalization of IoT analyticsEthical aspects of IoT analyticsThis book contains both research oriented and applied articles on IoT analytics, including several articles reflecting work undertaken in the scope of recent European Commission funded projects in the scope of the FP7 and H2020 programmes. These articles present results of these projects on IoT analytics platforms and applications. Even though several articles have been contributed by different authors, they are structured in a well thought order that facilitates the reader either to follow the evolution of the book or to focus on specific topics depending on his/her background and interest in IoT and IoT analytics technologies. The compilation of these articles in this edited volume has been largely motivated by the close collaboration of the co-authors in the scope of working groups and IoT events organized by the Internet-of-Things Research Cluster (IERC), which is currently a part of EU's Alliance for Internet of Things Innovation (AIOTI)
Knowledge Augmented Machine Learning with Applications in Autonomous Driving: A Survey
The existence of representative datasets is a prerequisite of many successful artificial intelligence and machine learning models. However, the subsequent application of these models often involves scenarios that are inadequately represented in the data used for training. The reasons for this are manifold and range from time and cost constraints to ethical considerations. As a consequence, the reliable use of these models, especially in safety-critical applications, is a huge challenge. Leveraging additional, already existing sources of knowledge is key to overcome the limitations of purely data-driven approaches, and eventually to increase the generalization capability of these models. Furthermore, predictions that conform with knowledge are crucial for making trustworthy and safe decisions even in underrepresented scenarios. This work provides an overview of existing techniques and methods in the literature that combine data-based models with existing knowledge. The identified approaches are structured according to the categories integration, extraction and conformity. Special attention is given to applications in the field of autonomous driving
Principled Flow Tracking in IoT and Low-Level Applications
Significant fractions of our lives are spent digitally, connected to and dependent on Internet-based applications, be it through the Web, mobile, or IoT. All such applications have access to and are entrusted with private user data, such as location, photos, browsing habits, private feed from social networks, or bank details.In this thesis, we focus on IoT and Web(Assembly) apps. We demonstrate IoT apps to be vulnerable to attacks by malicious app makers who are able to bypass the sandboxing mechanisms enforced by the platform to stealthy exfiltrate user data. We further give examples of carefully crafted WebAssembly code abusing the semantics to leak user data.We are interested in applying language-based technologies to ensure application security due to the formal guarantees they provide. Such technologies analyze the underlying program and track how the information flows in an application, with the goal of either statically proving its security, or preventing insecurities from happening at runtime. As such, for protecting against the attacks on IoT apps, we develop both static and dynamic methods, while for securing WebAssembly apps we describe a hybrid approach, combining both.While language-based technologies provide strong security guarantees, they are still to see a widespread adoption outside the academic community where they emerged.In this direction, we outline six design principles to assist the developer in choosing the right security characterization and enforcement mechanism for their system.We further investigate the relative expressiveness of two static enforcement mechanisms which pursue fine- and coarse-grained approaches for tracking the flow of sensitive information in a system.\ua0Finally, we provide the developer with an automatic method for reducing the manual burden associated with some of the language-based enforcements
- …