
    Enriching open-world knowledge graphs with expressive negative statements

    Machine knowledge about entities and their relationships has been a long-standing goal for AI researchers. Over the last 15 years, thousands of public knowledge graphs have been automatically constructed from various web sources. They are crucial for use cases such as search engines. Yet, existing web-scale knowledge graphs focus on collecting positive statements, and store very little to no negatives. Due to their incompleteness, the truth of absent information remains unknown, which compromises the usability of the knowledge graph. In this dissertation: First, I make the case for selective materialization of salient negative statements in open-world knowledge graphs. Second, I present our methods to automatically infer them from encyclopedic and commonsense knowledge graphs, by locally inferring closed-world topics from reference comparable entities. I then discuss our evaluation findings on metrics such as correctness and salience. Finally, I conclude with open challenges and future opportunities.
Knowledge graphs about entities and their attributes are an important component of many AI applications. Web-scale knowledge graphs store almost exclusively positive statements and overlook negative ones. Owing to the incompleteness of open-world knowledge graphs, missing statements are treated as unknown rather than as false. This dissertation argues for enriching knowledge graphs with informative statements that do not hold, thereby improving their value for applications such as question answering and entity summarization. With potentially billions of candidate negative statements, we tackle four main challenges. 1. Correctness (or plausibility) of negative statements: under the open-world assumption (OWA), it is not enough to check that a negative candidate is not explicitly stated as positive in the knowledge graph, since it may simply be a missing statement. Methods for checking large sets of candidates and for eliminating false positives are essential. 2. Salience of negative statements: the set of correct negative statements is very large but full of trivial or nonsensical ones, e.g., "A cat cannot store data." Methods for quantifying the informativeness of negatives are required. 3. Topic coverage: depending on the data source and the candidate retrieval methods, some topics or entities in the knowledge graph may receive no negative candidates. Methods must guarantee the ability to discover negatives about almost any existing entity. 4. Complex negative statements: in some cases, expressing a negation requires more than one knowledge-graph triple. For example, "Einstein received no education" is an incorrect negation, while "Einstein received no education at a US university" is correct. Methods for generating complex negations are needed.
This dissertation addresses these challenges as follows. 1. We first argue for the selective materialization of negative statements about entities in encyclopedic (well-canonicalized) open-world knowledge graphs, and formally define three kinds of negative statements: grounded, universally absent, and conditional negative statements. We introduce the peer-based negation inference method for producing lists of salient negations about entities. The method computes relevant peers for a given input entity and uses their positive properties to set expectations for that entity. An unmet expectation is an immediate negation candidate, which is then scored using frequency, importance, and unexpectedness metrics. 2. We propose the pattern-based query-log extraction method for harvesting salient negations from large text sources. This method extracts salient negations about an entity by mining large corpora, e.g., search-engine query logs, using a few hand-crafted patterns with negative keywords. 3. We introduce the UnCommonsense method for generating salient negative phrases about everyday concepts in less-canonicalized commonsense KGs. This method is designed for the negation inference, checking, and ranking of short natural-language phrases. It computes comparable concepts for a given target concept, derives negation candidates by comparing their positives, and checks these candidates against the knowledge graph itself as well as against language models (LMs) as an external knowledge source. Finally, the candidates are ranked using semantic-similarity and frequency measures. 4. To ease the exploration of our methods and their results, we implement two prototype systems. Wikinegata is a system showcasing the peer-based method, in which users can explore negative statements about 500K entities from 11 classes and adjust the various parameters of the peer-based inference method. They can also query the knowledge graph through a search form with negated predicates. In the UnCommonsense system, users can inspect exactly what the method produces at every step, and browse negations about 8K everyday concepts. Moreover, using the peer-based negation inference method, we build the first large-scale dataset on demographics and outliers in communities of interest, and show its usefulness in use cases such as identifying underrepresented groups. 5. We publish all datasets and source code produced in these projects at https://www.mpi-inf.mpg.de/negation-in-kbs and https://www.mpi-inf.mpg.de/Uncommonsense
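The peer-based negation inference step described above (compute peers, take their positives as expectations, keep the unmet ones, rank them) can be sketched in a few lines. This is a toy illustration, not the dissertation's code; the knowledge graph, the entities, and the scoring (plain peer frequency as a proxy for salience) are invented assumptions.

```python
# Toy sketch of peer-based negation inference; all data and names are
# illustrative, not taken from the dissertation's implementation.

def peer_based_negations(kg, target, peers, top_k=3):
    """Infer salient negation candidates for `target` from its peers.

    kg: dict mapping entity -> set of (predicate, object) statements.
    peers: comparable entities for `target` (e.g. same class).
    A statement held by peers but absent for the target is a negation
    candidate under the local closed-world view; candidates are ranked
    by peer frequency here as a simple stand-in for salience scoring.
    """
    target_stmts = kg[target]
    counts = {}
    for peer in peers:
        for stmt in kg[peer]:
            if stmt not in target_stmts:
                counts[stmt] = counts.get(stmt, 0) + 1
    ranked = sorted(counts.items(), key=lambda kv: -kv[1])
    return [(pred, obj, n / len(peers)) for (pred, obj), n in ranked[:top_k]]

kg = {
    "Einstein": {("award", "Nobel Prize in Physics"), ("field", "physics")},
    "Bohr":     {("award", "Nobel Prize in Physics"), ("field", "physics")},
    "Planck":   {("award", "Nobel Prize in Physics"), ("field", "physics")},
    "Hilbert":  {("field", "mathematics")},
}
# Every peer has the Nobel award statement, Hilbert does not, so
# ("award", "Nobel Prize in Physics") surfaces as a negation candidate.
print(peer_based_negations(kg, "Hilbert", ["Einstein", "Bohr", "Planck"]))
```

In the actual method, the frequency signal would be combined with importance and unexpectedness metrics before presenting the top candidates.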

    OWL Reasoners still useable in 2023

    In a systematic literature and software review, over 100 OWL reasoners/systems were analyzed to determine whether they are still usable in 2023; no previous survey has covered the field at this scale. OWL reasoners still play an important role in knowledge organisation and management, but the last comprehensive surveys and studies are more than eight years old. The result of this work is a comprehensive list of 95 standalone OWL reasoners and of systems using an OWL reasoner. For each item, information on project pages, source code repositories, and related documentation was gathered. The raw research data is provided in a GitHub repository for anyone to use.

    Prototyping and Evaluation of Sensor Data Integration in Cloud Platforms

    The SFI Smart Ocean centre has initiated a long-running project which consists of developing a wireless and autonomous marine observation system for monitoring of underwater environments and structures. The increasing popularity of integrating the Internet of Things (IoT) with Cloud Computing has led to promising infrastructures that could realize Smart Ocean's goals. The project will utilize underwater wireless sensor networks (UWSNs) for collecting data in the marine environments and develop a cloud-based platform for retrieving, processing, and storing all the sensor data. Currently, the project is in its early stages and the collaborating partners are researching approaches and technologies that can potentially be utilized. This thesis contributes to the centre's ongoing research, focusing on how sensor data can be integrated into three different cloud platforms: Microsoft Azure, Amazon Web Services, and the Google Cloud Platform. The goals were to develop prototypes that could successfully send data to the chosen cloud platforms and to evaluate their applicability in the context of the Smart Ocean project. In order to determine the most suitable option, each platform was evaluated against a set of defined criteria focusing on its sensor data integration capabilities. The thesis also investigated the cloud platforms' supported protocol bindings, as well as several candidate technologies for metadata standards, and compared them in surveys. Our evaluation results show that all three cloud platforms handle sensor data integration in very similar ways, offering a set of cloud services relevant for creating diverse IoT solutions. However, the Google Cloud Platform ranks at the bottom due to the lack of IoT focus on the platform, with fewer service options, features, and capabilities compared to the other two. Both Microsoft Azure and Amazon Web Services rank very close to each other, as they provide many of the same sensor data integration capabilities, making them the most applicable options.
Master's thesis in Programutvikling (Software Development) in collaboration with HVL (PROG399, MAMN-PRO)
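As a rough illustration of the kind of sensor data integration such prototypes perform, the sketch below packages one underwater reading as an MQTT-style topic plus JSON payload. It is not code from the thesis: the field names and the AWS topic string are invented assumptions; only the Azure device-to-cloud topic follows IoT Hub's documented `devices/{deviceId}/messages/events/` convention.

```python
import json
import time

def make_telemetry(device_id, temperature_c, salinity_psu, depth_m, ts=None):
    """Build (candidate topics, JSON payload) for one sensor reading.

    The payload schema is a toy assumption; real deployments would
    follow a metadata standard chosen by the project.
    """
    payload = {
        "deviceId": device_id,
        "timestamp": ts if ts is not None else int(time.time()),
        "measurements": {
            "temperature_c": temperature_c,
            "salinity_psu": salinity_psu,
            "depth_m": depth_m,
        },
    }
    topics = {
        # Azure IoT Hub: fixed device-to-cloud MQTT topic per device.
        "azure": f"devices/{device_id}/messages/events/",
        # AWS IoT Core: the device chooses its topic; this one is invented.
        "aws": f"smartocean/{device_id}/telemetry",
    }
    return topics, json.dumps(payload)

topics, body = make_telemetry("uwsn-node-07", 7.4, 34.9, 120.0, ts=1700000000)
print(topics["azure"])
print(body)
```

A prototype would hand the topic and payload to an MQTT client authenticated against the platform's device registry; the point here is only the shape of the integration.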

    Selectional Restriction Extraction for Frame-Based Knowledge Graph Augmentation

    The Semantic Web is an ambitious project aimed at creating a global, machine-readable web of data, enabling intelligent agents to access and reason over this data. Ontologies are a key component of the Semantic Web, as they provide a formal description of the concepts and relationships in a particular domain. Exploiting the expressiveness of knowledge graphs together with a more logically sound ontological schema can be crucial for representing consistent knowledge and inferring new relations over the data. In other words, constraining the entities and predicates of knowledge graphs leads to improved semantics. The same benefits apply to restrictions over linguistic resources, which are knowledge graphs used to represent natural language. More specifically, it is possible to specify constraints on the arguments that can be associated with a given frame, based on their semantic roles (selectional restrictions). However, most linguistic resources define very general restrictions because they must be able to represent different domains. Hence, the main research question tackled by this thesis is whether the use of domain-specific selectional restrictions is useful for ontology augmentation, ontology definition, and neuro-symbolic tasks on knowledge graphs. To this end, we have developed a tool to empirically extract selectional restrictions and their probabilities. The obtained constraints are represented in OWL-Star and subsequently mapped into OWL: we show that the mapping is information-preserving and invertible if certain conditions hold. The OWL ontologies are inserted into Framester, an open lexical-semantic resource for the English language, resulting in an improved and augmented language resource hub. The use of selectional restrictions is also tested for ontology documentation and neuro-symbolic tasks, showing how they can be exploited to provide meaningful results.
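The empirical extraction of selectional restrictions with probabilities can be pictured as relative-frequency counting over frame annotations. The frames, roles, and filler types below are toy assumptions for illustration, not data from Framester or the thesis tool.

```python
from collections import Counter, defaultdict

# Toy annotated corpus: (frame, semantic role, observed filler type).
# All names are invented; a real corpus would use FrameNet-style frames.
annotations = [
    ("Cooking", "Cook", "Person"),
    ("Cooking", "Cook", "Person"),
    ("Cooking", "Cook", "Robot"),
    ("Cooking", "Food", "Dish"),
]

def selectional_restrictions(annotations):
    """Map each (frame, role) pair to a distribution over filler types."""
    counts = defaultdict(Counter)
    for frame, role, filler_type in annotations:
        counts[(frame, role)][filler_type] += 1
    return {
        key: {t: n / sum(c.values()) for t, n in c.items()}
        for key, c in counts.items()
    }

restr = selectional_restrictions(annotations)
# In this toy corpus, the Cook role of Cooking is filled by a Person
# with probability 2/3 and by a Robot with probability 1/3.
print(restr[("Cooking", "Cook")])
```

Such (restriction, probability) pairs are what a representation like OWL-Star can carry as annotated axioms before being mapped into plain OWL.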

    Big Earth Data and Machine Learning for Sustainable and Resilient Agriculture

    Big streams of Earth images from satellites or other platforms (e.g., drones and mobile phones) are becoming increasingly available at low or no cost and with enhanced spatial and temporal resolution. This thesis recognizes the unprecedented opportunities offered by the high-quality and open-access Earth observation data of our times and introduces novel machine learning and big data methods to properly exploit them towards developing applications for sustainable and resilient agriculture. The thesis addresses three distinct thematic areas, i.e., the monitoring of the Common Agricultural Policy (CAP), the monitoring of food security, and applications for smart and resilient agriculture. The methodological innovations across the three thematic areas address the following issues: i) the processing of big Earth Observation (EO) data, ii) the scarcity of annotated data for machine learning model training, and iii) the gap between machine learning outputs and actionable advice. This thesis demonstrates how big data technologies such as data cubes, distributed learning, linked open data, and semantic enrichment can be used to exploit the data deluge and extract knowledge to address real user needs. Furthermore, it argues for the importance of semi-supervised and unsupervised machine learning models that circumvent the ever-present challenge of scarce annotations and thus allow for model generalization in space and time. Specifically, it is shown how merely a few ground-truth data points are needed to generate high-quality crop type maps and crop phenology estimations. Finally, this thesis argues that there is considerable distance in value between model inferences and decision making in real-world scenarios, and thereby showcases the power of causal and interpretable machine learning in bridging this gap.
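As a minimal sketch of the semi-supervised idea (a handful of labels generalized over many unlabelled samples), the toy self-training loop below assigns unlabelled feature vectors to the nearest class centroid and folds them back in. It is not the thesis pipeline; the pseudo-labelling scheme and the NDVI-like features are invented for illustration.

```python
# Toy self-training with nearest-centroid pseudo-labelling; classes and
# feature values are invented, not results from the thesis.

def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def self_train(labelled, unlabelled, rounds=3):
    """labelled: dict class -> list of feature vectors (e.g. NDVI profiles).

    Each round recomputes class centroids, then pseudo-labels every
    unlabelled vector with its nearest centroid, mimicking how a few
    ground-truth samples can seed a full crop type map.
    """
    data = {c: list(v) for c, v in labelled.items()}
    for _ in range(rounds):
        cents = {c: centroid(v) for c, v in data.items()}
        data = {c: list(labelled[c]) for c in labelled}  # keep true labels fixed
        for x in unlabelled:
            best = min(cents, key=lambda c: dist2(x, cents[c]))
            data[best].append(x)
    return {c: centroid(v) for c, v in data.items()}, data

# One labelled pixel per crop, three unlabelled pixels (toy NDVI features).
labelled = {"cereal": [(0.7, 0.4)], "maize": [(0.3, 0.8)]}
unlabelled = [(0.68, 0.45), (0.32, 0.75), (0.25, 0.82)]
cents, assigned = self_train(labelled, unlabelled)
print(len(assigned["cereal"]), len(assigned["maize"]))
```

Real crop-type pipelines would of course use richer time-series features and stronger models; the sketch only conveys why very few labels can suffice when unlabelled structure is exploited.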

    Inspecting Java Program States with Semantic Web Technologies

    Semantic debugging, as introduced by Kamburjan et al., refers to the practice of applying technologies of the semantic web to query the run-time state of a program and combine it with external domain knowledge. This master thesis aims to take the first step toward making the benefits of semantic debugging available for real-world application development. For this purpose, we implement a semantic debugging tool for the Java programming language, called the Semantic Java Debugger or sjdb. The sjdb tool provides an interactive, command-line-based user interface through which users can (1) run Java programs and suspend their execution at user-defined breakpoints, (2) automatically extract RDF knowledge bases with description logic semantics that describe the current state of the program, (3) optionally supplement the knowledge base with external domain knowledge formalized in OWL, and (4) run (semantic) queries on this extended knowledge base and resolve the query results back to Java objects. As part of this debugging tool, the development of an extraction mechanism for knowledge bases from the states of suspended Java programs is one of the main contributions of this thesis. For this purpose, we also devise an OWL formalization of Java runtime states to structure this extraction process and give meaning to the resulting knowledge base. Moreover, case studies are conducted to demonstrate the capabilities of sjdb, but also to identify its limitations, as well as its response times and memory requirements.
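The extraction step can be pictured as walking the object graph of a suspended program and emitting one triple per type assertion and per field value. The sketch below does this over a hand-written heap snapshot; it is not sjdb itself, and the `prog:`/`java:` prefixes and the snapshot layout are assumptions for illustration.

```python
# Toy heap snapshot of a suspended program: object id -> class + fields.
# Layout and vocabulary are invented, not sjdb's actual formalization.
heap = {
    "obj1": {"class": "LinkedList", "fields": {"head": "obj2", "size": 2}},
    "obj2": {"class": "Node", "fields": {"value": 42, "next": "obj3"}},
    "obj3": {"class": "Node", "fields": {"value": 7, "next": None}},
}

def extract_triples(heap):
    """Emit RDF-style (subject, predicate, object) triples for a snapshot."""
    triples = []
    for oid, obj in heap.items():
        triples.append((f"prog:{oid}", "rdf:type", f"java:{obj['class']}"))
        for field, val in obj["fields"].items():
            if val is None:
                continue  # null references simply yield no triple
            if isinstance(val, str) and val in heap:
                val = f"prog:{val}"  # object reference -> resource IRI
            triples.append((f"prog:{oid}", f"java:field_{field}", val))
    return triples

triples = extract_triples(heap)
for t in triples:
    print(t)
```

Once in this form, the state can be loaded into any RDF store, enriched with OWL domain knowledge, and queried, with resource IRIs mapping answers back to concrete objects.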

    First-Order Rewritability and Complexity of Two-Dimensional Temporal Ontology-Mediated Queries

    Aiming at ontology-based data access to temporal data, we design two-dimensional temporal ontology and query languages by combining logics from the (extended) DL-Lite family with the linear temporal logic LTL over discrete time (Z,<). Our main concern is first-order rewritability of ontology-mediated queries (OMQs) that consist of a 2D ontology and a positive temporal instance query. Our target languages for FO-rewritings are two-sorted FO(<), first-order logic with sorts for time instants ordered by the built-in precedence relation < and for the domain of individuals; its extension FOE with the standard congruence predicates t ≡ 0 (mod n), for any fixed n > 1; and FO(RPR), which admits relational primitive recursion. In terms of circuit complexity, FOE- and FO(RPR)-rewritability guarantee answering OMQs in uniform AC^0 and NC^1, respectively. We proceed in three steps. First, we define a hierarchy of 2D DL-Lite/LTL ontology languages and investigate the FO-rewritability of OMQs with atomic queries by constructing projections onto 1D LTL OMQs and employing recent results on the FO-rewritability of propositional LTL OMQs. As the projections involve deciding consistency of ontologies and data, we also consider the consistency problem for our languages. While the undecidability of consistency for 2D ontology languages with expressive Boolean role inclusions might be expected, we also show that, rather surprisingly, the restriction to Krom and Horn role inclusions leads to decidability (and ExpSpace-completeness), even if one admits full Booleans on concepts. As a final step, we lift some of the rewritability results for atomic OMQs to OMQs with expressive positive temporal instance queries. The lifting results are based on an in-depth study of the canonical models and only concern Horn ontologies.
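As a toy illustration of what an FO-rewriting of a temporal OMQ looks like (an invented example, not one taken from the thesis), consider a single axiom stating that B holds whenever A held at some strictly earlier moment:

```latex
% Toy example: the temporal axiom  \Diamond_P A \sqsubseteq B
% ("if A held at some earlier moment, then B holds now")
% turns the atomic query B(x) into the two-sorted FO(<)-rewriting
\[
  Q'(x, t) \;=\; B(x, t) \,\vee\, \exists s \,\bigl( s < t \wedge A(x, s) \bigr),
\]
% where s, t range over time instants and x over individuals.
```

The rewriting is evaluated directly over the temporal data, which is what places such OMQs in small circuit classes like uniform AC^0.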