248 research outputs found

    Game with a Purpose for Mappings Verification

    Full text link

    Mining Meaning from Wikipedia

    Get PDF
    Wikipedia is a goldmine of information; not just for its many readers, but also for the growing community of researchers who recognize it as a resource of exceptional scale and utility. It represents a vast investment of manual effort and judgment: a huge, constantly evolving tapestry of concepts and relations that is being applied to a host of tasks. This article provides a comprehensive description of this work. It focuses on research that extracts and makes use of the concepts, relations, facts and descriptions found in Wikipedia, and organizes the work into four broad categories: applying Wikipedia to natural language processing; using it to facilitate information retrieval and information extraction; and as a resource for ontology building. The article addresses how Wikipedia is being used as is, how it is being improved and adapted, and how it is being combined with other structures to create entirely new resources. We identify the research groups and individuals involved, and how their work has developed in the last few years. We provide a comprehensive list of the open-source software they have produced.Comment: An extensive survey of re-using information in Wikipedia in natural language processing, information retrieval and extraction and ontology building. Accepted for publication in International Journal of Human-Computer Studie

    Question Generation from Knowledge Graphs

    No full text

    Commonsense knowledge acquisition and applications

    Get PDF
    Computers are increasingly expected to make smart decisions based on what humans consider commonsense. This would require computers to understand their environment, including properties of objects in the environment (e.g., a wheel is round), relations between objects (e.g., two wheels are part of a bike, or a bike is slower than a car) and interactions of objects (e.g., a driver drives a car on the road). The goal of this dissertation is to investigate automated methods for acquisition of large-scale, semantically organized commonsense knowledge. Prior state-of-the-art methods to acquire commonsense are either not automated or based on shallow representations. Thus, they cannot produce large-scale, semantically organized commonsense knowledge. To achieve the goal, we divide the problem space into three research directions, constituting our core contributions: 1. Properties of objects: acquisition of properties like hasSize, hasShape, etc. We develop WebChild, a semi-supervised method to compile semantically organized properties. 2. Relationships between objects: acquisition of relations like largerThan, partOf, memberOf, etc. We develop CMPKB, a linear-programming based method to compile comparative relations, and, we develop PWKB, a method based on statistical and logical inference to compile part-whole relations. 3. Interactions between objects: acquisition of activities like drive a car, park a car, etc., with attributes such as temporal or spatial attributes. We develop Knowlywood, a method based on semantic parsing and probabilistic graphical models to compile activity knowledge. Together, these methods result in the construction of a large, clean and semantically organized Commonsense Knowledge Base that we call WebChild KB.Von Computern wird immer mehr erwartet, dass sie kluge Entscheidungen treffen können, basierend auf Allgemeinwissen. Dies setzt voraus, dass Computer ihre Umgebung, einschließlich der Eigenschaften von Objekten (z. B. das Rad ist rund), Beziehungen zwischen Objekten (z. B. ein Fahrrad hat zwei RĂ€der, ein Fahrrad ist langsamer als ein Auto) und Interaktionen von Objekten (z. B. ein Fahrer fĂ€hrt ein Auto auf der Straße), verstehen können. Das Ziel dieser Dissertation ist es, automatische Methoden fĂŒr die Erfassung von großmaßstĂ€blichem, semantisch organisiertem Allgemeinwissen zu schaffen. Dies ist schwierig aufgrund folgender Eigenschaften des Allgemeinwissens. Es ist: (i) implizit und spĂ€rlich, da Menschen nicht explizit das Offensichtliche ausdrĂŒcken, (ii) multimodal, da es ĂŒber textuelle und visuelle Inhalte verteilt ist, (iii) beeintrĂ€chtigt vom Einfluss des Berichtenden, da ungewöhnliche Fakten disproportional hĂ€ufig berichtet werden, (iv) KontextabhĂ€ngig, und hat aus diesem Grund eine eingeschrĂ€nkte statistische Konfidenz. Vorherige Methoden, auf diesem Gebiet sind entweder nicht automatisiert oder basieren auf flachen ReprĂ€sentationen. Daher können sie kein großmaßstĂ€bliches, semantisch organisiertes Allgemeinwissen erzeugen. Um unser Ziel zu erreichen, teilen wir den Problemraum in drei Forschungsrichtungen, welche den Hauptbeitrag dieser Dissertation formen: 1. Eigenschaften von Objekten: Erfassung von Eigenschaften wie hasSize, hasShape, usw. Wir entwickeln WebChild, eine halbĂŒberwachte Methode zum Erfassen semantisch organisierter Eigenschaften. 2. Beziehungen zwischen Objekten: Erfassung von Beziehungen wie largerThan, partOf, memberOf, usw. Wir entwickeln CMPKB, eine Methode basierend auf linearer Programmierung um vergleichbare Beziehungen zu erfassen. Weiterhin entwickeln wir PWKB, eine Methode basierend auf statistischer und logischer Inferenz welche zugehörigkeits Beziehungen erfasst. 3. Interaktionen zwischen Objekten: Erfassung von AktivitĂ€ten, wie drive a car, park a car, usw. mit temporalen und rĂ€umlichen Attributen. Wir entwickeln Knowlywood, eine Methode basierend auf semantischem Parsen und probabilistischen grafischen Modellen um AktivitĂ€tswissen zu erfassen. Als Resultat dieser Methoden erstellen wir eine große, saubere und semantisch organisierte Allgemeinwissensbasis, welche wir WebChild KB nennen

    Statistical Extraction of Multilingual Natural Language Patterns for RDF Predicates: Algorithms and Applications

    Get PDF
    The Data Web has undergone a tremendous growth period. It currently consists of more then 3300 publicly available knowledge bases describing millions of resources from various domains, such as life sciences, government or geography, with over 89 billion facts. In the same way, the Document Web grew to the state where approximately 4.55 billion websites exist, 300 million photos are uploaded on Facebook as well as 3.5 billion Google searches are performed on average every day. However, there is a gap between the Document Web and the Data Web, since for example knowledge bases available on the Data Web are most commonly extracted from structured or semi-structured sources, but the majority of information available on the Web is contained in unstructured sources such as news articles, blog post, photos, forum discussions, etc. As a result, data on the Data Web not only misses a significant fragment of information but also suffers from a lack of actuality since typical extraction methods are time-consuming and can only be carried out periodically. Furthermore, provenance information is rarely taken into consideration and therefore gets lost in the transformation process. In addition, users are accustomed to entering keyword queries to satisfy their information needs. With the availability of machine-readable knowledge bases, lay users could be empowered to issue more specific questions and get more precise answers. In this thesis, we address the problem of Relation Extraction, one of the key challenges pertaining to closing the gap between the Document Web and the Data Web by four means. First, we present a distant supervision approach that allows finding multilingual natural language representations of formal relations already contained in the Data Web. We use these natural language representations to find sentences on the Document Web that contain unseen instances of this relation between two entities. Second, we address the problem of data actuality by presenting a real-time data stream RDF extraction framework and utilize this framework to extract RDF from RSS news feeds. Third, we present a novel fact validation algorithm, based on natural language representations, able to not only verify or falsify a given triple, but also to find trustworthy sources for it on the Web and estimating a time scope in which the triple holds true. The features used by this algorithm to determine if a website is indeed trustworthy are used as provenance information and therewith help to create metadata for facts in the Data Web. Finally, we present a question answering system that uses the natural language representations to map natural language question to formal SPARQL queries, allowing lay users to make use of the large amounts of data available on the Data Web to satisfy their information need

    Commonsense knowledge acquisition and applications

    Get PDF
    Computers are increasingly expected to make smart decisions based on what humans consider commonsense. This would require computers to understand their environment, including properties of objects in the environment (e.g., a wheel is round), relations between objects (e.g., two wheels are part of a bike, or a bike is slower than a car) and interactions of objects (e.g., a driver drives a car on the road). The goal of this dissertation is to investigate automated methods for acquisition of large-scale, semantically organized commonsense knowledge. Prior state-of-the-art methods to acquire commonsense are either not automated or based on shallow representations. Thus, they cannot produce large-scale, semantically organized commonsense knowledge. To achieve the goal, we divide the problem space into three research directions, constituting our core contributions: 1. Properties of objects: acquisition of properties like hasSize, hasShape, etc. We develop WebChild, a semi-supervised method to compile semantically organized properties. 2. Relationships between objects: acquisition of relations like largerThan, partOf, memberOf, etc. We develop CMPKB, a linear-programming based method to compile comparative relations, and, we develop PWKB, a method based on statistical and logical inference to compile part-whole relations. 3. Interactions between objects: acquisition of activities like drive a car, park a car, etc., with attributes such as temporal or spatial attributes. We develop Knowlywood, a method based on semantic parsing and probabilistic graphical models to compile activity knowledge. Together, these methods result in the construction of a large, clean and semantically organized Commonsense Knowledge Base that we call WebChild KB.Von Computern wird immer mehr erwartet, dass sie kluge Entscheidungen treffen können, basierend auf Allgemeinwissen. Dies setzt voraus, dass Computer ihre Umgebung, einschließlich der Eigenschaften von Objekten (z. B. das Rad ist rund), Beziehungen zwischen Objekten (z. B. ein Fahrrad hat zwei RĂ€der, ein Fahrrad ist langsamer als ein Auto) und Interaktionen von Objekten (z. B. ein Fahrer fĂ€hrt ein Auto auf der Straße), verstehen können. Das Ziel dieser Dissertation ist es, automatische Methoden fĂŒr die Erfassung von großmaßstĂ€blichem, semantisch organisiertem Allgemeinwissen zu schaffen. Dies ist schwierig aufgrund folgender Eigenschaften des Allgemeinwissens. Es ist: (i) implizit und spĂ€rlich, da Menschen nicht explizit das Offensichtliche ausdrĂŒcken, (ii) multimodal, da es ĂŒber textuelle und visuelle Inhalte verteilt ist, (iii) beeintrĂ€chtigt vom Einfluss des Berichtenden, da ungewöhnliche Fakten disproportional hĂ€ufig berichtet werden, (iv) KontextabhĂ€ngig, und hat aus diesem Grund eine eingeschrĂ€nkte statistische Konfidenz. Vorherige Methoden, auf diesem Gebiet sind entweder nicht automatisiert oder basieren auf flachen ReprĂ€sentationen. Daher können sie kein großmaßstĂ€bliches, semantisch organisiertes Allgemeinwissen erzeugen. Um unser Ziel zu erreichen, teilen wir den Problemraum in drei Forschungsrichtungen, welche den Hauptbeitrag dieser Dissertation formen: 1. Eigenschaften von Objekten: Erfassung von Eigenschaften wie hasSize, hasShape, usw. Wir entwickeln WebChild, eine halbĂŒberwachte Methode zum Erfassen semantisch organisierter Eigenschaften. 2. Beziehungen zwischen Objekten: Erfassung von Beziehungen wie largerThan, partOf, memberOf, usw. Wir entwickeln CMPKB, eine Methode basierend auf linearer Programmierung um vergleichbare Beziehungen zu erfassen. Weiterhin entwickeln wir PWKB, eine Methode basierend auf statistischer und logischer Inferenz welche zugehörigkeits Beziehungen erfasst. 3. Interaktionen zwischen Objekten: Erfassung von AktivitĂ€ten, wie drive a car, park a car, usw. mit temporalen und rĂ€umlichen Attributen. Wir entwickeln Knowlywood, eine Methode basierend auf semantischem Parsen und probabilistischen grafischen Modellen um AktivitĂ€tswissen zu erfassen. Als Resultat dieser Methoden erstellen wir eine große, saubere und semantisch organisierte Allgemeinwissensbasis, welche wir WebChild KB nennen

    Ontology mapping with auxiliary resources

    Get PDF
    • 

    corecore