9 research outputs found

    Populating knowledge bases with temporal information

    Get PDF
    Recent progress in information extraction has enabled the automatic construction of large knowledge bases. Knowledge bases contain millions of entities (e.g. persons, organizations, events, etc.), their semantic classes, and facts about them. Knowledge bases have become a great asset for semantic search, entity linking, deep analytics, and question answering. However, a common limitation of current knowledge bases is the poor coverage of temporal knowledge. First of all, so far, knowledge bases have focused on popular events and ignored long tail events such as political scandals, local festivals, or protests. Secondly, they do not cover the textual phrases denoting events and temporal facts at all. The goal of this dissertation, thus, is to automatically populate knowledge bases with this kind of temporal knowledge. The dissertation makes the following contributions to address the afore mentioned limitations. The first contribution is a method for extracting events from news articles. The method reconciles the extracted events into canonicalized representations and organizes them into fine-grained semantic classes. The second contribution is a method for mining the textual phrases denoting the events and facts. The method infers the temporal scopes of these phrases and maps them to a knowledge base. Our experimental evaluations demonstrate that our methods yield high quality output compared to state-of- the-art approaches, and can indeed populate knowledge bases with temporal knowledge.Der Fortschritt in der Informationsextraktion ermöglicht heute das automatischen Erstellen von Wissensbasen. Derartige Wissensbasen enthalten Entitäten wie Personen, Organisationen oder Events sowie Informationen über diese und deren semantische Klasse. Automatisch generierte Wissensbasen bilden eine wesentliche Grundlage für das semantische Suchen, das Verknüpfen von Entitäten, die Textanalyse und für natürlichsprachliche Frage-Antwortsysteme. Eine Schwäche aktueller Wissensbasen ist jedoch die unzureichende Erfassung von temporalen Informationen. Wissenbasen fokussieren in erster Linie auf populäre Events und ignorieren weniger bekannnte Events wie z.B. politische Skandale, lokale Veranstaltungen oder Demonstrationen. Zudem werden Textphrasen zur Bezeichung von Events und temporalen Fakten nicht erfasst. Ziel der vorliegenden Arbeit ist es, Methoden zu entwickeln, die temporales Wissen au- tomatisch in Wissensbasen integrieren. Dazu leistet die Dissertation folgende Beiträge: 1. Die Entwicklung einer Methode zur Extrahierung von Events aus Nachrichtenartikeln sowie deren Darstellung in einer kanonischen Form und ihrer Einordnung in detaillierte semantische Klassen. 2. Die Entwicklung einer Methode zur Gewinnung von Textphrasen, die Events und Fakten in Wissensbasen bezeichnen sowie einer Methode zur Ableitung ihres zeitlichen Verlaufs und ihrer Dauer. Unsere Experimente belegen, dass die von uns entwickelten Methoden zu qualitativ deutlich besseren Ausgabewerten führen als bisherige Verfahren und Wissensbasen tatsächlich um temporales Wissen erweitern können

    Extraction of Temporal Facts and Events from Wikipedia

    No full text
    Recently, large-scale knowledge bases have been constructed by automatically extracting relational facts from text. Unfortunately, most of the current knowledge bases focus on static facts and ignore the temporal dimension. However, the vast majority of facts are evolving with time or are valid only during a particular time period. Thus, time is a significant dimension that should be included in knowledge bases. In this paper, we introduce a complete information extraction framework that harvests temporal facts and events from semi-structured data and free text of Wikipedia articles to create a temporal ontology. First, we extend a temporal data representation model by making it aware of events. Second, we develop an information extraction method which harvests temporal facts and events from Wikipedia infoboxes, categories, lists, and article titles in order to build a temporal knowledge base. Third, we show how the system can use its extracted knowledge for further growing the knowledge base. We demonstrate the effectiveness of our proposed methods through several experiments. We extracted more than one million temporal facts with precision over 90 % for extraction from semi-structured data and almost 70 % for extraction from text

    EVIN: building a knowledge base of events

    No full text
    We present EVIN: a system that extracts named events from news articles, reconciles them into canonicalized events, and organizes them into semantic classes to populate a knowl-edge base. EVIN exploits different kinds of similarity mea-sures among news, referring to textual contents, entity oc-currences, and temporal ordering. These similarities are captured in a multi-view attributed graph. To distill canon-icalized events, EVIN coarsens the graph by iterative merg-ing based on a judiciously designed loss function. To infer semantic classes of events, EVIN uses statistical language models. EVIN provides a GUI that allows users to query the constructed knowledge base of events, and to explore it in a visual manner

    Inside YAGO2s: A Transparent Information Extraction Architecture

    No full text
    YAGO[9, 6] is one of the largest public ontologies constructed by information extraction. In a recent refactoring called YAGO2s, the system has been given a modular and completely transparent architecture. In this demo, users can see how more than 30 individual modules of YAGO work in parallel to extract facts, to check facts for their correctness, to deduce facts, and to merge facts from different sources. A GUI allows users to play with different input files, to trace the provenance of individual facts to their sources, to change deduction rules, and to run individual extractors. Users can see step by step how the extractors work together to combine the individual facts to the coherent whole of the YAGO ontology

    YAGO2s: Modular High-Quality Information Extraction with an Application to Flight Planning

    No full text
    Abstract: In this paper, we present YAGO2s, the new edition of the YAGO ontology [SKW07, HSBW12]. The software architecture has been refactored from scratch, yielding a design that modularizes both code and data. This modularization enables us to add in new data sources more easily, while still maintaining the high accuracy and coherence of the ontology. Thus, we believe that YAGO2s occupies a sweetspot between a centralized design and a completely distributed design. In this demo, we present an application of this design to the task of planning a flight. Our proposed system finds flights between all airports close to the departure city to all airports close to the destination city. 1 Knowledge Base Construction In recent years, many projects have successfully created large-scale knowledge bases (KBs) in an automated fashion. The KBs contain millions of entities (such as rivers, universities, people, and movies), and millions of facts about them (such as who acted in which movie, which river is located in which country, etc.). There are several strategies to build such KBs. One strategy is to accumulate and reconcile as much data as possible
    corecore