5 research outputs found

    Joint Inference for Knowledge Base Population

    Full text link
    Populating Knowledge Base (KB) with new knowledge facts from reliable text resources usually consists of linking name mentions to KB entities and identifying relationship between entity pairs. However, the task often suffers from errors propagating from upstream entity linkers to downstream relation extractors. In this paper, we propose a novel joint inference framework to allow interactions between the two subtasks and find an optimal assignment by addressing the coherence among preliminary local predictions: whether the types of entities meet the expectations of relations explicitly or implicitly, and whether the local predictions are globally compatible. We further measure the confidence of the extracted triples by looking at the details of the complete extraction process. Experiments show that the proposed framework can significantly reduce the error propagations thus obtain more reliable facts, and outperforms competitive baselines with state-of-the-art relation extraction models. ? 2014 Association for Computational Linguistics.EI

    Grounding event references in news

    Get PDF
    Events are frequently discussed in natural language, and their accurate identification is central to language understanding. Yet they are diverse and complex in ontology and reference; computational processing hence proves challenging. News provides a shared basis for communication by reporting events. We perform several studies into news event reference. One annotation study characterises each news report in terms of its update and topic events, but finds that topic is better consider through explicit references to background events. In this context, we propose the event linking task which—analogous to named entity linking or disambiguation—models the grounding of references to notable events. It defines the disambiguation of an event reference as a link to the archival article that first reports it. When two references are linked to the same article, they need not be references to the same event. Event linking hopes to provide an intuitive approximation to coreference, erring on the side of over-generation in contrast with the literature. The task is also distinguished in considering event references from multiple perspectives over time. We diagnostically evaluate the task by first linking references to past, newsworthy events in news and opinion pieces to an archive of the Sydney Morning Herald. The intensive annotation results in only a small corpus of 229 distinct links. However, we observe that a number of hyperlinks targeting online news correspond to event links. We thus acquire two large corpora of hyperlinks at very low cost. From these we learn weights for temporal and term overlap features in a retrieval system. These noisy data lead to significant performance gains over a bag-of-words baseline. While our initial system can accurately predict many event links, most will require deep linguistic processing for their disambiguation

    Grounding event references in news

    Get PDF
    Events are frequently discussed in natural language, and their accurate identification is central to language understanding. Yet they are diverse and complex in ontology and reference; computational processing hence proves challenging. News provides a shared basis for communication by reporting events. We perform several studies into news event reference. One annotation study characterises each news report in terms of its update and topic events, but finds that topic is better consider through explicit references to background events. In this context, we propose the event linking task which—analogous to named entity linking or disambiguation—models the grounding of references to notable events. It defines the disambiguation of an event reference as a link to the archival article that first reports it. When two references are linked to the same article, they need not be references to the same event. Event linking hopes to provide an intuitive approximation to coreference, erring on the side of over-generation in contrast with the literature. The task is also distinguished in considering event references from multiple perspectives over time. We diagnostically evaluate the task by first linking references to past, newsworthy events in news and opinion pieces to an archive of the Sydney Morning Herald. The intensive annotation results in only a small corpus of 229 distinct links. However, we observe that a number of hyperlinks targeting online news correspond to event links. We thus acquire two large corpora of hyperlinks at very low cost. From these we learn weights for temporal and term overlap features in a retrieval system. These noisy data lead to significant performance gains over a bag-of-words baseline. While our initial system can accurately predict many event links, most will require deep linguistic processing for their disambiguation

    Enhancing knowledge acquisition systems with user generated and crowdsourced resources

    Get PDF
    This thesis is on leveraging knowledge acquisition systems with collaborative data and crowdsourcing work from internet. We propose two strategies and apply them for building effective entity linking and question answering (QA) systems. The first strategy is on integrating an information extraction system with online collaborative knowledge bases, such as Wikipedia and Freebase. We construct a Cross-Lingual Entity Linking (CLEL) system to connect Chinese entities, such as people and locations, with corresponding English pages in Wikipedia. The main focus is to break the language barrier between Chinese entities and the English KB, and to resolve the synonymy and polysemy of Chinese entities. To address those problems, we create a cross-lingual taxonomy and a Chinese knowledge base (KB). We investigate two methods of connecting the query representation with the KB representation. Based on our CLEL system participating in TAC KBP 2011 evaluation, we finally propose a simple and effective generative model, which achieved much better performance. The second strategy is on creating annotation for QA systems with the help of crowd- sourcing. Crowdsourcing is to distribute a task via internet and recruit a lot of people to complete it simultaneously. Various annotated data are required to train the data-driven statistical machine learning algorithms for underlying components in our QA system. This thesis demonstrates how to convert the annotation task into crowdsourcing micro-tasks, investigate different statistical methods for enhancing the quality of crowdsourced anno- tation, and ïŹnally use enhanced annotation to train learning to rank models for passage ranking algorithms for QA.Gegenstand dieser Arbeit ist das Nutzbarmachen sowohl von Systemen zur Wissener- fassung als auch von kollaborativ erstellten Daten und Arbeit aus dem Internet. Es werden zwei Strategien vorgeschlagen, welche fĂŒr die Erstellung effektiver Entity Linking (Disambiguierung von EntitĂ€tennamen) und Frage-Antwort Systeme eingesetzt werden. Die erste Strategie ist, ein Informationsextraktions-System mit kollaborativ erstellten Online- Datenbanken zu integrieren. Wir entwickeln ein Cross-Linguales Entity Linking-System (CLEL), um chinesische EntitĂ€ten, wie etwa Personen und Orte, mit den entsprechenden Wikipediaseiten zu verknĂŒpfen. Das Hauptaugenmerk ist es, die Sprachbarriere zwischen chinesischen EntitĂ€ten und englischer Datenbank zu durchbrechen, und Synonymie und Polysemie der chinesis- chen EntitĂ€ten aufzulösen. Um diese Probleme anzugehen, erstellen wir eine cross linguale Taxonomie und eine chinesische Datenbank. Wir untersuchen zwei Methoden, die ReprĂ€sentation der Anfrage und die ReprĂ€sentation der Datenbank zu verbinden. Schließlich stellen wir ein einfaches und effektives generatives Modell vor, das auf unserem System fĂŒr die Teilnahme an der TAC KBP 2011 Evaluation basiert und eine erheblich bessere Performanz erreichte. Die zweite Strategie ist, Annotationen fĂŒr Frage-Antwort-Systeme mit Hilfe von "Crowd- sourcing" zu erstellen. "Crowdsourcing" bedeutet, eine Aufgabe via Internet an eine große Menge an angeworbene Menschen zu verteilen, die diese simultan erledigen. Verschiedene annotierte Daten sind notwendig, um die datengetriebenen statistischen Lernalgorithmen zu trainieren, die unserem Frage-Antwort System zugrunde liegen. Wir zeigen, wie die Annotationsaufgabe in Mikro-Aufgaben fĂŒr das Crowdsourcing umgewan- delt werden kann, wir untersuchen verschiedene statistische Methoden, um die QualitĂ€t der Annotation aus dem Crowdsourcing zu erweitern, und schließlich nutzen wir die erwei- erte Annotation, um Modelle zum Lernen von Ranglisten von Textabschnitten zu trainieren
    corecore