13 research outputs found
Enhancing knowledge acquisition systems with user generated and crowdsourced resources
This thesis leverages collaborative data and crowdsourced work from the internet to enhance
knowledge acquisition systems. We propose two strategies and apply them to building
effective entity linking and question answering (QA) systems.
The first strategy integrates an information extraction system with online collaborative
knowledge bases, such as Wikipedia and Freebase. We construct a Cross-Lingual Entity
Linking (CLEL) system to connect Chinese entities, such as people and locations, with their
corresponding English pages in Wikipedia.
The main focus is to break the language barrier between Chinese entities and the English
KB, and to resolve the synonymy and polysemy of Chinese entities. To address these
problems, we create a cross-lingual taxonomy and a Chinese knowledge base (KB). We
investigate two methods of connecting the query representation with the KB representation.
Based on our CLEL system's participation in the TAC KBP 2011 evaluation, we finally propose
a simple and effective generative model, which achieved much better performance.
The second strategy creates annotation for QA systems with the help of crowdsourcing.
Crowdsourcing distributes a task via the internet and recruits many people to complete it
simultaneously. Various annotated data are required to train the data-driven statistical
machine learning algorithms for the underlying components of our QA system. This thesis
demonstrates how to convert the annotation task into crowdsourcing micro-tasks,
investigates different statistical methods for enhancing the quality of crowdsourced
annotation, and finally uses the enhanced annotation to train learning-to-rank models for
passage ranking algorithms in QA.
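The abstract above mentions statistical methods for enhancing the quality of crowdsourced annotation; the simplest such method is majority voting over redundant worker labels. A minimal sketch with hypothetical item ids, labels, and function name — not the thesis's exact method:

```python
from collections import Counter

def aggregate_labels(worker_labels):
    """Majority vote: pick the most frequent label per item.

    worker_labels maps an item id to the list of labels collected
    from different crowd workers for that item.
    """
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in worker_labels.items()}

# Hypothetical passage-relevance judgments from three workers.
judgments = {
    "q1-p1": ["relevant", "relevant", "irrelevant"],
    "q1-p2": ["irrelevant", "irrelevant", "irrelevant"],
}
print(aggregate_labels(judgments))
# → {'q1-p1': 'relevant', 'q1-p2': 'irrelevant'}
```

More sophisticated aggregation (e.g. weighting workers by estimated reliability) follows the same shape: labels in, one consensus label per item out.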
PolyUCOMP in TAC 2011 entity linking and slot filling
The Text Analysis Conference (TAC) is organized by the U.S. National Institute of Standards and Technology (NIST).
UNIMIB@NEEL-IT: Named Entity Recognition and Linking of Italian Tweets
This paper describes the framework proposed by the UNIMIB team for the task of Named Entity Recognition and Linking of Italian Tweets (NEEL-IT). The proposed pipeline, which represents an entry-level system, is composed of three main steps: (1) Named Entity Recognition using Conditional Random Fields, (2) Named Entity Linking considering both supervised and neural-network language models, and (3) NIL clustering using a graph-based approach.
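The third pipeline step above groups unlinked (NIL) mentions. One simple graph-based realisation — a sketch under assumed string-similarity clustering, not necessarily the authors' exact method — takes connected components over a similarity graph of mention surface forms:

```python
from difflib import SequenceMatcher

def nil_clusters(mentions, threshold=0.8):
    """Group NIL mentions whose surface forms are similar, via
    connected components (union-find) over a similarity graph."""
    parent = list(range(len(mentions)))

    def find(i):
        # Path-halving union-find lookup.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Add an edge (union) for every sufficiently similar pair.
    for i in range(len(mentions)):
        for j in range(i + 1, len(mentions)):
            sim = SequenceMatcher(None, mentions[i].lower(),
                                  mentions[j].lower()).ratio()
            if sim >= threshold:
                parent[find(i)] = find(j)

    # Collect connected components.
    groups = {}
    for i, m in enumerate(mentions):
        groups.setdefault(find(i), []).append(m)
    return list(groups.values())

print(nil_clusters(["Mario Rossi", "mario rossi", "ACME SpA"]))
# → [['Mario Rossi', 'mario rossi'], ['ACME SpA']]
```

The threshold and similarity function are the obvious tuning points; on tweets, normalising hashtags and casing before comparison matters most.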
A Study on Candidate Retrieval and Ranking Methods for Entity Linking
Tohoku University, Kentaro Inui
Linking named entities to Wikipedia
Natural language is fraught with problems of ambiguity, including name reference. A name in text can refer to multiple entities, just as an entity can be known by different names. This thesis examines how a mention in text can be linked to an external knowledge base (KB), in our case Wikipedia. The named entity linking (NEL) task requires systems to identify the KB entry, or Wikipedia article, that a mention refers to, or, if the KB does not contain the correct entry, to return NIL. Entity linking systems can be complex, and we present a framework for analysing their different components, which we use to analyse three seminal systems evaluated on a common dataset; we show the importance of precise search for linking. The Text Analysis Conference (TAC) is a major venue for NEL research, and we report on our submissions to the entity linking shared task in 2010, 2011, and 2012. The information required to disambiguate entities is often found in the text, close to the mention. We explore apposition, a common way for authors to provide information about entities, and model its syntactic and semantic restrictions with a joint model that achieves state-of-the-art apposition extraction performance. Generalising from apposition, we examine local descriptions specified close to the mention. We add local description to our state-of-the-art linker by using patterns to extract the descriptions and matching against this restricted context. Not only does this make for a more precise match, we are also able to model failure to match. Local descriptions help disambiguate entities, further improving our state-of-the-art linker. The work in this thesis links textual entity mentions to knowledge bases. Linking is important for any task that uses external world knowledge, and resolving ambiguity is fundamental to advancing research into these problems.
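The NEL task described above — return the matching KB entry or NIL — can be illustrated with a toy linker. Everything here (the name-containment candidate search, the context-overlap score, the threshold) is an illustrative assumption, not the thesis's method:

```python
def link_mention(mention, context, kb, threshold=0.2):
    """Toy entity linker: among KB entries whose name contains the
    mention (crude candidate search), rank by overlap between the
    mention's context words and the entry's profile words; return
    "NIL" when no candidate clears the threshold."""
    best, best_score = "NIL", threshold
    for entity, profile in kb.items():
        if mention.lower() not in entity.lower():
            continue  # not a candidate for this mention
        score = len(context & profile) / max(len(profile), 1)
        if score >= best_score:
            best, best_score = entity, score
    return best

# Hypothetical KB profiles, e.g. salient words from each article.
kb = {
    "Paris (city)": {"france", "capital", "seine"},
    "Paris (mythology)": {"troy", "helen", "trojan"},
}
print(link_mention("Paris", {"france", "seine", "river"}, kb))
# → Paris (city)
```

The thesis's point about precise search shows up even here: if the candidate step admits the wrong entries, no amount of context scoring recovers.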
Entity-based Enrichment for Information Extraction and Retrieval
The goal of this work is to leverage cross-document entity relationships for improved understanding of queries and documents. We define an entity to be a thing or concept that exists in the world, such as a politician, a battle, a film, or a color. Entity-based enrichment (EBE) is a new expansion model for both queries and documents using features from similar entity mentions in the document collection and external knowledge resources. It uses task-specific features from entities beyond words, including name aliases, fine-grained entity types, categories, and relationships to other entities. EBE addresses the problem of sparse or noisy local evidence due to multiple topics, implicit context, or informal writing. With the ultimate goal of improving information retrieval effectiveness, we start from unstructured text and, through information extraction, build up rich entity-based representations linked to external knowledge resources. We study the application of entity-based enrichment to each step in the pipeline: 1) named entity recognition, 2) entity linking, and 3) ad hoc document retrieval. The empirical results for EBE in each of these tasks show significant improvements. Our first application of entity-based enrichment is the task of detecting entities in named entity recognition. We enrich the representation of observed words likely to represent entities, performing weighted feature copying of recognition features from similar tokens in the corpus and external collections. The evaluation shows statistically significant improvements on in-domain newswire accuracy and demonstrates that the models are more robust on out-of-domain data. In the second part of this work, we apply EBE to the task of entity linking. The proposed entity linking method for disambiguating the detected mentions to entries in an external knowledge base is based on information retrieval.
The neighborhood relevance model, an enrichment model, identifies salient associations between an entity mention and other entity mentions in the document. The results show that the enrichment model outperforms other context models and yields a system that is competitive with leading methods. Using the constructed entity representation, the final task is ad hoc document retrieval. We enrich the query representation with entity features. Retrieval is performed over documents annotated with entities linked to Wikipedia and Freebase by our system, as well as the publicly available Google FACC1 annotations. To effectively leverage linked entity features, we extend dependency-based retrieval models to include structured attributes. We also define a new query-specific entity context model, which builds a model for disambiguated entities from retrieved documents. Our results demonstrate significant improvements over competitive query expansion baselines for several standard test collections.
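The query-enrichment step described above can be sketched in miniature: expand the query terms with alias and type features of the entities linked in it. This is a simplified illustration under assumed data structures (real EBE weights expansion terms within a retrieval model rather than appending them):

```python
def enrich_query(query_terms, linked_entities, entity_features):
    """Expand a query with alias and type features of the entities
    linked in it (a simplified sketch of entity-based enrichment)."""
    expanded = list(query_terms)
    for ent in linked_entities:
        feats = entity_features.get(ent, {})
        expanded.extend(feats.get("aliases", []))
        expanded.extend(feats.get("types", []))
    return expanded

# Hypothetical feature store keyed by linked KB entity.
features = {
    "Barack Obama": {"aliases": ["Obama"], "types": ["politician"]},
}
print(enrich_query(["obama", "healthcare"], ["Barack Obama"], features))
# → ['obama', 'healthcare', 'Obama', 'politician']
```

In a real system the expansion terms would carry weights and field labels (alias vs. type vs. category) so the retrieval model can score them separately.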