3 research outputs found

    Slot Filling

    Slot filling (SF) is the task of automatically extracting facts about particular entities from unstructured text and populating a knowledge base (KB) with these facts. These structured KBs enable applications such as structured web queries and question answering. SF is typically framed as a query-oriented setting of the related task of relation extraction. Throughout this thesis, we reflect on how SF is a task with many distinct problems. We demonstrate that recall is a major limiting factor on SF system performance. We contribute an analysis of typical SF recall loss, and find that a substantial amount of loss occurs early in the SF pipeline. We confirm that accurate NER and coreference resolution are required for high-recall SF. We measure upper bounds using a naïve graph-based semi-supervised bootstrapping technique, and find that only 39% of results are reachable using a typical feature space. We expect that this graph-based technique will be directly useful for extraction, and this leads us to frame SF as a label propagation task. We focus on a detailed graph representation of the task which reflects the behaviour and assumptions we want to model based on our analysis, including modifying the label propagation process to model multiple types of label interaction. Analysing the graph, we find that a large number of errors occur in very close proximity to training data, and identify that this is a major concern for propagation. While some conflicts are caused by a lack of sufficient disambiguating context (we explore adding additional contextual features to address this), many are caused by subtle annotation problems. We find that the lack of a standard for how explicit expressions of relations must be in text makes consistent annotation difficult: using a strict definition of explicitness results in 20% of correct annotations being removed from a standard dataset. We contribute several annotation-driven analyses of this problem, exploring the definition of slots and the effect of the missing definition of explicitness: annotation schemas do not specify how explicit expressions of relations need to be, leaving large scope for disagreement between annotators. Additionally, applications may require relatively strict or relaxed evidence for extractions, but this is not considered in annotation tasks. We demonstrate that annotators frequently disagree on instances, depending on differences in annotators' world knowledge and their thresholds for making probabilistic inferences. SF is fundamental to enabling many knowledge-based applications, and this work motivates modelling and evaluating SF to better target these tasks.
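
The abstract frames SF as label propagation over a graph of candidate fillers. As a rough, purely illustrative sketch of that idea (the node construction, the feature-similarity weights, and the clamping scheme here are assumptions, not the thesis's exact formulation):

```python
import numpy as np

def propagate_labels(W, Y_seed, n_iter=50, alpha=0.9):
    """Iterative label propagation on a weighted graph.

    W      -- (n, n) symmetric edge weights (feature similarity between nodes)
    Y_seed -- (n, k) one-hot labels for seed nodes, all zeros for unlabeled ones
    alpha  -- how strongly neighbours influence a node vs. its own seed label
    """
    # Row-normalise the weights so each node averages over its neighbours.
    deg = W.sum(axis=1, keepdims=True)
    P = W / np.maximum(deg, 1e-12)

    Y = Y_seed.copy()
    for _ in range(n_iter):
        # Blend propagated neighbour labels with the clamped seed labels.
        Y = alpha * (P @ Y) + (1 - alpha) * Y_seed
    return Y.argmax(axis=1)

# Toy usage: 4 candidate fillers, 2 slot labels, one seed node per label.
W = np.array([[0.0, 1.0, 0.2, 0.0],
              [1.0, 0.0, 0.1, 0.0],
              [0.2, 0.1, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
Y_seed = np.array([[1.0, 0.0],   # node 0 is a seed for label 0
                   [0.0, 0.0],
                   [0.0, 0.0],
                   [0.0, 1.0]])  # node 3 is a seed for label 1
print(propagate_labels(W, Y_seed))  # e.g. [0 0 1 1]
```

Seed labels are re-clamped every iteration, so annotated nodes keep pulling their neighbourhood toward their label while unlabeled nodes take on the consensus of their neighbours.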

    Deep learning methods for knowledge base population

    Knowledge bases store structured information about entities or concepts of the world and can be used in various applications, such as information retrieval or question answering. A major drawback of existing knowledge bases is their incompleteness. In this thesis, we explore deep learning methods for automatically populating them from text, addressing the following tasks: slot filling, uncertainty detection and type-aware relation extraction. Slot filling aims at extracting information about entities from a large text corpus. The Text Analysis Conference provides new evaluation data for this task each year in the context of an international shared task. We develop a modular system to address this challenge; it was one of the top-ranked systems in the 2015 shared task evaluation. For its slot filler classification module, we propose contextCNN, a convolutional neural network based on context splitting. It improves the performance of the slot filling system by 5.0% micro F1 and 2.9% macro F1. To train our binary and multiclass classification models, we create a dataset using distant supervision and reduce the number of noisy labels with a self-training strategy. For model optimization and evaluation, we automatically extract a labeled benchmark for slot filler classification from the manual shared task assessments from 2012 to 2014. We show that results on this benchmark correlate with slot filling pipeline results, with a Pearson correlation coefficient of 0.89 on 2013 data and 0.82 on 2014 data. The combination of patterns, support vector machines and contextCNN achieves the best results on the benchmark, with a micro F1 of 51% and a macro F1 of 53% on the test set. Finally, we analyze the results of the slot filling pipeline and the impact of its components. For knowledge base population, it is essential to assess the factuality of the statements extracted from text: from the sentence "Obama was rumored to be born in Kenya", a system should not conclude that Kenya is Obama's place of birth. Therefore, we address uncertainty detection in the second part of this thesis. We investigate attention-based models and make a first attempt to systematize the attention design space. Moreover, we propose novel attention variants: external attention, which incorporates an external knowledge source; k-max average attention, which considers only the vectors with the k largest attention weights; and sequence-preserving attention, which retains order information. Our convolutional neural network with external k-max average attention sets a new state of the art on a Wikipedia benchmark dataset with an F1 score of 68%. To the best of our knowledge, we are the first to integrate an uncertainty detection component into a slot filling pipeline; it improves precision by 1.4% and micro F1 by 0.4%. In the last part of the thesis, we investigate type-aware relation extraction with neural networks. We compare different models for joint entity and relation classification: pipeline models, jointly trained models and globally normalized models based on structured prediction. First, we show that using entity class prediction scores instead of binary decisions helps relation classification. Second, joint training clearly outperforms pipeline models on a large-scale distantly supervised dataset with fine-grained entity classes, improving the area under the precision-recall curve from 0.53 to 0.66. Third, we propose a model with a structured prediction output layer that globally normalizes the score of a triple consisting of the classes of two entities and the relation between them; it improves relation extraction results by 4.4% F1 on a manually labeled benchmark dataset. Our analysis shows that the model learns correct correlations between entity and relation classes. Finally, we are the first to use neural networks for joint entity and relation classification in a slot filling pipeline: the jointly trained model achieves the best micro F1 (22%), while the neural structured prediction model performs best in terms of macro F1 (25%).
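
To make the k-max average attention variant concrete, here is a minimal numpy sketch under stated assumptions: attention scores come from a simple dot product with a query vector, are softmax-normalised, and only the k positions with the largest weights contribute to the pooled representation. The scoring function and weighting details are illustrative, not the thesis's exact definition.

```python
import numpy as np

def k_max_average_attention(H, query, k=3):
    """H: (seq_len, dim) hidden vectors; query: (dim,) attention query vector."""
    scores = H @ query                       # unnormalised attention scores
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                 # softmax attention weights

    top_k = np.argsort(weights)[-k:]         # positions of the k largest weights
    # Pool only the k selected vectors, weighted by their (renormalised) attention.
    selected = H[top_k] * weights[top_k, None]
    return selected.sum(axis=0) / weights[top_k].sum()

# Toy usage: a sequence of 6 hidden states of dimension 4.
rng = np.random.default_rng(0)
H = rng.normal(size=(6, 4))
query = rng.normal(size=4)
print(k_max_average_attention(H, query, k=2))  # a single pooled 4-d vector
```

Restricting the average to the top-k positions keeps the pooling focused on a few cue words (e.g. hedges such as "rumored") rather than diluting them over the whole sequence.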


    Recherche d'information et fouille de textes

    Introduction. Understanding a text is a goal that Artificial Intelligence (AI) has pursued since its beginnings, and the first work offering answers appeared in the 1970s. The topic has remained current ever since, although the goals and methods it covers have evolved considerably. It is therefore necessary to look more closely at what lies behind the general label of "text comprehension". The earliest work, carried out from the mid-1970s to the mid-1980s [Charniak 1972; Dyer 1983; Schank et al. 1977], studied texts relating short stories, and understanding meant bringing out the ins and outs of the story (the topics addressed, the events described, the causal relations linking them) as well as the role of each character, their motivations and their intentions. Comprehension was seen as an inference process aiming to make explicit everything left implicit in a text, by recovering it from the semantic and pragmatic knowledge available to the machine. This presupposed that such knowledge had been modelled beforehand. This connects with work on the various knowledge representation formalisms in AI, describing on the one hand the meanings associated with the words of the language (semantic networks vs. logic, and notably conceptual graphs [Sowa 1984]) and on the other hand pragmatic knowledge [Schank 1982]. All of this work showed its limits as soon as such knowledge had to be modelled manually for every domain or learned automatically, so the problem of automatic open-domain comprehension remained unsolved. Since the problem as posed is intractable given the current state of knowledge, an alternative approach is to redefine it and decompose it into subtasks that are potentially easier to solve. Text comprehension can thus be redefined according to different views of the text, each answering specific needs. Just as a reader does not read a text in the same way depending on whether they want to assess its relevance to a topic of interest (a document retrieval task), classify documents, learn about the events reported, or look for a specific piece of information, so automatic processes will be varied and will focus on different aspects of the text according to the target task. Depending on the type of knowledge sought in a document, the reader extracts from the text only the information of interest, relying on the cues and knowledge that allow them to carry out their reading task, and hence their comprehension, without having to assimilate everything. One can then speak of comprehension at variable depths, which gives access to different levels of meaning. This approach is well illustrated by work on information extraction, evaluated in the MUC conferences [Grishman and Sundheim 1996], which ran from the late 1980s until 1998. Information extraction then consisted of modelling an information need as a template, described by a set of typed attributes, and trying to fill those attributes from the information contained in the texts. This is notably how research on "named entities" (i.e., the identification of names of persons, organisations, places, dates, etc.) and on the relations between these entities developed. It is also in this perspective that document-level approaches developed, whether for information retrieval or for determining document structure.
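
As a purely hypothetical illustration of that template-filling view of information extraction (the scenario, slot names and types below are invented for illustration, not taken from an actual MUC specification):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class SuccessionTemplate:
    """A hypothetical MUC-style template: an information need as typed slots."""
    organization: Optional[str] = None   # ORGANIZATION entity
    person_in: Optional[str] = None      # PERSON taking up the post
    person_out: Optional[str] = None     # PERSON leaving the post
    post: Optional[str] = None           # job title
    date: Optional[str] = None           # DATE of the event
    evidence: List[str] = field(default_factory=list)  # supporting sentences

# An extraction system would fill the slots from text such as:
# "On Monday, Acme Corp. named Jane Doe chief executive, replacing John Smith."
t = SuccessionTemplate(organization="Acme Corp.", person_in="Jane Doe",
                       person_out="John Smith", post="chief executive",
                       date="Monday")
print(t)
```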