Multi-document person name resolution

Abstract

Multi-document person name resolution focuses on the problem of determining whether two instances with the same name, drawn from different documents, refer to the same individual. We present a two-step approach in which a Maximum Entropy model is first trained to estimate the probability that two names refer to the same individual. We then apply a modified agglomerative clustering technique to partition the instances according to their referents.

1 Intro

Artists and philosophers have long noted that multiple distinct entities are often referred to by one and the same name (Cohen and Cohen, 1998; Martinich, 2000). Recently, this referential ambiguity of names has become an increasing concern for computational linguists as well. As the Internet grows in size and coverage, it becomes less and less likely that a single name will refer to the same individual on two different web sites. This poses a great challenge to information retrieval (IR) and question-answering (QA) applications, which often rely on little data when responding to user queries. Another area in which referential ambiguity is problematic is the automatic population of ontologies with instances. For such tasks, concept-instance pairs (such as Paul Simon/pop star) are extracted from the web, cleaned of noise, and then inserted into an already existing ontology. Th
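The two-step approach described in the abstract (a pairwise Maximum Entropy model followed by clustering) can be sketched as below. This is an illustration under stated assumptions, not the authors' implementation. The two pair features (context-word overlap and whether the extracted concept labels match) are hypothetical. The binary MaxEnt model is fit as the equivalent L2-regularized logistic regression by plain gradient ascent. The paper's "modified" agglomerative clustering is not described in this excerpt, so ordinary average-link clustering stands in for it, stopping once no pair of clusters averages at least 0.5.

```python
import math
from itertools import combinations


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def prob_same(w, x):
    # P(same individual | pair features x); the last weight is the bias.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def train_maxent(X, y, epochs=2000, lr=0.5, l2=0.01):
    """Fit a binary maximum-entropy (logistic regression) model by batch
    gradient ascent on the L2-regularized mean log-likelihood.
    The bias (last weight) is not regularized."""
    n_feat = len(X[0])
    w = [0.0] * (n_feat + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_feat + 1)
        for xi, yi in zip(X, y):
            err = yi - prob_same(w, xi)  # observed minus expected label
            for j in range(n_feat):
                grad[j] += err * xi[j]
            grad[-1] += err
        for j in range(n_feat + 1):
            penalty = l2 * w[j] if j < n_feat else 0.0
            w[j] += lr * (grad[j] / len(X) - penalty)
    return w


def agglomerate(n, pair_prob, threshold=0.5):
    """Average-link agglomerative clustering of instances 0..n-1.
    pair_prob maps (i, j) with i < j to P(same individual).
    Stops when the best cluster pair averages below threshold."""
    clusters = [{i} for i in range(n)]
    while len(clusters) > 1:
        best, best_pair = -1.0, None
        for a, b in combinations(range(len(clusters)), 2):
            scores = [pair_prob[(min(i, j), max(i, j))]
                      for i in clusters[a] for j in clusters[b]]
            avg = sum(scores) / len(scores)
            if avg > best:
                best, best_pair = avg, (a, b)
        if best < threshold:
            break
        a, b = best_pair
        clusters[a] |= clusters[b]  # a < b, so deleting b keeps index a valid
        del clusters[b]
    return clusters


# Hypothetical pair features: [context-word overlap, concept labels match].
X_train = [[0.9, 1.0], [0.8, 1.0], [0.6, 1.0],
           [0.2, 0.0], [0.1, 0.0], [0.0, 0.0]]
y_train = [1, 1, 1, 0, 0, 0]
w = train_maxent(X_train, y_train)

# Four "Paul Simon" mentions: 0 and 1 look like the pop star, 2 and 3 like the politician.
pair_features = {
    (0, 1): [0.8, 1.0], (2, 3): [0.7, 1.0],
    (0, 2): [0.1, 0.0], (0, 3): [0.0, 0.0],
    (1, 2): [0.2, 0.0], (1, 3): [0.1, 0.0],
}
pair_prob = {k: prob_same(w, v) for k, v in pair_features.items()}
result = sorted(sorted(c) for c in agglomerate(4, pair_prob))
print(result)  # [[0, 1], [2, 3]]
```

Keeping the two steps separate mirrors the abstract: the classifier only scores pairs of instances, and the clustering step turns those pairwise scores into a consistent partition. A pairwise model alone could not guarantee that partition, since it may link A to B and B to C without linking A to C.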
