694 research outputs found
Anaphora Resolution with Real Preprocessing
In this paper we focus on anaphora resolution for German, a highly inflected language which also allows for closed form compounds (i.e. compounds without spaces). Especially, we describe a system that only uses real preprocessing components, e.g. a dependency parser, a two-level morphological analyser etc. We trace the performance drop occurring under these conditions back to underspecification and ambiguity at the morphological level. A demanding subtask of anaphora resolution are the so-called bridging anaphora, a special variant of nominal anaphora where the heads of the coreferent noun phrases do not match. We experiment with two different resources in order to find out how to cope best with this problem
Real Anaphora Resolution is Hard
We introduce a system for anaphora resolution for German that uses various resources in order to develop a real system as opposed to systems based on idealized assumptions, e.g. the use of true mentions only or perfect parse trees and perfect morphology. The components that we use to replace such idealizations comprise a full-fledged morphology, a Wikipedia-based named entity recognition, a rule-based dependency parser and a German wordnet. We show that under these conditions coreference resolution is (at least for German) still far from being perfect
Evaluating anaphora and coreference resolution to improve automatic keyphrase extraction
In this paper we analyze the effectiveness of using linguistic knowledge from coreference and
anaphora resolution for improving the performance for supervised keyphrase extraction. In order
to verify the impact of these features, we de\ufb01ne a baseline keyphrase extraction system and
evaluate its performance on a standard dataset using different machine learning algorithms. Then,
we consider new sets of features by adding combinations of the linguistic features we propose
and we evaluate the new performance of the system. We also use anaphora and coreference
resolution to transform the documents, trying to simulate the cohesion process performed by the
human mind. We found that our approach has a slightly positive impact on the performance of
automatic keyphrase extraction, in particular when considering the ranking of the results
Resolving pronominal anaphora using commonsense knowledge
Coreference resolution is the task of resolving all expressions in a text that refer to the same entity. Such expressions are often used in writing and speech as shortcuts to avoid repetition. The most frequent form of coreference is the anaphor. To resolve anaphora not only grammatical and syntactical strategies are required, but also semantic approaches should be taken into consideration. This dissertation presents a framework for automatically resolving pronominal anaphora by integrating recent findings from the field of linguistics with new semantic features. Commonsense knowledge is the routine knowledge people have of the everyday world. Because such knowledge is widely used it is frequently omitted from social communications such as texts. It is understandable that without this knowledge computers will have difficulty making sense of textual information. In this dissertation a new set of computational and linguistic features are used in a supervised learning approach to resolve the pronominal anaphora in document. Commonsense knowledge sources such as ConceptNet and WordNet are used and similarity measures are extracted to uncover the elaborative information embedded in the words that can help in the process of anaphora resolution. The anaphoric system is tested on 350 Wall Street Journal articles from the BBN corpus. When compared with other systems available such as BART (Versley et al. 2008) and Charniak and Elsner 2009, our system performed better and also resolved a much wider range of anaphora. We were able to achieve a 92% F-measure on the BBN corpus and an average of 85% F-measure when tested on other genres of documents such as children stories and short stories selected from the web
Ontologies and Information Extraction
This report argues that, even in the simplest cases, IE is an ontology-driven
process. It is not a mere text filtering method based on simple pattern
matching and keywords, because the extracted pieces of texts are interpreted
with respect to a predefined partial domain model. This report shows that
depending on the nature and the depth of the interpretation to be done for
extracting the information, more or less knowledge must be involved. This
report is mainly illustrated in biology, a domain in which there are critical
needs for content-based exploration of the scientific literature and which
becomes a major application domain for IE
- …