170 research outputs found
Three Algorithms for Competence-Oriented Anaphor Resolution
In the last decade, much effort went into the design of robust third-person pronominal anaphor resolution algorithms. Typical approaches are reported to achieve an accuracy of 60-85%. Recent research addresses the question of how to deal with the remaining difficult-toresolve anaphors. Lappin (2004) proposes a sequenced model of anaphor resolution according to which a cascade of processing modules employing knowledge and inferencing techniques of increasing complexity should be applied. The individual modules should only deal with and, hence, recognize the subset of anaphors for which they are competent. It will be shown that the problem of focusing on the competence cases is equivalent to the problem of giving precision precedence over recall. Three systems for high precision robust knowledge-poor anaphor resolution will be designed and compared: a ruleset-based approach, a salience threshold approach, and a machine-learning-based approach. According to corpus-based evaluation, there is no unique best approach. Which approach scores highest depends upon type of pronominal anaphor as well as upon text genre
Using \u27Low-cost\u27 Learning Features for Pronoun Resolution
PACLIC / The University of the Philippines Visayas Cebu College Cebu City, Philippines / November 20-22, 200
Comparing knowledge sources for nominal anaphora resolution
We compare two ways of obtaining lexical knowledge for antecedent selection in other-anaphora
and definite noun phrase coreference. Specifically, we compare an algorithm that relies on links
encoded in the manually created lexical hierarchy WordNet and an algorithm that mines corpora
by means of shallow lexico-semantic patterns. As corpora we use the British National
Corpus (BNC), as well as the Web, which has not been previously used for this task. Our
results show that (a) the knowledge encoded in WordNet is often insufficient, especially for
anaphor-antecedent relations that exploit subjective or context-dependent knowledge; (b) for
other-anaphora, the Web-based method outperforms the WordNet-based method; (c) for definite
NP coreference, the Web-based method yields results comparable to those obtained using
WordNet over the whole dataset and outperforms the WordNet-based method on subsets of the
dataset; (d) in both case studies, the BNC-based method is worse than the other methods because
of data sparseness. Thus, in our studies, the Web-based method alleviated the lexical knowledge
gap often encountered in anaphora resolution, and handled examples with context-dependent relations
between anaphor and antecedent. Because it is inexpensive and needs no hand-modelling
of lexical knowledge, it is a promising knowledge source to integrate in anaphora resolution systems
Keyword Detection in Text Summarization
Summarization is the process of reducing a text document in order to create a summary that retains the most important points of the original document. As the problem of information overload has grown, and as the quantity of data has increased, so has interest in automatic summarization. Extractive summary works on the given text to extract sentences that best convey the message hidden in the text. Most extractive summarization techniques revolve around the concept of indexing keywords and extracting sentences that have more keywords than the rest. Keyword extraction usually is done by extracting important words having a higher frequency than others, with stress on important. However the current techniques to handle this importance include a stop list which might include words that are critically important to the text. In this thesis, I present a work in progress to define an algorithm to extract truly significant keywords which might have lost its significance if subjected to the current keyword extraction algorithms
- …