7 research outputs found

    Implicit reference to citations: a study of astronomy

    Get PDF
    The research in this paper presents results in the automatic classification of pronouns within articles into those which refer to cited research and those which do not. It also discusses the automatic linking of pronouns which do refer to citations to their corresponding citations. The current study focused on the pronoun they as used in papers in Astronomy journals. The paper describes a classifier trained on maximum entropy principles using features defined by the distance to preceding citations and the category of verbs associated to the pronoun under consideration

    A Dutch coreference resolution system with an evaluation on literary fiction

    Get PDF
    Coreference resolution is the task of identifying descriptions that refer to the same entity. In this paper we consider the task of entity coreference resolution for Dutch with a particular focus on literary texts. We make three main contributions. First, we propose a simplified annotation scheme to reduce annotation effort. This scheme is used for the annotation of a corpus of 107k tokens from 21 contemporary works of literature. Second, we present a rule-based coreference resolution system for Dutch based on the Stanford deterministic multi-sieve coreference architecture and heuristic rules for quote attribution. Our system (dutchcoref) forms a simple but strong baseline and improves on previous systems in shared task evaluations. Finally, we perform an evaluation and error analysis on literary texts which highlights difficult cases of coreference in general, and the literary domain in particular. The code of our system is made available at https://github.com/andreasvc/dutchcoref

    A Dutch coreference resolution system with an evaluation on literary fiction

    Get PDF
    Coreference resolution is the task of identifying descriptions that refer to the same entity. In this paper we consider the task of entity coreference resolution for Dutch with a particular focus on literary texts. We make three main contributions. First, we propose a simplified annotation scheme to reduce annotation effort. This scheme is used for the annotation of a corpus of 107k tokens from 21 contemporary works of literature. Second, we present a rule-based coreference resolution system for Dutch based on the Stanford deterministic multi-sieve coreference architecture and heuristic rules for quote attribution. Our system (dutchcoref) forms a simple but strong baseline and improves on previous systems in shared task evaluations. Finally, we perform an evaluation and error analysis on literary texts which highlights difficult cases of coreference in general, and the literary domain in particular. The code of our system is made available at https://github.com/andreasvc/dutchcoref

    Automatic Detection of Nonreferential It in Spoken Multi-Party Dialog

    No full text
    We present an implemented machine learning system for the automatic detection of nonreferential it in spoken dialog

    Automatic detection of nonreferential it in spoken multi-party dialog

    No full text
    We present an implemented machine learning system for the automatic detection of nonreferential it in spoken dialog. The system builds on shallow features extracted from dialog transcripts. Our experiments indicate a level of performance that makes the system usable as a preprocessing filter for a coreference resolution system. We also report results of an annotation study dealing with the classification of it by naive subjects

    Automatic Detection of Nonreferential It in Spoken Multi-Party Dialog

    No full text
    We present an implemented machine learning system for the automatic detection of nonreferential it in spoken dialog. The system builds on shallow features extracted from dialog transcripts. Our experiments indicate a level of performance that makes the system usable as a preprocessing filter for a coreference resolution system. We also report results of an annotation study dealing with the classification of it by naive subjects.
    corecore