303 research outputs found

    Annotating a Japanese text corpus with predicate-argument and coreference relations

    Get PDF
    In this paper, we discuss how to annotate coreference and predicate-argument relations in Japanese written text. There have been research activities for building Japanese text corpora annotated with coreference and predicate-argument relations as are done in the Kyoto Text Corpus version 4.0 (Kawahara et al., 2002) and the GDA-Tagged Corpus (Hasida, 2005). However, there is still much room for refining their specifications. For this reason, we discuss issues in annotating these two types of relations, and propose a new specification for each. In accordance with the specification, we built a large-scaled annotated corpus, and examined its reliability. As a result of our current work, we have released an annotated corpus named the NAIST Text Corpus1, which is used as the evaluation data set in the coreference and zero-anaphora resolution tasks in Iida et al. (2005) and Iida et al. (2006).

    Building a Diverse Document Leads Corpus Annotated with Semantic Relations

    Get PDF

    Utilizing Features of Verbs in Statistical Zero Pronoun Resolution for Japanese Speech

    Get PDF
    PACLIC 23 / City University of Hong Kong / 3-5 December 200

    Opinion Piece: Can we Fix the Scope for Coreference? Problems and Solutions for Benchmarks beyond OntoNotes

    Get PDF
    Current work on automatic coreference resolution has focused on the OntoNotes benchmark dataset, due to both its size and consistency. However many aspects of the OntoNotes annotation scheme are not well understood by NLP practitioners, including the treatment of generic NPs, noun modifiers, indefinite anaphora, predication and more. These often lead to counterintuitive claims, results and system behaviors. This opinion piece aims to highlight some of the problems with the OntoNotes rendition of coreference, and to propose a way forward relying on three principles: 1. a focus on semantics, not morphosyntax; 2. cross-linguistic generalizability; and 3. a separation of identity and scope, which can resolve old problems involving temporal and modal domain consistency

    ANCOR_Centre, a Large Free Spoken French Coreference Corpus: description of the Resource and Reliability Measures

    Get PDF
    International audienceThis article presents ANCOR_Centre, a French coreference corpus, available under the Creative Commons Licence. With a size of around 500,000 words, the corpus is large enough to serve the needs of data-driven approaches in NLP and represents one of the largest coreference resources currently available. The corpus focuses exclusively on spoken language, it aims at representing a certain variety of spoken genders. ANCOR_Centre includes anaphora as well as coreference relations which involve nominal and pronominal mentions. The paper describes into details the annotation scheme and the reliability measures computed on the resource

    Recipe instruction semantics corpus (RISeC) : resolving semantic structure and zero anaphora in recipes

    Get PDF
    We propose a newly annotated dataset for information extraction on recipes. Unlike previous approaches to machine comprehension of procedural texts, we avoid a priori pre-defining domain-specific predicates to recognize (e.g., the primitive instructionsin MILK) and focus on basic understanding of the expressed semantics rather than directly reduce them to a simplified state representation (e.g., ProPara). We thus frame the semantic comprehension of procedural text such as recipes, as fairly generic NLP subtasks, covering (i) entity recognition (ingredients, tools and actions), (ii) relation extraction (what ingredients and tools are involved in the actions), and (iii) zero anaphora resolution (link actions to implicit arguments, e.g., results from previous recipe steps). Further, our Recipe Instruction Semantic Corpus (RISeC) dataset includes textual descriptions for the zero anaphora, to facilitate language generation thereof. Besides the dataset itself, we contribute a pipeline neural architecture that addresses entity and relation extractionas well an identification of zero anaphora. These basic building blocks can facilitate more advanced downstream applications (e.g., question answering, conversational agents)
    • …
    corecore