
303 research outputs found

Annotating a Japanese text corpus with predicate-argument and coreference relations

Author: Kentaro Inui
Mamoru Komachi
Ryu Iida
Yuji Matsumoto
Publication date: 01/01/2007

In this paper, we discuss how to annotate coreference and predicate-argument relations in Japanese written text. There have been research activities for building Japanese text corpora annotated with coreference and predicate-argument relations, as in the Kyoto Text Corpus version 4.0 (Kawahara et al., 2002) and the GDA-Tagged Corpus (Hasida, 2005). However, there is still much room for refining their specifications. For this reason, we discuss issues in annotating these two types of relations, and propose a new specification for each. In accordance with the specification, we built a large-scale annotated corpus and examined its reliability. As a result of our current work, we have released an annotated corpus named the NAIST Text Corpus, which is used as the evaluation data set in the coreference and zero-anaphora resolution tasks in Iida et al. (2005) and Iida et al. (2006).

CiteSeerX

Crossref

Building a Diverse Document Leads Corpus Annotated with Semantic Relations

Author: Hangyo Masatsugu
Kawahara Daisuke
Kurohashi Sadao
Publication venue: 'Faculty of Computer Science, Universitas Indonesia'
Publication date: 01/01/2012

Waseda University Repository

Utilizing Features of Verbs in Statistical Zero Pronoun Resolution for Japanese Speech

Author: Nagata Masaaki
Yoshida Sen
Publication venue: City University of Hong Kong
Publication date: 01/01/2009

PACLIC 23 / City University of Hong Kong / 3-5 December 200

Waseda University Repository

Opinion Piece: Can we Fix the Scope for Coreference? Problems and Solutions for Benchmarks beyond OntoNotes

Author: Zeldes Amir
Publication venue: University of Illinois at Chicago Library
Publication date: 15/04/2022

Current work on automatic coreference resolution has focused on the OntoNotes benchmark dataset, due to both its size and consistency. However, many aspects of the OntoNotes annotation scheme are not well understood by NLP practitioners, including the treatment of generic NPs, noun modifiers, indefinite anaphora, predication and more. These often lead to counterintuitive claims, results and system behaviors. This opinion piece aims to highlight some of the problems with the OntoNotes rendition of coreference, and to propose a way forward relying on three principles: 1. a focus on semantics, not morphosyntax; 2. cross-linguistic generalizability; and 3. a separation of identity and scope, which can resolve old problems involving temporal and modal domain consistency.

University of Illinois at Chicago: Journals@UIC

ANCOR_Centre, a Large Free Spoken French Coreference Corpus: description of the Resource and Reliability Measures

Author: Antoine Jean-Yves
Eshkol Iris
Lefeuvre Anaïs
Maurel Denis
Muzerelle Judith
Pelletier Aurore
Schang Emmanuel
Villaneau Jeanne
Publication venue: HAL CCSD
Publication date: 26/05/2014

This article presents ANCOR_Centre, a French coreference corpus, available under the Creative Commons Licence. With a size of around 500,000 words, the corpus is large enough to serve the needs of data-driven approaches in NLP and represents one of the largest coreference resources currently available. The corpus focuses exclusively on spoken language and aims to represent a certain variety of spoken genres. ANCOR_Centre includes anaphora as well as coreference relations which involve nominal and pronominal mentions. The paper describes in detail the annotation scheme and the reliability measures computed on the resource.

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL Université de Tours

HAL-Rennes 1

Recipe instruction semantics corpus (RISeC) : resolving semantic structure and zero anaphora in recipes

Author: Deleu Johannes
Demeester Thomas
Develder Chris
Jiang Yiwei
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2020

We propose a newly annotated dataset for information extraction on recipes. Unlike previous approaches to machine comprehension of procedural texts, we avoid pre-defining domain-specific predicates to recognize (e.g., the primitive instructions in MILK) and focus on basic understanding of the expressed semantics rather than directly reducing them to a simplified state representation (e.g., ProPara). We thus frame the semantic comprehension of procedural text, such as recipes, as fairly generic NLP subtasks, covering (i) entity recognition (ingredients, tools and actions), (ii) relation extraction (what ingredients and tools are involved in the actions), and (iii) zero anaphora resolution (linking actions to implicit arguments, e.g., results from previous recipe steps). Further, our Recipe Instruction Semantic Corpus (RISeC) dataset includes textual descriptions for the zero anaphora, to facilitate language generation thereof. Besides the dataset itself, we contribute a pipeline neural architecture that addresses entity and relation extraction as well as identification of zero anaphora. These basic building blocks can facilitate more advanced downstream applications (e.g., question answering, conversational agents).

Ghent University Academic Bibliography