Vagueness and referential ambiguity in a large-scale annotated corpus
In this paper, we argue that difficulties in the definition of coreference itself contribute to lower inter-annotator agreement in certain cases. Data from a large referentially annotated corpus corroborate this point: a quantitative investigation assesses which effects or problems are likely to be the most prominent. Several examples where such problems occur are discussed in more detail. We then propose a generalisation of Poesio, Reyle and Stevenson's Justified Sloppiness Hypothesis to provide a unified model for these cases of disagreement, and argue that a deeper understanding of the phenomena involved allows problematic cases to be tackled in a more principled fashion than would be possible using only pre-theoretic intuitions.
Unifying context with labeled property graph: A pipeline-based system for comprehensive text representation in NLP
Extracting valuable insights from vast amounts of unstructured digital text presents significant challenges across diverse domains. This research addresses this challenge by proposing a novel pipeline-based system that generates domain-agnostic and task-agnostic text representations. The proposed approach leverages labeled property graphs (LPG) to encode contextual information, facilitating the integration of diverse linguistic elements into a unified representation. The proposed system enables efficient graph-based querying and manipulation by addressing the crucial aspect of comprehensive context modeling and fine-grained semantics. The effectiveness of the proposed system is demonstrated through the implementation of NLP components that operate on LPG-based representations. Additionally, the proposed approach introduces specialized patterns and algorithms to enhance specific NLP tasks, including nominal mention detection, named entity disambiguation, event enrichments, event participant detection, and temporal link detection. The evaluation of the proposed approach, using the MEANTIME corpus comprising manually annotated documents, provides encouraging results and valuable insights into the system's strengths. The proposed pipeline-based framework serves as a solid foundation for future research, aiming to refine and optimize LPG-based graph structures to generate comprehensive and semantically rich text representations, addressing the challenges associated with efficient information extraction and analysis in NLP.
The Influence of Conceptual Number in Coreference Establishing: An ERP Study on Brazilian and European Portuguese
Number agreement depends on two kinds of information, grammatical and conceptual, which generally converge. For collective nouns, however, syntactic and conceptual number do not match. When collective nouns are involved in coreference establishing, the pronoun agrees with the noun's conceptual number, thus creating a number disagreement (e.g. the band[SG] played last night. They[PL] were great). This PhD thesis investigates how conceptual number affects coreference establishing, exploring the phenomenon in both Brazilian Portuguese (partial pro-drop) and European Portuguese (pro-drop). We also investigate whether intra- and inter-sentential processing affects the way conceptual number influences coreference establishing.