Reference resolution in multi-modal interaction: Preliminary observations
In this paper we present our research on multimodal interaction in and with virtual environments. The aim of this presentation is to emphasize the need for more research on reference resolution in multimodal contexts. In multimodal interaction the human conversational partner can use more than one modality to convey his or her message to an environment in which a computer detects and interprets signals from different modalities. We show some naturally arising problems but do not give general solutions. Instead, we choose to perform more detailed research on reference resolution in unimodal contexts in order to obtain methods that generalize to multimodal contexts. Since we are building applications for a Dutch audience and hardly any research has been done on reference resolution for Dutch, we give results on the resolution of anaphoric and deictic references in Dutch texts. We hope to be able to extend these results to our multimodal contexts later.
Modelling the flow of discourse in a corpus of written academic English
Discourse studies attempt to describe how context affects text, and how text progresses from
one sentence to the next. Systemic Functional Linguistics (SFL) offers a model of language
to describe how information flow varies according to context and co-text through the Textual
metafunction, especially using the functions of Participant Identification and Tracking,
Theme and Information Structure. These systems were evaluated by assembling a corpus of
academic texts and assessing their information flow. Results of the analysis of the three
grammatical systems in the Textual Metafunction demonstrate significant patterns, or
unmarked choices, where the participant, thematic and information systems combine to
powerful effect. Where the systems are not aligned, there is a recognisable effect on the flow
of information
Corpora for Computational Linguistics
Since the mid-1990s corpora have become very important for computational linguistics. This paper offers a survey of how they are currently used in different fields of the discipline, with particular emphasis on anaphora and coreference resolution, automatic summarisation and term extraction.
Their influence on other fields is also briefly discussed.
A Survey on Semantic Processing Techniques
Semantic processing is a fundamental research domain in computational
linguistics. In the era of powerful pre-trained language models and large
language models, the advancement of research in this domain appears to be
decelerating. However, the study of semantics is multi-dimensional in
linguistics. The research depth and breadth of computational semantic
processing can be largely improved with new technologies. In this survey, we
analyzed five semantic processing tasks, namely word sense disambiguation,
anaphora resolution, named entity recognition, concept extraction, and
subjectivity detection. We study relevant theoretical research in these fields,
advanced methods, and downstream applications. We connect the surveyed tasks
with downstream applications because this may inspire future scholars to fuse
these low-level semantic processing tasks with high-level natural language
processing tasks. The review of theoretical research may also inspire new tasks
and technologies in the semantic processing domain. Finally, we compare the
different semantic processing techniques and summarize their technical trends,
application trends, and future directions.
Comment: Published at Information Fusion, Volume 101, 2024, 101988, ISSN
1566-2535. The equal contribution mark is missing from the published version due
to the publication policies. Please contact Prof. Erik Cambria for details.
Translation of Pronominal Anaphora between English and Spanish: Discrepancies and Evaluation
This paper evaluates the different tasks carried out in the translation of
pronominal anaphora in a machine translation (MT) system. The MT interlingua
approach named AGIR (Anaphora Generation with an Interlingua Representation)
improves upon other proposals presented to date because it is able to translate
intersentential anaphors, detect co-reference chains, and translate Spanish
zero pronouns into English---issues hardly considered by other systems. The
paper presents the resolution and evaluation of these anaphora problems in AGIR
with the use of different kinds of knowledge (lexical, morphological,
syntactic, and semantic). The translation of English and Spanish anaphoric
third-person personal pronouns (including Spanish zero pronouns) into the
target language has been evaluated on unrestricted corpora. We have obtained a
precision of 80.4% and 84.8% in the translation of Spanish and English
pronouns, respectively. Although we have only studied the Spanish and English
languages, our approach can be easily extended to other languages such as
Portuguese, Italian, or Japanese
Coreference chains in Czech, English and Russian: Preliminary findings
This article is a pilot comparative study of coreference chains in Czech, English and Russian. We analyzed 16 comparable texts in the three languages. Our motivation was to determine the linguistic structure of coreference chains in these languages and to identify which factors influence that structure.
Investigating Multilingual Coreference Resolution by Universal Annotations
Multilingual coreference resolution (MCR) has been a long-standing and
challenging task. With the newly proposed multilingual coreference dataset,
CorefUD (Nedoluzhko et al., 2022), we conduct an investigation into the task by
using its harmonized universal morphosyntactic and coreference annotations.
First, we study coreference by examining the ground truth data at different
linguistic levels, namely mention, entity and document levels, and across
different genres, to gain insights into the characteristics of coreference
across multiple languages. Second, we perform an error analysis of the most
challenging cases that the SotA system fails to resolve in the CRAC 2022 shared
task using the universal annotations. Last, based on this analysis, we extract
features from universal morphosyntactic annotations and integrate these
features into a baseline system to assess their potential benefits for the MCR
task. Our results show that our best configuration of features improves the
baseline by 0.9% F1 score.
Comment: Accepted at Findings of EMNLP202
Towards Multilingual Coreference Resolution
This work investigates the problems that occur when coreference resolution is treated as a multilingual task. We assess the issues that arise when a framework combining the mention-pair coreference resolution model with memory-based learning is used for the resolution process. Along the way, we revisit three essential subtasks of coreference resolution: mention detection, mention head detection and feature selection. For each of these subtasks we propose various multilingual solutions, including heuristic, rule-based and machine learning methods. We carry out a detailed analysis covering eight languages (Arabic, Catalan, Chinese, Dutch, English, German, Italian and Spanish), for which datasets were provided by the only two multilingual shared tasks on coreference resolution held so far: SemEval-2 and CoNLL-2012. Our investigation shows that, although complex, the coreference resolution task can be approached in a multilingual and even language-independent way. We propose machine learning methods for each of the subtasks affected by the transition, evaluate them, and compare them with rule-based and heuristic approaches. Our results confirm that machine learning provides the flexibility needed for the multilingual task and that the minimal requirement for a language-independent system is a part-of-speech annotation layer for each of the languages addressed. We also show that the performance of the system can be improved by introducing other layers of linguistic annotation, such as syntactic parses (either constituency or dependency parses), named entity information and predicate-argument structure. Additionally, we discuss the problems occurring in the proposed approaches and suggest possibilities for their improvement.