77 research outputs found
Text as scene: discourse deixis and bridging relations
En este artículo se presenta un nuevo marco, “el texto como escena”, que establece
las bases para la anotación de dos relaciones de correferencia: la deixis discursiva y las
relaciones de bridging. La incorporación de lo que llamamos escenas textuales y contextuales
proporciona unas directrices de anotación más flexibles, que diferencian claramente entre tipos
de categorías generales. Un marco como éste, capaz de tratar la deixis discursiva y las
relaciones de bridging desde una perspectiva común, tiene como objetivo mejorar el bajo grado
de acuerdo entre anotadores obtenido por esquemas de anotación anteriores, que son incapaces
de captar las referencias vagas inherentes a estos dos tipos de relaciones. Las directrices aquí
presentadas completan el esquema de anotación diseñado para enriquecer el corpus español
CESS-ECE con información correferencial y así construir el corpus CESS-Ancora.This paper presents a new framework, “text as scene”, which lays the foundations for
the annotation of two coreferential links: discourse deixis and bridging relations. The
incorporation of what we call textual and contextual scenes provides more flexible annotation
guidelines, broad type categories being clearly differentiated. Such a framework that is capable
of dealing with discourse deixis and bridging relations from a common perspective aims at
improving the poor reliability scores obtained by previous annotation schemes, which fail to
capture the vague references inherent in both these links. The guidelines presented here
complete the annotation scheme designed to enrich the Spanish CESS-ECE corpus with
coreference information, thus building the CESS-Ancora corpus.This paper has been supported by the FPU
grant (AP2006-00994) from the Spanish
Ministry of Education and Science. It is based
on work supported by the CESS-ECE
(HUM2004-21127), Lang2World (TIN2006-
15265-C06-06), and Praxem (HUM2006-
27378-E) projects
Generating Abstractive Summaries from Meeting Transcripts
Summaries of meetings are very important as they convey the essential content
of discussions in a concise form. Generally, it is time consuming to read and
understand the whole documents. Therefore, summaries play an important role as
the readers are interested in only the important context of discussions. In
this work, we address the task of meeting document summarization. Automatic
summarization systems on meeting conversations developed so far have been
primarily extractive, resulting in unacceptable summaries that are hard to
read. The extracted utterances contain disfluencies that affect the quality of
the extractive summaries. To make summaries much more readable, we propose an
approach to generating abstractive summaries by fusing important content from
several utterances. We first separate meeting transcripts into various topic
segments, and then identify the important utterances in each segment using a
supervised learning approach. The important utterances are then combined
together to generate a one-sentence summary. In the text generation step, the
dependency parses of the utterances in each segment are combined together to
create a directed graph. The most informative and well-formed sub-graph
obtained by integer linear programming (ILP) is selected to generate a
one-sentence summary for each topic segment. The ILP formulation reduces
disfluencies by leveraging grammatical relations that are more prominent in
non-conversational style of text, and therefore generates summaries that is
comparable to human-written abstractive summaries. Experimental results show
that our method can generate more informative summaries than the baselines. In
addition, readability assessments by human judges as well as log-likelihood
estimates obtained from the dependency parser show that our generated summaries
are significantly readable and well-formed.Comment: 10 pages, Proceedings of the 2015 ACM Symposium on Document
Engineering, DocEng' 201
Text as Scene: Discourse Deixis and Bridging Relations
This paper presents a new framework, "text as scene", which lays the foundations for the annotation of two coreferential links: discourse deixis and bridging relations. The incorporation of what we call textual and contextual scenes provides more flexible annotation guidelines, broad type categories being clearly differentiated. Such a framework that is capable of dealing with discourse deixis and bridging relations from a common perspective aims at improving the poor reliability scores obtained by previous annotation schemes, which fail to capture the vague references inherent in both these links. The guidelines presented here complete the annotation scheme designed to enrich the Spanish CESS-ECE corpus with coreference information, thus building the CESS-Ancora corpus
Review of coreference resolution in English and Persian
Coreference resolution (CR) is one of the most challenging areas of natural
language processing. This task seeks to identify all textual references to the
same real-world entity. Research in this field is divided into coreference
resolution and anaphora resolution. Due to its application in textual
comprehension and its utility in other tasks such as information extraction
systems, document summarization, and machine translation, this field has
attracted considerable interest. Consequently, it has a significant effect on
the quality of these systems. This article reviews the existing corpora and
evaluation metrics in this field. Then, an overview of the coreference
algorithms, from rule-based methods to the latest deep learning techniques, is
provided. Finally, coreference resolution and pronoun resolution systems in
Persian are investigated.Comment: 44 pages, 11 figures, 5 table
A Survey on Semantic Processing Techniques
Semantic processing is a fundamental research domain in computational
linguistics. In the era of powerful pre-trained language models and large
language models, the advancement of research in this domain appears to be
decelerating. However, the study of semantics is multi-dimensional in
linguistics. The research depth and breadth of computational semantic
processing can be largely improved with new technologies. In this survey, we
analyzed five semantic processing tasks, e.g., word sense disambiguation,
anaphora resolution, named entity recognition, concept extraction, and
subjectivity detection. We study relevant theoretical research in these fields,
advanced methods, and downstream applications. We connect the surveyed tasks
with downstream applications because this may inspire future scholars to fuse
these low-level semantic processing tasks with high-level natural language
processing tasks. The review of theoretical research may also inspire new tasks
and technologies in the semantic processing domain. Finally, we compare the
different semantic processing techniques and summarize their technical trends,
application trends, and future directions.Comment: Published at Information Fusion, Volume 101, 2024, 101988, ISSN
1566-2535. The equal contribution mark is missed in the published version due
to the publication policies. Please contact Prof. Erik Cambria for detail
'Healthy' Coreference: Applying Coreference Resolution to the Health Education Domain
This thesis investigates coreference and its resolution within the domain of health education. Coreference is the relationship between two linguistic expressions that refer to the same real-world entity, and resolution involves identifying this relationship among sets of referring expressions. The coreference resolution task is considered among the most difficult of problems in Artificial Intelligence; in some cases, resolution is impossible even for humans. For example, "she" in the sentence "Lynn called Jennifer while she was on vacation" is genuinely ambiguous: the vacationer could be either Lynn or Jennifer.
There are three primary motivations for this thesis. The first is that health education has never before been studied in this context. So far, the vast majority of coreference research has focused on news. Secondly, achieving domain-independent resolution is unlikely without understanding the extent to which coreference varies across different genres. Finally, coreference pervades language and is an essential part of coherent discourse. Its effective use is a key component of easy-to-understand health education materials, where readability is paramount.
No suitable corpus of health education materials existed, so our first step was to create one. The comprehensive analysis of this corpus, which required manual annotation of coreference, confirmed our hypothesis that the coreference used in health education differs substantially from that in previously studied domains. This analysis was then used to shape the design of a knowledge-lean algorithm for resolving coreference. This algorithm performed surprisingly well on this corpus, e.g., successfully resolving over 85% of all pronouns when evaluated on unseen data.
Despite the importance of coreferentially annotated corpora, only a handful are known to exist, likely because of the difficulty and cost of reliably annotating coreference. The paucity of genres represented in these existing annotated corpora creates an implicit bias in domain-independent coreference resolution. In an effort to address these issues, we plan to make our health education corpus available to the wider research community, hopefully encouraging a broader focus in the future
- …