Generation of anaphors in Chinese
The goal of this thesis is to investigate the computer generation of various kinds of
anaphors in Chinese, including zero, pronominal and nominal anaphors, from the semantic
representation of multisentential text. The work is divided into two steps: the
first is to investigate the linguistic behaviour of Chinese anaphora, and the second is to
implement the results of the first step in a Chinese natural language generation system
to see how well they work. The first step is in general to construct a set of rules governing the use of all kinds
of anaphors. To achieve this, we performed a sequence of experiments in a stepwise
refined manner. In the experiments, we examined the occurrence of anaphors in human-generated
text and those generated by algorithms employing the rules, assuming the
same semantic and discourse structures as the text. We started by distinguishing
between the use of zero and other anaphors, termed non-zeroes. Then we performed
experiments to distinguish between pronouns and nominal anaphors within the non-zeroes.
Finally, we refined the previous result to consider different kinds of descriptions
for nominal anaphors. In this research we confine ourselves to descriptive texts. Three
sets of test data consisting of scientific questions and answers and an introduction to
Chinese grammar were selected. The rules we obtained from the experiments make
use of the following conditions: locality between anaphor and antecedent, syntactic
constraints on zero anaphors, discourse segment structures, salience of objects and
animacy of objects. The results show that the anaphors generated by using the rules
we obtained are very close to those in the real texts. To carry out the second step, we built up a Chinese natural language generation system
which is able to generate descriptive texts. The system is divided into a strategic and
a tactical component. The strategic component arranges message contents in response
to the input goal into a well-organised hierarchical discourse structure by using a
text planner. The tactical component takes the hierarchical discourse structure as
input and produces surface sentences with punctuation marks inserted appropriately.
Within the tactical component, the first task consists of linearising in depth-first order
the message units in the discourse structure and mapping them into syntactic-oriented
representations. Referring expressions, the main concern in this thesis, are generated
within the mapping process. A linguistic realisation program is then invoked to convert
the syntactic representation into surface strings in Chinese. After the implementation, we sent some generated texts to a number of native speakers of Chinese and compared the human-created results with the computer-generated text to
investigate the quality of the generated anaphors. The results of the comparison show
that the rules we obtained are effective in dealing with the generation of anaphors in
Chinese.
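The depth-first linearisation step described in the abstract above can be illustrated with a minimal sketch. The thesis does not give its data structures here, so the `DiscourseNode` class, its fields, and the `linearise` function are hypothetical names chosen for illustration only; the sketch shows the traversal order, not the author's implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DiscourseNode:
    """Hypothetical node of a hierarchical discourse structure (an assumption)."""
    relation: str = ""                      # label of an internal node, e.g. "elaboration"
    message: Optional[str] = None           # message unit carried by a leaf
    children: List["DiscourseNode"] = field(default_factory=list)


def linearise(node: DiscourseNode) -> List[str]:
    """Return the message units of the tree in depth-first, left-to-right order."""
    if not node.children:
        # A leaf contributes its message unit, if it has one.
        return [node.message] if node.message is not None else []
    ordered: List[str] = []
    for child in node.children:
        ordered.extend(linearise(child))
    return ordered


# Illustrative tree; the relation labels and message names are made up.
tree = DiscourseNode(
    relation="elaboration",
    children=[
        DiscourseNode(message="m1"),
        DiscourseNode(
            relation="sequence",
            children=[DiscourseNode(message="m2"), DiscourseNode(message="m3")],
        ),
        DiscourseNode(message="m4"),
    ],
)
print(linearise(tree))
```

In a system of the kind described, each message unit in the resulting list would then be mapped to a syntactic representation, which is where the abstract says referring expressions are generated.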
A Survey on Semantic Processing Techniques
Semantic processing is a fundamental research domain in computational
linguistics. In the era of powerful pre-trained language models and large
language models, the advancement of research in this domain appears to be
decelerating. However, the study of semantics is multi-dimensional in
linguistics. The research depth and breadth of computational semantic
processing can be largely improved with new technologies. In this survey, we
analyze five semantic processing tasks, namely word sense disambiguation,
anaphora resolution, named entity recognition, concept extraction, and
subjectivity detection. We study relevant theoretical research in these fields,
advanced methods, and downstream applications. We connect the surveyed tasks
with downstream applications because this may inspire future scholars to fuse
these low-level semantic processing tasks with high-level natural language
processing tasks. The review of theoretical research may also inspire new tasks
and technologies in the semantic processing domain. Finally, we compare the
different semantic processing techniques and summarize their technical trends,
application trends, and future directions. Comment: Published at Information Fusion, Volume 101, 2024, 101988, ISSN
1566-2535. The equal contribution mark is missing in the published version due
to the publication policies. Please contact Prof. Erik Cambria for details.
Centering, Anaphora Resolution, and Discourse Structure
Centering was formulated as a model of the relationship between attentional
state, the form of referring expressions, and the coherence of an utterance
within a discourse segment (Grosz, Joshi and Weinstein, 1986; Grosz, Joshi and
Weinstein, 1995). In this chapter, I argue that the restriction of centering to
operating within a discourse segment should be abandoned in order to integrate
centering with a model of global discourse structure. The within-segment
restriction causes three problems. The first problem is that centers are often
continued over discourse segment boundaries with pronominal referring
expressions whose form is identical to those that occur within a discourse
segment. The second problem is that recent work has shown that listeners
perceive segment boundaries at various levels of granularity. If centering
models a universal processing phenomenon, it is implausible that each listener
is using a different centering algorithm. The third problem is that even for
utterances within a discourse segment, there are strong contrasts between
utterances whose adjacent utterance within a segment is hierarchically recent
and those whose adjacent utterance within a segment is linearly recent. This
chapter argues that these problems can be eliminated by replacing Grosz and
Sidner's stack model of attentional state with an alternate model, the cache
model. I show how the cache model is easily integrated with the centering
algorithm, and provide several types of data from naturally occurring
discourses that support the proposed integrated model. Future work should
provide additional support for these claims with an examination of a larger
corpus of naturally occurring discourses. Comment: 35 pages, uses elsart12, lingmacros, named, psfi
A Framework for Interpreting Bridging Anaphora
In this paper we present a novel framework for resolving bridging anaphora. We argue that anaphora, particularly bridging anaphora, is used as a shortcut device similar to the use of compound nouns. Hence, the two natural language usage phenomena would have to be based on the same theoretical framework. We use an existing theory on compound nouns to test its validity for anaphora usages. To do this, we used human annotators to interpret indirect anaphora from naturally occurring discourses. The annotators were asked to classify the relations between anaphor-antecedent pairs into relation types that have been previously used to describe the relations between a modifier and the head noun of a compound noun. We obtained very encouraging results, with an average Fleiss's κ value of 0.66 for inter-annotator agreement. The results were evaluated against other similar natural language interpretation annotation experiments and were found to compare well. In order to determine the prevalence of the proposed set of anaphora relations, we did a detailed analysis of a subset of 20 newspaper articles. The results obtained from this also indicated that a majority (98%) of the relations could be described by the relations in the framework. The results from this analysis also showed the distribution of the relation types in the genre of newspaper article discourses.
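For readers unfamiliar with the agreement statistic reported in the abstract above, the following is a minimal sketch of the standard Fleiss's kappa computation. The function name `fleiss_kappa` and the input layout are assumptions made for illustration; the example counts are invented and are not the paper's annotation data.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss's kappa for a table where counts[i][j] is the number of
    annotators who assigned item i to category j."""
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("at least two annotators per item are required")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must be rated by the same number of annotators")
    n_categories = len(counts[0])

    # Share of all assignments that fall into each category.
    p_j = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement on each item.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_items          # mean observed agreement
    p_e = sum(p * p for p in p_j)       # agreement expected by chance
    if p_e == 1.0:
        raise ValueError("kappa is undefined when all ratings fall in one category")
    return (p_bar - p_e) / (1.0 - p_e)


# Invented toy data: 2 items, 3 annotators, 2 relation types.
perfect = [[3, 0], [0, 3]]
split = [[2, 1], [1, 2]]
print(fleiss_kappa(perfect), fleiss_kappa(split))
```

Values near 1 indicate agreement well above chance, and values at or below 0 indicate agreement no better than chance.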