Generation of anaphors in Chinese

Author: Yeh Ching-Long
Publication venue: The University of Edinburgh
Publication date: 01/01/1996

The goal of this thesis is to investigate the computer generation of various kinds of anaphors in Chinese, including zero, pronominal and nominal anaphors, from the semantic representation of multisentential text. The work is divided into two steps: the first is to investigate linguistic behaviour of Chinese anaphora, and the other is to implement the result of the first part in a Chinese natural language generation system to see how it works. The first step is in general to construct a set of rules governing the use of all kinds of anaphors. To achieve this, we performed a sequence of experiments in a stepwise refined manner. In the experiments, we examined the occurrence of anaphors in human-generated text and those generated by algorithms employing the rules, assuming the same semantic and discourse structures as the text. We started by distinguishing between the use of zero and other anaphors, termed non-zeroes. Then we performed experiments to distinguish between pronouns and nominal anaphors within the non-zeroes. Finally, we refined the previous result to consider different kinds of descriptions for nominal anaphors. In this research we confine ourselves to descriptive texts. Three sets of test data consisting of scientific questions and answers and an introduction to Chinese grammar were selected. The rules we obtained from the experiments make use of the following conditions: locality between anaphor and antecedent, syntactic constraints on zero anaphors, discourse segment structures, salience of objects and animacy of objects. The results show that the anaphors generated by using the rules we obtained are very close to those in the real texts. To carry out the second step, we built up a Chinese natural language generation system which is able to generate descriptive texts. The system is divided into a strategic and a tactical component. The strategic component arranges message contents in response to the input goal into a well-organised hierarchical discourse structure by using a text planner. The tactical component takes the hierarchical discourse structure as input and produces surface sentences with punctuation marks inserted appropriately. Within the tactical component, the first task consists of linearising in depth-first order the message units in the discourse structure and mapping them into syntactic-oriented representations. Referring expressions, the main concern in this thesis, are generated within the mapping process. A linguistic realisation program is then invoked to convert the syntactic representation into surface strings in Chinese. After the implementation, we sent some generated texts to a number of native speakers of Chinese and compared human-created results and computer-generated text to investigate the quality of the generated anaphors. The results of the comparison show that the rules we obtained are effective in dealing with the generation of anaphors in Chinese.

Edinburgh Research Archive

A Survey on Semantic Processing Techniques

Author: Cambria Erik
Chen Guanyi
He Kai
Mao Rui
Ni Jinjie
Yang Zonglin
Zhang Xulang
Publication date: 22/10/2023

Semantic processing is a fundamental research domain in computational linguistics. In the era of powerful pre-trained language models and large language models, the advancement of research in this domain appears to be decelerating. However, the study of semantics is multi-dimensional in linguistics. The research depth and breadth of computational semantic processing can be largely improved with new technologies. In this survey, we analyzed five semantic processing tasks, namely word sense disambiguation, anaphora resolution, named entity recognition, concept extraction, and subjectivity detection. We study relevant theoretical research in these fields, advanced methods, and downstream applications. We connect the surveyed tasks with downstream applications because this may inspire future scholars to fuse these low-level semantic processing tasks with high-level natural language processing tasks. The review of theoretical research may also inspire new tasks and technologies in the semantic processing domain. Finally, we compare the different semantic processing techniques and summarize their technical trends, application trends, and future directions. Comment: Published at Information Fusion, Volume 101, 2024, 101988, ISSN 1566-2535. The equal contribution mark is missing in the published version due to the publication policies. Please contact Prof. Erik Cambria for details.

arXiv.org e-Print Archive

Centering, Anaphora Resolution, and Discourse Structure

Author: Walker Marilyn A.
Publication date: 11/08/1997

Centering was formulated as a model of the relationship between attentional state, the form of referring expressions, and the coherence of an utterance within a discourse segment (Grosz, Joshi and Weinstein, 1986; Grosz, Joshi and Weinstein, 1995). In this chapter, I argue that the restriction of centering to operating within a discourse segment should be abandoned in order to integrate centering with a model of global discourse structure. The within-segment restriction causes three problems. The first problem is that centers are often continued over discourse segment boundaries with pronominal referring expressions whose form is identical to those that occur within a discourse segment. The second problem is that recent work has shown that listeners perceive segment boundaries at various levels of granularity. If centering models a universal processing phenomenon, it is implausible that each listener is using a different centering algorithm. The third issue is that even for utterances within a discourse segment, there are strong contrasts between utterances whose adjacent utterance within a segment is hierarchically recent and those whose adjacent utterance within a segment is linearly recent. This chapter argues that these problems can be eliminated by replacing Grosz and Sidner's stack model of attentional state with an alternate model, the cache model. I show how the cache model is easily integrated with the centering algorithm, and provide several types of data from naturally occurring discourses that support the proposed integrated model. Future work should provide additional support for these claims with an examination of a larger corpus of naturally occurring discourses. Comment: 35 pages, uses elsart12, lingmacros, named, psfi

arXiv.org e-Print Archive

CiteSeerX

Using Zero Anaphora Resolution to Improve Text Categorization

Author: Chen Yi-Chun
Yeh Ching-Long
Publication venue: COLIPS PUBLICATIONS
Publication date: 01/01/2003

Waseda University Repository

A Framework for Interpreting Bridging Anaphora

Author: C. Butnariu
D. Bean
D. Ó Séaghdha
I. Hendrickx
J. Levi
J.N. Levi
J.R. Hobbs
K. Fraurud
M. Lauer
M. Poesio
P. Downing
R. Girju
R. Vieira
S. Tratz
S.-N. Kim
S.N. Kim
T. Sanders
Publication venue: Springer
Publication date: 01/01/2013

In this paper we present a novel framework for resolving bridging anaphora. We argue that anaphora, particularly bridging anaphora, is used as a shortcut device similar to the use of compound nouns. Hence, the two natural language usage phenomena would have to be based on the same theoretical framework. We use an existing theory on compound nouns to test its validity for anaphora usages. To do this, we used human annotators to interpret indirect anaphora from naturally occurring discourses. The annotators were asked to classify the relations between anaphor-antecedent pairs into relation types that have been previously used to describe the relations between a modifier and the head noun of a compound noun. We obtained very encouraging results with an average Fleiss's κ value of 0.66 for inter-annotator agreement. The results were evaluated against other similar natural language interpretation annotation experiments and were found to compare well. In order to determine the prevalence of the proposed set of anaphora relations we did a detailed analysis of a subset of 20 newspaper articles. The results obtained from this also indicated that a majority (98%) of the relations could be described by the relations in the framework. The results from this analysis also showed the distribution of the relation types in the genre of newspaper article discourses.
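
The agreement statistic reported in the abstract above can be illustrated with a minimal sketch of the standard Fleiss's κ formula. This is not the authors' code: the function name `fleiss_kappa`, the input layout (one row per annotated item, one column per relation type, each cell holding the number of annotators who chose that type) and the toy data are all assumptions made for illustration. The paper's annotation data is not available here, so the reported value of 0.66 is not reproduced.

```python
from typing import Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss's kappa for a table of category counts.

    counts[i][j] is the number of annotators who assigned item i to
    category j (hypothetical layout). Every item must be rated by the
    same number of annotators, and that number must be at least 2.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if n_raters < 2:
        raise ValueError("need at least two annotators per item")
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item needs the same number of ratings")
    n_categories = len(counts[0])

    # Share of all assignments that went to each category.
    p_cat = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per item: fraction of annotator pairs that agree.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_observed = sum(p_item) / n_items
    p_expected = sum(p * p for p in p_cat)

    if p_expected == 1.0:
        # Only one category was ever used, so kappa is undefined;
        # returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Toy data (invented): 4 anaphor-antecedent pairs, 3 annotators,
# 3 hypothetical relation types.
toy = [
    [3, 0, 0],
    [0, 3, 0],
    [2, 1, 0],
    [0, 1, 2],
]
print(round(fleiss_kappa(toy), 3))
```

Values near 1 indicate agreement well above chance and values near 0 indicate chance-level agreement. How a score such as 0.66 should be interpreted depends on the number of categories and on the task.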

Crossref

AUT Scholarly Commons