118,613 research outputs found

Generation of anaphors in Chinese

Author: Yeh Ching-Long
Publication venue: The University of Edinburgh
Publication date: 01/01/1996

The goal of this thesis is to investigate the computer generation of various kinds of anaphors in Chinese, including zero, pronominal and nominal anaphors, from the semantic representation of multisentential text. The work is divided into two steps: the first is to investigate linguistic behaviour of Chinese anaphora, and the other is to implement the result of the first part in a Chinese natural language generation system to see how it works. The first step is in general to construct a set of rules governing the use of all kinds of anaphors. To achieve this, we performed a sequence of experiments in a stepwise refined manner. In the experiments, we examined the occurrence of anaphors in human-generated text and those generated by algorithms employing the rules, assuming the same semantic and discourse structures as the text. We started by distinguishing between the use of zero and other anaphors, termed non-zeroes. Then we performed experiments to distinguish between pronouns and nominal anaphors within the non-zeroes. Finally, we refined the previous result to consider different kinds of descriptions for nominal anaphors. In this research we confine ourselves to descriptive texts. Three sets of test data consisting of scientific questions and answers and an introduction to Chinese grammar were selected. The rules we obtained from the experiments make use of the following conditions: locality between anaphor and antecedent, syntactic constraints on zero anaphors, discourse segment structures, salience of objects and animacy of objects. The results show that the anaphors generated by using the rules we obtained are very close to those in the real texts. To carry out the second step, we built up a Chinese natural language generation system which is able to generate descriptive texts. The system is divided into a strategic and a tactical component. The strategic component arranges message contents in response to the input goal into a well-organised hierarchical discourse structure by using a text planner. The tactical component takes the hierarchical discourse structure as input and produces surface sentences with punctuation marks inserted appropriately. Within the tactical component, the first task consists of linearising in depth-first order the message units in the discourse structure and mapping them into syntactic-oriented representations. Referring expressions, the main concern in this thesis, are generated within the mapping process. A linguistic realisation program is then invoked to convert the syntactic representation into surface strings in Chinese. After the implementation, we sent some generated texts to a number of native speakers of Chinese and compared human-created results and computer-generated text to investigate the quality of the generated anaphors. The results of the comparison show that the rules we obtained are effective in dealing with the generation of anaphors in Chinese.

Edinburgh Research Archive

Guess who? Multilingual approach for the automated generation of author-stylized poetry

Author: Tikhonov Alexey
Yamshchikov Ivan P.
Publication date: 17/09/2018

This paper addresses the problem of stylized text generation in a multilingual setup. A version of a language model based on a long short-term memory (LSTM) artificial neural network with extended phonetic and semantic embeddings is used for stylized poetry generation. The quality of the resulting poems generated by the network is estimated through bilingual evaluation understudy (BLEU), a survey, and a new cross-entropy based metric that is suggested for problems of this type. The experiments show that the proposed model consistently outperforms random sample and vanilla-LSTM baselines; humans also tend to associate machine-generated texts with the target author.

arXiv.org e-Print Archive

LCSTS: A Large Scale Chinese Short Text Summarization Dataset

Author: Chen Qingcai
Hu Baotian
Zhu Fangze
Publication date: 04/12/2015

Automatic text summarization is widely regarded as a highly difficult problem, partially because of the lack of large text summarization datasets. Due to the great challenge of constructing large-scale summaries for full texts, in this paper we introduce a large-scale Chinese short text summarization dataset constructed from the Chinese microblogging website Sina Weibo, which is released to the public (http://icrc.hitsz.edu.cn/Article/show/139.html). This corpus consists of over 2 million real Chinese short texts with short summaries given by the author of each text. We also manually tagged the relevance of 10,666 short summaries with their corresponding short texts. Based on the corpus, we introduce a recurrent neural network for summary generation and achieve promising results, which not only shows the usefulness of the proposed corpus for short text summarization research, but also provides a baseline for further research on this topic. Comment: Recently, we received feedback from Yuya Taguchi from NAIST in Japan and Qian Chen from USTC in China that the results in the EMNLP2015 version seem to be underrated. So we carefully checked our results and found that we had made a mistake while using the standard ROUGE. We then re-evaluated all methods in the paper and obtained the corrected results listed in Table 2 of this version.

arXiv.org e-Print Archive

CiteSeerX

A Multilingual Study of Compressive Cross-Language Text Summarization

Author: E Linhares Pontes
F Boudin
HM de Caseli
J Zhang
Publication date: 01/01/2018

Cross-Language Text Summarization (CLTS) generates summaries in a language different from the language of the source documents. Recent methods use information from both languages to generate summaries with the most informative sentences. However, the performance of these methods can vary across languages, which can reduce the quality of summaries. In this paper, we propose a compressive framework to generate cross-language summaries. In order to analyze performance and especially stability, we tested our system and extractive baselines on a dataset available in four languages (English, French, Portuguese, and Spanish) to generate English and French summaries. An automatic evaluation showed that our method outperformed extractive state-of-the-art CLTS methods with better and more stable ROUGE scores for all languages.

arXiv.org e-Print Archive

Crossref

PolyPublie

Implementing Open Access Policy: First case studies

Author: ARMBRUSTER Chris
Publication date: 31/03/2011

When implementing open access, policy pioneers and flagship institutions alike have faced considerable challenges in meeting their own aims and achieving a recognized success. Legitimate authority, sufficient resources and the right timing are crucial, but the professionals charged with implementing policy still need several years to accomplish significant progress. This study defines a methodological standard for evaluating the first generation of open access policies. Evaluating implementation establishes evidence, enables reflection, and may foster the emergence of a second generation of open access policies. While the study is based on a small number of cases, these case studies cover most of the pioneer institutions, present the most significant issues and offer an international overview. Each case is reconstructed individually on the basis of public documents and background information, and supported by interviews with professionals responsible for open access implementation. This article presents the highlights from each case study. The results are utilized to indicate how a second generation of policies might define open access as a key component of digital research infrastructures that provide inputs and outputs for research, teaching and learning in real time.

National Science Library, Chinese Academy of Sciences

Landscapes and Sublime Memories: Revisiting Liang Xiaosheng's "A Land of Wonder and Mystery"

Author: Scruggs BM
Publication venue: eScholarship, University of California
Publication date: 01/12/2014

This essay suggests memory studies, ecocriticism, and trauma studies as new avenues for the study of rusticated youth narratives. Towards reaching this goal, I first introduce a meditation on memory by Paul Ricoeur (1913-2005), especially his sketch of memory and imagination with classical Greek philosophy. His ideas on affective and practical memories are then telescoped into individual and communal memories. Onze Fleurs (Wo shiyi, 2011), directed by Wang Xiaoshuai (1966- ), and The River without Buoys (Meiyou hangbiao de heliu, 1984), directed by Wu Tianming (1939-2014), provide illustrative examples of each. Building upon these notions of personal memory, I turn to the popular memory of rustication, especially that of the natural environment in Liang Xiaosheng's "A Land of Wonder and Mystery" ("Zhe shi yipian shenqi de tudi," 1985). More specifically, I examine the evocation of the ghost marsh, narratives of departure, the family left in the city, and the menace of nature in Liang's short story to force a reconsideration not only of rustication, but also of nature in contemporary China. Moreover, in addition to noting the questioning of the sanitization of rusticated memories as a means of conforming to dominant state ideological discourses, I introduce a comparison of the story of doomed rusticated youth to the doomed youth in Sean Penn's Into the Wild, in order to force a comparison of youth and the environment often overlooked in rusticated youth studies. Finally, this essay concludes by suggesting that, by more carefully considering the interplay between memory and place, more nuanced and perhaps more ecologically and critically engaged assessments of rusticated youth fiction become possible.

eScholarship - University of California

Query and Output: Generating Words by Querying Distributed Word Representations for Paraphrase Generation

Author: Li Sujian
Li Wei
Li Wenjie
Ma Shuming
Ren Xuancheng
Sun Xu
Publication date: 01/01/2018

Most recent approaches use the sequence-to-sequence model for paraphrase generation. The existing sequence-to-sequence model tends to memorize the words and the patterns in the training dataset instead of learning the meaning of the words. Therefore, the generated sentences are often grammatically correct but semantically improper. In this work, we introduce a novel model based on the encoder-decoder framework, called Word Embedding Attention Network (WEAN). Our proposed model generates the words by querying distributed word representations (i.e. neural word embeddings), hoping to capture the meaning of the corresponding words. Following previous work, we evaluate our model on two paraphrase-oriented tasks, namely text simplification and short text abstractive summarization. Experimental results show that our model outperforms the sequence-to-sequence baseline by BLEU scores of 6.3 and 5.5 on two English text simplification datasets, and by a ROUGE-2 F1 score of 5.7 on a Chinese summarization dataset. Moreover, our model achieves state-of-the-art performance on these three benchmark datasets. Comment: arXiv admin note: text overlap with arXiv:1710.0231

arXiv.org e-Print Archive

Crossref