118,613 research outputs found
Generation of anaphors in Chinese
The goal of this thesis is to investigate the computer generation of various kinds of
anaphors in Chinese, including zero, pronominal and nominal anaphors, from the semantic
representation of multisentential text. The work is divided into two steps: the
first is to investigate linguistic behaviour of Chinese anaphora, and the other is to
implement the result of the first part in a Chinese natural language generation system
to see how it works.
The first step is in general to construct a set of rules governing the use of all kinds
of anaphors. To achieve this, we performed a sequence of experiments in a stepwise
refined manner. In the experiments, we examined the occurrence of anaphors in human-generated
text and those generated by algorithms employing the rules, assuming the
same semantic and discourse structures as the text. We started by distinguishing
between the use of zero and other anaphors, termed non-zeroes. Then we performed
experiments to distinguish between pronouns and nominal anaphors within the non-zeroes.
Finally, we refined the previous result to consider different kinds of descriptions
for nominal anaphors. In this research we confine ourselves to descriptive texts. Three
sets of test data consisting of scientific questions and answers and an introduction to
Chinese grammar were selected. The rules we obtained from the experiments make
use of the following conditions: locality between anaphor and antecedent, syntactic
constraints on zero anaphors, discourse segment structures, salience of objects and
animacy of objects. The results show that the anaphors generated by using the rules
we obtained are very close to those in the real texts.
To carry out the second step, we built up a Chinese natural language generation system
which is able to generate descriptive texts. The system is divided into a strategic and
a tactical component. The strategic component arranges message contents in response
to the input goal into a well-organised hierarchical discourse structure by using a
text planner. The tactical component takes the hierarchical discourse structure as
input and produces surface sentences with punctuation marks inserted appropriately.
Within the tactical component, the first task consists of linearising in depth-first order
the message units in the discourse structure and mapping them into syntactic-oriented
representations. Referring expressions, the main concern in this thesis, are generated
within the mapping process. A linguistic realisation program is then invoked to convert
the syntactic representation into surface strings in Chinese.
After the implementation, we sent some generated texts to a number of native speakers of Chinese and compared human-created results and computer-generated text to
investigate the quality of the generated anaphors. The results of the comparison show
that the rules we obtained are effective in dealing with the generation of anaphors in
Chinese.
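The depth-first linearisation step described for the tactical component can be illustrated with a minimal sketch. The `DiscourseNode` class, the `linearise` function and the toy message labels are hypothetical names introduced here for illustration; the thesis abstract does not specify its data structures, so this only shows the general traversal order under that assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DiscourseNode:
    """Hypothetical node of a hierarchical discourse structure.

    A node may carry a message unit and may have child segments.
    """
    message: Optional[str] = None
    children: List["DiscourseNode"] = field(default_factory=list)


def linearise(node: DiscourseNode) -> List[str]:
    """Collect message units in depth-first, left-to-right order."""
    units: List[str] = []
    if node.message is not None:
        units.append(node.message)
    for child in node.children:
        units.extend(linearise(child))
    return units


# Toy discourse structure: a root with a nested segment in the middle.
tree = DiscourseNode(children=[
    DiscourseNode(message="m1"),
    DiscourseNode(children=[
        DiscourseNode(message="m2"),
        DiscourseNode(message="m3"),
    ]),
    DiscourseNode(message="m4"),
])

print(linearise(tree))  # ['m1', 'm2', 'm3', 'm4']
```

In a system of the kind described, each linearised unit would then be mapped to a syntactic representation, which is where the choice of referring expression would be made.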
Guess who? Multilingual approach for the automated generation of author-stylized poetry
This paper addresses the problem of stylized text generation in a
multilingual setup. A version of a language model based on a long short-term
memory (LSTM) artificial neural network with extended phonetic and semantic
embeddings is used for stylized poetry generation. The quality of the resulting
poems generated by the network is estimated through bilingual evaluation
understudy (BLEU), a survey and a new cross-entropy based metric that is
suggested for the problems of such type. The experiments show that the proposed
model consistently outperforms random sample and vanilla-LSTM baselines, and humans
also tend to associate machine-generated texts with the target author.
LCSTS: A Large Scale Chinese Short Text Summarization Dataset
Automatic text summarization is widely regarded as a highly difficult
problem, partially because of the lack of large text summarization datasets.
Due to the great challenge of constructing large-scale summaries for full
text, in this paper we introduce a large-scale Chinese short text
summarization dataset constructed from the Chinese microblogging website Sina
Weibo, which is released to the public
{http://icrc.hitsz.edu.cn/Article/show/139.html}. This corpus consists of over
2 million real Chinese short texts with short summaries given by the author of
each text. We also manually tagged the relevance of 10,666 short summaries with
their corresponding short texts. Based on the corpus, we apply recurrent
neural networks to summary generation and achieve promising results, which
not only show the usefulness of the proposed corpus for short text
summarization research, but also provide a baseline for further research on
this topic.
Comment: Recently, we received feedback from Yuya Taguchi of NAIST in Japan
and Qian Chen of USTC in China that the results in the EMNLP2015 version
seem to be underrated. We carefully checked our results and found that
we had made a mistake while using the standard ROUGE. We then re-evaluated all
methods in the paper and obtained the corrected results listed in Table 2 of this
version.
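Since the comment above turns on how ROUGE was applied, a minimal sketch of the ROUGE-N recall computation may help. This is not the authors' evaluation script or the standard ROUGE toolkit; `ngrams` and `rouge_n_recall` are illustrative names, and the inputs are assumed to be pre-tokenised lists (for Chinese text, tokens could be characters or words, depending on the setup).

```python
from collections import Counter
from typing import Counter as CounterType, List, Tuple


def ngrams(tokens: List[str], n: int) -> CounterType[Tuple[str, ...]]:
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate: List[str], reference: List[str], n: int = 1) -> float:
    """ROUGE-N recall: clipped n-gram overlap divided by reference n-gram count."""
    ref_counts = ngrams(reference, n)
    cand_counts = ngrams(candidate, n)
    total = sum(ref_counts.values())
    if total == 0:
        return 0.0
    # Each reference n-gram is matched at most as often as it occurs in the candidate.
    overlap = sum(min(count, cand_counts[gram]) for gram, count in ref_counts.items())
    return overlap / total


reference = "the cat sat on the mat".split()
candidate = "the cat lay on the mat".split()

print(rouge_n_recall(candidate, reference, n=1))  # 5 of 6 unigrams matched
print(rouge_n_recall(candidate, reference, n=2))  # 3 of 5 bigrams matched
```

Small differences in tokenisation or in whether recall, precision or F1 is reported can shift scores noticeably, which is one plausible way a misapplied metric leads to underrated results.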
A Multilingual Study of Compressive Cross-Language Text Summarization
Cross-Language Text Summarization (CLTS) generates summaries in a language
different from the language of the source documents. Recent methods use
information from both languages to generate summaries with the most informative
sentences. However, these methods have performance that can vary according to
languages, which can reduce the quality of summaries. In this paper, we propose
a compressive framework to generate cross-language summaries. In order to
analyze performance and especially stability, we tested our system and
extractive baselines on a dataset available in four languages (English, French,
Portuguese, and Spanish) to generate English and French summaries. An automatic
evaluation showed that our method outperformed extractive state-of-the-art CLTS
methods with better and more stable ROUGE scores for all languages.
Implementing Open Access Policy: First case studies
When implementing open access, policy pioneers and flagship institutions alike have faced considerable challenges in meeting their own aims and achieving a recognized success. Legitimate authority, sufficient resources and the right timing are crucial, but the professionals charged with implementing policy still need several years to accomplish significant progress. This study defines a methodological standard for evaluating the first generation of open access policies. Evaluating implementation establishes evidence, enables reflection, and may foster the emergence of a second generation of open access policies. While the study is based on a small number of cases, these case studies cover most of the pioneer institutions, present the most significant issues and offer an international overview. Each case is reconstructed individually on the basis of public documents and background information, and supported by interviews with professionals responsible for open access implementation. This article presents the highlights from each case study. The results are utilized to indicate how a second generation of policies might define open access as a key component of digital research infrastructures that provide inputs and outputs for research, teaching and learning in real time.
Landscapes and Sublime Memories: Revisiting Liang Xiaosheng's "A Land of Wonder and Mystery"
This essay suggests memory studies, ecocriticism, and trauma studies as new avenues for the study of rusticated youth narratives. Towards reaching this goal, I first introduce a meditation on memory by Paul Ricoeur (1913-2005), especially his sketch of memory and imagination with classical Greek philosophy. His ideas on affective and practical memories are then telescoped into individual and communal memories. Onze Fleurs (Wo shiyi, 2011), directed by Wang Xiaoshuai (1966- ), and The River without Buoys (Meiyou hangbiao de heliu, 1984), directed by Wu Tianming (1939-2014) provide illustrative examples of each. Building upon these notions of personal memory I turn to the popular memory of rustication, especially that of the natural environment in Liang Xiaosheng's "A Land of Wonder and Mystery" ("Zhe shi yipian shenqi de tudi," 1985). More specifically I examine the evocation of the ghost marsh, narratives of departure, the family left in the city, and the menace of nature in Liang's short story to force not only a reconsideration of rustication, but also of nature in contemporary China. Moreover, in addition to noting the questioning of the sanitization of rusticated memories as a means of conforming to dominant state ideological discourses, I introduce a comparison of the story of doomed rusticated youth to the doomed youth in Sean Penn's Into the Wild, in order to force a comparison of youth and the environment often overlooked in rusticated youth studies. Finally, this essay concludes by suggesting that by more carefully considering the interplay between memory and place more nuanced and perhaps more ecologically and critically engaged assessments of rusticated youth fiction become possible
Query and Output: Generating Words by Querying Distributed Word Representations for Paraphrase Generation
Most recent approaches use the sequence-to-sequence model for paraphrase
generation. The existing sequence-to-sequence model tends to memorize the words
and the patterns in the training dataset instead of learning the meaning of the
words. Therefore, the generated sentences are often grammatically correct but
semantically improper. In this work, we introduce a novel model based on the
encoder-decoder framework, called Word Embedding Attention Network (WEAN). Our
proposed model generates the words by querying distributed word representations
(i.e. neural word embeddings), aiming to capture the meaning of the corresponding
words. Following previous work, we evaluate our model on two
paraphrase-oriented tasks, namely text simplification and short text
abstractive summarization. Experimental results show that our model outperforms
the sequence-to-sequence baseline by BLEU scores of 6.3 and 5.5 on two
English text simplification datasets, and by a ROUGE-2 F1 score of 5.7 on a
Chinese summarization dataset. Moreover, our model achieves state-of-the-art
performance on these three benchmark datasets.
Comment: arXiv admin note: text overlap with arXiv:1710.0231
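The idea of generating a word by querying distributed word representations can be sketched as follows. This is a simplified illustration, not the WEAN model itself: the plain dot-product score, the `query_word` function and the toy two-dimensional embeddings are assumptions made here, and the paper's actual scoring function and training procedure may differ.

```python
import math
from typing import Dict, List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def query_word(query: List[float],
               embeddings: Dict[str, List[float]]) -> Tuple[str, Dict[str, float]]:
    """Score every vocabulary word against the query and return the best match.

    Assumption: the score is a plain dot product between the query vector
    (e.g. derived from a decoder state) and each word embedding.
    """
    words = list(embeddings)
    scores = [sum(q * e for q, e in zip(query, embeddings[w])) for w in words]
    probs = softmax(scores)
    best = max(range(len(words)), key=probs.__getitem__)
    return words[best], dict(zip(words, probs))


# Toy vocabulary with hypothetical 2-dimensional embeddings.
toy_embeddings = {
    "cat": [1.0, 0.0],
    "dog": [0.8, 0.2],
    "car": [0.0, 1.0],
}

word, probs = query_word([1.0, 0.1], toy_embeddings)
print(word)  # cat
```

Because the output distribution is computed from the embeddings rather than from a separate output layer, words with similar embeddings receive similar scores, which is the intuition behind generating words by their meaning.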