24,658 research outputs found

    A Hierarchical Neural Autoencoder for Paragraphs and Documents

    Natural language generation of coherent long texts like paragraphs or longer documents is a challenging problem for recurrent network models. In this paper, we explore an important step toward this generation task: training an LSTM (long short-term memory) auto-encoder to preserve and reconstruct multi-sentence paragraphs. We introduce an LSTM model that hierarchically builds an embedding for a paragraph from embeddings for sentences and words, then decodes this embedding to reconstruct the original paragraph. We evaluate the reconstructed paragraph using standard metrics like ROUGE and Entity Grid, showing that neural models are able to encode texts in a way that preserves syntactic, semantic, and discourse coherence. While only a first step toward generating coherent text units from neural models, our work has the potential to significantly impact natural language generation and summarization. Code for the three models described in this paper can be found at www.stanford.edu/~jiweil/

    Move Forward and Tell: A Progressive Generator of Video Descriptions

    We present an efficient framework that can generate a coherent paragraph to describe a given video. Previous works on video captioning usually focus on video clips. They typically treat an entire video as a whole and generate the caption conditioned on a single embedding. In contrast, we consider videos with rich temporal structures and aim to generate paragraph descriptions that preserve the story flow while being coherent and concise. Towards this goal, we propose a new approach, which produces a descriptive paragraph by assembling temporally localized descriptions. Given a video, it selects a sequence of distinctive clips and generates sentences thereon in a coherent manner. In particular, the selection of clips and the production of sentences are done jointly and progressively, driven by a recurrent network -- what to describe next depends on what has been said before. Here, the recurrent network is learned via self-critical sequence training with both sentence-level and paragraph-level rewards. On the ActivityNet Captions dataset, our method demonstrated the capability of generating high-quality paragraph descriptions for videos. Compared to those by other methods, the descriptions produced by our method are often more relevant, more coherent, and more concise. Comment: Accepted by ECCV 201

    TEACHING THEME AND THEMATIC PROGRESSION TO TOURISM STUDENTS AND ITS IMPLICATIONS ON THEIR WRITINGS

    Teaching writing to tourism students is challenging. Many students in hotel and tourism diploma programs tend to focus on building practical skills such as cooking techniques or cutting methods, setting up a table for lunch or dinner, or handling check-in and check-out. They ‘hate’ writing exercises because they believe they will start their careers at the operational level of the tourism industry, where academic writing is not needed. As a result of this belief, they lack the ability to expand an idea or topic and can produce only short, undeveloped paragraphs. This paper discusses one alternative approach to teaching writing, i.e., using thematic progression. Using Halliday's (2004) model, students are introduced to the concept of information structure, then to theme and rheme, and finally to the thematic progression of texts. Then, the teacher guides students to identify how a text is developed through its theme and rheme and to identify the types of thematic progression. Two writing tests were given, i.e., before and after teaching, to see whether there was any difference in terms of text development, paragraph coherence, paragraph structure, thematic progression, and focus of text. A set of questions to measure students’ perception of the lessons was also administered. The results show that students can produce longer, better-developed, and cohesive paragraphs. Students’ positive perception of the theme-rheme and thematic progression concepts enables them to expand ideas into longer texts.

    Automatic generation of large-scale paraphrases

    Research on paraphrase has mostly focussed on lexical or syntactic variation within individual sentences. Our concern is with larger-scale paraphrases, from multiple sentences or paragraphs to entire documents. In this paper we address the problem of generating paraphrases of large chunks of text. We ground our discussion through a worked example of extending an existing NLG system to accept as input a source text, and to generate a range of fluent, semantically equivalent alternatives, varying not only at the lexical and syntactic levels, but also in document structure and layout.

    An Exploratory Application of Rhetorical Structure Theory to Detect Coherence Errors in L2 English Writing: Possible Implications for Automated Writing Evaluation Software

    This paper presents an initial attempt to examine whether Rhetorical Structure Theory (RST) (Mann & Thompson, 1988) can be fruitfully applied to the detection of the coherence errors made by Taiwanese low-intermediate learners of English. This investigation is considered warranted for three reasons. First, other methods for bottom-up coherence analysis have proved ineffective (e.g., Watson Todd et al., 2007). Second, this research provides a preliminary categorization of the coherence errors made by first language (L1) Chinese learners of English. Third, second language discourse errors in general have received little attention in applied linguistic research. The data are 45 written samples from the LTTC English Learner Corpus, a Taiwanese learner corpus of English currently under construction. The rationale of this study is that diagrams which violate some of the rules of RST diagram formation will point to coherence errors. No reliability test has been conducted since this work is at an initial stage. Therefore, this study is exploratory and results are preliminary. Results are discussed in terms of the practicality of using this method to detect coherence errors, their possible consequences for claims of a typical inductive content order in the writing of L1 Chinese learners of English, and their potential implications for Automated Writing Evaluation (AWE) software, since discourse organization is one of the essay characteristics assessed by this software. In particular, the extent to which the kinds of errors detected through the RST analysis match those located by Criterion (Burstein, Chodorow, & Leacock, 2004), a well-known AWE program from Educational Testing Service (ETS), is discussed.