1,888 research outputs found

    A Hierarchical Neural Autoencoder for Paragraphs and Documents

    Full text link
    Natural language generation of coherent long texts like paragraphs or longer documents is a challenging problem for recurrent networks models. In this paper, we explore an important step toward this generation task: training an LSTM (Long-short term memory) auto-encoder to preserve and reconstruct multi-sentence paragraphs. We introduce an LSTM model that hierarchically builds an embedding for a paragraph from embeddings for sentences and words, then decodes this embedding to reconstruct the original paragraph. We evaluate the reconstructed paragraph using standard metrics like ROUGE and Entity Grid, showing that neural models are able to encode texts in a way that preserve syntactic, semantic, and discourse coherence. While only a first step toward generating coherent text units from neural models, our work has the potential to significantly impact natural language generation and summarization\footnote{Code for the three models described in this paper can be found at www.stanford.edu/~jiweil/

    Reinforced Video Captioning with Entailment Rewards

    Full text link
    Sequence-to-sequence models have shown promising improvements on the temporal task of video captioning, but they optimize word-level cross-entropy loss during training. First, using policy gradient and mixed-loss methods for reinforcement learning, we directly optimize sentence-level task-based metrics (as rewards), achieving significant improvements over the baseline, based on both automatic metrics and human evaluation on multiple datasets. Next, we propose a novel entailment-enhanced reward (CIDEnt) that corrects phrase-matching based metrics (such as CIDEr) to only allow for logically-implied partial matches and avoid contradictions, achieving further significant improvements over the CIDEr-reward model. Overall, our CIDEnt-reward model achieves the new state-of-the-art on the MSR-VTT dataset.Comment: EMNLP 2017 (9 pages

    Entropy and Graph Based Modelling of Document Coherence using Discourse Entities: An Application

    Full text link
    We present two novel models of document coherence and their application to information retrieval (IR). Both models approximate document coherence using discourse entities, e.g. the subject or object of a sentence. Our first model views text as a Markov process generating sequences of discourse entities (entity n-grams); we use the entropy of these entity n-grams to approximate the rate at which new information appears in text, reasoning that as more new words appear, the topic increasingly drifts and text coherence decreases. Our second model extends the work of Guinaudeau & Strube [28] that represents text as a graph of discourse entities, linked by different relations, such as their distance or adjacency in text. We use several graph topology metrics to approximate different aspects of the discourse flow that can indicate coherence, such as the average clustering or betweenness of discourse entities in text. Experiments with several instantiations of these models show that: (i) our models perform on a par with two other well-known models of text coherence even without any parameter tuning, and (ii) reranking retrieval results according to their coherence scores gives notable performance gains, confirming a relation between document coherence and relevance. This work contributes two novel models of document coherence, the application of which to IR complements recent work in the integration of document cohesiveness or comprehensibility to ranking [5, 56]

    Mix Multiple Features to Evaluate the Content and the Linguistic Quality of Text Summaries

    Get PDF
    In this article, we propose a method of text summary\u27s content and linguistic quality evaluation that is based on a machine learning approach. This method operates by combining multiple features to build predictive models that evaluate the content and the linguistic quality of new summaries (unseen) constructed from the same source documents as the summaries used in the training and the validation of models. To obtain the best model, many single and ensemble learning classifiers are tested. Using the constructed models, we have achieved a good performance in predicting the content and the linguistic quality scores. In order to evaluate the summarization systems, we calculated the system score as the average of the score of summaries that are built from the same system. Then, we evaluated the correlation of the system score with the manual system score. The obtained correlation indicates that the system score outperforms the baseline scores

    Automatic summary evaluation. Roug e modifications

    Full text link
    Nowadays there is no common approach to summary. Manual evaluation is expensive and subjective and it is not applicable in real time or on a large corpus. Widely used approaches involve little human efforts and assume comparison with a set of reference summaries. We tried to overcome drawbacks of existing metrics such as ignoring redundant information, synonyms and sentence ordering. Our method combines edit distance, ROUGE-SU and trigrams similarity measure enriched by weights for different parts of speech and synonyms. Since nouns provide the most valuable information, each sentence is mapped into a set of nouns. If the normalized intersection of any pair is greater than a predefined threshold the sentences are penalized. Doing extracts there is no need to analyze sentence structure but sentence ordering is crucial. Sometimes it is impossible to compare sentence order with a gold standard. Therefore similarity between adjacent sentences may be used as a measure of text coherence. Chronological constraint violation should be penalized. Relevance score and readability assessment may be combined in the F-measure. In order to choose the best parameter values machine learning can be applied

    Move Forward and Tell: A Progressive Generator of Video Descriptions

    Full text link
    We present an efficient framework that can generate a coherent paragraph to describe a given video. Previous works on video captioning usually focus on video clips. They typically treat an entire video as a whole and generate the caption conditioned on a single embedding. On the contrary, we consider videos with rich temporal structures and aim to generate paragraph descriptions that can preserve the story flow while being coherent and concise. Towards this goal, we propose a new approach, which produces a descriptive paragraph by assembling temporally localized descriptions. Given a video, it selects a sequence of distinctive clips and generates sentences thereon in a coherent manner. Particularly, the selection of clips and the production of sentences are done jointly and progressively driven by a recurrent network -- what to describe next depends on what have been said before. Here, the recurrent network is learned via self-critical sequence training with both sentence-level and paragraph-level rewards. On the ActivityNet Captions dataset, our method demonstrated the capability of generating high-quality paragraph descriptions for videos. Compared to those by other methods, the descriptions produced by our method are often more relevant, more coherent, and more concise.Comment: Accepted by ECCV 201
    corecore