A Hierarchical Neural Autoencoder for Paragraphs and Documents
Natural language generation of coherent long texts like paragraphs or longer
documents is a challenging problem for recurrent network models. In this
paper, we explore an important step toward this generation task: training an
LSTM (long short-term memory) autoencoder to preserve and reconstruct
multi-sentence paragraphs. We introduce an LSTM model that hierarchically
builds an embedding for a paragraph from embeddings for sentences and words,
then decodes this embedding to reconstruct the original paragraph. We evaluate
the reconstructed paragraph using standard metrics like ROUGE and Entity Grid,
showing that neural models are able to encode texts in a way that preserves
syntactic, semantic, and discourse coherence. While only a first step toward
generating coherent text units from neural models, our work has the potential
to significantly impact natural language generation and
summarization.\footnote{Code for the three models described in this paper can be
found at www.stanford.edu/~jiweil/}
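The ROUGE evaluation the authors mention can be illustrated with a minimal ROUGE-1 recall computation comparing an original paragraph to its reconstruction. This is a simplified sketch (no stemming, stopword handling, or multi-reference support), not the authors' evaluation code:

```python
from collections import Counter

def rouge1_recall(reference, candidate):
    """Unigram recall: fraction of reference tokens matched in the candidate.
    A simplified ROUGE-1 sketch for illustration only."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    overlap = sum(min(c, cand_counts[w]) for w, c in ref_counts.items())
    total = sum(ref_counts.values())
    return overlap / total if total else 0.0

original = "the model reconstructs the paragraph"
reconstruction = "the model reconstructs a paragraph"
print(round(rouge1_recall(original, reconstruction), 2))  # → 0.8
```

A reconstruction scoring near 1.0 on such lexical-overlap metrics is necessary but not sufficient for coherence, which is why the paper pairs ROUGE with Entity Grid.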
Reinforced Video Captioning with Entailment Rewards
Sequence-to-sequence models have shown promising improvements on the temporal
task of video captioning, but they optimize word-level cross-entropy loss
during training. First, using policy gradient and mixed-loss methods for
reinforcement learning, we directly optimize sentence-level task-based metrics
(as rewards), achieving significant improvements over the baseline, based on
both automatic metrics and human evaluation on multiple datasets. Next, we
propose a novel entailment-enhanced reward (CIDEnt) that corrects
phrase-matching based metrics (such as CIDEr) to only allow for
logically-implied partial matches and avoid contradictions, achieving further
significant improvements over the CIDEr-reward model. Overall, our
CIDEnt-reward model achieves the new state-of-the-art on the MSR-VTT dataset.
Comment: EMNLP 2017 (9 pages)
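The entailment correction described above can be sketched as a reward that keeps a phrase-matching (CIDEr-style) score only when an entailment classifier judges the caption to be logically implied by the reference. The threshold and penalty values below are illustrative assumptions, not the paper's exact formulation:

```python
def cident_reward(cider_score, entailment_prob, threshold=0.33, penalty=1.0):
    """Entailment-corrected reward sketch: a caption with a high
    phrase-match score but low entailment probability (e.g. one that
    contradicts the reference) is penalized, so partial matches only
    count when they are logically implied."""
    if entailment_prob >= threshold:
        return cider_score
    return cider_score - penalty

# A contradictory caption with strong n-gram overlap ends up ranked
# below a weaker but entailed caption.
print(cident_reward(0.9, 0.05) < cident_reward(0.6, 0.8))  # → True
```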
Entropy and Graph Based Modelling of Document Coherence using Discourse Entities: An Application
We present two novel models of document coherence and their application to
information retrieval (IR). Both models approximate document coherence using
discourse entities, e.g. the subject or object of a sentence. Our first model
views text as a Markov process generating sequences of discourse entities
(entity n-grams); we use the entropy of these entity n-grams to approximate the
rate at which new information appears in text, reasoning that as more new words
appear, the topic increasingly drifts and text coherence decreases. Our second
model extends the work of Guinaudeau & Strube [28] that represents text as a
graph of discourse entities, linked by different relations, such as their
distance or adjacency in text. We use several graph topology metrics to
approximate different aspects of the discourse flow that can indicate
coherence, such as the average clustering or betweenness of discourse entities
in text. Experiments with several instantiations of these models show that: (i)
our models perform on a par with two other well-known models of text coherence
even without any parameter tuning, and (ii) reranking retrieval results
according to their coherence scores gives notable performance gains, confirming
a relation between document coherence and relevance. This work contributes two
novel models of document coherence, the application of which to IR complements
recent work in the integration of document cohesiveness or comprehensibility to
ranking [5, 56].
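The first model's entropy idea can be sketched directly: treat the sequence of discourse entities as n-grams and compute their Shannon entropy, so that a text that keeps returning to the same entities scores lower (more coherent) than one whose entities drift. The entity sequences below are toy inputs, and the exact n-gram extraction is an assumption for illustration:

```python
import math
from collections import Counter

def entity_ngram_entropy(entities, n=2):
    """Shannon entropy (bits) of entity n-grams, a rough proxy for the
    rate at which new information appears: higher entropy suggests more
    topic drift and lower coherence. A sketch of the model's idea only."""
    grams = [tuple(entities[i:i + n]) for i in range(len(entities) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

coherent = ["court", "court", "court", "case", "court"]
drifting = ["court", "case", "judge", "market", "price"]
print(entity_ngram_entropy(coherent) < entity_ngram_entropy(drifting))  # → True
```

The second (graph-based) model would instead build an entity graph and read off topology metrics such as average clustering or betweenness.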
Mix Multiple Features to Evaluate the Content and the Linguistic Quality of Text Summaries
In this article, we propose a method for evaluating the content and linguistic quality of text summaries based on a machine learning approach. This method combines multiple features to build predictive models that evaluate the content and the linguistic quality of new (unseen) summaries constructed from the same source documents as the summaries used in training and validating the models. To obtain the best model, many single and ensemble learning classifiers are tested. Using the constructed models, we achieved good performance in predicting the content and linguistic quality scores. To evaluate summarization systems, we calculated each system's score as the average of the scores of the summaries built by that system. Then, we evaluated the correlation of the system score with the manual system score. The obtained correlation indicates that the system score outperforms the baseline scores.
Automatic summary evaluation. ROUGE modifications
Nowadays there is no common approach to summary evaluation. Manual evaluation is expensive and subjective, and it is not applicable in real time or on a large corpus. Widely used approaches involve little human effort and assume comparison with a set of reference summaries. We tried to overcome the drawbacks of existing metrics, such as ignoring redundant information, synonyms, and sentence ordering. Our method combines edit distance, ROUGE-SU, and a trigram similarity measure enriched by weights for different parts of speech and synonyms. Since nouns provide the most valuable information, each sentence is mapped into a set of nouns. If the normalized intersection of any pair is greater than a predefined threshold, the sentences are penalized. When producing extracts there is no need to analyze sentence structure, but sentence ordering is crucial. Sometimes it is impossible to compare sentence order with a gold standard; therefore, similarity between adjacent sentences may be used as a measure of text coherence. Chronological constraint violations should be penalized. The relevance score and readability assessment may be combined in an F-measure. In order to choose the best parameter values, machine learning can be applied.
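The noun-overlap penalty described above can be sketched as follows. Noun extraction (a POS-tagging step) is assumed to have happened upstream, so inputs are already sets of nouns, and normalizing the intersection by the smaller set is one plausible reading of "normalized intersection", not necessarily the authors' exact choice:

```python
def redundancy_penalty(sentence_nouns, threshold=0.5):
    """Count summary sentence pairs whose noun sets overlap more than a
    predefined threshold; such pairs are treated as redundant and
    penalized. Inputs are per-sentence sets of nouns."""
    penalties = 0
    for i in range(len(sentence_nouns)):
        for j in range(i + 1, len(sentence_nouns)):
            a, b = sentence_nouns[i], sentence_nouns[j]
            if not a or not b:
                continue
            overlap = len(a & b) / min(len(a), len(b))  # normalized intersection
            if overlap > threshold:
                penalties += 1
    return penalties

sents = [{"dog", "park"}, {"dog", "park", "ball"}, {"weather"}]
print(redundancy_penalty(sents))  # → 1: the first two sentences are redundant
```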
Move Forward and Tell: A Progressive Generator of Video Descriptions
We present an efficient framework that can generate a coherent paragraph to
describe a given video. Previous works on video captioning usually focus on
video clips. They typically treat an entire video as a whole and generate the
caption conditioned on a single embedding. On the contrary, we consider videos
with rich temporal structures and aim to generate paragraph descriptions that
can preserve the story flow while being coherent and concise. Towards this
goal, we propose a new approach, which produces a descriptive paragraph by
assembling temporally localized descriptions. Given a video, it selects a
sequence of distinctive clips and generates sentences thereon in a coherent
manner. Particularly, the selection of clips and the production of sentences
are done jointly and progressively driven by a recurrent network -- what to
describe next depends on what has been said before. Here, the recurrent
network is learned via self-critical sequence training with both sentence-level
and paragraph-level rewards. On the ActivityNet Captions dataset, our method
demonstrated the capability of generating high-quality paragraph descriptions
for videos. Compared to those by other methods, the descriptions produced by
our method are often more relevant, more coherent, and more concise.
Comment: Accepted by ECCV 201
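The self-critical training signal mentioned above can be sketched numerically: the reward of the model's own greedy decode serves as the baseline, so a sampled paragraph earns a positive advantage only if it beats the greedy output. The mixed sentence/paragraph reward weighting below is an illustrative assumption, not the paper's exact scheme:

```python
def scst_advantage(sampled_reward, greedy_reward):
    # Self-critical baseline: only samples that outscore the model's own
    # test-time (greedy) output receive a positive learning signal.
    return sampled_reward - greedy_reward

def mixed_reward(sentence_r, paragraph_r, weight=0.5):
    # Illustrative combination of sentence-level and paragraph-level
    # rewards; the 50/50 weighting is an assumption for this sketch.
    return weight * sentence_r + (1 - weight) * paragraph_r

adv = scst_advantage(mixed_reward(0.8, 0.6), mixed_reward(0.5, 0.5))
print(round(adv, 2))  # → 0.2
```

In practice this advantage scales the policy-gradient update for the sampled sequence, pushing the generator toward descriptions that improve both per-sentence quality and paragraph-level flow.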