721 research outputs found
Deep attentive video summarization with distribution consistency learning
This article studies supervised video summarization by formulating it as a sequence-to-sequence learning problem, in which the input is the sequence of original video frames and the output is the sequence of their predicted importance scores. Two critical issues are addressed: insufficient short-term contextual attention and distribution inconsistency. The former arises because existing approaches focus heavily on long-term encoder-decoder attention and therefore capture too little short-term contextual attention information within the video sequence itself. The latter refers to the inconsistency between the distribution of the predicted importance score sequence and that of the ground-truth sequence, which may lead to a suboptimal solution. To mitigate the first issue, we incorporate a self-attention mechanism in the encoder to highlight the important keyframes in a short-term context. This mechanism, together with the encoder-decoder attention, constitutes our deep attentive models for video summarization. For the second issue, we propose a distribution consistency learning method that employs a simple yet effective regularization loss term, which seeks a consistent distribution for the two sequences. Our final approach is dubbed Attentive and Distribution consistent video Summarization (ADSum). Extensive experiments on benchmark data sets demonstrate the superiority of the proposed ADSum approach over state-of-the-art approaches.
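The abstract does not give the form of the regularization term, so the following is only a minimal sketch of one way a distribution consistency penalty could be attached to a regression loss. It assumes the two score sequences are turned into distributions with a softmax and compared with a KL divergence; the function names, the `weight` parameter, and the choice of KL divergence are all illustrative assumptions, not the ADSum formulation.

```python
import math

def softmax(scores):
    """Normalize a score sequence into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def mse(pred, target):
    """Mean squared error between predicted and ground-truth scores."""
    return sum((a - b) ** 2 for a, b in zip(pred, target)) / len(pred)

def summarization_loss(pred, target, weight=0.1):
    """Regression loss plus a hypothetical distribution consistency term.

    `weight` is an assumed hyperparameter balancing the two terms.
    """
    consistency = kl_divergence(softmax(target), softmax(pred))
    return mse(pred, target) + weight * consistency

# Toy frame-level importance scores (made up for illustration).
predicted = [0.9, 0.1, 0.5]
ground_truth = [0.1, 0.9, 0.5]
loss = summarization_loss(predicted, ground_truth)
```

In this sketch the extra term vanishes when the two sequences induce the same distribution and grows as they diverge, which is the behaviour the abstract describes in general terms.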
Dilated Temporal Relational Adversarial Network for Generic Video Summarization
The large number of videos appearing every day makes it increasingly
critical that key information within videos can be extracted and understood in
a very short time. Video summarization, the task of finding the smallest subset
of frames that still conveys the whole story of a given video, is thus of
great significance for improving the efficiency of video understanding. We propose a
novel Dilated Temporal Relational Generative Adversarial Network (DTR-GAN) to
achieve frame-level video summarization. Given a video, it selects the set of
key frames that contain the most meaningful and compact information.
Specifically, DTR-GAN learns a dilated temporal relational generator and a
discriminator with a three-player loss in an adversarial manner. A new dilated
temporal relation (DTR) unit is introduced to better capture temporal
representations. The generator uses this unit to effectively exploit global
multi-scale temporal context to select key frames and to complement the
commonly used Bi-LSTM. To ensure that summaries capture enough key video
representation from a global perspective, rather than being a trivial, randomly
shortened sequence, we present a discriminator that learns to enforce both the
information completeness and compactness of summaries via a three-player loss.
The loss includes the generated summary loss, the random summary loss, and the
real summary (ground-truth) loss, which play important roles in better
regularizing the learned model to obtain useful summaries. Comprehensive
experiments on three public datasets show the effectiveness of the proposed
approach.
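The idea of multi-scale temporal context can be illustrated with plain dilated filtering over a sequence of per-frame features. The sketch below is not the DTR unit itself, whose internals the abstract does not describe. It assumes scalar frame features, a fixed averaging kernel, zero padding at the sequence boundaries, and dilation rates of 1, 2, and 4; all names and values are hypothetical.

```python
def dilated_temporal_filter(features, kernel, dilation):
    """Apply a 1-D filter whose taps are `dilation` frames apart.

    Positions that fall outside the sequence are treated as zero.
    """
    n = len(features)
    half = len(kernel) // 2
    out = []
    for t in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t + (k - half) * dilation  # tap position in the sequence
            if 0 <= idx < n:
                acc += w * features[idx]
        out.append(acc)
    return out

def multi_scale_context(features, kernel=(1 / 3, 1 / 3, 1 / 3), dilations=(1, 2, 4)):
    """Return one filtered sequence per dilation rate (assumed rates)."""
    return [dilated_temporal_filter(features, kernel, d) for d in dilations]

# Toy per-frame features with a single salient frame in the middle.
frames = [0, 0, 0, 0, 3, 0, 0, 0, 0]
contexts = multi_scale_context(frames)
```

With a larger dilation, the salient frame influences positions further away in time, so stacking several rates gives each frame a view of both nearby and distant context without increasing the kernel size.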
Query and Output: Generating Words by Querying Distributed Word Representations for Paraphrase Generation
Most recent approaches use the sequence-to-sequence model for paraphrase
generation. The existing sequence-to-sequence model tends to memorize the words
and the patterns in the training dataset instead of learning the meaning of the
words. Therefore, the generated sentences are often grammatically correct but
semantically improper. In this work, we introduce a novel model based on the
encoder-decoder framework, called Word Embedding Attention Network (WEAN). Our
proposed model generates words by querying distributed word representations
(i.e., neural word embeddings), aiming to capture the meaning of the
corresponding words. Following previous work, we evaluate our model on two
paraphrase-oriented tasks, namely text simplification and short text
abstractive summarization. Experimental results show that our model outperforms
the sequence-to-sequence baseline by BLEU scores of 6.3 and 5.5 on two
English text simplification datasets, and by a ROUGE-2 F1 score of 5.7 on a
Chinese summarization dataset. Moreover, our model achieves state-of-the-art
performance on these three benchmark datasets.
Comment: arXiv admin note: text overlap with arXiv:1710.0231
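Generating a word by querying word embeddings can be sketched as a nearest-match lookup: a query vector is scored against every embedding, and the scores are normalized into a distribution over the vocabulary. The example below assumes a plain dot-product score and a tiny hand-written embedding table. The scoring function WEAN actually uses is not stated in the abstract, and the vocabulary, vectors, and function names here are hypothetical.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def query_word(query, embeddings):
    """Score each word embedding against the query and pick the best word.

    Returns the highest-scoring word and the softmax distribution over words.
    """
    words = list(embeddings)
    scores = [dot(query, embeddings[w]) for w in words]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    probs = {w: e / total for w, e in zip(words, exps)}
    best = max(probs, key=probs.get)
    return best, probs

# Toy 2-D embeddings (made up): similar words have similar vectors.
toy_embeddings = {
    "big": [1.0, 0.0],
    "large": [0.9, 0.1],
    "small": [-1.0, 0.0],
}
# In a decoder, the query would come from the hidden state; here it is fixed.
word, distribution = query_word([1.0, 0.0], toy_embeddings)
```

Because the output distribution is driven by similarity in embedding space, near-synonyms such as "big" and "large" receive similar probabilities, which is the intuition behind preferring this lookup to an unrelated output projection.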