2,125 research outputs found
Multi-Task Video Captioning with Video and Entailment Generation
Video captioning, the task of describing the content of a video, has seen
some promising improvements in recent years with sequence-to-sequence models,
but accurately learning the temporal and logical dynamics involved in the task
still remains a challenge, especially given the lack of sufficient annotated
data. We improve video captioning by sharing knowledge with two related
directed-generation tasks: a temporally-directed unsupervised video prediction
task to learn richer context-aware video encoder representations, and a
logically-directed language entailment generation task to learn better
video-entailed caption decoder representations. For this, we present a
many-to-many multi-task learning model that shares parameters across the
encoders and decoders of the three tasks. We achieve significant improvements
and the new state-of-the-art on several standard video captioning datasets
using diverse automatic and human evaluations. We also show mutual multi-task
improvements on the entailment generation task.Comment: ACL 2017 (14 pages w/ supplementary
Distinctive-attribute Extraction for Image Captioning
Image captioning, an open research issue, has been evolved with the progress
of deep neural networks. Convolutional neural networks (CNNs) and recurrent
neural networks (RNNs) are employed to compute image features and generate
natural language descriptions in the research. In previous works, a caption
involving semantic description can be generated by applying additional
information into the RNNs. In this approach, we propose a distinctive-attribute
extraction (DaE) which explicitly encourages significant meanings to generate
an accurate caption describing the overall meaning of the image with their
unique situation. Specifically, the captions of training images are analyzed by
term frequency-inverse document frequency (TF-IDF), and the analyzed semantic
information is trained to extract distinctive-attributes for inferring
captions. The proposed scheme is evaluated on a challenge data, and it improves
an objective performance while describing images in more detail.Comment: 14 main pages, 4 supplementary page
- …