19 research outputs found
Knowledge Graph Extraction from Videos
Nearly all existing techniques for automated video annotation (or captioning)
describe videos using natural language sentences. However, this has several
shortcomings: (i) it is very hard to then further use the generated natural
language annotations in automated data processing, (ii) generating natural
language annotations requires to solve the hard subtask of generating
semantically precise and syntactically correct natural language sentences,
which is actually unrelated to the task of video annotation, (iii) it is
difficult to quantitatively measure performance, as standard metrics (e.g.,
accuracy and F1-score) are inapplicable, and (iv) annotations are
language-specific. In this paper, we propose the new task of knowledge graph
extraction from videos, i.e., producing a description in the form of a
knowledge graph of the contents of a given video. Since no datasets exist for
this task, we also include a method to automatically generate them, starting
from datasets where videos are annotated with natural language. We then
describe an initial deep-learning model for knowledge graph extraction from
videos, and report results on MSVD* and MSR-VTT*, two datasets obtained from
MSVD and MSR-VTT using our method.Comment: 10 pages, 4 figure