    Spatio-Temporal Representation for Reasoning with Action Genome

    Representing spatio-temporal information in videos has proven to be a difficult task compared to action recognition, particularly in videos involving multiple actions. A single activity consists of many smaller actions that together provide a better understanding of the activity. This paper represents this varying information in a scene-graph format in order to answer temporal questions about a video, resulting in a directed temporal information graph. The work uses the Action Genome dataset, a variation of the Charades dataset, to capture pairwise relationships in a graph. The model performs significantly better than the benchmark results of the dataset, providing state-of-the-art results in predicate classification. The paper presents a novel spatio-temporal scene graph for videos, represented as a directed acyclic graph that maximizes the information captured from the scene. The results obtained on the counting task suggest some interesting findings that are described in the paper. The graph can be used for reasoning at a much lower computational cost, as explored in this work, as well as for other downstream tasks such as video captioning and action recognition, helping to bridge the gap between video and textual analysis.
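    To illustrate the kind of structure described above, the sketch below assembles per-frame (subject, predicate, object) triples, in the style of Action Genome annotations, into a directed graph with spatial (within-frame) and temporal (frame-to-frame) edges. The example triples, node naming, and edge scheme are assumptions made for this illustration and are not the paper's exact construction.

    # Minimal sketch (illustrative assumptions, not the paper's exact method):
    # build a directed temporal scene graph from per-frame relationship triples.
    import networkx as nx

    # Hypothetical Action Genome-style annotations:
    # frame index -> list of (subject, predicate, object) triples.
    frame_triples = {
        0: [("person", "holding", "cup"), ("person", "sitting_on", "chair")],
        1: [("person", "drinking_from", "cup"), ("person", "sitting_on", "chair")],
        2: [("person", "not_contacting", "cup")],
    }

    def build_temporal_scene_graph(frame_triples):
        """Nodes are (frame, entity) pairs; edges are spatial or temporal.

        Spatial edges encode within-frame predicates (person -> object, so the
        per-frame subgraph is bipartite); temporal edges link the same entity
        across consecutive frames. Every edge points within a frame or forward
        in time, so for person-centric annotations the result is a DAG.
        """
        g = nx.DiGraph()
        frames = sorted(frame_triples)
        for t in frames:
            for subj, pred, obj in frame_triples[t]:
                g.add_edge((t, subj), (t, obj), label=pred, kind="spatial")
        for t_prev, t_next in zip(frames, frames[1:]):
            prev_entities = {e for (f, e) in g.nodes if f == t_prev}
            next_entities = {e for (f, e) in g.nodes if f == t_next}
            for entity in prev_entities & next_entities:
                g.add_edge((t_prev, entity), (t_next, entity), kind="temporal")
        return g

    g = build_temporal_scene_graph(frame_triples)
    print(g.number_of_nodes(), g.number_of_edges())  # 8 nodes, 10 edges
    print(nx.is_directed_acyclic_graph(g))           # True

    A graph in this form can then be queried for temporal questions (e.g. which relationships hold before or after a given frame) by following the temporal edges, which is far cheaper than reprocessing the raw video.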