A Framework to Generate and Label Synthetic/Real Video Data to Feed Temporal Segment Networks

Abstract

In this project, we propose an action prediction pipeline and a data generation pipeline. The former builds on deep learning, while the latter enables the generation of both real and synthetic data. Because the deep learning method requires a large amount of annotated data, an action tagging tool is also featured. To further compensate for the lack of data, we propose a video data augmentation pipeline for action recognition. The 3DPLab team developed a photorealistic synthetic data generator called UnrealRox; we use this system, together with sequences recorded with a motion capture system, to generate the necessary synthetic data. In total, we recorded five useful sequences with a complex setup of three Kinects and a motion capture suit. Finally, we deployed and tested the novel Temporal Segment Network on the state-of-the-art action recognition dataset UCF-101.

    Similar works