Deep Learning Approaches to Predict Future Frames in Videos

Abstract

Deep neural networks are becoming central in several areas of computer vision. While there have been a lot of studies regarding the classification of images and videos, future frame prediction is still a rarely investigated approach, and even some applications could make good use of the knowledge regarding the next frame of an image sequence in pixel-space. Examples include video compression and autonomous agents in robotics that have to act in natural environments. Learning how to forecast the future of an image sequence requires the system to understand and efficiently encode the content and dynamics for a certain period. It is viewed as a promising avenue from which even supervised tasks could benefit since labeled video data is limited and hard to obtain. Therefore, this work gives an overview of scientific advances covering future frame prediction and proposes a recurrent network model which utilizes recent techniques from deep learning research. The presented architecture is based on the recurrent decoder-encoder framework with convolutional cells, which allows the preservation of Spatio-temporal data correlations. Driven by perceptual-motivated objective functions and a modern recurrent learning strategy, it can outperform existing approaches concerning future frame generation in several video content types. All this can be achieved with fewer training iterations and model parameters

    Similar works