4 research outputs found

    Video Prediction by Efficient Transformers

    Full text link
    Video prediction is a challenging computer vision task that has a wide range of applications. In this work, we present a new family of Transformer-based models for video prediction. Firstly, an efficient local spatial-temporal separation attention mechanism is proposed to reduce the complexity of standard Transformers. Then, a full autoregressive model, a partial autoregressive model and a non-autoregressive model are developed based on the new efficient Transformer. The partial autoregressive model has a similar performance with the full autoregressive model but a faster inference speed. The non-autoregressive model not only achieves a faster inference speed but also mitigates the quality degradation problem of the autoregressive counterparts, but it requires additional parameters and loss function for learning. Given the same attention mechanism, we conducted a comprehensive study to compare the proposed three video prediction variants. Experiments show that the proposed video prediction models are competitive with more complex state-of-the-art convolutional-LSTM based models. The source code is available at https://github.com/XiYe20/VPTR.Comment: Accepted by Image and Vision Computing. arXiv admin note: text overlap with arXiv:2203.1583

    Long-Term Video Prediction via Criticization and Retrospection

    No full text
    Video prediction refers to predicting and generating future video frames given a set of consecutive frames. Conventional video prediction methods usually criticize the discrepancy between the ground-truth and predictions frame by frame. As the prediction error accumulates recursively, these methods would easily become out of control and are often confined to the short-term horizon. In this paper, we introduce a retrospection process to rectify the prediction errors beyond criticizing the future prediction. The introduced retrospection process is designed to look back what have been learned from the past and rectify the prediction deficiencies. To this end, we build a retrospection network to reconstruct the past frames given the currently predicted frames. A retrospection loss is introduced to push the retrospection frames being consistent with the observed frames, so that the prediction error is alleviated. On the other hand, an auxiliary route is built by reversing the flow of time and executing a similar retrospection. These two routes interact with each other to boost the performance of retrospection network and enhance the understanding of dynamics across frames, especially for the long-term horizon. An adversarial loss is employed to generate more realistic results in both prediction and retrospection process. In addition, the proposed method can be used to extend many state-of-the-art video prediction methods. Extensive experiments on the natural video dataset demonstrate the advantage of introducing the retrospection process for long-term video prediction

    Long-Term Video Prediction via Criticization and Retrospection

    Full text link
    © 1992-2012 IEEE. Video prediction refers to predicting and generating future video frames given a set of consecutive frames. Conventional video prediction methods usually criticize the discrepancy between the ground-truth and predictions frame by frame. As the prediction error accumulates recursively, these methods would easily become out of control and are often confined to the short-term horizon. In this paper, we introduce a retrospection process to rectify the prediction errors beyond criticizing the future prediction. The introduced retrospection process is designed to look back what have been learned from the past and rectify the prediction deficiencies. To this end, we build a retrospection network to reconstruct the past frames given the currently predicted frames. A retrospection loss is introduced to push the retrospection frames being consistent with the observed frames, so that the prediction error is alleviated. On the other hand, an auxiliary route is built by reversing the flow of time and executing a similar retrospection. These two routes interact with each other to boost the performance of retrospection network and enhance the understanding of dynamics across frames, especially for the long-term horizon. An adversarial loss is employed to generate more realistic results in both prediction and retrospection process. In addition, the proposed method can be used to extend many state-of-the-art video prediction methods. Extensive experiments on the natural video dataset demonstrate the advantage of introducing the retrospection process for long-term video prediction
    corecore