Long future frame prediction using optical flow informed deep neural networks for enhancement of robotic teleoperation in high latency environments

Abstract

High latency in teleoperation has a significant negative impact on operator performance. While deep learning has revolutionized many domains recently, it has not previously been applied to teleoperation enhancement. We propose a novel approach to predict video frames deep into the future using neural networks informed by synthetically generated optical flow information. This can be employed in teleoperated robotic systems that rely on video feeds for operator situational awareness. We have used the image-to-image translation technique as a basis for the prediction of future frames. The Pix2Pix conditional generative adversarial network (cGAN) has been selected as a base network. Optical flow components reflecting real-time control inputs are added to the standard RGB channels of the input image. We have experimented with three data sets of 20,000 input images each that were generated using our custom-designed teleoperation simulator with a 500-ms delay added between the input and target frames. Structural Similarity Index Measures (SSIMs) of 0.60 and Multi-SSIMs of 0.68 were achieved when training the cGAN with three-channel RGB image data. With the five-channel input data (incorporating optical flow) these values improved to 0.67 and 0.74, respectively. Applying Fleiss\u27 κ gave a score of 0.40 for three-channel RGB data, and 0.55 for five-channel optical flow-added data. We are confident the predicted synthetic frames are of sufficient quality and reliability to be presented to teleoperators as a video feed that will enhance teleoperation. To the best of our knowledge, we are the first to attempt to reduce the impacts of latency through future frame prediction using deep neural networks

    Similar works