Recent work on dense optical flow has shown significant progress, primarily
through supervised learning, which requires large amounts of labeled data.
Because large-scale real-world data is expensive to obtain, datasets are
typically constructed with computer graphics. However, there is a
common belief that synthetic-to-real domain gaps limit generalization to real
scenes. In this paper, we show that the characteristics required of an optical
flow dataset are rather simple, and we present a simpler synthetic data
generation method that achieves a certain level of realism by composing
elementary operations. Using 2D motion-based datasets, we systematically
analyze the simplest yet critical factors in generating synthetic datasets.
Furthermore,
we propose a novel way of using occlusion masks in supervised training and
observe that suppressing gradients in occluded regions provides a powerful
initial state in the curriculum learning sense. The RAFT network initially
trained on our dataset outperforms the original RAFT on the two most
challenging online benchmarks, MPI Sintel and KITTI 2015.
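
To make the idea of suppressing gradients on occluded regions concrete, the following is a minimal sketch and not the formulation used in this work. It assumes a per-pixel L1 flow error and a binary occlusion mask, neither of which is specified above; the function name `masked_l1_loss_and_grad` and the flattened per-pixel layout are hypothetical choices made for illustration.

```python
from typing import List, Tuple

# One (u, v) flow vector per pixel, with the image flattened to a list.
Flow = List[Tuple[float, float]]


def _sign(x: float) -> float:
    """Return -1.0, 0.0, or 1.0 according to the sign of x."""
    return float((x > 0) - (x < 0))


def masked_l1_loss_and_grad(
    pred: Flow, target: Flow, occluded: List[bool]
) -> Tuple[float, Flow]:
    """Mean L1 flow error over visible pixels and its gradient w.r.t. pred.

    Assumption (not from the paper): occluded pixels are simply excluded
    from the loss, so they receive an exactly zero gradient.
    """
    if not (len(pred) == len(target) == len(occluded)):
        raise ValueError("pred, target, and occluded must have equal length")

    n_visible = sum(1 for occ in occluded if not occ)
    if n_visible == 0:
        # Nothing to supervise: zero loss and zero gradient everywhere.
        return 0.0, [(0.0, 0.0) for _ in pred]

    loss = 0.0
    grad: Flow = []
    for (pu, pv), (tu, tv), occ in zip(pred, target, occluded):
        if occ:
            # Gradient is suppressed on occluded pixels.
            grad.append((0.0, 0.0))
            continue
        du, dv = pu - tu, pv - tv
        loss += abs(du) + abs(dv)
        grad.append((_sign(du) / n_visible, _sign(dv) / n_visible))

    return loss / n_visible, grad


# Three pixels; the middle one is occluded and has a large error.
pred = [(1.0, 0.0), (5.0, 5.0), (0.0, -2.0)]
target = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]
occluded = [False, True, False]
loss, grad = masked_l1_loss_and_grad(pred, target, occluded)
```

In this toy example the occluded pixel contributes neither to the loss nor to the gradient despite its large error, so only the visible pixels would drive a parameter update.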