As demand for high-quality video continues to rise, high-resolution and
high dynamic range (HDR) imaging techniques are drawing attention. To generate
an HDR video from low dynamic range (LDR) images, one of the critical steps is
the motion compensation between LDR frames, for which most existing works
employ optical flow algorithms. However, these methods suffer from flow
estimation errors when saturation or complicated motions exist. In this paper,
we propose an end-to-end HDR video composition framework, which aligns LDR
frames in the feature space and then merges aligned features into an HDR frame,
without relying on pixel-domain optical flow. Specifically, we propose a
luminance-based alignment network for HDR (LAN-HDR) consisting of an alignment
module and a hallucination module. The alignment module aligns a frame to the
adjacent reference by evaluating luminance-based attention, excluding color
information. The hallucination module generates sharp details, especially for
washed-out areas due to saturation. The aligned and hallucinated features are
then blended adaptively to complement each other. Finally, we merge the
features to generate a final HDR frame. In training, we adopt a temporal loss,
in addition to frame reconstruction losses, to enhance temporal consistency and
thus reduce flickering. Extensive experiments demonstrate that our method
performs better than or comparably to state-of-the-art methods on several
benchmarks.

Comment: ICCV 202
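
To make the idea of luminance-based attention more concrete, the following is a minimal toy sketch, not the LAN-HDR implementation. It assumes that queries and keys are raw per-pixel luminance values (computed with Rec. 709 weights) rather than learned features, and that attention is computed globally over all pixels. The function names, the `temperature` parameter, and the squared-difference similarity are all illustrative assumptions.

```python
import numpy as np

# Rec. 709 luma weights; the choice of weights is an assumption of this sketch.
REC709 = np.array([0.2126, 0.7152, 0.0722])


def luminance(rgb):
    """Collapse an (H, W, 3) RGB image to an (H, W) luminance map."""
    return rgb @ REC709


def luminance_attention_align(ref_rgb, nbr_rgb, nbr_feat, temperature=1e-3):
    """Align neighbor features to the reference using luminance-only attention.

    ref_rgb, nbr_rgb: (H, W, 3) images; nbr_feat: (H, W, C) neighbor features.
    Color is discarded: attention scores depend on luminance alone.
    """
    h, w, _ = ref_rgb.shape
    q = luminance(ref_rgb).reshape(-1, 1)   # queries from the reference, (HW, 1)
    k = luminance(nbr_rgb).reshape(1, -1)   # keys from the neighbor, (1, HW)

    # Hypothetical similarity: negative squared luminance difference.
    scores = -((q - k) ** 2) / temperature
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)      # softmax over neighbor pixels

    v = nbr_feat.reshape(h * w, -1)              # values: neighbor features
    return (attn @ v).reshape(h, w, -1)
```

With a small `temperature`, each reference pixel draws its aligned feature almost entirely from the neighbor pixel with the closest luminance. A practical network would instead compute attention on learned feature maps and typically restrict it to a local window to keep the cost manageable.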