Recording surgery in operating rooms is an essential task for education and
evaluation of medical treatment. However, recording the desired targets, such
as the surgical field, surgical tools, or the doctor's hands, is difficult because
the targets are heavily occluded during surgery. We use a recording system in
which multiple cameras are embedded in the surgical lamp, and we assume that at
least one camera is recording the target without occlusion at any given time.
As the embedded cameras obtain multiple video sequences, we address the task of
selecting the camera with the best view of the surgery. Unlike the conventional
method, which selects the camera based on the area of the surgical field,
we propose a deep neural network that predicts camera selection probabilities
from multiple video sequences, trained with expert annotations as supervision.
We created a dataset of recordings covering six different types of plastic
surgery and annotated it with camera-switching labels. Our
experiments show that our approach successfully switched between cameras and
outperformed three baseline methods.
Comment: MICCAI 202
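
The selection step described above can be illustrated with a minimal sketch. It is not the network from the paper: the per-camera scores are hypothetical placeholders standing in for whatever a trained model would output, and all function names are assumptions made for illustration. The sketch contrasts the area-based baseline with picking the camera that has the highest predicted selection probability.

```python
import math


def softmax(scores):
    """Turn raw per-camera scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_by_probability(scores):
    """Pick the camera index with the highest selection probability.

    `scores` is a hypothetical list of per-camera model outputs for one frame.
    """
    probs = softmax(scores)
    return max(range(len(probs)), key=probs.__getitem__)


def select_by_area(areas):
    """Area-based baseline: pick the camera with the largest surgical-field area.

    `areas` is a hypothetical list of per-camera field areas (e.g. in pixels).
    """
    return max(range(len(areas)), key=areas.__getitem__)


def switching_sequence(per_frame_scores):
    """Apply probability-based selection independently to each frame."""
    return [select_by_probability(scores) for scores in per_frame_scores]
```

Calling `switching_sequence` on a list of per-frame score lists returns one camera index per frame, which is the form a camera-switching annotation would be compared against.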