Understanding a surgical scene is crucial for computer-assisted surgery
systems to provide any intelligent assistance functionality. One way of
achieving this scene understanding is via scene segmentation, where every pixel
of a frame is classified and therefore identifies the visible structures and
tissues. Progress on fully segmenting surgical scenes has been made using
machine learning. However, such models require large amounts of annotated
training data, containing examples of all relevant object classes. Such fully
annotated datasets are hard to create, as every pixel in a frame needs to be
annotated by medical experts, and are therefore rarely available. In this
work, we propose a method to combine multiple partially annotated datasets,
which provide complementary annotations, into one model, enabling better scene
segmentation and the use of multiple readily available datasets. Our method
aims to combine available data with complementary labels by leveraging mutually
exclusive properties to maximize the information available for training. Specifically, we propose to use
positive annotations of other classes as negative samples and to exclude
background pixels of binary annotations, as we cannot tell whether they contain
a class that the model predicts but that is not annotated in that dataset. We evaluate our method by
training a DeepLabV3 on the publicly available Dresden Surgical Anatomy
Dataset, which provides multiple subsets with binary segmentations of anatomical
structures. Our approach successfully combines 6 classes into one model,
increasing the overall Dice Score by 4.4% compared to an ensemble of models
trained on the classes individually. By including information on multiple
classes, we were able to reduce confusion between stomach and colon by 24%. Our
results demonstrate the feasibility of training a model on multiple datasets.
This paves the way for future work further alleviating the need for one large,
fully segmented dataset.

Comment: Accepted at IPCAI 2024; submitted to IJCARS (under revision)
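
As an illustration of the label-combination idea described in the abstract, the following PyTorch-style sketch shows one way the training targets and masked loss could be assembled. It is not the authors' implementation; names such as build_target, masked_loss, and IGNORE_INDEX are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    IGNORE_INDEX = 255  # hypothetical index for pixels excluded from the loss


    def build_target(binary_masks, height, width):
        # binary_masks: dict mapping class index -> (height, width) boolean tensor.
        # Pixels not covered by any annotation stay at IGNORE_INDEX instead of
        # being treated as background, since they may contain unannotated classes.
        target = torch.full((height, width), IGNORE_INDEX, dtype=torch.long)
        for class_id, mask in binary_masks.items():
            # A positive pixel of one class also acts as a negative sample for
            # every other class, because the classes are mutually exclusive.
            target[mask] = class_id
        return target


    def masked_loss(logits, target):
        # logits: (batch, num_classes, height, width); target: (batch, height, width).
        # Cross-entropy is computed over annotated pixels only; ignored pixels
        # contribute no gradient, so uncertain background regions do not
        # penalize the model.
        return F.cross_entropy(logits, target, ignore_index=IGNORE_INDEX)

Under this reading, adding further complementary binary datasets only contributes positive and negative pixels for the combined model, never contradictory background labels.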