This contribution presents a deep learning method for the extraction and
fusion of information relating to kidney stone fragments acquired from
different viewpoints of the endoscope. Surface and section images of the
fragments are used jointly to train the classifier, and attention layers added
at the end of each convolutional block improve the discriminative power of the
features. This approach is specifically designed to mimic the
morpho-constitutional analysis performed ex vivo by biologists to visually
identify kidney stones by inspecting both views. The addition of attention
mechanisms to the backbone improved the results of single-view extraction
backbones by 4% on average. Moreover, compared with the state of the art,
the fusion of the deep features improved the overall results by up to 11% in terms
of kidney stone classification accuracy.

Comment: This work has been submitted to the IEEE for possible publication.
Copyright may be transferred without notice, after which this version may no
longer be accessible.
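
The two ideas named in the abstract, attention applied to the output of a convolutional block and fusion of features from the surface and section views, can be illustrated with a small sketch. The abstract does not specify the attention mechanism or the fusion strategy, so everything below is an assumption made for illustration: a squeeze-and-excitation-style channel gate stands in for the attention layer, concatenation stands in for the fusion step, and all names (`channel_attention`, `fuse_views`, the gate weights) are hypothetical. It is not the authors' architecture.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def global_average_pool(feature_map):
    # feature_map: list of channels, each a 2D list (height x width).
    # Returns one mean value per channel.
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


def channel_attention(feature_map, gate_weights):
    # Assumed attention form: squeeze each channel to its mean, turn it into
    # a sigmoid gate using a (hypothetical) learned scalar per channel, then
    # rescale the whole channel by that gate.
    pooled = global_average_pool(feature_map)
    gates = [sigmoid(w * p) for w, p in zip(gate_weights, pooled)]
    return [
        [[g * value for value in row] for row in channel]
        for g, channel in zip(gates, feature_map)
    ]


def fuse_views(surface_map, section_map, surface_weights, section_weights):
    # Assumed fusion form: attend to each view separately, pool each view to
    # a feature vector, and concatenate the two vectors.
    surface_vec = global_average_pool(channel_attention(surface_map, surface_weights))
    section_vec = global_average_pool(channel_attention(section_map, section_weights))
    return surface_vec + section_vec


# Toy feature maps standing in for the output of a convolutional block.
surface = [[[1.0, 1.0], [1.0, 1.0]], [[0.0, 0.0], [0.0, 0.0]]]  # 2 channels
section = [[[2.0, 2.0], [2.0, 2.0]]]                            # 1 channel
fused = fuse_views(surface, section, [0.0, 0.0], [0.0])
```

In this sketch the fused vector would be passed to a classifier head; with zero gate weights every gate equals 0.5, so each channel is simply halved before pooling.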