While object recognition with deep neural networks (DNNs)
has shown remarkable success on natural images, endoscopic
images cannot yet be fully analysed with DNNs, since their
analysis must account for occlusion, light reflection and
image blur. UNet-based deep convolutional neural networks
offer great potential for extracting high-level spatial features,
thanks to their hierarchical structure with multiple levels of
abstraction, which is especially useful for multimodal
endoscopic images combining white light and fluoroscopy
in the diagnosis of esophageal disease.
However, currently reported inference times for DNNs exceed
200 ms, which is unsuitable for integration into robotic
control loops. This work addresses real-time object detection
and semantic segmentation in endoscopic devices. We show
that endoscopic assistive diagnosis can achieve satisfactory
detection rates with a fast inference time.
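The real-time constraint can be made concrete with a simple budget check: a control loop running at a given rate allows only one loop period per frame, and inference must fit inside it. A minimal sketch, assuming a hypothetical 25 Hz endoscopic video stream (the rate is an illustrative assumption, not from the paper):

```python
# Illustrative sketch: why ~200 ms inference is unsuitable for a
# robotic control loop. The 25 Hz loop rate is an assumed example.

def frame_budget_ms(rate_hz: float) -> float:
    """Time available per frame, in milliseconds, at a given loop rate."""
    return 1000.0 / rate_hz

def meets_realtime(inference_ms: float, rate_hz: float) -> bool:
    """True if one inference fits inside one control-loop period."""
    return inference_ms <= frame_budget_ms(rate_hz)

# At 25 Hz the budget is 40 ms per frame, so 200 ms misses it badly,
# while a 30 ms inference would fit.
print(frame_budget_ms(25.0))          # 40.0
print(meets_realtime(200.0, 25.0))    # False
print(meets_realtime(30.0, 25.0))     # True
```

In practice the inference time must also leave headroom for image acquisition and actuation within the same period, so the usable budget is even tighter than the raw loop period.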