Pixel-level semantic understanding of ophthalmic images and beyond
Computer-assisted semantic image understanding underpins applications ranging from biomarker detection to intraoperative guidance and street scene understanding for self-driving systems. This PhD thesis develops deep learning-based, pixel-level semantic segmentation methods for medical and natural images. For vessel segmentation in OCT-A, a method combining iterative refinement of the extracted vessel maps with an auxiliary loss function that penalizes structural inaccuracies is proposed and tested on data captured under real clinical conditions, including various pathological cases. The method enables the extraction of a detailed vessel map of the retina, with potential applications to diagnostics or intraoperative localization. For scene segmentation in cataract surgery, class imbalance is identified as the major challenge among several factors, and a method addressing it is proposed that achieves state-of-the-art performance on a challenging public dataset. Accurate semantic segmentation in this domain can be used to monitor interactions between tools and anatomical parts for intraoperative guidance and safety. Finally, this thesis proposes a novel contrastive learning framework for supervised semantic segmentation that aims to improve the discriminative power of features in deep neural networks. The approach applies a contrastive loss function both at multiple model layers and across them. The framework is easy to combine with various model architectures and is experimentally shown to significantly improve performance in both natural and medical domains.
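To make the idea of a supervised contrastive loss on pixel features concrete, the following is a minimal sketch. It assumes a standard InfoNCE-style supervised formulation applied at a single layer, which is not necessarily the formulation used in the thesis; the function names, the temperature value, and the toy features are all hypothetical.

```python
import math


def _normalize(vector):
    """Scale a non-zero feature vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Average supervised contrastive loss over anchors that have a positive.

    features: list of per-pixel feature vectors (assumed non-zero).
    labels:   list of class labels, one per feature vector.
    This is an assumed, generic formulation, not the thesis's exact loss.
    """
    feats = [_normalize(f) for f in features]
    n = len(feats)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # an anchor without same-class samples contributes nothing
        logits = {j: _dot(feats[i], feats[j]) / temperature
                  for j in range(n) if j != i}
        # Numerically stable log-sum-exp over all non-anchor samples.
        peak = max(logits.values())
        log_denominator = peak + math.log(
            sum(math.exp(value - peak) for value in logits.values()))
        anchor_loss = -sum(logits[j] - log_denominator for j in positives)
        total += anchor_loss / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy "pixel" features: two clusters in a 2-D embedding space.
toy_features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
loss_consistent = supervised_contrastive_loss(toy_features, [0, 0, 1, 1])
loss_mixed = supervised_contrastive_loss(toy_features, [0, 1, 0, 1])
```

In this toy setting the loss is small when same-class features are already close together and large when the labels cut across the clusters, which is the behaviour such a loss is meant to encourage during training.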
On Improving Generalization of CNN-Based Image Classification with Delineation Maps Using the CORF Push-Pull Inhibition Operator
Deployed image classification pipelines typically depend on images captured in real-world environments, which may be affected by various sources of perturbation (e.g. sensor noise in low-light environments). The main challenge arises from the fact that image quality directly impacts the reliability and consistency of classification, and it has therefore attracted wide interest within the computer vision community. We propose a transformation step that aims to enhance the generalization ability of CNN models in the presence of unseen noise in the test set. Concretely, the delineation maps of given images are determined using the CORF push-pull inhibition operator. This operation transforms an input image into a space that is more robust to noise before it is processed by a CNN. We evaluated our approach on the Fashion MNIST data set with an AlexNet model. The proposed CORF-augmented pipeline achieved results on noise-free images comparable to those of a conventional AlexNet classification model without CORF delineation maps, and it consistently achieved significantly superior performance on test images perturbed with different levels of Gaussian and uniform noise.
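The evaluation protocol described above, in which test images are perturbed with several levels of Gaussian and uniform noise, could be sketched roughly as follows. This illustrates only the perturbation step, not the CORF operator or the classifier; the noise levels, the [0, 1] pixel scaling, and the function names are assumptions made for illustration.

```python
import random


def _clip(value, low=0.0, high=1.0):
    """Keep a pixel intensity inside the valid range."""
    return max(low, min(high, value))


def add_gaussian_noise(image, sigma, rng):
    """Add zero-mean Gaussian noise with standard deviation `sigma`."""
    return [[_clip(p + rng.gauss(0.0, sigma)) for p in row] for row in image]


def add_uniform_noise(image, half_width, rng):
    """Add noise drawn uniformly from [-half_width, +half_width]."""
    return [[_clip(p + rng.uniform(-half_width, half_width)) for p in row]
            for row in image]


rng = random.Random(0)  # fixed seed so the perturbations are reproducible
clean = [[0.5] * 28 for _ in range(28)]  # stand-in for a 28x28 Fashion MNIST image
noise_levels = [0.05, 0.1, 0.2]  # hypothetical levels, not taken from the paper

perturbed = {("gaussian", level): add_gaussian_noise(clean, level, rng)
             for level in noise_levels}
perturbed.update({("uniform", level): add_uniform_noise(clean, level, rng)
                  for level in noise_levels})
```

Each entry of `perturbed` would then be passed through both pipelines (with and without delineation maps) so that accuracy can be compared per noise type and level.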
Gaze-Based Human-Robot Interaction by the Brunswick Model
We present a new paradigm for human-robot interaction based on social signal processing, and in particular on the Brunswick model. Originally, the Brunswick model addresses face-to-face dyadic interaction, assuming that the interactants communicate through a continuous exchange of nonverbal social signals in addition to spoken messages. Social signals have to be interpreted through a recognition phase that considers visual and audio information. The Brunswick model allows the quality of the interaction to be evaluated quantitatively, using statistical tools that measure how effective the recognition phase is. In this paper we apply this theory to the case where one of the interactants is a robot; in this case, the recognition phases performed by the robot and the human have to be revised with respect to the original model. The model is applied to Berrick, a recent open-source, low-cost robotic head platform, where gaze is the social signal considered.
Computational models of object motion detectors accelerated using FPGA technology
The detection of moving objects is a trivial task when performed by vertebrate retinas, yet a complex computer vision task. This PhD research programme has made three key contributions, namely: 1) a multi-hierarchical spiking neural network (MHSNN) architecture for detecting horizontal and vertical movements, 2) a Hybrid Sensitive Motion Detector (HSMD) algorithm for detecting object motion and 3) the Neuromorphic Hybrid Sensitive Motion Detector (NeuroHSMD), a real-time neuromorphic implementation of the HSMD algorithm.
The MHSNN is a customised 4-layer Spiking Neural Network (SNN) architecture designed to reflect the basic connectivity, similar to the canonical behaviours found in the majority of vertebrate retinas (including human retinas). The architecture was trained using images from a custom dataset generated in laboratory settings. Simulation results revealed that each cell model is sensitive to vertical and horizontal movements, with a detection error of 6.75% relative to the teaching signals (expected output signals) used to train the MHSNN. The experimental evaluation showed that the MHSNN was not scalable because of the overall number of neurons and synapses, which led to the development of the HSMD.
The HSMD algorithm enhances an existing Dynamic Background Subtraction (DBS) algorithm using a customised 3-layer SNN. The SNN is used to stabilise the foreground information of moving objects in the scene, which improves object motion detection. The algorithm was compared against existing background subtraction approaches available in the Open Source Computer Vision (OpenCV) library, specifically on the 2012 Change Detection (CDnet2012) and the 2014 Change Detection (CDnet2014) benchmark datasets. The accuracy results show that the HSMD ranked first overall and performed better than all the other benchmarked algorithms on four of the categories, across all eight test metrics. Furthermore, the HSMD is the first to use an SNN to enhance the existing dynamic background subtraction algorithm without a substantial degradation of the frame rate, being capable of processing 720 × 480 images at 13.82 Frames Per Second (fps) (CDnet2014) and 13.92 fps (CDnet2012) on a high-performance computer (96 cores and 756 GB of RAM). Although the HSMD analysis shows a good Percentage of Correct Classifications (PCC) on CDnet2012 and CDnet2014, the customised 3-layer SNN was identified as the bottleneck in terms of speed, which could be improved using dedicated hardware.
The NeuroHSMD is thus an adaptation of the HSMD algorithm whereby the SNN component has been fully implemented on dedicated hardware [Terasic DE10-pro Field-Programmable Gate Array (FPGA) board]. Open Computing Language (OpenCL) was used to simplify the FPGA design flow and allow code portability to other devices such as FPGAs and Graphics Processing Units (GPUs). The NeuroHSMD was also tested against the CDnet2012 and CDnet2014 datasets with an acceleration of 82% over the HSMD algorithm, being capable of processing 720 × 480 images at 28.06 fps (CDnet2012) and 28.71 fps (CDnet2014).
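The Percentage of Correct Classifications (PCC) mentioned for the CDnet evaluations is commonly defined as (TP + TN) / (TP + TN + FP + FN), computed over the pixels of the predicted and ground-truth foreground masks. A minimal sketch of that computation, assuming binary masks stored as nested lists (the helper names are hypothetical and this is not the benchmark's official scoring code):

```python
def confusion_counts(predicted, truth):
    """Count TP, TN, FP, FN between two binary masks of equal shape.

    A value of 1 marks a foreground (moving) pixel, 0 marks background.
    """
    tp = tn = fp = fn = 0
    for pred_row, truth_row in zip(predicted, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif not p and not t:
                tn += 1
            elif p and not t:
                fp += 1
            else:
                fn += 1
    return tp, tn, fp, fn


def pcc(tp, tn, fp, fn):
    """Fraction of correctly classified pixels; multiply by 100 for a percentage."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("PCC is undefined for empty masks")
    return (tp + tn) / total


# Toy 2x2 masks: one background pixel is wrongly flagged as foreground.
predicted_mask = [[1, 0], [1, 1]]
truth_mask = [[1, 0], [0, 1]]
counts = confusion_counts(predicted_mask, truth_mask)
score = pcc(*counts)
```

Because background pixels usually dominate a frame, PCC tends to be high even for mediocre detectors, which is one reason change-detection benchmarks report it alongside other metrics.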