231 research outputs found

    Stochastic Methods for Fine-Grained Image Segmentation and Uncertainty Estimation in Computer Vision

    In this dissertation, we exploit concepts of probability theory, stochastic methods and machine learning to address three existing limitations of deep learning-based models for image understanding. First, although convolutional neural networks (CNNs) have substantially improved the state of the art in image understanding, conventional CNNs provide segmentation masks that poorly adhere to object boundaries, a critical limitation for many potential applications. Second, training deep learning models requires large amounts of carefully selected and annotated data, but large-scale annotation of image segmentation datasets is often prohibitively expensive. Third, conventional deep learning models lack the capability of uncertainty estimation, which compromises both decision making and model interpretability. To address these limitations, we introduce the Region Growing Refinement (RGR) algorithm, an unsupervised post-processing algorithm that exploits Monte Carlo sampling and pixel similarities to propagate high-confidence labels into regions of low-confidence classification. The probabilistic Region Growing Refinement (pRGR) provides RGR with a rigorous mathematical foundation that exploits concepts of Bayesian estimation and variance reduction techniques. Experiments demonstrate both the effectiveness of (p)RGR for the refinement of segmentation predictions and its suitability for uncertainty estimation, since the variance estimates obtained in the Monte Carlo iterations are highly correlated with segmentation accuracy. We also introduce FreeLabel, an intuitive open-source web interface that exploits RGR to allow users to obtain high-quality segmentation masks with just a few freehand scribbles, in a matter of seconds. Designed to benefit the computer vision community, FreeLabel can be used for either crowdsourced or private annotation and has a modular structure that can be easily adapted for any image dataset.
The practical relevance of the methods developed in this dissertation is illustrated through applications in agricultural and healthcare-related domains. We have combined RGR and modern CNNs for fine segmentation of fruit flowers, motivated by the importance of automated bloom intensity estimation for optimization of fruit orchard management and, possibly, for automating procedures such as flower thinning and pollination. We also used an early version of FreeLabel to annotate novel datasets for segmentation of fruit flowers, which are currently publicly available. Finally, this dissertation describes work on fine segmentation and gaze estimation for images collected from assisted living environments, with the ultimate goal of assisting geriatricians in evaluating the health status of patients in such facilities.
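
The idea of propagating high-confidence labels through repeated random sampling, and reading uncertainty from the spread across iterations, can be illustrated with a toy example. The sketch below is not the RGR or pRGR algorithm from the dissertation: it works on a 1-D row of pixels, and the function name `mc_refine`, the thresholds `hi` and `lo`, the seed count, and the nearest-seed assignment rule are all assumptions made purely for illustration.

```python
import random
from statistics import mean, pvariance


def mc_refine(scores, intensities, n_iter=20, hi=0.8, lo=0.2, n_seeds=2, seed=0):
    """Toy 1-D Monte Carlo label propagation (illustrative, not the published RGR)."""
    rng = random.Random(seed)
    fg = [i for i, s in enumerate(scores) if s >= hi]  # confident foreground pixels
    bg = [i for i, s in enumerate(scores) if s <= lo]  # confident background pixels
    if not fg or not bg:
        raise ValueError("need at least one confident pixel of each class")

    votes = [[] for _ in scores]
    for _ in range(n_iter):
        # Each iteration draws a different random subset of confident pixels as seeds.
        seeds = [(i, 1) for i in rng.sample(fg, min(n_seeds, len(fg)))]
        seeds += [(i, 0) for i in rng.sample(bg, min(n_seeds, len(bg)))]
        for p, value in enumerate(intensities):
            # Assumed similarity rule: intensity difference plus a small spatial penalty.
            _, label = min(
                seeds,
                key=lambda s: abs(intensities[s[0]] - value) + 0.01 * abs(s[0] - p),
            )
            votes[p].append(label)

    # Per-pixel mean is the refined foreground score; variance is the uncertainty proxy.
    return [(mean(v), pvariance(v)) for v in votes]


if __name__ == "__main__":
    scores = [0.9, 0.95, 0.85, 0.5, 0.1, 0.05, 0.15]
    intensities = [10, 11, 10, 10.5, 50, 52, 51]
    for pixel, (m, var) in enumerate(mc_refine(scores, intensities)):
        print(pixel, m, var)
```

Pixels whose label changes from one iteration to the next receive a nonzero variance, which is the kind of signal the abstract reports as correlating with segmentation accuracy.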

    Frame-to-Frame Aggregation of Active Regions in Web Videos for Weakly Supervised Semantic Segmentation

    When a deep neural network is trained on data with only image-level labeling, the regions activated in each image tend to identify only a small region of the target object. We propose a method of using videos automatically harvested from the web to identify a larger region of the target object by using temporal information, which is not present in the static image. The temporal variations in a video allow different regions of the target object to be activated. We obtain an activated region in each frame of a video, and then aggregate the regions from successive frames into a single image, using a warping technique based on optical flow. The resulting localization maps cover more of the target object, and can then be used as proxy ground truth to train a segmentation network. This simple approach outperforms existing methods under the same level of supervision, and even outperforms approaches that rely on extra annotations. Based on VGG-16 and ResNet-101 backbones, our method achieves mIoU of 65.0 and 67.4, respectively, on PASCAL VOC 2012 test images, which represents a new state of the art. Comment: ICCV 201
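
The aggregation step described in this abstract (warp each frame's activated region into a common frame, then combine) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the flow is assumed to be given as integer `(dy, dx)` offsets per pixel, forward warping is used, and the pixel-wise maximum is an assumed aggregation rule. The function names `warp_to_reference` and `aggregate` are hypothetical.

```python
def warp_to_reference(activation, flow):
    """Move each pixel's activation by its (dy, dx) flow vector into the reference frame."""
    h, w = len(activation), len(activation[0])
    warped = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y][x]
            ty, tx = y + dy, x + dx
            if 0 <= ty < h and 0 <= tx < w:
                # If several source pixels land on one target, keep the strongest.
                warped[ty][tx] = max(warped[ty][tx], activation[y][x])
    return warped


def aggregate(activations, flows):
    """Pixel-wise maximum over all frames after warping each to the reference frame."""
    h, w = len(activations[0]), len(activations[0][0])
    merged = [[0.0] * w for _ in range(h)]
    for act, flow in zip(activations, flows):
        warped = warp_to_reference(act, flow)
        for y in range(h):
            for x in range(w):
                merged[y][x] = max(merged[y][x], warped[y][x])
    return merged


if __name__ == "__main__":
    # Two 1x4 frames: the second activates a different object part, shifted by one pixel.
    frame0 = [[1, 0, 0, 0]]
    frame1 = [[0, 0, 1, 0]]
    still = [[(0, 0)] * 4]
    shift_left = [[(0, -1)] * 4]
    print(aggregate([frame0, frame1], [still, shift_left]))
```

In the toy input, each frame activates a single pixel, and the merged map covers two adjacent pixels, mirroring how successive frames can jointly cover more of the object than any single frame.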

    Liver segmentation using 3D CT scans.

    Master of Science in Computer Science. University of KwaZulu-Natal, Durban, 2018. Abstract available in PDF file.

    Dual Progressive Transformations for Weakly Supervised Semantic Segmentation

    Weakly supervised semantic segmentation (WSSS), which aims to mine object regions using only class-level labels, is a challenging task in computer vision. Current state-of-the-art CNN-based methods usually adopt Class-Activation-Maps (CAMs) to highlight the potential areas of the object; however, they may suffer from partial activation. To this end, we make an early attempt to explore the global feature attention mechanism of the vision transformer for the WSSS task. However, since the transformer lacks the inductive bias of CNN models, it cannot boost performance directly and may yield over-activation. To tackle these drawbacks, we propose a Convolutional Neural Networks Refined Transformer (CRT) to mine globally complete and locally accurate class activation maps. To validate the effectiveness of our proposed method, extensive experiments are conducted on the PASCAL VOC 2012 and CUB-200-2011 datasets. Experimental evaluations show that our proposed CRT achieves new state-of-the-art performance on both the weakly supervised semantic segmentation task and the weakly supervised object localization task, outperforming other methods by a large margin.

    A Semi-Automated Approach to Medical Image Segmentation using Conditional Random Field Inference

    Medical image segmentation plays a crucial role in delivering effective patient care in various diagnostic and treatment modalities. Manual delineation of target volumes and all critical structures is a tedious and highly time-consuming process, and it introduces uncertainty into patients' treatment outcomes. Fully automatic methods hold great promise for reducing cost and time while improving accuracy and eliminating expert variability, yet great challenges remain. Legally and ethically, human oversight must be integrated with “smart tools”, favoring a semi-automatic technique that can leverage the best aspects of both human and computer. In this work we show that the segmentation problem can be cast in a semi-automatic framework as an energy minimization problem in a Conditional Random Field (CRF). We show that human input can be used as adaptive training data to condition a probabilistic boundary term modeled for the heterogeneous boundary characteristics of anatomical structures. We demonstrate that our method can readily adapt to multiple structures and image modalities using a single CRF framework and tools to learn probabilistic terms interactively. To tackle the more difficult multi-class segmentation problem, we developed a new ensemble one-vs-rest graph cut algorithm. Each graph in the ensemble performs a simple and efficient bi-class (a target class vs. the rest of the classes) segmentation. The final segmentation is obtained by majority vote. Our algorithm is both faster and more accurate than the prior multi-class method, which iteratively swaps classes. In this thesis, we also include novel volumetric segmentation algorithms that employ deep learning and indicate how to synthesize our CRF framework with convolutional neural networks (CNNs). This would allow incorporating user guidance into CNN-based deep learning for this task.
We think a deep learning-based method interactively guided by a human expert is the ideal solution for medical image segmentation.
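
The majority-vote step that combines the one-vs-rest segmentations can be sketched as below. The abstract does not spell out the voting rule, so this is one plausible reading rather than the thesis's algorithm: a "target" decision casts a vote for that class, a "rest" decision casts a vote for every other class, and ties go to the lowest class index. The graph cut itself is not shown, and the name `majority_vote` is hypothetical.

```python
def majority_vote(binary_masks):
    """Fuse one-vs-rest results into one label per pixel.

    binary_masks[k][i] is 1 if the k-th bi-class segmentation marks pixel i
    as its target class k, and 0 if it marks the pixel as "rest".
    """
    n_classes = len(binary_masks)
    n_pixels = len(binary_masks[0])
    labels = []
    for i in range(n_pixels):
        votes = [0] * n_classes
        for k, mask in enumerate(binary_masks):
            if mask[i]:
                votes[k] += 1  # "target" vote for class k
            else:
                for j in range(n_classes):
                    if j != k:
                        votes[j] += 1  # "rest" vote, shared by all other classes
        # max() returns the first maximum, so ties resolve to the lowest class index.
        labels.append(max(range(n_classes), key=lambda c: votes[c]))
    return labels


if __name__ == "__main__":
    masks = [
        [1, 0, 0, 1],  # class 0 vs rest
        [0, 1, 0, 1],  # class 1 vs rest
        [0, 0, 1, 0],  # class 2 vs rest
    ]
    print(majority_vote(masks))
```

The last pixel in the toy input is claimed by two of the bi-class segmentations at once, which is the kind of conflict the vote has to resolve.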