
    Not-so-supervised: a survey of semi-supervised, multi-instance, and transfer learning in medical image analysis

    Machine learning (ML) algorithms have made a tremendous impact in the field of medical imaging. While medical imaging datasets have been growing in size, a frequently cited challenge for supervised ML algorithms is the lack of annotated data. As a result, various methods that can learn with less supervision, or with other types of supervision, have been proposed. We review semi-supervised, multiple-instance, and transfer learning in medical imaging, for both diagnosis/detection and segmentation tasks. We also discuss connections between these learning scenarios and opportunities for future research. Comment: Submitted to Medical Image Analysis
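
    To make one of the surveyed scenarios concrete, below is a minimal self-training sketch of semi-supervised learning (a generic illustration, not any specific method from the survey): a classifier fit on a small labeled set assigns pseudo-labels to confident unlabeled examples, which then join the training pool. All tensors are random stand-ins for real medical-image features.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
labeled_x = torch.randn(32, 64)          # 32 labeled feature vectors
labeled_y = torch.randint(0, 2, (32,))   # binary diagnosis labels
unlabeled_x = torch.randn(256, 64)       # many unlabeled scans

model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for round_ in range(3):                  # a few self-training rounds
    for _ in range(100):                 # fit on the current labeled pool
        opt.zero_grad()
        loss = loss_fn(model(labeled_x), labeled_y)
        loss.backward()
        opt.step()
    with torch.no_grad():                # pseudo-label confident examples
        probs = model(unlabeled_x).softmax(dim=1)
        conf, pseudo = probs.max(dim=1)
        keep = conf > 0.95
    labeled_x = torch.cat([labeled_x, unlabeled_x[keep]])
    labeled_y = torch.cat([labeled_y, pseudo[keep]])
    unlabeled_x = unlabeled_x[~keep]     # move used examples out of the pool
```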

    A Review of Co-saliency Detection Technique: Fundamentals, Applications, and Challenges

    Co-saliency detection is a newly emerging and rapidly growing research area in the computer vision community. As a novel branch of visual saliency, co-saliency detection refers to the discovery of common and salient foregrounds from two or more relevant images, and can be widely used in many computer vision tasks. Existing co-saliency detection algorithms mainly consist of three components: extracting effective features to represent image regions, exploring informative cues or factors to characterize co-saliency, and designing effective computational frameworks to formulate co-saliency. Although numerous methods have been developed, the literature still lacks a deep review and evaluation of co-saliency detection techniques. In this paper, we aim to provide a comprehensive review of the fundamentals, challenges, and applications of co-saliency detection. Specifically, we provide an overview of related computer vision work, review the history of co-saliency detection, summarize and categorize the major algorithms in this research area, discuss open issues, present potential applications of co-saliency detection, and finally point out unsolved challenges and promising directions for future work. We expect this review to benefit both new and senior researchers in this field, and to give researchers in related areas insights into the utility of co-saliency detection algorithms. Comment: 28 pages, 12 figures, 3 tables
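
    A toy sketch of the core idea: a region is co-salient if it is (a) salient within its own image and (b) similar to regions in the other images. This is a schematic combination of the intra- and inter-image cues the review describes, not any particular published algorithm; features and saliency scores are random placeholders for learned region representations.

```python
import numpy as np

rng = np.random.default_rng(0)
n_images, n_regions, dim = 3, 5, 16
feats = rng.normal(size=(n_images, n_regions, dim))       # region features
intra_saliency = rng.uniform(size=(n_images, n_regions))  # per-image saliency

feats /= np.linalg.norm(feats, axis=-1, keepdims=True)    # unit-normalize

co_saliency = np.zeros((n_images, n_regions))
for i in range(n_images):
    others = np.delete(feats, i, axis=0).reshape(-1, dim)
    # inter-image cue: best cosine similarity to any region elsewhere
    inter = (feats[i] @ others.T).max(axis=1)
    # combine the intra-image and inter-image cues multiplicatively
    co_saliency[i] = intra_saliency[i] * np.clip(inter, 0, None)

print(co_saliency.round(2))
```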

    Deep Patch Learning for Weakly Supervised Object Classification and Discovery

    Patch-level image representation is very important for object classification and detection, since it is robust to spatial transformation, scale variation, and cluttered backgrounds. Many existing methods require fine-grained supervision (e.g., bounding-box annotations) to learn patch features, which demands great labeling effort and may limit their potential applications. In this paper, we propose to learn patch features with weak supervision, i.e., only image-level labels. To achieve this goal, we treat images as bags and patches as instances, integrating weakly supervised multiple instance learning constraints into deep neural networks. Our method also integrates the traditionally separate stages of weakly supervised object classification and discovery into a unified deep convolutional neural network and optimizes the network end-to-end. The network processes the two tasks, object classification and discovery, jointly, sharing hierarchical deep features. Through this joint learning strategy, weakly supervised object classification and discovery benefit each other. We test the proposed method on the challenging PASCAL VOC datasets. The results show that our method obtains state-of-the-art performance on object classification and very competitive results on object discovery, with faster testing speed than competitors. Comment: Accepted by Pattern Recognition
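
    A minimal sketch of the bag/instance formulation described above (not the authors' actual network): patches are instances, the image is a bag, and max pooling over patch scores enforces the MIL constraint that a positive image contains at least one positive patch. The feature dimension and patch features are made-up placeholders.

```python
import torch
import torch.nn as nn

class PatchMIL(nn.Module):
    def __init__(self, feat_dim=128, n_classes=20):
        super().__init__()
        self.patch_scorer = nn.Linear(feat_dim, n_classes)

    def forward(self, patch_feats):            # (n_patches, feat_dim)
        patch_scores = self.patch_scorer(patch_feats)
        image_scores, best_patch = patch_scores.max(dim=0)
        return image_scores, best_patch        # best_patch localizes objects

model = PatchMIL()
patch_feats = torch.randn(50, 128)             # features for 50 patches
image_scores, best_patch = model(patch_feats)
target = torch.zeros(20).scatter_(0, torch.tensor([3]), 1.0)  # image tags
loss = nn.functional.binary_cross_entropy_with_logits(image_scores, target)
loss.backward()                                # trained from image tags only
```

    Because the image score for each class is inherited from a single best-scoring patch, the patch indices double as a crude object discovery signal, which is the sense in which classification and discovery share one network.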

    Incorporating Network Built-in Priors in Weakly-supervised Semantic Segmentation

    Pixel-level annotations are expensive and time-consuming to obtain. Hence, weak supervision using only image tags could have a significant impact in semantic segmentation. Recently, CNN-based methods have been proposed that fine-tune pre-trained networks using image tags. Without additional information, this leads to poor localization accuracy. This problem, however, can be alleviated by using objectness priors to generate foreground/background masks. Unfortunately, these priors either require pixel-level annotations or bounding boxes, or still yield inaccurate object boundaries. Here, we propose a novel method to extract accurate masks from networks pre-trained for the task of object recognition, thus forgoing external objectness modules. We first show how foreground/background masks can be obtained from the activations of higher-level convolutional layers of a network. We then show how to obtain multi-class masks by fusing foreground/background masks with information extracted from a weakly-supervised localization network. Our experiments show that exploiting these masks in conjunction with a weakly-supervised training loss yields state-of-the-art tag-based weakly-supervised semantic segmentation results. Comment: 14 pages, 11 figures, 8 tables. Accepted in IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE TPAMI)
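
    A rough sketch of extracting a foreground/background mask from higher-layer activations, in the spirit of the approach above (the paper's exact procedure differs). The feature map below is a random stand-in for the post-ReLU activations of a pre-trained recognition network.

```python
import torch
import torch.nn.functional as F

fmap = torch.randn(512, 14, 14).abs()        # C x H x W post-ReLU features

# channels that fire strongly tend to cover the dominant object, so
# averaging them gives a coarse objectness heat map
heat = fmap.mean(dim=0, keepdim=True)        # 1 x 14 x 14
heat = F.interpolate(heat[None], size=(224, 224), mode="bilinear",
                     align_corners=False)[0, 0]
heat = (heat - heat.min()) / (heat.max() - heat.min())

fg_mask = heat > 0.5                         # binary foreground mask
print(fg_mask.float().mean())                # fraction labeled foreground
```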

    Weakly Supervised Learning of Affordances

    Localizing functional regions of objects, or affordances, is an important aspect of scene understanding. In this work, we cast the problem of affordance segmentation as semantic image segmentation. To explore various levels of supervision, we introduce a pixel-annotated affordance dataset of 3090 images containing 9916 object instances with rich contextual information in terms of human-object interactions. We use a deep convolutional neural network within an expectation-maximization framework to take advantage of weakly labeled data such as image-level or keypoint annotations. We show that a further reduction in supervision is possible with minimal loss in performance when human pose is used as context.
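
    A schematic EM-style loop for weakly supervised segmentation in the spirit of the setup above (heavily simplified; not the authors' exact model). E-step: infer latent pixel labels from the current predictions, restricted to classes present in the image-level tags. M-step: refit the network to that labeling. The single-layer "FCN" and random image are toy placeholders.

```python
import torch
import torch.nn as nn

n_classes, H, W = 4, 32, 32
model = nn.Conv2d(3, n_classes, kernel_size=3, padding=1)  # toy "FCN"
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

image = torch.randn(1, 3, H, W)
image_tags = torch.tensor([0, 2])          # weak labels: classes present

for em_iter in range(5):
    # E-step: pixel-wise argmax restricted to the tagged classes
    with torch.no_grad():
        logits = model(image)
        masked = torch.full_like(logits, float("-inf"))
        masked[:, image_tags] = logits[:, image_tags]
        pseudo_labels = masked.argmax(dim=1)            # 1 x H x W
    # M-step: fit the network to the current latent labeling
    for _ in range(20):
        opt.zero_grad()
        loss = nn.functional.cross_entropy(model(image), pseudo_labels)
        loss.backward()
        opt.step()
```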

    Data Efficient and Weakly Supervised Computational Pathology on Whole Slide Images

    The rapidly emerging field of computational pathology has the potential to enable objective diagnosis, therapeutic response prediction, and identification of new morphological features of clinical relevance. However, deep learning-based computational pathology approaches either require manual annotation of gigapixel whole slide images (WSIs) in fully-supervised settings or thousands of WSIs with slide-level labels in a weakly-supervised setting. Moreover, whole-slide-level computational pathology methods also suffer from domain adaptation and interpretability issues. These challenges have prevented the broad adoption of computational pathology for clinical and research purposes. Here we present CLAM (clustering-constrained attention multiple instance learning), an easy-to-use, high-throughput, and interpretable WSI-level processing and learning method that requires only slide-level labels while being data efficient, adaptable, and capable of handling multi-class subtyping problems. CLAM is a deep-learning-based weakly-supervised method that uses attention-based learning to automatically identify sub-regions of high diagnostic value in order to accurately classify the whole slide, while also applying instance-level clustering over the identified representative regions to constrain and refine the feature space. In three separate analyses, we demonstrate the data efficiency and adaptability of CLAM and its superior performance over standard weakly-supervised classification. We demonstrate that CLAM models are interpretable and can be used to identify well-known and new morphological features. We further show that models trained using CLAM adapt to independent test cohorts, cell phone microscopy images, and biopsies. CLAM is a general-purpose and adaptable method that can be used for a variety of computational pathology tasks in both clinical and research settings.
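
    A compact sketch of attention-based MIL pooling of the kind described above (gated attention in the style of CLAM, heavily simplified; the instance-level clustering branch is omitted). Each WSI is treated as a bag of pre-extracted patch features; dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AttentionMIL(nn.Module):
    def __init__(self, feat_dim=1024, hidden=256, n_classes=2):
        super().__init__()
        self.attn_v = nn.Sequential(nn.Linear(feat_dim, hidden), nn.Tanh())
        self.attn_u = nn.Sequential(nn.Linear(feat_dim, hidden), nn.Sigmoid())
        self.attn_w = nn.Linear(hidden, 1)
        self.classifier = nn.Linear(feat_dim, n_classes)

    def forward(self, bag):                      # (n_patches, feat_dim)
        a = self.attn_w(self.attn_v(bag) * self.attn_u(bag))  # gated attn
        a = torch.softmax(a, dim=0)              # one weight per patch
        slide_feat = (a * bag).sum(dim=0)        # attention-weighted pooling
        return self.classifier(slide_feat), a    # a = interpretability map

model = AttentionMIL()
bag = torch.randn(3000, 1024)                    # 3000 patches from one WSI
logits, attn = model(bag)
print(logits.shape, attn.topk(5, dim=0).indices.flatten())  # top patches
```

    The attention weights give one route to the interpretability the abstract mentions: the highest-weighted patches are the sub-regions the model considers diagnostically relevant.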

    Hand Pose Estimation through Semi-Supervised and Weakly-Supervised Learning

    We propose a method for hand pose estimation based on a deep regressor trained on two different kinds of input. Raw depth data is fused with an intermediate representation in the form of a segmentation of the hand into parts. This intermediate representation contains important topological information and provides useful cues for reasoning about joint locations. The mapping from raw depth to segmentation maps is learned in a semi-/weakly-supervised way from two different datasets: (i) a synthetic dataset created through a rendering pipeline, including densely labeled ground truth (pixelwise segmentations); and (ii) a dataset of real images for which ground-truth joint positions are available, but not dense segmentations. The loss for training on real images is generated from a patch-wise restoration process that aligns tentative segmentation maps with a large dictionary of synthetic poses. The underlying premise is that the domain shift between synthetic and real data is smaller in the intermediate representation, where labels carry geometric and topological meaning, than in the raw input domain. Experiments on the NYU dataset show that the proposed training method decreases error on joints by 15.7% over direct regression of joints from depth data. Comment: 13 pages, 10 figures, 4 tables
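
    A toy sketch of the patch-wise restoration idea: tentative segmentation patches predicted on real images are snapped to their nearest neighbors in a dictionary of synthetic, correctly labeled patches, and the restored map serves as training supervision. This is a schematic nearest-neighbor stand-in for the paper's alignment process; all data below is random.

```python
import numpy as np

rng = np.random.default_rng(0)
patch_dim = 8 * 8                         # flattened 8x8 segmentation patches
dictionary = rng.integers(0, 7, size=(500, patch_dim))  # synthetic exemplars

def restore(tentative_patches):
    """Snap each tentative patch to its closest synthetic exemplar."""
    # squared L2 distance between every tentative patch and every entry
    d = ((tentative_patches[:, None, :] - dictionary[None, :, :]) ** 2).sum(-1)
    return dictionary[d.argmin(axis=1)]

tentative = rng.integers(0, 7, size=(20, patch_dim))  # noisy network output
restored = restore(tentative)             # pseudo ground truth for real data
print(restored.shape)                     # (20, 64)
```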

    Salient Object Detection in the Deep Learning Era: An In-Depth Survey

    As an essential problem in computer vision, salient object detection (SOD) has attracted an increasing amount of research attention over the years. Recent advances in SOD are predominantly led by deep learning-based solutions (named deep SOD). To enable an in-depth understanding of deep SOD, in this paper we provide a comprehensive survey covering various aspects, ranging from algorithm taxonomy to unsolved issues. In particular, we first review deep SOD algorithms from different perspectives, including network architecture, level of supervision, learning paradigm, and object-/instance-level detection. Following that, we summarize and analyze existing SOD datasets and evaluation metrics. Then, we benchmark a large group of representative SOD models and provide detailed analyses of the comparison results. Moreover, we study the performance of SOD algorithms under different attribute settings, which has not been thoroughly explored previously, by constructing a novel SOD dataset with rich attribute annotations covering various salient object types, challenging factors, and scene categories. We further analyze, for the first time in the field, the robustness of SOD models to random input perturbations and adversarial attacks. We also look into the generalization and difficulty of existing SOD datasets. Finally, we discuss several open issues of SOD and outline future research directions. Comment: Published in IEEE TPAMI. All the saliency prediction maps, our constructed dataset with annotations, and codes for evaluation are publicly available at \url{https://github.com/wenguanwang/SODsurvey}
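
    Two standard SOD evaluation metrics of the kind such benchmarks rely on: mean absolute error (MAE) between the predicted saliency map and the ground truth, and the F-measure at a fixed binarization threshold (β² = 0.3 is the conventional weighting). The maps below are random stand-ins for real predictions and annotations.

```python
import numpy as np

rng = np.random.default_rng(0)
pred = rng.uniform(size=(256, 256))        # predicted saliency map in [0, 1]
gt = rng.uniform(size=(256, 256)) > 0.7    # binary ground-truth mask

mae = np.abs(pred - gt).mean()

binary = pred > 0.5                        # fixed-threshold binarization
tp = np.logical_and(binary, gt).sum()
precision = tp / max(binary.sum(), 1)
recall = tp / max(gt.sum(), 1)
beta2 = 0.3                                # conventional weighting for SOD
f_measure = ((1 + beta2) * precision * recall /
             max(beta2 * precision + recall, 1e-8))

print(f"MAE={mae:.3f}  F-beta={f_measure:.3f}")
```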

    Keypoint Based Weakly Supervised Human Parsing

    Fully convolutional networks (FCNs) have achieved great success in human parsing in recent years. Conventional human parsing tasks require pixel-level labeling to guide training, which usually involves enormous human labeling effort. To ease this effort, we propose a novel weakly supervised human parsing method that requires only simple object keypoint annotations for learning. We develop an iterative learning method to generate pseudo part segmentation masks from keypoint labels. With these pseudo masks, we train an FCN to output pixel-level human parsing predictions. Furthermore, we develop a correlation network that jointly predicts part and object segmentation masks, improving segmentation performance. The experimental results show that our weakly supervised method achieves very competitive human parsing results; although it uses only simple keypoint annotations for learning, it achieves performance comparable to fully supervised methods that use expensive pixel-level annotations.
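
    A simplified sketch of one plausible way to turn keypoints into initial pseudo part masks, the seed of the iterative scheme above (the paper's generation procedure may differ): each keypoint spawns a Gaussian region, and every pixel takes the label of its strongest seed, falling back to background where all seeds are weak. Coordinates and sizes are made up.

```python
import numpy as np

H, W, sigma = 64, 64, 6.0
keypoints = {1: (20, 22), 2: (40, 30), 3: (50, 48)}   # part_id -> (y, x)

ys, xs = np.mgrid[0:H, 0:W]
score = np.zeros((len(keypoints) + 1, H, W))          # channel 0 = background
score[0] = 0.1                                        # weak background prior
for part_id, (cy, cx) in keypoints.items():
    score[part_id] = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2)
                            / (2 * sigma ** 2))

pseudo_mask = score.argmax(axis=0)                    # H x W part labels
# this mask would train an FCN whose predictions seed the next iteration
print(np.unique(pseudo_mask))
```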

    Distantly Supervised Road Segmentation

    We present an approach to road segmentation that requires only image-level annotations at training time. We leverage distant supervision, which allows us to train our model using images that differ from the target domain. Using large publicly available image databases as distant supervisors, we develop a simple method to automatically generate weak pixel-wise road masks. These are used to iteratively train a fully convolutional neural network, which produces our final segmentation model. We evaluate our method on the Cityscapes dataset, where we compare it with a fully supervised approach. Further, we discuss the trade-off between annotation cost and performance. Overall, our distantly supervised approach achieves 93.8% of the performance of the fully supervised approach, while using orders of magnitude less annotation work. Comment: Accepted for the ICCV workshop CVRSUAD2017
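
    A schematic version of the iterative loop described above: weak road masks bootstrap an FCN, whose own thresholded predictions become the refined masks for the next round. Everything below (the tiny network, random images, random initial masks) is a toy stand-in for the real pipeline.

```python
import torch
import torch.nn as nn

fcn = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                    nn.Conv2d(16, 1, 3, padding=1))    # toy road/not-road FCN
opt = torch.optim.Adam(fcn.parameters(), lr=1e-3)

images = torch.randn(8, 3, 64, 64)
weak_masks = (torch.rand(8, 1, 64, 64) > 0.5).float()  # distant-supervision masks

for round_ in range(3):
    for _ in range(50):                                # fit the current masks
        opt.zero_grad()
        loss = nn.functional.binary_cross_entropy_with_logits(
            fcn(images), weak_masks)
        loss.backward()
        opt.step()
    with torch.no_grad():                              # refine masks for next round
        weak_masks = (fcn(images).sigmoid() > 0.5).float()
```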