
    Untangling Local and Global Deformations in Deep Convolutional Networks for Image Classification and Sliding Window Detection

    Deep Convolutional Neural Networks (DCNNs) commonly use generic 'max-pooling' (MP) layers to extract deformation-invariant features, but we argue in favor of a more refined treatment. First, we introduce epitomic convolution as a building block alternative to the common convolution-MP cascade of DCNNs; while having identical complexity to MP, epitomic convolution allows for parameter sharing across different filters, resulting in faster convergence and better generalization. Second, we introduce a Multiple Instance Learning approach to explicitly accommodate global translation and scaling when training a DCNN exclusively with class labels. For this we rely on a 'patchwork' data structure that efficiently lays out all image scales and positions as candidates to a DCNN. Factoring global and local deformations allows a DCNN to 'focus its resources' on the treatment of non-rigid deformations and yields a substantial classification accuracy improvement. Third, further pursuing this idea, we develop an efficient DCNN sliding window object detector that employs explicit search over position, scale, and aspect ratio. We provide competitive image classification and localization results on the ImageNet dataset and object detection results on the Pascal VOC 2007 benchmark.
    Comment: 13 pages, 7 figures, 5 tables. arXiv admin note: substantial text overlap with arXiv:1406.273
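    As a concrete illustration of the first idea, here is a minimal PyTorch sketch of epitomic convolution as we read it from the abstract: each candidate filter is cut from a larger shared 'epitome', and the max is taken over epitome positions rather than over image positions, so the max-pooling step is absorbed into the convolution at comparable cost. Function names, the epitome layout, and the input stride are our assumptions, not the paper's code.

```python
import torch
import torch.nn.functional as F

def epitomic_conv2d(x, epitome, k, stride=2):
    """Hypothetical sketch: 'epitome' has shape (C_out, C_in, E, E) with
    E > k; every k x k sub-window of it acts as a candidate filter, and
    each output position keeps the best-responding candidate (max over
    epitome positions), playing the role of the usual conv + max-pool."""
    C_out, C_in, E, _ = epitome.shape
    responses = []
    for i in range(E - k + 1):
        for j in range(E - k + 1):
            w = epitome[:, :, i:i + k, j:j + k]      # one candidate filter
            responses.append(F.conv2d(x, w, stride=stride))
    # Max over epitome positions replaces max-pooling over image positions.
    return torch.stack(responses, dim=0).max(dim=0).values
```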

    Pushing the Boundaries of Boundary Detection using Deep Learning

    In this work we show that adapting Deep Convolutional Neural Network training to the task of boundary detection can result in substantial improvements over the current state of the art in boundary detection. Our first contribution combines a careful design of the loss for boundary detection training, a multi-resolution architecture, and training with external data to improve detection accuracy. When measured on the standard Berkeley Segmentation Dataset, we improve the optimal dataset scale F-measure from 0.780 to 0.808, while human performance is at 0.803. We further improve performance to 0.813 by combining deep learning with grouping, integrating the Normalized Cuts technique within a deep network. We also examine the potential of our boundary detector in conjunction with the task of semantic segmentation and demonstrate clear improvements over state-of-the-art systems. Our detector is fully integrated in the popular Caffe framework and processes a 320x420 image in less than a second.
    Comment: The previous version reported large improvements w.r.t. the LPO region proposal baseline, which turned out to be due to a wrong computation for the baseline. The improvements are currently less important, and are omitted. We are sorry if the reported results caused any confusion. We have also integrated reviewer feedback regarding human performance on the BSD benchmark.
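    The abstract does not spell out the loss; a common choice for boundary detection, and a plausible reading of 'careful design of the loss', is a per-image class-balanced cross-entropy that counters the extreme boundary/non-boundary pixel imbalance. A minimal sketch, with the balancing scheme as our assumption rather than the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def balanced_bce(logits, targets):
    """Class-balanced binary cross-entropy (assumed, not the paper's exact
    loss): boundary pixels are rare, so positives are up-weighted by the
    negative fraction and negatives down-weighted by the positive fraction."""
    pos = targets.sum()
    neg = targets.numel() - pos
    beta = neg / (pos + neg)                  # fraction of non-boundary pixels
    weights = torch.where(targets > 0.5, beta, 1.0 - beta)
    return F.binary_cross_entropy_with_logits(logits, targets, weight=weights)
```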

    Deep Learning for Semantic Part Segmentation with High-Level Guidance

    In this work we address the task of segmenting an object into its parts, or semantic part segmentation. We start by adapting a state-of-the-art semantic segmentation system to this task, and show that a fully-convolutional Deep CNN coupled with Dense CRF labelling provides excellent results for a broad range of object categories. Still, this approach remains agnostic to high-level constraints between object parts. We introduce such prior information by means of a Restricted Boltzmann Machine, adapted to our task, and train our model in a discriminative fashion, as a hidden CRF, demonstrating that prior information can yield additional improvements. We also investigate the performance of our approach 'in the wild', without information concerning the objects' bounding boxes, using an object detector to guide a multi-scale segmentation scheme. We evaluate the performance of our approach on the Penn-Fudan and LFW datasets for the tasks of pedestrian parsing and face labelling, respectively. We show superior performance with respect to competitive methods that have been extensively engineered on these benchmarks, as well as realistic qualitative results on part segmentation, even for occluded or deformable objects. We also provide quantitative and extensive qualitative results on three classes from the PASCAL Parts dataset. Finally, we show that our multi-scale segmentation scheme can boost accuracy, recovering segmentations for finer parts.
    Comment: 11 pages (including references), 3 figures, 2 tables
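    The CNN + Dense CRF labelling step can be sketched with the public pydensecrf bindings; the kernel widths and compatibility weights below are commonly used defaults, not the paper's tuned values:

```python
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def densecrf_refine(image, probs, iters=5):
    """probs: (n_parts, H, W) float32 softmax output of the part CNN;
    image: contiguous (H, W, 3) uint8 array. Returns an (H, W) label map."""
    n_parts, H, W = probs.shape
    d = dcrf.DenseCRF2D(W, H, n_parts)
    d.setUnaryEnergy(unary_from_softmax(probs))  # -log(p) unary potentials
    d.addPairwiseGaussian(sxy=3, compat=3)       # smoothness kernel
    d.addPairwiseBilateral(sxy=80, srgb=13, rgbim=image, compat=10)  # appearance
    q = d.inference(iters)
    return np.argmax(np.array(q).reshape(n_parts, H, W), axis=0)
```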

    Application of the sliding window method and Mask-RCNN method to nuclear recognition in oral cytology

    Background: We aimed to develop an artificial intelligence (AI)-assisted oral cytology method, similar to cervical cytology. We focused on the detection of cell nuclei because the ratio of cell nuclei to cytoplasm increases with increasing cell malignancy. As an initial step in the development of AI-assisted cytology, we investigated two methods for the automatic detection of cell nuclei in blue-stained cells in cytopreparation images. Methods: We evaluated the usefulness of the sliding window method (SWM) and mask region-based convolutional neural network (Mask-RCNN) in identifying the cell nuclei in oral cytopreparation images. Thirty cases of liquid-based oral cytology were analyzed. First, we performed the SWM by dividing each image into 96 × 96 pixel tiles. Overall, 591 images with or without blue-stained cell nuclei were prepared as the training data and 197 as the test data (total: 1,576 images). Next, we performed the Mask-RCNN by preparing 130 images of Class II and III lesions and creating mask images showing cell regions based on these images. Results: Using the SWM, the highest detection rate for blue-stained cells in the evaluation group was 0.9314. For Mask-RCNN, 37 cell nuclei were identified, and 1 cell nucleus was identified as a non-nucleus after 40 epochs (error rate: 0.027). Conclusions: Mask-RCNN is more accurate than SWM in identifying cell nuclei. If blue-stained cell nuclei can be identified automatically and correctly, the entire cell morphology can be grasped faster, and the diagnostic performance of cytology can be improved.
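    The SWM step, as described, amounts to tiling each image into 96 × 96 patches and classifying each tile for the presence of a blue-stained nucleus. A minimal sketch; the CNN classifier itself and the tensor layout are our assumptions:

```python
import torch

def sliding_window_tiles(image, tile=96, stride=96):
    """Yield ((y, x), patch) pairs from a (C, H, W) image tensor; with
    stride == tile the tiles are non-overlapping, as in the paper's setup."""
    _, H, W = image.shape
    for y in range(0, H - tile + 1, stride):
        for x in range(0, W - tile + 1, stride):
            yield (y, x), image[:, y:y + tile, x:x + tile]

# Hypothetical usage: run a trained nucleus classifier over every tile.
# hits = [(pos, classifier(patch.unsqueeze(0)))
#         for pos, patch in sliding_window_tiles(img)]
```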

    Is object localization for free? – Weakly-supervised learning with convolutional neural networks

    Successful methods for visual object recognition typically rely on training datasets containing lots of richly annotated images. Detailed image annotation, e.g. by object bounding boxes, however, is both expensive and often subjective. We describe a weakly supervised convolutional neural network (CNN) for object classification that relies only on image-level labels, yet can learn from cluttered scenes containing multiple objects. We quantify its object classification and object location prediction performance on the Pascal VOC 2012 (20 object classes) and the much larger Microsoft COCO (80 object classes) datasets. We find that the network (i) outputs accurate image-level labels, (ii) predicts approximate locations (but not extents) of objects, and (iii) performs comparably to its fully-supervised counterparts that use object bounding box annotation for training.
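    The core mechanism, in the spirit of the abstract, is a per-class score map reduced to an image-level score by global max-pooling, so training needs only image-level labels while each map's argmax yields an approximate location. A minimal PyTorch sketch; the class name and layer sizes are assumptions:

```python
import torch.nn as nn

class WeakLocHead(nn.Module):
    """1x1 convolution turns convolutional features into one score map per
    class; global max-pooling gives image-level scores for training with
    image-level labels only, while the maps localize objects at test time."""
    def __init__(self, in_channels, n_classes):
        super().__init__()
        self.score = nn.Conv2d(in_channels, n_classes, kernel_size=1)

    def forward(self, feats):                       # feats: (B, C, H, W)
        maps = self.score(feats)                    # (B, n_classes, H, W)
        scores = maps.flatten(2).max(dim=2).values  # (B, n_classes)
        return scores, maps
```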

    A multi-stage GAN for multi-organ chest X-ray image generation and segmentation

    Multi-organ segmentation of X-ray images is of fundamental importance for computer-aided diagnosis systems. However, the most advanced semantic segmentation methods rely on deep learning and require a huge amount of labeled images, which are rarely available due to both the high cost of human resources and the time required for labeling. In this paper, we present a novel multi-stage generation algorithm based on Generative Adversarial Networks (GANs) that can produce synthetic images along with their semantic labels and can be used for data augmentation. The main feature of the method is that, unlike other approaches, generation occurs in several stages, which simplifies the procedure and allows it to be used on very small datasets. The method has been evaluated on the segmentation of chest radiographic images, showing promising results. The multi-stage approach achieves state-of-the-art results and, when very few images are used to train the GANs, outperforms the corresponding single-stage approach.
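    The staging can be pictured as a pipeline in which one generator synthesises the multi-organ label map and a second, conditioned on it, renders the radiograph, so image and segmentation mask come out as a consistent pair. A schematic sketch only; the generator architectures, shapes, and function names are assumptions:

```python
import torch

def generate_pair(g_label, g_image, batch_size, z_dim=100, device="cpu"):
    """Stage 1: noise -> multi-organ label map. Stage 2: label map ->
    synthetic chest X-ray. The returned (image, labels) pair can be used
    directly to augment a segmentation training set."""
    z = torch.randn(batch_size, z_dim, device=device)
    label_map = g_label(z)       # e.g. (B, n_organs, H, W) soft masks
    xray = g_image(label_map)    # e.g. (B, 1, H, W) synthetic radiograph
    return xray, label_map
```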