Untangling Local and Global Deformations in Deep Convolutional Networks for Image Classification and Sliding Window Detection
Deep Convolutional Neural Networks (DCNNs) commonly use generic `max-pooling'
(MP) layers to extract deformation-invariant features, but we argue in favor of
a more refined treatment. First, we introduce epitomic convolution as an
alternative building block to the common convolution-MP cascade of DCNNs; while
having identical complexity to MP, epitomic convolution allows for parameter
sharing across different filters, resulting in faster convergence and better
generalization. Second, we introduce a Multiple Instance Learning approach to
explicitly accommodate global translation and scaling when training a DCNN
exclusively with class labels. For this we rely on a `patchwork' data structure
that efficiently lays out all image scales and positions as candidates to a
DCNN. Factoring global and local deformations allows a DCNN to `focus its
resources' on the treatment of non-rigid deformations and yields a substantial
classification accuracy improvement. Third, further pursuing this idea, we
develop an efficient DCNN sliding window object detector that employs explicit
search over position, scale, and aspect ratio. We provide competitive image
classification and localization results on the ImageNet dataset and object
detection results on the Pascal VOC 2007 benchmark.Comment: 13 pages, 7 figures, 5 tables. arXiv admin note: substantial text
overlap with arXiv:1406.273
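The `patchwork' layout described above can be sketched as follows. This is a minimal pure-Python illustration with hypothetical helper names; the actual system lays out CNN feature maps rather than raw pixel lists, but the idea is the same: tile an image pyramid into one canvas so a single convolutional pass scores every scale and position at once.

```python
def make_pyramid(image, n_scales=3):
    """Build a crude image pyramid by 2x nearest-neighbour downsampling."""
    pyramid = [image]
    for _ in range(n_scales - 1):
        prev = pyramid[-1]
        pyramid.append([row[::2] for row in prev[::2]])
    return pyramid

def patchwork(pyramid):
    """Place all pyramid levels side by side on one canvas, recording each
    level's (row, col) offset so per-scale scores can be mapped back."""
    height = max(len(level) for level in pyramid)
    canvas = [[] for _ in range(height)]
    offsets, col = [], 0
    for level in pyramid:
        offsets.append((0, col))
        width = len(level[0])
        for r in range(height):
            row = level[r] if r < len(level) else [0] * width  # zero padding
            canvas[r].extend(row)
        col += width
    return canvas, offsets

image = [[(r * 8 + c) % 7 for c in range(8)] for r in range(8)]
canvas, offsets = patchwork(make_pyramid(image))
# canvas is 8 rows x (8 + 4 + 2) columns; offsets give each scale's origin
```

Running the network once over `canvas` then amounts to evaluating all scales and positions jointly, which is what lets the classifier treat global translation and scaling explicitly.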
Pushing the Boundaries of Boundary Detection using Deep Learning
In this work we show that adapting Deep Convolutional Neural Network training
to the task of boundary detection can result in substantial improvements over
the current state-of-the-art in boundary detection.
Our contributions consist firstly in combining a careful design of the loss
for boundary detection training, a multi-resolution architecture and training
with external data to improve the detection accuracy of the current state of
the art. When measured on the standard Berkeley Segmentation Dataset, we
improve the optimal dataset scale F-measure from 0.780 to 0.808 - while human
performance is at 0.803. We further improve performance to 0.813 by combining
deep learning with grouping, integrating the Normalized Cuts technique within a
deep network.
We also examine the potential of our boundary detector in conjunction with
the task of semantic segmentation and demonstrate clear improvements over
state-of-the-art systems. Our detector is fully integrated in the popular Caffe
framework and processes a 320x420 image in less than a second.
Comment: The previous version reported large improvements w.r.t. the LPO
region proposal baseline, which turned out to be due to a wrong computation
for the baseline. The improvements are currently less important, and are
omitted. We are sorry if the reported results caused any confusion. We have
also integrated reviewer feedback regarding human performance on the BSD
benchmark.
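The abstract mentions a carefully designed loss for boundary detection. One standard design for this task (the abstract does not specify which variant is used, so this is an assumption) is class-balanced cross-entropy, which counters the extreme rarity of boundary pixels by weighting each class's term by the other class's frequency:

```python
import math

def balanced_bce(predictions, labels, eps=1e-7):
    """Class-balanced binary cross-entropy over flattened pixel scores.

    Boundary pixels are rare, so the positive term is up-weighted by the
    negative-class frequency and vice versa; otherwise a network can get
    a low loss by predicting "no boundary" everywhere.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    w_pos = n_neg / len(labels)  # large: positives are rare
    w_neg = n_pos / len(labels)  # small: negatives are common
    loss = 0.0
    for p, y in zip(predictions, labels):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        loss += -w_pos * y * math.log(p) - w_neg * (1 - y) * math.log(1 - p)
    return loss / len(labels)
```

With this weighting, confidently correct predictions drive the loss toward zero while a uniform "no boundary" output is heavily penalised on the few positive pixels.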
Deep Learning for Semantic Part Segmentation with High-Level Guidance
In this work we address the task of segmenting an object into its parts, or
semantic part segmentation. We start by adapting a state-of-the-art semantic
segmentation system to this task, and show that a combination of a
fully-convolutional Deep CNN system coupled with Dense CRF labelling provides
excellent results for a broad range of object categories. Still, this approach
remains agnostic to high-level constraints between object parts. We introduce
such prior information by means of the Restricted Boltzmann Machine, adapted to
our task, and train our model in a discriminative fashion, as a hidden CRF,
demonstrating that prior information can yield additional improvements. We also
investigate the performance of our approach ``in the wild'', without
information concerning the objects' bounding boxes, using an object detector to
guide a multi-scale segmentation scheme. We evaluate the performance of our
approach on the Penn-Fudan and LFW datasets for the tasks of pedestrian parsing
and face labelling respectively. We show superior performance with respect to
competitive methods that have been extensively engineered on these benchmarks,
as well as realistic qualitative results on part segmentation, even for
occluded or deformable objects. We also provide quantitative and extensive
qualitative results on three classes from the PASCAL Parts dataset. Finally, we
show that our multi-scale segmentation scheme can boost accuracy, recovering
segmentations for finer parts.
Comment: 11 pages (including references), 3 figures, 2 tables
Application of the sliding window method and Mask-RCNN method to nuclear recognition in oral cytology
Background: We aimed to develop an artificial intelligence (AI)-assisted oral cytology method, similar to cervical cytology. We focused on the detection of cell nuclei because the ratio of cell nuclei to cytoplasm increases with increasing cell malignancy. As an initial step in the development of AI-assisted cytology, we investigated two methods for the automatic detection of cell nuclei in blue-stained cells in cytopreparation images.
Methods: We evaluated the usefulness of the sliding window method (SWM) and the mask region-based convolutional neural network (Mask-RCNN) in identifying the cell nuclei in oral cytopreparation images. Thirty cases of liquid-based oral cytology were analyzed. First, we performed the SWM by dividing each image into 96 × 96 pixels. Overall, 591 images with or without blue-stained cell nuclei were prepared as the training data and 197 as the test data (total: 1,576 images). Next, we performed the Mask-RCNN by preparing 130 images of Class II and III lesions and creating mask images showing cell regions based on these images.
Results: Using the SWM, the highest detection rate for blue-stained cells in the evaluation group was 0.9314. For Mask-RCNN, 37 cell nuclei were identified, and 1 cell nucleus was identified as a non-nucleus after 40 epochs (error rate: 0.027).
Conclusions: Mask-RCNN is more accurate than SWM in identifying the cell nuclei. If the blue-stained cell nuclei can be correctly identified automatically, the entire cell morphology can be grasped faster, and the diagnostic performance of cytology can be improved.
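The SWM step of dividing each image into 96 × 96 pixel tiles can be sketched as below. This is a generic illustration, not the authors' code; the tile size matches the abstract, and with stride equal to the window size the tiling is non-overlapping.

```python
def sliding_windows(height, width, size=96, stride=96):
    """Yield the (top, left) corner of each size x size window covering an
    image of the given dimensions; stride == size gives disjoint tiles."""
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            yield top, left

# A hypothetical 288 x 192 image yields a 3 x 2 grid of 96 x 96 tiles,
# each of which would be classified as containing a nucleus or not.
tiles = list(sliding_windows(288, 192))
```

Each tile is then passed to a classifier independently, which is why SWM is simple but, as the abstract reports, less accurate than Mask-RCNN's region-based segmentation.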
Is object localization for free? – Weakly-supervised learning with convolutional neural networks
Successful methods for visual object recognition typically rely on training datasets containing lots of richly annotated images. Detailed image annotation, e.g. by object bounding boxes, however, is both expensive and often subjective. We describe a weakly supervised convolutional neural network (CNN) for object classification that relies only on image-level labels, yet can learn from cluttered scenes containing multiple objects. We quantify its object classification and object location prediction performance on the Pascal VOC 2012 (20 object classes) and the much larger Microsoft COCO (80 object classes) datasets. We find that the network (i) outputs accurate image-level labels, (ii) predicts approximate locations (but not extents) of objects, and (iii) performs comparably to its fully-supervised counterparts using object bounding box annotation for training.
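A common mechanism behind this kind of weak supervision (assumed here for illustration; the abstract does not spell out the architecture) is global max pooling over a per-class score map: the maximum gives the image-level score used for training, and its position gives the approximate object location for free.

```python
def global_max_pool(score_map):
    """Collapse a per-class score map (rows x cols of floats) to a single
    image-level score, keeping the argmax position as the predicted
    approximate object location."""
    best, best_pos = float("-inf"), None
    for r, row in enumerate(score_map):
        for c, score in enumerate(row):
            if score > best:
                best, best_pos = score, (r, c)
    return best, best_pos

# Toy 2 x 2 score map for one class: the image-level label is driven by
# the strongest local response, which also localises the object.
score_map = [[0.1, 0.3], [0.9, 0.2]]
score, location = global_max_pool(score_map)
```

Because only the maximum contributes to the image-level loss, the network learns where objects are without ever seeing a bounding box, which matches the abstract's finding that locations, but not extents, are recovered.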
A multi-stage GAN for multi-organ chest X-ray image generation and segmentation
Multi-organ segmentation of X-ray images is of fundamental importance for
computer aided diagnosis systems. However, the most advanced semantic
segmentation methods rely on deep learning and require a huge amount of labeled
images, which are rarely available due to both the high cost of human resources
and the time required for labeling. In this paper, we present a novel
multi-stage generation algorithm based on Generative Adversarial Networks
(GANs) that can produce synthetic images along with their semantic labels and
can be used for data augmentation. The main feature of the method is that,
unlike other approaches, generation occurs in several stages, which simplifies
the procedure and allows it to be used on very small datasets. The method has
been evaluated on the segmentation of chest radiographic images, showing
promising results. The multi-stage approach achieves state-of-the-art results
and, when very few images are used to train the GANs, outperforms the
corresponding single-stage approach.