4,565 research outputs found
Learning to Segment Breast Biopsy Whole Slide Images
We trained and applied an encoder-decoder model to semantically segment
breast biopsy images into biologically meaningful tissue labels. Since
conventional encoder-decoder networks cannot be applied directly on large
biopsy images and the different sized structures in biopsies present novel
challenges, we propose four modifications: (1) an input-aware encoding block to
compensate for information loss, (2) a new dense connection pattern between
encoder and decoder, (3) dense and sparse decoders to combine multi-level
features, (4) a multi-resolution network that fuses the results of
encoder-decoders run on different resolutions. Our model outperforms a
feature-based approach and conventional encoder-decoders from the literature.
We use semantic segmentations produced with our model in an automated diagnosis
task and obtain higher accuracies than a baseline approach that employs an SVM
for feature-based segmentation, both using the same segmentation-based
diagnostic features.Comment: Added more WSI images in appendi
Object Detection in 20 Years: A Survey
Object detection, as of one the most fundamental and challenging problems in
computer vision, has received great attention in recent years. Its development
in the past two decades can be regarded as an epitome of computer vision
history. If we think of today's object detection as a technical aesthetics
under the power of deep learning, then turning back the clock 20 years we would
witness the wisdom of cold weapon era. This paper extensively reviews 400+
papers of object detection in the light of its technical evolution, spanning
over a quarter-century's time (from the 1990s to 2019). A number of topics have
been covered in this paper, including the milestone detectors in history,
detection datasets, metrics, fundamental building blocks of the detection
system, speed up techniques, and the recent state of the art detection methods.
This paper also reviews some important detection applications, such as
pedestrian detection, face detection, text detection, etc, and makes an in-deep
analysis of their challenges as well as technical improvements in recent years.Comment: This work has been submitted to the IEEE TPAMI for possible
publicatio
Densely Supervised Grasp Detector (DSGD)
This paper presents Densely Supervised Grasp Detector (DSGD), a deep learning
framework which combines CNN structures with layer-wise feature fusion and
produces grasps and their confidence scores at different levels of the image
hierarchy (i.e., global-, region-, and pixel-levels). % Specifically, at the
global-level, DSGD uses the entire image information to predict a grasp. At the
region-level, DSGD uses a region proposal network to identify salient regions
in the image and predicts a grasp for each salient region. At the pixel-level,
DSGD uses a fully convolutional network and predicts a grasp and its confidence
at every pixel. % During inference, DSGD selects the most confident grasp as
the output. This selection from hierarchically generated grasp candidates
overcomes limitations of the individual models. % DSGD outperforms
state-of-the-art methods on the Cornell grasp dataset in terms of grasp
accuracy. % Evaluation on a multi-object dataset and real-world robotic
grasping experiments show that DSGD produces highly stable grasps on a set of
unseen objects in new environments. It achieves 97% grasp detection accuracy
and 90% robotic grasping success rate with real-time inference speed
GFF: Gated Fully Fusion for Semantic Segmentation
Semantic segmentation generates comprehensive understanding of scenes through
densely predicting the category for each pixel. High-level features from Deep
Convolutional Neural Networks already demonstrate their effectiveness in
semantic segmentation tasks, however the coarse resolution of high-level
features often leads to inferior results for small/thin objects where detailed
information is important. It is natural to consider importing low level
features to compensate for the lost detailed information in high-level
features.Unfortunately, simply combining multi-level features suffers from the
semantic gap among them. In this paper, we propose a new architecture, named
Gated Fully Fusion (GFF), to selectively fuse features from multiple levels
using gates in a fully connected way. Specifically, features at each level are
enhanced by higher-level features with stronger semantics and lower-level
features with more details, and gates are used to control the propagation of
useful information which significantly reduces the noises during fusion. We
achieve the state of the art results on four challenging scene parsing datasets
including Cityscapes, Pascal Context, COCO-stuff and ADE20K.Comment: accepted by AAAI-2020(oral
- …