A novel Region of Interest Extraction Layer for Instance Segmentation
Given the wide diffusion of deep neural network architectures for computer
vision tasks, several new applications are nowadays more and more feasible.
Among them, particular attention has recently been given to instance
segmentation, exploiting the results achievable with two-stage networks derived
from R-CNN (such as Mask R-CNN or Faster R-CNN). In these complex
architectures, a crucial role is played by the Region of Interest (RoI)
extraction layer, devoted to extracting a coherent subset of features from a
single Feature Pyramid Network (FPN) layer attached on top of a backbone.
This paper is motivated by the need to overcome the limitations of existing
RoI extractors which select only one (the best) layer from FPN. Our intuition
is that all the layers of FPN retain useful information. Therefore, the
proposed layer (called Generic RoI Extractor - GRoIE) introduces non-local
building blocks and attention mechanisms to boost the performance.
A comprehensive ablation study at component level is conducted to find the
best set of algorithms and parameters for the GRoIE layer. Moreover, GRoIE can
be integrated seamlessly with every two-stage architecture for both object
detection and instance segmentation tasks. Therefore, the improvements brought
about by the use of GRoIE in different state-of-the-art architectures are also
evaluated. The proposed layer yields up to a 1.1% AP improvement on
bounding-box detection and a 1.7% AP improvement on instance segmentation.
The code is publicly available in a GitHub repository at
https://github.com/IMPLabUniPr/mmdetection/tree/groie_de
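The core intuition above, pooling RoI features from all FPN levels and combining them instead of selecting a single "best" level, can be sketched in a few lines. This is a minimal NumPy illustration with a simple softmax attention over levels; the actual GRoIE layer uses learned per-level pre-processing, non-local blocks, and post-processing, none of which are reproduced here.

```python
import numpy as np

def single_level_roi(features_per_level, level):
    """Standard FPN RoI extraction: pick one 'best' level per RoI."""
    return features_per_level[level]

def groie_style_aggregate(features_per_level, attn_logits):
    """GRoIE-style idea (sketch): combine RoI features pooled from ALL
    FPN levels with an attention weighting over levels.
    attn_logits: one scalar per level (learned in a real model)."""
    weights = np.exp(attn_logits) / np.exp(attn_logits).sum()  # softmax
    stacked = np.stack(features_per_level)          # (L, C, H, W)
    return np.tensordot(weights, stacked, axes=1)   # (C, H, W)

# Toy example: 4 FPN levels, each RoI-aligned to 7x7 with 8 channels.
rng = np.random.default_rng(0)
levels = [rng.standard_normal((8, 7, 7)) for _ in range(4)]
out = groie_style_aggregate(levels, np.array([0.5, 1.0, 0.2, -0.3]))
print(out.shape)  # (8, 7, 7)
```

With all-zero logits the attention reduces to a plain average over levels, which makes the contrast with single-level selection easy to check.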
UOLO - automatic object detection and segmentation in biomedical images
We propose UOLO, a novel framework for the simultaneous detection and
segmentation of structures of interest in medical images. UOLO consists of an
object segmentation module whose intermediate abstract representations are
processed and used as input for object detection. The resulting system is
optimized simultaneously for detecting a class of objects and segmenting an
optionally different class of structures. UOLO is trained on a set of bounding
boxes enclosing the objects to detect, as well as pixel-wise segmentation
information, when available. A new loss function is devised, taking into
account whether a reference segmentation is accessible for each training image,
in order to suitably backpropagate the error. We validate UOLO on the task of
simultaneous optic disc (OD) detection, fovea detection, and OD segmentation
from retinal images, achieving state-of-the-art performance on public datasets.
Comment: Published at DLMIA 2018. Licensed under the Creative Commons
CC-BY-NC-ND 4.0 license: http://creativecommons.org/licenses/by-nc-nd/4.0
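The availability-aware loss described above can be sketched as follows: the detection term is always counted, while the segmentation term is backpropagated only for images that come with a reference mask. This is an illustrative NumPy version; the weighting and reduction choices are assumptions, not the paper's exact formulation.

```python
import numpy as np

def uolo_style_loss(det_loss, seg_loss, has_mask, seg_weight=1.0):
    """Sketch of a UOLO-style combined objective.
    det_loss, seg_loss: per-image loss values, shape (N,)
    has_mask: booleans, shape (N,), True where a reference mask exists."""
    det_loss = np.asarray(det_loss, dtype=float)
    seg_loss = np.asarray(seg_loss, dtype=float)
    has_mask = np.asarray(has_mask, dtype=bool)
    # Zero out (i.e. do not backpropagate) the segmentation error for
    # images without a reference segmentation.
    seg_term = np.where(has_mask, seg_loss, 0.0)
    return det_loss.mean() + seg_weight * seg_term.sum() / max(has_mask.sum(), 1)

# The second image has no mask, so its (large) seg_loss is ignored.
total = uolo_style_loss(det_loss=[0.5, 0.3],
                        seg_loss=[0.8, 9.9],
                        has_mask=[True, False])
print(total)  # close to 1.2 (mean det 0.4 + masked seg 0.8)
```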
Baseline Detection in Historical Documents using Convolutional U-Nets
Baseline detection is still a challenging task for heterogeneous collections
of historical documents. We present a novel approach to baseline extraction in
such settings, which turned out to be the winning entry to the ICDAR 2017
Competition on Baseline Detection (cBAD). It utilizes deep convolutional nets
(CNNs) both for the actual extraction of baselines and for a simple form of
layout analysis in a pre-processing step. To the best of our knowledge, it is
the first CNN-based system for baseline extraction that applies a U-Net
architecture together with sliding-window detection, profiting from the high
local accuracy of the extracted candidate lines. Final baseline post-processing
complements our approach,
compensating for inaccuracies mainly due to missing context information during
sliding window detection. We experimentally evaluate the components of our
system individually on the cBAD dataset. Moreover, we investigate how it
generalizes to different data by means of the dataset used for the baseline
extraction task of the ICDAR 2017 Competition on Layout Analysis for
Challenging Medieval Manuscripts (HisDoc). A comparison with the results
reported for HisDoc shows that it also outperforms the contestants of the
latter.
Comment: 6 pages, accepted to DAS 201
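The sliding-window part of such a system can be sketched independently of the network itself: run a patch-level predictor over overlapping windows and average the overlaps. In this minimal NumPy version an arbitrary callable stands in for the trained U-Net, and the window/stride sizes are illustrative.

```python
import numpy as np

def sliding_window_predict(image, predict_patch, win=64, stride=32):
    """Sliding-window inference sketch: apply a patch predictor (a
    trained U-Net in the paper; here any callable) to overlapping
    windows and average overlapping predictions. Assumes a 2D
    single-channel image compatible with the window and stride."""
    h, w = image.shape
    acc = np.zeros((h, w))
    cnt = np.zeros((h, w))
    for y in range(0, h - win + 1, stride):
        for x in range(0, w - win + 1, stride):
            patch = image[y:y + win, x:x + win]
            acc[y:y + win, x:x + win] += predict_patch(patch)
            cnt[y:y + win, x:x + win] += 1
    return acc / np.maximum(cnt, 1)  # average over overlap counts

# Toy run with a dummy identity "predictor" standing in for the U-Net.
img = np.random.default_rng(1).random((128, 128))
prob = sliding_window_predict(img, predict_patch=lambda p: p)
print(prob.shape)  # (128, 128)
```

With the identity predictor the stitched output reproduces the input exactly, which is a convenient sanity check that the overlap averaging is correct.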
Effective Use of Dilated Convolutions for Segmenting Small Object Instances in Remote Sensing Imagery
Thanks to recent advances in CNNs, solid improvements have been made in
semantic segmentation of high resolution remote sensing imagery. However, most
of the previous works have not fully taken into account the specific
difficulties that exist in remote sensing tasks. One of such difficulties is
that objects are small and crowded in remote sensing imagery. To tackle this
challenging task, we propose a novel architecture featuring a local feature
extraction (LFE) module attached on top of a dilated front-end module. The LFE
module is based on our finding that aggressively increasing dilation factors
fails to aggregate local features, due to the sparsity of the kernel, and is
detrimental to small objects. The proposed LFE module solves this problem by
aggregating local features with decreasing dilation factors. We tested our
network on three remote sensing datasets and obtained remarkably good results
on all of them, especially for small objects.
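The kernel-sparsity finding can be illustrated with a small receptive-field computation: which input offsets a stack of 3-tap 1-D dilated convolutions can actually touch. Aggressively increasing dilations (with no dilation-1 layer) leave untouched gaps, while appending decreasing dilations fills them in. The dilation schedules below are illustrative, not the paper's.

```python
def touched_offsets(dilations, ksize=3):
    """Input offsets reachable by stacking 1-D dilated convolutions
    with the given dilation factors (kernel size 3 by default)."""
    offsets = {0}
    half = ksize // 2
    for d in dilations:
        offsets = {o + k * d for o in offsets for k in range(-half, half + 1)}
    return offsets

def has_gaps(offsets):
    """True if some position inside the receptive field is never touched."""
    lo, hi = min(offsets), max(offsets)
    return any(p not in offsets for p in range(lo, hi + 1))

# Aggressively increasing dilations: only even offsets are sampled.
print(has_gaps(touched_offsets([2, 4, 8])))         # True  -> gridding
# An LFE-style tail with decreasing dilations closes the gaps.
print(has_gaps(touched_offsets([2, 4, 8, 2, 1])))   # False -> dense
```

This mirrors the argument in the abstract: the sparse sampling pattern is what hurts small objects, and decreasing dilation factors restore dense local aggregation.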
A Survey on Deep Learning-based Architectures for Semantic Segmentation on 2D images
Semantic segmentation is the pixel-wise labelling of an image. Since the
problem is defined at the pixel level, determining image-level class labels
alone is not sufficient; labels must be localised at the original image pixel
resolution. Boosted by the extraordinary ability of convolutional neural
networks (CNNs) to create semantic, high-level and hierarchical image features,
a large number of deep learning-based 2D semantic segmentation approaches have
been proposed within the last decade. In this survey, we mainly focus on recent
scientific developments in semantic segmentation, specifically on deep
learning-based methods using 2D images. We started with an analysis of the
public image sets and leaderboards for 2D semantic segmentation, with an
overview of the techniques employed in performance evaluation. In examining the
evolution of the field, we chronologically categorised the approaches into
three main periods, namely the pre- and early deep learning era, the fully
convolutional era, and the post-FCN era. We technically
analysed the solutions put forward in terms of solving the fundamental problems
of the field, such as fine-grained localisation and scale invariance. Before
drawing our conclusions, we present a table of methods from all mentioned eras,
with a brief summary of each approach that explains their contribution to the
field. We conclude the survey by discussing the current challenges of the field
and to what extent they have been solved.
Comment: Updated with new studies.
A Robust Interpretable Deep Learning Classifier for Heart Anomaly Detection Without Segmentation
Traditionally, abnormal heart sound classification is framed as a three-stage
process. The first stage involves segmenting the phonocardiogram to detect
fundamental heart sounds; after which features are extracted and classification
is performed. Some researchers in the field argue the segmentation step is an
unwanted computational burden, whereas others embrace it as a prior step to
feature extraction. When comparing accuracies achieved by studies that have
segmented heart sounds before analysis with those that have overlooked that
step, the question of whether to segment heart sounds before feature extraction
is still open. In this study, we explicitly examine the importance of heart
sound segmentation as a prior step for heart sound classification, and then
seek to apply the obtained insights to propose a robust classifier for abnormal
heart sound detection. Furthermore, recognizing the pressing need for
explainable Artificial Intelligence (AI) models in the medical domain, we also
unveil hidden representations learned by the classifier using model
interpretation techniques. Experimental results demonstrate that the
segmentation plays an essential role in abnormal heart sound classification.
Our new classifier is also shown to be robust, stable and most importantly,
explainable, with an accuracy of almost 100% on the widely used PhysioNet
dataset.
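The three-stage framing examined above can be sketched as a minimal pipeline; every component name here is a placeholder, not one of the paper's actual models.

```python
def classify_heart_sound(pcg, segment, extract_features, classify):
    """Sketch of the traditional three-stage pipeline:
    (1) segment the phonocardiogram into fundamental heart sounds,
    (2) extract features from each segment, (3) classify."""
    segments = segment(pcg)                             # stage 1
    features = [extract_features(s) for s in segments]  # stage 2
    return classify(features)                           # stage 3

# Toy run with trivial stand-ins for each stage.
label = classify_heart_sound(
    pcg=[0.1] * 8,
    segment=lambda x: [x[:4], x[4:]],        # fixed-length "segments"
    extract_features=lambda seg: sum(seg),   # one feature per segment
    classify=lambda f: "abnormal" if max(f) > 1.0 else "normal",
)
print(label)  # normal
```

A segmentation-free classifier, the alternative the study weighs, simply drops stage 1 and feeds features computed on the whole signal to the classifier.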