Feature Extraction and Classification Based on Spatial-Spectral ConvLSTM Neural Network for Hyperspectral Images
In recent years, deep learning has presented a great advance in hyperspectral
image (HSI) classification. Particularly, Long Short-Term Memory (LSTM), as a
special deep learning structure, has shown great ability in modeling long-term
dependencies in the time dimension of video or the spectral dimension of HSIs.
However, the loss of spatial information makes it difficult to obtain better
performance. To address this problem, two novel deep models are
proposed to extract more discriminative spatial-spectral features by exploiting
the Convolutional LSTM (ConvLSTM) for the first time. By taking the data patch
in a local sliding window as the input of each memory cell band by band, the
2-D extended architecture of LSTM is considered for building the
spatial-spectral ConvLSTM 2-D Neural Network (SSCL2DNN) to model long-range
dependencies in the spectral domain. To take advantage of spatial and spectral
information more effectively for extracting a more discriminative
spatial-spectral feature representation, the spatial-spectral ConvLSTM 3-D
Neural Network (SSCL3DNN) is further proposed by extending LSTM to a 3-D version.
The experiments, conducted on three commonly used HSI data sets, demonstrate
that the proposed deep models have certain competitive advantages and can
provide better classification performance than other state-of-the-art
approaches.
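As a rough illustration of the mechanism described in this abstract (not the authors' implementation), the following PyTorch sketch feeds a hyperspectral patch to a ConvLSTM cell band by band and classifies from the last hidden state; the window size, hidden-channel count and classifier head are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal ConvLSTM cell: one convolution produces all four gates."""
    def __init__(self, in_ch, hidden_ch, kernel_size=3):
        super().__init__()
        self.hidden_ch = hidden_ch
        self.gates = nn.Conv2d(in_ch + hidden_ch, 4 * hidden_ch,
                               kernel_size, padding=kernel_size // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        i, f, o, g = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o), torch.tanh(g)
        c = f * c + i * g
        h = o * torch.tanh(c)
        return h, c

def classify_patch(patch, cell, classifier):
    """patch: (batch, bands, H, W); each spectral band is one ConvLSTM time step."""
    b, bands, height, width = patch.shape
    h = patch.new_zeros(b, cell.hidden_ch, height, width)
    c = torch.zeros_like(h)
    for t in range(bands):
        h, c = cell(patch[:, t:t + 1], (h, c))   # one band per step
    return classifier(h.flatten(1))              # classify from the final hidden state

# Made-up sizes: 9x9 spatial window, 103 bands, 16 hidden maps, 9 classes.
cell = ConvLSTMCell(in_ch=1, hidden_ch=16)
classifier = nn.Linear(16 * 9 * 9, 9)
logits = classify_patch(torch.randn(2, 103, 9, 9), cell, classifier)
```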
3D Robot Pose Estimation from 2D Images
This paper considers the task of locating articulated poses of multiple
robots in images. Our approach simultaneously infers the number of robots in a
scene, identifies joint locations and estimates sparse depth maps around joint
locations. The proposed method applies staged convolutional feature detectors
to 2D image inputs and computes robot instance masks using a recurrent network
architecture. In addition, regression maps of most likely joint locations in
pixel coordinates together with depth information are computed. Compositing 3D
robot joint kinematics is accomplished by applying masks to joint readout maps.
Our end-to-end formulation is in contrast to previous work in which the
composition of robot joints into kinematics is performed in a separate
post-processing step. Although our models are trained on artificial data, we
demonstrate generalizability to real-world images.
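A hedged sketch of the final compositing step the abstract mentions: per-instance masks restrict each joint readout map to one robot, and depth is read at the selected pixel. The array shapes and the simple argmax readout are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def compose_joints(heatmaps, depth, masks):
    """heatmaps: (J, H, W) joint readout maps; depth: (H, W); masks: (N, H, W) binary."""
    robots = []
    for mask in masks:                       # one instance mask per robot
        joints = []
        for hm in heatmaps:                  # restrict each joint map to this robot
            masked = hm * mask
            y, x = np.unravel_index(np.argmax(masked), masked.shape)
            joints.append((float(x), float(y), float(depth[y, x])))
        robots.append(joints)
    return robots

# Toy example: 2 robots, 6 joint types, 64x64 maps.
rng = np.random.default_rng(0)
heatmaps = rng.random((6, 64, 64))
depth = rng.random((64, 64))
masks = np.zeros((2, 64, 64))
masks[0, :, :32] = 1
masks[1, :, 32:] = 1
poses = compose_joints(heatmaps, depth, masks)
```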
SFCN-OPI: Detection and Fine-grained Classification of Nuclei Using Sibling FCN with Objectness Prior Interaction
Cell nuclei detection and fine-grained classification have been fundamental
yet challenging problems in histopathology image analysis. Due to the tiny size
of nuclei, significant inter-/intra-class variances, and inferior image quality,
previous automated methods easily suffer from limited accuracy and robustness.
Meanwhile, existing approaches usually deal with these two tasks independently,
neglecting their close relatedness. In this paper, we present a novel sibling
fully convolutional network with objectness prior interaction (SFCN-OPI) to
tackle the two tasks simultaneously and interactively using a unified
end-to-end framework. Specifically, the sibling FCN branches share features in
earlier layers while holding respective higher layers for specific tasks. More
importantly, the detection branch outputs the objectness prior which
dynamically interacts with the fine-grained classification sibling branch
during the training and testing processes. With this mechanism, the
fine-grained classification successfully focuses on regions with high
confidence of nuclei existence and outputs the conditional probability, which
in turn benefits the detection through backpropagation. Extensive experiments
on colon cancer histology images have validated the effectiveness of our
proposed SFCN-OPI and our method has outperformed the state-of-the-art methods
by a large margin.
Comment: Accepted at AAAI 201
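As a rough sketch of the objectness prior interaction (an illustration of the mechanism, not the authors' architecture), two sibling heads share a small backbone and the detection head's objectness map gates the fine-grained class probabilities; layer sizes and the multiplicative gating are assumptions.

```python
import torch
import torch.nn as nn

class SiblingFCN(nn.Module):
    def __init__(self, num_classes=4):
        super().__init__()
        self.backbone = nn.Sequential(                  # shared earlier layers
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        self.det_head = nn.Conv2d(32, 1, 1)             # objectness (nucleus vs. background)
        self.cls_head = nn.Conv2d(32, num_classes, 1)   # fine-grained classes

    def forward(self, x):
        feat = self.backbone(x)
        objectness = torch.sigmoid(self.det_head(feat))       # prior P(nucleus)
        cls_prob = torch.softmax(self.cls_head(feat), dim=1)  # P(class | nucleus)
        # Gating: classification is conditioned on the objectness prior, so the
        # class loss also sends gradients back into the detection branch.
        return objectness, objectness * cls_prob

model = SiblingFCN()
objectness, conditional = model(torch.randn(1, 3, 64, 64))
```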
A Survey on Deep Learning in Medical Image Analysis
Deep learning algorithms, in particular convolutional networks, have rapidly
become a methodology of choice for analyzing medical images. This paper reviews
the major deep learning concepts pertinent to medical image analysis and
summarizes over 300 contributions to the field, most of which appeared in the
last year. We survey the use of deep learning for image classification, object
detection, segmentation, registration, and other tasks and provide concise
overviews of studies per application area. Open challenges and directions for
future research are discussed.
Comment: Revised survey includes expanded discussion section and reworked
introductory section on common deep architectures. Added missed papers from
before Feb 1st 201
CIA-Net: Robust Nuclei Instance Segmentation with Contour-aware Information Aggregation
Accurately segmenting nuclei instances is a crucial step in computer-aided
image analysis to extract rich features for cellular estimation and subsequent
diagnosis and treatment. It remains challenging, however, because the widespread
presence of nuclei clusters, along with large morphological variances among
different organs, makes nuclei instance segmentation susceptible to
over-/under-segmentation. Additionally, inevitably subjective annotation and
mislabeling prevent the network from learning from reliable samples and
eventually reduce its capability to generalize to unseen organ nuclei. To
address these issues, we propose a novel deep neural network,
namely Contour-aware Informative Aggregation Network (CIA-Net) with multi-level
information aggregation module between two task-specific decoders. Rather than
using independent decoders, it leverages the spatial and texture dependencies
between nuclei and contours by bi-directionally aggregating task-specific
features. Furthermore, we propose a novel smooth truncated loss
that modulates losses to reduce the perturbation from outliers. Consequently,
the network can focus on learning from reliable and informative samples, which
inherently improves the generalization capability. Experiments on the 2018
MICCAI Multi-Organ Nuclei Segmentation challenge validated the effectiveness
of our proposed method, surpassing all other 35 competing teams by a
significant margin.
Comment: Accepted for the 26th Conference on Information Processing in Medical
Imaging (IPMI 2019)
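The smooth truncated loss is described only at a high level above; the sketch below shows one plausible form in which the per-pixel negative log-likelihood switches to a bounded quadratic branch below a probability threshold, so likely-mislabeled outliers contribute limited loss and gradient. The exact formula and the threshold gamma are assumptions, not the paper's definition.

```python
import torch

def smooth_truncated_nll(p_true, gamma=0.1, eps=1e-7):
    """p_true: predicted probability of the ground-truth class per pixel, in (0, 1)."""
    p = p_true.clamp(eps, 1.0)
    nll = -torch.log(p)                      # standard per-pixel loss
    # Quadratic continuation below gamma: equals -log(gamma) at p = gamma and
    # stays bounded as p -> 0 instead of exploding.
    truncated = -torch.log(torch.tensor(gamma)) + 0.5 * (1.0 - (p / gamma) ** 2)
    return torch.where(p < gamma, truncated, nll).mean()

loss = smooth_truncated_nll(torch.rand(4, 256, 256))
```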
Deep Radiomics for Brain Tumor Detection and Classification from Multi-Sequence MRI
Glioma constitutes 80% of malignant primary brain tumors and is usually
classified as high-grade glioma (HGG) or low-grade glioma (LGG). LGG tumors are
less aggressive, with a slower growth rate than HGG, and are responsive to
therapy. Since tumor biopsy is challenging for brain tumor patients, noninvasive
imaging techniques like Magnetic Resonance Imaging (MRI) have been extensively
employed in diagnosing brain tumors. Automated systems for detecting tumors and
predicting their grade from MRI data therefore become necessary for assisting
doctors in the framework of augmented intelligence. In this paper, we thoroughly
investigate the power of Deep ConvNets for classification of brain tumors using
multi-sequence MR images. We propose novel ConvNet models, which are trained
from scratch, on MRI patches, slices, and multi-planar volumetric slices. The
suitability of transfer learning for the task is next studied by applying two
existing ConvNet models (VGGNet and ResNet) trained on the ImageNet dataset,
through fine-tuning of the last few layers. Leave-one-patient-out (LOPO)
testing and testing on the holdout dataset are used to evaluate the performance
of the ConvNets. Results
demonstrate that the proposed ConvNets achieve better accuracy in all cases
where the model is trained on the multi-planar volumetric dataset. Unlike
conventional models, it achieves a testing accuracy of 95% on the low/high-grade
glioma classification problem. A score of 97% is obtained for
classification of LGG with/without 1p/19q codeletion, without any additional
effort towards extraction and selection of features. We study the properties of
self-learned kernels/filters in different layers through visualization of the
intermediate layer outputs. We also compare the results with those of
state-of-the-art methods, demonstrating a maximum improvement of 7% on the
grading performance of ConvNets and 9% on the prediction of 1p/19q codeletion
status.
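A minimal sketch of the transfer-learning setup mentioned above, fine-tuning only the last few layers of an ImageNet-pretrained network with torchvision; the choice of ResNet-18, the layers unfrozen and the two-class head are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision import models

# ImageNet-pretrained backbone (weights enum assumes torchvision >= 0.13).
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

for param in model.parameters():            # freeze everything first
    param.requires_grad = False
for param in model.layer4.parameters():     # unfreeze the last residual block
    param.requires_grad = True
model.fc = nn.Linear(model.fc.in_features, 2)   # new head, e.g. LGG vs. HGG

# Optimize only the trainable (unfrozen) parameters.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
```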
A novel channel pruning method for deep neural network compression
In recent years, deep neural networks have achieved great success in the
field of computer vision. However, it is still a big challenge to deploy these
deep models on resource-constrained embedded devices such as mobile robots,
smart phones and so on. Therefore, network compression for such platforms is a
reasonable solution to reduce memory consumption and computation complexity. In
this paper, a novel channel pruning method based on genetic algorithm is
proposed to compress very deep Convolutional Neural Networks (CNNs). Firstly, a
pre-trained CNN model is pruned layer by layer according to the sensitivity of
each layer. After that, the pruned model is fine-tuned based on knowledge
distillation framework. These two improvements significantly decrease the model
redundancy with little accuracy drop. Channel selection is a combinatorial
optimization problem that has exponential solution space. In order to
accelerate the selection process, the proposed method formulates it as a search
problem, which can be solved efficiently by genetic algorithm. Meanwhile, a
two-step approximation fitness function is designed to further improve the
efficiency of genetic process. The proposed method has been verified on three
benchmark datasets with two popular CNN models: VGGNet and ResNet. On the
CIFAR-100 and ImageNet datasets, our approach outperforms several
state-of-the-art methods. On the CIFAR-10 and SVHN datasets, the pruned VGGNet
achieves better performance than the original model, with 8x parameter
compression and 3x FLOPs reduction.
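A toy sketch of channel selection as a genetic search, in the spirit of the abstract: each chromosome is a binary keep/prune mask over one layer's channels, and fitness trades a (placeholder) accuracy estimate against the number of channels kept. The fitness function, operators and hyperparameters are assumptions, not the paper's two-step approximation.

```python
import random

def fitness(mask, estimate_accuracy, sparsity_weight=0.01):
    # Reward estimated accuracy of the pruned layer, penalize kept channels.
    return estimate_accuracy(mask) - sparsity_weight * sum(mask)

def genetic_channel_search(num_channels, estimate_accuracy,
                           pop_size=20, generations=30, mutation_rate=0.05):
    pop = [[random.randint(0, 1) for _ in range(num_channels)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=lambda m: fitness(m, estimate_accuracy), reverse=True)
        parents = scored[:pop_size // 2]                 # selection: keep the best half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, num_channels)      # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if random.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda m: fitness(m, estimate_accuracy))

# Stand-in accuracy estimate; a real one would evaluate the pruned layer on data.
best_mask = genetic_channel_search(64, estimate_accuracy=lambda m: sum(m[:32]) / 32)
```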
Deep Facial Expression Recognition: A Survey
With the transition of facial expression recognition (FER) from
laboratory-controlled to challenging in-the-wild conditions and the recent
success of deep learning techniques in various fields, deep neural networks
have increasingly been leveraged to learn discriminative representations for
automatic FER. Recent deep FER systems generally focus on two important issues:
overfitting caused by a lack of sufficient training data and
expression-unrelated variations, such as illumination, head pose and identity
bias. In this paper, we provide a comprehensive survey on deep FER, including
datasets and algorithms that provide insights into these intrinsic problems.
First, we describe the standard pipeline of a deep FER system with the related
background knowledge and suggestions of applicable implementations for each
stage. We then introduce the available datasets that are widely used in the
literature and provide accepted data selection and evaluation principles for
these datasets. For the state of the art in deep FER, we review existing novel
deep neural networks and related training strategies that are designed for FER
based on both static images and dynamic image sequences, and discuss their
advantages and limitations. Competitive performances on widely used benchmarks
are also summarized in this section. We then extend our survey to additional
related issues and application scenarios. Finally, we review the remaining
challenges and corresponding opportunities in this field as well as future
directions for the design of robust deep FER systems.
Activity Recognition based on a Magnitude-Orientation Stream Network
The temporal component of videos provides an important clue for activity
recognition, as a number of activities can be reliably recognized based on the
motion information. In view of that, this work proposes a novel temporal stream
for two-stream convolutional networks based on images computed from the optical
flow magnitude and orientation, named Magnitude-Orientation Stream (MOS), to
learn the motion in a better and richer manner. Our method applies simple
nonlinear transformations on the vertical and horizontal components of the
optical flow to generate input images for the temporal stream. Experimental
results, carried out on two well-known datasets (HMDB51 and UCF101), demonstrate
that using our proposed temporal stream as input to existing neural network
architectures can improve their performance for activity recognition. Results
demonstrate that our temporal stream provides complementary information able to
improve the classical two-stream methods, indicating the suitability of our
approach to be used as a temporal video representation.
Comment: 8 pages, SIBGRAPI 201
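A small sketch of turning dense optical flow into magnitude and orientation input images for a temporal stream; the particular nonlinear rescaling used below is an illustrative assumption, not the exact transform proposed in the paper.

```python
import numpy as np

def magnitude_orientation_images(flow):
    """flow: (H, W, 2) array of horizontal (u) and vertical (v) displacements."""
    u, v = flow[..., 0], flow[..., 1]
    magnitude = np.sqrt(u ** 2 + v ** 2)
    orientation = np.arctan2(v, u)                        # angle in [-pi, pi]
    # Simple nonlinear rescaling of both channels to the 8-bit image range.
    mag_img = np.uint8(255 * np.tanh(magnitude / (magnitude.std() + 1e-6)))
    ori_img = np.uint8(255 * (orientation + np.pi) / (2 * np.pi))
    return mag_img, ori_img

mag, ori = magnitude_orientation_images(np.random.randn(240, 320, 2).astype(np.float32))
```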
Multi-scale Volumes for Deep Object Detection and Localization
This study aims to analyze the benefits of improved multi-scale reasoning for
object detection and localization with deep convolutional neural networks. To
that end, an efficient and general object detection framework which operates on
scale volumes of a deep feature pyramid is proposed. In contrast to the
proposed approach, most current state-of-the-art object detectors operate on a
single scale during training, while testing involves independent evaluation across
scales. One benefit of the proposed approach is in better capturing of
multi-scale contextual information, resulting in significant gains in both
detection performance and localization quality of objects on the PASCAL VOC
dataset and a multi-view highway vehicles dataset. The joint detection and
localization scale-specific models are shown to especially benefit detection of
challenging object categories which exhibit large scale variation as well as
detection of small objects.
Comment: To appear in Pattern Recognition 201
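As a rough illustration of operating on scale volumes of a feature pyramid (a guess at the general idea, not the paper's exact construction), the sketch below resizes pyramid levels to a common resolution and stacks them along a new scale axis so a detector can reason over scales jointly.

```python
import torch
import torch.nn.functional as F

def build_scale_volume(pyramid, out_hw=(32, 32)):
    """pyramid: list of (batch, C, H_i, W_i) feature maps from different scales."""
    resized = [F.interpolate(f, size=out_hw, mode='bilinear', align_corners=False)
               for f in pyramid]
    return torch.stack(resized, dim=2)        # (batch, C, num_scales, H, W)

pyramid = [torch.randn(1, 64, s, s) for s in (64, 32, 16)]
volume = build_scale_volume(pyramid)          # shape: (1, 64, 3, 32, 32)
```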