HyperDense-Net: A hyper-densely connected CNN for multi-modal image segmentation
Recently, dense connections have attracted substantial attention in computer
vision because they facilitate gradient flow and implicit deep supervision
during training. Particularly, DenseNet, which connects each layer to every
other layer in a feed-forward fashion, has shown impressive performances in
natural image classification tasks. We propose HyperDenseNet, a 3D fully
convolutional neural network that extends the definition of dense connectivity
to multi-modal segmentation problems. Each imaging modality has a path, and
dense connections occur not only between the pairs of layers within the same
path, but also between those across different paths. This contrasts with the
existing multi-modal CNN approaches, in which modeling several modalities
relies entirely on a single joint layer (or level of abstraction) for fusion,
typically either at the input or at the output of the network. The proposed
network therefore has total freedom to learn more complex combinations between
the modalities, within and across all levels of abstraction, which
substantially enriches the learned representations. We report extensive
evaluations over two different and highly competitive multi-modal brain tissue
segmentation challenges, iSEG 2017 and MRBrainS 2013, with the former focusing
on 6-month infant data and the latter on adult images. HyperDenseNet yielded
significant improvements over many state-of-the-art segmentation networks,
ranking at the top on both benchmarks. We further provide a comprehensive
experimental analysis of feature re-use, which confirms the importance of
hyper-dense connections in multi-modal representation learning. Our code is
publicly available at https://www.github.com/josedolz/HyperDenseNet.
Comment: Paper accepted at IEEE TMI in October 2018. The latest version of this
paper updates the reference to the IEEE TMI paper that compares the
submissions to the iSEG 2017 MICCAI Challenge.
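For illustration, below is a minimal PyTorch-style sketch of the hyper-dense connectivity pattern described in the abstract, assuming two single-channel 3D modalities and hypothetical layer widths; it is not the authors' released architecture (see the repository linked above for that).

import torch
import torch.nn as nn

class HyperDenseBlock(nn.Module):
    """Two modality-specific paths with dense links within and across paths."""
    def __init__(self, in_channels=1, growth=16, num_layers=4):
        super().__init__()
        self.path_a = nn.ModuleList()
        self.path_b = nn.ModuleList()
        for i in range(num_layers):
            # Each layer sees every feature map produced so far by BOTH paths.
            in_ch = 2 * in_channels + 2 * i * growth
            for path in (self.path_a, self.path_b):
                path.append(nn.Sequential(
                    nn.Conv3d(in_ch, growth, kernel_size=3, padding=1),
                    nn.BatchNorm3d(growth),
                    nn.ReLU(inplace=True)))

    def forward(self, x_a, x_b):
        feats = [x_a, x_b]  # shared pool of features from both modality paths
        for layer_a, layer_b in zip(self.path_a, self.path_b):
            joint = torch.cat(feats, dim=1)
            feats.append(layer_a(joint))
            feats.append(layer_b(joint))
        return torch.cat(feats, dim=1)  # passed on to classification convs

# Usage with two single-channel MR modalities (e.g. T1 and T2):
# t1, t2 = torch.randn(1, 1, 32, 32, 32), torch.randn(1, 1, 32, 32, 32)
# out = HyperDenseBlock()(t1, t2)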
Automatic Brain Tumor Segmentation using Cascaded Anisotropic Convolutional Neural Networks
A cascade of fully convolutional neural networks is proposed to segment
multi-modal Magnetic Resonance (MR) images with brain tumor into background and
three hierarchical regions: whole tumor, tumor core and enhancing tumor core.
The cascade is designed to decompose the multi-class segmentation problem into
a sequence of three binary segmentation problems according to the subregion
hierarchy. The whole tumor is segmented in the first step and the bounding box
of the result is used for the tumor core segmentation in the second step. The
enhancing tumor core is then segmented based on the bounding box of the tumor
core segmentation result. Our networks consist of multiple layers of
anisotropic and dilated convolution filters, and they are combined with
multi-view fusion to reduce false positives. Residual connections and
multi-scale predictions are employed in these networks to boost the
segmentation performance. Experiments with BraTS 2017 validation set show that
the proposed method achieved average Dice scores of 0.7859, 0.9050, 0.8378 for
enhancing tumor core, whole tumor and tumor core, respectively. The
corresponding values for BraTS 2017 testing set were 0.7831, 0.8739, and
0.7748, respectively.
Comment: 12 pages, 5 figures. MICCAI BraTS Challenge 2017.
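As a rough illustration of the cascade described above, the sketch below chains three hypothetical binary segmentation models (wnet, tnet and enet stand in for the stage-specific networks; the anisotropic/dilated convolutions and multi-view fusion are not modelled), cropping each stage to the bounding box of the previous prediction. Label values follow the BraTS convention but are otherwise an assumption.

import numpy as np

def bounding_box(mask, margin=5):
    """Axis-aligned bounding box of a 3D binary mask, padded by a margin."""
    coords = np.argwhere(mask)
    lo = np.maximum(coords.min(axis=0) - margin, 0)
    hi = np.minimum(coords.max(axis=0) + margin + 1, mask.shape)
    return tuple(slice(int(l), int(h)) for l, h in zip(lo, hi))

def cascaded_segmentation(image, wnet, tnet, enet):
    """image: multi-modal MR volume of shape (C, D, H, W); each net maps a
    volume to a voxel-wise foreground probability of the same spatial size."""
    labels = np.zeros(image.shape[1:], dtype=np.uint8)

    whole = wnet(image) > 0.5                          # stage 1: whole tumor
    if not whole.any():
        return labels
    box = bounding_box(whole)

    core = tnet(image[(slice(None),) + box]) > 0.5     # stage 2: tumor core
    core_full = np.zeros_like(whole)
    core_full[box] = core
    enh_full = np.zeros_like(whole)
    if core.any():
        cbox = bounding_box(core_full)
        enh_full[cbox] = enet(image[(slice(None),) + cbox]) > 0.5  # stage 3

    labels[whole] = 2       # edema / remainder of the whole tumor
    labels[core_full] = 1   # non-enhancing tumor core
    labels[enh_full] = 4    # enhancing tumor core
    return labels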
Cross-Modal Message Passing for Two-stream Fusion
Processing and fusing information across multiple modalities is a very useful
technique for achieving high performance in many computer vision problems. In
order to tackle multi-modal information more effectively, we introduce a novel
framework for multi-modal fusion: Cross-modal Message Passing (CMMP).
Specifically, we propose a cross-modal message passing mechanism to fuse
two-stream network for action recognition, which is composed of an appearance
modal network (RGB image) and a motion modal (optical flow image) network. The
objectives of individual networks in this framework are two-fold: a standard
classification objective and a competing objective. The classification objective
ensures that each modal network predicts the true action category while the
competing objective encourages each modal network to outperform the other one.
We quantitatively show that the proposed CMMP fuses the traditional two-stream
network more effectively, and outperforms all existing two-stream fusion methods
on the UCF-101 and HMDB-51 datasets.
Comment: 2018 IEEE International Conference on Acoustics, Speech and Signal
Processing (ICASSP).
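The abstract does not spell out the exact form of the competing objective, so the sketch below assumes a margin-based term in which each stream is penalised unless its confidence on the true class exceeds the other (detached) stream's confidence, alongside the standard cross-entropy loss; treat it as one plausible reading, not the paper's formulation.

import torch
import torch.nn.functional as F

def modal_losses(logits_rgb, logits_flow, target, margin=0.1, lam=0.5):
    # Standard classification objective for each stream.
    ce_rgb = F.cross_entropy(logits_rgb, target)
    ce_flow = F.cross_entropy(logits_flow, target)

    # Probability assigned to the ground-truth class by each stream.
    p_rgb = F.softmax(logits_rgb, dim=1).gather(1, target.unsqueeze(1)).squeeze(1)
    p_flow = F.softmax(logits_flow, dim=1).gather(1, target.unsqueeze(1)).squeeze(1)

    # Assumed competing objective: each stream is penalised when it does not
    # beat the other stream's confidence by at least `margin`.
    comp_rgb = F.relu(margin + p_flow.detach() - p_rgb).mean()
    comp_flow = F.relu(margin + p_rgb.detach() - p_flow).mean()

    return ce_rgb + lam * comp_rgb, ce_flow + lam * comp_flow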
ModDrop: adaptive multi-modal gesture recognition
We present a method for gesture detection and localisation based on
multi-scale and multi-modal deep learning. Each visual modality captures
spatial information at a particular spatial scale (such as motion of the upper
body or a hand), and the whole system operates at three temporal scales. Key to
our technique is a training strategy which exploits: i) careful initialization
of individual modalities; and ii) gradual fusion involving random dropping of
separate channels (dubbed ModDrop) for learning cross-modality correlations
while preserving uniqueness of each modality-specific representation. We
present experiments on the ChaLearn 2014 Looking at People Challenge gesture
recognition track, in which we placed first out of 17 teams. Fusing multiple
modalities at several spatial and temporal scales leads to a significant
increase in recognition rates, allowing the model to compensate for errors of
the individual classifiers as well as noise in the separate channels.
Furthermore, the proposed ModDrop training technique makes the classifier
robust to missing signals in one or several channels, so that it produces
meaningful predictions from any number of available modalities. In addition, we
demonstrate the applicability of the proposed fusion scheme to modalities of
arbitrary nature by experiments on the same dataset augmented with audio.
Comment: 14 pages, 7 figures.
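A minimal sketch of the ModDrop idea as described above: during training each modality's feature vector is independently zeroed with some probability before fusion, so the joint classifier learns cross-modal correlations while staying robust to missing channels at test time. The drop probability, tensor shapes and module name are illustrative assumptions, not the paper's settings.

import torch
import torch.nn as nn

class ModDrop(nn.Module):
    def __init__(self, drop_prob=0.2):
        super().__init__()
        self.drop_prob = drop_prob

    def forward(self, modality_feats):
        """modality_feats: list of per-modality tensors, each (batch, features)."""
        if not self.training:
            return torch.cat(modality_feats, dim=1)
        dropped = []
        for feat in modality_feats:
            # Independent Bernoulli keep-mask per sample and per modality.
            keep = (torch.rand(feat.size(0), 1, device=feat.device)
                    > self.drop_prob).float()
            dropped.append(feat * keep)
        return torch.cat(dropped, dim=1)

# Usage: fuse e.g. hand-motion, upper-body and audio descriptors before the
# shared classification layers.
# fused = ModDrop(0.2)([hand_feat, body_feat, audio_feat])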