3D Convolutional Neural Networks for Tumor Segmentation using Long-range 2D Context
We present an efficient deep learning approach for the challenging task of
tumor segmentation in multisequence MR images. In recent years, Convolutional
Neural Networks (CNNs) have achieved state-of-the-art performance in a large
variety of recognition tasks in medical imaging. Because of the considerable
computational cost of CNNs, large volumes such as MRI are typically processed
by subvolumes, for instance slices (axial, coronal, sagittal) or small 3D
patches. In this paper we introduce a CNN-based model which efficiently
combines the advantages of the short-range 3D context and the long-range 2D
context. To overcome the limitations of specific choices of neural network
architectures, we also propose to merge outputs of several cascaded 2D-3D
models by a voxelwise voting strategy. Furthermore, we propose a network
architecture in which the different MR sequences are processed by separate
subnetworks in order to be more robust to the problem of missing MR sequences.
Finally, a simple and efficient algorithm for training large CNN models is
introduced. We evaluate our method on the public benchmark of the BRATS 2017
challenge on the task of multiclass segmentation of malignant brain tumors. Our
method achieves good performance and produces accurate segmentations, with
median Dice scores of 0.918 (whole tumor), 0.883 (tumor core) and 0.854
(enhancing core). Our approach can be naturally applied to various tasks
involving segmentation of lesions or organs.
Comment: Submitted to the journal Computerized Medical Imaging and Graphic
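The voxelwise voting and the Dice score mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `voxelwise_vote` and `dice`, the flattened-list representation of a label volume, the tie-breaking rule, and the convention for empty masks are all assumptions made for illustration. The reported scores are for tumor subregions, whereas this sketch scores a single label.

```python
from collections import Counter


def voxelwise_vote(predictions):
    """Fuse several label volumes by majority vote at each voxel.

    predictions: list of equal-length label sequences, one per model
    (each sequence is a flattened label volume; hypothetical layout).
    Ties go to the smallest label, which is an arbitrary assumption.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    size = len(predictions[0])
    if any(len(p) != size for p in predictions):
        raise ValueError("all predictions must have the same length")
    fused = []
    for labels in zip(*predictions):  # labels of all models at one voxel
        counts = Counter(labels)
        best = max(counts.values())
        fused.append(min(lab for lab, c in counts.items() if c == best))
    return fused


def dice(pred, truth, label):
    """Dice score 2|A ∩ B| / (|A| + |B|) for a single label."""
    a = {i for i, lab in enumerate(pred) if lab == label}
    b = {i for i, lab in enumerate(truth) if lab == label}
    if not a and not b:
        return 1.0  # both masks empty: scored as perfect (assumed convention)
    return 2 * len(a & b) / (len(a) + len(b))


if __name__ == "__main__":
    model_outputs = [[0, 1, 1, 2], [0, 1, 2, 2], [1, 1, 2, 0]]
    fused = voxelwise_vote(model_outputs)
    print(fused, dice(fused, [0, 1, 1, 2], label=1))
```

In practice the volumes would be 3D arrays handled by an array library; flat lists are used here only to keep the sketch dependency-free.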
Learning Audio Sequence Representations for Acoustic Event Classification
Acoustic Event Classification (AEC) has become a significant task for
machines to perceive the surrounding auditory scene. However, extracting
effective representations that capture the underlying characteristics of the
acoustic events is still challenging. Previous methods mainly focused on
designing the audio features in a 'hand-crafted' manner. Interestingly,
data-learnt features have been recently reported to show better performance. Up
to now, these have only been considered at the frame level. In this paper, we
propose an unsupervised learning framework to learn a vector representation of
an audio sequence for AEC. This framework consists of a Recurrent Neural
Network (RNN) encoder and an RNN decoder, which respectively transform the
variable-length audio sequence into a fixed-length vector and reconstruct the
input sequence from the generated vector. After training the encoder-decoder, we
feed the audio sequences to the encoder and then take the learnt vectors as the
audio sequence representations. Compared with previous methods, the proposed
method can not only handle audio streams of arbitrary length,
but also learn the salient information of the sequence. Extensive
evaluation on a large acoustic event database is performed, and the
empirical results demonstrate that the learnt audio sequence representation
yields a significant performance improvement over
other state-of-the-art hand-crafted sequence features for AEC.
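As a rough illustration of how a recurrent encoder maps a variable-length sequence to a fixed-length vector, here is a minimal sketch of the forward pass of a plain (Elman-style) RNN. It is not the model from the abstract above: the weights are random and untrained, the decoder and the reconstruction training are omitted, and the names `make_encoder` and `encode` are hypothetical.

```python
import math
import random


def make_encoder(input_dim, hidden_dim, seed=0):
    """Create random (untrained) input and recurrent weight matrices."""
    rng = random.Random(seed)
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(input_dim)]
            for _ in range(hidden_dim)]
    w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
             for _ in range(hidden_dim)]
    return w_in, w_rec


def encode(frames, w_in, w_rec):
    """Run the recurrence over all frames and return the final hidden state.

    frames: sequence of feature vectors (any length).
    The returned vector always has len(w_rec) entries.
    """
    hidden = [0.0] * len(w_rec)
    for frame in frames:
        # h_t = tanh(W_in x_t + W_rec h_{t-1})
        hidden = [
            math.tanh(
                sum(w * x for w, x in zip(w_in[j], frame))
                + sum(w * h for w, h in zip(w_rec[j], hidden))
            )
            for j in range(len(hidden))
        ]
    return hidden


if __name__ == "__main__":
    w_in, w_rec = make_encoder(input_dim=3, hidden_dim=4)
    short_clip = [[0.1, 0.2, 0.3]] * 2
    long_clip = [[0.1, 0.2, 0.3]] * 7
    # Both representations have 4 entries despite different input lengths.
    print(len(encode(short_clip, w_in, w_rec)), len(encode(long_clip, w_in, w_rec)))
```

In an encoder-decoder setup, the final hidden state would be passed to a decoder trained to reconstruct the input, and only then used as the sequence representation.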
Learning Representations from EEG with Deep Recurrent-Convolutional Neural Networks
One of the challenges in modeling cognitive events from electroencephalogram
(EEG) data is finding representations that are invariant to inter- and
intra-subject differences, as well as to inherent noise associated with such
data. Herein, we propose a novel approach for learning such representations
from multi-channel EEG time-series, and demonstrate its advantages in the
context of a mental load classification task. First, we transform EEG activities
into a sequence of topology-preserving multi-spectral images, as opposed to
standard EEG analysis techniques that ignore such spatial information. Next, we
train a deep recurrent-convolutional network inspired by state-of-the-art video
classification to learn robust representations from the sequence of images. The
proposed approach is designed to preserve the spatial, spectral, and temporal
structure of EEG, which leads to finding features that are less sensitive to
variations and distortions within each dimension. Empirical evaluation on the
cognitive load classification task demonstrated significant improvements in
classification accuracy over current state-of-the-art approaches in this field.
Comment: To be published as a conference paper at ICLR 201
Going Deeper into Action Recognition: A Survey
Understanding human actions in visual data is tied to advances in
complementary research areas including object recognition, human dynamics,
domain adaptation and semantic segmentation. Over the last decade, human action
analysis has evolved from earlier schemes that were often limited to controlled
environments to today's advanced solutions that can learn from millions of
videos and apply to almost all daily activities. Given the broad range of
applications from video surveillance to human-computer interaction, scientific
milestones in action recognition are achieved ever more rapidly, so that
what used to be good becomes obsolete within a short time. This motivated us to
provide a comprehensive review of the notable steps taken towards recognizing
human actions. To this end, we start our discussion with the pioneering methods
that use handcrafted representations, and then navigate into the realm of deep
learning-based approaches. We aim to remain objective throughout this survey,
touching upon encouraging improvements as well as inevitable fallbacks, in the
hope of raising fresh questions and motivating new research directions for the
reader.