2,792 research outputs found
A Deep Learning Framework for Unsupervised Affine and Deformable Image Registration
Image registration, the process of aligning two or more images, is the core
technique of many (semi-)automatic medical image analysis tasks. Recent studies
have shown that deep learning methods, notably convolutional neural networks
(ConvNets), can be used for image registration. Thus far training of ConvNets
for registration was supervised using predefined example registrations.
However, obtaining example registrations is not trivial. To circumvent the need
for predefined examples, and thereby to increase convenience of training
ConvNets for image registration, we propose the Deep Learning Image
Registration (DLIR) framework for \textit{unsupervised} affine and deformable
image registration. In the DLIR framework ConvNets are trained for image
registration by exploiting image similarity analogous to conventional
intensity-based image registration. After a ConvNet has been trained with the
DLIR framework, it can be used to register pairs of unseen images in one shot.
We propose flexible ConvNets designs for affine image registration and for
deformable image registration. By stacking multiple of these ConvNets into a
larger architecture, we are able to perform coarse-to-fine image registration.
We show for registration of cardiac cine MRI and registration of chest CT that
performance of the DLIR framework is comparable to conventional image
registration while being several orders of magnitude faster.Comment: Accepted: Medical Image Analysis - Elsevie
Two-Stream Convolutional Networks for Action Recognition in Videos
We investigate architectures of discriminatively trained deep Convolutional
Networks (ConvNets) for action recognition in video. The challenge is to
capture the complementary information on appearance from still frames and
motion between frames. We also aim to generalise the best performing
hand-crafted features within a data-driven learning framework.
Our contribution is three-fold. First, we propose a two-stream ConvNet
architecture which incorporates spatial and temporal networks. Second, we
demonstrate that a ConvNet trained on multi-frame dense optical flow is able to
achieve very good performance in spite of limited training data. Finally, we
show that multi-task learning, applied to two different action classification
datasets, can be used to increase the amount of training data and improve the
performance on both.
Our architecture is trained and evaluated on the standard video actions
benchmarks of UCF-101 and HMDB-51, where it is competitive with the state of
the art. It also exceeds by a large margin previous attempts to use deep nets
for video classification
Learning Human Pose Estimation Features with Convolutional Networks
This paper introduces a new architecture for human pose estimation using a
multi- layer convolutional network architecture and a modified learning
technique that learns low-level features and higher-level weak spatial models.
Unconstrained human pose estimation is one of the hardest problems in computer
vision, and our new architecture and learning schema shows significant
improvement over the current state-of-the-art results. The main contribution of
this paper is showing, for the first time, that a specific variation of deep
learning is able to outperform all existing traditional architectures on this
task. The paper also discusses several lessons learned while researching
alternatives, most notably, that it is possible to learn strong low-level
feature detectors on features that might even just cover a few pixels in the
image. Higher-level spatial models improve somewhat the overall result, but to
a much lesser extent then expected. Many researchers previously argued that the
kinematic structure and top-down information is crucial for this domain, but
with our purely bottom up, and weak spatial model, we could improve other more
complicated architectures that currently produce the best results. This mirrors
what many other researchers, like those in the speech recognition, object
recognition, and other domains have experienced
- …