MorphPool: Efficient Non-linear Pooling & Unpooling in CNNs
Pooling is essentially an operation from the field of Mathematical
Morphology, with max pooling as a limited special case. The more general
setting of MorphPooling greatly extends the tool set for building neural
networks. In addition to pooling operations, encoder-decoder networks used for
pixel-level predictions also require unpooling. It is common to combine
unpooling with convolution or deconvolution for up-sampling. However, using its
morphological properties, unpooling can be generalised and improved. Extensive
experimentation on two tasks and three large-scale datasets shows that
morphological pooling and unpooling lead to improved predictive performance at
much reduced parameter counts.
Comment: Accepted paper at the British Machine Vision Conference (BMVC) 202
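The generalisation described above can be illustrated with grey-scale morphological dilation, of which ordinary max pooling is the flat special case. The sketch below is a minimal pure-Python illustration of that relationship, not the paper's implementation (MorphPool learns structuring elements inside CNNs); the function name and list-based representation are assumptions for the example:

```python
def morph_pool(x, se, stride=2):
    # Grey-scale dilation pooling over a 2-D list `x`:
    # out[i][j] = max over the k-by-k window of (input + structuring element se).
    # A flat (all-zero) SE reduces this to ordinary max pooling.
    k = len(se)
    h_out = (len(x) - k) // stride + 1
    w_out = (len(x[0]) - k) // stride + 1
    out = [[0.0] * w_out for _ in range(h_out)]
    for i in range(h_out):
        for j in range(w_out):
            out[i][j] = max(
                x[i * stride + di][j * stride + dj] + se[di][dj]
                for di in range(k) for dj in range(k)
            )
    return out

x = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
flat = [[0, 0], [0, 0]]        # flat SE -> plain 2x2 max pooling
print(morph_pool(x, flat))     # [[5, 7], [13, 15]]
```

With a non-flat structuring element, the window maximum shifts according to the learned offsets, which is what gives morphological pooling its extra expressive power over plain max pooling.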
Deep Retinal Optical Flow: From Synthetic Dataset Generation to Framework Creation and Evaluation
Sustained delivery of regenerative retinal therapies by robotic systems requires intra-operative tracking of the retinal fundus. This thesis presents a supervised convolutional neural network to densely predict optical flow of the retinal fundus, using semantic segmentation as an auxiliary task. Retinal flow information missing due to occlusion by surgical tools or other effects is implicitly inpainted, allowing for the robust tracking of surgical targets.
As manual annotation of optical flow is infeasible, a flexible algorithm for the generation of large synthetic training datasets on the basis of given intra-operative retinal images and tool templates is developed. The compositing of synthetic images is approached as a layer-wise operation implementing a number of transforms at every level which can be extended as required, mimicking the various phenomena visible in real data. Optical flow ground truth is calculated from motion transforms with the help of oflib, an open-source optical flow library available from the Python Package Index. It enables the user to manipulate, evaluate, and combine flow fields. The PyTorch version of oflib is fully differentiable and therefore suitable for use in deep learning methods requiring back-propagation.
The optical flow estimation from the network trained on synthetic data is evaluated using three performance metrics obtained from tracking a grid and sparsely annotated ground truth points. The evaluation benchmark consists of a series of challenging real intra-operative clips obtained from an extensive internally acquired dataset encompassing representative surgical cases. The deep learning approach clearly outperforms variational baseline methods and is shown to generalise well to real data depicting scenarios routinely observed during vitreoretinal procedures. This indicates that complex synthetic training datasets can be used to specifically guide optical flow estimation, laying the foundation for a robust system that can assist with intra-operative tracking of moving surgical targets even when they are occluded.
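The combination of flow fields mentioned above can be sketched as sequential composition of two dense displacement fields. The code below is an illustrative stand-in, not oflib's actual API; the function name `combine_flows` and the nested-list field representation are assumptions for the example:

```python
def combine_flows(f1, f2):
    # Sequentially compose two dense flow fields (h x w grids of [dy, dx],
    # source-based: pixel p moves to p + f(p)). The combined displacement at p
    # is f1(p) + f2(p + f1(p)); f2 is sampled with nearest-neighbour rounding
    # and border clamping.
    h, w = len(f1), len(f1[0])
    out = [[[0.0, 0.0] for _ in range(w)] for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = f1[y][x]
            # nearest-neighbour lookup of f2 at the displaced position, clamped
            y2 = min(max(int(round(y + dy)), 0), h - 1)
            x2 = min(max(int(round(x + dx)), 0), w - 1)
            out[y][x] = [dy + f2[y2][x2][0], dx + f2[y2][x2][1]]
    return out

# Two uniform translations compose into their sum:
shift_down = [[[1.0, 0.0] for _ in range(4)] for _ in range(4)]
shift_right = [[[0.0, 2.0] for _ in range(4)] for _ in range(4)]
print(combine_flows(shift_down, shift_right)[0][0])  # [1.0, 2.0]
```

A differentiable version of this lookup (bilinear sampling on tensors rather than nearest-neighbour rounding on lists) is what makes the PyTorch variant usable inside back-propagated training, as the abstract notes.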
Gated Multi-Resolution Transfer Network for Burst Restoration and Enhancement
Burst image processing has become increasingly popular in recent years.
However, it is a challenging task since individual burst images undergo
multiple degradations and often have mutual misalignments resulting in ghosting
and zipper artifacts. Existing burst restoration methods usually do not
consider the mutual correlation and non-local contextual information among
burst frames, which tends to limit these approaches in challenging cases.
Another key challenge lies in the robust up-sampling of burst frames. Existing
up-sampling methods cannot simultaneously exploit the advantages of
single-stage and progressive up-sampling strategies with conventional and/or
recent up-samplers. To address these challenges, we propose a
novel Gated Multi-Resolution Transfer Network (GMTNet) to reconstruct a
spatially precise high-quality image from a burst of low-quality raw images.
GMTNet consists of three modules optimized for burst processing tasks:
Multi-scale Burst Feature Alignment (MBFA) for feature denoising and alignment,
Transposed-Attention Feature Merging (TAFM) for multi-frame feature
aggregation, and Resolution Transfer Feature Up-sampler (RTFU) to up-scale
merged features and construct a high-quality output image. Detailed
experimental analysis on five datasets validates our approach and sets a new
state of the art for burst super-resolution, burst denoising, and low-light
burst enhancement.
Comment: Accepted at CVPR 202
NiftyNet: a deep-learning platform for medical imaging
Medical image analysis and computer-assisted intervention problems are
increasingly being addressed with deep-learning-based solutions. Established
deep-learning platforms are flexible but do not provide specific functionality
for medical image analysis, and adapting them for this application requires
substantial implementation effort. Thus, there has been considerable
duplication of effort and incompatible infrastructure developed across many
research groups. This work presents the open-source NiftyNet platform for deep
learning
in medical imaging. The ambition of NiftyNet is to accelerate and simplify the
development of these solutions, and to provide a common mechanism for
disseminating research outputs for the community to use, adapt and build upon.
NiftyNet provides a modular deep-learning pipeline for a range of medical
imaging applications, including segmentation, regression, image generation and
representation learning. Components of the NiftyNet pipeline,
including data loading, data augmentation, network architectures, loss
functions and evaluation metrics are tailored to, and take advantage of, the
idiosyncrasies of medical image analysis and computer-assisted intervention.
NiftyNet is built on TensorFlow and supports TensorBoard visualization of 2D
and 3D images and computational graphs by default.
We present three illustrative medical image analysis applications built using
NiftyNet: (1) segmentation of multiple abdominal organs from computed
tomography; (2) image regression to predict computed tomography attenuation
maps from brain magnetic resonance images; and (3) generation of simulated
ultrasound images for specified anatomical poses.
NiftyNet enables researchers to rapidly develop and distribute deep learning
solutions for segmentation, regression, image generation and representation
learning applications, or extend the platform to new applications.
Comment: Wenqi Li and Eli Gibson contributed equally to this work. M. Jorge
Cardoso and Tom Vercauteren contributed equally to this work. 26 pages, 6
figures; update includes additional applications, updated author list and
formatting for journal submission