Projection-Based 2.5D U-net Architecture for Fast Volumetric Segmentation
Convolutional neural networks are state-of-the-art for various segmentation
tasks. While these networks are computationally efficient for 2D images, 3D
convolutions have large storage requirements and long training times.
To overcome this issue, we introduce a network structure for volumetric data
without 3D convolutional layers. The main idea is to include maximum intensity
projections from different directions to transform the volumetric data to a
sequence of images, where each image contains information of the full data. We
then apply 2D convolutions to these projection images and lift them again to
volumetric data using a trainable reconstruction algorithm. The proposed
network architecture has lower storage requirements than network structures
using 3D convolutions. For a tested binary segmentation task, it even shows
better performance than the 3D U-net and can be trained much faster.
Comment: presented at the SAMPTA 2019 conference
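To make the projection-and-lift idea concrete, here is a minimal PyTorch sketch, assuming axis-aligned maximum intensity projections, a small 2D encoder shared across the three views, and a naive broadcast-and-fuse unprojection as a stand-in for the paper's trainable reconstruction; all layer sizes and names are illustrative, not the authors' architecture.

```python
import torch
import torch.nn as nn

class ProjectionSeg2p5D(nn.Module):
    """Toy projection-based 2.5D segmenter: axis-aligned MIPs ->
    shared 2D convolutions -> broadcast back to 3D -> 1x1x1 fusion.
    Illustrative sketch only, not the authors' architecture."""

    def __init__(self, channels=16):
        super().__init__()
        # Small 2D encoder shared across the three projection views.
        self.conv2d = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        # Trainable per-voxel fusion of the three back-projected feature maps.
        self.fuse = nn.Conv3d(3 * channels, 1, kernel_size=1)

    def forward(self, vol):                        # vol: (B, 1, D, H, W)
        B, _, D, H, W = vol.shape
        feats = []
        for axis in (2, 3, 4):                     # project along D, H, W in turn
            mip = vol.max(dim=axis).values         # 2D maximum intensity projection
            f = self.conv2d(mip)                   # (B, C, ., .) 2D features
            # Naive unprojection: broadcast the 2D features along the
            # collapsed axis (stand-in for the trainable reconstruction).
            f = f.unsqueeze(axis).expand(B, f.shape[1], D, H, W)
            feats.append(f)
        return self.fuse(torch.cat(feats, dim=1))  # (B, 1, D, H, W) logits
```

For example, `ProjectionSeg2p5D()(torch.randn(1, 1, 32, 64, 64))` returns a (1, 1, 32, 64, 64) logit volume while keeping every learned convolution 2D apart from the final 1x1x1 fusion, which is where the storage saving over a 3D U-net comes from.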
CAT-Net: A Cross-Slice Attention Transformer Model for Prostate Zonal Segmentation in MRI
Prostate cancer is the second leading cause of cancer death among men in the
United States. Diagnosis from prostate MRI often relies on accurate
prostate zonal segmentation. However, state-of-the-art automatic segmentation
methods often fail to produce well-contained volumetric segmentation of the
prostate zones since certain slices of prostate MRI, such as base and apex
slices, are harder to segment than other slices. This difficulty can be
overcome by accounting for the cross-slice relationship of adjacent slices, but
current methods do not fully learn and exploit such relationships. In this
paper, we propose a novel cross-slice attention mechanism, which we use in a
Transformer module to systematically learn the cross-slice relationship at
different scales. The module can be utilized in any existing learning-based
segmentation framework with skip connections. Experiments show that our
cross-slice attention is able to capture the cross-slice information in
prostate zonal segmentation and improve the performance of current
state-of-the-art methods. Our method significantly improves segmentation
accuracy in the peripheral zone, such that the segmentation results are
consistent across all the prostate slices (apex, mid-gland, and base).
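As a rough illustration of the mechanism (not the CAT-Net implementation, which operates at multiple scales inside skip connections), the sketch below pools each slice of a volumetric feature map into a token, lets a Transformer encoder attend across the slice dimension, and gates the original slices with the result; `channels` must be divisible by `num_heads`.

```python
import torch
import torch.nn as nn

class CrossSliceAttention(nn.Module):
    """Toy cross-slice attention: one token per slice, a Transformer
    encoder mixing information across the D slices, and a sigmoid gate
    re-weighting each slice. Illustrative stand-in only."""

    def __init__(self, channels, num_heads=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=channels, nhead=num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)

    def forward(self, x):                              # x: (B, C, D, H, W)
        B, C, D, H, W = x.shape
        tokens = x.mean(dim=(3, 4)).transpose(1, 2)    # (B, D, C): slice tokens
        tokens = self.encoder(tokens)                  # attend across slices
        gate = torch.sigmoid(tokens).transpose(1, 2)   # (B, C, D)
        return x * gate.view(B, C, D, 1, 1)            # slice-wise modulation
```

Such a module can be dropped into the skip connections of an encoder-decoder segmentation network, which is where the abstract says the proposed mechanism is used.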
Metric-Scale Truncation-Robust Heatmaps for 3D Human Pose Estimation
Heatmap representations have formed the basis of 2D human pose estimation
systems for many years, but their generalizations for 3D pose have only
recently been considered. This includes 2.5D volumetric heatmaps, whose X and Y
axes correspond to image space and the Z axis to metric depth around the
subject. To obtain metric-scale predictions, these methods must include a
separate, explicit post-processing step to resolve scale ambiguity. Further,
they cannot encode body joint positions outside of the image boundaries,
leading to incomplete pose estimates in case of image truncation. We address
these limitations by proposing metric-scale truncation-robust (MeTRo)
volumetric heatmaps, whose dimensions are defined in metric 3D space near the
subject, instead of being aligned with image space. We train a
fully-convolutional network to estimate such heatmaps from monocular RGB in an
end-to-end manner. This reinterpretation of the heatmap dimensions allows us to
estimate complete metric-scale poses without test-time knowledge of the focal
length or person distance and without relying on anthropometric heuristics in
post-processing. Furthermore, as the image space is decoupled from the heatmap
space, the network can learn to reason about joints beyond the image boundary.
Using ResNet-50 without any additional learned layers, we obtain
state-of-the-art results on the Human3.6M and MPI-INF-3DHP benchmarks. As our
method is simple and fast, it can become a useful component for real-time
top-down multi-person pose estimation systems. We make our code publicly
available to facilitate further research (see
https://vision.rwth-aachen.de/metro-pose3d).
Comment: Accepted for publication at the 2020 IEEE Conference on Automatic
Face and Gesture Recognition (FG 2020)
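The decoding step the abstract implies, reading metric-scale joint coordinates directly off a volumetric heatmap, can be written as a soft-argmax over a metric grid. The sketch below assumes a cube of 2.2 m per side centered on the subject (an assumed extent, not a value confirmed by the paper) and returns root-relative positions in meters.

```python
import torch

def soft_argmax_metric(heatmaps, cube_side_m=2.2):
    """Decode (B, J, D, H, W) heatmap logits, whose three axes span a
    metric cube around the subject, into (B, J, 3) root-relative joint
    positions in meters. Illustrative sketch; cube_side_m is assumed."""
    B, J, D, H, W = heatmaps.shape
    probs = heatmaps.reshape(B, J, -1).softmax(-1).reshape(B, J, D, H, W)

    def grid(n):  # metric coordinates along one heatmap axis, centered at 0
        return torch.linspace(-cube_side_m / 2, cube_side_m / 2, n,
                              device=heatmaps.device)

    z = (probs.sum(dim=(3, 4)) * grid(D)).sum(-1)  # expected coordinate per axis
    y = (probs.sum(dim=(2, 4)) * grid(H)).sum(-1)
    x = (probs.sum(dim=(2, 3)) * grid(W)).sum(-1)
    return torch.stack([x, y, z], dim=-1)
```

Because the grid is metric rather than pixel-aligned, the expectation yields meters directly, with no focal length, person distance, or bone-length heuristic needed at test time.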
MeTRAbs: Metric-Scale Truncation-Robust Heatmaps for Absolute 3D Human Pose Estimation
Heatmap representations have formed the basis of human pose estimation
systems for many years, and their extension to 3D has been a fruitful line of
recent research. This includes 2.5D volumetric heatmaps, whose X and Y axes
correspond to image space and Z to metric depth around the subject. To obtain
metric-scale predictions, 2.5D methods need a separate post-processing step to
resolve scale ambiguity. Further, they cannot localize body joints outside the
image boundaries, leading to incomplete estimates for truncated images. To
address these limitations, we propose metric-scale truncation-robust (MeTRo)
volumetric heatmaps, whose dimensions are all defined in metric 3D space,
instead of being aligned with image space. This reinterpretation of heatmap
dimensions allows us to directly estimate complete, metric-scale poses without
test-time knowledge of distance or relying on anthropometric heuristics, such
as bone lengths. To further demonstrate the utility of our representation, we
present a differentiable combination of our 3D metric-scale heatmaps with 2D
image-space ones to estimate absolute 3D pose (our MeTRAbs architecture). We
find that supervision via absolute pose loss is crucial for accurate
non-root-relative localization. Using a ResNet-50 backbone without further
learned layers, we obtain state-of-the-art results on Human3.6M, MPI-INF-3DHP
and MuPoTS-3D. Our code will be made publicly available to facilitate further
research.
Comment: See project page at https://vision.rwth-aachen.de/metrabs . Accepted
for publication in the IEEE Transactions on Biometrics, Behavior, and
Identity Science (TBIOM), Special Issue "Selected Best Works From Automated
Face and Gesture Recognition 2020". Extended version of FG paper
arXiv:2003.0295
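One way to realize the differentiable combination of metric 3D and image-space 2D predictions is to decode both heatmaps to coordinates and solve a linear least-squares problem for the absolute translation under a pinhole camera model. The sketch below is such a realization under our own assumptions (normalized 2D coordinates, a full-rank system), not the exact MeTRAbs formulation.

```python
import torch

def absolute_pose(pose3d_rel, pose2d_norm):
    """Combine a root-relative metric 3D pose (B, J, 3) with normalized
    2D coordinates (B, J, 2), i.e. (pixel - principal point) / focal
    length. Pinhole constraints (X + tx)/(Z + tz) = u and
    (Y + ty)/(Z + tz) = v are linear in t = (tx, ty, tz), so t is found
    by least squares. Illustrative sketch, assumptions as in the text."""
    B, J, _ = pose3d_rel.shape
    u, v = pose2d_norm[..., 0], pose2d_norm[..., 1]   # (B, J) each
    X, Y, Z = pose3d_rel.unbind(-1)                   # (B, J) each
    zeros, ones = torch.zeros_like(u), torch.ones_like(u)
    # Two rows per joint: tx - u*tz = u*Z - X and ty - v*tz = v*Z - Y.
    A = torch.stack([torch.stack([ones, zeros, -u], dim=-1),
                     torch.stack([zeros, ones, -v], dim=-1)],
                    dim=2).reshape(B, 2 * J, 3)
    b = torch.stack([u * Z - X, v * Z - Y], dim=2).reshape(B, 2 * J, 1)
    t = torch.linalg.lstsq(A, b).solution.squeeze(-1) # (B, 3) translation
    return pose3d_rel + t.unsqueeze(1)                # absolute 3D pose
```

Since the least-squares solve is differentiable, gradients from an absolute-pose loss can flow back into both heatmap branches, matching the abstract's observation that supervision via an absolute pose loss is crucial.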