Solid Spherical Energy (SSE) CNNs for Efficient 3D Medical Image Analysis
Invariance to local rotation, as distinct from the global rotation of images and objects, is required in various texture analysis problems. It has led to several breakthrough methods such as local binary patterns, maximum response and steerable filterbanks. In particular, textures in medical images often exhibit local structures at arbitrary orientations. Locally Rotation Invariant (LRI) Convolutional Neural Networks (CNNs) were recently proposed, using 3D steerable filters to combine LRI with Directional Sensitivity (DS). Steerability avoids the cost of convolving with rotated kernels and comes with a parametric representation that drastically reduces the number of trainable parameters. Yet the potential bottleneck of this approach, in both memory and computation, lies in the need to recombine responses for a set of predefined discretized orientations. In this paper, we propose to calculate invariants from the responses to the set of spherical harmonics projected onto 3D kernels, in the form of a lightweight Solid Spherical Energy (SSE) CNN. It offers a compromise between the high kernel specificity of the LRI-CNN and a low memory/operations requirement. The computational gain is evaluated on 3D synthetic and pulmonary nodule classification experiments. The performance of the proposed approach is compared with steerable LRI-CNNs and standard 3D CNNs, showing competitive results with the state of the art.
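The idea of computing invariants from spherical-harmonic responses can be illustrated with a toy example. The sketch below is not the SSE-CNN from the paper: it assumes a Gaussian radial profile, uses only degrees 0 and 1 of the real solid harmonics, and evaluates a single 5x5x5 patch in plain Python. All function names are hypothetical. It shows that the individual responses change when the patch is rotated by 90 degrees, while the per-degree energy (sum of squared responses) does not.

```python
import math
import random

N = 5          # patch side length (odd, so there is a center voxel)
C = N // 2     # index of the center voxel

def radial(x, y, z, sigma=1.0):
    """Isotropic Gaussian radial profile (an assumed choice)."""
    return math.exp(-(x * x + y * y + z * z) / (2 * sigma ** 2))

# Real solid harmonics of degree 0 and 1, up to a common scale per degree.
HARMONICS = {
    0: [lambda x, y, z: 1.0],
    1: [lambda x, y, z: x, lambda x, y, z: y, lambda x, y, z: z],
}

def responses(patch, degree):
    """Inner product of the patch with each harmonic kernel of one degree."""
    out = []
    for h in HARMONICS[degree]:
        acc = 0.0
        for i in range(N):
            for j in range(N):
                for k in range(N):
                    x, y, z = i - C, j - C, k - C
                    acc += patch[i][j][k] * h(x, y, z) * radial(x, y, z)
        out.append(acc)
    return out

def energy(patch, degree):
    """Sum of squared responses within one degree."""
    return sum(r * r for r in responses(patch, degree))

def rot90_z(patch):
    """Rotate the patch by 90 degrees about the z axis."""
    return [[[patch[j][N - 1 - i][k] for k in range(N)]
             for j in range(N)] for i in range(N)]

if __name__ == "__main__":
    random.seed(0)
    patch = [[[random.random() for _ in range(N)]
              for _ in range(N)] for _ in range(N)]
    rotated = rot90_z(patch)
    print("degree-1 responses:", responses(patch, 1), responses(rotated, 1))
    print("degree-1 energies: ", energy(patch, 1), energy(rotated, 1))
```

On a voxel grid the equality is exact only for right-angle rotations; for arbitrary angles, resampling makes it approximate.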
CubeNet: Equivariance to 3D Rotation and Translation
3D Convolutional Neural Networks are sensitive to transformations applied to
their input. This is a problem because a voxelized version of a 3D object, and
its rotated clone, will look unrelated to each other after passing through to
the last layer of a network. Instead, an idealized model would preserve a
meaningful representation of the voxelized object, while explaining the
pose-difference between the two inputs. An equivariant representation vector
has two components: the invariant identity part, and a discernable encoding of
the transformation. Models that can't explain pose-differences risk "diluting"
the representation, in pursuit of optimizing a classification or regression
loss function.
We introduce a Group Convolutional Neural Network with linear equivariance to
translations and right angle rotations in three dimensions. We call this
network CubeNet, reflecting its cube-like symmetry. By construction, this
network helps preserve a 3D shape's global and local signature, as it is
transformed through successive layers. We apply this network to a variety of 3D
inference problems, achieving state-of-the-art on the ModelNet10 classification
challenge, and comparable performance on the ISBI 2012 Connectome Segmentation
Benchmark. To the best of our knowledge, this is the first 3D rotation
equivariant CNN for voxel representations.
Comment: Preprin
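As a rough illustration of the "right angle rotations" mentioned in the abstract, the sketch below enumerates the 24 rotations of the cube as signed permutation matrices with determinant +1, then matches a template against a voxel set under every rotation. This is a toy example on sets of integer coordinates, not CubeNet's group convolution on feature maps; the names `cube_rotations`, `rotate` and `lift` are hypothetical. Rotating the input only permutes the list of per-rotation responses (equivariance), so pooling over the list gives a rotation-invariant value.

```python
from itertools import permutations, product

def det3(m):
    """Determinant of a 3x3 integer matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def cube_rotations():
    """All signed permutation matrices with determinant +1."""
    mats = []
    for perm in permutations(range(3)):
        for signs in product((1, -1), repeat=3):
            m = [[signs[r] if c == perm[r] else 0 for c in range(3)]
                 for r in range(3)]
            if det3(m) == 1:
                mats.append(m)
    return mats

def rotate(voxels, m):
    """Rotate a set of origin-centered integer voxel coordinates."""
    return {tuple(sum(m[r][c] * v[c] for c in range(3)) for r in range(3))
            for v in voxels}

def lift(voxels, template, group):
    """Overlap of the voxel set with the template under every rotation."""
    return [len(rotate(template, g) & voxels) for g in group]

if __name__ == "__main__":
    group = cube_rotations()
    shape = {(0, 0, 0), (1, 0, 0), (2, 0, 0), (0, 1, 0)}
    template = {(0, 0, 0), (1, 0, 0), (0, 1, 0)}
    print(len(group), "rotations")
    print("responses:", lift(shape, template, group))
    print("pooled:   ", max(lift(shape, template, group)))
```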
Regular SE(3) Group Convolutions for Volumetric Medical Image Analysis
Regular group convolutional neural networks (G-CNNs) have been shown to
increase model performance and improve equivariance to different geometrical
symmetries. This work addresses the problem of SE(3), i.e., roto-translation
equivariance, on volumetric data. Volumetric image data is prevalent in many
medical settings. Motivated by the recent work on separable group convolutions,
we devise an SE(3) group convolution kernel separated into a continuous SO(3)
(rotation) kernel and a spatial kernel. We approximate equivariance to the
continuous setting by sampling uniform SO(3) grids. Our continuous SO(3) kernel
is parameterized via RBF interpolation on similarly uniform grids. We
demonstrate the advantages of our approach in volumetric medical image
analysis. Our SE(3) equivariant models consistently outperform CNNs and regular
discrete G-CNNs on challenging medical classification tasks and show
significantly improved generalization capabilities. Our approach achieves up to
a 16.5% gain in accuracy over regular CNNs.
Comment: 10 pages, 1 figure, 2 tables, accepted at MICCAI 2023. Updated
version to camera ready version
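The abstract describes a continuous SO(3) kernel parameterized via RBF interpolation on a grid of rotations. A minimal sketch of that idea follows, under several assumptions that do not come from the paper: a Gaussian RBF on the geodesic angle between rotations, normalized weights, scalar kernel values, and a small hypothetical grid of four rotations about the z axis (the paper samples uniform SO(3) grids). The function names are illustrative only.

```python
import math

def rot_z(theta):
    """Rotation matrix for an angle theta (radians) about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def geodesic(a, b):
    """Rotation angle of a^T b, i.e. the geodesic distance on SO(3)."""
    trace = sum(a[i][j] * b[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp rounding error
    return math.acos(cos_angle)

def rbf_interpolate(rotation, grid, values, width):
    """Normalized Gaussian-RBF blend of the values stored at grid rotations."""
    weights = [math.exp(-(geodesic(rotation, g) / width) ** 2) for g in grid]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total

if __name__ == "__main__":
    # Hypothetical grid: four rotations about z, each with one stored value
    # standing in for a learnable kernel parameter.
    grid = [rot_z(k * math.pi / 2) for k in range(4)]
    values = [1.0, 2.0, 3.0, 4.0]
    for angle in (0.0, math.pi / 4, math.pi / 2):
        print(angle, rbf_interpolate(rot_z(angle), grid, values, width=0.5))
```

The `width` parameter controls how sharply the interpolated kernel follows its nearest grid rotation; a very small width approaches nearest-neighbor lookup.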
I2I: Image to Icosahedral Projection for Object Reasoning from Single-View Images
Reasoning about 3D objects based on 2D images is challenging due to large
variations in appearance caused by viewing the object from different
orientations. Ideally, our model would be invariant or equivariant to changes
in object pose. Unfortunately, this is typically not possible with 2D image
input because we do not have an a priori model of how the image would change
under out-of-plane object rotations. The only SO(3)-equivariant
models that currently exist require point cloud input rather than 2D images. In
this paper, we propose a novel model architecture based on icosahedral group
convolution that reasons in SO(3) by projecting the input image onto
an icosahedron. As a result of this projection, the model is approximately
equivariant to rotation in SO(3). We apply this model to an object
pose estimation task and find that it outperforms reasonable baselines
- …