Roto-Translation Covariant Convolutional Networks for Medical Image Analysis
We propose a framework for rotation and translation covariant deep learning
using group convolutions. The group product of the special Euclidean
motion group describes how a concatenation of two roto-translations
results in a net roto-translation. We encode this geometric structure into
convolutional neural networks (CNNs) via group convolutional layers,
which fit into the standard 2D CNN framework and which allow rotated input
samples to be handled generically, without the need for data augmentation.
We introduce three layers: a lifting layer which lifts a 2D (vector valued)
image to an SE(2)-image, i.e., 3D (vector valued) data whose domain is
SE(2); a group convolution layer from and to an SE(2)-image; and a
projection layer from an SE(2)-image to a 2D image. The lifting and group
convolution layers are covariant (the output roto-translates with the
input). The final projection layer, a maximum intensity projection over
rotations, makes the full CNN rotation invariant.
We show with three different problems in histopathology, retinal imaging, and
electron microscopy that with the proposed group CNNs, state-of-the-art
performance can be achieved, without the need for data augmentation by rotation
and with increased performance compared to standard CNNs that do rely on
augmentation. Comment: 8 pages, 2 figures, 1 table, accepted at MICCAI 2018
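Restricted to 90-degree rotations, where rotating a filter on the pixel grid is exact and no interpolation is needed, the lifting and projection layers described above can be sketched in a few lines of numpy. This is a toy illustration, not the authors' implementation: `corr2d`, `lift`, and `project` are assumed names, and the group convolution layer between SE(2)-images is omitted.

```python
import numpy as np

def corr2d(x, w):
    # Plain 'valid' cross-correlation on a 2D grid.
    h, w_ = w.shape
    out = np.zeros((x.shape[0] - h + 1, x.shape[1] - w_ + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (x[i:i + h, j:j + w_] * w).sum()
    return out

def lift(image, filt, n=4):
    # Lifting layer: correlate the image with n rotated copies of one
    # 2D filter; the result is an "SE(2)-image" with an extra
    # orientation axis of size n.
    return np.stack([corr2d(image, np.rot90(filt, k)) for k in range(n)])

def project(se2_image):
    # Projection layer: maximum intensity projection over orientations.
    return se2_image.max(axis=0)

rng = np.random.default_rng(0)
image = rng.standard_normal((16, 16))
filt = rng.standard_normal((3, 3))

out = project(lift(image, filt))
out_rot = project(lift(np.rot90(image), filt))
# Covariance of lift + projection: rotating the input by 90 degrees
# rotates the projected output by 90 degrees, so any global pooling
# (e.g. out.max()) is rotation invariant.
```

Rotating the input permutes the orientation axis of the lifted image while rotating each orientation slice, which is why the maximum over orientations commutes with the rotation.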
Learning to Convolve: A Generalized Weight-Tying Approach
Recent work (Cohen & Welling, 2016) has shown that generalizations of
convolutions, based on group theory, provide powerful inductive biases for
learning. In these generalizations, filters are not only translated but can
also be rotated, flipped, etc. However, coming up with exact models of how to
rotate a 3 x 3 filter on a square pixel-grid is difficult. In this paper, we
learn how to transform filters for use in the group convolution, focussing on
roto-translation. For this, we learn a filter basis and all rotated versions of
that filter basis. Filters are then encoded by a set of rotation invariant
coefficients. To rotate a filter, we switch the basis. We demonstrate we can
produce feature maps with low sensitivity to input rotations, while achieving
high performance on MNIST and CIFAR-10. Comment: Accepted to ICML 2019
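As a toy illustration of the basis-switching idea, with exact 90-degree rotations standing in for the learned interpolated rotations, and `generator`, `basis`, and `coeffs` as assumed names:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy basis: four 90-degree rotated copies of one generator filter.
# (The paper learns the basis and its rotated versions; here the rotated
# bases are exact np.rot90 copies so the mechanism can be checked.)
generator = rng.standard_normal((3, 3))
basis = {k: np.stack([np.rot90(generator, k + j) for j in range(4)])
         for k in range(4)}  # basis[k] = the basis rotated by k * 90 deg

coeffs = rng.standard_normal(4)  # rotation-invariant filter code

def filter_at(k):
    # Rotating a filter = switching to the rotated basis; the
    # coefficients stay fixed.
    return np.einsum('b,bij->ij', coeffs, basis[k])

# Rotating the synthesized filter directly gives the same result as
# synthesizing it from the rotated basis.
```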
A General Theory of Equivariant CNNs on Homogeneous Spaces
We present a general theory of Group equivariant Convolutional Neural
Networks (G-CNNs) on homogeneous spaces such as Euclidean space and the sphere.
Feature maps in these networks represent fields on a homogeneous base space,
and layers are equivariant maps between spaces of fields. The theory enables a
systematic classification of all existing G-CNNs in terms of their symmetry
group, base space, and field type. We also consider a fundamental question:
what is the most general kind of equivariant linear map between feature spaces
(fields) of given types? Following Mackey, we show that such maps correspond
one-to-one with convolutions using equivariant kernels, and characterize the
space of such kernels.
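One common form of that kernel characterization can be sketched as follows (exact conventions for inverses vary with how the induced representation is defined, so this is a sketch rather than the paper's precise statement): for fields of types rho_in and rho_out over a base space G/H, equivariant linear maps are convolutions with kernels kappa from G into Hom(V_in, V_out) satisfying a two-sided constraint under the stabilizer subgroup H,

```latex
\kappa(h \, g \, h') \;=\; \rho_{\mathrm{out}}(h)\,\kappa(g)\,\rho_{\mathrm{in}}(h'),
\qquad h, h' \in H, \ g \in G .
```

The constraint says the kernel is determined by its values on double cosets of H, which is what makes the classification of equivariant maps tractable.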
Affine Self Convolution
Attention mechanisms, and most prominently self-attention, are a powerful
building block for processing not only text but also images. These provide a
parameter efficient method for aggregating inputs. We focus on self-attention
in vision models and combine it with convolution; as far as we know, we are
the first to do so. What emerges is a convolution with data-dependent filters.
We call this an Affine Self Convolution. While this is applied differently at
each spatial location, we show that it is translation equivariant. We also
modify the Squeeze and Excitation variant of attention, extending both variants
of attention to the roto-translation group. We evaluate these new models on
CIFAR10 and CIFAR100, showing a reduction in the number of parameters while
reaching comparable or higher accuracy at test time than self-trained
baselines.
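A minimal sketch of the "convolution with data-dependent filters" idea: each location's filter is the static filter modulated by normalized scores computed from the input patch, and because every output location applies the same rule to its own patch, translation equivariance is preserved. This is an assumed toy formulation, not the paper's exact affine self convolution; `data_dependent_conv`, `q`, and `k` are illustrative names.

```python
import numpy as np

def data_dependent_conv(x, w, q, k):
    # Toy sketch: the effective filter at each location is the static
    # filter w, modulated by attention-like scores computed from the
    # input patch (q, k loosely play the role of query/key weights;
    # this exact form is an assumption, not the paper's formulation).
    h, w_ = w.shape
    out = np.zeros((x.shape[0] - h + 1, x.shape[1] - w_ + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + h, j:j + w_]
            center = patch[h // 2, w_ // 2]
            scores = np.exp(q * center * k * patch)   # data-dependent
            scores /= scores.sum()                    # normalized
            out[i, j] = ((w * scores) * patch).sum()  # modulated filter
    return out

rng = np.random.default_rng(2)
x = rng.standard_normal((8, 8))
w = rng.standard_normal((3, 3))
out = data_dependent_conv(x, w, 0.5, 0.5)
# Shifting the input shifts the output: the filter is recomputed by the
# same rule at every location, so the operation is translation
# equivariant even though the filter differs per location.
```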
Roto-Translation Equivariant Convolutional Networks: Application to Histopathology Image Analysis
Rotation-invariance is a desired property of machine-learning models for
medical image analysis and in particular for computational pathology
applications. We propose a framework to encode the geometric structure of the
special Euclidean motion group SE(2) in convolutional networks to yield
translation and rotation equivariance via the introduction of SE(2)-group
convolution layers. This structure enables models to learn feature
representations with a discretized orientation dimension that guarantees that
their outputs are invariant under a discrete set of rotations. Conventional
approaches for rotation invariance rely mostly on data augmentation, but this
does not guarantee the robustness of the output when the input is rotated.
Moreover, trained conventional CNNs may require test-time rotation augmentation to
reach their full capability. This study is focused on histopathology image
analysis applications for which it is desirable that the arbitrary global
orientation information of the imaged tissues is not captured by the machine
learning models. The proposed framework is evaluated on three different
histopathology image analysis tasks (mitosis detection, nuclei segmentation and
tumor classification). We present a comparative analysis for each problem and
show that a consistent increase in performance can be achieved when using the
proposed framework.
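Under the same restriction to exact 90-degree grid rotations, a group convolution layer from and to an SE(2)-image can be sketched as follows. For each output orientation, the kernel is spatially rotated and cyclically shifted along its own orientation axis; `corr2d` and `se2_group_corr` are assumed names, and per-channel details are omitted.

```python
import numpy as np

def corr2d(x, w):
    # Plain 'valid' cross-correlation on a 2D grid.
    h, w_ = w.shape
    out = np.zeros((x.shape[0] - h + 1, x.shape[1] - w_ + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (x[i:i + h, j:j + w_] * w).sum()
    return out

def se2_group_corr(f, psi, n=4):
    # SE(2) group-correlation sketch with n = 4 discrete orientations:
    # f has shape (n, H, W), psi has shape (n, h, w). Output orientation
    # k uses the kernel rotated spatially by k * 90 degrees and shifted
    # cyclically along the orientation axis.
    return np.stack([
        sum(corr2d(f[m], np.rot90(psi[(m - k) % n], k)) for m in range(n))
        for k in range(n)
    ])

rng = np.random.default_rng(3)
f = rng.standard_normal((4, 10, 10))
psi = rng.standard_normal((4, 3, 3))

out = se2_group_corr(f, psi)
# Roto-translating the input SE(2)-image (rotate each plane, cyclically
# shift the orientation axis) roto-translates the output the same way.
f_rot = np.stack([np.rot90(f[(m - 1) % 4]) for m in range(4)])
out_rot = se2_group_corr(f_rot, psi)
```

This equivariance of the intermediate layers is what lets a final pooling over the orientation axis guarantee invariance under the discrete set of rotations, rather than merely encouraging it through augmentation.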
H-NeXt: The next step towards roto-translation invariant networks
The widespread popularity of equivariant networks underscores the
significance of parameter efficient models and effective use of training data.
At a time when robustness to unseen deformations is becoming increasingly
important, we present H-NeXt, which bridges the gap between equivariance and
invariance. H-NeXt is a parameter-efficient roto-translation invariant network
that is trained without a single augmented image in the training set. Our
network comprises three components: an equivariant backbone for learning
roto-translation independent features, an invariant pooling layer for
discarding roto-translation information, and a classification layer. H-NeXt
outperforms the state of the art in classification on unaugmented training sets
and augmented test sets of MNIST and CIFAR-10. Comment: Appears in British Machine Vision Conference 2023 (BMVC 2023).