Multiview Self-Supervised Learning (MSSL) is based on learning invariances
with respect to a set of input transformations. However, invariance partially
or totally removes transformation-related information from the representations,
which might harm performance for specific downstream tasks that require such
information. We propose 2D strUctured and EquivarianT representations (coined
DUET), which are 2d representations organized in a matrix structure, and
equivariant with respect to transformations acting on the input data. DUET
representations maintain information about an input transformation, while
remaining semantically expressive. Compared to SimCLR (Chen et al., 2020)
(unstructured and invariant) and ESSL (Dangovski et al., 2022) (unstructured
and equivariant), the structured and equivariant nature of DUET representations
enables controlled generation with lower reconstruction error, while
controllability is not possible with SimCLR or ESSL. DUET also achieves higher
accuracy for several discriminative tasks, and improves transfer learning.Comment: Accepted at ICML 202