What Affects Learned Equivariance in Deep Image Recognition Models?
Equivariance w.r.t. geometric transformations in neural networks improves
data efficiency, parameter efficiency and robustness to out-of-domain
perspective shifts. When equivariance is not designed into a neural network,
the network can still learn equivariant functions from the data. We quantify
this learned equivariance by proposing an improved measure of equivariance.
We find evidence for a correlation between learned translation equivariance and
validation accuracy on ImageNet. We therefore investigate what can increase the
learned equivariance in neural networks, and find that data augmentation,
reduced model capacity and inductive bias in the form of convolutions induce
higher learned equivariance.
Comment: Accepted at CVPR workshop L3D-IVU 202
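As an illustration of how such a measure can be operationalized, the following is a minimal PyTorch sketch that compares the features of a translated image with the translated features of the original image. The function name and the cosine-similarity choice are ours for illustration, not the paper's improved measure:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def translation_equivariance_score(model, x, shift=(8, 8)):
    """Compare f(T(x)) with T(f(x)) for a cyclic translation T.
    A score near 1 indicates the model has learned translation
    equivariance at this layer. Hypothetical helper, not the
    paper's measure. Assumes `model` maps an image batch
    (B, C, H, W) to a spatial feature map with stride 1; for
    strided networks, the feature-space shift must be divided
    by the total stride.
    """
    dy, dx = shift
    # Features of the translated input: f(T(x))
    f_of_shifted = model(torch.roll(x, shifts=(dy, dx), dims=(2, 3)))
    # Translated features of the original input: T(f(x))
    shifted_f = torch.roll(model(x), shifts=(dy, dx), dims=(2, 3))
    # Cosine similarity between the two, averaged over the batch.
    sim = F.cosine_similarity(
        f_of_shifted.flatten(1), shifted_f.flatten(1), dim=1
    )
    return sim.mean().item()
```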
Color Equivariant Convolutional Networks
Color is a crucial visual cue readily exploited by Convolutional Neural
Networks (CNNs) for object recognition. However, CNNs struggle when the
training data is imbalanced across color variations introduced by accidental
recording conditions. Color invariance addresses this issue but does so at the cost of
removing all color information, which sacrifices discriminative power. In this
paper, we propose Color Equivariant Convolutions (CEConvs), a novel deep
learning building block that enables shape feature sharing across the color
spectrum while retaining important color information. We extend the notion of
equivariance from geometric to photometric transformations by incorporating
parameter sharing over hue-shifts in a neural network. We demonstrate the
benefits of CEConvs in terms of downstream performance on various tasks and
improved robustness to color changes, including train-test distribution shifts.
Our approach can be seamlessly integrated into existing architectures, such as
ResNets, and offers a promising solution for addressing color-based domain
shifts in CNNs.
Comment: NeurIPS 2023. Code available on https://github.com/Attila94/cecon
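The core parameter-sharing idea can be sketched in a few lines. The toy layer below shares a single filter bank across discrete hue rotations, approximated here by cyclic RGB channel permutations (120-degree hue shifts). This is a simplification for illustration: the actual CEConv layer uses finer hue rotations in RGB space, and the class name is ours, not the paper's:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HueEquivariantConv(nn.Module):
    """Toy color equivariant convolution: one filter bank is shared
    across discrete hue rotations, approximated here by cyclic
    permutations of the RGB channels. A simplified sketch of the
    parameter-sharing idea behind CEConvs, not the published layer.
    """

    def __init__(self, out_channels, kernel_size=3):
        super().__init__()
        self.pad = kernel_size // 2
        self.weight = nn.Parameter(
            torch.randn(out_channels, 3, kernel_size, kernel_size) * 0.1
        )

    def forward(self, x):  # x: (B, 3, H, W)
        responses = []
        for shift in range(3):
            # Permute RGB channels to emulate a 120-degree hue
            # rotation, then apply the *shared* filter bank.
            x_rot = torch.roll(x, shifts=shift, dims=1)
            responses.append(F.conv2d(x_rot, self.weight, padding=self.pad))
        # Stack over the hue axis: (B, 3, out_channels, H, W).
        return torch.stack(responses, dim=1)
```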
Benchmarking Data Efficiency and Computational Efficiency of Temporal Action Localization Models
In temporal action localization, given an input video, the goal is to predict
which actions it contains, where they begin, and where they end. Training and
testing current state-of-the-art deep learning models requires access to large
amounts of data and computational power. However, gathering such data is
challenging and computational resources might be limited. This work explores
and measures how current deep temporal action localization models perform in
settings constrained by the amount of data or computational power. We measure
data efficiency by training each model on subsets of the training set of
varying size. We find
that TemporalMaxer outperforms other models in data-limited settings.
Furthermore, we recommend TriDet when training time is limited. To test the
efficiency of the models during inference, we pass videos of different lengths
through each model. We find that TemporalMaxer requires the least computational
resources, likely due to its simple architecture.
Comment: Accepted to the CVEU workshop at ICCV 202
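The data-efficiency protocol itself is straightforward to sketch. In the helper below, the `train_fn` and `eval_fn` hooks are hypothetical stand-ins for a full temporal action localization training and evaluation pipeline (e.g., mAP over temporal IoU thresholds); the function trains the same model on nested subsets of the training videos and records a score per fraction:

```python
import random

def data_efficiency_benchmark(train_videos, fractions, train_fn, eval_fn, seed=0):
    """Train on nested subsets of the training set and record
    performance at each fraction. `train_fn` and `eval_fn` are
    hypothetical hooks, not part of any benchmarked codebase.
    """
    rng = random.Random(seed)
    shuffled = train_videos[:]
    rng.shuffle(shuffled)  # one shuffle so smaller subsets nest in larger ones
    results = {}
    for frac in sorted(fractions):
        subset = shuffled[: max(1, int(frac * len(shuffled)))]
        model = train_fn(subset)
        results[frac] = eval_fn(model)
    return results

# Example usage with hypothetical hooks:
# scores = data_efficiency_benchmark(videos, [0.1, 0.25, 0.5, 1.0], train, evaluate)
```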