28 research outputs found
HARD: Hard Augmentations for Robust Distillation
Knowledge distillation (KD) is a simple and successful method to transfer
knowledge from a teacher to a student model solely based on functional
activity. However, current KD has a few shortcomings: it has recently been
shown that this method is unsuitable for transferring simple inductive biases like
shift equivariance, struggles to transfer out-of-domain generalization, and
requires orders of magnitude more optimization time than default non-KD model
training. To improve these aspects of KD, we propose Hard Augmentations for
Robust Distillation (HARD), a generally applicable data augmentation framework,
that generates synthetic data points for which the teacher and the student
disagree. We show in a simple toy example that our augmentation framework
solves the problem of transferring simple equivariances with KD. We then apply
our framework in real-world tasks for a variety of augmentation models, ranging
from simple spatial transformations to unconstrained image manipulations with a
pretrained variational autoencoder. We find that our learned augmentations
significantly improve KD performance on in-domain and out-of-domain evaluation.
Moreover, our method outperforms even state-of-the-art data augmentations, and
since the augmented training inputs can be visualized, they offer a qualitative
insight into the properties that are transferred from the teacher to the
student. Thus, HARD represents a generally applicable, dynamically optimized
data augmentation technique tailored to improve the generalization and
convergence speed of models trained with KD.
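
The core idea above, augmenting inputs so that teacher and student disagree, can be illustrated with a small sketch. This is not the authors' implementation: the radius-based `teacher`, the linear `student`, the rotation augmentation, and the grid search over angles are all hypothetical stand-ins chosen for brevity (the abstract describes learned augmentation models, not a grid search).

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) per sample, used here as the teacher-student disagreement.
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def rotate(x, angle):
    # Hypothetical augmentation: rotate 2D inputs by `angle` radians.
    c, s = np.cos(angle), np.sin(angle)
    return x @ np.array([[c, -s], [s, c]]).T

def teacher(x):
    # Hypothetical teacher: class probabilities depend only on the radius,
    # so its predictions are invariant to rotation.
    r = np.linalg.norm(x, axis=-1)
    return softmax(np.stack([r - 1.0, 1.0 - r], axis=-1))

def student(x, w):
    # Hypothetical student: a linear softmax classifier without that invariance.
    return softmax(x @ w)

def hardest_rotation(x, w, angles):
    # Pick the augmentation parameter with the largest mean disagreement.
    scores = [
        kl_divergence(teacher(rotate(x, a)), student(rotate(x, a), w)).mean()
        for a in angles
    ]
    best = int(np.argmax(scores))
    return angles[best], scores[best]

rng = np.random.default_rng(0)
x = rng.standard_normal((256, 2)) + np.array([2.0, 0.0])
w = rng.standard_normal((2, 2))
angles = np.linspace(0.0, 2 * np.pi, 36, endpoint=False)

best_angle, best_kl = hardest_rotation(x, w, angles)
baseline_kl = kl_divergence(teacher(x), student(x, w)).mean()
hard_inputs = rotate(x, best_angle)  # augmented points to distill on next
```

Because the angle 0 is part of the grid, the selected augmentation is never easier for the student than the unaugmented data; a distillation step would then train the student on `hard_inputs` and repeat the search.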
Platypose: Calibrated Zero-Shot Multi-Hypothesis 3D Human Motion Estimation
Single-camera 3D pose estimation is an ill-defined problem due to inherent
ambiguities from depth, occlusion, or keypoint noise. Multi-hypothesis pose
estimation accounts for this uncertainty by providing multiple 3D poses
consistent with the 2D measurements. Current research has predominantly
concentrated on generating multiple hypotheses for single-frame static pose
estimation. In this study we focus on the new task of multi-hypothesis motion
estimation. Motion estimation is not simply pose estimation applied to multiple
frames, which would ignore temporal correlation across frames. Instead, it
requires distributions which are capable of generating temporally consistent
samples, which is significantly more challenging. To this end, we introduce
Platypose, a framework that uses a diffusion model pretrained on 3D human
motion sequences for zero-shot 3D pose sequence estimation. Platypose
outperforms baseline methods on multi-hypothesis motion estimation.
Additionally, Platypose achieves state-of-the-art calibration and
competitive joint error when tested on static poses from Human3.6M,
MPI-INF-3DHP and 3DPW. Finally, because it is zero-shot, our method generalizes
flexibly to different settings such as multi-camera inference.
Natural Image Coding in V1: How Much Use is Orientation Selectivity?
Orientation selectivity is the most striking feature of simple cell coding in
V1, which has been shown to emerge from the reduction of higher-order
correlations in natural images in a large variety of statistical image models.
The most parsimonious one among these models is linear Independent Component
Analysis (ICA), whereas second-order decorrelation transformations such as
Principal Component Analysis (PCA) do not yield oriented filters. Because of
this finding it has been suggested that the emergence of orientation
selectivity may be explained by higher-order redundancy reduction. In order to
assess the tenability of this hypothesis, it is an important empirical question
how much more redundancy can be removed with ICA in comparison to PCA, or
other second-order decorrelation methods. This question has not yet been
settled, as over the last ten years contradicting results have been reported
ranging from less than five to more than a hundred percent extra gain for ICA.
Here, we aim at resolving this conflict by presenting a very careful and
comprehensive analysis using three evaluation criteria related to redundancy
reduction: In addition to the multi-information and the average log-loss we
compute, for the first time, complete rate-distortion curves for ICA in
comparison with PCA. Without exception, we find that the advantage of the ICA
filters is surprisingly small. Furthermore, we show that a simple spherically
symmetric distribution with only two parameters can fit the data even better
than the probabilistic model underlying ICA. Since spherically symmetric models
are agnostic with respect to the specific filter shapes, we conclude that
orientation selectivity is unlikely to play a critical role for redundancy
reduction.
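
The distinction between second-order and higher-order redundancy can be made concrete with a small sketch. It does not use natural image data or reproduce the paper's analysis: it draws samples from a hypothetical spherically symmetric, non-Gaussian source (a Gaussian with a shared random scale), applies PCA whitening, and checks that the components are uncorrelated yet still dependent through their energies.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100_000, 2

# Assumed toy source: Gaussian directions times a shared random "contrast".
# The result is spherically symmetric but not Gaussian.
g = rng.standard_normal((n, d))
scale = rng.gamma(shape=2.0, scale=1.0, size=(n, 1))
x = g * scale

# PCA whitening: rotate onto the principal axes and rescale to unit variance.
x = x - x.mean(axis=0)
cov = x.T @ x / n
evals, evecs = np.linalg.eigh(cov)
white = x @ evecs / np.sqrt(evals)

# Second-order dependence: correlation between the whitened components.
second_order = np.corrcoef(white.T)[0, 1]

# Higher-order dependence: correlation between their squared values.
higher_order = np.corrcoef((white ** 2).T)[0, 1]
```

Here `second_order` is zero up to numerical precision, while `higher_order` stays clearly positive. Since the source is spherically symmetric, rotating the whitened data (that is, choosing a different set of orthogonal filters) would not change this residual dependence, which is the sense in which such models are agnostic to filter shape.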
Orientation Selectivity and Contrast Gain Control in Representations of Natural Images
This thesis explores the role of orientation selectivity and contrast gain control with respect to Barlow's normative redundancy reduction hypothesis in simple models of the early visual system. Our general approach uses the fact that, under the goal of redundancy reduction, early vision models are density models on natural images. We identify and develop new classes of probabilistic models for natural image patches that contain these early vision models. We use those classes to quantitatively explore their parameter space around the early vision models statistically and information-theoretically with respect to the influence of filter shapes and contrast transforms on redundancy reduction. We identify an optimal contrast gain control transform and compare it to the standard model of cortical divisive contrast gain control, divisive normalization. We also identify a new estimation method for the true redundancy of natural images.
Our main findings are that, in contrast to divisive contrast gain control, orientation selectivity plays a minor role for redundancy reduction in the models investigated, and that the cortical model of divisive contrast normalization is not the optimal redundancy reducing contrast transformation on static image patches. However, we are able to specify a dynamical model of cortical contrast gain control with strong redundancy reduction, through extending the static model by adaptation to temporal correlations between consecutive contrasts caused by fixations under natural viewing conditions.
What is the limit of redundancy reduction with divisive normalization?
Divisive normalization has been proposed as a nonlinear redundancy reduction mechanism capturing contrast correlations. Its basic function is a radial rescaling of the population response. Because of the saturation of divisive normalization, however, it is impossible to achieve a fully independent representation. In this letter, we derive an analytical upper bound on the inevitable residual redundancy of any saturating radial rescaling mechanism.
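
To make "saturating radial rescaling" concrete, here is a minimal sketch. The functional form y = x / sqrt(sigma^2 + ||x||^2) and the parameter `sigma` are assumptions for illustration (one commonly used form of divisive normalization), not necessarily the form analyzed in the letter.

```python
import numpy as np

def divisive_normalization(x, sigma=1.0):
    """Radially rescale each response vector: y = x / sqrt(sigma^2 + ||x||^2)."""
    x = np.asarray(x, dtype=float)
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.sqrt(sigma ** 2 + norm ** 2)

rng = np.random.default_rng(0)

# Five population responses with contrasts spanning several orders of magnitude.
contrasts = np.array([[0.1], [1.0], [10.0], [100.0], [1000.0]])
responses = rng.standard_normal((5, 4)) * contrasts

rescaled = divisive_normalization(responses)
in_norm = np.linalg.norm(responses, axis=1)
out_norm = np.linalg.norm(rescaled, axis=1)
```

The direction of each response vector is unchanged; only its length is remapped, and `out_norm` stays below 1 no matter how large `in_norm` becomes. That bounded output range is the saturation referred to above.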