24 research outputs found

    HARD: Hard Augmentations for Robust Distillation

    Knowledge distillation (KD) is a simple and successful method to transfer knowledge from a teacher to a student model solely based on functional activity. However, current KD has a few shortcomings: it has recently been shown that this method is unsuitable for transferring simple inductive biases like shift equivariance, that it struggles to transfer out-of-domain generalization, and that its optimization time is orders of magnitude longer than default non-KD model training. To improve these aspects of KD, we propose Hard Augmentations for Robust Distillation (HARD), a generally applicable data augmentation framework that generates synthetic data points for which the teacher and the student disagree. We show in a simple toy example that our augmentation framework solves the problem of transferring simple equivariances with KD. We then apply our framework in real-world tasks for a variety of augmentation models, ranging from simple spatial transformations to unconstrained image manipulations with a pretrained variational autoencoder. We find that our learned augmentations significantly improve KD performance on in-domain and out-of-domain evaluation. Moreover, our method outperforms even state-of-the-art data augmentations, and since the augmented training inputs can be visualized, they offer a qualitative insight into the properties that are transferred from the teacher to the student. Thus HARD represents a generally applicable, dynamically optimized data augmentation technique tailored to improve the generalization and convergence speed of models trained with KD.
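The idea of generating inputs on which teacher and student disagree can be illustrated with a minimal sketch. Everything named here is an assumption for illustration and not taken from the paper: the toy `teacher` and `student` functions, the KL-based `disagreement` measure, and the brute-force search over circular shifts, which stands in for the learned augmentation model the abstract describes.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp((z - m) / temperature) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

def shift(x, k):
    """Circularly shift the list x to the right by k positions."""
    k %= len(x)
    return x[-k:] + x[:-k] if k else list(x)

def teacher(x):
    # Hypothetical shift-invariant teacher: logits depend only on the sum.
    s = sum(x)
    return [s, -s]

def student(x):
    # Hypothetical student that has NOT learned shift invariance:
    # its logits depend on where the signal sits in the input.
    weights = [1.0, 0.5, 0.0, -0.5]
    s = sum(w * v for w, v in zip(weights, x))
    return [s, -s]

def disagreement(x, temperature=1.0):
    """Distillation-style loss: KL between teacher and student predictions."""
    return kl_divergence(softmax(teacher(x), temperature),
                         softmax(student(x), temperature))

def hardest_shift(x, shifts, temperature=1.0):
    """Pick the candidate shift on which teacher and student disagree most."""
    return max(shifts, key=lambda k: disagreement(shift(x, k), temperature))

x = [1.0, 0.0, 0.0, 0.0]
k = hardest_shift(x, range(4))
print(k, disagreement(shift(x, k)))
```

In this toy setup the student matches the teacher on the unshifted input, so training on unshifted data alone would give it no signal; the selected shift exposes the input on which the two models differ most.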

    Natural Image Coding in V1: How Much Use is Orientation Selectivity?

    Orientation selectivity is the most striking feature of simple cell coding in V1, which has been shown to emerge from the reduction of higher-order correlations in natural images in a large variety of statistical image models. The most parsimonious one among these models is linear Independent Component Analysis (ICA), whereas second-order decorrelation transformations such as Principal Component Analysis (PCA) do not yield oriented filters. Because of this finding, it has been suggested that the emergence of orientation selectivity may be explained by higher-order redundancy reduction. In order to assess the tenability of this hypothesis, it is an important empirical question how much more redundancy can be removed with ICA in comparison to PCA or other second-order decorrelation methods. This question has not yet been settled, as over the last ten years contradictory results have been reported, ranging from less than five to more than a hundred percent extra gain for ICA. Here, we aim at resolving this conflict by presenting a very careful and comprehensive analysis using three evaluation criteria related to redundancy reduction: in addition to the multi-information and the average log-loss, we compute, for the first time, complete rate-distortion curves for ICA in comparison with PCA. Without exception, we find that the advantage of the ICA filters is surprisingly small. Furthermore, we show that a simple spherically symmetric distribution with only two parameters can fit the data even better than the probabilistic model underlying ICA. Since spherically symmetric models are agnostic with respect to the specific filter shapes, we conclude that orientation selectivity is unlikely to play a critical role for redundancy reduction.
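For readers unfamiliar with the multi-information criterion mentioned above, the standard definition can be written out as follows. The notation ($X$ for the image patch, $W$ for an invertible linear filter matrix, $h$ for differential entropy) is an assumption chosen here for illustration and is not taken from the paper.

```latex
% Multi-information of the filter responses Y = W X:
I[Y] = \sum_{i=1}^{n} h[Y_i] - h[Y]

% For an invertible linear transform, the joint entropy changes only by the log-determinant:
h[Y] = h[X] + \log\lvert\det W\rvert

% Hence, for two transforms with equal |det W| (e.g. whitening followed by a rotation),
% the difference in redundancy reduces to a difference of marginal entropies:
I[W_{\mathrm{PCA}} X] - I[W_{\mathrm{ICA}} X]
  = \sum_{i=1}^{n} h\big[(W_{\mathrm{PCA}} X)_i\big] - \sum_{i=1}^{n} h\big[(W_{\mathrm{ICA}} X)_i\big]
```

Under these assumptions, comparing ICA with PCA amounts to comparing sums of marginal entropies of the filter responses, since the joint entropy term is shared.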

    Orientierungsselektivität und Kontrastverstärkungsregelung in Repräsentationen von natürlichen Bildern (Orientation Selectivity and Contrast Gain Control in Representations of Natural Images)

    This thesis explores the role of orientation selectivity and contrast gain control with respect to Barlow's normative redundancy reduction hypothesis in simple models of the early visual system. Our general approach uses the fact that, under the goal of redundancy reduction, early vision models are density models on natural images. We identify and develop new classes of probabilistic models for natural image patches that contain these early vision models. We use those classes to quantitatively explore the parameter space around the early vision models, statistically and information-theoretically, with respect to the influence of filter shapes and contrast transforms on redundancy reduction. We identify an optimal contrast gain control transform and compare it to the standard model of cortical contrast gain control, divisive normalization. We also identify a new estimation method for the true redundancy of natural images. Our main findings are that, in contrast to divisive contrast gain control, orientation selectivity plays a minor role for redundancy reduction in the models investigated, and that the cortical model of divisive contrast normalization is not the optimal redundancy-reducing contrast transformation on static image patches. However, we are able to specify a dynamical model of cortical contrast gain control with strong redundancy reduction by extending the static model with adaptation to temporal correlations between consecutive contrasts caused by fixations under natural viewing conditions.

    What is the limit of redundancy reduction with divisive normalization?

    Divisive normalization has been proposed as a nonlinear redundancy reduction mechanism capturing contrast correlations. Its basic function is a radial rescaling of the population response. Because of the saturation of divisive normalization, however, it is impossible to achieve a fully independent representation. In this letter, we derive an analytical upper bound on the inevitable residual redundancy of any saturating radial rescaling mechanism.
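The notions of radial rescaling and saturation can be made concrete with a small sketch. The specific functional form used here, y = x / sqrt(sigma^2 + ||x||^2), is one common textbook variant and is an assumption; the letter's exact parameterization is not given in the abstract, and the names `divisive_normalization` and `sigma` are hypothetical.

```python
import math

def norm(x):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(v * v for v in x))

def divisive_normalization(x, sigma=1.0):
    """Radially rescale the population response x.

    Assumed form: y = x / sqrt(sigma^2 + ||x||^2). Every component is
    divided by the same factor, so the direction of x is preserved and
    only its radius changes.
    """
    scale = 1.0 / math.sqrt(sigma ** 2 + norm(x) ** 2)
    return [v * scale for v in x]

weak = divisive_normalization([3.0, 4.0])
strong = divisive_normalization([300.0, 400.0])

# The output radius approaches but never reaches 1: the mechanism saturates.
print(norm(weak), norm(strong))
```

Because the output radius is confined to the interval [0, 1), large differences in input contrast are compressed into small differences in output radius, which is the saturation the abstract refers to.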