20 research outputs found

From CNNs to Shift-Invariant Twin Models Based on Complex Wavelets

Authors: Alahari Karteek, Leterme Hubert, Perrier Valérie, Polisano Kévin
Publication date: 21/04/2023

We propose a novel antialiasing method to increase shift invariance and prediction accuracy in convolutional neural networks. Specifically, we replace the first-layer combination "real-valued convolutions + max pooling" ($\mathbb{R}$Max) by "complex-valued convolutions + modulus" ($\mathbb{C}$Mod), which is stable to translations. To justify our approach, we claim that $\mathbb{C}$Mod and $\mathbb{R}$Max produce comparable outputs when the convolution kernel is band-pass and oriented (Gabor-like filter). In this context, $\mathbb{C}$Mod can be considered as a stable alternative to $\mathbb{R}$Max. Thus, prior to antialiasing, we force the convolution kernels to adopt such a Gabor-like structure. The corresponding architecture is called mathematical twin, because it employs a well-defined mathematical operator to mimic the behavior of the original, freely-trained model. Our antialiasing approach achieves superior accuracy on ImageNet and CIFAR-10 classification tasks, compared to prior methods based on low-pass filtering. Arguably, our approach's emphasis on retaining high-frequency details contributes to a better balance between shift invariance and information preservation, resulting in improved performance. Furthermore, it has a lower computational cost and memory footprint than concurrent work, making it a promising solution for practical implementation.
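The contrast between the two first-layer operators can be illustrated with a small one-dimensional sketch. It is not the authors' implementation: the kernel shape, the stride of 2, the non-overlapping pooling window of 2, the choice to give the complex branch a stride of `stride * pool` so that both outputs have the same length, and all function names are assumptions made for illustration. The input is a pure sinusoid at the filter frequency, deliberately chosen as a worst case for the real-valued branch.

```python
import cmath
import math

def gabor_kernel(half_width, freq, sigma):
    """Complex Gabor-like kernel: Gaussian window times a complex exponential."""
    return [math.exp(-n * n / (2 * sigma ** 2)) * cmath.exp(1j * freq * n)
            for n in range(-half_width, half_width + 1)]

def conv_subsampled(signal, kernel, stride):
    """Circular convolution, keeping one output sample every `stride`."""
    size, half = len(signal), len(kernel) // 2
    return [sum(w * signal[(m - (k - half)) % size] for k, w in enumerate(kernel))
            for m in range(0, size, stride)]

def r_max(signal, kernel, stride, pool=2):
    """Real-valued convolution followed by non-overlapping max pooling."""
    # For a real input, convolving with Re(kernel) equals the real part
    # of the complex convolution.
    real = [z.real for z in conv_subsampled(signal, kernel, stride)]
    return [max(real[i:i + pool]) for i in range(0, len(real), pool)]

def c_mod(signal, kernel, stride, pool=2):
    """Complex convolution with stride `stride * pool`, followed by the modulus."""
    return [abs(z) for z in conv_subsampled(signal, kernel, stride * pool)]

def relative_change(a, b):
    """Largest absolute difference between a and b, relative to the peak of a."""
    return max(abs(u - v) for u, v in zip(a, b)) / max(abs(u) for u in a)

freq = math.pi / 2                                   # assumed filter frequency
kernel = gabor_kernel(half_width=12, freq=freq, sigma=3.0)
signal = [math.cos(freq * n) for n in range(64)]
shifted = signal[-1:] + signal[:-1]                  # circular shift by one sample

rmax_change = relative_change(r_max(signal, kernel, 2), r_max(shifted, kernel, 2))
cmod_change = relative_change(c_mod(signal, kernel, 2), c_mod(shifted, kernel, 2))
print(f"RMax relative change after a one-sample shift: {rmax_change:.3f}")
print(f"CMod relative change after a one-sample shift: {cmod_change:.3f}")
```

On this input the two operators agree before the shift, but the max-pooled real response collapses after a one-sample shift while the modulus stays essentially constant. Other frequencies and inputs give less extreme gaps.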

arXiv.org e-Print Archive

Ondelettes Complexes pour des Réseaux de Neurones Convolutifs Invariants par Translation

Author: Leterme Hubert
Publication venue: HAL CCSD
Publication date: 14/06/2023

Despite significant advancements in computer vision over the past decade, convolutional neural networks (CNNs) still suffer from a lack of mathematical understanding. In particular, stability properties with respect to small transformations such as translations, rotations, scaling or deformations are only partially understood. While there is a broad literature on this topic, some gaps remain, specifically with regard to the combined effect of convolution and max pooling layers in producing near shift-invariant feature representations. This property is of utmost importance for classification, since two shifted versions of a single input image are expected to receive the same label. It is well known that subsampled convolutions with band-pass filters are prone to producing unstable image representations when inputs are shifted by a few pixels. The first contribution of this thesis consists in proving that a nonlinear max pooling operator can partially restore shift invariance. By applying results from wavelet theory, and adopting a probabilistic point of view, we reveal a similarity between the max pooling of real-valued convolutions, as implemented in conventional architectures, and the modulus of complex-valued convolutions, for which a measure of shift invariance is established. However, for specific filter frequencies, this similarity is lost, and CNNs become unstable to translations. This phenomenon, known as aliasing, can be avoided by employing additional low-pass filters in strategic locations of the network architecture, as several authors have done in recent years. While their methods effectively increase both shift invariance and prediction accuracy, they come at the cost of a significant loss of high-frequency information. As a second contribution, we present a novel antialiasing method which, unlike previous methods, preserves this information. Relying on our theoretical study, the key idea is to exploit the properties of complex convolutions to guarantee near-shift invariance for any filter frequency. By adding an imaginary part to high-frequency kernels and replacing the max pooling layer with a simple modulus operator, we empirically demonstrate an increase in the network's stability and a lower error rate compared to previous approaches based on low-pass filtering. In conclusion, the aim of this thesis is twofold: improving the mathematical understanding of CNNs from the perspective of shift invariance, and improving the tradeoff between stability and information preservation, based on our theoretical contribution, which is grounded in wavelet theory. Our findings thus have the potential to positively impact various applications of computer vision, especially in fields that require theoretical guarantees.
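The loss of high-frequency information attributed to low-pass antialiasing can be made concrete with a short numerical sketch. The abstract does not say which filter the earlier methods use, so the 3-tap binomial kernel below is an assumption, as are the function names and the two test frequencies. The sketch only measures how much a pure cosine is attenuated by that filter.

```python
import math

# Assumed low-pass filter: a 3-tap binomial kernel. The abstract does not
# specify the filter used by prior antialiasing methods.
BINOMIAL = [0.25, 0.5, 0.25]

def circular_filter(signal, taps):
    """Circular convolution of `signal` with an odd-length filter `taps`."""
    size, half = len(signal), len(taps) // 2
    return [sum(t * signal[(m - (k - half)) % size] for k, t in enumerate(taps))
            for m in range(size)]

def gain(freq, size=64):
    """Amplitude ratio of a cosine at `freq` after and before low-pass filtering."""
    signal = [math.cos(freq * n) for n in range(size)]
    filtered = circular_filter(signal, BINOMIAL)
    return max(abs(v) for v in filtered) / max(abs(v) for v in signal)

low_gain = gain(math.pi / 8)        # slowly varying component
high_gain = gain(3 * math.pi / 4)   # fine detail, close to the Nyquist frequency
print(f"gain at pi/8:  {low_gain:.3f}")
print(f"gain at 3pi/4: {high_gain:.3f}")
```

For this particular filter the gain follows the closed form cos²(ω/2), so the slow component keeps about 96% of its amplitude while the fine-detail component keeps only about 15%. This is the kind of attenuation that a modulus-based approach is meant to avoid.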

Hal - Université Grenoble Alpes

A Complex Wavelet Approach for Shift-Invariant Convolutional Neural Networks

Author: Leterme Hubert
Publication date: 14/06/2023

Despite significant advancements in computer vision over the past decade, convolutional neural networks (CNNs) still suffer from a lack of mathematical understanding. In particular, stability properties with respect to small transformations such as translations, rotations, scaling or deformations are only partially understood. While there is a broad literature on this topic, some gaps remain, specifically with regard to the combined effect of convolution and max pooling layers in producing near shift-invariant feature representations. This property is of utmost importance for classification, since two shifted versions of a single input image are expected to receive the same label. It is well known that subsampled convolutions with band-pass filters are prone to producing unstable image representations when inputs are shifted by a few pixels. The first contribution of this thesis consists in proving that a nonlinear max pooling operator can partially restore shift invariance. By applying results from wavelet theory, and adopting a probabilistic point of view, we reveal a similarity between the max pooling of real-valued convolutions, as implemented in conventional architectures, and the modulus of complex-valued convolutions, for which a measure of shift invariance is established. However, for specific filter frequencies, this similarity is lost, and CNNs become unstable to translations. This phenomenon, known as aliasing, can be avoided by employing additional low-pass filters in strategic locations of the network architecture, as several authors have done in recent years. While their methods effectively increase both shift invariance and prediction accuracy, they come at the cost of a significant loss of high-frequency information. As a second contribution, we present a novel antialiasing method which, unlike previous methods, preserves this information. Relying on our theoretical study, the key idea is to exploit the properties of complex convolutions to guarantee near-shift invariance for any filter frequency. By adding an imaginary part to high-frequency kernels and replacing the max pooling layer with a simple modulus operator, we empirically demonstrate an increase in the network's stability and a lower error rate compared to previous approaches based on low-pass filtering. In conclusion, the aim of this thesis is twofold: improving the mathematical understanding of CNNs from the perspective of shift invariance, and improving the tradeoff between stability and information preservation, based on our theoretical contribution, which is grounded in wavelet theory. Our findings thus have the potential to positively impact various applications of computer vision, especially in fields that require theoretical guarantees.

Theses.fr

Ondelettes Complexes pour des Réseaux de Neurones Convolutifs Invariants par Translation

Author: Leterme Hubert
Publication venue: HAL CCSD
Publication date: 14/06/2023

INRIA a CCSD electronic archive server

Ondelettes Complexes pour des Réseaux de Neurones Convolutifs Invariants par Translation

Author: Leterme Hubert
Publication venue: HAL CCSD
Publication date: 14/06/2023

Thèses en Ligne

Modélisation Parcimonieuse de CNNs avec des Paquets d'Ondelettes Dual-Tree

Authors: Alahari Karteek, Leterme Hubert, Perrier Valérie, Polisano Kévin
Publication venue: HAL CCSD
Publication date: 13/09/2021

We propose to improve the mathematical interpretability of convolutional neural networks (CNNs) for image classification. To this end, we replace the first layers of existing models such as AlexNet or ResNet by an operator containing the dual-tree wavelet packet transform, i.e., a redundant decomposition using complex and oriented waveforms. Our experiments show that these modified networks behave very similarly to the original models once trained. The goal is then to study this operator from a theoretical point of view and to identify potential optimizations. We want to analyze its main properties, such as directional selectivity and stability with respect to small shifts and rotations, which retain discriminant information while decreasing intra-class variability. This work is a step toward a more complete description of CNNs using well-defined mathematical operators, characterized by a small number of arbitrary parameters, making them easier to interpret.

INRIA a CCSD electronic archive server

Modélisation Parcimonieuse de CNNs avec des Paquets d'Ondelettes Dual-Tree

Authors: Alahari Karteek, Leterme Hubert, Perrier Valérie, Polisano Kévin
Publication venue: HAL CCSD
Publication date: 13/09/2021

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL Descartes

On the Shift Invariance of Max Pooling Feature Maps in Convolutional Neural Networks

Authors: Alahari Karteek, Leterme Hubert, Perrier Valérie, Polisano Kévin
Publication venue: HAL CCSD
Publication date: 16/09/2022

This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. In this paper, we aim to improve the mathematical interpretability of convolutional neural networks for image classification. When trained on natural image datasets, such networks tend to learn parameters in the first layer that closely resemble oriented Gabor filters. By leveraging the properties of discrete Gabor-like convolutions, we prove that, under specific conditions, feature maps computed by the subsequent max pooling operator tend to approximate the modulus of complex Gabor-like coefficients, and as such, are stable with respect to certain input shifts. We then compute a probabilistic measure of shift invariance for these layers. More precisely, we show that some filters, depending on their frequency and orientation, are more likely than others to produce stable image representations. We experimentally validate our theory by considering a deterministic feature extractor based on the dual-tree wavelet packet transform, a particular case of discrete Gabor-like decomposition. We demonstrate a strong correlation between shift invariance on the one hand and similarity with complex modulus on the other hand.

arXiv.org e-Print Archive

HAL - Normandie Université

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

Hal-Diderot