FineMorphs: Affine-diffeomorphic sequences for regression
A multivariate regression model of affine and diffeomorphic transformation
sequences - FineMorphs - is presented. Leveraging concepts from shape analysis,
model states are optimally "reshaped" by diffeomorphisms generated by smooth
vector fields during learning. Affine transformations and vector fields are
optimized within an optimal control setting, and the model can naturally reduce
(or increase) dimensionality and adapt to large datasets via suboptimal vector
fields. An existence proof of solution and necessary conditions for optimality
for the model are derived. Experimental results on real datasets from the UCI
repository are presented, with favorable results in comparison with
state-of-the-art in the literature and densely-connected neural networks in
TensorFlow. Comment: 39 pages, 7 figures
Pitfalls of Conditional Batch Normalization for Contextual Multi-Modal Learning
Humans have perfected the art of learning from multiple modalities through
sensory organs. Despite their impressive predictive performance on a single
modality, neural networks cannot reach human level accuracy with respect to
multiple modalities. This is a particularly challenging task due to variations
in the structure of respective modalities. Conditional Batch Normalization
(CBN) is a popular method that was proposed to learn contextual features to aid
deep learning tasks. This technique uses auxiliary data to improve
representational power by learning affine transformations for convolutional
neural networks. Despite the boost in performance observed by using CBN layers,
our work reveals that the visual features learned by introducing auxiliary data
via CBN deteriorate. We perform comprehensive experiments to evaluate the
brittleness of CBN networks to various datasets, suggesting that learning from
visual features alone could often be superior for generalization. We evaluate
CBN models on natural images for bird classification and histology images for
cancer type classification. We observe that the CBN network learns close to no
visual features on the bird classification dataset and partial visual features
on the histology dataset. Our extensive experiments reveal that CBN may
encourage shortcut learning between the auxiliary data and labels. Comment: Accepted at ICBINB workshop @ NeurIPS 202
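As an illustration of what "learning affine transformations" from auxiliary data can look like, here is a minimal sketch of one common CBN formulation, in which the auxiliary (context) vector predicts per-channel offsets to the batch-norm scale and shift. The abstract does not give the exact formulation, so the function names, the linear conditioning maps `w_gamma` and `w_beta`, and the flat list-of-rows layout are all assumptions made for this example.

```python
import math

def batch_norm_stats(x):
    """Per-channel mean and (biased) variance over the batch; x is a list of rows."""
    n, c = len(x), len(x[0])
    mean = [sum(row[j] for row in x) / n for j in range(c)]
    var = [sum((row[j] - mean[j]) ** 2 for row in x) / n for j in range(c)]
    return mean, var

def linear(w, v):
    """Matrix-vector product; w has one row per output channel."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]

def conditional_batch_norm(x, ctx, gamma, beta, w_gamma, w_beta, eps=1e-5):
    """Hypothetical CBN layer: the context vector shifts gamma and beta per sample."""
    mean, var = batch_norm_stats(x)
    out = []
    for row, c_vec in zip(x, ctx):
        d_gamma = linear(w_gamma, c_vec)  # context-dependent change to the scale
        d_beta = linear(w_beta, c_vec)    # context-dependent change to the shift
        out.append([
            (gamma[j] + d_gamma[j]) * (row[j] - mean[j]) / math.sqrt(var[j] + eps)
            + beta[j] + d_beta[j]
            for j in range(len(row))
        ])
    return out
```

Note that the shift term `d_beta` depends only on the context, not on the input features. This gives the network a path from auxiliary data to the output that bypasses the visual features, which is one way to picture the shortcut learning the abstract warns about.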
The computational magic of the ventral stream
I argue that the sample complexity of (biological, feedforward) object recognition is mostly due to geometric image transformations and conjecture that a main goal of the ventral stream – V1, V2, V4 and IT – is to learn-and-discount image transformations.

In the first part of the paper I describe a class of simple and biologically plausible memory-based modules that learn transformations from unsupervised visual experience. The main theorems show that these modules provide (for every object) a signature which is invariant to local affine transformations and approximately invariant for other transformations. I also prove that, in a broad class of hierarchical architectures, signatures remain invariant from layer to layer. The identification of these memory-based modules with complex (and simple) cells in visual areas leads to a theory of invariant recognition for the ventral stream.
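The idea of an invariant signature can be illustrated with a toy sketch. It assumes 1-D "images", cyclic translations as the only transformation group, and max and mean as the pooling functions; none of these specifics come from the abstract. A module stores every shifted copy of a template, takes dot products with the input (a simple-cell-like step), and pools over them (a complex-cell-like step).

```python
def cyclic_shifts(v):
    """All cyclic translations of a 1-D template (the stored 'memory')."""
    return [v[k:] + v[:k] for k in range(len(v))]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def signature(image, templates):
    """Pool dot products over each template's orbit.

    Shifting the image only permutes the set of responses, so any
    pooling that ignores their order (max, mean, ...) is unchanged.
    """
    sig = []
    for t in templates:
        responses = [dot(image, g_t) for g_t in cyclic_shifts(t)]
        sig.append((max(responses), sum(responses) / len(responses)))
    return sig

image = [3, 1, 4, 1, 5, 9]
templates = [[1, 0, 0, 2, 0, 0], [0, 1, 1, 0, 0, 0]]
shifted = image[2:] + image[:2]  # translated copy of the same "object"
print(signature(image, templates) == signature(shifted, templates))
```

Integer inputs are used so that the comparison is exact; with floating-point inputs the two signatures would agree only up to rounding.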

In the second part, I outline a theory about hierarchical architectures that can learn invariance to transformations. I show that the memory complexity of learning affine transformations is drastically reduced in a hierarchical architecture that factorizes transformations in terms of the subgroup of translations and the subgroups of rotations and scalings. I then show how translations are automatically selected as the only learnable transformations during development by enforcing small apertures – e.g., small receptive fields – in the first layer.

In a third part I show that the transformations represented in each area can be optimized in terms of storage and robustness, as a consequence determining the tuning of the neurons in the area, rather independently (under normal conditions) of the statistics of natural images. I describe a model of learning that can be proved to have this property, linking in an elegant way the spectral properties of the signatures with the tuning of receptive fields in different areas. A surprising implication of these theoretical results is that the computational goals and some of the tuning properties of cells in the ventral stream may follow from symmetry properties (in the sense of physics) of the visual world through a process of unsupervised correlational learning, based on Hebbian synapses. In particular, simple and complex cells do not directly care about oriented bars: their tuning is a side effect of their role in translation invariance. Across the whole ventral stream the preferred features reported for neurons in different areas are only a symptom of the invariances computed and represented.

The results of each of the three parts stand on their own independently of each other. Together this theory-in-fieri makes several broad predictions, some of which are:

-invariance to small transformations in early areas (e.g., translations in V1) may underlie stability of visual perception (suggested by Stu Geman);

-each cell’s tuning properties are shaped by visual experience of image transformations during developmental and adult plasticity;

-simple cells are likely to be the same population as complex cells, arising from different convergence of the Hebbian learning rule. The inputs to complex “complex” cells are dendritic branches with simple cell properties;

-class-specific transformations are learned and represented at the top of the ventral stream hierarchy; thus class-specific modules such as faces, places and possibly body areas should exist in IT;

-the types of transformations that are learned from visual experience depend on the size of the receptive fields and thus on the area (layer in the models) – assuming that the size increases with layers;

-the mix of transformations learned in each area influences the tuning properties of the cells: oriented bars in V1+V2, radial and spiral patterns in V4, up to class-specific tuning in AIT (e.g., face-tuned cells);

-features must be discriminative and invariant: invariance to transformations is the primary determinant of the tuning of cortical neurons rather than statistics of natural images.

The theory is broadly consistent with the current version of HMAX. It explains it and extends it in terms of unsupervised learning, a broader class of transformation invariance and higher level modules. The goal of this paper is to sketch a comprehensive theory with little regard for mathematical niceties. If the theory turns out to be useful there will be scope for deep mathematics, ranging from group representation tools to wavelet theory to dynamics of learning.
FedDrive: Generalizing Federated Learning to Semantic Segmentation in Autonomous Driving
Semantic Segmentation is essential to make self-driving vehicles autonomous,
enabling them to understand their surroundings by assigning individual pixels
to known categories. However, it operates on sensitive data collected from the
users' cars; thus, protecting the clients' privacy becomes a primary concern.
For similar reasons, Federated Learning has been recently introduced as a new
machine learning paradigm aiming to learn a global model while preserving
privacy and leveraging data on millions of remote devices. Despite several
efforts on this topic, no work has explicitly addressed the challenges of
federated learning in semantic segmentation for driving so far. To fill this
gap, we propose FedDrive, a new benchmark consisting of three settings and two
datasets, incorporating the real-world challenges of statistical heterogeneity
and domain generalization. We benchmark state-of-the-art algorithms from the
federated learning literature through an in-depth analysis, combining them with
style transfer methods to improve their generalization ability. We demonstrate
that correctly handling normalization statistics is crucial to deal with the
aforementioned challenges. Furthermore, style transfer improves performance
when dealing with significant appearance shifts. Official website:
https://feddrive.github.io
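To make the point about normalization statistics concrete, here is a minimal sketch of size-weighted federated averaging in which selected parameters are excluded from aggregation and stay on each client. Keeping batch-norm statistics local is only one possible way of "handling" them and is an assumption of this example, not a description of the FedDrive benchmark. The function name, the `local_keys` argument, and the parameter names are hypothetical.

```python
def federated_average(client_states, client_sizes, local_keys=()):
    """Size-weighted average of client parameters.

    client_states: one dict per client, mapping parameter name -> list of floats.
    client_sizes:  number of training samples held by each client.
    local_keys:    parameter names that are NOT aggregated (e.g. normalization
                   statistics kept private to each client in this sketch).
    """
    total = sum(client_sizes)
    shared = {}
    for key in client_states[0]:
        if key in local_keys:
            continue  # leave this parameter on the clients
        length = len(client_states[0][key])
        shared[key] = [
            sum(state[key][i] * n for state, n in zip(client_states, client_sizes)) / total
            for i in range(length)
        ]
    return shared

clients = [
    {"conv.weight": [1.0, 2.0], "bn.running_mean": [0.0]},
    {"conv.weight": [3.0, 6.0], "bn.running_mean": [10.0]},
]
global_state = federated_average(clients, [1, 3], local_keys=("bn.running_mean",))
print(global_state)  # {'conv.weight': [2.5, 5.0]}
```

In this toy run the two clients have very different running means (0 and 10), as might happen under a strong appearance shift between driving domains. Averaging them would produce a statistic that matches neither client, which is why the example leaves that entry out of the global state.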