Deep Network Classification by Scattering and Homotopy Dictionary Learning
We introduce a sparse scattering deep convolutional neural network, which
provides a simple model to analyze properties of deep representation learning
for classification. Learning a single dictionary matrix with a classifier
yields a higher classification accuracy than AlexNet over the ImageNet 2012
dataset. The network first applies a scattering transform that linearizes
variabilities due to geometric transformations such as translations and small
deformations. A sparse l1 dictionary coding reduces intra-class
variability while preserving class separation through projections over unions
of linear spaces. It is implemented in a deep convolutional network with a
homotopy algorithm having exponential convergence. A convergence proof is
given in a general framework that includes ALISTA. Classification results are
analyzed on ImageNet.
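The sparse l1 dictionary coding at the heart of this architecture is typically computed by iterative soft-thresholding. As a minimal illustration (not the paper's homotopy or ALISTA variant, which anneal the threshold across layers), the following sketch implements plain ISTA for coding a signal in a fixed dictionary:

```python
import numpy as np

def soft_threshold(x, lam):
    # Proximal operator of the l1 norm.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def ista_sparse_code(D, y, lam, n_iter=500):
    """Sparse l1 coding of y in dictionary D via ISTA.

    Minimizes 0.5 * ||y - D z||^2 + lam * ||z||_1 by alternating a
    gradient step on the quadratic term with soft-thresholding.
    """
    L = np.linalg.norm(D, ord=2) ** 2  # Lipschitz constant of the gradient
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - y)
        z = soft_threshold(z - grad / L, lam / L)
    return z
```

The homotopy algorithm described in the abstract can be seen as running such iterations while progressively decreasing the threshold `lam`, which is what yields the exponential convergence analyzed in the paper.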
Separation and Concentration in Deep Networks
Numerical experiments demonstrate that deep neural network classifiers
progressively separate class distributions around their mean, achieving linear
separability on the training set, and increasing the Fisher discriminant ratio.
We explain this mechanism with two types of operators. We prove that a
rectifier without biases, applied to sign-invariant tight frames, can separate
class means and increase Fisher ratios. In contrast, a soft-thresholding on
tight frames can reduce within-class variabilities while preserving class
means. Variance reduction bounds are proved for Gaussian mixture models. For
image classification, we show that separation of class means can be achieved
with rectified wavelet tight frames that are not learned; this defines a
scattering transform. Learning convolutional tight frames along
scattering channels and applying a soft-thresholding reduces within-class
variabilities. The resulting scattering network reaches the classification
accuracy of ResNet-18 on CIFAR-10 and ImageNet, with fewer layers and no
learned biases.
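The variance-reduction effect of soft-thresholding can be seen numerically on a toy Gaussian mixture. The sketch below (a simplification: it thresholds in the standard basis, a trivial tight frame, rather than a learned one) builds two classes whose means are sparse and shows that soft-thresholding suppresses the noise in coordinates where both means vanish, increasing a trace-based Fisher discriminant ratio:

```python
import numpy as np

def soft_threshold(x, lam):
    # Proximal operator of the l1 norm, applied coordinate-wise.
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def fisher_ratio(X0, X1):
    """Trace of the between-class scatter over trace of the within-class scatter."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    m = 0.5 * (m0 + m1)
    between = 0.5 * (np.sum((m0 - m) ** 2) + np.sum((m1 - m) ** 2))
    within = 0.5 * (X0.var(axis=0).sum() + X1.var(axis=0).sum())
    return between / within

# Two Gaussian classes with sparse, disjoint means and dense noise.
rng = np.random.default_rng(0)
d, n, sigma = 100, 500, 0.5
m0, m1 = np.zeros(d), np.zeros(d)
m0[:5], m1[5:10] = 5.0, 5.0
X0 = m0 + sigma * rng.standard_normal((n, d))
X1 = m1 + sigma * rng.standard_normal((n, d))

ratio_before = fisher_ratio(X0, X1)
ratio_after = fisher_ratio(soft_threshold(X0, 1.5), soft_threshold(X1, 1.5))
```

Thresholding zeroes out the 90 pure-noise coordinates while only shifting the large mean coordinates, so the within-class variance collapses much faster than the mean separation, and `ratio_after` exceeds `ratio_before`.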
Efficient and Modular Implicit Differentiation
Automatic differentiation (autodiff) has revolutionized machine learning. It
allows expressing complex computations by composing elementary ones in creative
ways and removes the burden of computing their derivatives by hand. More
recently, differentiation of optimization problem solutions has attracted
widespread attention with applications such as optimization as a layer, and in
bi-level problems such as hyper-parameter optimization and meta-learning.
However, the formulas for these derivatives often involve case-by-case tedious
mathematical derivations. In this paper, we propose a unified, efficient and
modular approach for implicit differentiation of optimization problems. In our
approach, the user defines (in Python in the case of our implementation) a
function capturing the optimality conditions of the problem to be
differentiated. Once this is done, we leverage autodiff and implicit
differentiation to automatically differentiate the optimization problem. Our
approach thus combines the benefits of implicit differentiation and autodiff.
It is efficient, as it can be added on top of any state-of-the-art solver, and
modular, as the optimality condition specification is decoupled from the
implicit differentiation mechanism. We show that seemingly simple principles
allow us to recover many recently proposed implicit differentiation methods and
create new ones easily. We demonstrate the ease of formulating and solving
bi-level optimization problems using our framework. We also showcase an
application to the sensitivity analysis of molecular dynamics.
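To make the principle concrete, here is a minimal NumPy sketch (not the paper's implementation) of implicit differentiation for ridge regression: the user-specified optimality condition is F(x, theta) = A^T (A x - b) + theta * x = 0, and the implicit function theorem gives dx/dtheta = -(dF/dx)^{-1} dF/dtheta without differentiating through the solver:

```python
import numpy as np

def solve_ridge(A, b, theta):
    # x*(theta) = argmin_x 0.5 * ||A x - b||^2 + 0.5 * theta * ||x||^2
    d = A.shape[1]
    return np.linalg.solve(A.T @ A + theta * np.eye(d), A.T @ b)

def dx_dtheta(A, b, theta):
    """Derivative of the ridge solution with respect to theta.

    Optimality condition: F(x, theta) = A^T (A x - b) + theta * x = 0.
    Implicit function theorem: dx/dtheta = -(dF/dx)^{-1} dF/dtheta,
    with dF/dx = A^T A + theta * I and dF/dtheta = x*.
    """
    d = A.shape[1]
    x_star = solve_ridge(A, b, theta)
    return np.linalg.solve(A.T @ A + theta * np.eye(d), -x_star)
```

Note that only the optimality condition is differentiated; the solver itself (here a direct linear solve, but it could be any iterative method) is treated as a black box, which is the modularity the abstract describes.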
The Unreasonable Effectiveness of Patches in Deep Convolutional Kernels Methods
A recent line of work showed that various forms of convolutional kernel methods can be competitive with standard supervised deep convolutional networks on datasets like CIFAR-10, obtaining accuracies in the range of 87-90% while being more amenable to theoretical analysis. In this work, we highlight the importance of a data-dependent feature extraction step that is key to obtaining good performance in convolutional kernel methods. This step typically corresponds to a whitened dictionary of patches, and gives rise to a data-driven convolutional kernel method. We study its effect extensively, demonstrating that it is the key ingredient for the high performance of these methods. Specifically, we show that one of the simplest instances of such kernel methods, based on a single layer of image patches followed by a linear classifier, already obtains classification accuracies on CIFAR-10 in the same range as previous, more sophisticated convolutional kernel methods. We scale this method to the challenging ImageNet dataset, showing that such a simple approach can exceed all existing non-learned representation methods. This establishes a new baseline for object recognition without representation learning and initiates the investigation of convolutional kernel models on ImageNet. We conduct experiments to analyze the dictionaries we used; our ablations show that they exhibit low-dimensional properties.
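The data-dependent step highlighted above, building a whitened dictionary of patches, can be sketched as follows (a minimal illustration on grayscale arrays, not the paper's pipeline; patch extraction is done with plain loops and whitening with ZCA):

```python
import numpy as np

def extract_patches(images, size):
    """Extract all overlapping size x size patches from grayscale images.

    images: array of shape (n, H, W).
    Returns an array of shape (n_patches, size * size).
    """
    n, H, W = images.shape
    patches = []
    for img in images:
        for i in range(H - size + 1):
            for j in range(W - size + 1):
                patches.append(img[i:i + size, j:j + size].ravel())
    return np.array(patches)

def zca_whiten(patches, eps=1e-8):
    """ZCA-whiten patches so their empirical covariance is close to identity."""
    mean = patches.mean(axis=0)
    X = patches - mean
    cov = X.T @ X / len(X)
    eigvals, eigvecs = np.linalg.eigh(cov)
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return X @ W, mean, W
```

A dictionary of whitened patches such as this is what the abstract identifies as the key data-driven ingredient: once built, distances to these patches define the convolutional kernel features fed to the linear classifier.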