Deep Self-Taught Learning for Handwritten Character Recognition
Recent theoretical and empirical work in statistical machine learning has
demonstrated the importance of learning algorithms for deep architectures,
i.e., function classes obtained by composing multiple non-linear
transformations. Self-taught learning (exploiting unlabeled examples or
examples from other distributions) has already been applied to deep learners,
but mostly to show the advantage of unlabeled examples. Here we explore the
advantage brought by {\em out-of-distribution examples}. For this purpose we
developed a powerful generator of stochastic variations and noise processes for
character images, including not only affine transformations but also slant,
local elastic deformations, changes in thickness, background images, grey level
changes, contrast, occlusion, and various types of noise. The
out-of-distribution examples are obtained from these highly distorted images or
by including examples of object classes different from those in the target test
set. We show that {\em deep learners benefit more from out-of-distribution
examples than a corresponding shallow learner}, at least in the area of
handwritten character recognition. In fact, we show that they beat previously
published results and reach human-level performance on both handwritten digit
classification and 62-class handwritten character recognition.
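The stochastic distortion generator described above can be sketched in a few lines of NumPy. The transformations below (slant, grey-level/contrast change, additive noise) are a toy subset of the paper's pipeline, and the parameter ranges are illustrative assumptions, not the authors' settings:

```python
import numpy as np

def distort(img, rng):
    """Apply a random slant, grey-level/contrast change, and additive
    Gaussian noise to a character image in [0, 1] (toy sketch)."""
    h, w = img.shape
    # Slant: shift each row horizontally in proportion to its height.
    slant = rng.uniform(-0.3, 0.3)
    out = np.zeros_like(img)
    for y in range(h):
        shift = int(round(slant * (y - h / 2)))
        xs = np.clip(np.arange(w) - shift, 0, w - 1)
        out[y] = img[y, xs]
    # Random contrast scaling and grey-level offset.
    out = np.clip(out * rng.uniform(0.7, 1.3) + rng.uniform(-0.1, 0.1), 0.0, 1.0)
    # Additive pixel noise.
    out = np.clip(out + rng.normal(0.0, 0.05, out.shape), 0.0, 1.0)
    return out

rng = np.random.default_rng(0)
img = np.zeros((28, 28))
img[4:24, 13:15] = 1.0          # a crude vertical stroke ("1")
aug = distort(img, rng)
```

Sampling `distort` repeatedly on the same clean image yields an unbounded stream of out-of-distribution variants from one labelled example.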
"Mental Rotation" by Optimizing Transforming Distance
The human visual system is able to recognize objects despite transformations
that can drastically alter their appearance. To this end, much effort has been
devoted to the invariance properties of recognition systems. Invariance can be
engineered (e.g. convolutional nets), or learned from data explicitly (e.g.
temporal coherence) or implicitly (e.g. by data augmentation). One idea that
has not, to date, been explored is the integration of latent variables which
permit a search over a learned space of transformations. Motivated by evidence
that people mentally simulate transformations in space while comparing
examples, so-called "mental rotation", we propose a transforming distance.
Here, a trained relational model actively transforms pairs of examples so that
they are maximally similar in some feature space yet respect the learned
transformational constraints. We apply our method to nearest-neighbour problems
on the Toronto Face Database and NORB.
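A toy version of the transforming distance can be built by searching a small, fixed transformation set instead of a learned one. Here 90-degree rotations stand in for the learned transformation space, and raw pixels stand in for the feature space (an illustrative simplification, not the paper's relational model):

```python
import numpy as np

def transforming_distance(a, b):
    """Minimum Euclidean distance between a and any transformed copy of b,
    searching a discrete set of 90-degree rotations (toy transformation
    space; the paper learns a continuous one and optimizes over it)."""
    return min(np.linalg.norm(a - np.rot90(b, k)) for k in range(4))

def nn_classify(query, examples, labels):
    """1-nearest-neighbour under the transforming distance."""
    dists = [transforming_distance(query, e) for e in examples]
    return labels[int(np.argmin(dists))]

rng = np.random.default_rng(1)
train = [rng.random((8, 8)) for _ in range(3)]
labels = ["a", "b", "c"]
query = np.rot90(train[1], 2)   # class "b", seen under a 180-degree rotation
pred = nn_classify(query, train, labels)   # "b"
```

The transformation search makes the rotated query land at distance zero from its own class, where a plain Euclidean nearest neighbour could be fooled by the rotation.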
Further advantages of data augmentation on convolutional neural networks
Data augmentation is a popular technique largely used to enhance the training
of convolutional neural networks. Although many of its benefits are well known
by deep learning researchers and practitioners, its implicit regularization
effects, as compared to popular explicit regularization techniques, such as
weight decay and dropout, remain largely unstudied. As a matter of fact,
convolutional neural networks for image object classification are typically
trained with both data augmentation and explicit regularization, assuming the
benefits of all techniques are complementary. In this paper, we systematically
analyze these techniques through ablation studies of different network
architectures trained with different amounts of training data. Our results
unveil a largely ignored advantage of data augmentation: networks trained with
just data augmentation more easily adapt to different architectures and
amounts of training data, as opposed to weight decay and dropout, which require
specific fine-tuning of their hyperparameters.
Comment: Preprint of the manuscript accepted for presentation at the
International Conference on Artificial Neural Networks (ICANN) 2018. Best
Paper Award.
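The data augmentation studied here is the usual light, on-the-fly kind applied per training example. A minimal sketch (random horizontal flip plus padded random crop; the pad width is an assumed, typical value) looks like:

```python
import numpy as np

def augment(img, rng, pad=2):
    """Random horizontal flip followed by a padded random crop -- the
    lightweight on-the-fly augmentation typically paired with weight
    decay and dropout (illustrative sketch)."""
    if rng.random() < 0.5:
        img = img[:, ::-1]                       # horizontal flip
    h, w = img.shape[:2]
    pad_width = ((pad, pad), (pad, pad)) + ((0, 0),) * (img.ndim - 2)
    padded = np.pad(img, pad_width)              # zero-pad the borders
    y = int(rng.integers(0, 2 * pad + 1))        # random crop offset
    x = int(rng.integers(0, 2 * pad + 1))
    return padded[y:y + h, x:x + w]

rng = np.random.default_rng(0)
batch = [augment(np.ones((32, 32, 3)), rng) for _ in range(4)]
```

Because the same hyperparameter-free transform works regardless of network depth or dataset size, it adapts across architectures in a way per-model weight-decay and dropout settings do not.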
Data Augmentation in Training CNNs: Injecting Noise to Images
Noise injection is a fundamental tool for data augmentation, yet there is
no widely accepted procedure for incorporating it into learning frameworks. This
study analyzes the effects of adding or applying different noise models of
varying magnitudes to Convolutional Neural Network (CNN) architectures. Noise
models that are distributed with different density functions are given common
magnitude levels via Structural Similarity (SSIM) metric in order to create an
appropriate ground for comparison. The basic results conform with most of
the common notions in machine learning, and also introduce some novel
heuristics and recommendations on noise injection. The new approaches will
provide a better understanding of optimal learning procedures for image
classification.
Comment: 12 pages, 9 figures, 2 tables, old paper just submitted to arXiv.
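The SSIM-based magnitude calibration can be approximated by searching for the noise scale that achieves a target SSIM against the clean image. The sketch below uses a single-window (global) SSIM and Gaussian noise; both are simplifying assumptions relative to the paper's windowed SSIM and multiple noise models:

```python
import numpy as np

def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Single-window SSIM for images in [0, 1] (standard constants)."""
    mx, my = x.mean(), y.mean()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (x.var() + y.var() + c2))

def calibrate_gaussian_sigma(img, target_ssim, rng, iters=30):
    """Binary-search the Gaussian noise scale so the noisy image hits a
    target SSIM -- putting different noise models on a common magnitude
    scale, as the abstract describes (sketch only)."""
    noise = rng.normal(0.0, 1.0, img.shape)   # fixed pattern, scaled by sigma
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        noisy = np.clip(img + mid * noise, 0.0, 1.0)
        if ssim_global(img, noisy) > target_ssim:
            lo = mid            # too little noise: SSIM still above target
        else:
            hi = mid
    sigma = 0.5 * (lo + hi)
    return sigma, np.clip(img + sigma * noise, 0.0, 1.0)

rng = np.random.default_rng(0)
clean = rng.random((32, 32))
sigma, noisy = calibrate_gaussian_sigma(clean, target_ssim=0.8, rng=rng)
```

Repeating the calibration for each noise family (salt-and-pepper, speckle, etc.) yields magnitude levels that are comparable in perceptual terms rather than in raw variance.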
Pal-GAN: Palette-conditioned Generative Adversarial Networks
Recent advances in Generative Adversarial Networks (GANs) have shown great progress on a large variety of tasks. A common technique used to yield greater diversity of samples is conditioning on class labels. Conditioning on high-dimensional structured or unstructured information has also been shown to improve generation results, e.g. Image-to-Image translation. The conditioning information is provided in the form of human annotations, which can be expensive and difficult to obtain in cases where domain knowledge experts are needed. In this paper, we present an alternative: conditioning on low-dimensional structured information that can be automatically extracted from the input without the need for human annotators. Specifically, we propose a Palette-conditioned Generative Adversarial Network (Pal-GAN), an architecture-agnostic model that conditions on both a colour palette and a segmentation mask for high quality image synthesis. We show improvements on conditional consistency, intersection-over-union, and Fréchet inception distance scores. Additionally, we show that sampling colour palettes significantly changes the style of the generated images.
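The colour palette is exactly the kind of conditioning signal that can be extracted automatically. A minimal sketch using a tiny k-means over pixels (an assumed extraction procedure; the paper does not prescribe this particular implementation) might look like:

```python
import numpy as np

def extract_palette(img, k=4, iters=20, seed=0):
    """Extract a k-colour palette from an RGB image with a small k-means
    -- automatic, low-dimensional conditioning information that needs no
    human annotator (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    pixels = img.reshape(-1, 3).astype(float)
    # Initialize centres from randomly chosen pixels.
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)]
    for _ in range(iters):
        # Assign each pixel to its nearest palette colour.
        d = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        # Move each centre to the mean of its assigned pixels.
        for j in range(k):
            if (labels == j).any():
                centers[j] = pixels[labels == j].mean(axis=0)
    return centers

img = np.zeros((16, 16, 3))
img[:8] = [1.0, 0.0, 0.0]   # top half red
img[8:] = [0.0, 0.0, 1.0]   # bottom half blue
palette = extract_palette(img, k=2)
```

At generation time, swapping or resampling the rows of `palette` is what drives the style changes the abstract reports.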