
    Deep Self-Taught Learning for Handwritten Character Recognition

    Recent theoretical and empirical work in statistical machine learning has demonstrated the importance of learning algorithms for deep architectures, i.e., function classes obtained by composing multiple non-linear transformations. Self-taught learning (exploiting unlabeled examples or examples from other distributions) has already been applied to deep learners, but mostly to show the advantage of unlabeled examples. Here we explore the advantage brought by out-of-distribution examples. For this purpose we developed a powerful generator of stochastic variations and noise processes for character images, including not only affine transformations but also slant, local elastic deformations, changes in thickness, background images, grey level changes, contrast, occlusion, and various types of noise. The out-of-distribution examples are obtained from these highly distorted images or by including examples of object classes different from those in the target test set. We show that deep learners benefit more from out-of-distribution examples than a corresponding shallow learner, at least in the area of handwritten character recognition. In fact, we show that they beat previously published results and reach human-level performance on both handwritten digit classification and 62-class handwritten character recognition.
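
    The abstract names the kinds of distortions but not how they are parameterized or combined. Below is a minimal NumPy sketch of such a stochastic pipeline, assuming a grayscale image with values in [0, 1] where strokes are bright. The function names, probabilities, and parameter ranges are illustrative assumptions, not the authors' generator. Local elastic deformations and background images are omitted.

```python
import numpy as np

def slant(img, shear):
    """Shear rows horizontally by `shear` pixels per row (nearest-neighbour sampling)."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Each output pixel reads the input at a row-dependent horizontal offset,
    # measured from the vertical centre so the glyph stays roughly in place.
    src_x = np.rint(xs - shear * (ys - h / 2)).astype(int)
    valid = (src_x >= 0) & (src_x < w)
    out = np.zeros_like(img)
    out[valid] = img[ys[valid], src_x[valid]]
    return out

def thickness(img, grow):
    """Thicken (3x3 max filter) or thin (3x3 min filter) bright strokes."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    windows = np.stack([padded[dy:dy + h, dx:dx + w]
                        for dy in range(3) for dx in range(3)])
    return windows.max(axis=0) if grow else windows.min(axis=0)

def contrast(img, factor):
    """Scale intensities around the image mean, then clip to [0, 1]."""
    m = img.mean()
    return np.clip(m + factor * (img - m), 0.0, 1.0)

def occlude(img, rng, size):
    """Set a randomly placed size x size square to 0 (background)."""
    h, w = img.shape
    y = rng.integers(0, h - size + 1)
    x = rng.integers(0, w - size + 1)
    out = img.copy()
    out[y:y + size, x:x + size] = 0.0
    return out

def distort(img, rng, noise_std=0.05):
    """Randomly apply each distortion (illustrative probabilities and ranges)."""
    out = img.astype(float)
    if rng.random() < 0.5:
        out = slant(out, rng.uniform(-0.3, 0.3))
    if rng.random() < 0.5:
        out = thickness(out, grow=rng.random() < 0.5)
    if rng.random() < 0.5:
        out = contrast(out, rng.uniform(0.5, 1.0))
    if rng.random() < 0.3:
        out = occlude(out, rng, size=img.shape[0] // 4)
    # Additive Gaussian noise is always applied; the result is clipped to [0, 1].
    out = out + rng.normal(0.0, noise_std, size=out.shape)
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
glyph = np.zeros((32, 32))
glyph[8:24, 14:18] = 1.0  # a crude vertical stroke standing in for a character
sample = distort(glyph, rng)
```

    Each call to `distort` draws a fresh combination of distortions, so repeated calls on one glyph yield many distinct out-of-distribution variants.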

    "Mental Rotation" by Optimizing Transforming Distance

    The human visual system is able to recognize objects despite transformations that can drastically alter their appearance. Accordingly, much effort has been devoted to the invariance properties of recognition systems. Invariance can be engineered (e.g. convolutional nets), or learned from data explicitly (e.g. temporal coherence) or implicitly (e.g. by data augmentation). One idea that has not, to date, been explored is the integration of latent variables which permit a search over a learned space of transformations. Motivated by evidence that people mentally simulate transformations in space while comparing examples, so-called "mental rotation", we propose a transforming distance. Here, a trained relational model actively transforms pairs of examples so that they are maximally similar in some feature space yet respect the learned transformational constraints. We apply our method to nearest-neighbour problems on the Toronto Face Database and NORB.
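
    The core idea of a transforming distance is a distance minimized over a space of transformations before two examples are compared. The sketch below illustrates only that idea. It swaps the paper's trained relational model and latent-variable search for an exhaustive search over a small, fixed set of transformations: the eight symmetries of a square image. That substitution, and every name below, is an assumption made for illustration.

```python
import numpy as np

# Hypothetical stand-in for a learned transformation space: the eight
# symmetries of a square image (four rotations, each with or without a mirror flip).
TRANSFORMS = [
    (lambda a, k=k, f=f: np.rot90(np.fliplr(a) if f else a, k))
    for k in range(4) for f in (False, True)
]

def transforming_distance(x, y, transforms=TRANSFORMS):
    """Smallest Euclidean distance between y and any transformed copy of x."""
    return min(float(np.linalg.norm(t(x) - y)) for t in transforms)

def euclidean(x, y):
    """Plain pixel-space distance, for comparison."""
    return float(np.linalg.norm(x - y))

def nearest_neighbour(query, train_x, train_labels, dist):
    """Label of the training example closest to `query` under `dist`."""
    scores = [dist(query, x) for x in train_x]
    return train_labels[int(np.argmin(scores))]

# Usage: a bar rotated by 90 degrees is matched to the upright bar only when
# the distance is allowed to undo the rotation.
bar = np.zeros((5, 5)); bar[0, :] = 1.0
dot = np.zeros((5, 5)); dot[2, 2] = 1.0
query = np.rot90(bar)
print(nearest_neighbour(query, [bar, dot], ["bar", "dot"], euclidean))              # dot
print(nearest_neighbour(query, [bar, dot], ["bar", "dot"], transforming_distance))  # bar
```

    With a learned, continuous transformation space, the `min` over a finite list would become an optimization over latent transformation variables.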

    Further advantages of data augmentation on convolutional neural networks

    Data augmentation is a popular technique widely used to enhance the training of convolutional neural networks. Although many of its benefits are well known to deep learning researchers and practitioners, its implicit regularization effects, as compared to popular explicit regularization techniques such as weight decay and dropout, remain largely unstudied. In fact, convolutional neural networks for image object classification are typically trained with both data augmentation and explicit regularization, assuming the benefits of all techniques are complementary. In this paper, we systematically analyze these techniques through ablation studies of different network architectures trained with different amounts of training data. Our results unveil a largely ignored advantage of data augmentation: networks trained with just data augmentation more easily adapt to different architectures and amounts of training data, as opposed to weight decay and dropout, which require specific fine-tuning of their hyperparameters. Comment: Preprint of the manuscript accepted for presentation at the International Conference on Artificial Neural Networks (ICANN) 2018. Best Paper Award.

    Data Augmentation in Training CNNs: Injecting Noise to Images

    Noise injection is a fundamental tool for data augmentation, and yet there is no widely accepted procedure to incorporate it with learning frameworks. This study analyzes the effects of adding or applying different noise models of varying magnitudes to Convolutional Neural Network (CNN) architectures. Noise models with different density functions are given common magnitude levels via the Structural Similarity (SSIM) metric in order to create an appropriate ground for comparison. The basic results conform with most common notions in machine learning, and also introduce some novel heuristics and recommendations on noise injection. The new approaches will provide a better understanding of optimal learning procedures for image classification. Comment: 12 pages, 9 figures, 2 tables, old paper just submitted to arXiv.
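
    The abstract says the noise models are placed on a common scale via SSIM but gives no procedure. One way to do that, sketched here under stated assumptions, is to draw the noise once and then bisect on its scale until the SSIM between the clean and noisy images reaches a target. For brevity the sketch uses a single-window (global) SSIM instead of the standard sliding-window version. The two noise models, the function names, and the bisection bracket are illustrative choices and do not reproduce the study's protocol.

```python
import numpy as np

def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM over the whole image (a simplification of the usual
    sliding-window SSIM, which averages this quantity over local patches)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx**2 + my**2 + c1) * (vx + vy + c2))

# Two example noise models: each maps (clean image, fixed unit noise, scale) -> noisy image.
NOISE_MODELS = {
    "gaussian": lambda x, n, s: np.clip(x + s * n, 0.0, 1.0),
    "speckle": lambda x, n, s: np.clip(x * (1.0 + s * n), 0.0, 1.0),
}

def calibrate_scale(x, model, target, rng, iters=50):
    """Bisection on the noise scale so that SSIM(x, noisy) is close to `target`."""
    n = rng.standard_normal(x.shape)  # drawn once, so SSIM depends only on the scale

    def ssim_at(s):
        return global_ssim(x, model(x, n, s))

    lo, hi = 0.0, 1.0  # SSIM is exactly 1 at scale 0
    while ssim_at(hi) > target and hi < 1e6:  # widen until SSIM drops to the target
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if ssim_at(mid) > target:
            lo = mid  # still too similar: more noise is needed
        else:
            hi = mid
    return 0.5 * (lo + hi)

rng = np.random.default_rng(0)
clean = rng.random((64, 64))
for name, model in NOISE_MODELS.items():
    s = calibrate_scale(clean, model, target=0.5, rng=np.random.default_rng(1))
    print(name, round(s, 3))
```

    Matching on SSIM rather than on the raw scale parameter lets additive and multiplicative noise, whose parameters mean different things, be compared at the same perceived degradation.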

    Pal-GAN: Palette-conditioned Generative Adversarial Networks

    Recent advances in Generative Adversarial Networks (GANs) have shown great progress on a large variety of tasks. A common technique used to yield greater diversity of samples is conditioning on class labels. Conditioning on high-dimensional structured or unstructured information has also been shown to improve generation results, e.g., in image-to-image translation. Such conditioning information is typically provided in the form of human annotations, which can be expensive and difficult to obtain in cases where domain experts are needed. In this paper, we present an alternative: conditioning on low-dimensional structured information that can be automatically extracted from the input without the need for human annotators. Specifically, we propose a Palette-conditioned Generative Adversarial Network (Pal-GAN), an architecture-agnostic model that conditions on both a colour palette and a segmentation mask for high-quality image synthesis. We show improvements on conditional consistency, intersection-over-union, and Fréchet inception distance scores. Additionally, we show that sampling colour palettes significantly changes the style of the generated images.
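
    The abstract states that the colour palette is extracted automatically from the input but does not say how. A common, simple choice, assumed here and not taken from the paper, is k-means clustering of pixel colours, with the resulting centres ordered by how many pixels they cover. The function name, `k`, and the iteration count are hypothetical.

```python
import numpy as np

def extract_palette(img, k=5, iters=20, seed=0):
    """Return up to k RGB colours (rows, sorted by pixel count) via plain k-means."""
    pixels = img.reshape(-1, 3).astype(float)
    rng = np.random.default_rng(seed)
    # Initialise from distinct colours so no two centres start identical.
    colours = np.unique(pixels, axis=0)
    k = min(k, len(colours))
    centres = colours[rng.choice(len(colours), size=k, replace=False)]
    for _ in range(iters):
        # Assign every pixel to its nearest centre (squared Euclidean distance in RGB).
        d = ((pixels[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = pixels[labels == j]
            if len(members):  # leave a centre unchanged if its cluster empties
                centres[j] = members.mean(axis=0)
    counts = np.bincount(labels, minlength=k)
    return centres[np.argsort(-counts)]

img = np.zeros((8, 8, 3))
img[:, :] = [255, 0, 0]   # mostly red
img[6:, :] = [0, 0, 255]  # bottom quarter blue
print(extract_palette(img, k=2))  # rows: red (48 px), then blue (16 px)
```

    Because the palette is computed from the image itself, it can serve as a low-dimensional conditioning signal without any human annotation.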