100 research outputs found

    Deep neural networks for video classification in ecology

    Analyzing large volumes of video data is a challenging and time-consuming task. Automating this process would be very valuable, especially in ecological research, where massive amounts of video can unlock new avenues of research into the behaviour of animals in their environments. Deep Neural Networks, particularly Deep Convolutional Neural Networks (CNNs), are a powerful class of models for computer vision. When combined with Recurrent Neural Networks (RNNs), deep convolutional models can be applied to video for frame-level video classification. This research studies two datasets: penguins and seals. Its purpose is to compare the performance of image-only CNNs, which treat each frame of a video independently, against a combined CNN-RNN approach, and to assess whether incorporating the motion information in the temporal aspect of video improves classification accuracy on these two datasets. Video and image-only models offer similar out-of-sample performance on the simpler seals dataset, but the video model led to moderate performance improvements on the more complex penguin action-recognition dataset.
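The contrast between the two approaches can be sketched in a few lines: an image-only baseline classifies each frame independently and pools the results, while the CNN-RNN variant runs per-frame features through a recurrence so the final prediction can use motion across frames. This is an illustrative toy in numpy, not the thesis' models; all names, shapes, and random weights here are stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def frame_logits(features, W):
    """Image-only baseline: classify each frame independently,
    then average the per-frame logits over the whole clip."""
    return (features @ W).mean(axis=0)

def rnn_logits(features, W_in, W_rec, W_out):
    """CNN-RNN: feed per-frame CNN features through a simple Elman
    recurrence and classify from the final hidden state, so temporal
    order influences the prediction."""
    h = np.zeros(W_rec.shape[0])
    for x in features:                      # one step per video frame
        h = np.tanh(W_in @ x + W_rec @ h)
    return W_out @ h

# Toy stand-ins: 8 frames of 32-d "CNN features", 3 classes.
T, D, H, C = 8, 32, 16, 3
feats = rng.normal(size=(T, D))
logits_img = frame_logits(feats, rng.normal(size=(D, C)))
logits_seq = rnn_logits(feats,
                        rng.normal(size=(H, D)) * 0.1,
                        rng.normal(size=(H, H)) * 0.1,
                        rng.normal(size=(C, H)))
print(logits_img.shape, logits_seq.shape)  # (3,) (3,)
```

Note that reversing the frame order changes `rnn_logits` but not `frame_logits`, which is exactly the temporal information the comparison is probing.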

    Training Deeper Neural Machine Translation Models with Transparent Attention

    While current state-of-the-art NMT models, such as RNN seq2seq and Transformers, possess a large number of parameters, they are still shallow in comparison to the convolutional models used for both text and vision applications. In this work we attempt to train significantly (2-3x) deeper Transformer and Bi-RNN encoders for machine translation. We propose a simple modification to the attention mechanism that eases the optimization of deeper models, and results in consistent gains of 0.7-1.1 BLEU on the benchmark WMT'14 English-German and WMT'15 Czech-English tasks for both architectures. Comment: To appear in EMNLP 2018.
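The "transparent attention" modification described here lets the decoder attend to a learned softmax-weighted combination of all encoder layer outputs rather than only the top layer, giving gradients a direct path to every encoder layer. A minimal sketch of that combination, with illustrative names and toy shapes (not the paper's code):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def transparent_combination(layer_outputs, scores):
    """Combine all encoder layer outputs with learned weights.

    Instead of exposing only the top encoder layer, the decoder sees
    sum_i softmax(scores)_i * layer_i. `scores` stands in for the
    learned per-layer scalars (in the paper there is one such weight
    vector per decoder layer)."""
    w = softmax(scores)
    return sum(wi * layer for wi, layer in zip(w, layer_outputs))

# Toy example: a 6-layer encoder over 5 tokens with 8-d states.
rng = np.random.default_rng(1)
layers = [rng.normal(size=(5, 8)) for _ in range(6)]
scores = rng.normal(size=6)
combined = transparent_combination(layers, scores)
print(combined.shape)  # (5, 8)
```

If the learned scores become strongly peaked on one layer, the mechanism smoothly recovers the usual top-layer attention as a special case.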

    Sparse, hierarchical and shared-factors priors for representation learning

    Feature representation is a central concern of today's machine learning systems. A proper representation can facilitate a complex learning task. This is the case when, for instance, the representation has low dimensionality and consists of high-level features. But how can we determine whether a representation is adequate for a learning task? Recent work suggests that it is better to see the choice of representation as a learning problem in itself. This is called representation learning.
    This thesis presents a series of contributions aimed at improving the quality of learned representations. The first contribution elaborates a comparative study of Sparse Dictionary Learning (SDL) approaches on the problem of grasp detection (for robotic grasping) and provides an empirical analysis of their advantages and disadvantages. The second contribution proposes a Convolutional Neural Network (CNN) architecture for grasp detection and compares it to the SDL approaches. Then, the third contribution elaborates a new parametric activation function and validates it experimentally. Finally, the fourth contribution details a new soft parameter-sharing mechanism for multi-task learning.
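Soft parameter sharing, the subject of the fourth contribution, keeps separate weights per task but penalises them for drifting apart, in contrast to hard sharing where tasks reuse one weight matrix. A generic sketch of the idea, assuming a squared-distance-to-the-mean penalty (this is a common formulation, not necessarily the thesis' exact mechanism):

```python
import numpy as np

def soft_sharing_penalty(weight_list, strength=1e-2):
    """Penalise per-task weight matrices for straying from their mean.

    Each task keeps its own weights (unlike hard sharing), but the
    regulariser pulls related tasks toward each other, so they stay
    close without being forced to be identical."""
    mean_w = sum(weight_list) / len(weight_list)
    return strength * sum(((w - mean_w) ** 2).sum() for w in weight_list)

rng = np.random.default_rng(2)
w_task_a = rng.normal(size=(4, 4))
w_task_b = rng.normal(size=(4, 4))

penalty = soft_sharing_penalty([w_task_a, w_task_b])
print(penalty >= 0.0)  # True: the penalty is a non-negative scalar

identical = soft_sharing_penalty([w_task_a, w_task_a])
print(np.isclose(identical, 0.0))  # True: identical weights cost nothing
```

During training this scalar is simply added to the sum of the per-task losses, so `strength` trades off task specialisation against sharing.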

    Application of capsule networks for image classification on complex datasets

    Capsule Networks (CapsNets), introduced in 2017 by Sabour, Frosst, and Hinton, have sparked great interest in the computer vision and deep learning community and offer a paradigm shift in neural computation. In CapsNet, Sabour et al. replace classical notions of scalar neural computation with a vectorised approach. This allows CapsNet to describe input images not only by the presence of constituent features but also by the pose of detected features, thus imparting viewpoint and pose invariance. Hinton's group and the research community at large have applied CapsNets to a number of specific problems and achieved state-of-the-art performance. In contrast, this thesis studies CapsNet by applying it to complex real-world datasets like CIFAR10 and CIFAR100, where CapsNet's performance is still unproven. We investigate the operational characteristics of CapsNet on the CIFAR10 problem and identify several practical limitations of capsules that inhibit their performance in an industrial setting. The contribution of this research is the introduction of residual blocks of primary capsule layers. We developed a novel architecture for CIFAR10 classification, called ResCapsNet, and find that the model increases validation accuracy to 78.54% from the 71.04% achieved by the baseline CapsNet, at the marginal cost of increasing the number of parameters from 22 million to 25 million. In addition, to extend the generalization of capsules into deeper networks, we discuss the application of capsules as hidden layers in CIFAR100 classification and show that capsules are largely ineffective in a latent unsupervised setting. For active supervision of hidden capsules, we propose methods to train hidden capsules as super-class detectors prior to final classification.
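The vectorised computation mentioned above hinges on CapsNet's "squashing" non-linearity: each capsule outputs a vector whose direction encodes pose and whose length, squashed into [0, 1), is read as the probability that the detected entity is present. A minimal numpy sketch of that function (the toy capsule values below are illustrative):

```python
import numpy as np

def squash(v, eps=1e-8):
    """CapsNet squashing non-linearity.

    Scales each capsule vector by ||v||^2 / (1 + ||v||^2) along its own
    direction, so short vectors shrink toward zero length and long
    vectors approach (but never reach) unit length."""
    sq_norm = (v ** 2).sum(axis=-1, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * v / np.sqrt(sq_norm + eps)

caps = np.array([[0.1, 0.0],    # a weakly activated capsule
                 [10.0, 0.0]])  # a strongly activated capsule
out = squash(caps)
lengths = np.linalg.norm(out, axis=-1)
print(lengths)  # weak capsule -> near 0, strong capsule -> near 1
```

Because the length is bounded below 1, it can be fed directly into CapsNet's margin loss as an existence probability, while the preserved direction is what routing-by-agreement compares between layers.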