
    Spatial Shortcut Network for Human Pose Estimation

    Like many computer vision problems, human pose estimation is challenging in that recognizing a body part requires not only information from the local area but also from areas at a large spatial distance. To pass information spatially, large convolutional kernels and deep layers are commonly used, introducing high computation cost and a large parameter space. Fortunately for pose estimation, the human body is geometrically structured in images, enabling the modeling of spatial dependency. In this paper, we propose a spatial shortcut network for the pose estimation task, in which information flows more easily across spatial locations. We evaluate our model with detailed analyses and present its outstanding performance with a smaller structure.
    Comment: 12 pages

    CU-Net: Coupled U-Nets

    We design a new connectivity pattern for the U-Net architecture. Given several stacked U-Nets, we couple each U-Net pair through connections between their semantic blocks, resulting in coupled U-Nets (CU-Net). The coupling connections let information flow more efficiently across U-Nets, and the feature reuse across U-Nets makes each U-Net highly parameter-efficient. We evaluate the coupled U-Nets on two benchmark datasets for human pose estimation, comparing both accuracy and parameter counts. The CU-Net achieves accuracy comparable to state-of-the-art methods while using at least 60% fewer parameters than other approaches.
    Comment: BMVC 2018 (Oral)
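
    The coupling idea can be illustrated with a rough PyTorch sketch (not the authors' code): each semantic block of a later U-Net also consumes the same-level features produced by the earlier U-Nets. The block design and concatenation scheme below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CoupledBlock(nn.Module):
    """One semantic block that also consumes same-level features
    from all preceding U-Nets (passed in `prior_feats`)."""
    def __init__(self, channels, n_prior):
        super().__init__()
        # Input channels grow with the number of coupled prior U-Nets.
        self.conv = nn.Conv2d(channels * (n_prior + 1), channels, 3, padding=1)

    def forward(self, x, prior_feats):
        # Coupling connection: concatenate this U-Net's input with the
        # stored features of earlier U-Nets at the same semantic level.
        return torch.relu(self.conv(torch.cat([x] + prior_feats, dim=1)))
```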

    NDDR-CNN: Layerwise Feature Fusing in Multi-Task CNNs by Neural Discriminative Dimensionality Reduction

    In this paper, we propose a novel Convolutional Neural Network (CNN) structure for general-purpose multi-task learning (MTL), which enables automatic feature fusion at every layer across different tasks. This is in contrast with the most widely used MTL CNN structures, which empirically or heuristically share features at some specific layers (e.g., share all the features except those of the last convolutional layer). The proposed layerwise feature fusing scheme is formulated by combining existing CNN components in a novel way, with a clear mathematical interpretation as discriminative dimensionality reduction, which we refer to as Neural Discriminative Dimensionality Reduction (NDDR). Specifically, we first concatenate features with the same spatial resolution from different tasks along their channel dimension. Then, we show that the discriminative dimensionality reduction can be fulfilled by 1x1 convolution, batch normalization, and weight decay within one CNN. The use of existing CNN components ensures end-to-end training and the extensibility of the proposed NDDR layer to various state-of-the-art CNN architectures in a "plug-and-play" manner. A detailed ablation analysis shows that the proposed NDDR layer is easy to train and robust to different hyperparameters. Experiments on different task sets with various base network architectures demonstrate the promising performance and desirable generalizability of our proposed method. The code of our paper is available at https://github.com/ethanygao/NDDR-CNN.
    Comment: 11 pages, 3 figures, 9 tables
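
    Based on the description above, an NDDR-style fusion layer for two tasks can be sketched in PyTorch as follows (an illustration, not the authors' released code); the weight decay mentioned in the abstract would be applied to the 1x1 convolutions through the optimizer.

```python
import torch
import torch.nn as nn

class NDDRLayer(nn.Module):
    def __init__(self, channels_a, channels_b):
        super().__init__()
        total = channels_a + channels_b
        # One 1x1 conv + batch norm per task projects the concatenated
        # features back to each task's original channel count.
        self.reduce_a = nn.Sequential(
            nn.Conv2d(total, channels_a, 1), nn.BatchNorm2d(channels_a))
        self.reduce_b = nn.Sequential(
            nn.Conv2d(total, channels_b, 1), nn.BatchNorm2d(channels_b))

    def forward(self, feat_a, feat_b):
        # Concatenate same-resolution features from the two tasks along
        # the channel dimension, then reduce dimensionality per task.
        fused = torch.cat([feat_a, feat_b], dim=1)
        return self.reduce_a(fused), self.reduce_b(fused)
```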

    Human Pose Regression by Combining Indirect Part Detection and Contextual Information

    In this paper, we propose an end-to-end trainable regression approach for human pose estimation from still images. We use the proposed Soft-argmax function to convert feature maps directly into joint coordinates, resulting in a fully differentiable framework. Our method is able to learn heat map representations indirectly, without the additional step of artificial ground truth generation. Consequently, contextual information can be included in the pose predictions in a seamless way. We evaluated our method on two very challenging datasets, the Leeds Sports Pose (LSP) and MPII Human Pose datasets, reaching the best performance among all existing regression methods and results comparable to the state-of-the-art detection-based approaches.
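
    As a rough illustration of the Soft-argmax readout described above, the following PyTorch sketch computes expected joint coordinates from heat maps; the normalized [0, 1] coordinate convention is an assumption, not necessarily the paper's.

```python
import torch

def soft_argmax(heatmaps):
    """heatmaps: (batch, joints, H, W) -> coordinates (batch, joints, 2)."""
    b, j, h, w = heatmaps.shape
    # Softmax over all spatial positions turns each map into a probability
    # distribution, keeping the whole readout fully differentiable.
    probs = torch.softmax(heatmaps.reshape(b, j, -1), dim=-1).reshape(b, j, h, w)
    ys = torch.linspace(0, 1, h, device=heatmaps.device)
    xs = torch.linspace(0, 1, w, device=heatmaps.device)
    # Expected coordinates: grid positions weighted by probability mass.
    x = (probs.sum(dim=2) * xs).sum(dim=-1)  # marginal over rows, then E[x]
    y = (probs.sum(dim=3) * ys).sum(dim=-1)  # marginal over cols, then E[y]
    return torch.stack([x, y], dim=-1)
```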

    Smart Device based Initial Movement Detection of Cyclists using Convolutional Neuronal Networks

    For future traffic scenarios, we envision interconnected traffic participants who exchange information about their current state, e.g., position, and their predicted intentions, allowing them to act in a cooperative manner. Vulnerable road users (VRUs), e.g., pedestrians and cyclists, will be equipped with smart devices that can be used to detect their intentions and transmit the detected intentions to approaching cars so that their drivers can be warned. In this article, we focus on detecting the initial movement of cyclists using smart devices. Smart devices provide the necessary sensors, namely an accelerometer and a gyroscope, and are therefore an excellent instrument for quickly detecting movement transitions (e.g., from waiting to moving). Convolutional Neural Networks have proven to be the state-of-the-art solution for many problems with an ever increasing range of applications. We therefore model initial movement detection as a classification problem. In terms of Organic Computing (OC), it can be seen as a step towards self-awareness and self-adaptation. We apply residual network architectures to the task of detecting the initial starting movement of cyclists.
    Comment: 12 pages, accepted for publication at OC-DDC 2018, Würzburg, Germany
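
    As a rough illustration of the setup described above, a small residual network can classify fixed-length windows of accelerometer and gyroscope readings; all shapes and sizes below are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class ResBlock1d(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv1d(ch, ch, 3, padding=1), nn.BatchNorm1d(ch), nn.ReLU(),
            nn.Conv1d(ch, ch, 3, padding=1), nn.BatchNorm1d(ch))

    def forward(self, x):
        # Residual connection around the convolutional body.
        return torch.relu(x + self.body(x))

# 6 input channels: 3-axis accelerometer + 3-axis gyroscope time series.
model = nn.Sequential(
    nn.Conv1d(6, 32, 3, padding=1), nn.ReLU(),
    ResBlock1d(32), ResBlock1d(32),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    nn.Linear(32, 2))  # two classes: waiting vs. moving
```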

    Dynamic Convolutions: Exploiting Spatial Sparsity for Faster Inference

    Modern convolutional neural networks apply the same operations to every pixel in an image. However, not all image regions are equally important. To address this inefficiency, we propose a method to dynamically apply convolutions conditioned on the input image. We introduce a residual block in which a small gating branch learns which spatial positions should be evaluated. These discrete gating decisions are trained end-to-end using the Gumbel-Softmax trick, in combination with a sparsity criterion. Our experiments on CIFAR, ImageNet and MPII show that our method focuses better on the region of interest and achieves better accuracy than existing methods, at a lower computational complexity. Moreover, we provide an efficient CUDA implementation of our dynamic convolutions using a gather-scatter approach, achieving a significant improvement in inference speed with MobileNetV2 residual blocks. On human pose estimation, a task that is inherently spatially sparse, the processing speed is increased by 60% with no loss in accuracy.
    Comment: CVPR 2020 (poster) https://github.com/thomasverelst/dyncon
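
    A rough PyTorch sketch of the gating mechanism described above; the block layout, gate design, and temperature are illustrative assumptions, and the paper's gather-scatter CUDA kernel is not reproduced (the mask here simply zeroes out skipped positions, so it shows the semantics rather than the speedup).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatiallyGatedBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        # Gating branch: per-pixel logits for [skip, execute].
        self.gate = nn.Conv2d(channels, 2, 1)

    def forward(self, x):
        logits = self.gate(x)  # (B, 2, H, W)
        # Hard but differentiable decisions via straight-through
        # Gumbel-Softmax; keep the "execute" channel as a binary mask.
        mask = F.gumbel_softmax(logits, tau=1.0, hard=True, dim=1)[:, 1:2]
        # Residual update applied only at selected spatial positions.
        return x + mask * F.relu(self.conv(x))
```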

    Generate What You Can't See - a View-dependent Image Generation

    In order to operate autonomously, a robot should explore the environment and build a model of each of the surrounding objects. A common approach is to carefully scan the whole workspace, which is time-consuming, and it is often impossible to reach all the viewpoints required to acquire full knowledge about the environment. Humans can perform shape completion of occluded objects by relying on past experience. We therefore propose a method that generates images of an object from various viewpoints using a single input RGB image. A deep neural network is trained to imagine the object's appearance from many viewpoints. We present the whole pipeline, which takes a single RGB image as input and returns a sequence of RGB and depth images of the object. The method utilizes a CNN-based object detector to extract the object from the natural scene; the proposed network then generates a set of RGB and depth images. We show results both on a synthetic dataset and on real images.
    Comment: Submitted to IROS 2019. Copyright 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses. Supplementary video: https://youtu.be/gCAoJ7BM5F

    Residual Codean Autoencoder for Facial Attribute Analysis

    Facial attributes can provide rich ancillary information which can be utilized for different applications such as targeted marketing, human-computer interaction, and law enforcement. This research focuses on facial attribute prediction using a novel deep learning formulation, termed the R-Codean autoencoder. The paper first presents a Cosine similarity based loss function for an autoencoder, which is then incorporated into the Euclidean distance based autoencoder to formulate R-Codean. The proposed loss function thus aims to incorporate both the magnitude and the direction of image vectors during feature learning. Further, inspired by the utility of shortcut connections in deep models to facilitate learning of optimal parameters without incurring the vanishing gradient problem, the proposed formulation is extended to incorporate shortcut connections in the architecture. The proposed R-Codean autoencoder is utilized in a facial attribute prediction framework which incorporates a patch-based weighting mechanism for assigning higher weights to relevant patches for each attribute. Experimental results on the publicly available CelebA and LFWA datasets demonstrate the efficacy of the proposed approach in addressing this challenging problem.
    Comment: Accepted in Pattern Recognition Letters
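
    A minimal sketch of a loss in the spirit described above, combining a Euclidean (magnitude) term with a cosine-similarity (direction) term; the exact combination and the weighting factor `lam` are illustrative assumptions, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def codean_style_loss(x, x_hat, lam=0.5):
    """Reconstruction loss over input x and reconstruction x_hat."""
    euclidean = F.mse_loss(x_hat, x)  # magnitude term
    # Direction term: 1 - cosine similarity between flattened image vectors.
    cosine = 1.0 - F.cosine_similarity(
        x_hat.flatten(1), x.flatten(1), dim=1).mean()
    return euclidean + lam * cosine
```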

    A Survey of the Recent Architectures of Deep Convolutional Neural Networks

    Deep Convolutional Neural Network (CNN) is a special type of Neural Network which has shown exemplary performance in several competitions related to Computer Vision and Image Processing. Some of the exciting application areas of CNNs include Image Classification and Segmentation, Object Detection, Video Processing, Natural Language Processing, and Speech Recognition. The powerful learning ability of deep CNNs is primarily due to the use of multiple feature extraction stages that can automatically learn representations from the data. The availability of large amounts of data and improvements in hardware technology have accelerated research in CNNs, and recently interesting deep CNN architectures have been reported. Several inspiring ideas for advancing CNNs have been explored, such as the use of different activation and loss functions, parameter optimization, regularization, and architectural innovations. However, the most significant improvement in the representational capacity of deep CNNs has been achieved through architectural innovations. Notably, the ideas of exploiting spatial and channel information, the depth and width of the architecture, and multi-path information processing have gained substantial attention. Similarly, the idea of using a block of layers as a structural unit is also gaining popularity. This survey thus focuses on the intrinsic taxonomy present in recently reported deep CNN architectures and, consequently, classifies the recent innovations in CNN architectures into seven different categories, based on spatial exploitation, depth, multi-path, width, feature-map exploitation, channel boosting, and attention. Additionally, an elementary understanding of CNN components, current challenges, and applications of CNNs is also provided.
    Comment: Number of Pages: 70, Number of Figures: 11, Number of Tables: 11. Artif Intell Rev (2020)

    Dynamic Filter Networks

    In a traditional convolutional layer, the learned filters stay fixed after training. In contrast, we introduce a new framework, the Dynamic Filter Network, where filters are generated dynamically, conditioned on an input. We show that this architecture is a powerful one, with increased flexibility thanks to its adaptive nature, yet without an excessive increase in the number of model parameters. A wide variety of filtering operations can be learned this way, including local spatial transformations, but also others like selective (de)blurring or adaptive feature extraction. Moreover, multiple such layers can be combined, e.g., in a recurrent architecture. We demonstrate the effectiveness of the dynamic filter network on the tasks of video and stereo prediction, and reach state-of-the-art performance on the moving MNIST dataset with a much smaller model. By visualizing the learned filters, we illustrate that the network has picked up flow information by only looking at unlabelled training data. This suggests that the network can be used to pretrain networks for various supervised tasks in an unsupervised way, like optical flow and depth estimation.
    Comment: Submitted to NIPS 2016; X. Jia and B. De Brabandere contributed equally to this work and are listed in alphabetical order
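
    A toy PyTorch sketch of the idea: a filter-generating network maps the input to filter weights, which are then applied to that same input. The generator architecture, filter size, and softmax normalization are illustrative assumptions (the paper also describes position-specific dynamic local filtering, which this sketch omits).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicFilterLayer(nn.Module):
    def __init__(self, channels=1, k=5):
        super().__init__()
        self.k = k
        # Filter-generating network: maps the input image to k*k weights.
        self.generator = nn.Sequential(
            nn.AdaptiveAvgPool2d(8), nn.Flatten(),
            nn.Linear(channels * 64, k * k))

    def forward(self, x):
        b, c, h, w = x.shape
        # Generate one k x k filter per sample, normalized to sum to 1.
        f = torch.softmax(self.generator(x), dim=-1)
        f = f.reshape(b, 1, 1, self.k, self.k)
        # Apply each sample's filter to its own image via a grouped conv.
        weight = f.expand(b, c, 1, self.k, self.k).reshape(b * c, 1, self.k, self.k)
        out = F.conv2d(x.reshape(1, b * c, h, w), weight,
                       padding=self.k // 2, groups=b * c)
        return out.reshape(b, c, h, w)
```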