Search CORE

997 research outputs found

The Devil is in the Decoder: Classification, Regression and GANs

Author: Chen Liang-Chieh
Fathi Alireza
Ferrari Vittorio
Guadarrama Sergio
Silberman Nathan
Uijlings Jasper
Wojna Zbigniew
Publication venue
Publication date: 19/02/2019
Field of study

Many machine vision applications, such as semantic segmentation and depth prediction, require predictions for every pixel of the input image. Models for such problems usually consist of encoders which decrease spatial resolution while learning a high-dimensional representation, followed by decoders who recover the original input resolution and result in low-dimensional predictions. While encoders have been studied rigorously, relatively few studies address the decoder side. This paper presents an extensive comparison of a variety of decoders for a variety of pixel-wise tasks ranging from classification, regression to synthesis. Our contributions are: (1) Decoders matter: we observe significant variance in results between different types of decoders on various problems. (2) We introduce new residual-like connections for decoders. (3) We introduce a novel decoder: bilinear additive upsampling. (4) We explore prediction artifacts

arXiv.org e-Print Archive

UCL Discovery

Discriminative Block-Diagonal Representation Learning for Image Recognition

Author: Shao Ling
Xu Yong
Yang Jian
Zhang Zheng
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 12/07/2017
Field of study

Existing block-diagonal representation studies mainly focuses on casting block-diagonal regularization on training data, while only little attention is dedicated to concurrently learning both block-diagonal representations of training and test data. In this paper, we propose a discriminative block-diagonal low-rank representation (BDLRR) method for recognition. In particular, the elaborate BDLRR is formulated as a joint optimization problem of shrinking the unfavorable representation from off-block-diagonal elements and strengthening the compact block-diagonal representation under the semisupervised framework of LRR. To this end, we first impose penalty constraints on the negative representation to eliminate the correlation between different classes such that the incoherence criterion of the extra-class representation is boosted. Moreover, a constructed subspace model is developed to enhance the self-expressive power of training samples and further build the representation bridge between the training and test samples, such that the coherence of the learned intraclass representation is consistently heightened. Finally, the resulting optimization problem is solved elegantly by employing an alternative optimization strategy, and a simple recognition algorithm on the learned representation is utilized for final prediction. Extensive experimental results demonstrate that the proposed method achieves superb recognition results on four face image data sets, three character data sets, and the 15 scene multicategories data set. It not only shows superior potential on image recognition but also outperforms the state-of-the-art methods

arXiv.org e-Print Archive

Crossref

University of East Anglia digital repository

Multi Scale Identity-Preserving Image-to-Image Translation Network for Low-Resolution Face Recognition

Author: Bayat Nicky
Khazaie Vahid Reza
Mohsenzadeh Yalda
Publication venue
Publication date: 30/10/2021
Field of study

State-of-the-art deep neural network models have reached near perfect face recognition accuracy rates on controlled high-resolution face images. However, their performance is drastically degraded when they are tested with very low-resolution face images. This is particularly critical in surveillance systems, where a low-resolution probe image is to be matched with high-resolution gallery images. super-resolution techniques aim at producing high-resolution face images from low-resolution counterparts. While they are capable of reconstructing images that are visually appealing, the identity-related information is not preserved. Here, we propose an identity-preserving end-to-end image-to-image translation deep neural network which is capable of super-resolving very low-resolution faces to their high-resolution counterparts while preserving identity-related information. We achieved this by training a very deep convolutional encoder-decoder network with a symmetric contracting path between corresponding layers. This network was trained with a combination of a reconstruction and an identity-preserving loss, on multi-scale low-resolution conditions. Extensive quantitative evaluations of our proposed model demonstrated that it outperforms competing super-resolution and low-resolution face recognition methods on natural and artificial low-resolution face data sets and even unseen identities

arXiv.org e-Print Archive

Discriminative Elastic-Net Regularized Linear Regression

Author: Lai Zhihui
Shao Ling
Wu Jian
Xie Guo-Sen
Xu Yong
Zhang Zheng
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017
Field of study

In this paper, we aim at learning compact and discriminative linear regression models. Linear regression has been widely used in different problems. However, most of the existing linear regression methods exploit the conventional zeroone matrix as the regression targets, which greatly narrows the flexibility of the regression model. Another major limitation of theses methods is that the learned projection matrix fails to precisely project the image features to the target space due to their weak discriminative capability. To this end, we present an elastic-net regularized linear regression (ENLR) framework, and develop two robust linear regression models which possess the following special characteristics. First, our methods exploit two particular strategies to enlarge the margins of different classes by relaxing the strict binary targets into a more feasible variable matrix. Second, a robust elastic-net regularization of singular values is introduced to enhance the compactness and effectiveness of the learned projection matrix. Third, the resulting optimization problem of ENLR has a closed-form solution in each iteration, which can be solved efficiently. Finally, rather than directly exploiting the projection matrix for recognition, our methods employ the transformed features as the new discriminate representations to make final image classification. Compared with the traditional linear regression model and some of its variants, our method is much more accurate in image classification. Extensive experiments conducted on publicly available datasets well demonstrate that the proposed framework can outperform the state-of-the-art methods. The MATLAB codes of our methods can be available at http://www.yongxu.org/lunwen.html

The Hong Kong Polytechnic University Pao Yue-kong Library

Crossref

PolyU Institutional Repository

University of East Anglia digital repository