GP-GAN: Gender Preserving GAN for Synthesizing Faces from Landmarks
Facial landmarks constitute the most compressed representation of faces and
are known to preserve information such as pose, gender and facial structure
present in the faces. Several works exist that attempt to perform high-level
face-related analysis tasks based on landmarks. In contrast, in this work, an
attempt is made to tackle the inverse problem of synthesizing faces from their
respective landmarks. The primary aim of this work is to demonstrate that
information preserved by landmarks (gender in particular) can be further
accentuated by leveraging generative models to synthesize corresponding faces.
Though the problem is particularly challenging due to its ill-posed nature, we
believe that successful synthesis will enable several applications such as
boosting performance of high-level face related tasks using landmark points and
performing dataset augmentation. To this end, a novel face-synthesis method
known as Gender Preserving Generative Adversarial Network (GP-GAN) that is
guided by adversarial loss, perceptual loss and a gender preserving loss is
presented. Further, we propose a novel generator sub-network UDeNet for GP-GAN
that leverages advantages of U-Net and DenseNet architectures. Extensive
experiments and comparison with recent methods are performed to verify the
effectiveness of the proposed method.
Comment: 6 pages, 5 figures; accepted at the 2018 24th International Conference on Pattern Recognition (ICPR 2018).
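The three guiding losses described above can be sketched as a single weighted objective. The function below is a minimal illustration in numpy; the loss weights, the L1 form of the perceptual term, and the classifier interface are assumptions for illustration, not values from the paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gpgan_generator_loss(d_fake_logits, feat_fake, feat_real,
                         gender_probs, gender_labels,
                         w_adv=1.0, w_perc=10.0, w_gender=1.0):
    """Combine adversarial, perceptual, and gender-preserving terms
    (weights are illustrative assumptions)."""
    # Adversarial term: the generator wants D(fake) to approach 1.
    p = sigmoid(d_fake_logits)
    l_adv = -np.mean(np.log(p + 1e-8))
    # Perceptual term: L1 distance between deep features of the
    # synthesized face and the ground-truth face.
    l_perc = np.mean(np.abs(feat_fake - feat_real))
    # Gender-preserving term: cross-entropy of a gender classifier
    # applied to the synthesized face, against the landmark's gender.
    idx = np.arange(len(gender_labels))
    l_gender = -np.mean(np.log(gender_probs[idx, gender_labels] + 1e-8))
    return w_adv * l_adv + w_perc * l_perc + w_gender * l_gender
```

In practice each term would be computed from network outputs (discriminator logits, feature-extractor activations, classifier probabilities); the sketch only shows how the three signals are combined.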
Representation Learning by Learning to Count
We introduce a novel method for representation learning that uses an
artificial supervision signal based on counting visual primitives. This
supervision signal is obtained from an equivariance relation, which does not
require any manual annotation. We relate transformations of images to
transformations of the representations. More specifically, we look for the
representation that satisfies such relation rather than the transformations
that match a given representation. In this paper, we use two image
transformations in the context of counting: scaling and tiling. The first
transformation exploits the fact that the number of visual primitives should be
invariant to scale. The second transformation allows us to equate the total
number of visual primitives in each tile to that in the whole image. These two
transformations are combined in one constraint and used to train a neural
network with a contrastive loss. The proposed task produces representations
that perform on par or exceed the state of the art in transfer learning
benchmarks.
Comment: ICCV 2017 (oral).
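The counting constraint above can be written as a simple loss: the count vector of the (downscaled) whole image should equal the sum of the count vectors of its tiles, while a contrastive term keeps the representation from collapsing to zero. The following sketch assumes count vectors are already computed by some network; the squared-distance form and margin value are illustrative:

```python
import numpy as np

def counting_loss(phi_whole, phi_tiles, phi_other, margin=1.0):
    """phi_whole: count vector of the downscaled whole image;
    phi_tiles: list of count vectors, one per tile;
    phi_other: count vector of an unrelated image (negative sample)."""
    tile_sum = np.sum(phi_tiles, axis=0)
    # Equivariance constraint: primitives counted in the whole image
    # should equal the total counted over its tiles.
    pos = np.sum((phi_whole - tile_sum) ** 2)
    # Contrastive term: push the tile sum away from an unrelated
    # image's counts, preventing the trivial all-zero solution.
    neg = max(0.0, margin - np.sum((phi_other - tile_sum) ** 2))
    return pos + neg
```

Note how an all-zero representation satisfies the equivariance term perfectly but pays the full margin penalty on the contrastive term, which is why the negative sample is needed.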
Semi-supervised Regression with Generative Adversarial Networks Using Minimal Labeled Data
This work studies the generalization of semi-supervised generative adversarial networks (GANs) to regression tasks. A novel feature layer contrasting optimization function, in conjunction with a feature matching optimization, allows the adversarial network to learn from unannotated data and thereby reduce the number of labels required to train a predictive network. An analysis of simulated training conditions is performed to explore the capabilities and limitations of the method. In concert with the semi-supervised regression GANs, an improved label topology and upsampling technique for multi-target regression tasks are shown to reduce data requirements. Improvements are demonstrated on a wide variety of vision tasks, including dense crowd counting, age estimation, and automotive steering angle prediction. With training data limitations arguably being the most restrictive component of deep learning, methods that reduce data requirements hold immense value. The methods proposed here are general-purpose and can be incorporated into existing network architectures with little or no modification to the existing structure.
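The feature matching term mentioned above is a standard semi-supervised GAN device: match the mean discriminator features of unlabeled real data and generated data. The exact form of the paper's feature layer contrasting function is not given in the abstract, so the hinge-style version below is purely an assumed stand-in to show the general shape:

```python
import numpy as np

def feature_matching_loss(feat_unlabeled, feat_fake):
    # Match mean discriminator features of unlabeled real samples
    # and generated samples, letting the network learn from
    # unannotated data.
    return np.sum((feat_unlabeled.mean(axis=0)
                   - feat_fake.mean(axis=0)) ** 2)

def feature_contrasting_loss(feat_a, feat_b, margin=1.0):
    # Assumed hinge form: keep two feature clusters at least
    # `margin` apart in squared distance (the paper's actual
    # contrasting function may differ).
    d = np.sum((feat_a.mean(axis=0) - feat_b.mean(axis=0)) ** 2)
    return max(0.0, margin - d)
```

Together, matching pulls generated features toward the unlabeled data distribution while contrasting keeps selected feature groups separated, which is what lets the discriminator's feature space carry regression-relevant structure with few labels.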
Dual Attention GANs for Semantic Image Synthesis
In this paper, we focus on the semantic image synthesis task that aims at
transferring semantic label maps to photo-realistic images. Existing methods
lack effective semantic constraints to preserve the semantic information and
ignore the structural correlations in both spatial and channel dimensions,
leading to unsatisfactory blurry and artifact-prone results. To address these
limitations, we propose a novel Dual Attention GAN (DAGAN) to synthesize
photo-realistic and semantically-consistent images with fine details from the
input layouts without imposing extra training overhead or modifying the network
architectures of existing methods. We also propose two novel modules, i.e.,
position-wise Spatial Attention Module (SAM) and scale-wise Channel Attention
Module (CAM), to capture semantic structure attention in spatial and channel
dimensions, respectively. Specifically, SAM selectively correlates the pixels
at each position by a spatial attention map, leading to pixels with the same
semantic label being related to each other regardless of their spatial
distances. Meanwhile, CAM selectively emphasizes the scale-wise features at
each channel by a channel attention map, which integrates associated features
among all channel maps regardless of their scales. We finally sum the outputs
of SAM and CAM to further improve feature representation. Extensive experiments
on four challenging datasets show that DAGAN achieves remarkably better results
than state-of-the-art methods, while using fewer model parameters. The source
code and trained models are available at https://github.com/Ha0Tang/DAGAN.
Comment: Accepted to ACM MM 2020, camera ready (9 pages) + supplementary (10 pages).
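The two attention modules described above reduce to self-affinity matrices applied over different axes of the feature map. The numpy sketch below illustrates the mechanism on a (C, N) feature map with N = H*W positions; it omits the learned projections a full implementation would include, so treat it as a structural sketch only:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def spatial_attention(x):
    """SAM-style: correlate every position with every other, so
    pixels with the same semantic label reinforce each other
    regardless of spatial distance. x: (C, N), N = H*W."""
    attn = softmax(x.T @ x, axis=-1)   # (N, N) position affinities
    return x @ attn.T                  # each position aggregates others

def channel_attention(x):
    """CAM-style: correlate channels, integrating associated
    features among all channel maps regardless of scale."""
    attn = softmax(x @ x.T, axis=-1)   # (C, C) channel affinities
    return attn @ x

def dual_attention(x):
    # As in the abstract, the two outputs are summed to further
    # improve the feature representation.
    return spatial_attention(x) + channel_attention(x)
```

Because both modules operate on features the backbone already produces, they can be dropped into an existing generator without changing its architecture, which is the "no extra training overhead" property the abstract emphasizes.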
An AI-Horticulture Monitoring and Prediction System with Automatic Object Counting
Estimating density maps and counting objects of interest in images has a wide range of applications, such as crowd counting, traffic monitoring, cell microscopy in biomedical imaging, plant counting in agronomy, and environmental surveys. Manual counting is a labor-intensive and time-consuming process. Over the past few years, automatic object counting has been actively evolving from classic machine learning methods based on handcrafted image features to end-to-end deep learning methods with data-driven feature engineering, for example using Convolutional Neural Networks (CNNs). In our research, we focus on counting plants in large-scale nursery farms to build an AI-horticulture monitoring and prediction system from unmanned aerial vehicle (UAV) images. Automatic object counting shares the common challenges of other computer vision tasks: scenario differences, object occlusion, scale variation across views, non-uniform distribution, and perspective differences. For an AI-horticulture monitoring and prediction system intended for large-scale analysis, plant species vary widely, so image features differ with the appearance of each species. To address these complex problems, we propose deep convolutional neural network-based approaches.
Our method uses density maps as the ground truth to train modified classic deep neural networks for object-counting regression. Experiments compare our proposed models with state-of-the-art object counting and density estimation approaches. The results demonstrate that our counting model outperforms these approaches, achieving the best counting performance with a mean absolute error of 1.93 and a mean squared error of 2.68 on our horticulture nursery plant dataset.
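Density-map counting rests on a simple identity: the predicted count is the integral (sum) of the predicted density map, and the reported metrics compare per-image counts. A minimal sketch of the count and the two evaluation metrics (MAE, and the root form commonly reported as "MSE" in counting papers):

```python
import numpy as np

def count_from_density(density_map):
    # The object count is the integral of the density map,
    # i.e. the sum of all pixel densities.
    return float(density_map.sum())

def counting_errors(pred_counts, true_counts):
    """Per-image count errors: mean absolute error and the
    root-mean-squared form commonly reported as MSE in the
    counting literature."""
    pred = np.asarray(pred_counts, dtype=float)
    true = np.asarray(true_counts, dtype=float)
    mae = np.mean(np.abs(pred - true))
    mse = np.sqrt(np.mean((pred - true) ** 2))
    return mae, mse
```

Ground-truth density maps are typically built by placing a normalized Gaussian kernel at each annotated object location, so each object contributes exactly 1 to the integral; that generation step is omitted here.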