5,026 research outputs found
The Devil is in the Decoder: Classification, Regression and GANs
Many machine vision applications, such as semantic segmentation and depth
prediction, require predictions for every pixel of the input image. Models for
such problems usually consist of encoders which decrease spatial resolution
while learning a high-dimensional representation, followed by decoders who
recover the original input resolution and result in low-dimensional
predictions. While encoders have been studied rigorously, relatively few
studies address the decoder side. This paper presents an extensive comparison
of a variety of decoders for a variety of pixel-wise tasks ranging from
classification, regression to synthesis. Our contributions are: (1) Decoders
matter: we observe significant variance in results between different types of
decoders on various problems. (2) We introduce new residual-like connections
for decoders. (3) We introduce a novel decoder: bilinear additive upsampling.
(4) We explore prediction artifacts
DeepSketch2Face: A Deep Learning Based Sketching System for 3D Face and Caricature Modeling
Face modeling has been paid much attention in the field of visual computing.
There exist many scenarios, including cartoon characters, avatars for social
media, 3D face caricatures as well as face-related art and design, where
low-cost interactive face modeling is a popular approach especially among
amateur users. In this paper, we propose a deep learning based sketching system
for 3D face and caricature modeling. This system has a labor-efficient
sketching interface, that allows the user to draw freehand imprecise yet
expressive 2D lines representing the contours of facial features. A novel CNN
based deep regression network is designed for inferring 3D face models from 2D
sketches. Our network fuses both CNN and shape based features of the input
sketch, and has two independent branches of fully connected layers generating
independent subsets of coefficients for a bilinear face representation. Our
system also supports gesture based interactions for users to further manipulate
initial face models. Both user studies and numerical results indicate that our
sketching system can help users create face models quickly and effectively. A
significantly expanded face database with diverse identities, expressions and
levels of exaggeration is constructed to promote further research and
evaluation of face modeling techniques.Comment: 12 pages, 16 figures, to appear in SIGGRAPH 201
Compact Bilinear Pooling
Bilinear models has been shown to achieve impressive performance on a wide
range of visual tasks, such as semantic segmentation, fine grained recognition
and face recognition. However, bilinear features are high dimensional,
typically on the order of hundreds of thousands to a few million, which makes
them impractical for subsequent analysis. We propose two compact bilinear
representations with the same discriminative power as the full bilinear
representation but with only a few thousand dimensions. Our compact
representations allow back-propagation of classification errors enabling an
end-to-end optimization of the visual recognition system. The compact bilinear
representations are derived through a novel kernelized analysis of bilinear
pooling which provide insights into the discriminative power of bilinear
pooling, and a platform for further research in compact pooling methods.
Experimentation illustrate the utility of the proposed representations for
image classification and few-shot learning across several datasets.Comment: Camera ready version for CVP
One-to-many face recognition with bilinear CNNs
The recent explosive growth in convolutional neural network (CNN) research
has produced a variety of new architectures for deep learning. One intriguing
new architecture is the bilinear CNN (B-CNN), which has shown dramatic
performance gains on certain fine-grained recognition problems [15]. We apply
this new CNN to the challenging new face recognition benchmark, the IARPA Janus
Benchmark A (IJB-A) [12]. It features faces from a large number of identities
in challenging real-world conditions. Because the face images were not
identified automatically using a computerized face detection system, it does
not have the bias inherent in such a database. We demonstrate the performance
of the B-CNN model beginning from an AlexNet-style network pre-trained on
ImageNet. We then show results for fine-tuning using a moderate-sized and
public external database, FaceScrub [17]. We also present results with
additional fine-tuning on the limited training data provided by the protocol.
In each case, the fine-tuned bilinear model shows substantial improvements over
the standard CNN. Finally, we demonstrate how a standard CNN pre-trained on a
large face database, the recently released VGG-Face model [20], can be
converted into a B-CNN without any additional feature training. This B-CNN
improves upon the CNN performance on the IJB-A benchmark, achieving 89.5%
rank-1 recall.Comment: Published version at WACV 201
Counting with Focus for Free
This paper aims to count arbitrary objects in images. The leading counting
approaches start from point annotations per object from which they construct
density maps. Then, their training objective transforms input images to density
maps through deep convolutional networks. We posit that the point annotations
serve more supervision purposes than just constructing density maps. We
introduce ways to repurpose the points for free. First, we propose supervised
focus from segmentation, where points are converted into binary maps. The
binary maps are combined with a network branch and accompanying loss function
to focus on areas of interest. Second, we propose supervised focus from global
density, where the ratio of point annotations to image pixels is used in
another branch to regularize the overall density estimation. To assist both the
density estimation and the focus from segmentation, we also introduce an
improved kernel size estimator for the point annotations. Experiments on six
datasets show that all our contributions reduce the counting error, regardless
of the base network, resulting in state-of-the-art accuracy using only a single
network. Finally, we are the first to count on WIDER FACE, allowing us to show
the benefits of our approach in handling varying object scales and crowding
levels. Code is available at
https://github.com/shizenglin/Counting-with-Focus-for-FreeComment: ICCV, 201
- …