MeshAdv: Adversarial Meshes for Visual Recognition
Highly expressive models such as deep neural networks (DNNs) have been widely
applied across many domains. However, recent studies show that DNNs are
vulnerable to adversarial examples: carefully crafted inputs that aim to
mislead a model's predictions. To date, most of these studies have focused on
perturbations added to image pixels, a manipulation that is not physically
realizable. Some works try to overcome this limitation by attaching printable
2D patches or painting patterns onto surfaces, but such attacks can
potentially be defended against because the 3D shape features of the object
remain intact. In this paper, we propose meshAdv to generate "adversarial 3D
meshes" from objects that have rich shape features but minimal textural
variation. To manipulate the shape or texture of an object, we use a
differentiable renderer to compute accurate shading on the shape and to
propagate gradients. Extensive experiments show that the generated 3D meshes
are effective in attacking both classifiers and object detectors, and we
evaluate the attack under different viewpoints. In addition, we design a
pipeline to perform a black-box attack on a photorealistic renderer with
unknown rendering parameters.
Comment: Published in IEEE CVPR 2019
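At its core, the attack backpropagates a classification loss through the differentiable renderer into the mesh vertices. The sketch below illustrates that loop under stated assumptions: `render`, `classifier`, and `viewpoints` are placeholders standing in for the paper's actual components (with `render` assumed to return a (C, H, W) image tensor), and a plain L2 penalty stands in for the paper's smoothness regularizer.

```python
import torch
import torch.nn.functional as F

def adversarial_mesh_attack(vertices, faces, classifier, render, viewpoints,
                            target_class, steps=200, lr=1e-3, lam=0.1):
    """Perturb mesh vertices so that rendered views are misclassified."""
    delta = torch.zeros_like(vertices, requires_grad=True)  # vertex perturbation
    opt = torch.optim.Adam([delta], lr=lr)
    target = torch.tensor([target_class])
    for _ in range(steps):
        opt.zero_grad()
        loss = 0.0
        for view in viewpoints:                      # attack must hold across viewpoints
            image = render(vertices + delta, faces, view)  # differentiable rendering
            logits = classifier(image.unsqueeze(0))        # (1, num_classes)
            loss = loss + F.cross_entropy(logits, target)
        # simple L2 stand-in for the paper's smoothness regularizer
        loss = loss + lam * delta.pow(2).sum()
        loss.backward()
        opt.step()
    return (vertices + delta).detach()
```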
Neural 3D Mesh Renderer
For modeling the 3D world behind 2D images, which 3D representation is most
appropriate? A polygon mesh is a promising candidate for its compactness and
geometric properties. However, it is not straightforward to model a polygon
mesh from 2D images using neural networks because the conversion from a mesh to
an image, or rendering, involves a discrete operation called rasterization,
which prevents back-propagation. Therefore, in this work, we propose an
approximate gradient for rasterization that enables the integration of
rendering into neural networks. Using this renderer, we perform single-image 3D
mesh reconstruction with silhouette image supervision and our system
outperforms the existing voxel-based approach. Additionally, we perform
gradient-based 3D mesh editing operations, such as 2D-to-3D style transfer and
3D DeepDream, with 2D supervision for the first time. These applications
demonstrate the potential of the integration of a mesh renderer into neural
networks and the effectiveness of our proposed renderer.
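To see why an approximate gradient is needed: rasterization decides discretely whether a pixel is covered, so its true derivative with respect to vertex positions is zero almost everywhere. The toy below is not the paper's actual formulation; it only illustrates the general recipe of keeping a hard forward pass while substituting a hand-designed smooth gradient in the backward pass.

```python
import torch

class ApproxStep(torch.autograd.Function):
    """Hard step in the forward pass; smooth sigmoid-shaped gradient backward."""
    @staticmethod
    def forward(ctx, x, sharpness):
        ctx.save_for_backward(x)
        ctx.sharpness = sharpness
        return (x > 0).float()                       # discrete: pixel covered or not

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        s = torch.sigmoid(ctx.sharpness * x)         # smooth surrogate
        return grad_out * ctx.sharpness * s * (1 - s), None  # no grad for sharpness

# coverage of a pixel by a face at signed distance d becomes differentiable
d = torch.tensor([-0.1, 0.05, 0.2], requires_grad=True)
coverage = ApproxStep.apply(d, 20.0)
coverage.sum().backward()
print(d.grad)   # nonzero gradients: geometry now receives a learning signal
```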
Unsupervised Training for 3D Morphable Model Regression
We present a method for training a regression network from image pixels to 3D
morphable model coordinates using only unlabeled photographs. The training loss
is based on features from a facial recognition network, computed on-the-fly by
rendering the predicted faces with a differentiable renderer. To make training
from features feasible and avoid network fooling effects, we introduce three
objectives: a batch distribution loss that encourages the output distribution
to match the distribution of the morphable model, a loopback loss that ensures
the network can correctly reinterpret its own output, and a multi-view identity
loss that compares the features of the predicted 3D face and the input
photograph from multiple viewing angles. We train a regression network using
these objectives, a set of unlabeled photographs, and the morphable model
itself, and demonstrate state-of-the-art results.Comment: CVPR 2018 version with supplemental material
(http://openaccess.thecvf.com/content_cvpr_2018/html/Genova_Unsupervised_Training_for_CVPR_2018_paper.html
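The three objectives compose into a single training loss. The sketch below gives one plausible instantiation; `regressor`, `render`, and `facenet` are assumed placeholder callables (with `render` accepting an optional viewpoint), and the exact loss forms are illustrative rather than the paper's verbatim definitions.

```python
import torch
import torch.nn.functional as F

def batch_distribution_loss(params, model_mean, model_std):
    """Match batch statistics of predicted 3DMM parameters to the model's prior."""
    return ((params.mean(0) - model_mean).pow(2).sum()
            + (params.std(0) - model_std).pow(2).sum())

def loopback_loss(regressor, render, params):
    """The network should recover its own parameters from its own rendering."""
    return F.mse_loss(regressor(render(params)), params)

def multiview_identity_loss(facenet, render, params, photo_features, views):
    """Rendered face should keep the photo's identity features from several angles."""
    losses = [1 - F.cosine_similarity(facenet(render(params, view)),
                                      photo_features, dim=-1).mean()
              for view in views]
    return torch.stack(losses).mean()
```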
Self-Supervised Intrinsic Image Decomposition
Intrinsic decomposition from a single image is a highly challenging task, due
to its inherent ambiguity and the scarcity of training data. In contrast to
traditional fully supervised learning approaches, in this paper we propose
learning intrinsic image decomposition by explaining the input image. Our
model, the Rendered Intrinsics Network (RIN), joins together an image
decomposition pipeline, which predicts reflectance, shape, and lighting
conditions given a single image, with a recombination function: a learned
shading model used to recompose the original input based on the intrinsic
image predictions. Our network can then use unsupervised reconstruction error as an
additional signal to improve its intermediate representations. This allows
large-scale unlabeled data to be useful during training, and also enables
transferring learned knowledge to images of unseen object categories, lighting
conditions, and shapes. Extensive experiments demonstrate that our method
performs well on both intrinsic image decomposition and knowledge transfer.
Comment: NIPS 2017 camera-ready version, project page: http://rin.csail.mit.edu
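The following is a minimal sketch of that structure, assuming toy placeholder architectures throughout (the actual RIN uses much deeper networks and a richer lighting representation): an image is decomposed into intrinsics, recombined by a learned shading model, and the reconstruction error supplies the self-supervised signal.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RINSketch(nn.Module):
    """Intrinsic decomposition plus a learned shading model for recombination."""
    def __init__(self):
        super().__init__()
        # the real model uses separate prediction heads; a single conv stands in here
        self.decompose = nn.Conv2d(3, 6, 3, padding=1)  # -> reflectance(3) + normals(3)
        self.light = nn.Parameter(torch.zeros(1))       # toy global lighting code
        self.shader = nn.Sequential(                    # learned recombination function
            nn.Conv2d(7, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1))

    def forward(self, img):
        intrinsics = self.decompose(img)
        reflectance, normals = intrinsics[:, :3], intrinsics[:, 3:]
        b, _, h, w = img.shape
        light_map = self.light.view(1, 1, 1, 1).expand(b, 1, h, w)
        recon = self.shader(torch.cat([reflectance, normals, light_map], dim=1))
        return recon, (reflectance, normals)

model = RINSketch()
img = torch.rand(2, 3, 32, 32)
recon, _ = model(img)
F.mse_loss(recon, img).backward()  # unsupervised reconstruction error as training signal
```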
FML: Face Model Learning from Videos
Monocular image-based 3D reconstruction of faces is a long-standing problem
in computer vision. Since image data is a 2D projection of a 3D face, the
resulting depth ambiguity makes the problem ill-posed. Most existing methods
rely on data-driven priors that are built from limited 3D face scans. In
contrast, we propose multi-frame video-based self-supervised training of a deep
network that (i) learns a face identity model both in shape and appearance
while (ii) jointly learning to reconstruct 3D faces. Our face model is learned
using only corpora of in-the-wild video clips collected from the Internet. This
virtually endless source of training data enables learning of a highly general
3D face model. In order to achieve this, we propose a novel multi-frame
consistency loss that ensures consistent shape and appearance across multiple
frames of a subject's face, thus minimizing depth ambiguity. At test time we
can use an arbitrary number of frames, so we can perform both monocular and
multi-frame reconstruction.
Comment: CVPR 2019 (Oral). Video: https://www.youtube.com/watch?v=SG2BwxCw0lQ, Project Page: https://gvv.mpi-inf.mpg.de/projects/FML19
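A minimal sketch of such a multi-frame consistency loss, assuming the network emits a per-frame identity code (shape and appearance) alongside unconstrained per-frame pose and expression codes; the quadratic pull-to-mean form is an illustrative choice, not necessarily the paper's exact formulation.

```python
import torch

def multiframe_consistency_loss(identity_codes):
    """identity_codes: (F, D) per-frame identity predictions for one subject.
    Pull them toward their mean so shape/appearance agree across frames,
    while per-frame pose and expression codes are left unconstrained."""
    mean_code = identity_codes.mean(dim=0, keepdim=True)
    return (identity_codes - mean_code).pow(2).mean()

codes = torch.randn(5, 128, requires_grad=True)  # e.g. 5 frames, 128-dim identity
multiframe_consistency_loss(codes).backward()
```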
MoFA: Model-based Deep Convolutional Face Autoencoder for Unsupervised Monocular Reconstruction
In this work we propose a novel model-based deep convolutional autoencoder
that addresses the highly challenging problem of reconstructing a 3D human face
from a single in-the-wild color image. To this end, we combine a convolutional
encoder network with an expert-designed generative model that serves as
decoder. The core innovation is our new differentiable parametric decoder that
encapsulates image formation analytically based on a generative model. Our
decoder takes as input a code vector with exactly defined semantic meaning that
encodes detailed face pose, shape, expression, skin reflectance and scene
illumination. Due to this new way of combining CNN-based and model-based face
reconstruction, the CNN-based encoder learns to extract semantically meaningful
parameters from a single monocular input image. For the first time, a CNN
encoder and an expert-designed generative model can be trained end-to-end in an
unsupervised manner, which makes training on very large unlabeled real-world
data feasible. The obtained reconstructions compare favorably to current
state-of-the-art approaches in terms of quality and richness of representation.
Comment: International Conference on Computer Vision (ICCV) 2017 (Oral), 13 pages
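A minimal sketch of such a semantically structured decoder, assuming a linear morphable-model parameterization; the split sizes and the helper `decode` are illustrative assumptions, not the paper's actual dimensions or code.

```python
import torch

def decode(code, mean_shape, shape_basis, expr_basis):
    """Split a semantically defined code vector and rebuild geometry analytically."""
    pose, shape, expr, refl, illum = torch.split(code, [6, 80, 64, 80, 27])
    vertices = mean_shape + shape_basis @ shape + expr_basis @ expr  # linear 3DMM
    # a full decoder would rigidly transform by `pose`, apply per-vertex
    # reflectance from `refl`, and shade with spherical-harmonics lighting `illum`
    return vertices, pose, refl, illum

n = 5000                                    # toy vertex count
code = torch.randn(6 + 80 + 64 + 80 + 27)   # pose|shape|expression|reflectance|lighting
verts, *_ = decode(code, torch.randn(3 * n),
                   torch.randn(3 * n, 80), torch.randn(3 * n, 64))
```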