SieveNet: A Unified Framework for Robust Image-Based Virtual Try-On
Image-based virtual try-on for fashion has gained considerable attention
recently. The task is to fit a given clothing item onto the image of a target model. An efficient framework for this is composed of two stages: (1) a warping module that transforms the try-on cloth to align with the pose and shape of the target model, and (2) a texture transfer module that seamlessly integrates the warped try-on cloth onto the target model image. Existing methods suffer from
artifacts and distortions in their try-on output. In this work, we present
SieveNet, a framework for robust image-based virtual try-on. Firstly, we
introduce a multi-stage coarse-to-fine warping network to better model
fine-grained intricacies (while transforming the try-on cloth) and train it
with a novel perceptual geometric matching loss. Next, we introduce a segmentation mask prior, conditioned on the try-on cloth, to improve the texture translation network. Finally, we introduce a dueling triplet loss strategy for training the texture translation network, which further improves the quality of
the generated try-on results. We present extensive qualitative and quantitative
evaluations of each component of the proposed pipeline and show significant
performance improvements over the current state-of-the-art method.
Comment: Accepted at IEEE WACV 2020
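As a rough illustration of the dueling triplet idea, the loss can be written so that the current output is pulled closer to the ground truth than the output of an earlier training phase. This is a sketch under assumptions, not the paper's exact formulation: the distance metric, margin value, and tensor names below are illustrative.

```python
import torch
import torch.nn.functional as F

def dueling_triplet_loss(current_out, previous_out, ground_truth, margin=0.3):
    """Triplet-style loss sketch: treat the ground truth as the anchor,
    the current try-on output as the positive, and the output of an
    earlier training phase as the negative.  L1 distance and the margin
    value are illustrative assumptions, not the paper's exact choices."""
    d_pos = F.l1_loss(current_out, ground_truth)            # anchor-positive distance
    d_neg = F.l1_loss(previous_out.detach(), ground_truth)  # anchor-negative distance
    return torch.clamp(d_pos - d_neg + margin, min=0.0)
```

Minimizing this drives the current network to beat its earlier self by at least the margin.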
A comprehensive survey on Pose-Invariant Face Recognition
The capacity to recognize faces under varied poses is a fundamental human ability that presents a unique challenge for computer vision systems. Compared to frontal face recognition, which has been intensively studied and has gradually matured in the past few decades, Pose-Invariant Face Recognition (PIFR) remains a largely unsolved problem. However, PIFR is crucial to realizing the full potential of face recognition for real-world applications, since face recognition is intrinsically a passive biometric technology for recognizing uncooperative subjects. In this article, we discuss the inherent difficulties in PIFR and present a comprehensive review of established techniques. Existing PIFR methods can be grouped into four categories: pose-robust feature extraction approaches, multiview subspace learning approaches, face synthesis approaches, and hybrid approaches. The motivations, strategies, pros/cons, and performance of representative approaches are described and compared. Moreover, promising directions for future research are discussed.
DI-Net : Decomposed Implicit Garment Transfer Network for Digital Clothed 3D Human
3D virtual try-on enjoys many potential applications and hence has attracted
wide attention. However, it remains a challenging task that has not been
adequately solved. Existing 2D virtual try-on methods cannot be directly
extended to 3D since they lack the ability to perceive the depth of each pixel.
Besides, existing 3D virtual try-on approaches are mostly built on a fixed topological structure and require heavy computation. To deal with these problems, we propose a Decomposed Implicit garment transfer network (DI-Net), which can effortlessly reconstruct a 3D human mesh with the new try-on result and preserve the texture from an arbitrary perspective. Specifically, DI-Net consists of two
modules: 1) A complementary warping module that warps the reference image to
have the same pose as the source image through dense correspondence learning
and sparse flow learning; 2) A geometry-aware decomposed transfer module that
decomposes the garment transfer into image layout based transfer and texture
based transfer, achieving surface and texture reconstruction by constructing
pixel-aligned implicit functions. Experimental results show the effectiveness
and superiority of our method in the 3D virtual try-on task, yielding higher-quality results than existing methods.
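For intuition, a pixel-aligned implicit function (in the PIFu sense such methods build on) predicts occupancy for a 3D point from the image feature sampled at its 2D projection plus its depth. The sketch below is a minimal PyTorch illustration with made-up feature and MLP sizes, not DI-Net's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelAlignedImplicitFn(nn.Module):
    """Minimal pixel-aligned implicit-function sketch: occupancy of a 3D
    point is predicted from the image feature at its 2D projection plus
    its depth.  Sizes and layers are illustrative assumptions."""

    def __init__(self, feat_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim + 1, 128), nn.ReLU(),
            nn.Linear(128, 1), nn.Sigmoid(),
        )

    def forward(self, feat_map, points_2d, depth):
        # feat_map: (B, C, H, W); points_2d: (B, N, 2) in [-1, 1]; depth: (B, N, 1)
        grid = points_2d.unsqueeze(2)                              # (B, N, 1, 2)
        feats = F.grid_sample(feat_map, grid, align_corners=True)  # (B, C, N, 1)
        feats = feats.squeeze(-1).transpose(1, 2)                  # (B, N, C)
        return self.mlp(torch.cat([feats, depth], dim=-1))         # (B, N, 1) occupancy
```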
Editing faces in videos
Editing faces in movies is of interest in the special effects industry. We aim at
producing effects such as the addition of accessories interacting correctly with
the face or replacing the face of a stuntman with the face of the main actor.
The system introduced in this thesis is based on a 3D generative face model.
Using a 3D model makes it possible to edit the face in the semantic space of pose, expression, and identity instead of pixel space and, due to its 3D nature, allows the light interaction to be modelled. In our system we first reconstruct, in all frames of a monocular input video, the 3D face (which deforms due to expressions and speech), the lighting, and the camera. The face is then edited by
substituting expressions or identities with those of another video sequence or by
adding virtual objects into the scene. The manipulated 3D scene is rendered back
into the original video, correctly simulating the interaction of the light with the
deformed face and virtual objects.
We describe all steps necessary to build and apply the system. This includes
registration of training faces to learn a generative face model, semi-automatic
annotation of the input video, fitting of the face model to the input video, editing
of the fit, and rendering of the resulting scene.
While describing the application we introduce a host of new methods, each
of which is of interest on its own. We start with a new method to register 3D
face scans to use as training data for the face model. For video preprocessing, a new interest-point tracking and 2D Active Appearance Model fitting technique is proposed. For robust fitting we introduce background modelling, model-based stereo techniques, and a more accurate light model.
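As a minimal sketch of the kind of generative face model such a system builds on (a linear 3D morphable model; the array names and shapes below are illustrative, not the thesis's exact model):

```python
import numpy as np

def morphable_face(mean_shape, id_basis, expr_basis, alpha, beta):
    """Linear 3D morphable-model sketch: a face is the mean shape plus
    identity and expression offsets, so edits happen in the semantic
    (alpha, beta) space rather than in pixel space.

    mean_shape: (3N,) stacked vertex coordinates
    id_basis:   (3N, K_id) identity components
    expr_basis: (3N, K_ex) expression components
    alpha, beta: coefficient vectors (names are illustrative)
    """
    return mean_shape + id_basis @ alpha + expr_basis @ beta

# Identity replacement keeps beta (expression/speech) and swaps alpha,
# which is how edits like stuntman face replacement stay in semantic space.
```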
Multimodal Three Dimensional Scene Reconstruction, The Gaussian Fields Framework
The focus of this research is on building 3D representations of real-world scenes and objects using different imaging sensors: primarily range acquisition devices (such as laser scanners and stereo systems) that allow the recovery of 3D geometry, and multi-spectral image sequences, including visual and thermal IR images, that provide additional scene characteristics. The crucial technical challenge that we addressed is the automatic point-set registration task. In this context our main contribution is the development of an optimization-based method at the core of which lies a unified criterion that solves simultaneously for the dense point correspondence and transformation recovery problems. The new criterion has a straightforward expression in terms of the datasets and the alignment parameters and was used primarily for 3D rigid registration of point-sets; however, it also proved useful for feature-based multimodal image alignment. We derived our method from simple Boolean matching principles by approximation and relaxation. One of the main advantages of the proposed approach, as compared to the widely used class of Iterative Closest Point (ICP) algorithms, is convexity in the neighborhood of the registration parameters and continuous differentiability, allowing for the use of standard gradient-based optimization techniques. Physically, the criterion is interpreted in terms of a Gaussian Force Field exerted by one point-set on the other. Such a formulation proved useful for controlling and increasing the region of convergence, and hence allows for more autonomy in correspondence tasks. Furthermore, the criterion can be computed with linear complexity using recently developed Fast Gauss Transform numerical techniques. In addition, we introduced a new local feature descriptor, derived from visual saliency principles, which significantly enhanced the performance of the registration algorithm. The resulting technique was subjected to a thorough experimental analysis that highlighted its strengths and showed its limitations. Our current applications are in the field of 3D modeling for inspection, surveillance, and biometrics. However, since this matching framework can be applied to any type of data that can be represented as N-dimensional point-sets, the scope of the method reaches many more pattern analysis applications.
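The Gaussian force field criterion has a compact form: a sum of Gaussians over all point pairs, maximized over the alignment parameters. The sketch below evaluates it naively in O(NM) (the dissertation's Fast Gauss Transform evaluation is linear); the rigid parameterization and names are illustrative assumptions.

```python
import numpy as np
from scipy.spatial.transform import Rotation

def gaussian_fields_energy(moving, fixed, sigma=1.0):
    """Naive O(N*M) Gaussian Fields criterion: sum over all point pairs of
    exp(-||p - q||^2 / sigma^2).  Smooth and differentiable, so standard
    gradient-based optimizers apply; sigma controls the convergence region."""
    diff = moving[:, None, :] - fixed[None, :, :]   # (N, M, 3) pairwise offsets
    sq_dist = np.sum(diff ** 2, axis=-1)            # (N, M) squared distances
    return np.exp(-sq_dist / sigma ** 2).sum()

def negative_energy_rigid(params, moving, fixed, sigma=1.0):
    """Objective for a minimizer: params = 3 Euler angles + 3 translations
    (an illustrative rigid parameterization, not the thesis's exact one)."""
    R = Rotation.from_euler("xyz", params[:3]).as_matrix()
    return -gaussian_fields_energy(moving @ R.T + params[3:], fixed, sigma)
```

Unlike ICP's hard nearest-neighbor assignment, every pair contributes a smooth term, which is what gives the criterion its differentiability and wider basin of convergence.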