Dynamic Facial Expression Generation on Hilbert Hypersphere with Conditional Wasserstein Generative Adversarial Nets
In this work, we propose a novel approach for generating videos of the six
basic facial expressions given a neutral face image. We propose to exploit the
face geometry by modeling the facial landmarks motion as curves encoded as
points on a hypersphere. By proposing a conditional version of a manifold-valued
Wasserstein generative adversarial network (GAN) for motion generation on the
hypersphere, we learn the distribution of facial expression dynamics of
different classes, from which we synthesize new facial expression motions. The
resulting motions can be transformed to sequences of landmarks and then to
image sequences by editing the texture information using another conditional
Generative Adversarial Network. To the best of our knowledge, this is the first
work that explores manifold-valued representations with GAN to address the
problem of dynamic facial expression generation. We evaluate our proposed
approach both quantitatively and qualitatively on two public datasets:
Oulu-CASIA and MUG Facial Expression. Our experimental results demonstrate the
effectiveness of our approach in generating realistic videos with continuous
motion, realistic appearance and identity preservation. We also show the
efficiency of our framework for dynamic facial expression generation, dynamic
facial expression transfer and data augmentation for training improved emotion
recognition models.
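The idea of encoding a landmark motion as a point on a hypersphere can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the square-root velocity function (SRVF) as the curve representation, which the abstract does not name, and the function names `curve_to_sphere_point` and `geodesic_distance` are hypothetical.

```python
import numpy as np

def curve_to_sphere_point(landmarks):
    """Map a trajectory of shape (T, d) to a unit-norm vector.

    Assumption: the curve is represented by its square-root velocity
    function q(t) = v(t) / sqrt(||v(t)||), then rescaled to unit L2 norm
    so that it lies on a hypersphere.
    """
    velocity = np.gradient(landmarks, axis=0)            # finite-difference derivative, shape (T, d)
    speed = np.linalg.norm(velocity, axis=1, keepdims=True)
    q = velocity / np.sqrt(np.maximum(speed, 1e-12))     # guard against zero speed
    return q / np.linalg.norm(q)

def geodesic_distance(p, q):
    """Arc length between two unit vectors on the hypersphere."""
    cos_angle = np.clip(np.sum(p * q), -1.0, 1.0)
    return float(np.arccos(cos_angle))

# Two toy 2D trajectories sampled at 20 time steps.
t = np.linspace(0.0, 1.0, 20)[:, None]
point_a = curve_to_sphere_point(np.hstack([t, t ** 2]))
point_b = curve_to_sphere_point(np.hstack([t, np.sin(t)]))
print(geodesic_distance(point_a, point_b))
```

Once every motion is a point on the sphere, distances and interpolations between motions follow great-circle arcs rather than straight lines, which is the setting in which a manifold-valued generator operates.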
Doctor of Philosophy
Image segmentation entails the partitioning of an image domain, usually in two or three dimensions, so that each partition or segment has some meaning that is relevant to the application at hand. Accurate image segmentation is a crucial challenge in many disciplines, including medicine, computer vision, and geology. In some applications, heterogeneous pixel intensities; noisy, ill-defined, or diffusive boundaries; and irregular shapes with high variability can make it challenging to meet accuracy requirements. Various segmentation approaches tackle such challenges by casting the segmentation problem as an energy-minimization problem and solving it using efficient optimization algorithms. These approaches are broadly classified as either region-based or edge (surface)-based, depending on the features on which they operate. The focus of this dissertation is on the development of a surface-based energy model, the design of efficient formulations of optimization frameworks to incorporate such energy, and the solution of the energy-minimization problem using graph cuts. This dissertation builds on a set of four papers motivated by the efficient extraction of the left atrium wall from the late gadolinium enhancement magnetic resonance imaging (LGE-MRI) image volume. It also applies these energy formulations to other applications, including contact lens segmentation in optical coherence tomography (OCT) data and the extraction of geologic features in seismic data. Chapters 2 through 5 (papers 1 through 4) explore building a surface-based image segmentation model by progressively adding components to improve its accuracy and robustness. The first paper defines a parametric search space and its discrete formulation in the form of a multilayer three-dimensional mesh model within which the segmentation takes place. It includes a generative intensity model, and we optimize using a graph formulation of the surface net problem.
The second paper proposes a Bayesian framework with a Markov random field (MRF) prior that gives rise to another class of surface nets, which provides better segmentation with smooth boundaries. The third paper presents a maximum a posteriori (MAP)-based surface estimation framework that relies on a generative image model by incorporating global shape priors, in addition to the MRF, within the Bayesian formulation. Thus, the resulting surface not only depends on the learned model of shapes, but also accommodates the test data irregularities through smooth deviations from these priors. Further, the paper proposes a new shape parameter estimation scheme, in closed form, for segmentation as a part of the optimization process. Finally, the fourth paper (under review at the time of this document) presents an extensive analysis of the MAP framework and presents improved mesh generation and generative intensity models. It also performs a thorough analysis of the segmentation results that demonstrates the effectiveness of the proposed method qualitatively, quantitatively, and clinically. Chapter 6, consisting of unpublished work, demonstrates the application of an MRF-based Bayesian framework to segment coupled surfaces of contact lenses in optical coherence tomography images. This chapter also shows an application related to the extraction of geological structures in seismic volumes. Due to the large sizes of seismic volume datasets, we also present fast, approximate surface-based energy minimization strategies that improve both speed and memory consumption.
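The surface-based energy described above, a data term plus a smoothness prior, can be illustrated on a much simpler problem. The sketch below is an assumption-laden 2D analogue, not the dissertation's method: it extracts a single boundary curve (one row per image column) and swaps graph cuts for dynamic programming, which solves this 1D-chain case exactly. The name `extract_boundary` and the toy cost image are hypothetical.

```python
import numpy as np

def extract_boundary(cost, smoothness):
    """Pick one row per column minimizing
    sum_x cost[h[x], x] + smoothness * |h[x] - h[x-1]|.

    The data term plays the role of the intensity model and the jump
    penalty stands in for a smoothness prior between neighbouring columns.
    """
    n_rows, n_cols = cost.shape
    rows = np.arange(n_rows)
    # jump[i, j]: penalty for moving from row j (previous column) to row i.
    jump = smoothness * np.abs(rows[:, None] - rows[None, :])
    total = cost[:, 0].astype(float)
    back = np.zeros((n_rows, n_cols), dtype=int)
    for x in range(1, n_cols):
        candidates = total[None, :] + jump           # best cost so far plus jump penalty
        back[:, x] = np.argmin(candidates, axis=1)   # best predecessor for each row
        total = candidates[rows, back[:, x]] + cost[:, x]
    # Trace the optimal path backwards from the cheapest final row.
    path = np.empty(n_cols, dtype=int)
    path[-1] = int(np.argmin(total))
    for x in range(n_cols - 1, 0, -1):
        path[x - 1] = back[path[x], x]
    return path

# Toy cost image: the true boundary is row 2, with one misleading column.
cost = np.ones((5, 6))
cost[2, :] = 0.0
cost[2, 3] = 0.5
cost[0, 3] = 0.0
print(extract_boundary(cost, smoothness=0.0))  # follows the misleading column
print(extract_boundary(cost, smoothness=1.0))  # smoothness keeps the path on row 2
```

With no smoothness penalty the path jumps to the misleading minimum in column 3; with the penalty it stays on the true boundary, which is the qualitative effect a smoothness prior is meant to have.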
SHOWMe: Benchmarking Object-agnostic Hand-Object 3D Reconstruction
Recent hand-object interaction datasets show limited real object variability
and rely on fitting the MANO parametric model to obtain groundtruth hand
shapes. To go beyond these limitations and spur further research, we introduce
the SHOWMe dataset which consists of 96 videos, annotated with real and
detailed hand-object 3D textured meshes. Following recent work, we consider a
rigid hand-object scenario, in which the pose of the hand with respect to the
object remains constant during the whole video sequence. This assumption allows
us to register sub-millimetre-precise groundtruth 3D scans to the image
sequences in SHOWMe. Although simpler, this hypothesis makes sense in terms of
applications where the required accuracy and level of detail are important, e.g.,
object hand-over in human-robot collaboration, object scanning, or manipulation
and contact point analysis. Importantly, the rigidity of the hand-object
systems allows us to tackle video-based 3D reconstruction of unknown hand-held
objects using a 2-stage pipeline consisting of a rigid registration step
followed by a multi-view reconstruction (MVR) part. We carefully evaluate a set
of non-trivial baselines for these two stages and show that it is possible to
achieve promising object-agnostic 3D hand-object reconstructions employing an
SfM toolbox or a hand pose estimator to recover the rigid transforms and
off-the-shelf MVR algorithms. However, these methods remain sensitive to the
initial camera pose estimates which might be imprecise due to lack of textures
on the objects or heavy occlusions of the hands, leaving room for improvements
in the reconstruction. Code and dataset are available at
https://europe.naverlabs.com/research/showme
Comment: Paper and Appendix, accepted in ACVR workshop at ICCV conference
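The rigid registration stage relies on the fact that a rigid hand-object system moves by a single rotation and translation. The sketch below illustrates that idea only; it is not the SHOWMe pipeline, which recovers the transforms with an SfM toolbox or a hand pose estimator. It assumes known 3D point correspondences and uses the Kabsch algorithm, and the function name `kabsch` is chosen here for illustration.

```python
import numpy as np

def kabsch(source, target):
    """Least-squares rigid transform (R, t) such that
    target is approximately source @ R.T + t, given matched (N, 3) points."""
    src_mean = source.mean(axis=0)
    tgt_mean = target.mean(axis=0)
    # Cross-covariance of the centred point sets.
    H = (source - src_mean).T @ (target - tgt_mean)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = tgt_mean - R @ src_mean
    return R, t

# Synthetic check: rotate and translate random points, then recover the motion.
rng = np.random.default_rng(0)
points = rng.normal(size=(10, 3))
angle = 0.3
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.5, -1.0, 2.0])
R_est, t_est = kabsch(points, points @ R_true.T + t_true)
print(np.allclose(R_est, R_true), np.allclose(t_est, t_true))
```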
LEARNING TO RIG CHARACTERS
With the emergence of 3D virtual worlds, 3D social media, and massive online games, the need for diverse, high-quality, animation-ready characters and avatars is greater than ever. To animate characters, artists hand-craft articulation structures, such as animation skeletons and part deformers, which require a significant amount of laborious manual interaction with 2D/3D modeling interfaces. This thesis presents deep learning methods that are able to significantly automate the process of character rigging.
First, the thesis introduces RigNet, a method capable of predicting an animation skeleton for an input static 3D shape in the form of a polygon mesh. The predicted skeletons match animators' expectations in joint placement and topology. RigNet also estimates surface skin weights, which determine how the mesh is animated given the different skeletal poses. In contrast to prior work that fits pre-defined skeletal templates with hand-tuned objectives, RigNet is able to automatically rig diverse characters, such as humanoids, quadrupeds, toys, and birds, with varying articulation structure and geometry. RigNet is based on a deep neural architecture that directly operates on the mesh representation. The architecture is trained on a diverse dataset of rigged models that we mined online and curated. The dataset includes 2.7K polygon meshes, along with their associated skeletons and corresponding skin weights.
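How skin weights turn a skeletal pose into a deformed mesh can be sketched with linear blend skinning. This is an assumption for illustration: the abstract does not state which skinning model is used, and the function name and the toy two-bone setup are hypothetical.

```python
import numpy as np

def linear_blend_skinning(vertices, weights, bone_transforms):
    """Deform a mesh by blending per-bone rigid transforms.

    vertices:        (V, 3) rest-pose positions
    weights:         (V, B) skin weights, each row summing to 1
    bone_transforms: (B, 4, 4) homogeneous transform of each bone
    """
    homogeneous = np.hstack([vertices, np.ones((len(vertices), 1))])    # (V, 4)
    # Position of every vertex as moved by every bone: (V, B, 4).
    per_bone = np.einsum("bij,vj->vbi", bone_transforms, homogeneous)
    # Weighted average over bones for each vertex: (V, 4).
    blended = np.einsum("vb,vbi->vi", weights, per_bone)
    return blended[:, :3]

# Two bones: the first stays fixed, the second moves 2 units along y.
fixed = np.eye(4)
moved = np.eye(4)
moved[1, 3] = 2.0
vertices = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
weights = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
print(linear_blend_skinning(vertices, weights, np.stack([fixed, moved])))
```

The middle vertex, weighted equally between both bones, moves half as far as the vertex bound entirely to the moving bone, which is why predicted weights directly shape the quality of the animation.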
Second, the thesis introduces MoRig, a method that automatically rigs character meshes driven by single-view point cloud streams capturing the motion of performing characters. Compared to RigNet, MoRig's rigging is motion-aware: its neural network encodes motion cues from the point clouds into compact feature representations that are informative about the articulated parts of the performing character. These motion-aware features guide the inference of an appropriate skeletal rig for the input mesh. Furthermore, MoRig is able to animate the rig according to the captured point cloud motion. MoRig can handle diverse characters with different morphologies (e.g., humanoids, quadrupeds, toy characters). It also accounts for occluded regions in the point clouds and mismatches in the part proportions between the input mesh and captured character.
Third, the thesis introduces APES, a method that takes as input 2D raster images depicting a small set of poses of a character shown in a sprite sheet, and identifies articulated parts useful for rigging the character. APES uses a combination of neural network inference and integer linear programming to identify a compact set of articulated body parts, e.g., head, torso and limbs, that best reconstruct the input poses. Compared to MoRig and RigNet, which require a large collection of training models with associated skeletons and skinning weights, APES' neural architecture relies on less effortful supervision from (i) pixel correspondences readily available in existing large cartoon image datasets (e.g., Creative Flow) and (ii) a relatively small dataset of 57 cartoon characters segmented into moving parts.
Finally, the thesis discusses future research directions related to combining neural rigging with 3D and 4D reconstruction of characters from point cloud data and 2D video, as well as automating the process of motion synthesis for 3D characters.
Statistical and Dynamical Modeling of Riemannian Trajectories with Application to Human Movement Analysis
The data explosion of the past decade is due in part to the widespread use of rich sensors that measure various physical phenomena: gyroscopes that measure orientation in phones and fitness devices, the Microsoft Kinect, which measures depth information, etc. A typical application requires inferring the underlying physical phenomenon from data, which is done using machine learning. A fundamental assumption in training models is that the data is Euclidean, i.e., the metric is the standard Euclidean distance governed by the L2 norm. However, in many cases this assumption is violated because the data lies on non-Euclidean spaces such as Riemannian manifolds. While the underlying geometry accounts for the non-linearity, accurate analysis of human activity also requires temporal information to be taken into account. Human movement has a natural interpretation as a trajectory on the underlying feature manifold, as it evolves smoothly in time. A commonly occurring theme in many emerging problems is the need to represent, compare, and manipulate such trajectories in a manner that respects the geometric constraints. This dissertation is a comprehensive treatise on modeling Riemannian trajectories to understand and exploit their statistical and dynamical properties. Such properties allow us to formulate novel representations for Riemannian trajectories. For example, the physical constraints on human movement are rarely considered, which results in an unnecessarily large space of features, making search, classification and other applications more complicated. Exploiting statistical properties can help us understand the true space of such trajectories. In applications such as stroke rehabilitation, where there is a need to differentiate between very similar kinds of movement, dynamical properties can be much more effective.
In this regard, we propose a generalization of the Lyapunov exponent to Riemannian manifolds and show its effectiveness for human activity analysis. The theory developed in this thesis naturally leads to several benefits in areas such as data mining, compression, dimensionality reduction, classification, and regression.
Dissertation/Thesis: Doctoral Dissertation, Electrical Engineering 201
Image Segmentation using PDE, Variational, Morphological and Probabilistic Methods
The research in this dissertation has focused upon image segmentation and its related areas, using the techniques of partial differential equations, variational methods, mathematical morphological methods and probabilistic methods. An integrated segmentation method using both curve evolution and anisotropic diffusion is presented that utilizes both gradient and region information in images. A bottom-up image segmentation method is proposed to minimize the Mumford-Shah functional. Preferential image segmentation methods are presented that are based on the tree of shapes in mathematical morphology and the Kullback-Leibler distance in information theory. A thorough evaluation of the morphological preferential image segmentation method is provided, and a web interface is described. A probabilistic model is presented that is based on particle filters for image segmentation.
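The Kullback-Leibler distance mentioned above compares two probability distributions, for instance the intensity histogram of a candidate region against that of a prior example. The sketch below shows only that comparison, under the assumption that regions are summarized by histogram counts; how the dissertation combines it with the tree of shapes is not reproduced here, and the name `kl_divergence` and the toy histograms are hypothetical.

```python
import numpy as np

def kl_divergence(p_counts, q_counts, eps=1e-12):
    """Kullback-Leibler divergence D(P || Q) between two histograms.

    Counts are normalized to probabilities; eps avoids log(0) for empty bins.
    """
    p = np.asarray(p_counts, dtype=float) + eps
    q = np.asarray(q_counts, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log(p / q)))

# Toy 4-bin intensity histograms for a prior shape and two candidate regions.
prior = [10, 20, 30, 40]
similar_region = [12, 18, 31, 39]
different_region = [40, 30, 20, 10]
print(kl_divergence(similar_region, prior))
print(kl_divergence(different_region, prior))
```

A smaller value indicates a closer match to the prior. The measure is not symmetric, so the order of the arguments matters.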
These methods may be incorporated as components of an integrated image processing system. The system utilizes Internet Protocol (IP) cameras for data acquisition. It utilizes image databases to provide prior information and store image processing results. Image preprocessing, image segmentation and object recognition are integrated in one stage in the system, using various methods developed in several areas. Interactions between data acquisition, integrated image processing and image databases are handled smoothly. A framework of the integrated system is implemented using Perl, C++, MySQL and CGI.
The integrated system works for various applications such as video tracking, medical image processing and facial image processing. Experimental results on these applications are provided in the dissertation. Efficient computations such as multi-scale computing and parallel computing using graphics processors are also presented.