Face Transfer with Multilinear Models
Face Transfer is a method for mapping video-recorded performances of one individual to facial animations of another. It extracts visemes (speech-related mouth articulations), expressions, and three-dimensional (3D) pose from monocular video or film footage. These parameters are then used to generate and drive a detailed 3D textured face mesh for a target identity, which can be seamlessly rendered back into target footage. The underlying face model automatically adjusts for how the target performs facial expressions and visemes. The performance data can be easily edited to change the visemes, expressions, pose, or even the identity of the target: the attributes are separably controllable. This supports a wide variety of video rewrite and puppetry applications.
Face Transfer is based on a multilinear model of 3D face meshes that separably parameterizes the space of geometric variations due to different attributes (e.g., identity, expression, and viseme). Separability means that each of these attributes can be independently varied. A multilinear model can be estimated from a Cartesian product of examples (identities × expressions × visemes) with techniques from statistical analysis, but only after careful preprocessing of the geometric data set to secure one-to-one correspondence, to minimize cross-coupling artifacts, and to fill in any missing examples. Face Transfer offers new solutions to these problems and links the estimated model with a face-tracking algorithm to extract pose, expression, and viseme parameters.
Engineering and Applied Science
A multilinear tongue model derived from speech related MRI data of the human vocal tract
We present a multilinear statistical model of the human tongue that captures
anatomical and tongue pose related shape variations separately. The model is
derived from 3D magnetic resonance imaging data of 11 speakers sustaining
speech related vocal tract configurations. The tongue shapes are extracted with
a minimally supervised method that builds on an image segmentation approach
and a template fitting technique. Furthermore, it uses image denoising to deal
with possibly corrupt data, palate surface information reconstruction to handle
palatal tongue contacts, and a bootstrap strategy to refine the obtained
shapes. Our evaluation concludes that limiting the degrees of freedom for the
anatomical and speech related variations to 5 and 4, respectively, produces a
model that can reliably register unknown data while avoiding overfitting
effects. Furthermore, we show that it can be used to generate a plausible
tongue animation by tracking sparse motion capture data.
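As a rough illustration of what a two-factor multilinear shape model of this kind computes, the sketch below evaluates a core tensor against one weight vector per factor. It is not the authors' implementation: the function name `synthesize_shape`, the vertex count, and the random values standing in for a learned model are all assumptions; only the 5 anatomy and 4 pose degrees of freedom come from the abstract.

```python
import numpy as np


def synthesize_shape(core, anatomy_weights, pose_weights, mean_shape):
    """Evaluate a two-factor multilinear shape model (hypothetical helper).

    core:            (n_coords, n_anatomy, n_pose) tensor
    anatomy_weights: (n_anatomy,) vector
    pose_weights:    (n_pose,) vector
    mean_shape:      (n_coords,) vector
    """
    # Contract the anatomy and pose modes of the core tensor with their
    # weight vectors; the result is linear in each vector separately.
    offset = np.einsum("cap,a,p->c", core, anatomy_weights, pose_weights)
    return mean_shape + offset


# Random placeholders instead of a model learned from MRI data (assumption).
rng = np.random.default_rng(0)
n_coords = 3 * 100                            # 100 hypothetical vertices, xyz each
core = rng.standard_normal((n_coords, 5, 4))  # 5 anatomy and 4 pose parameters
mean_shape = rng.standard_normal(n_coords)

shape = synthesize_shape(
    core, rng.standard_normal(5), rng.standard_normal(4), mean_shape
)
```

Holding the anatomy weights fixed and varying only the pose weights changes the pose of the same hypothetical speaker, which is the separation the abstract describes.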
Tensor Decompositions for Signal Processing Applications: From Two-way to Multiway Component Analysis
The widespread use of multi-sensor technology and the emergence of big
datasets have highlighted the limitations of standard flat-view matrix models
and the necessity to move towards more versatile data analysis tools. We show
that higher-order tensors (i.e., multiway arrays) enable such a fundamental
paradigm shift towards models that are essentially polynomial and whose
uniqueness, unlike the matrix methods, is guaranteed under very mild and natural
conditions. Benefiting from the power of multilinear algebra as their mathematical
backbone, data analysis techniques using tensor decompositions are shown to
have great flexibility in the choice of constraints that match data properties,
and to find more general latent components in the data than matrix-based
methods. A comprehensive introduction to tensor decompositions is provided from
a signal processing perspective, starting from the algebraic foundations, via
basic Canonical Polyadic and Tucker models, through to advanced cause-effect
and multi-view data analysis schemes. We show that tensor decompositions enable
natural generalizations of some commonly used signal processing paradigms, such
as canonical correlation and subspace techniques, signal separation, linear
regression, feature extraction and classification. We also cover computational
aspects, and point out how ideas from compressed sensing and scientific
computing may be used for addressing the otherwise unmanageable storage and
manipulation problems associated with big datasets. The concepts are supported
by illustrative real world case studies illuminating the benefits of the tensor
framework, as efficient and promising tools for modern signal processing, data
analysis and machine learning applications; these benefits also extend to
vector/matrix data through tensorization. Keywords: ICA, NMF, CPD, Tucker
decomposition, HOSVD, tensor networks, Tensor Train
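For readers unfamiliar with the Tucker model and HOSVD named in the keywords, a minimal NumPy sketch of a truncated HOSVD follows. It is a textbook-style illustration rather than code from the paper; the helper names `unfold`, `mode_product`, `hosvd`, and `reconstruct` and the random test tensor are assumptions.

```python
import numpy as np


def unfold(tensor, mode):
    """Matricize `tensor` so that axis `mode` indexes the rows."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def mode_product(tensor, matrix, mode):
    """Multiply `tensor` along axis `mode` by `matrix` (new_dim x old_dim)."""
    moved = np.moveaxis(tensor, mode, 0)
    return np.moveaxis(np.tensordot(matrix, moved, axes=(1, 0)), 0, mode)


def hosvd(tensor, ranks):
    """Truncated HOSVD: one orthonormal factor per mode plus a core tensor."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Leading left singular vectors of each unfolding span that mode.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    core = tensor
    for mode, u in enumerate(factors):
        core = mode_product(core, u.T, mode)  # project onto each subspace
    return core, factors


def reconstruct(core, factors):
    """Map a Tucker core back to the original space."""
    out = core
    for mode, u in enumerate(factors):
        out = mode_product(out, u, mode)
    return out


# Random data standing in for a real multiway dataset (assumption).
rng = np.random.default_rng(0)
data = rng.standard_normal((4, 5, 6))
core_full, factors_full = hosvd(data, ranks=(4, 5, 6))
core_small, factors_small = hosvd(data, ranks=(2, 3, 3))
```

With full ranks the reconstruction reproduces the input up to floating-point error; with smaller ranks the core shrinks to the requested size and the reconstruction becomes a low-multilinear-rank approximation.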
Bayesian Robust Tensor Factorization for Incomplete Multiway Data
We propose a generative model for robust tensor factorization in the presence
of both missing data and outliers. The objective is to explicitly infer the
underlying low-CP-rank tensor capturing the global information and a sparse
tensor capturing the local information (also considered as outliers), thus
providing the robust predictive distribution over missing entries. The
low-CP-rank tensor is modeled by multilinear interactions between multiple
latent factors on which the column sparsity is enforced by a hierarchical
prior, while the sparse tensor is modeled by a hierarchical view of Student-t
distribution that associates an individual hyperparameter with each element
independently. For model learning, we develop an efficient closed-form
variational inference under a fully Bayesian treatment, which can effectively
prevent the overfitting problem and scales linearly with data size. In contrast
to existing related works, our method can perform model selection automatically
and implicitly without the need for tuning parameters. More specifically, it can
discover the ground-truth CP rank and automatically adapt the sparsity
inducing priors to various types of outliers. In addition, the tradeoff between
the low-rank approximation and the sparse representation can be optimized in
the sense of maximum model evidence. The extensive experiments and comparisons
with many state-of-the-art algorithms on both synthetic and real-world datasets
demonstrate the superiority of our method from several perspectives.
Comment: in IEEE Transactions on Neural Networks and Learning Systems, 201
Registration and statistical analysis of the tongue shape during speech production
This thesis analyzes the human tongue shape during speech production. First, a semi-supervised approach is derived for estimating the tongue shape from volumetric magnetic resonance imaging data of the human vocal tract. Results of this extraction are used to derive parametric tongue models. Next, a framework is presented for registering sparse motion capture data of the tongue by means of such a model. This method makes it possible to generate full three-dimensional animations of the tongue. Finally, a multimodal and statistical text-to-speech system is developed that is able to synthesize audio and synchronized tongue motion from text.
German Research Foundatio