Mitigating Location Privacy Attacks on Mobile Devices using Dynamic App Sandboxing
We present the design, implementation and evaluation of a system, called
MATRIX, developed to protect the privacy of mobile device users from location
inference and sensor side-channel attacks. MATRIX gives users control and
visibility over location and sensor (e.g., accelerometer and gyroscope)
accesses by mobile apps. It implements a PrivoScope service that audits all
location and sensor accesses by apps on the device and generates real-time
notifications and graphs for visualizing these accesses; and a Synthetic
Location service to enable users to provide obfuscated or synthetic location
trajectories or sensor traces to apps they find useful, but do not trust with
their private information. The services are designed to be extensible and easy
to use, hiding all of the underlying complexity from users. MATRIX also
implements a Location Provider component that generates realistic
privacy-preserving synthetic identities and trajectories for users by
incorporating traffic information using historical data from Google Maps
Directions API, and accelerations using statistical information from user
driving experiments. Random traffic patterns are generated by modeling and
solving the user's schedule as a randomized linear program, and the user's
driving behavior as a quadratic program. We extensively evaluate MATRIX using
user studies, popular location-driven apps, and machine learning techniques,
and demonstrate that it is portable to most Android devices globally, is
reliable, has low overhead, and generates synthetic trajectories that are
difficult for an adversary to differentiate from real mobility trajectories.
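The abstract names the optimization machinery but not the formulation; as a
loose illustration of casting a user schedule as a randomized linear program,
the toy sketch below (Python/SciPy, with hypothetical variables and
constraints, not the MATRIX model) draws a randomized arrival deadline and
solves for synthetic departure times.

# Toy sketch: pick departure times for two daily trips so that a randomly
# perturbed arrival deadline and a minimum workday length are respected.
# This is NOT the MATRIX formulation; it only illustrates the idea of
# solving a randomized linear program for a synthetic schedule.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng()
work_start = 9.0 + rng.uniform(-0.5, 0.5)   # randomized arrival deadline (hours)
commute = 0.75                              # assumed travel time home -> work (hours)

# Variables: x = [t_leave_home, t_leave_work], both in hours of the day.
c = [-1.0, -1.0]                            # leave as late as possible at both ends
A_ub = [[1.0, 0.0],                         # t_leave_home + commute <= work_start
        [1.0, -1.0]]                        # t_leave_work >= t_leave_home + 8h workday
b_ub = [work_start - commute, -8.0]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(6.0, 10.0), (14.0, 20.0)])
print(res.x)                                # one synthetic daily schedule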
Learning Bidirectional LSTM Networks for Synthesizing 3D Mesh Animation Sequences
In this paper, we present a novel method for learning to synthesize 3D mesh
animation sequences with long short-term memory (LSTM) blocks and mesh-based
convolutional neural networks (CNNs). Synthesizing realistic 3D mesh animation
sequences is a challenging and important task in computer animation. To achieve
this, researchers have long been focusing on shape analysis to develop new
interpolation and extrapolation techniques. However, such techniques have
limited learning capabilities and therefore can produce unrealistic animation.
Deep architectures that operate directly on mesh sequences remain unexplored,
due to the following major barriers: meshes with irregular triangles, sequences
containing rich temporal information and flexible deformations. To address
these, we utilize convolutional neural networks defined on triangular meshes
along with a shape deformation representation to extract useful features,
followed by LSTM cells that iteratively process the features. To allow
completion of a missing mesh sequence from given endpoints, we propose a new
weight-shared bidirectional structure. The bidirectional generation loss also
helps mitigate error accumulation over iterations. Benefiting from all these
technical advances, our approach outperforms existing methods in sequence
prediction and completion both qualitatively and quantitatively. Moreover, this
network can also generate follow-up frames conditioned on initial shapes and
improve its accuracy as more bootstrap models are provided, which other works
in the geometry processing domain cannot achieve.
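As a rough sketch of the weight-shared bidirectional recurrence described
above (in PyTorch, operating on generic per-frame feature vectors rather than
the paper's mesh-CNN features; layer sizes are arbitrary), a single LSTM cell
can be reused for a forward and a backward pass between the two endpoints:

# Minimal sketch of a weight-shared bidirectional LSTM over per-frame
# feature vectors. Feature extraction by mesh CNNs is omitted; the cell
# weights are shared between the forward and backward passes, loosely
# mirroring the weight-shared bidirectional structure described above.
import torch
import torch.nn as nn

class SharedBiLSTM(nn.Module):
    def __init__(self, feat_dim, hidden_dim):
        super().__init__()
        self.cell = nn.LSTMCell(feat_dim, hidden_dim)   # shared by both directions
        self.out = nn.Linear(2 * hidden_dim, feat_dim)  # fuse the two passes

    def run(self, feats):                    # feats: (T, B, feat_dim)
        h = feats.new_zeros(feats.size(1), self.cell.hidden_size)
        c = torch.zeros_like(h)
        states = []
        for x in feats:
            h, c = self.cell(x, (h, c))
            states.append(h)
        return torch.stack(states)           # (T, B, hidden_dim)

    def forward(self, feats):
        fwd = self.run(feats)                # endpoint-to-endpoint, forward
        bwd = self.run(feats.flip(0)).flip(0)  # same cell, reversed sequence
        return self.out(torch.cat([fwd, bwd], dim=-1))  # predicted per-frame features

seq = torch.randn(16, 4, 128)                # 16 frames, batch of 4, 128-dim features
model = SharedBiLSTM(128, 256)
print(model(seq).shape)                      # torch.Size([16, 4, 128])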
Expecting the Unexpected: Training Detectors for Unusual Pedestrians with Adversarial Imposters
As autonomous vehicles become an every-day reality, high-accuracy pedestrian
detection is of paramount practical importance. Pedestrian detection is a
highly researched topic with mature methods, but most datasets focus on common
scenes of people engaged in typical walking poses on sidewalks. But performance
is most crucial for dangerous scenarios, such as children playing in the street
or people using bicycles/skateboards in unexpected ways. Such "in-the-tail"
data is notoriously hard to observe, making both training and testing
difficult. To analyze this problem, we have collected a novel annotated dataset
of dangerous scenarios called the Precarious Pedestrian dataset. Even given a
dedicated collection effort, it is relatively small by contemporary standards
(around 1000 images). To allow for large-scale data-driven learning, we explore
the use of synthetic data generated by a game engine. A significant challenge
is selecting the right "priors" or parameters for synthesis: we would like
realistic data with poses and object configurations that mimic true Precarious
Pedestrians. Inspired by Generative Adversarial Networks (GANs), we generate a
massive amount of synthetic data and train a discriminative classifier to
select a realistic subset, which we deem the Adversarial Imposters. We
demonstrate that this simple pipeline allows one to synthesize realistic
training data by making use of rendering/animation engines within a GAN
framework. Interestingly, we also demonstrate that such data can be used to
rank algorithms, suggesting that Adversarial Imposters can also be used for
"in-the-tail" validation at test-time, a notoriously difficult challenge for
real-world deployment.
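The selection step can be pictured roughly as follows: train a real-vs-synthetic
classifier and keep the synthetic samples it finds hardest to distinguish from
real data. The sketch below (scikit-learn, on placeholder feature vectors; not
the paper's actual classifier or features) illustrates the idea.

# Toy sketch of "Adversarial Imposter" selection: score synthetic samples with
# a real-vs-synthetic classifier and keep the ones most confusable with real
# data. Features are assumed precomputed; not the paper's exact procedure.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
real_feats = rng.normal(0.0, 1.0, size=(1000, 64))      # placeholder real features
synth_feats = rng.normal(0.3, 1.2, size=(50000, 64))    # placeholder synthetic features

X = np.vstack([real_feats, synth_feats])
y = np.concatenate([np.ones(len(real_feats)), np.zeros(len(synth_feats))])

clf = LogisticRegression(max_iter=1000).fit(X, y)
p_real = clf.predict_proba(synth_feats)[:, 1]            # P(real) for each synthetic sample

k = 5000                                                 # size of the selected subset
imposters = np.argsort(-p_real)[:k]                      # most "real-looking" synthetic renders
print(imposters[:10])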
Data augmentation using learned transformations for one-shot medical image segmentation
Image segmentation is an important task in many medical applications. Methods
based on convolutional neural networks attain state-of-the-art accuracy;
however, they typically rely on supervised training with large labeled
datasets. Labeling medical images requires significant expertise and time, and
typical hand-tuned approaches for data augmentation fail to capture the complex
variations in such images.
We present an automated data augmentation method for synthesizing labeled
medical images. We demonstrate our method on the task of segmenting magnetic
resonance imaging (MRI) brain scans. Our method requires only a single
segmented scan, and leverages other unlabeled scans in a semi-supervised
approach. We learn a model of transformations from the images, and use the
model along with the labeled example to synthesize additional labeled examples.
Each transformation consists of a spatial deformation field and an
intensity change, enabling the synthesis of complex effects such as variations
in anatomy and image acquisition procedures. We show that training a supervised
segmenter with these new examples provides significant improvements over
state-of-the-art methods for one-shot biomedical image segmentation. Our code
is available at https://github.com/xamyzhao/brainstorm.
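A minimal sketch of how one such transformation could mint a new labeled
example: warp the labeled scan and its label map with the same spatial
deformation field, then apply an intensity change to the image only. The
deformation and intensity models below are random placeholders (NumPy/SciPy,
toy 2D arrays); in the paper both are learned from unlabeled scans, and the
actual code is at the linked repository.

# Sketch: synthesize a new labeled example by applying a spatial deformation
# field to both image and label map, plus an intensity change to the image.
# The deformation/intensity models here are random stand-ins on a toy 2D "scan".
import numpy as np
from scipy.ndimage import map_coordinates, gaussian_filter

rng = np.random.default_rng(1)
atlas_img = rng.random((128, 128)).astype(np.float32)       # labeled scan (toy)
atlas_seg = (atlas_img > 0.5).astype(np.int32)               # its segmentation (toy)

# Smooth random deformation field (stand-in for a learned spatial model).
dx = gaussian_filter(rng.normal(0, 1, atlas_img.shape), sigma=8) * 5
dy = gaussian_filter(rng.normal(0, 1, atlas_img.shape), sigma=8) * 5
yy, xx = np.meshgrid(np.arange(128), np.arange(128), indexing="ij")
coords = np.stack([yy + dy, xx + dx])

warped_img = map_coordinates(atlas_img, coords, order=1)     # warp image
warped_seg = map_coordinates(atlas_seg, coords, order=0)     # warp labels (nearest)

# Intensity change (stand-in for a learned appearance model), image only.
new_img = warped_img + gaussian_filter(rng.normal(0, 0.05, atlas_img.shape), sigma=16)
new_example = (new_img, warped_seg)                          # synthetic labeled pair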
Dynamic Facial Expression Generation on Hilbert Hypersphere with Conditional Wasserstein Generative Adversarial Nets
In this work, we propose a novel approach for generating videos of the six
basic facial expressions given a neutral face image. We propose to exploit the
face geometry by modeling the facial landmarks motion as curves encoded as
points on a hypersphere. By proposing a conditional version of manifold-valued
Wasserstein generative adversarial network (GAN) for motion generation on the
hypersphere, we learn the distribution of facial expression dynamics of
different classes, from which we synthesize new facial expression motions. The
resulting motions can be transformed to sequences of landmarks and then to
image sequences by editing the texture information using another conditional
Generative Adversarial Network. To the best of our knowledge, this is the first
work that explores manifold-valued representations with GAN to address the
problem of dynamic facial expression generation. We evaluate our proposed
approach both quantitatively and qualitatively on two public datasets:
Oulu-CASIA and MUG Facial Expression. Our experimental results demonstrate the
effectiveness of our approach in generating realistic videos with continuous
motion, realistic appearance and identity preservation. We also show the
efficiency of our framework for dynamic facial expression generation, dynamic
facial expression transfer, and data augmentation for training improved emotion
recognition models.
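One common way to place motion curves on a hypersphere (shown only for
illustration; the abstract does not specify the exact encoding) is a
square-root-velocity-style transform followed by L2 normalization:

# Sketch: encode a landmark-motion curve as a point on a unit hypersphere.
# This particular transform is one standard choice, not necessarily the
# paper's exact representation.
import numpy as np

def curve_to_sphere_point(curve):
    """curve: (T, D) array of stacked 2D landmark coordinates over time."""
    vel = np.gradient(curve, axis=0)                      # per-frame velocity
    speed = np.linalg.norm(vel, axis=1, keepdims=True)
    q = vel / np.sqrt(np.maximum(speed, 1e-8))            # square-root velocity
    q = q / np.linalg.norm(q)                             # project to unit hypersphere
    return q.ravel()

motion = np.cumsum(np.random.randn(30, 136) * 0.01, axis=0)  # 30 frames, 68 (x, y) landmarks
point = curve_to_sphere_point(motion)
print(np.linalg.norm(point))                                  # 1.0 up to float error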
HeadOn: Real-time Reenactment of Human Portrait Videos
We propose HeadOn, the first real-time source-to-target reenactment approach
for complete human portrait videos that enables transfer of torso and head
motion, face expression, and eye gaze. Given a short RGB-D video of the target
actor, we automatically construct a personalized geometry proxy that embeds a
parametric head, eye, and kinematic torso model. A novel real-time reenactment
algorithm employs this proxy to photo-realistically map the captured motion
from the source actor to the target actor. On top of the coarse geometric
proxy, we propose a video-based rendering technique that composites the
modified target portrait video via view- and pose-dependent texturing, and
creates photo-realistic imagery of the target actor under novel torso and head
poses, facial expressions, and gaze directions. To this end, we propose a
robust tracking of the face and torso of the source actor. We extensively
evaluate our approach and show significant improvements in enabling much
greater flexibility in creating realistic reenacted output videos. Video:
https://www.youtube.com/watch?v=7Dg49wv2c_g
Photorealistic Facial Expression Synthesis by the Conditional Difference Adversarial Autoencoder
Photorealistic facial expression synthesis from a single face image can be
widely applied to face recognition, data augmentation for emotion recognition
or entertainment. This problem is challenging, in part due to a paucity of
labeled facial expression data, making it difficult for algorithms to
disambiguate changes due to identity and changes due to expression. In this
paper, we propose the conditional difference adversarial autoencoder, CDAAE,
for facial expression synthesis. The CDAAE takes a facial image of a previously
unseen person and generates an image of that human face with a target emotion
or facial action unit label. The CDAAE adds a feedforward path to an
autoencoder structure connecting low level features at the encoder to features
at the corresponding level at the decoder. It handles the problem of
disambiguating changes due to identity and changes due to facial expression by
learning to generate the difference between low-level features of images of the
same person but with different facial expressions. The CDAAE structure can be
used to generate novel expressions by combining and interpolating between
facial expressions/action units within the training set. Our experimental
results demonstrate that the CDAAE can preserve identity information when
generating facial expression for unseen subjects more faithfully than previous
approaches. This is especially advantageous when training with small databases.
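A minimal sketch (PyTorch, with placeholder layer sizes and without the
adversarial training parts) of a conditional autoencoder with a low-level
feedforward skip path from encoder to decoder, loosely following the CDAAE
description above:

# Minimal sketch: conditional autoencoder with a feedforward (skip) path from
# low-level encoder features to the decoder. Layer sizes, the label-conditioning
# scheme, and the adversarial components are placeholders, not the paper's model.
import torch
import torch.nn as nn

class CDAAESketch(nn.Module):
    def __init__(self, n_labels=6):
        super().__init__()
        self.enc_low = nn.Sequential(nn.Conv2d(1, 32, 4, 2, 1), nn.ReLU())   # low-level features
        self.enc_high = nn.Sequential(nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU())
        self.dec_high = nn.Sequential(
            nn.ConvTranspose2d(64 + n_labels, 32, 4, 2, 1), nn.ReLU())
        # The decoder's low level sees both its own upsampled features and the
        # encoder's low-level features via the feedforward (skip) path.
        self.dec_low = nn.ConvTranspose2d(32 + 32, 1, 4, 2, 1)

    def forward(self, img, label_onehot):
        low = self.enc_low(img)
        high = self.enc_high(low)
        lab = label_onehot[:, :, None, None].expand(-1, -1, high.size(2), high.size(3))
        up = self.dec_high(torch.cat([high, lab], dim=1))
        diff = self.dec_low(torch.cat([up, low], dim=1))   # predicted difference image
        return img + diff                                   # add the difference back

x = torch.randn(8, 1, 64, 64)                               # batch of face crops (toy)
y = torch.eye(6)[torch.randint(0, 6, (8,))]                 # target expression labels
print(CDAAESketch()(x, y).shape)                            # torch.Size([8, 1, 64, 64])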
To Create What You Tell: Generating Videos from Captions
We are creating multimedia content every day and everywhere. While automatic
content generation has posed a fundamental challenge to the multimedia
community for decades, recent advances in deep learning have made this problem
feasible. For example, Generative Adversarial Networks (GANs) are a rewarding
approach to synthesizing images. Nevertheless, it is not trivial to capitalize on GANs
to generate videos. The difficulty originates from the intrinsic structure
where a video is a sequence of visually coherent and semantically dependent
frames. This motivates us to explore semantic and temporal coherence in
designing GANs to generate videos. In this paper, we present a novel Temporal
GANs conditioning on Captions, namely TGANs-C, in which the input to the
generator network is a concatenation of a latent noise vector and caption
embedding, which is then transformed into a frame sequence with 3D
spatio-temporal convolutions. Unlike the naive discriminator which only judges
pairs as fake or real, our discriminator additionally notes whether the video
matches the correct caption. In particular, the discriminator network consists
of three discriminators: a video discriminator that distinguishes real videos
from generated ones and optimizes video-caption matching, a frame discriminator
that distinguishes real from fake frames and aligns frames with the
conditioning caption, and a motion discriminator that emphasizes that adjacent
frames in the generated videos should be smoothly connected, as in
real ones. We qualitatively demonstrate the capability of our TGANs-C to
generate plausible videos conditioning on the given captions on two synthetic
datasets (SBMG and TBMG) and one real-world dataset (MSVD). Moreover,
quantitative experiments on MSVD are performed to validate our proposal via
Generative Adversarial Metric and human study.
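The generator input path described above can be sketched roughly as follows
(PyTorch; dimensions and layer counts are illustrative only): concatenate a
noise vector with a caption embedding and upsample it with 3D transposed
convolutions into a short frame sequence.

# Sketch of a TGANs-C-style generator input path. The real model's layer
# configuration and training losses are not reproduced here.
import torch
import torch.nn as nn

class CaptionToVideoG(nn.Module):
    def __init__(self, z_dim=100, cap_dim=256):
        super().__init__()
        self.fc = nn.Linear(z_dim + cap_dim, 256 * 2 * 4 * 4)
        self.deconv = nn.Sequential(
            nn.ConvTranspose3d(256, 128, 4, 2, 1), nn.ReLU(),   # (T,H,W): 2,4,4 -> 4,8,8
            nn.ConvTranspose3d(128, 64, 4, 2, 1), nn.ReLU(),    # -> 8,16,16
            nn.ConvTranspose3d(64, 3, 4, 2, 1), nn.Tanh())      # -> 16,32,32 RGB frames

    def forward(self, z, caption_emb):
        h = self.fc(torch.cat([z, caption_emb], dim=1))  # noise + caption embedding
        h = h.view(-1, 256, 2, 4, 4)                     # (B, C, T, H, W)
        return self.deconv(h)

video = CaptionToVideoG()(torch.randn(2, 100), torch.randn(2, 256))
print(video.shape)                                        # torch.Size([2, 3, 16, 32, 32])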
Text-based Editing of Talking-head Video
Editing talking-head video to change the speech content or to remove filler
words is challenging. We propose a novel method to edit talking-head video
based on its transcript to produce a realistic output video in which the
dialogue of the speaker has been modified, while maintaining a seamless
audio-visual flow (i.e. no jump cuts). Our method automatically annotates an
input talking-head video with phonemes, visemes, 3D face pose and geometry,
reflectance, expression and scene illumination per frame. To edit a video, the
user only has to edit the transcript, and an optimization strategy then
chooses segments of the input corpus as base material. The annotated
parameters corresponding to the selected segments are seamlessly stitched
together and used to produce an intermediate video representation in which the
lower half of the face is rendered with a parametric face model. Finally, a
recurrent video generation network transforms this representation to a
photorealistic video that matches the edited transcript. We demonstrate a
large variety of edits, such as the addition, removal, and alteration of
words, as well as convincing language translation and full sentence synthesis.
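The segment-selection idea can be illustrated with a toy example: given
per-frame phoneme labels for the corpus, find a span whose phoneme sequence
matches the phonemes of an edited word. The real system solves an optimization
over visemes, pose, and blending costs; the greedy string match below is only
a sketch with made-up labels.

# Toy sketch of corpus segment selection for a transcript edit. Purely
# illustrative; not the paper's optimization strategy.
def find_segment(corpus_phonemes, target_phonemes):
    """corpus_phonemes: per-frame phoneme labels; target_phonemes: phoneme list."""
    # Collapse consecutive duplicates so frames map to a phoneme sequence.
    collapsed, spans = [], []
    for i, p in enumerate(corpus_phonemes):
        if not collapsed or collapsed[-1] != p:
            collapsed.append(p)
            spans.append([i, i])
        spans[-1][1] = i
    n = len(target_phonemes)
    for start in range(len(collapsed) - n + 1):
        if collapsed[start:start + n] == target_phonemes:
            return spans[start][0], spans[start + n - 1][1]   # frame range to reuse
    return None

frames = ["sil", "HH", "HH", "EH", "L", "OW", "OW", "sil"]
print(find_segment(frames, ["HH", "EH", "L", "OW"]))          # (1, 6)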
Learning Continuous Face Age Progression: A Pyramid of GANs
The two underlying requirements of face age progression, i.e. aging accuracy
and identity permanence, are not well studied in the literature. This paper
presents a novel generative adversarial network based approach to address the
issues in a coupled manner. It separately models the constraints for the
intrinsic subject-specific characteristics and the age-specific facial changes
with respect to the elapsed time, ensuring that the generated faces present
desired aging effects while simultaneously keeping personalized properties
stable. To ensure photo-realistic facial details, high-level age-specific
features conveyed by the synthesized face are estimated by a pyramidal
adversarial discriminator at multiple scales, which simulates the aging effects
with finer details. Further, an adversarial learning scheme is introduced to
simultaneously train a single generator and multiple parallel discriminators,
resulting in smooth continuous face aging sequences. The proposed method is
applicable even in the presence of variations in pose, expression, makeup,
etc., achieving remarkably vivid aging effects. Quantitative evaluations by a
COTS face recognition system demonstrate that the target age distributions are
accurately recovered, and 99.88% and 99.98% of age-progressed faces can be
correctly verified at 0.001% FAR after age transformations of approximately 28
and 23 years elapsed time on the MORPH and CACD databases, respectively. Both
visual and quantitative assessments show that the approach advances the
state-of-the-art.
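A rough sketch of a pyramidal (multi-scale) adversarial discriminator
(PyTorch; architecture details are illustrative, not the paper's exact
networks): the same kind of critic scores the face at several resolutions and
the per-scale scores are combined.

# Sketch of a multi-scale "pyramid" discriminator: one small critic per scale,
# applied to downsampled copies of the input. Illustrative architecture only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidDiscriminator(nn.Module):
    def __init__(self, in_ch=3, scales=(1.0, 0.5, 0.25)):
        super().__init__()
        self.scales = scales
        self.critics = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(in_ch, 64, 4, 2, 1), nn.LeakyReLU(0.2),
                nn.Conv2d(64, 1, 4, 2, 1))
            for _ in scales])

    def forward(self, img):
        scores = []
        for s, critic in zip(self.scales, self.critics):
            x = img if s == 1.0 else F.interpolate(
                img, scale_factor=s, mode="bilinear", align_corners=False)
            scores.append(critic(x).mean(dim=(1, 2, 3)))   # one realism score per image
        return torch.stack(scores, dim=1)                  # (B, num_scales)

faces = torch.randn(4, 3, 128, 128)
print(PyramidDiscriminator()(faces).shape)                 # torch.Size([4, 3])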