158 research outputs found
Multi-View Data Generation Without View Supervision
The development of high-dimensional generative models has recently attracted a
surge of interest with the introduction of variational auto-encoders and
generative adversarial networks. Different variants have been proposed
where the underlying latent space is structured, for example, based on
attributes describing the data to generate. We focus on a particular problem
where one aims at generating samples corresponding to a number of objects under
various views. We assume that the distribution of the data is driven by two
independent latent factors: the content, which represents the intrinsic
features of an object, and the view, which stands for the settings of a
particular observation of that object. Therefore, we propose a generative model
and a conditional variant built on such a disentangled latent space. This
approach allows us to generate realistic samples corresponding to various
objects in a wide variety of views. Unlike many multi-view approaches, our
model does not need any supervision on the views, only on the content.
Compared to other conditional generation approaches, which are mostly based on
binary or categorical attributes, we make no such assumption about the factors
of variation. Our model can therefore be used on problems with a huge,
potentially infinite, number of categories. We evaluate it on four image
datasets, on which we demonstrate the effectiveness of the model and its
ability to generalize.
Comment: Published as a conference paper at ICLR 2018
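To make the two-factor assumption concrete, here is a minimal sketch of such a generator, not the authors' architecture: the layer sizes, MLP design, and output dimension are illustrative assumptions. Holding the content code fixed while resampling the view code yields the same object under different views.

```python
import torch
import torch.nn as nn

class TwoFactorGenerator(nn.Module):
    """Generator over a latent space split into content and view codes
    (illustrative sizes; not the paper's architecture)."""

    def __init__(self, content_dim=64, view_dim=16, out_dim=28 * 28):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(content_dim + view_dim, 256), nn.ReLU(),
            nn.Linear(256, out_dim), nn.Tanh(),
        )

    def forward(self, z_content, z_view):
        # The two codes are independent: resampling z_view alone changes the
        # view while z_content keeps the object identity fixed.
        return self.net(torch.cat([z_content, z_view], dim=-1))

gen = TwoFactorGenerator()
z_c = torch.randn(1, 64)                             # one object's content code
views = gen(z_c.expand(8, -1), torch.randn(8, 16))   # that object under 8 views
print(views.shape)                                   # torch.Size([8, 784])
```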
Multi-view Generative Adversarial Networks
Learning over multi-view data is a challenging problem with strong practical applications. Most related studies focus on the classification point of view and assume that all the views are available at all times. We consider an extension of this framework in two directions. First, based on the BiGAN model, the Multi-view BiGAN (MV-BiGAN) is able to perform density estimation from multi-view inputs. Second, it can deal with missing views and is able to update its prediction when additional views are provided. We illustrate these properties through a set of experiments on several datasets.
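One way to picture the missing-view behavior described above is an order-invariant pooling of view-specific encoders, so the latent estimate can be refined as views arrive. The sketch below is an illustrative aggregation scheme under that assumption, not the MV-BiGAN architecture itself.

```python
import torch
import torch.nn as nn

class ViewAggregator(nn.Module):
    """Encode whichever views are present and pool them into one latent code
    (an illustrative aggregation scheme, not MV-BiGAN itself)."""

    def __init__(self, view_dims, latent_dim=32):
        super().__init__()
        # One encoder per possible view; any subset may be observed.
        self.encoders = nn.ModuleList(nn.Linear(d, latent_dim) for d in view_dims)

    def forward(self, views):
        # `views` maps view index -> batch tensor; missing keys = missing views.
        codes = [self.encoders[i](x) for i, x in views.items()]
        return torch.stack(codes).mean(dim=0)   # order-invariant pooling

agg = ViewAggregator(view_dims=[10, 20, 15])
z_one = agg({0: torch.randn(4, 10)})                         # one view only
z_two = agg({0: torch.randn(4, 10), 2: torch.randn(4, 15)})  # updated estimate
```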
Transferring Style in Motion Capture Sequences with Adversarial Learning
We focus on style transfer for sequential data in a supervised setting. Assuming that sequential data contain both content and style information, we want to learn models able to transform a sequence into a new one that keeps the same content but adopts the style of another sequence, from a training dataset where content and style labels are available. Following work on image generation and editing with adversarial learning, we explore the design of neural network architectures for the task of sequence editing, which we apply to motion capture sequences.
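A hedged sketch of the content/style factorization this describes: encode a sequence into separate content and style codes, then decode the content of one sequence with the style of another. The GRU design, the feature dimension (e.g., 21 joints × 3 coordinates), and the omission of the adversarial losses are all assumptions of the sketch.

```python
import torch
import torch.nn as nn

class SeqAutoencoder(nn.Module):
    """Sequence autoencoder whose latent splits into content and style
    (illustrative; the adversarial training losses are omitted)."""

    def __init__(self, feat_dim=63, content_dim=32, style_dim=8):
        super().__init__()
        self.enc = nn.GRU(feat_dim, content_dim + style_dim, batch_first=True)
        self.dec = nn.GRU(content_dim + style_dim, feat_dim, batch_first=True)
        self.style_dim = style_dim

    def encode(self, seq):
        _, h = self.enc(seq)                    # final hidden state: (1, B, c+s)
        h = h.squeeze(0)
        return h[:, : -self.style_dim], h[:, -self.style_dim :]

    def decode(self, content, style, length):
        z = torch.cat([content, style], dim=-1)
        out, _ = self.dec(z.unsqueeze(1).expand(-1, length, -1))
        return out

model = SeqAutoencoder()
a, b = torch.randn(2, 100, 63), torch.randn(2, 100, 63)     # two motion batches
content_a, _ = model.encode(a)
_, style_b = model.encode(b)
transferred = model.decode(content_a, style_b, length=100)  # a's content, b's style
```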
OCTET: Object-aware Counterfactual Explanations
Nowadays, deep vision models are being widely deployed in safety-critical
applications, e.g., autonomous driving, and explainability of such models is
becoming a pressing concern. Among explanation methods, counterfactual
explanations aim to find minimal and interpretable changes to the input image
that would also change the output of the model to be explained. Such
explanations point end-users to the main factors that drive the decision of
the model. However, previous methods struggle to explain decision models
trained on images with many objects, e.g., urban scenes, which are more
difficult to work with but also arguably more critical to explain. In this
work, we propose to tackle this issue with an object-centric framework for
counterfactual explanation generation. Our method, inspired by recent
generative modeling works, encodes the query image into a latent space that is
structured in a way that eases object-level manipulations. In doing so, it
provides the end-user with control over which search directions (e.g., spatial
displacement of objects, style modification, etc.) are to be explored during
the counterfactual generation. We conduct a set of experiments on
counterfactual explanation benchmarks for driving scenes, and we show that our
method can be adapted beyond classification, e.g., to explain semantic
segmentation models. To complete our analysis, we design and run a user study
that measures the usefulness of counterfactual explanations in understanding a
decision model. Code is available at https://github.com/valeoai/OCTET.
Comment: 8 pages + references + appendix
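As a generic illustration of counterfactual search in a structured latent space (not OCTET's actual implementation), one can freeze a decoder g and the classifier f, then optimize a latent code to flip the decision while staying close to the query's encoding; in an object-centric space, restricting the update to one object's sub-vector gives the kind of control described above. The toy g and f below are stand-ins.

```python
import torch
import torch.nn.functional as F

def counterfactual(z0, g, f, target, dist_weight=0.1, steps=200, lr=0.05):
    # Optimize z so that f(g(z)) predicts `target` while z stays near z0.
    z = z0.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        loss = (F.cross_entropy(f(g(z)), target)
                + dist_weight * (z - z0).pow(2).sum())  # proximity penalty
        opt.zero_grad()
        loss.backward()
        opt.step()
    return z.detach()

# Toy stand-ins for the decoder and the model to explain (assumptions).
g = torch.nn.Linear(16, 32)
f = torch.nn.Linear(32, 2)
z_cf = counterfactual(torch.randn(1, 16), g, f, target=torch.tensor([1]))
```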
STEEX: Steering Counterfactual Explanations with Semantics
As deep learning models are increasingly used in safety-critical
applications, explainability and trustworthiness become major concerns. For
simple images, such as low-resolution face portraits, synthesizing visual
counterfactual explanations has recently been proposed as a way to uncover the
decision mechanisms of a trained classification model. In this work, we address
the problem of producing counterfactual explanations for high-quality images
and complex scenes. Leveraging recent semantic-to-image models, we propose a
new generative counterfactual explanation framework that produces plausible and
sparse modifications which preserve the overall scene structure. Furthermore,
we introduce the concept of "region-targeted counterfactual explanations", and
a corresponding framework, where users can guide the generation of
counterfactuals by specifying a set of semantic regions of the query image the
explanation must be about. Extensive experiments are conducted on challenging
datasets including high-quality portraits (CelebAMask-HQ) and driving scenes
(BDD100k). Code is available at https://github.com/valeoai/STEEX
Comment: ECCV 2022 --- 14 pages + supplementary
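The region-targeting idea can be sketched as the same latent search with the gradient masked outside the latents of the user-selected semantic regions. The per-region latent layout and the toy networks are assumptions for illustration; STEEX itself builds on a semantic-to-image generator.

```python
import torch
import torch.nn.functional as F

def region_targeted_cf(z0, g, f, target, region_mask, steps=200, lr=0.05):
    z = z0.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        loss = F.cross_entropy(f(g(z)), target) + 0.1 * (z - z0).pow(2).sum()
        opt.zero_grad()
        loss.backward()
        z.grad *= region_mask     # zero the gradient outside the target regions
        opt.step()
    return z.detach()

g, f = torch.nn.Linear(16, 32), torch.nn.Linear(32, 2)   # toy stand-ins
mask = torch.zeros(1, 16)
mask[:, :4] = 1.0                 # hypothetical: entries 0-3 encode one region
z_cf = region_targeted_cf(torch.randn(1, 16), g, f, torch.tensor([1]), mask)
```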
Towards Motion Forecasting with Real-World Perception Inputs: Are End-to-End Approaches Competitive?
Motion forecasting is crucial in enabling autonomous vehicles to anticipate
the future trajectories of surrounding agents. Doing so requires solving
mapping, detection, tracking, and finally forecasting problems in a multi-step
pipeline. In this complex system, advances in conventional forecasting methods
have been made using curated data, i.e., with the assumption of perfect maps,
detection, and tracking. This paradigm, however, ignores any errors from
upstream modules. Meanwhile, an emerging end-to-end paradigm, that tightly
integrates the perception and forecasting architectures into joint training,
promises to solve this issue. So far, however, the evaluation protocols of the
two paradigms have been incompatible, making direct comparison impossible. In
fact, and perhaps surprisingly, conventional forecasting methods are usually
neither trained nor tested in real-world pipelines (e.g., with upstream
detection, tracking, and mapping modules). In this work, we aim to bring forecasting
models closer to real-world deployment. First, we propose a unified evaluation
pipeline for forecasting methods with real-world perception inputs, allowing us
to compare the performance of conventional and end-to-end methods for the first
time. Second, our in-depth study uncovers a substantial performance gap when
transitioning from curated to perception-based data. In particular, we show
that this gap (1) stems not only from differences in precision but also from
the nature of imperfect inputs provided by perception modules, and that (2) is
not trivially reduced by simply finetuning on perception outputs. Based on
extensive experiments, we provide recommendations for critical areas that
require improvement and guidance towards more robust motion forecasting in the
real world. We will release an evaluation library to benchmark models under
standardized and practical conditions.
Comment: 8 pages, 4 figures, updated results, acknowledgments
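A unified pipeline of this kind ultimately reduces to computing the same metrics regardless of whether the inputs are curated or perception-based. As a hedged illustration, the standard displacement errors used in forecasting can be computed as below; the (num_agents, horizon, 2) layout is an assumption.

```python
import numpy as np

def displacement_errors(pred, gt):
    """pred, gt: (num_agents, horizon, 2) future x/y positions (assumed layout)."""
    dists = np.linalg.norm(pred - gt, axis=-1)  # per-step Euclidean error
    ade = dists.mean()                          # Average Displacement Error
    fde = dists[:, -1].mean()                   # Final Displacement Error
    return ade, fde

# Same call whether `pred` came from curated or perception-based inputs.
pred = np.random.randn(5, 12, 2)
gt = np.random.randn(5, 12, 2)
print(displacement_errors(pred, gt))
```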
DiffHPE: Robust, Coherent 3D Human Pose Lifting with Diffusion
We present an approach to 3D Human Pose Estimation (3D-HPE) that integrates
diffusion models, which have proven transformative in diverse fields but
remain relatively unexplored in 3D-HPE. We show that diffusion models
enhance the accuracy, robustness, and coherence of human pose estimations. We
introduce DiffHPE, a novel strategy for harnessing diffusion models in 3D-HPE,
and demonstrate its ability to refine standard supervised 3D-HPE. We also show
how diffusion models lead to more robust estimations in the face of occlusions,
and improve the time-coherence and the sagittal symmetry of predictions. Using
the Human3.6M dataset, we illustrate the effectiveness of our approach and
its superiority over existing models, even under adverse situations where the
occlusion patterns in training do not match those in inference. Our findings
indicate that while standalone diffusion models provide commendable
performance, their accuracy is even better in combination with supervised
models, opening exciting new avenues for 3D-HPE research.
Comment: Accepted to 2023 International Conference on Computer Vision Workshops (Analysis and Modeling of Faces and Gestures)
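For intuition, a minimal epsilon-prediction diffusion training step for pose lifting might look like the sketch below: noise the ground-truth 3D pose and train a conditional network to predict that noise from the noisy pose, the 2D keypoints, and the timestep. The schedule, the MLP denoiser, and the 17-joint layout are assumptions, not DiffHPE's architecture.

```python
import torch
import torch.nn as nn

T = 100                                        # number of diffusion steps
betas = torch.linspace(1e-4, 0.02, T)          # assumed linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

# Assumed toy denoiser over 17 joints: 3D pose + 2D conditioning + timestep.
denoiser = nn.Sequential(nn.Linear(17 * 3 + 17 * 2 + 1, 256), nn.ReLU(),
                         nn.Linear(256, 17 * 3))

pose3d = torch.randn(8, 17 * 3)                # ground-truth 3D joints (flattened)
kp2d = torch.randn(8, 17 * 2)                  # 2D detections as conditioning
t = torch.randint(0, T, (8,))
noise = torch.randn_like(pose3d)
a = alphas_bar[t].unsqueeze(-1)
noisy = a.sqrt() * pose3d + (1 - a).sqrt() * noise   # forward process q(x_t | x_0)

pred = denoiser(torch.cat([noisy, kp2d, t.float().unsqueeze(-1) / T], dim=-1))
loss = (pred - noise).pow(2).mean()            # epsilon-prediction objective
loss.backward()
```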
Multi-ancestry study of blood lipid levels identifies four loci interacting with physical activity.
Many genetic loci affect circulating lipid levels, but it remains unknown whether lifestyle factors, such as physical activity, modify these genetic effects. To identify lipid loci interacting with physical activity, we performed genome-wide analyses of circulating HDL cholesterol, LDL cholesterol, and triglyceride levels in up to 120,979 individuals of European, African, Asian, Hispanic, and Brazilian ancestry, with follow-up of suggestive associations in an additional 131,012 individuals. We find four loci, in/near CLASP1, LHX1, SNTA1, and CNTNAP2, that are associated with circulating lipid levels through interaction with physical activity; higher levels of physical activity enhance the HDL cholesterol-increasing effects of the CLASP1, LHX1, and SNTA1 loci and attenuate the LDL cholesterol-increasing effect of the CNTNAP2 locus. The CLASP1, LHX1, and SNTA1 regions harbor genes linked to muscle function and lipid metabolism. Our results elucidate the role of physical activity interactions in the genetic contribution to blood lipid levels.
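The interaction analysis behind such findings can be illustrated with a standard regression that includes a genotype-by-activity product term and tests its coefficient. This is a sketch on simulated data; the variable names and effect sizes are made up for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 5000
df = pd.DataFrame({
    "snp": rng.binomial(2, 0.3, n).astype(float),  # allele dosage: 0, 1, or 2
    "pa": rng.binomial(1, 0.5, n).astype(float),   # physically active: 0 or 1
})
# Simulated HDL with a genetic effect that is stronger in active individuals
# (all coefficients made up for illustration).
df["hdl"] = (1.3 + 0.05 * df.snp + 0.10 * df.pa
             + 0.04 * df.snp * df.pa + rng.normal(0, 0.3, n))

fit = smf.ols("hdl ~ snp + pa + snp:pa", data=df).fit()
print(fit.params["snp:pa"], fit.pvalues["snp:pa"])  # interaction estimate and test
```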
Refining the accuracy of validated target identification through coding variant fine-mapping in type 2 diabetes.
We aggregated coding variant data for 81,412 type 2 diabetes cases and 370,832 controls of diverse ancestry, identifying 40 coding variant association signals (P < 2.2 × 10⁻⁷); of these, 16 map outside known risk-associated loci. We make two important observations. First, only five of these signals are driven by low-frequency variants: even for these, effect sizes are modest (odds ratio ≤ 1.29). Second, when we used large-scale genome-wide association data to fine-map the associated variants in their regional context, accounting for the global enrichment of complex trait associations in coding sequence, compelling evidence for coding variant causality was obtained for only 16 signals. At 13 others, the associated coding variants clearly represent 'false leads' with potential to generate erroneous mechanistic inference. Coding variant associations offer a direct route to biological insight for complex diseases and identification of validated therapeutic targets; however, appropriate mechanistic inference requires careful specification of their causal contribution to disease predisposition.
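As a sketch of one standard fine-mapping computation consistent with this description (though not necessarily the paper's exact pipeline), Wakefield's approximate Bayes factor converts each variant's effect estimate and standard error into a posterior probability of being the causal variant, under a single-causal-variant assumption and a chosen effect-size prior.

```python
import numpy as np

def wakefield_abf_posteriors(beta, se, w=0.04):
    """beta, se: per-variant effect estimates and standard errors at one locus.
    w: prior variance of the true effect (an assumed analysis choice)."""
    v = se ** 2
    z2 = (beta / se) ** 2
    # Log approximate Bayes factor in favor of association (Wakefield, 2009).
    log_abf = 0.5 * np.log(v / (v + w)) + 0.5 * z2 * w / (v + w)
    post = np.exp(log_abf - log_abf.max())     # stabilize before normalizing
    return post / post.sum()                   # posterior prob. per variant

beta = np.array([0.25, 0.10, 0.05])            # toy locus with three variants
se = np.array([0.04, 0.04, 0.04])
print(wakefield_abf_posteriors(beta, se))      # mass concentrates on variant 1
```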