Beyond Statistical Similarity: Rethinking Metrics for Deep Generative Models in Engineering Design
Deep generative models, such as Variational Autoencoders (VAEs), Generative
Adversarial Networks (GANs), Diffusion Models, and Transformers, have shown
great promise in a variety of applications, including image and speech
synthesis, natural language processing, and drug discovery. However, when
applied to engineering design problems, evaluating the performance of these
models can be challenging, as traditional statistical metrics based on
likelihood may not fully capture the requirements of engineering applications.
This paper doubles as a review and practical guide to evaluation metrics for
deep generative models (DGMs) in engineering design. We first summarize the
well-accepted 'classic' evaluation metrics for deep generative models grounded
in machine learning theory. Using case studies, we then highlight why these
metrics seldom translate well to design problems but see frequent use due to
the lack of established alternatives. Next, we curate a set of design-specific
metrics which have been proposed across different research communities and can
be used for evaluating deep generative models. These metrics focus on unique
requirements in design and engineering, such as constraint satisfaction,
functional performance, novelty, and conditioning. Throughout our discussion,
we apply the metrics to models trained on simple-to-visualize 2-dimensional
example problems. Finally, we evaluate four deep generative models on a bicycle
frame design problem and a structural topology generation problem. In particular,
we showcase the use of proposed metrics to quantify performance target
achievement, design novelty, and geometric constraints. We publicly release the
code for the datasets, models, and metrics used throughout the paper at
https://decode.mit.edu/projects/metrics/
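To make two of the design-specific metrics concrete, here is a minimal Python sketch of a constraint satisfaction rate and a nearest-neighbor novelty score for point-valued designs. The function names, the toy constraint, and the data are illustrative assumptions, not the API of the released code.

```python
import numpy as np

def constraint_satisfaction_rate(samples, constraint_fns):
    # Fraction of generated designs that satisfy every constraint.
    valid = [all(fn(x) for fn in constraint_fns) for x in samples]
    return float(np.mean(valid))

def novelty_scores(samples, training_set):
    # Novelty of each design: Euclidean distance to its nearest
    # training-set neighbor (larger means more novel).
    dists = np.linalg.norm(samples[:, None, :] - training_set[None, :, :], axis=-1)
    return dists.min(axis=1)

# Toy 2D example with the (hypothetical) constraint x0 + x1 <= 1.
rng = np.random.default_rng(0)
train = rng.uniform(0.0, 1.0, size=(100, 2))
generated = rng.uniform(0.0, 1.0, size=(50, 2))
print(constraint_satisfaction_rate(generated, [lambda x: x[0] + x[1] <= 1.0]))
print(novelty_scores(generated, train).mean())
```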
Learning from Invalid Data: On Constraint Satisfaction in Generative Models
Generative models have demonstrated impressive results in vision, language,
and speech. However, even with massive datasets, they struggle with precision,
generating physically invalid or factually incorrect data. This is particularly
problematic when the generated data must satisfy constraints, for example, to
meet product specifications in engineering design or to adhere to the laws of
physics in a natural scene. To improve precision while preserving diversity and
fidelity, we propose a novel training mechanism that leverages datasets of
constraint-violating data points, which we consider invalid. Our approach
minimizes the divergence between the generative distribution and the valid
prior while maximizing the divergence with the invalid distribution. We
demonstrate that generative models such as GANs and DDPMs, when augmented to
train with invalid data, vastly outperform their standard counterparts trained
solely on valid data points. For example, our training procedure generates up to
98% fewer invalid samples on 2D densities, improves connectivity and stability
four-fold on a stacking block problem, and improves constraint satisfaction by
15% on a structural topology optimization benchmark in engineering design. We
also analyze how the quality of the invalid data affects the learning procedure
and the generalization properties of models. Finally, we demonstrate
significant improvements in sample efficiency, showing that a tenfold increase
in valid samples leads to a negligible difference in constraint satisfaction,
while adding fewer than 10% invalid samples yields a tenfold improvement. Our
proposed mechanism offers a promising solution for improving precision in
generative models while preserving diversity and fidelity, particularly in
domains where constraint satisfaction is critical and data is limited, such as
engineering design, robotics, and medicine.
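As a rough illustration of training with invalid data, the sketch below augments a standard binary-cross-entropy GAN discriminator loss with a term that labels known-invalid samples as fake, so the generator is pushed toward the valid prior and away from the invalid distribution. This is a sketch under assumptions (a vanilla GAN objective, a hypothetical weighting knob `lam`), not the paper's exact divergence-based objective.

```python
import torch
import torch.nn.functional as F

def discriminator_loss(disc, valid_batch, generated_batch, invalid_batch, lam=1.0):
    # Standard GAN discriminator loss, augmented with a term that
    # teaches the discriminator to reject known-invalid data.
    # `lam` is a hypothetical weight on the invalid-data term.
    logits_valid = disc(valid_batch)
    logits_fake = disc(generated_batch.detach())
    logits_invalid = disc(invalid_batch)
    loss_valid = F.binary_cross_entropy_with_logits(
        logits_valid, torch.ones_like(logits_valid))
    loss_fake = F.binary_cross_entropy_with_logits(
        logits_fake, torch.zeros_like(logits_fake))
    # Invalid samples are labeled "fake": the generator, trained to fool
    # the discriminator, is thereby steered away from them.
    loss_invalid = F.binary_cross_entropy_with_logits(
        logits_invalid, torch.zeros_like(logits_invalid))
    return loss_valid + loss_fake + lam * loss_invalid
```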
On Approximating the Entropy of Polynomial Mappings
We investigate the complexity of Polynomial Entropy Approximation (PEA): Given a low-degree polynomial mapping p : F^n -> F^m, where F is a finite field, approximate the output entropy H(p(U_n)), where U_n is the uniform distribution on F^n and H may be any of several entropy measures.
We show:
Approximating the Shannon entropy of degree 3 polynomials p : F_2^n -> F_2^m over F_2 to within an additive constant (or even n^{0.9}) is complete for SZKP_L, the class of problems having statistical zero-knowledge proofs where the honest verifier and its simulator are computable in logarithmic space. (SZKP_L contains most of the natural problems known to be in the full class SZKP.)
For prime fields F ≠ F_2 and homogeneous quadratic polynomials p : F^n -> F^m, there is a probabilistic polynomial-time algorithm that distinguishes the case that p(U_n) has entropy smaller than k from the case that p(U_n) has min-entropy (or even Rényi entropy) greater than (2+o(1))k.
For degree d polynomials p : F_2^n -> F_2^m, there is a polynomial-time algorithm that distinguishes the case that p(U_n) has max-entropy smaller than k (where the max-entropy of a random variable is the logarithm of its support size) from the case that p(U_n) has max-entropy at least (1+o(1))k^d (for fixed d and large k).
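For intuition about the max-entropy notion used above, the brute-force sketch below computes H_0(p(U_n)) = log2 |support of p(U_n)| for a small mapping over F_2 by enumerating all 2^n inputs. It is an exponential-time illustration of the definition, not the polynomial-time distinguisher from the result; the example polynomial is an arbitrary choice.

```python
import itertools
import math

def max_entropy_f2(poly, n):
    # Max-entropy H_0(p(U_n)) = log2 of the support size of p(U_n),
    # for p : F_2^n -> F_2^m, by enumerating all 2^n inputs (small n only).
    support = {tuple(poly(x)) for x in itertools.product((0, 1), repeat=n)}
    return math.log2(len(support))

# Toy degree-2 mapping p(x0, x1, x2) = (x0*x1, x1 + x2 mod 2).
p = lambda x: (x[0] & x[1], x[1] ^ x[2])
print(max_entropy_f2(p, 3))  # 2.0, since p takes 4 distinct values
```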
3D Neural Embedding Likelihood for Robust Probabilistic Inverse Graphics
The ability to perceive and understand 3D scenes is crucial for many
applications in computer vision and robotics. Inverse graphics is an appealing
approach to 3D scene understanding that aims to infer the 3D scene structure
from 2D images. In this paper, we introduce probabilistic modeling to the
inverse graphics framework to quantify uncertainty and achieve robustness in 6D
pose estimation tasks. Specifically, we propose 3D Neural Embedding Likelihood
(3DNEL) as a unified probabilistic model over RGB-D images, and develop
efficient inference procedures on 3D scene descriptions. 3DNEL effectively
combines learned neural embeddings from RGB with depth information to improve
robustness in sim-to-real 6D object pose estimation from RGB-D images.
Performance on the YCB-Video dataset is on par with state-of-the-art methods
yet far more robust in challenging regimes. In contrast to discriminative
approaches, 3DNEL's probabilistic generative formulation jointly models
multi-object scenes, quantifies uncertainty in a principled way, and handles
object pose tracking under heavy occlusion. Finally, 3DNEL provides a
principled framework for incorporating prior knowledge about the scene and
objects, which allows natural extension to additional tasks like camera pose
tracking from video.
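Below is a minimal sketch of a 3DNEL-style per-pixel score, assuming unit-norm RGB embeddings for observed and rendered pixels, a Gaussian depth-agreement term, and a uniform outlier mixture component. The names, constants, and functional form are assumptions for illustration, not the paper's actual likelihood.

```python
import numpy as np

def pixel_log_score(emb_obs, emb_rend, depth_obs, depth_rend,
                    sigma_depth=0.01, p_outlier=0.05):
    # emb_obs, emb_rend: (..., D) unit-norm embeddings of the observed
    # and rendered RGB pixels; depths in meters.
    sim = np.sum(emb_obs * emb_rend, axis=-1)          # cosine similarity
    depth_term = np.exp(-0.5 * ((depth_obs - depth_rend) / sigma_depth) ** 2)
    inlier = np.exp(sim) * depth_term                  # unnormalized inlier score
    # Mixing in a uniform outlier component bounds the score from below,
    # which is what keeps heavily occluded pixels from dominating.
    return np.log((1.0 - p_outlier) * inlier + p_outlier)
```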