DDSL: Deep Differentiable Simplex Layer for Learning Geometric Signals
We present the Deep Differentiable Simplex Layer (DDSL) for geometric deep
learning. The DDSL is a differentiable layer, compatible with deep neural
networks, that bridges simplex-mesh-based geometry representations (point
clouds, line meshes, triangular meshes, tetrahedral meshes) with raster images
(e.g., 2D/3D grids). The DDSL uses the Non-Uniform Fourier Transform (NUFT) to
perform differentiable, efficient, anti-aliased rasterization of simplex-based
signals. We present a complete theoretical framework for the process as well as
an efficient backpropagation algorithm. Compared to previous differentiable
renderers and rasterizers, the DDSL generalizes to arbitrary simplex degrees
and dimensions. In particular, we explore its applications to 2D shapes and
illustrate two uses of this method: (1) mesh editing and optimization guided
by neural network outputs, and (2) a differentiable rasterization loss that
facilitates end-to-end training of polygon generators. We validate the
effectiveness of gradient-based shape optimization on the example of airfoil
optimization, and, using the differentiable rasterization loss for end-to-end
training, we surpass the state of the art for polygonal image segmentation
given ground-truth bounding boxes.
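
As a concrete illustration of the NUFT idea in its simplest case, degree-0
simplices (a point cloud), the sketch below evaluates the point set's Fourier
spectrum on a regular frequency grid and inverts it with an FFT to obtain a
differentiable, anti-aliased raster. This is a minimal sketch under assumed
conventions (unit-square coordinates, a hypothetical function name and
resolution parameter), not the paper's implementation, which extends the
transform to higher-degree simplices in closed form.

import math
import torch

def nuft_rasterize_points(points, weights, res=32):
    # points: (N, 2) coordinates in [0, 1); weights: (N,) per-point density.
    # Integer frequencies matching a res x res raster grid.
    ku = torch.fft.fftfreq(res, d=1.0 / res)        # (res,)
    kv = torch.fft.rfftfreq(res, d=1.0 / res)       # (res // 2 + 1,)
    K = torch.stack(torch.meshgrid(ku, kv, indexing="ij"), dim=-1)
    # NUFT of a point cloud: F(k) = sum_j w_j * exp(-2*pi*i * <k, x_j>).
    phase = -2j * math.pi * torch.einsum("uvd,nd->uvn", K, points)
    spectrum = (weights.to(torch.cfloat) * torch.exp(phase)).sum(dim=-1)
    # Inverting the band-limited spectrum yields an anti-aliased raster,
    # differentiable with respect to both coordinates and weights.
    return torch.fft.irfft2(spectrum, s=(res, res))

Backpropagating an image-space loss through such a raster produces gradients
on the vertex coordinates themselves, which is the mechanism behind both the
shape-optimization and rasterization-loss applications.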
MeshfreeFlowNet: A Physics-Constrained Deep Continuous Space-Time Super-Resolution Framework
We propose MeshfreeFlowNet, a novel deep learning-based super-resolution
framework to generate continuous (grid-free) spatio-temporal solutions from
low-resolution inputs. While remaining computationally efficient, MeshfreeFlowNet
accurately recovers the fine-scale quantities of interest. MeshfreeFlowNet
allows for: (i) the output to be sampled at all spatio-temporal resolutions,
(ii) a set of Partial Differential Equation (PDE) constraints to be imposed,
and (iii) training on fixed-size inputs on arbitrarily sized spatio-temporal
domains owing to its fully convolutional encoder. We empirically study the
performance of MeshfreeFlowNet on the task of super-resolution of turbulent
flows in the Rayleigh-Bénard convection problem. Across a diverse set of
evaluation metrics, we show that MeshfreeFlowNet significantly outperforms
existing baselines. Furthermore, we provide a large scale implementation of
MeshfreeFlowNet and show that it efficiently scales across large clusters,
achieving 96.80% scaling efficiency on up to 128 GPUs and a training time of
less than 4 minutes.
Comment: Supplementary Video: https://youtu.be/mjqwPch9gDo. Accepted to SC20.
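
To make constraint (ii) concrete, below is a minimal sketch of how a PDE
residual can be imposed on a grid-free decoder through automatic
differentiation. The decoder architecture, the choice of a 2D
incompressibility constraint (du/dx + dv/dy = 0), and all names here are
illustrative assumptions; the actual framework enforces the governing
equations of the Rayleigh-Bénard problem.

import torch

class ContinuousDecoder(torch.nn.Module):
    # Hypothetical MLP decoder: a latent context vector plus a space-time
    # query point (x, y, t) maps to a velocity value, so the output can be
    # sampled at any resolution (the grid-free property).
    def __init__(self, latent_dim=64, hidden=128):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(latent_dim + 3, hidden), torch.nn.SiLU(),
            torch.nn.Linear(hidden, hidden), torch.nn.SiLU(),
            torch.nn.Linear(hidden, 2),
        )

    def forward(self, latent, coords):
        return self.net(torch.cat([latent, coords], dim=-1))

def pde_residual_loss(decoder, latent, coords):
    # Penalize violation of 2D incompressibility, du/dx + dv/dy = 0,
    # at sampled space-time points.
    coords = coords.clone().requires_grad_(True)      # (B, 3): (x, y, t)
    uv = decoder(latent, coords)                      # (B, 2): (u, v)
    du = torch.autograd.grad(uv[:, 0].sum(), coords, create_graph=True)[0]
    dv = torch.autograd.grad(uv[:, 1].sum(), coords, create_graph=True)[0]
    return ((du[:, 0] + dv[:, 1]) ** 2).mean()

Because the decoder is queried pointwise at (x, y, t), the residual can be
evaluated, and the output sampled, at arbitrary spatio-temporal resolutions.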
OpenScene: 3D Scene Understanding with Open Vocabularies
Traditional 3D scene understanding approaches rely on labeled 3D datasets to
train a model for a single task with supervision. We propose OpenScene, an
alternative approach where a model predicts dense features for 3D scene points
that are co-embedded with text and image pixels in CLIP feature space. This
zero-shot approach enables task-agnostic training and open-vocabulary queries.
For example, to perform state-of-the-art zero-shot 3D semantic segmentation,
it first infers CLIP features for every 3D point and then classifies them by
their similarity to the embeddings of arbitrary class labels. More
interestingly, it enables a suite of open-vocabulary scene-understanding
applications that were previously infeasible. For instance, it allows a user to enter an arbitrary
text query and then see a heat map indicating which parts of a scene match. Our
approach is effective at identifying objects, materials, affordances,
activities, and room types in complex 3D scenes, all using a single model
trained without any labeled 3D data.
Comment: CVPR 2023. Project page: https://pengsongyou.github.io/openscene
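
The zero-shot classification step can be sketched in a few lines: given
per-point features co-embedded in CLIP space, label each point by cosine
similarity to the CLIP text embeddings of arbitrary class names. The prompt
template, CLIP variant, and function name below are assumptions; only the
co-embedding-and-similarity mechanism comes from the abstract.

import torch
import clip  # OpenAI CLIP, https://github.com/openai/CLIP

@torch.no_grad()
def open_vocab_labels(point_features, class_names, device="cuda"):
    # point_features: (N, D) per-point features assumed to come from the
    # trained 3D model, already co-embedded in CLIP space.
    model, _ = clip.load("ViT-L/14", device=device)   # CLIP variant is an assumption
    tokens = clip.tokenize([f"a {c} in a scene" for c in class_names]).to(device)
    text = model.encode_text(tokens).float()          # (C, D) text embeddings
    text = text / text.norm(dim=-1, keepdim=True)
    feats = point_features / point_features.norm(dim=-1, keepdim=True)
    return (feats @ text.T).argmax(dim=-1)            # (N,) per-point class index

Dropping the argmax and keeping the similarity column for a single free-form
query yields exactly the per-point heat map described above.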