Diffusion-based Molecule Generation with Informative Prior Bridges
AI-based molecule generation offers a promising approach to a wide range of
problems in biomedical science and engineering, such as antibody design,
hydrolase engineering, and vaccine development. Because molecules are governed by
physical laws, a key challenge is to incorporate prior information into the
training procedure to generate high-quality and realistic molecules. We propose
a simple and novel approach to steer the training of diffusion-based generative
models with physical and statistical prior information. This is achieved by
constructing physically informed diffusion bridges, stochastic processes that
are guaranteed to yield a given observation at a fixed terminal time. We
develop a Lyapunov-function-based method to construct and determine bridges,
and present a number of informative prior bridges for both high-quality
molecule generation and uniformity-promoting 3D point cloud generation. With
comprehensive experiments, we show that our method provides a powerful approach
to the 3D generation task, yielding molecule structures with better quality and
stability scores and more uniformly distributed point clouds of high quality.
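As a concrete illustration of the bridge idea, the simplest example is the
Brownian bridge, whose drift term (x* - X_t)/(T - t) pins the process to a
target x* at terminal time T. The sketch below simulates such a bridge with
Euler-Maruyama in NumPy; the step count, noise scale, and target are
illustrative choices, not the paper's physically informed construction.

```python
# A minimal sketch of a Brownian bridge, the simplest example of a diffusion
# bridge: a process guaranteed to hit a target x_star at terminal time T.
import numpy as np

def simulate_bridge(x0, x_star, T=1.0, n_steps=1000, sigma=1.0, seed=0):
    """Euler-Maruyama simulation of dX = (x_star - X)/(T - t) dt + sigma dW."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    for i in range(n_steps - 1):  # stop one step early to avoid dividing by 0
        t = i * dt
        drift = (np.asarray(x_star) - x) / (T - t)  # pulls the path to x_star
        x = x + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
        path.append(x.copy())
    path.append(np.asarray(x_star, dtype=float))  # terminal point is pinned
    return np.stack(path)

# Example: a 3D "atom coordinate" driven from the origin to a target position.
traj = simulate_bridge(x0=[0.0, 0.0, 0.0], x_star=[1.0, -0.5, 2.0])
print(traj[0], traj[-1])  # starts at x0, ends exactly at x_star
```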
Residual Mixture of Experts
Mixture of Experts (MoE) is able to scale up vision transformers effectively.
However, training a large MoE transformer requires prohibitive computational
resources. In this paper, we propose Residual Mixture of Experts (RMoE), an
efficient training pipeline for MoE vision transformers on downstream tasks,
such as segmentation and detection. RMoE achieves results comparable to
upper-bound MoE training, while introducing only minor additional training cost
over the lower-bound non-MoE training pipelines. This efficiency is supported by
our key observation: the weights of an MoE transformer can be factored into an
input-independent core and an input-dependent residual. Compared with the
weight core, the weight residual can be trained efficiently with far less
computation, e.g., by fine-tuning on the downstream data. We show that,
compared with the current MoE training pipeline, we get comparable results
while saving over 30% training cost. When compared with state-of-the-art non-
MoE transformers, such as Swin-T / CvT-13 / Swin-L, we get +1.1 / 0.9 / 1.0
mIoU gains on ADE20K segmentation and +1.4 / 1.6 / 0.6 AP gains on the MS-COCO
object detection task, with less than 3% additional training cost.
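To make the factorization concrete, the sketch below shows one way the
core-plus-residual idea could look in PyTorch: a frozen, shared linear core
plus small trainable per-expert residual corrections with a top-1 router. The
layer sizes, expert count, routing, and freezing scheme are illustrative
assumptions, not the paper's exact architecture.

```python
# A minimal PyTorch sketch of a core + residual expert factorization.
import torch
import torch.nn as nn

class ResidualExperts(nn.Module):
    def __init__(self, dim, num_experts=4):
        super().__init__()
        # Input-independent core: one weight shared by all experts, kept
        # frozen (e.g., taken from upstream pre-training).
        self.core = nn.Linear(dim, dim)
        self.core.weight.requires_grad_(False)
        self.core.bias.requires_grad_(False)
        # Input-dependent residuals: per-expert corrections, initialized to
        # zero; these are the only expert parameters trained downstream.
        self.residuals = nn.Parameter(torch.zeros(num_experts, dim, dim))
        self.router = nn.Linear(dim, num_experts)

    def forward(self, x):                       # x: (batch, dim)
        expert = self.router(x).argmax(dim=-1)  # top-1 routing per token
        delta = self.residuals[expert]          # (batch, dim, dim)
        shared = self.core(x)
        correction = torch.bmm(delta, x.unsqueeze(-1)).squeeze(-1)
        return shared + correction

layer = ResidualExperts(dim=64)
out = layer(torch.randn(8, 64))
print(out.shape)  # torch.Size([8, 64])
```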
Neural Volumetric Mesh Generator
Deep generative models have shown success in generating 3D shapes with
different representations. In this work, we propose Neural Volumetric Mesh
Generator (NVMG), which can generate novel and high-quality volumetric meshes.
Unlike previous 3D generative models for point clouds, voxels, and implicit
surfaces, the volumetric mesh representation is ready to use in industry, with
details on both the surface and the interior. Generating such highly
structured data thus poses a significant challenge. We first propose a
diffusion-based generative model to tackle this problem by generating voxelized
shapes with realistic outlines and structures. From the voxelized shape, we
can directly obtain a tetrahedral mesh to serve as a template. Further, we use a
voxel-conditional neural network to predict the smooth implicit surface
conditioned on the voxels, and progressively project the tetrahedral mesh to
the predicted surface under regularizations. The regularization terms are
carefully designed to (1) eliminate defects such as flipped elements and high
distortion, and (2) enforce regularity of the interior and surface structure
during the deformation procedure, yielding a high-quality final mesh. As
shown in the experiments, our pipeline can generate high-quality artifact-free
volumetric and surface meshes from random noise or a reference image without
any post-processing. Compared with the state-of-the-art voxel-to-mesh
deformation method, our approach is more robust and performs better when
taking generated voxels as input.
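As a rough illustration of the projection stage, the sketch below moves
vertices onto the zero level set of a signed distance function (SDF) with
repeated gradient-based steps; the analytic sphere SDF stands in for the
paper's voxel-conditional network, and the regularization terms are omitted.

```python
# A minimal sketch of projecting vertices onto the zero level set of a
# predicted SDF, one illustrative step of progressive surface projection.
import torch

def project_step(vertices, sdf, step=0.5):
    """One gradient step: v <- v - step * sdf(v) * grad / |grad|^2."""
    v = vertices.detach().requires_grad_(True)
    d = sdf(v)                                   # (N,) signed distances
    (grad,) = torch.autograd.grad(d.sum(), v)    # (N, 3) spatial gradients
    norm_sq = (grad * grad).sum(dim=-1, keepdim=True).clamp_min(1e-8)
    return (v - step * d.unsqueeze(-1) * grad / norm_sq).detach()

# Placeholder SDF of a unit sphere; the real pipeline predicts this field.
sphere_sdf = lambda v: v.norm(dim=-1) - 1.0

verts = torch.randn(100, 3) * 2.0
for _ in range(10):  # repeated small steps mimic "progressive" projection
    verts = project_step(verts, sphere_sdf)
print(sphere_sdf(verts).abs().max())  # near 0: vertices lie on the surface
```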
PathFusion: Path-consistent Lidar-Camera Deep Feature Fusion
Fusing camera data with LiDAR is a promising technique for improving the
accuracy of 3D detection, owing to their complementary physical properties.
While most existing methods fuse camera features directly with raw LiDAR point
clouds or shallow 3D features, we observe that direct deep 3D feature fusion
achieves inferior accuracy due to feature misalignment. The misalignment,
which originates from feature aggregation across large receptive fields,
becomes increasingly severe in deep network stages. In this paper, we propose
PathFusion to enable
path-consistent LiDAR-camera deep feature fusion. PathFusion introduces a path
consistency loss between shallow and deep features, which encourages the 2D
backbone and its fusion path to transform 2D features in a way that is
semantically aligned with the transformation of the 3D backbone. We apply PathFusion
to the prior-art fusion baseline, Focals Conv, and observe consistent
improvements of more than 1.2% mAP on the nuScenes test split, both with and
without test-time augmentation. Moreover, PathFusion also improves KITTI AP3D
(R11) by more than 0.6% on the moderate level.
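The sketch below shows one plausible form of such a path-consistency
objective: features fused at a shallow stage and then transformed
("fuse-then-transform") are pushed to agree with features transformed first
and fused afterwards ("transform-then-fuse"). All modules are hypothetical
stand-ins with toy shapes, not the paper's components.

```python
# A minimal sketch of a path-consistency loss in the spirit of PathFusion.
import torch
import torch.nn.functional as F

def path_consistency_loss(f2d_shallow, deep_2d, deep_3d, fuse):
    """f2d_shallow: shallow camera features; fuse: maps 2D feats to 3D space."""
    # Path A: fuse shallow 2D features into 3D space, then apply deep 3D stages.
    path_a = deep_3d(fuse(f2d_shallow))
    # Path B: apply deep 2D stages first, then fuse into 3D space.
    path_b = fuse(deep_2d(f2d_shallow))
    return F.mse_loss(path_a, path_b)

# Toy stand-ins with matching shapes, just to show the loss is well-formed.
deep_2d = torch.nn.Conv2d(16, 16, 3, padding=1)
deep_3d = torch.nn.Conv2d(32, 32, 3, padding=1)
fuse = torch.nn.Conv2d(16, 32, 1)  # pretend 2D->3D projection keeps the grid
feat = torch.randn(2, 16, 32, 32)
print(path_consistency_loss(feat, deep_2d, deep_3d, fuse))
```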
Communication Efficient Distributed Training with Distributed Lion
The Lion optimizer has emerged as a promising competitor to AdamW for
training large AI models, with advantages in memory, computation, and sample
efficiency. In this paper, we introduce Distributed Lion, an innovative
adaptation of Lion for distributed training environments. Leveraging the sign
operator in Lion, our Distributed Lion only requires communicating binary or
lower-precision vectors between the workers and the central server, significantly
reducing the communication cost. Our theoretical analysis confirms Distributed
Lion's convergence properties. Empirical results demonstrate its robustness
across a range of tasks, worker counts, and batch sizes, on both vision and
language problems. Notably, Distributed Lion attains performance comparable to
the standard Lion or AdamW optimizers applied to aggregated gradients, but with
significantly reduced communication bandwidth. This feature is particularly
advantageous for training large models. In addition, we demonstrate that
Distributed Lion offers a more favorable performance-bandwidth trade-off than
existing communication-efficient distributed methods such as deep gradient
compression and ternary gradients.
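As an illustration of the communication pattern, the NumPy sketch below has
each worker send only the sign of its local Lion update (one bit per
coordinate), with the server aggregating by majority vote. The
hyperparameters and the vote-based aggregation are assumptions read off the
abstract, not a faithful reproduction of the paper's algorithm.

```python
# A minimal sketch of sign-only communication in a distributed Lion setup.
import numpy as np

def worker_step(m, grad, beta1=0.9, beta2=0.99):
    """Local Lion bookkeeping; returns the binary vector to communicate."""
    update = np.sign(beta1 * m + (1 - beta1) * grad)  # in {-1, 0, +1}
    m_new = beta2 * m + (1 - beta2) * grad            # momentum stays local
    return update, m_new

def server_aggregate(binary_updates):
    """Majority vote over workers' sign vectors -> one sign vector to broadcast."""
    return np.sign(np.sum(binary_updates, axis=0))

# Toy run: 4 workers, one parameter vector, synthetic noisy gradients.
rng = np.random.default_rng(0)
theta = rng.standard_normal(8)
moments = [np.zeros(8) for _ in range(4)]
lr = 0.05
for step in range(200):
    sent = []
    for w in range(4):
        grad = theta + 0.1 * rng.standard_normal(8)  # noisy grad of 0.5*|theta|^2
        u, moments[w] = worker_step(moments[w], grad)
        sent.append(u)                                # only signs leave the worker
    theta -= lr * server_aggregate(sent)              # broadcast aggregated signs
print(np.abs(theta).max())  # theta shrinks toward 0 on this toy quadratic
```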