115 research outputs found
Text-Guided Molecule Generation with Diffusion Language Model
Text-guided molecule generation is the task of generating molecules that
match a given textual description. Most existing SMILES-based molecule
generation methods rely on an autoregressive architecture. In this
work, we propose the Text-Guided Molecule Generation with Diffusion Language
Model (TGM-DLM), a novel approach that leverages diffusion models to address
the limitations of autoregressive methods. TGM-DLM updates token embeddings
within the SMILES string collectively and iteratively, using a two-phase
diffusion generation process. The first phase optimizes embeddings from random
noise, guided by the text description, while the second phase corrects invalid
SMILES strings to form valid molecular representations. We demonstrate that
TGM-DLM outperforms MolT5-Base, an autoregressive model, without the need for
additional data resources. Our findings underscore the remarkable effectiveness
of TGM-DLM in generating coherent and precise molecules with specific
properties, opening new avenues in drug discovery and related scientific
domains. Code will be released at: https://github.com/Deno-V/tgm-dlm.
Comment: Accepted by the 38th AAAI Conference on Artificial Intelligence
(AAAI 2024).
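As an illustration of the two-phase generation described above, the sketch below shows a plausible sampling loop; the denoising networks, step counts, and embedding shapes are assumptions for illustration, not the released implementation.

```python
# Hypothetical sketch of TGM-DLM's two-phase sampling loop (not the
# authors' code). denoise_guided / denoise_correction stand in for the
# learned denoising networks; step counts and shapes are illustrative.
import torch

def generate(denoise_guided, denoise_correction, text_emb,
             seq_len=128, dim=32, t_correct=200, t_total=2000):
    # Phase 1: refine the whole sequence of token embeddings from
    # Gaussian noise, conditioned on the text description.
    x = torch.randn(1, seq_len, dim)
    for t in reversed(range(t_correct, t_total)):
        x = denoise_guided(x, t, text_emb)   # text-guided denoising step

    # Phase 2: correction steps that push the embeddings toward
    # valid SMILES token sequences (run without text guidance).
    for t in reversed(range(t_correct)):
        x = denoise_correction(x, t)
    return x  # mapped to discrete SMILES tokens downstream
```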
StageInteractor: Query-based Object Detector with Cross-stage Interaction
Previous object detectors make predictions based on dense grid points or
numerous preset anchors. Most of these detectors are trained with one-to-many
label assignment strategies. In contrast, recent query-based object
detectors depend on a sparse set of learnable queries and a series of decoder
layers. One-to-one label assignment is applied independently at each layer
for deep supervision during training. Despite the great success of
query-based object detection, this one-to-one label assignment strategy
demands strong fine-grained discrimination and modeling capacity from the
detector. To address these problems, we propose a new
query-based object detector with cross-stage interaction, coined as
StageInteractor. During forward propagation, we introduce an efficient
way to improve this modeling ability by reusing dynamic operators with
lightweight adapters. As for the label assignment, a cross-stage label assigner
is applied subsequent to the one-to-one label assignment. With this assigner,
the training target class labels are gathered across stages and then
reallocated to proper predictions at each decoder layer. On MS COCO benchmark,
our model improves the baseline by 2.2 AP, and achieves 44.8 AP with ResNet-50
as backbone, 100 queries and 12 training epochs. With longer training time and
300 queries, StageInteractor achieves 51.1 AP and 52.2 AP with ResNeXt-101-DCN
and Swin-S, respectively.
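The cross-stage label assigner can be pictured with a toy sketch like the following, in which positive class labels found by the one-to-one assignment at any stage are pooled and then re-allocated to each layer's highest-scoring query; the data layout and the simple argmax re-allocation are assumptions, not the paper's exact rule.

```python
# Toy sketch of a cross-stage label assigner (data layout and the simple
# argmax re-allocation are assumptions, not the paper's exact rule).
import torch

def cross_stage_assign(scores_per_stage, one2one_per_stage):
    # scores_per_stage: list of [num_queries, num_classes] logits,
    #   one entry per decoder layer.
    # one2one_per_stage: list of {query_idx: class_id} dicts from the
    #   per-layer one-to-one assignment.
    pooled = {}
    for targets in one2one_per_stage:   # gather targets across stages
        pooled.update(targets)

    assignments = []
    for scores in scores_per_stage:     # reallocate per decoder layer
        stage_assign = {}
        for cls in pooled.values():
            # hand the label to this layer's highest-scoring query
            # (a real assigner would avoid duplicate picks)
            q = scores[:, cls].argmax().item()
            stage_assign[q] = cls
        assignments.append(stage_assign)
    return assignments
```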
Learning Optical Flow and Scene Flow with Bidirectional Camera-LiDAR Fusion
In this paper, we study the problem of jointly estimating the optical flow
and scene flow from synchronized 2D and 3D data. Previous methods either employ
a complex pipeline that splits the joint task into independent stages, or fuse
2D and 3D information in an "early-fusion" or "late-fusion" manner. Such
one-size-fits-all approaches face a dilemma: they either fail to fully
exploit the characteristics of each modality or fail to maximize
inter-modality complementarity. To address this problem, we propose a novel
end-to-end
framework, which consists of 2D and 3D branches with multiple bidirectional
fusion connections between them in specific layers. Different from previous
work, we apply a point-based 3D branch to extract the LiDAR features, as it
preserves the geometric structure of point clouds. To fuse dense image features
and sparse point features, we propose a learnable operator named bidirectional
camera-LiDAR fusion module (Bi-CLFM). We instantiate two types of the
bidirectional fusion pipeline, one based on the pyramidal coarse-to-fine
architecture (dubbed CamLiPWC), and the other one based on the recurrent
all-pairs field transforms (dubbed CamLiRAFT). On FlyingThings3D, both CamLiPWC
and CamLiRAFT surpass all existing methods and achieve up to a 47.9%
reduction in 3D end-point error over the best published result. Our
best-performing model, CamLiRAFT, achieves an error of 4.26% on the KITTI
Scene Flow benchmark, ranking 1st among all submissions with far fewer
parameters.
Besides, our methods have strong generalization performance and the ability to
handle non-rigid motion. Code is available at
https://github.com/MCG-NJU/CamLiFlow.
Comment: Accepted to TPAMI 2023.
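A minimal sketch of bidirectional fusion between a dense image feature map and sparse point features might look as follows, assuming points are already projected into normalized image coordinates; the learned fusion layers of the actual Bi-CLFM are reduced here to simple additions.

```python
# Minimal bidirectional camera-LiDAR fusion sketch (Bi-CLFM reduced to
# simple additions; projection and learned fusion layers are assumed).
import torch
import torch.nn.functional as F

def bidirectional_fuse(img_feat, pts_feat, uv):
    # img_feat: [1, C, H, W] dense image features
    # pts_feat: [1, C, N] sparse point features
    # uv:       [1, N, 2] point projections, normalized to [-1, 1]

    # Image -> point: bilinearly sample image features at each point.
    sampled = F.grid_sample(img_feat, uv.unsqueeze(1),
                            align_corners=True)        # [1, C, 1, N]
    pts_out = pts_feat + sampled.squeeze(2)

    # Point -> image: scatter point features to their nearest pixels
    # (duplicate hits are not accumulated in this simplified version).
    _, C, H, W = img_feat.shape
    ix = ((uv[..., 0] + 1) / 2 * (W - 1)).round().long().clamp(0, W - 1)
    iy = ((uv[..., 1] + 1) / 2 * (H - 1)).round().long().clamp(0, H - 1)
    img_out = img_feat.clone()
    img_out[0, :, iy[0], ix[0]] += pts_feat[0]
    return img_out, pts_out
```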
SparseBEV: High-Performance Sparse 3D Object Detection from Multi-Camera Videos
Camera-based 3D object detection in BEV (Bird's Eye View) space has drawn
great attention over the past few years. Dense detectors typically follow a
two-stage pipeline by first constructing a dense BEV feature and then
performing object detection in BEV space, which suffers from complex view
transformations and high computation cost. On the other hand, sparse detectors
follow a query-based paradigm without explicit dense BEV feature construction,
but achieve worse performance than the dense counterparts. In this paper, we
find that the key to mitigating this performance gap is the adaptability of the
detector in both BEV and image space. To achieve this goal, we propose
SparseBEV, a fully sparse 3D object detector that outperforms the dense
counterparts. SparseBEV contains three key designs, which are (1)
scale-adaptive self-attention to aggregate features with an adaptive receptive
field in BEV space, (2) adaptive spatio-temporal sampling to generate sampling
locations under the guidance of queries, and (3) adaptive mixing to decode the
sampled features with dynamic weights from the queries. On the test split of
nuScenes, SparseBEV achieves the state-of-the-art performance of 67.5 NDS. On
the val split, SparseBEV achieves 55.8 NDS while maintaining a real-time
inference speed of 23.5 FPS. Code is available at
https://github.com/MCG-NJU/SparseBEV.
Comment: Accepted to ICCV 2023.
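The scale-adaptive self-attention idea can be sketched as standard attention whose logits are penalized by pairwise BEV distances, scaled by a learned per-query coefficient; the exact formulation in SparseBEV may differ, so treat `tau` and the shapes below as illustrative assumptions.

```python
# Hedged sketch of scale-adaptive self-attention: attention logits are
# penalized by pairwise BEV distances, scaled by a learned per-query
# coefficient tau (the exact SparseBEV formulation may differ).
import torch
import torch.nn.functional as F

def scale_adaptive_attn(q, k, v, centers, tau):
    # q, k, v:  [N, d] features of N object queries
    # centers:  [N, 2] BEV (x, y) center of each query
    # tau:      [N, 1] learned receptive-field coefficient per query
    d = q.shape[-1]
    logits = q @ k.t() / d ** 0.5            # standard attention logits
    dist = torch.cdist(centers, centers)     # [N, N] pairwise distances
    logits = logits - tau * dist             # large tau => local focus
    return F.softmax(logits, dim=-1) @ v
```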
Numerical investigation on the propulsive performance of flexible flapping fins using CFD/CSD method
An FSI (fluid-structure interaction) numerical simulation was performed to investigate the flow field around a flexible flapping fin using an in-house CFD/CSD solver. The three-dimensional fluid-structure interaction of the flapping locomotion was achieved by loosely coupling preconditioned Unsteady Reynolds-Averaged Navier-Stokes (URANS) solutions with nonlinear co-rotational structural solutions. The CSD solver was developed specifically for highly flexible flapping fins by accounting for their large geometrically nonlinear deformations. Validation against benchmark tests illustrated the high fidelity of the developed methodology. The effects of flexural angle, flexural amplitude, and flapping frequency, expressed as the Strouhal number, were then evaluated. Results demonstrated that different flexural angles produce different flow fields, and thus significantly different thrust generation and pressure distributions. The thrust does not increase monotonically with flexural angle. Thrust is also found to increase with increasing Strouhal number, while propulsive efficiency peaks within the range 0.2 < St < 0.4, which lies in the middle of the range observed in nature. An appropriate combination of flexibility and Strouhal number yields higher efficiency and provides guidance for the further design of flexible flapping fins.
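For reference, the Strouhal number here is the standard non-dimensional flapping frequency St = fA/U; a quick illustrative computation (values chosen arbitrarily, not taken from the study) is shown below.

```python
# Illustrative check of the Strouhal number St = f * A / U
# (f: flapping frequency, A: peak-to-peak tip amplitude, U: forward
# speed). The numbers below are arbitrary, not from the study.
f = 2.0    # Hz
A = 0.05   # m
U = 0.4    # m/s
St = f * A / U
print(f"St = {St:.2f}")  # 0.25, inside the efficient 0.2 < St < 0.4 band
```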
LinK: Linear Kernel for LiDAR-based 3D Perception
Extending the success of 2D Large Kernel to 3D perception is challenging due
to: 1. the cubically-increasing overhead in processing 3D data; 2. the
optimization difficulties from data scarcity and sparsity. Previous work has
taken the first step to scale up the kernel size from 3x3x3 to 7x7x7 by
introducing block-shared weights. However, to limit feature variation
within a block, it employs only a modest block size and fails to scale to
larger kernels such as 21x21x21. To address this issue, we propose a new
method,
called LinK, to achieve a wider-range perception receptive field in a
convolution-like manner with two core designs. The first is to replace the
static kernel matrix with a linear kernel generator, which adaptively provides
weights only for non-empty voxels. The second is to reuse the pre-computed
aggregation results in the overlapped blocks to reduce computation complexity.
The proposed method successfully enables each voxel to perceive context within
a range of 21x21x21. Extensive experiments on two basic perception tasks, 3D
object detection and 3D semantic segmentation, demonstrate the effectiveness of
our method. Notably, we rank 1st on the public leaderboard of the 3D detection
benchmark of nuScenes (LiDAR track), by simply incorporating a LinK-based
backbone into the basic detector, CenterPoint. We also boost a strong
segmentation baseline's mIoU by 2.7% on the SemanticKITTI test set. Code is
available at https://github.com/MCG-NJU/LinK.
Comment: Accepted to CVPR 2023.
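The linear kernel generator can be sketched as a small MLP that maps each non-empty neighbor's relative offset to a weight, so no dense 21x21x21 weight tensor is ever materialized; the module below is a simplified assumption, not the released code, and omits the block-level reuse of pre-computed aggregations.

```python
# Simplified linear-kernel module (an assumption-laden sketch, not the
# released LinK code): a small MLP maps each non-empty neighbor's
# relative offset to a weight, so no dense 21x21x21 kernel is stored.
import torch
import torch.nn as nn

class LinearKernel(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gen = nn.Sequential(nn.Linear(3, dim), nn.ReLU(),
                                 nn.Linear(dim, dim))

    def forward(self, feats, coords, center):
        # feats:  [M, dim] features of the M non-empty voxels in a block
        # coords: [M, 3] their integer voxel coordinates
        # center: [3] coordinate of the output voxel
        offsets = (coords - center).float()
        w = self.gen(offsets)            # weights generated per neighbor
        return (w * feats).sum(dim=0)    # convolution-like aggregation
```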
Heterogeneous Graph Reasoning for Fact Checking over Texts and Tables
Fact checking aims to predict claim veracity by reasoning over multiple
evidence pieces. It usually involves evidence retrieval and veracity reasoning.
In this paper, we focus on the latter, reasoning over unstructured text and
structured table information. Previous works have primarily relied on
fine-tuning pretrained language models or training homogeneous-graph-based
models. Despite their effectiveness, we argue that they fail to exploit the
rich semantic information underlying evidence of different structures. To
address this, we propose a novel word-level Heterogeneous-graph-based model for
Fact Checking over unstructured and structured information, namely HeterFC. Our
approach leverages a heterogeneous evidence graph, with words as nodes and
thoughtfully designed edges representing different evidence properties. We
perform information propagation via a relational graph neural network,
facilitating interactions between claims and evidence. An attention-based
method is utilized to integrate information, combined with a language model for
generating predictions. We introduce a multitask loss function to account for
potential inaccuracies in evidence retrieval. Comprehensive experiments on the
large fact checking dataset FEVEROUS demonstrate the effectiveness of HeterFC.
Code will be released at: https://github.com/Deno-V/HeterFC.
Comment: Accepted by the 38th AAAI Conference on Artificial Intelligence
(AAAI 2024).
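A toy relational message-passing layer over such a word-level heterogeneous graph could look like the following, with one linear transform per edge type as in a relational GCN; the edge types named in the comments are assumptions, and HeterFC's actual propagation may differ.

```python
# Toy relational message-passing layer over a word-level heterogeneous
# graph (edge types in the comments are assumptions; HeterFC's actual
# propagation may differ).
import torch
import torch.nn as nn

class RelationalLayer(nn.Module):
    def __init__(self, dim, num_relations):
        super().__init__()
        self.rel = nn.ModuleList(nn.Linear(dim, dim)
                                 for _ in range(num_relations))
        self.self_loop = nn.Linear(dim, dim)

    def forward(self, h, edges):
        # h: [N, dim] word-node features
        # edges: list of (src, dst, rel) triples, e.g. rel 0 =
        #   same-sentence, rel 1 = same-table-row, rel 2 = claim-evidence
        out = self.self_loop(h)
        for s, t, r in edges:
            out[t] = out[t] + self.rel[r](h[s])  # typed message
        return torch.relu(out)
```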