    Text-Guided Molecule Generation with Diffusion Language Model

    Full text link
    Text-guided molecule generation is a task where molecules are generated to match specific textual descriptions. Most existing SMILES-based molecule generation methods rely on an autoregressive architecture. In this work, we propose the Text-Guided Molecule Generation with Diffusion Language Model (TGM-DLM), a novel approach that leverages diffusion models to address the limitations of autoregressive methods. TGM-DLM updates token embeddings within the SMILES string collectively and iteratively, using a two-phase diffusion generation process. The first phase optimizes embeddings from random noise, guided by the text description, while the second phase corrects invalid SMILES strings to form valid molecular representations. We demonstrate that TGM-DLM outperforms MolT5-Base, an autoregressive model, without the need for additional data resources. Our findings underscore the remarkable effectiveness of TGM-DLM in generating coherent and precise molecules with specific properties, opening new avenues in drug discovery and related scientific domains. Code will be released at: https://github.com/Deno-V/tgm-dlm. (Comment: Accepted by the 38th Conference of the Association for the Advancement of Artificial Intelligence, AAAI.)
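
    To make the two-phase process above concrete, here is a minimal, hedged sketch of such a generation loop in PyTorch. All names (denoiser, text_emb, correction_steps, the final rounding step) are hypothetical placeholders, not the actual API of the TGM-DLM repository.

    import torch

    @torch.no_grad()
    def generate_smiles_embeddings(denoiser, text_emb, seq_len, dim,
                                   total_steps=1000, correction_steps=100):
        """Iteratively refine all SMILES token embeddings at once (sketch)."""
        x = torch.randn(1, seq_len, dim)  # start from pure Gaussian noise

        # Phase 1: text-guided denoising, t = total_steps - 1 down to correction_steps.
        for t in reversed(range(correction_steps, total_steps)):
            x = denoiser(x, torch.tensor([t]), cond=text_emb)  # predict a less noisy embedding

        # Phase 2: correction-only steps intended to repair invalid SMILES syntax.
        for t in reversed(range(correction_steps)):
            x = denoiser(x, torch.tensor([t]), cond=None)

        # A final rounding / nearest-embedding step would map x back to discrete tokens.
        return x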

    StageInteractor: Query-based Object Detector with Cross-stage Interaction

    Full text link
    Previous object detectors make predictions based on dense grid points or numerous preset anchors, and most of them are trained with one-to-many label assignment strategies. In contrast, recent query-based object detectors depend on a sparse set of learnable queries and a series of decoder layers, with one-to-one label assignment applied independently at each layer for deep supervision during training. Despite the great success of query-based object detection, this one-to-one label assignment strategy demands strong fine-grained discrimination and modeling capacity from the detectors. To solve these problems, we propose a new query-based object detector with cross-stage interaction, coined StageInteractor. During forward propagation, we improve the modeling ability efficiently by reusing dynamic operators with lightweight adapters. For label assignment, a cross-stage label assigner is applied subsequent to the one-to-one label assignment: the training target class labels are gathered across stages and then reallocated to proper predictions at each decoder layer. On the MS COCO benchmark, our model improves the baseline by 2.2 AP, achieving 44.8 AP with a ResNet-50 backbone, 100 queries, and 12 training epochs. With longer training and 300 queries, StageInteractor achieves 51.1 AP and 52.2 AP with ResNeXt-101-DCN and Swin-S, respectively.
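
    The cross-stage label assignment described above can be sketched, under our own simplifying assumptions, as pooling the one-to-one targets from all decoder stages and reallocating them to each stage's predictions when they match well enough. The data layout, matching-score dictionary, and 0.5 threshold below are purely illustrative and not taken from the paper or its code.

    from collections import defaultdict

    def cross_stage_assign(per_stage_matches, per_stage_scores):
        """
        per_stage_matches: list over stages of {query_idx: gt_label} from one-to-one matching.
        per_stage_scores:  list over stages of {(query_idx, gt_label): matching score}.
        Returns, per stage, a (possibly enriched) set of training targets.
        """
        # 1. Gather the labels assigned to each query at any stage.
        pooled = defaultdict(set)
        for matches in per_stage_matches:
            for q, label in matches.items():
                pooled[q].add(label)

        # 2. Reallocate: a stage keeps a pooled label for query q only if its own
        #    prediction matches that label well enough.
        reallocated = []
        for matches, scores in zip(per_stage_matches, per_stage_scores):
            targets = dict(matches)  # always keep the stage's own one-to-one targets
            for q, labels in pooled.items():
                for label in labels:
                    if q not in targets and scores.get((q, label), 0.0) > 0.5:
                        targets[q] = label
            reallocated.append(targets)
        return reallocated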

    Learning Optical Flow and Scene Flow with Bidirectional Camera-LiDAR Fusion

    Full text link
    In this paper, we study the problem of jointly estimating optical flow and scene flow from synchronized 2D and 3D data. Previous methods either employ a complex pipeline that splits the joint task into independent stages, or fuse 2D and 3D information in an "early-fusion" or "late-fusion" manner. Such one-size-fits-all approaches face a dilemma: they fail to fully exploit the characteristics of each modality or to maximize inter-modality complementarity. To address this problem, we propose a novel end-to-end framework consisting of 2D and 3D branches with multiple bidirectional fusion connections between them at specific layers. Unlike previous work, we apply a point-based 3D branch to extract LiDAR features, as it preserves the geometric structure of point clouds. To fuse dense image features and sparse point features, we propose a learnable operator named the bidirectional camera-LiDAR fusion module (Bi-CLFM). We instantiate two types of bidirectional fusion pipelines, one based on a pyramidal coarse-to-fine architecture (dubbed CamLiPWC), and the other based on recurrent all-pairs field transforms (dubbed CamLiRAFT). On FlyingThings3D, both CamLiPWC and CamLiRAFT surpass all existing methods and achieve up to a 47.9% reduction in 3D end-point error over the best published result. Our best-performing model, CamLiRAFT, achieves an error of 4.26% on the KITTI Scene Flow benchmark, ranking 1st among all submissions with far fewer parameters. Besides, our methods have strong generalization performance and the ability to handle non-rigid motion. Code is available at https://github.com/MCG-NJU/CamLiFlow. (Comment: Accepted to TPAMI 2023.)
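
    As an illustration of bidirectional camera-LiDAR fusion (image features sampled onto points, point features scattered back onto the image grid), here is a hedged sketch assuming points already expressed in the camera frame and known intrinsics K. It only conveys the idea of a Bi-CLFM-style operator and is not the implementation from the CamLiFlow repository.

    import torch
    import torch.nn.functional as F

    def fuse_bidirectional(img_feat, pts_xyz, pts_feat, K):
        """
        img_feat: (1, C, H, W) dense image features
        pts_xyz:  (N, 3) LiDAR points in camera coordinates (z > 0)
        pts_feat: (N, C) sparse point features
        K:        (3, 3) camera intrinsics
        """
        _, C, H, W = img_feat.shape

        # Image -> point: project each point and bilinearly sample image features.
        uvw = (K @ pts_xyz.T).T                        # (N, 3)
        uv = uvw[:, :2] / uvw[:, 2:3]                  # pixel coordinates
        grid = uv.clone()
        grid[:, 0] = uv[:, 0] / (W - 1) * 2 - 1        # normalize to [-1, 1]
        grid[:, 1] = uv[:, 1] / (H - 1) * 2 - 1
        sampled = F.grid_sample(img_feat, grid.view(1, -1, 1, 2), align_corners=True)
        pts_fused = pts_feat + sampled.view(C, -1).T   # add image context to points

        # Point -> image: scatter point features back onto their pixel locations.
        img_fused = img_feat.clone()
        u = uv[:, 0].round().long().clamp(0, W - 1)
        v = uv[:, 1].round().long().clamp(0, H - 1)
        img_fused[0, :, v, u] = img_fused[0, :, v, u] + pts_feat.T

        return img_fused, pts_fused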

    SparseBEV: High-Performance Sparse 3D Object Detection from Multi-Camera Videos

    Full text link
    Camera-based 3D object detection in BEV (bird's-eye view) space has drawn great attention over the past few years. Dense detectors typically follow a two-stage pipeline, first constructing a dense BEV feature and then performing object detection in BEV space, which suffers from complex view transformations and high computation cost. Sparse detectors, on the other hand, follow a query-based paradigm without explicit dense BEV feature construction, but achieve worse performance than their dense counterparts. In this paper, we find that the key to mitigating this performance gap is the adaptability of the detector in both BEV and image space. To achieve this goal, we propose SparseBEV, a fully sparse 3D object detector that outperforms the dense counterparts. SparseBEV contains three key designs: (1) scale-adaptive self-attention to aggregate features with adaptive receptive fields in BEV space, (2) adaptive spatio-temporal sampling to generate sampling locations under the guidance of queries, and (3) adaptive mixing to decode the sampled features with dynamic weights generated from the queries. On the test split of nuScenes, SparseBEV achieves state-of-the-art performance of 67.5 NDS. On the val split, SparseBEV achieves 55.8 NDS while maintaining a real-time inference speed of 23.5 FPS. Code is available at https://github.com/MCG-NJU/SparseBEV. (Comment: Accepted to ICCV 2023.)
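
    Design (3), adaptive mixing, decodes sampled features with weights generated from each query. A minimal sketch of that idea is given below; the module name, dimensions, and the two einsum mixes are our own illustrative choices, not the exact SparseBEV code.

    import torch
    import torch.nn as nn

    class AdaptiveMixing(nn.Module):
        """Query-conditioned mixing of sampled features (illustrative sketch)."""
        def __init__(self, dim=128, num_points=8):
            super().__init__()
            # Each query dynamically predicts its own channel- and point-mixing matrices.
            self.gen_channel_mix = nn.Linear(dim, dim * dim)
            self.gen_point_mix = nn.Linear(dim, num_points * num_points)
            self.out = nn.Linear(num_points * dim, dim)

        def forward(self, query, sampled):
            # query:   (B, Q, dim)     one embedding per object query
            # sampled: (B, Q, P, dim)  features sampled at P adaptive locations
            B, Q, P, D = sampled.shape
            Wc = self.gen_channel_mix(query).view(B, Q, D, D)
            Wp = self.gen_point_mix(query).view(B, Q, P, P)
            x = torch.einsum('bqpd,bqde->bqpe', sampled, Wc)  # dynamic channel mixing
            x = torch.einsum('bqrp,bqpd->bqrd', Wp, x)        # dynamic point (spatial) mixing
            return query + self.out(x.flatten(2))             # residual update of the query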

    Numerical investigation on the propulsive performance of flexible flapping fins using CFD/CSD method

    Get PDF
    An FSI (fluid-structure interaction) numerical simulation was performed to investigate the flow field around a flexible flapping fin using an in-house CFD/CSD solver. The three-dimensional fluid-structure interaction of the flapping locomotion was achieved by loosely coupling preconditioned Unsteady Reynolds-Averaged Navier-Stokes (URANS) solutions and non-linear co-rotational structural solutions. The CSD solver was developed specifically for highly flexible flapping fins by accounting for their large geometric nonlinearities. Validation against benchmark tests illustrated the high fidelity of the developed methodology. The effects of flexural angle, flexural amplitude, and flapping frequency, expressed in terms of the Strouhal number, were then evaluated. Results demonstrate that different flexural angles produce different flow fields, and thus significantly different thrust generation and pressure distributions. The thrust does not increase monotonically with flexural angle. The thrust also increases with increasing Strouhal number, while propulsive efficiency peaks within the range 0.2 < St < 0.4, which lies in the middle of the range observed in nature. An appropriate combination of flexibility and Strouhal number yields higher efficiency and provides guidance for the further design of flexible flapping fins.
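
    For reference, the Strouhal number used to characterize flapping propulsion is commonly defined as St = f A / U, with f the flapping frequency, A the peak-to-peak amplitude, and U the forward speed. The small helper below simply evaluates this definition and checks the efficient 0.2 < St < 0.4 band reported above; the input values are made up for illustration.

    def strouhal_number(frequency_hz: float, amplitude_m: float, speed_m_s: float) -> float:
        # St = f * A / U
        return frequency_hz * amplitude_m / speed_m_s

    st = strouhal_number(frequency_hz=2.0, amplitude_m=0.05, speed_m_s=0.4)
    print(f"St = {st:.2f}, in efficient range: {0.2 < st < 0.4}")  # St = 0.25, in efficient range: True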

    Localization and Completion for 3D Object Interactions

    Get PDF

    LinK: Linear Kernel for LiDAR-based 3D Perception

    Full text link
    Extending the success of 2D large kernels to 3D perception is challenging due to (1) the cubically increasing overhead of processing 3D data and (2) the optimization difficulties caused by data scarcity and sparsity. Previous work has taken a first step toward scaling up the kernel size from 3x3x3 to 7x7x7 by introducing block-shared weights. However, to reduce feature variations within a block, it employs only a modest block size and fails to reach larger kernels such as 21x21x21. To address this issue, we propose a new method, called LinK, to achieve a wider receptive field for perception in a convolution-like manner with two core designs. The first is to replace the static kernel matrix with a linear kernel generator, which adaptively provides weights only for non-empty voxels. The second is to reuse pre-computed aggregation results in overlapping blocks to reduce computation complexity. The proposed method enables each voxel to perceive context within a range of 21x21x21. Extensive experiments on two basic perception tasks, 3D object detection and 3D semantic segmentation, demonstrate the effectiveness of our method. Notably, we rank 1st on the public leaderboard of the nuScenes 3D detection benchmark (LiDAR track) by simply incorporating a LinK-based backbone into the basic detector CenterPoint. We also boost the strong segmentation baseline's mIoU by 2.7% on the SemanticKITTI test set. Code is available at https://github.com/MCG-NJU/LinK. (Comment: Accepted to CVPR 2023.)
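
    The first core design, a linear kernel generator that assigns weights only to non-empty voxels, together with per-block aggregation that overlapping kernels can reuse, might look roughly like the following. Every name, shape, and the sigmoid weighting here are our own assumptions for illustration; the official implementation is in the linked repository.

    import torch
    import torch.nn as nn

    class LinearKernelAggregation(nn.Module):
        def __init__(self, dim=64, block_size=7):
            super().__init__()
            self.block_size = block_size
            self.weight_gen = nn.Linear(3, dim)  # offset (dx, dy, dz) -> per-channel weight

        def forward(self, coords, feats):
            # coords: (N, 3) integer voxel coordinates of the non-empty voxels
            # feats:  (N, dim) features of those voxels
            block_ids = coords // self.block_size                     # block each voxel falls into
            offsets = (coords % self.block_size).float() / self.block_size
            weights = torch.sigmoid(self.weight_gen(offsets))         # weights only for occupied voxels
            weighted = feats * weights

            # Pre-compute one aggregation per block; overlapping larger kernels can
            # reuse these block sums instead of revisiting every voxel.
            uniq, inv = torch.unique(block_ids, dim=0, return_inverse=True)
            block_sum = torch.zeros(uniq.size(0), feats.size(1))
            block_sum.index_add_(0, inv, weighted)
            return uniq, block_sum  # per-block aggregated context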

    Heterogeneous Graph Reasoning for Fact Checking over Texts and Tables

    Full text link
    Fact checking aims to predict claim veracity by reasoning over multiple evidence pieces, and usually involves evidence retrieval and veracity reasoning. In this paper, we focus on the latter: reasoning over unstructured text and structured table information. Previous works have primarily relied on fine-tuning pretrained language models or training homogeneous-graph-based models. Despite their effectiveness, we argue that they fail to exploit the rich semantic information underlying evidence with different structures. To address this, we propose a novel word-level Heterogeneous-graph-based model for Fact Checking over unstructured and structured information, namely HeterFC. Our approach leverages a heterogeneous evidence graph, with words as nodes and carefully designed edges representing different evidence properties. We perform information propagation via a relational graph neural network, facilitating interactions between claims and evidence. An attention-based method is used to integrate information, combined with a language model for generating predictions. We introduce a multitask loss function to account for potential inaccuracies in evidence retrieval. Comprehensive experiments on the large fact checking dataset FEVEROUS demonstrate the effectiveness of HeterFC. Code will be released at: https://github.com/Deno-V/HeterFC. (Comment: Accepted by the 38th Conference of the Association for the Advancement of Artificial Intelligence, AAAI.)
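
    As a rough illustration of relational message passing over a word-level heterogeneous evidence graph, the layer below uses one linear transformation per edge type (e.g. same-sentence, same-table-row, or claim-evidence overlap edges). The relation set and layer structure are our assumptions, not the released HeterFC model.

    import torch
    import torch.nn as nn

    class RelationalGraphLayer(nn.Module):
        def __init__(self, dim, num_relations):
            super().__init__()
            self.rel_linears = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_relations)])
            self.self_linear = nn.Linear(dim, dim)

        def forward(self, x, edge_index, edge_type):
            # x:          (N, dim) word-node embeddings (e.g. from a language model)
            # edge_index: (2, E)   source/target node indices
            # edge_type:  (E,)     relation id per edge
            out = self.self_linear(x)
            src, dst = edge_index
            for r, lin in enumerate(self.rel_linears):
                mask = edge_type == r
                if mask.any():
                    msgs = lin(x[src[mask]])            # transform messages per relation
                    out.index_add_(0, dst[mask], msgs)  # aggregate onto target word nodes
            return torch.relu(out)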