
    Less is More: Physical-enhanced Radar-Inertial Odometry

    Radar offers the advantage of providing additional physical properties of observed objects. In this study, we design a physical-enhanced radar-inertial odometry system that capitalizes on Doppler velocities and radar cross-section information. The filter for static radar points, the correspondence estimation, and the residual functions are all strengthened by integrating these physical properties. We conduct experiments on both public datasets and our self-collected data, with different mobile platforms and sensor types. Our quantitative results demonstrate that the proposed radar-inertial odometry system with the physical-enhanced components outperforms alternative methods. Our findings also reveal that using the physical properties leaves fewer radar points for odometry estimation, yet performance is maintained and even improved, aligning with the "less is more" principle. Comment: Accepted by ICRA 2024
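
    The abstract does not detail the static-point filter; as one concrete reading, Doppler velocities allow a RANSAC-style check that separates static returns (consistent with a single ego velocity) from moving objects. The sketch below assumes unit direction vectors and measured radial velocities as inputs; the function name, threshold, and iteration count are illustrative assumptions, not the authors' implementation.

        import numpy as np

        def filter_static_points(directions, doppler, n_iters=100, inlier_thresh=0.25, rng=None):
            """RANSAC-style split of radar returns into static points and movers.

            directions : (N, 3) unit vectors from the sensor to each radar point.
            doppler    : (N,) measured radial (Doppler) velocities.
            For a static point, doppler ~ -directions @ v_ego, so points consistent
            with a common ego velocity are treated as static.
            """
            rng = np.random.default_rng() if rng is None else rng
            best_inliers = np.zeros(len(doppler), dtype=bool)
            for _ in range(n_iters):
                idx = rng.choice(len(doppler), size=3, replace=False)
                # Solve directions[idx] @ (-v_ego) = doppler[idx] for v_ego.
                v_ego, *_ = np.linalg.lstsq(-directions[idx], doppler[idx], rcond=None)
                residuals = np.abs(doppler + directions @ v_ego)
                inliers = residuals < inlier_thresh
                if inliers.sum() > best_inliers.sum():
                    best_inliers = inliers
            return best_inliers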

    Denoising Diffusion Semantic Segmentation with Mask Prior Modeling

    The evolution of semantic segmentation has long been dominated by learning more discriminative image representations for classifying each pixel. Despite the prominent advancements, the priors of segmentation masks themselves, e.g., geometric and semantic constraints, remain under-explored. In this paper, we propose to improve the semantic segmentation quality of existing discriminative approaches with a mask prior modeled by a recently developed denoising diffusion generative model. Beginning with a unified architecture that adapts diffusion models for mask prior modeling, we focus this work on a specific instantiation with discrete diffusion and identify a variety of key design choices for its successful application. Our exploratory analysis revealed several important findings, including: (1) a simple integration of diffusion models into semantic segmentation is not sufficient, and a poorly designed diffusion process might lead to degradation in segmentation performance; (2) during training, the object to which noise is added is more important than the type of noise; (3) during inference, the strict diffusion denoising scheme may not be essential and can be relaxed to a simpler scheme that even works better. We evaluate the proposed prior modeling with several off-the-shelf segmentors, and our experimental results on ADE20K and Cityscapes demonstrate that our approach achieves competitive quantitative performance and more appealing visual quality.
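
    As a minimal illustration of adding noise to the mask itself, a uniform discrete diffusion forward step can resample each pixel's label with a timestep-dependent probability. The linear schedule and uniform transition below are assumptions made for the sketch, not the paper's exact design.

        import numpy as np

        def corrupt_mask(mask, t, T, num_classes, rng=None):
            """Forward step of a uniform discrete diffusion process on a label mask.

            With probability depending on the timestep t, each pixel's class label is
            replaced by a label drawn uniformly at random; at t = T the mask is
            (almost) pure noise. This mirrors the idea of corrupting the mask itself
            rather than the image features.
            """
            rng = np.random.default_rng() if rng is None else rng
            corrupt_prob = t / T                      # simple linear schedule (assumption)
            flip = rng.random(mask.shape) < corrupt_prob
            random_labels = rng.integers(0, num_classes, size=mask.shape)
            return np.where(flip, random_labels, mask)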

    Image Reconstruction of Two-Dimensional Highly Scattering Inhomogeneous Medium Using MAP-Based Estimation

    A maximum a posteriori (MAP) estimation based on a Bayesian framework is applied to image reconstruction of a two-dimensional highly scattering inhomogeneous medium. The finite difference method (FDM) and the conjugate gradient (CG) algorithm serve as the forward and inverse solvers, respectively. The generalized Gaussian Markov random field (GGMRF) model is used as the regularization term, and finally the influence of measurement errors and initial distributions is investigated. Through the test cases, the MAP estimation algorithm is demonstrated to greatly improve the reconstruction of the optical coefficients.
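
    The abstract does not reproduce the estimator itself; for reference, a MAP objective with a GGMRF prior typically takes the form below. The Gaussian data-fit term with noise variance \sigma_n^2, the GGMRF scale \sigma and shape p, and the neighborhood weights b_{ij} are notation assumed here, with F denoting the FDM forward model:

        \hat{x} = \arg\max_{x}\, p(y \mid x)\, p(x)
                = \arg\min_{x} \left\{ \frac{1}{2\sigma_n^{2}} \lVert y - F(x) \rVert^{2}
                  + \frac{1}{p\,\sigma^{p}} \sum_{\{i,j\} \in \mathcal{N}} b_{ij}\, \lvert x_i - x_j \rvert^{p} \right\}

    The CG iterations then minimize this objective over the optical coefficients x, given the measurements y.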

    Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures

    Transformers have revolutionized computer vision and natural language processing, but their high computational complexity limits their application in high-resolution image processing and long-context analysis. This paper introduces Vision-RWKV (VRWKV), a model adapted from the RWKV architecture used in NLP, with the modifications necessary for vision tasks. Similar to the Vision Transformer (ViT), our model is designed to efficiently handle sparse inputs and demonstrate robust global processing capabilities, while also scaling up effectively to both large parameter counts and extensive datasets. Its distinctive advantage lies in its reduced spatial aggregation complexity, which makes it exceptionally adept at processing high-resolution images seamlessly, eliminating the need for windowing operations. Our evaluations demonstrate that VRWKV surpasses ViT's performance in image classification and has significantly faster speed and lower memory usage when processing high-resolution inputs. In dense prediction tasks, it outperforms window-based models while maintaining comparable speed. These results highlight VRWKV's potential as a more efficient alternative for visual perception tasks. Code is released at https://github.com/OpenGVLab/Vision-RWKV.
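
    To see why an RWKV-style mixer avoids the quadratic cost of attention, the toy recurrence below carries a decayed running sum over tokens, giving O(N) complexity in sequence length. It is a simplified causal illustration, not the bidirectional Bi-WKV operator used in VRWKV; the names and the decay parameter are assumptions.

        import numpy as np

        def linear_token_mixing(k, v, decay):
            """Toy causal token mixing with a per-channel exponential decay.

            Instead of forming an N x N attention matrix, a running weighted sum of
            values is carried along the token dimension, so the cost is O(N * C)
            rather than O(N^2 * C).
            """
            n, c = v.shape
            num = np.zeros(c)
            den = np.zeros(c)
            out = np.zeros_like(v)
            for t in range(n):
                num = decay * num + np.exp(k[t]) * v[t]
                den = decay * den + np.exp(k[t])
                out[t] = num / (den + 1e-8)
            return out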

    Ethnicity, Stigma and Adherence to Antiretroviral Therapy (ART) among People Living with HIV/AIDS in Guangxi, China

    This study examines the impact of ethnicity and multiple types of HIV-related stigma on adherence to antiretroviral therapy (ART) among 2,146 people living with HIV/AIDS (PLWHA) in Guangxi, China, who had initiated ART. The results of multiple binary logistic regressions indicate that those who had experienced enacted stigma tended to report lower adherence, while better adherence was associated with older age, being a woman, and having a job. Ethnicity had a moderating effect on the association between internalized stigma and adherence: better adherence was associated with lower internalized stigma among participants in ethnic minority groups other than Zhuang. Our findings indicate that PLWHA of other ethnic minority groups could benefit from internalized stigma reduction interventions, and that PLWHA overall could benefit most from increased employment opportunities and acquisition of coping skills to mitigate the negative effects of enacted stigma.
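
    For readers unfamiliar with moderation analysis, a moderator effect of ethnicity can be tested with an interaction term in a binary logistic regression, sketched below with statsmodels. All variable and file names are hypothetical; the study's actual model specification is not given in the abstract.

        import pandas as pd
        import statsmodels.formula.api as smf

        # Hypothetical column names and coding; the interaction between internalized
        # stigma and ethnicity tests whether ethnicity moderates the stigma-adherence link.
        df = pd.read_csv("plwha_art_adherence.csv")   # placeholder file name
        model = smf.logit(
            "adherent ~ internalized_stigma * C(ethnicity) + enacted_stigma + age + female + employed",
            data=df,
        ).fit()
        print(model.summary())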

    LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion

    LiDAR-camera fusion methods have shown impressive performance in 3D object detection. Recent advanced multi-modal methods mainly perform global fusion, where image features and point cloud features are fused across the whole scene. Such practice lacks fine-grained region-level information, yielding suboptimal fusion performance. In this paper, we present the novel Local-to-Global fusion network (LoGoNet), which performs LiDAR-camera fusion at both local and global levels. Concretely, the Global Fusion (GoF) of LoGoNet builds upon previous literature, while we exclusively use point centroids to more precisely represent the position of voxel features, thus achieving better cross-modal alignment. As for the Local Fusion (LoF), we first divide each proposal into uniform grids and then project these grid centers to the images. The image features around the projected grid points are sampled and fused with position-decorated point cloud features, maximally utilizing the rich contextual information around the proposals. A Feature Dynamic Aggregation (FDA) module is further proposed to achieve information interaction between these locally and globally fused features, producing more informative multi-modal features. Extensive experiments on both the Waymo Open Dataset (WOD) and the KITTI dataset show that LoGoNet outperforms all state-of-the-art 3D detection methods. Notably, LoGoNet ranks 1st on the Waymo 3D object detection leaderboard with 81.02 mAPH (L2) detection performance. It is noteworthy that, for the first time, the detection performance on three classes surpasses 80 APH (L2) simultaneously. Code will be available at https://github.com/sankin97/LoGoNet. Comment: Accepted by CVPR 2023
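
    The local fusion step described above amounts to projecting proposal grid centers into the image plane and sampling the feature map there. A minimal sketch using a calibration matrix and bilinear sampling follows; the function name, shapes, and normalization choices are assumptions, not LoGoNet's code.

        import torch
        import torch.nn.functional as F

        def sample_image_features(grid_centers, image_feats, proj_matrix, img_size):
            """Project 3D grid centers into an image and sample features there.

            grid_centers : (M, 3) grid-center coordinates of one proposal (LiDAR frame).
            image_feats  : (1, C, H, W) image feature map.
            proj_matrix  : (3, 4) camera projection matrix from calibration.
            Returns (M, C) image features, one per grid center.
            """
            ones = torch.ones(grid_centers.shape[0], 1)
            pts_h = torch.cat([grid_centers, ones], dim=1)          # (M, 4) homogeneous
            uvw = pts_h @ proj_matrix.T                              # (M, 3)
            uv = uvw[:, :2] / uvw[:, 2:3].clamp(min=1e-6)            # pixel coordinates
            # Normalize pixel coordinates to [-1, 1] for grid_sample.
            w_img, h_img = img_size
            grid = torch.stack([uv[:, 0] / (w_img - 1) * 2 - 1,
                                uv[:, 1] / (h_img - 1) * 2 - 1], dim=1)
            grid = grid.view(1, 1, -1, 2)                            # (1, 1, M, 2)
            sampled = F.grid_sample(image_feats, grid, align_corners=True)
            return sampled.view(image_feats.shape[1], -1).T          # (M, C)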

    DetZero: Rethinking Offboard 3D Object Detection with Long-term Sequential Point Clouds

    Existing offboard 3D detectors always follow a modular pipeline design to take advantage of unlimited sequential point clouds. We have found that the full potential of offboard 3D detectors is not explored, mainly for two reasons: (1) the onboard multi-object tracker cannot generate sufficiently complete object trajectories, and (2) the motion state of objects poses an inevitable challenge for the object-centric refining stage in leveraging the long-term temporal context representation. To tackle these problems, we propose a novel paradigm of offboard 3D object detection, named DetZero. Concretely, an offline tracker coupled with a multi-frame detector is proposed to focus on the completeness of the generated object tracks. An attention-based refining module is proposed to strengthen contextual information interaction across long-term sequential point clouds for object refining with decomposed regression methods. Extensive experiments on the Waymo Open Dataset show that DetZero outperforms all state-of-the-art onboard and offboard 3D detection methods. Notably, DetZero ranks 1st on the Waymo 3D object detection leaderboard with 85.15 mAPH (L2) detection performance. Further experiments validate that such high-quality results can take the place of human labels. Our empirical study leads to rethinking conventions and interesting findings that can guide future research on offboard 3D object detection. Comment: 17 pages, 8 figures
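
    At a high level, the offboard paradigm described here can be read as a three-stage pipeline: per-frame detection, offline whole-sequence tracking, and per-object refinement over the full track. The placeholder sketch below only illustrates that flow; detector, tracker, and refiner are hypothetical callables, not DetZero's modules.

        def offboard_detect(sequence, detector, tracker, refiner):
            """Illustrative offboard detection flow: detect per frame, associate
            detections into full-sequence tracks offline, then refine each object
            using all of its observations."""
            detections = [detector(frame) for frame in sequence]   # multi-frame detection
            tracks = tracker(detections)                           # offline association over the whole sequence
            refined = []
            for track in tracks:
                # Refine the object's geometry and trajectory with long-term temporal context.
                refined.append(refiner(track))
            return refined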

    UniSeg: A Unified Multi-Modal LiDAR Segmentation Network and the OpenPCSeg Codebase

    Point, voxel, and range views are three representative forms of point clouds. All of them provide accurate 3D measurements but lack color and texture information. RGB images are a natural complement to these point cloud views, and fully utilizing their comprehensive information enables more robust perception. In this paper, we present a unified multi-modal LiDAR segmentation network, termed UniSeg, which leverages the information of RGB images and the three views of the point cloud, and accomplishes semantic segmentation and panoptic segmentation simultaneously. Specifically, we first design the Learnable cross-Modal Association (LMA) module to automatically fuse voxel-view and range-view features with image features, which fully utilizes the rich semantic information of images and is robust to calibration errors. Then, the enhanced voxel-view and range-view features are transformed to the point space, where the three views of point cloud features are further fused adaptively by the Learnable cross-View Association (LVA) module. Notably, UniSeg achieves promising results on three public benchmarks, i.e., SemanticKITTI, nuScenes, and the Waymo Open Dataset (WOD); it ranks 1st in two challenges across two benchmarks, namely the LiDAR semantic segmentation challenge of nuScenes and the panoptic segmentation challenge of SemanticKITTI. Besides, we construct the OpenPCSeg codebase, the largest and most comprehensive outdoor LiDAR segmentation codebase. It contains most of the popular outdoor LiDAR segmentation algorithms and provides reproducible implementations. The OpenPCSeg codebase will be made publicly available at https://github.com/PJLab-ADG/PCSeg. Comment: ICCV 2023; 21 pages; 9 figures; 18 tables; Code at https://github.com/PJLab-ADG/PCSeg
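
    As a generic sketch of adaptive cross-view fusion of the kind LVA performs, per-point features from the three views can be combined with learned, input-dependent weights. The module below is an illustrative assumption, not UniSeg's implementation.

        import torch
        import torch.nn as nn

        class LearnableViewFusion(nn.Module):
            """Adaptively fuse per-point features from the point, voxel, and range
            views with learned, input-dependent weights."""
            def __init__(self, channels, num_views=3):
                super().__init__()
                self.weight_net = nn.Linear(num_views * channels, num_views)

            def forward(self, view_feats):
                # view_feats: (N, num_views, C) per-point features from each view.
                n, v, c = view_feats.shape
                weights = torch.softmax(self.weight_net(view_feats.reshape(n, v * c)), dim=-1)
                return (weights.unsqueeze(-1) * view_feats).sum(dim=1)   # (N, C)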

    Mild temperature photothermal assisted anti-bacterial and anti-inflammatory nanosystem for synergistic treatment of post-cataract surgery endophthalmitis

    Rationale: Endophthalmitis, one of the most severe complications of cataract surgery, can seriously threaten vision and even lead to irreversible blindness owing to its complicated microenvironment, which involves both local bacterial infection and severe inflammation. There is an urgent need for a comprehensive treatment with both anti-bacterial and anti-inflammatory effects. Methods: Herein, we developed AuAgCu2O-bromfenac sodium nanoparticles (AuAgCu2O-BS NPs), designed to combine anti-bacterial and anti-inflammatory effects for integrated therapy of endophthalmitis after cataract surgery. The AuAgCu2O-BS NPs could eradicate the methicillin-resistant Staphylococcus aureus (MRSA) strain through their photodynamic effects and the release of metal ions (Ag+ and Cu+) mediated by the mild photothermal effects of the hollow AuAgCu2O nanostructures. The anti-inflammatory drug bromfenac sodium, released from the nanoparticles, was able to significantly reduce the local inflammation of the endophthalmitis and promote tissue rehabilitation. In vivo bacterial elimination and anti-inflammation were confirmed in a post-cataract endophthalmitis rabbit model. Results: The excellent antibacterial ability of AuAgCu2O-BS NPs was verified both in vitro and in vivo. Ophthalmological clinical observation and pathologic histology analysis showed prominent treatment of the inflammatory reaction. Importantly, the mild-temperature photothermal effect not only promoted the release of metal ions and bromfenac sodium but also avoided thermal damage to the surrounding tissues, making it more suitable for practical ophthalmological application given the complex structure of the eyeball. Moreover, superior biocompatibility was confirmed by preliminary toxicity investigations, including low cytotoxicity, negligible damage to major organs, and stable intraocular pressure. Conclusions: Our studies of this nanosystem provide a promising synergistic therapeutic strategy for post-cataract endophthalmitis treatment with favorable prognosis and promise for clinical translation.