Less is More: Physical-enhanced Radar-Inertial Odometry
Radar offers the advantage of providing additional physical properties
related to observed objects. In this study, we design a physical-enhanced
radar-inertial odometry system that capitalizes on the Doppler velocities and
radar cross-section information. The filter for static radar points,
correspondence estimation, and residual functions are all strengthened by
integrating the physical properties. We conduct experiments on both public
datasets and our self-collected data, with different mobile platforms and
sensor types. Our quantitative results demonstrate that, with the
physical-enhanced components, the proposed radar-inertial odometry system
outperforms alternative methods. Our findings also reveal that using the physical
properties results in fewer radar points for odometry estimation, but the
performance is still guaranteed and even improved, thus aligning with the
"less is more" principle.
Comment: Accepted by ICRA 202
Denoising Diffusion Semantic Segmentation with Mask Prior Modeling
The evolution of semantic segmentation has long been dominated by learning
more discriminative image representations for classifying each pixel. Despite
the prominent advancements, the priors of segmentation masks themselves, e.g.,
geometric and semantic constraints, are still under-explored. In this paper, we
propose to ameliorate the semantic segmentation quality of existing
discriminative approaches with a mask prior modeled by a recently-developed
denoising diffusion generative model. Beginning with a unified architecture
that adapts diffusion models for mask prior modeling, we focus this work on a
specific instantiation with discrete diffusion and identify a variety of key
design choices for its successful application. Our exploratory analysis
revealed several important findings, including: (1) a simple integration of
diffusion models into semantic segmentation is not sufficient, and a
poorly-designed diffusion process might lead to degradation in segmentation
performance; (2) during the training, the object to which noise is added is
more important than the type of noise; (3) during the inference, the strict
diffusion denoising scheme may not be essential and can be relaxed to a simpler
scheme that even works better. We evaluate the proposed prior modeling with
several off-the-shelf segmentors, and our experimental results on ADE20K and
Cityscapes demonstrate that our approach achieves competitive
quantitative performance and more appealing visual quality.
Image Reconstruction of Two-Dimensional Highly Scattering Inhomogeneous Medium Using MAP-Based Estimation
A maximum a posteriori (MAP) estimation based on a Bayesian framework is applied to image reconstruction of a two-dimensional, highly scattering, inhomogeneous medium. The finite difference method (FDM) and the conjugate gradient (CG) algorithm serve as the forward and inverse solving models, respectively. The generalized Gaussian Markov random field (GGMRF) model serves as the regularization, and the influence of the measurement errors and initial distributions is investigated. Through the test cases, the MAP estimation algorithm is demonstrated to greatly improve the reconstruction results of the optical coefficients.
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
Transformers have revolutionized computer vision and natural language
processing, but their high computational complexity limits their application in
high-resolution image processing and long-context analysis. This paper
introduces Vision-RWKV (VRWKV), a model adapted from the RWKV model used in the
NLP field with necessary modifications for vision tasks. Similar to the Vision
Transformer (ViT), our model is designed to efficiently handle sparse inputs
and demonstrate robust global processing capabilities, while also scaling up
effectively, accommodating both large-scale parameters and extensive datasets.
Its distinctive advantage lies in its reduced spatial aggregation complexity,
which renders it exceptionally adept at processing high-resolution images
seamlessly, eliminating the necessity for windowing operations. Our evaluations
demonstrate that VRWKV surpasses ViT's performance in image classification and
is significantly faster and uses less memory when processing
high-resolution inputs. In dense prediction tasks, it outperforms window-based
models, maintaining comparable speeds. These results highlight VRWKV's
potential as a more efficient alternative for visual perception tasks. Code is
released at https://github.com/OpenGVLab/Vision-RWKV.
Ethnicity, Stigma and Adherence to Antiretroviral Therapy (ART) among People Living with HIV/AIDS in Guangxi, China
This study examines the impact of ethnicity and multiple types of HIV-related stigma on adherence to antiretroviral therapy (ART) among 2,146 people living with HIV/AIDS (PLWHA) in Guangxi, China, who had initiated ART. The results of multiple binary logistic regressions indicate that those who had experienced enacted stigma tended to report lower adherence, while better adherence was associated with older age, being female, and having a job. Ethnicity had a moderating effect on the association between internalized stigma and adherence: better adherence was associated with lower internalized stigma among participants in ethnic minority groups other than Zhuang. Our findings indicate that PLWHA of other ethnic minority groups could benefit from interventions that reduce internalized stigma; PLWHA, overall, could benefit most from increased employment opportunities and acquisition of coping skills to mitigate the negative effects of enacted stigma.
LoGoNet: Towards Accurate 3D Object Detection with Local-to-Global Cross-Modal Fusion
LiDAR-camera fusion methods have shown impressive performance in 3D object
detection. Recent advanced multi-modal methods mainly perform global fusion,
where image features and point cloud features are fused across the whole scene.
Such practice lacks fine-grained region-level information, yielding suboptimal
fusion performance. In this paper, we present the novel Local-to-Global fusion
network (LoGoNet), which performs LiDAR-camera fusion at both local and global
levels. Concretely, the Global Fusion (GoF) of LoGoNet is built upon previous
literature, while we exclusively use point centroids to more precisely
represent the position of voxel features, thus achieving better cross-modal
alignment. As to the Local Fusion (LoF), we first divide each proposal into
uniform grids and then project these grid centers to the images. The image
features around the projected grid points are sampled to be fused with
position-decorated point cloud features, maximally utilizing the rich
contextual information around the proposals. The Feature Dynamic Aggregation
(FDA) module is further proposed to achieve information interaction between
these locally and globally fused features, thus producing more informative
multi-modal features. Extensive experiments on both the Waymo Open Dataset (WOD)
and the KITTI dataset show that LoGoNet outperforms all state-of-the-art 3D
detection methods. Notably, LoGoNet ranks 1st on the Waymo 3D object detection
leaderboard and obtains 81.02 mAPH (L2) detection performance. It is noteworthy
that, for the first time, the detection performance on three classes surpasses
80 APH (L2) simultaneously. Code will be available at
https://github.com/sankin97/LoGoNet.
Comment: Accepted by CVPR202
DetZero: Rethinking Offboard 3D Object Detection with Long-term Sequential Point Clouds
Existing offboard 3D detectors always follow a modular pipeline design to
take advantage of unlimited sequential point clouds. We have found that the
full potential of offboard 3D detectors is not explored mainly due to two
reasons: (1) the onboard multi-object tracker cannot generate sufficiently
complete object trajectories, and (2) the motion state of objects poses an
inevitable challenge for the object-centric refining stage in leveraging the
long-term temporal context representation. To tackle these problems, we propose
a novel paradigm of offboard 3D object detection, named DetZero. Concretely, an
offline tracker coupled with a multi-frame detector is proposed to focus on the
completeness of generated object tracks. An attention-mechanism refining module
is proposed to strengthen contextual information interaction across long-term
sequential point clouds for object refining with decomposed regression methods.
Extensive experiments on the Waymo Open Dataset show that DetZero outperforms all
state-of-the-art onboard and offboard 3D detection methods. Notably, DetZero
ranks 1st on the Waymo 3D object detection leaderboard with 85.15 mAPH (L2)
detection performance. Further experiments validate that such high-quality
results can take the place of human labels. Our empirical study
leads to rethinking conventions and interesting findings that can guide future
research on offboard 3D object detection.
Comment: 17 pages, 8 figures
UniSeg: A Unified Multi-Modal LiDAR Segmentation Network and the OpenPCSeg Codebase
Point-, voxel-, and range-views are three representative forms of point
clouds. All of them have accurate 3D measurements but lack color and texture
information. RGB images are a natural complement to these point cloud views, and
fully utilizing their combined information enables more robust
perception. In this paper, we present a unified multi-modal LiDAR segmentation
network, termed UniSeg, which leverages the information of RGB images and three
views of the point cloud, and accomplishes semantic segmentation and panoptic
segmentation simultaneously. Specifically, we first design the Learnable
cross-Modal Association (LMA) module to automatically fuse voxel-view and
range-view features with image features, which fully utilize the rich semantic
information of images and are robust to calibration errors. Then, the enhanced
voxel-view and range-view features are transformed to the point space, where
three views of point cloud features are further fused adaptively by the
Learnable cross-View Association module (LVA). Notably, UniSeg achieves
promising results in three public benchmarks, i.e., SemanticKITTI, nuScenes,
and Waymo Open Dataset (WOD); it ranks 1st on two challenges of two benchmarks,
including the LiDAR semantic segmentation challenge of nuScenes and panoptic
segmentation challenges of SemanticKITTI. Besides, we construct the OpenPCSeg
codebase, which is the largest and most comprehensive outdoor LiDAR
segmentation codebase. It contains most of the popular outdoor LiDAR
segmentation algorithms and provides reproducible implementations. The
OpenPCSeg codebase will be made publicly available at
https://github.com/PJLab-ADG/PCSeg.
Comment: ICCV 2023; 21 pages; 9 figures; 18 tables; Code at
https://github.com/PJLab-ADG/PCSe
Mild temperature photothermal assisted anti-bacterial and anti-inflammatory nanosystem for synergistic treatment of post-cataract surgery endophthalmitis
Rationale: Endophthalmitis, one of the most severe complications of cataract surgery, can seriously threaten vision and even lead to irreversible blindness owing to its complicated microenvironment, which includes both local bacterial infection and severe inflammation. A comprehensive treatment with both anti-bacterial and anti-inflammatory effects is urgently needed. Methods: Herein, we developed AuAgCu2O-bromfenac sodium nanoparticles (AuAgCu2O-BS NPs), which were designed to combine anti-bacterial and anti-inflammatory effects for integrated therapy of endophthalmitis after cataract surgery. The AuAgCu2O-BS NPs could eradicate the methicillin-resistant Staphylococcus aureus (MRSA) bacterial strain by relying on their photodynamic effects and on the release of metal ions (Ag+ and Cu+) driven by the mild photothermal effects of the hollow AuAgCu2O nanostructures. The anti-inflammatory drug, bromfenac sodium, released from the nanoparticles was able to significantly reduce the local inflammation of the endophthalmitis and promote tissue rehabilitation. In vivo bacterial elimination and anti-inflammation were confirmed in a postcataract endophthalmitis rabbit model. Results: Excellent antibacterial ability of the AuAgCu2O-BS NPs was verified both in vitro and in vivo. Ophthalmological clinical observation and pathologic histology analysis showed prominent treatment of the inflammatory reaction. Importantly, the mild temperature photothermal effect not only promoted the release of metal ions and bromfenac sodium but also avoided thermal damage to the surrounding tissues, which makes it more suitable for practical application in ophthalmology given the complex structure of the eyeball. Moreover, superior biocompatibility was confirmed by the preliminary toxicity investigations, including low cytotoxicity, negligible damage to major organs, and stable intraocular pressure.
Conclusions: Our studies of this nanosystem provide a promising synergistic therapeutic strategy for postcataract endophthalmitis treatment, with a favorable prognosis and promise for clinical translation.
Peer reviewed