BEVStereo++: Accurate Depth Estimation in Multi-view 3D Object Detection via Dynamic Temporal Stereo
Bounded by the inherent ambiguity of depth perception, contemporary
multi-view 3D object detection methods hit a performance bottleneck.
Intuitively, temporal multi-view stereo (MVS) technology is a natural tool
for tackling this ambiguity. However, traditional MVS approaches have two
limitations when applied to 3D object detection scenes: 1) measuring
affinity among all views incurs expensive computational cost; 2) it is
difficult to deal with outdoor scenarios, where objects are often moving.
To this end, we propose BEVStereo++: by introducing a dynamic temporal
stereo strategy, BEVStereo++ mitigates the harm brought by temporal stereo
in these two scenarios. Going one step further, we apply a Motion
Compensation Module and long-sequence Frame Fusion to BEVStereo++, which
further boost performance and reduce error. Without bells and whistles,
BEVStereo++ achieves state-of-the-art (SOTA) results on both the Waymo and
nuScenes datasets.
ClipSAM: CLIP and SAM Collaboration for Zero-Shot Anomaly Segmentation
Recently, foundation models such as CLIP and SAM have shown promising
performance on the task of Zero-Shot Anomaly Segmentation (ZSAS). However,
both CLIP-based and SAM-based ZSAS methods still suffer from non-negligible
drawbacks: 1) CLIP primarily focuses on global feature alignment across
different inputs, leading to imprecise segmentation of local anomalous parts;
2) SAM tends to generate numerous redundant masks without proper prompt
constraints, resulting in complex post-processing requirements. In this work,
we propose ClipSAM, a CLIP and SAM collaboration framework for ZSAS. The
insight behind ClipSAM is to employ CLIP's semantic understanding capability
for anomaly localization and rough segmentation, which is then used as prompt
constraints for SAM to refine the anomaly segmentation results.
results. In details, we introduce a crucial Unified Multi-scale Cross-modal
Interaction (UMCI) module for interacting language with visual features at
multiple scales of CLIP to reason anomaly positions. Then, we design a novel
Multi-level Mask Refinement (MMR) module, which utilizes the positional
information as multi-level prompts for SAM to acquire hierarchical levels of
masks and merges them. Extensive experiments validate the effectiveness of our
approach, achieving the optimal segmentation performance on the MVTec-AD and
VisA datasets.Comment: 17 pages,17 figure
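The CLIP-to-SAM hand-off described above can be illustrated schematically. Below is a minimal plain-Python sketch, not the authors' implementation: it assumes CLIP has already produced a coarse anomaly score map, derives a point prompt (centroid) and a box prompt (bounding box) from it as illustrative stand-ins for the UMCI output, and merges multi-level masks by pixel-wise OR as a stand-in for the MMR merge; the SAM call itself is omitted.

```python
def prompts_from_score_map(score_map, threshold=0.5):
    """Derive a point prompt (centroid) and a box prompt (bounding box)
    from a coarse anomaly score map (list of rows of floats).
    Illustrative stand-in for the positional reasoning step."""
    coords = [(r, c) for r, row in enumerate(score_map)
              for c, v in enumerate(row) if v >= threshold]
    if not coords:
        return None, None  # no anomalous region found
    rs = [r for r, _ in coords]
    cs = [c for _, c in coords]
    point = (sum(rs) // len(rs), sum(cs) // len(cs))   # centroid prompt
    box = (min(rs), min(cs), max(rs), max(cs))          # box prompt
    return point, box

def merge_masks(masks):
    """Merge hierarchical binary masks (one per prompt level) by
    pixel-wise OR into a single segmentation mask."""
    h, w = len(masks[0]), len(masks[0][0])
    return [[int(any(m[r][c] for m in masks)) for c in range(w)]
            for r in range(h)]
```

In a real pipeline, each prompt level would be passed to SAM and the returned masks fed to `merge_masks`; here the helpers only show the shape of the data flow.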
Coupling interaction between the power coupler and the third harmonic superconducting cavity
Fermilab has developed a third harmonic superconducting cavity operating at 3.9 GHz to improve beam performance for the FLASH user facility at DESY. It is of interest to investigate the coupling interaction between the superconducting RF (SRF) cavity and the power coupler, with or without beam loading. The coupling of the power coupler to the cavity must be determined to minimize power consumption and guarantee the best performance for a given beam current. In this paper, we build and analyze an equivalent circuit model containing a series of lumped elements to represent the resonant system. An analytic solution for the required generator power as a function of the system parameters is also given, based on a vector diagram.
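The kind of power balance this abstract describes can be sketched numerically. Below is a minimal sketch using the standard steady-state lumped-circuit expression for generator power with beam loading and detuning, which is the usual result of such an equivalent-circuit analysis; the formula is textbook SRF material, not taken from this paper, and the parameter values are illustrative.

```python
import math

def generator_power(vc, r_over_q, q_loaded, i_beam=0.0, phi_beam=0.0,
                    detuning_hz=0.0, f0=3.9e9):
    """Required generator power [W] for a cavity held at voltage vc [V].

    Standard steady-state lumped-circuit result:
      P_g = vc^2 / (4 R_L) * [ (1 + R_L I_b cos(phi) / vc)^2
                              + (2 Q_L df/f0 + R_L I_b sin(phi) / vc)^2 ]
    where R_L = (R/Q) * Q_L is the loaded shunt impedance, I_b the beam
    current, phi the beam phase (0 = on crest), and df the detuning.
    """
    r_loaded = r_over_q * q_loaded
    in_phase = 1.0 + r_loaded * i_beam * math.cos(phi_beam) / vc
    quadrature = (2.0 * q_loaded * detuning_hz / f0
                  + r_loaded * i_beam * math.sin(phi_beam) / vc)
    return vc ** 2 / (4.0 * r_loaded) * (in_phase ** 2 + quadrature ** 2)

# Illustrative numbers (not from the paper): 2 MV, R/Q = 375 Ohm, Q_L = 1e7.
p_no_beam = generator_power(2.0e6, 375.0, 1.0e7)
```

With no beam and no detuning the expression collapses to vc^2 / (4 R_L), and any detuning or off-crest beam current strictly increases the required power, which is the matching behavior the paper's vector-diagram analysis quantifies.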
Cross Modal Transformer: Towards Fast and Robust 3D Object Detection
In this paper, we propose a robust 3D detector, named Cross Modal Transformer
(CMT), for end-to-end 3D multi-modal detection. Without explicit view
transformation, CMT takes image and point-cloud tokens as inputs and
directly outputs accurate 3D bounding boxes. The spatial alignment of
multi-modal tokens is performed by encoding the 3D points into multi-modal
features. The core design of CMT is quite simple, while its performance is
impressive: it achieves 74.1% NDS (state-of-the-art with a single model) on
the nuScenes test set while maintaining fast inference speed. Moreover, CMT
remains strongly robust even when LiDAR input is missing. Code is released at
https://github.com/junjie18/CMT
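The shared 3D position encoding idea can be sketched in plain Python. This is an illustrative stand-in, not CMT's actual implementation: both image tokens and point-cloud tokens are tagged with a sinusoidal embedding of their associated 3D coordinates, so a single transformer decoder can attend to either modality in one common coordinate space without an explicit view transformation.

```python
import math

def coords_embedding(xyz, dim=12):
    """Sinusoidal embedding of a 3D point (x, y, z) into a dim-vector.
    dim must be divisible by 6 (one sin/cos pair per axis per frequency)."""
    assert dim % 6 == 0
    n_freq = dim // 6
    emb = []
    for axis in xyz:
        for k in range(n_freq):
            freq = 2.0 ** k  # geometric frequency ladder, as in transformer PE
            emb.append(math.sin(axis * freq))
            emb.append(math.cos(axis * freq))
    return emb

def align_token(token_feat, xyz, dim=12):
    """Add the shared 3D position embedding to a token's features.
    The same function serves tokens from the image branch and the
    LiDAR branch, which is what puts both modalities in one space."""
    pe = coords_embedding(xyz, dim)
    return [f + p for f, p in zip(token_feat, pe)]
```

The point of the sketch is only that one embedding function is applied to both modalities; the real model learns this encoding with MLPs rather than fixing it to sinusoids.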