54 research outputs found
Detecting Abrupt Change of Channel Covariance Matrix in IRS-Assisted Communication
The knowledge of channel covariance matrices is crucial to the design of
intelligent reflecting surface (IRS) assisted communication. However, channel
covariance matrices may change suddenly in practice. This letter focuses on the
detection of such changes in IRS-assisted communication. Specifically, we
consider an uplink communication system consisting of a single-antenna user
(UE), an IRS, and a multi-antenna base station (BS). We first categorize two
types of channel covariance matrix changes based on their impact on system
design: Type I change, which denotes the change in the BS receive covariance
matrix, and Type II change, which denotes the change in the IRS
transmit/receive covariance matrix. Second, we propose a detection method to
determine whether a Type I change, a Type II change, or no change has occurred.
The effectiveness of our proposed scheme is verified by numerical results.
Comment: Accepted by IEEE Wireless Communications Letters
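The abstract does not spell out the letter's detector, so the following is only a minimal sketch of the generic building block such a scheme rests on: a likelihood-ratio statistic comparing sample covariance matrices estimated over two windows of received snapshots. The Gaussian model, window sizes, and threshold-free usage below are illustrative assumptions; distinguishing Type I from Type II changes would additionally require forming the statistic on the BS-side and IRS-side covariances separately.

import numpy as np

def covariance_change_statistic(x_ref, x_new):
    """Generalized likelihood-ratio statistic for an abrupt covariance change.

    x_ref, x_new: (num_snapshots, num_antennas) zero-mean received snapshots
    collected before and after the candidate change point. Larger values
    indicate stronger evidence that the covariance matrix has changed.
    """
    n1, n2 = len(x_ref), len(x_new)
    s1 = x_ref.conj().T @ x_ref / n1            # sample covariance, window 1
    s2 = x_new.conj().T @ x_new / n2            # sample covariance, window 2
    s_pooled = (n1 * s1 + n2 * s2) / (n1 + n2)  # covariance if nothing changed
    _, logdet_p = np.linalg.slogdet(s_pooled)
    _, logdet_1 = np.linalg.slogdet(s1)
    _, logdet_2 = np.linalg.slogdet(s2)
    # Zero when both windows share one covariance; grows with the mismatch.
    return (n1 + n2) * logdet_p - n1 * logdet_1 - n2 * logdet_2

rng = np.random.default_rng(0)
same = rng.standard_normal((200, 8))
changed = rng.standard_normal((200, 8)) @ np.diag([2.0] * 4 + [1.0] * 4)
print(covariance_change_statistic(same[:100], same[100:]))  # small: no change
print(covariance_change_statistic(same, changed))           # large: change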
NB-IoT Uplink Synchronization by Change Point Detection of Phase Series in NTNs
Non-Terrestrial Networks (NTNs) are widely recognized as a potential solution
for achieving ubiquitous connectivity for the Narrowband Internet of Things
(NB-IoT). One of the main challenges in adopting NTNs for NB-IoT is the uplink
synchronization of the Narrowband Physical Random Access procedure, which
requires estimating the time of arrival (ToA) and carrier frequency offset
(CFO). Due to the large propagation delay and Doppler shift in NTNs,
traditional estimation methods designed for Terrestrial Networks (TNs) cannot
be applied to NTNs directly. In this context, we design a two-stage ToA and CFO
estimation scheme, comprising a coarse and a fine estimation step, based on
abrupt change point detection (CPD) of phase series with machine learning. Our
method achieves high ToA and CFO estimation accuracy under low signal-to-noise
ratio (SNR) and large Doppler shift conditions, and extends the estimation
range without enhancing the Random Access preambles.
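As a rough illustration of the underlying idea (not the paper's machine-learning CPD), the toy sketch below models the preamble as a single tone: before the signal arrives the per-sample phase increment is essentially random, and after arrival it settles near 2*pi*CFO/fs, so a change point in the phase-increment series gives a coarse ToA and the post-change mean gives a CFO estimate. The sample rate, CFO, SNR, and window length are all illustrative assumptions.

import numpy as np

fs = 240e3          # sample rate in Hz (assumed)
cfo = 800.0         # true carrier frequency offset in Hz (assumed)
toa = 1200          # true arrival index (assumed)
rng = np.random.default_rng(1)

n_total = 4000
noise = 0.2 * (rng.standard_normal(n_total) + 1j * rng.standard_normal(n_total))
tone = np.exp(1j * 2 * np.pi * cfo * np.arange(n_total - toa) / fs)
x = np.concatenate([np.zeros(toa), tone]) + noise

# Phase-increment series: random before arrival, ~2*pi*cfo/fs afterwards.
dphi = np.angle(x[1:] * np.conj(x[:-1]))

# Coarse ToA: the point where the increment series suddenly becomes
# near-constant, found as the largest variance drop between two windows.
w = 200
score = [dphi[k - w:k].var() - dphi[k:k + w].var()
         for k in range(w, len(dphi) - w)]
toa_hat = int(np.argmax(score)) + w

# CFO from the mean phase increment after the detected change point.
cfo_hat = dphi[toa_hat:].mean() * fs / (2 * np.pi)
print(toa_hat, round(cfo_hat, 1))   # roughly 1200 and 800.0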
Segment Any Point Cloud Sequences by Distilling Vision Foundation Models
Recent advancements in vision foundation models (VFMs) have opened up new
possibilities for versatile and efficient visual perception. In this work, we
introduce Seal, a novel framework that harnesses VFMs for segmenting diverse
automotive point cloud sequences. Seal exhibits three appealing properties: i)
Scalability: VFMs are directly distilled into point clouds, obviating the need
for annotations in either 2D or 3D during pretraining. ii) Consistency: Spatial
and temporal relationships are enforced at both the camera-to-LiDAR and
point-to-segment regularization stages, facilitating cross-modal representation
learning. iii) Generalizability: Seal enables knowledge transfer in an
off-the-shelf manner to downstream tasks involving diverse point clouds,
including those from real/synthetic, low/high-resolution, large/small-scale,
and clean/corrupted datasets. Extensive experiments conducted on eleven
different point cloud datasets showcase the effectiveness and superiority of
Seal. Notably, Seal achieves a remarkable 45.0% mIoU on nuScenes after linear
probing, surpassing random initialization by 36.9% mIoU and outperforming the
prior art by 6.1% mIoU. Moreover, Seal demonstrates significant performance gains
over existing methods across 20 different few-shot fine-tuning tasks on all
eleven tested point cloud datasets.
Comment: NeurIPS 2023 (Spotlight); 37 pages, 16 figures, 15 tables; Code at
https://github.com/youquanl/Segment-Any-Point-Clou
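A minimal sketch of the camera-to-LiDAR distillation idea mentioned above: LiDAR points are projected into the image, and the 3D network's point features are pulled towards the frozen 2D VFM features at the corresponding pixels. The plain cosine objective, tensor shapes, and function name are illustrative assumptions; Seal's actual superpixel-driven objectives and point-to-segment regularization are more involved.

import torch
import torch.nn.functional as F

def camera_to_lidar_distill_loss(point_feats, image_feats, pixel_uv, valid):
    """Pull each point's 3D feature towards the 2D feature at its projection.

    point_feats: (N, C) features from the 3D backbone for N LiDAR points.
    image_feats: (C, H, W) frozen features from the 2D vision foundation model.
    pixel_uv:    (N, 2) integer (u, v) pixel coordinates of the projected points.
    valid:       (N,) boolean mask of points that fall inside the image.
    """
    uv = pixel_uv[valid]
    target = image_feats[:, uv[:, 1], uv[:, 0]].T   # (M, C) matched 2D features
    pred = F.normalize(point_feats[valid], dim=1)   # (M, C) matched 3D features
    target = F.normalize(target, dim=1)
    return 1.0 - (pred * target).sum(dim=1).mean()  # mean cosine distance

# Tiny synthetic example standing in for one LiDAR/image pair.
n, c, h, w = 1000, 64, 32, 64
loss = camera_to_lidar_distill_loss(
    torch.randn(n, c, requires_grad=True),
    torch.randn(c, h, w),
    torch.stack([torch.randint(0, w, (n,)), torch.randint(0, h, (n,))], dim=1),
    torch.rand(n) > 0.3,
)
loss.backward()
print(loss.item())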
Towards Label-free Scene Understanding by Vision Foundation Models
Vision foundation models such as Contrastive Vision-Language Pre-training
(CLIP) and Segment Anything (SAM) have demonstrated impressive zero-shot
performance on image classification and segmentation tasks. However, the
incorporation of CLIP and SAM for label-free scene understanding has yet to be
explored. In this paper, we investigate the potential of vision foundation
models in enabling networks to comprehend 2D and 3D worlds without labelled
data. The primary challenge lies in effectively supervising networks under
extremely noisy pseudo labels, which are generated by CLIP and further
exacerbated during the propagation from the 2D to the 3D domain. To tackle
these challenges, we propose a novel Cross-modality Noisy Supervision (CNS)
method that leverages the strengths of CLIP and SAM to supervise 2D and 3D
networks simultaneously. In particular, we introduce a prediction consistency
regularization to co-train the 2D and 3D networks, and then further enforce
consistency of the networks' latent spaces using SAM's robust feature
representations. Experiments conducted on diverse indoor and outdoor datasets
demonstrate the superior performance of our method in understanding 2D and 3D
open environments. Our 2D and 3D networks achieve label-free semantic
segmentation with 28.4% and 33.5% mIoU on ScanNet, improvements of 4.7% and
7.9%, respectively. On the nuScenes dataset, our method reaches 26.8% mIoU, an
improvement of 6%. Code will be released at
https://github.com/runnanchen/Label-Free-Scene-Understanding.
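A minimal sketch of what a prediction-consistency term of this kind can look like, assuming matched 2D pixel and 3D point predictions: both branches are supervised by the noisy CLIP pseudo labels, and a symmetric KL term encourages them to agree. The loss weighting, the symmetric KL form, and all names are illustrative assumptions rather than the paper's exact formulation.

import torch
import torch.nn.functional as F

def cns_style_loss(logits_2d, logits_3d, pseudo_labels, alpha=1.0):
    """logits_2d, logits_3d: (M, K) class logits for M matched pixel/point pairs.
    pseudo_labels: (M,) noisy class indices, e.g. produced by CLIP."""
    # Noisy pseudo-label supervision on both branches.
    supervised = F.cross_entropy(logits_2d, pseudo_labels) \
               + F.cross_entropy(logits_3d, pseudo_labels)
    # Symmetric KL consistency between the two branches' predictions.
    log_p2d = F.log_softmax(logits_2d, dim=1)
    log_p3d = F.log_softmax(logits_3d, dim=1)
    consistency = F.kl_div(log_p2d, log_p3d, log_target=True, reduction="batchmean") \
                + F.kl_div(log_p3d, log_p2d, log_target=True, reduction="batchmean")
    return supervised + alpha * consistency

m, k = 512, 20
loss = cns_style_loss(torch.randn(m, k, requires_grad=True),
                      torch.randn(m, k, requires_grad=True),
                      torch.randint(0, k, (m,)))
loss.backward()
print(loss.item())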
Rethinking Range View Representation for LiDAR Segmentation
LiDAR segmentation is crucial for autonomous driving perception. Recent
trends favor point- or voxel-based methods as they often yield better
performance than the traditional range view representation. In this work, we
unveil several key factors in building powerful range view models. We observe
that the "many-to-one" mapping, semantic incoherence, and shape deformation are
possible impediments against effective learning from range view projections. We
present RangeFormer -- a full-cycle framework comprising novel designs across
network architecture, data augmentation, and post-processing -- that better
handles the learning and processing of LiDAR point clouds from the range view.
We further introduce a Scalable Training from Range view (STR) strategy that
trains on arbitrarily low-resolution 2D range images while still maintaining
satisfactory 3D segmentation accuracy. We show that, for the first time, a
range view method is able to surpass the point, voxel, and multi-view fusion
counterparts in the competing LiDAR semantic and panoptic segmentation
benchmarks, i.e., SemanticKITTI, nuScenes, and ScribbleKITTI.
Comment: ICCV 2023; 24 pages, 10 figures, 14 tables; Webpage at
https://ldkong.com/RangeForme
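For readers unfamiliar with the range view representation, the sketch below shows the standard spherical projection of a LiDAR sweep onto a 2D range image and makes the "many-to-one" issue concrete: several 3D points can fall into the same pixel, so only one of them survives rasterization. The field of view, image size, and synthetic point cloud are illustrative assumptions, not RangeFormer's settings.

import numpy as np

def to_range_image(points, h=64, w=2048, fov_up_deg=3.0, fov_down_deg=-25.0):
    """points: (N, 3) LiDAR x, y, z coordinates. Returns the (h, w) range
    image and each point's flattened pixel index."""
    fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(points[:, 1], points[:, 0])
    pitch = np.arcsin(points[:, 2] / np.maximum(r, 1e-8))
    u = (0.5 * (1.0 - yaw / np.pi) * w).astype(int) % w              # column
    v = np.clip((fov_up - pitch) / (fov_up - fov_down) * h, 0, h - 1).astype(int)
    image = np.full((h, w), -1.0)
    order = np.argsort(-r)                  # rasterize far points first so
    image[v[order], u[order]] = r[order]    # the nearest point wins each pixel
    return image, v * w + u

rng = np.random.default_rng(2)
pts = np.hstack([rng.uniform(-40, 40, (100_000, 2)),
                 rng.uniform(-3, 1, (100_000, 1))])
img, pix = to_range_image(pts)
hidden = 100_000 - len(np.unique(pix))
print(img.shape, hidden, "points were hidden by a closer point in the same pixel")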