3DCFS : Fast and robust joint 3D semantic-instance segmentation via coupled feature selection
We propose a novel, fast and robust 3D point cloud segmentation framework via coupled feature selection, named 3DCFS, that jointly performs semantic and instance segmentation. Inspired by the human scene perception process, we design a novel coupled feature selection module, named CFSM, that adaptively selects and fuses the reciprocal semantic and instance features from the two tasks in a coupled manner. To further boost the performance of the instance segmentation task in our 3DCFS, we investigate a loss function that helps the model learn to balance the magnitudes of the output embedding dimensions during training, which makes calculating the Euclidean distance more reliable and enhances the generalizability of the model. Extensive experiments demonstrate that our 3DCFS outperforms state-of-the-art methods on benchmark datasets in terms of accuracy, speed and computational cost.
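The claim that balanced embedding magnitudes make the Euclidean distance more reliable can be illustrated with a small sketch. This is not the loss investigated in the abstract above, which acts during training and is not specified here; it is a hypothetical post-hoc rescaling on made-up 2D embeddings, and the helper names `euclidean` and `rescale_dims` are assumptions introduced only for illustration.

```python
import math

def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rescale_dims(points):
    """Divide each dimension by its RMS magnitude across all points.

    Hypothetical stand-in for 'balanced' embedding dimensions; the paper
    instead encourages balance through a training loss.
    """
    dims = len(points[0])
    rms = [math.sqrt(sum(p[d] ** 2 for p in points) / len(points))
           for d in range(dims)]
    return [[p[d] / rms[d] for d in range(dims)] for p in points]

# Made-up embeddings: dimension 0 has a large magnitude, dimension 1 a small
# one. Points a and b are meant to belong to the same instance (they agree in
# dimension 1); point c is meant to belong to a different instance.
a, b, c = [100.0, 0.1], [103.0, 0.1], [100.0, 0.9]

raw_ab, raw_ac = euclidean(a, b), euclidean(a, c)

sa, sb, sc = rescale_dims([a, b, c])
bal_ab, bal_ac = euclidean(sa, sb), euclidean(sa, sc)

# With unbalanced magnitudes the large dimension dominates, so c looks
# closer to a than b does; after rescaling, b is the nearer point.
print(raw_ab > raw_ac, bal_ab < bal_ac)
```

In this toy case the small-magnitude dimension carries the instance information, which is why a distance dominated by the large dimension groups the points incorrectly.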
Frequency-Aware Transformer for Learned Image Compression
Learned image compression (LIC) has gained traction as an effective solution
for image storage and transmission in recent years. However, existing LIC
methods are redundant in latent representation due to limitations in capturing
anisotropic frequency components and preserving directional details. To
overcome these challenges, we propose a novel frequency-aware transformer (FAT)
block that for the first time achieves multiscale directional analysis for
LIC. The FAT block comprises frequency-decomposition window attention (FDWA)
modules to capture multiscale and directional frequency components of natural
images. Additionally, we introduce a frequency-modulation feed-forward network
(FMFFN) to adaptively modulate different frequency components, improving
rate-distortion performance. Furthermore, we present a transformer-based
channel-wise autoregressive (T-CA) model that effectively exploits channel
dependencies. Experiments show that our method achieves state-of-the-art
rate-distortion performance compared to existing LIC methods, and clearly
outperforms the latest standardized codec VTM-12.1 by 14.5%, 15.1%, and 13.0% in
BD-rate on the Kodak, Tecnick, and CLIC datasets, respectively.
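For readers unfamiliar with the metric, BD-rate (Bjøntegaard delta rate) summarizes the average bitrate difference between two codecs at equal quality, with negative values meaning the tested codec needs fewer bits. The sketch below follows the commonly used formulation: fit a cubic to log-rate as a function of PSNR for each codec, integrate both fits over the shared PSNR range, and convert the mean log-rate gap to a percentage. It assumes exactly four rate-distortion points per codec, and the function names and all numbers are made up for illustration; the evaluation protocol behind the figures quoted above is not given in this listing.

```python
import math

def fit_cubic(xs, ys):
    """Return [c0, c1, c2, c3] of the cubic through four (x, y) points."""
    assert len(xs) == 4 and len(ys) == 4, "this sketch expects four points"
    n = 4
    # Augmented Vandermonde system: sum_k c_k * x**k = y.
    a = [[x ** k for k in range(n)] + [y] for x, y in zip(xs, ys)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    # Back substitution.
    coeffs = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(a[r][c] * coeffs[c] for c in range(r + 1, n))
        coeffs[r] = (a[r][n] - s) / a[r][r]
    return coeffs

def integral_from_zero(coeffs, upper):
    """Integrate the polynomial sum_k c_k * x**k from 0 to upper."""
    return sum(c * upper ** (k + 1) / (k + 1) for k, c in enumerate(coeffs))

def bd_rate(anchor_rates, anchor_psnr, test_rates, test_psnr):
    """Average bitrate change (%) of the test codec versus the anchor."""
    # Only the PSNR range covered by both curves is compared.
    lo = max(min(anchor_psnr), min(test_psnr))
    hi = min(max(anchor_psnr), max(test_psnr))

    def mean_log_rate(rates, psnr):
        # Shift PSNR by lo so the fit is better conditioned.
        coeffs = fit_cubic([q - lo for q in psnr],
                           [math.log(r) for r in rates])
        return integral_from_zero(coeffs, hi - lo) / (hi - lo)

    diff = (mean_log_rate(test_rates, test_psnr)
            - mean_log_rate(anchor_rates, anchor_psnr))
    return (math.exp(diff) - 1.0) * 100.0

# Made-up rate-distortion points (bits per pixel, PSNR in dB). The test codec
# reaches the same PSNR values with 10% fewer bits at every point.
anchor_bpp = [0.2, 0.4, 0.8, 1.6]
psnr_db = [30.0, 33.0, 36.0, 39.0]
test_bpp = [0.9 * r for r in anchor_bpp]

saving = bd_rate(anchor_bpp, psnr_db, test_bpp, psnr_db)
print(round(saving, 3))  # -10.0, i.e. a 10% bitrate saving
```

Averaging in the log-rate domain keeps the high-bitrate points from dominating the result, which is the usual motivation for this formulation.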
- …