Learned Image Compression with Mixed Transformer-CNN Architectures
Learned image compression (LIC) methods have exhibited promising progress and
superior rate-distortion performance compared with classical image compression
standards. Most existing LIC methods are Convolutional Neural Networks-based
(CNN-based) or Transformer-based, each with its own advantages. Exploiting
both sets of advantages is worth pursuing, but it raises two challenges: 1) how
to effectively fuse the two approaches, and 2) how to achieve higher performance
at a suitable complexity. In this paper, we propose an efficient parallel
Transformer-CNN Mixture (TCM) block with a controllable complexity to
incorporate the local modeling ability of CNN and the non-local modeling
ability of transformers to improve the overall architecture of image
compression models. Besides, inspired by the recent progress of entropy
estimation models and attention modules, we propose a channel-wise entropy
model with parameter-efficient swin-transformer-based attention (SWAtten)
modules by using channel squeezing. Experimental results demonstrate our
proposed method achieves state-of-the-art rate-distortion performance on three
datasets of different resolutions (i.e., Kodak, Tecnick, CLIC Professional
Validation) compared to existing LIC methods. The code is at
https://github.com/jmliu206/LIC_TCM.
Comment: Accepted by CVPR2023 (Highlight)
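To make the parallel local/non-local idea concrete, below is a minimal PyTorch sketch of a mixed block in the spirit of the abstract. The 50/50 channel split, the depthwise residual convolution, the plain full self-attention over spatial tokens (rather than swin-style windowed attention), and the MixedTCBlock name are illustrative assumptions, not the paper's exact TCM design.

```python
import torch
import torch.nn as nn

class MixedTCBlock(nn.Module):
    """Toy parallel Transformer-CNN mixture: split channels, process one half
    with a conv branch (local) and the other with self-attention (non-local),
    then fuse with a 1x1 convolution."""
    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        half = channels // 2
        self.cnn = nn.Sequential(                      # local branch
            nn.Conv2d(half, half, 3, padding=1, groups=half),
            nn.GELU(),
            nn.Conv2d(half, half, 1),
        )
        self.norm = nn.LayerNorm(half)                 # non-local branch
        self.attn = nn.MultiheadAttention(half, heads, batch_first=True)
        self.fuse = nn.Conv2d(channels, channels, 1)

    def forward(self, x):
        local, non_local = x.chunk(2, dim=1)
        local = local + self.cnn(local)
        b, c, h, w = non_local.shape
        tokens = non_local.flatten(2).transpose(1, 2)  # (B, H*W, C/2)
        n = self.norm(tokens)
        tokens = tokens + self.attn(n, n, n, need_weights=False)[0]
        non_local = tokens.transpose(1, 2).reshape(b, c, h, w)
        return self.fuse(torch.cat([local, non_local], dim=1))

x = torch.randn(1, 128, 16, 16)           # e.g. an encoder feature map
print(MixedTCBlock(128)(x).shape)          # torch.Size([1, 128, 16, 16])
```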
When 3D Bounding-Box Meets SAM: Point Cloud Instance Segmentation with Weak-and-Noisy Supervision
Learning from bounding-boxes annotations has shown great potential in
weakly-supervised 3D point cloud instance segmentation. However, we observe
that existing methods suffer severe performance degradation when the
bounding-box annotations are perturbed. To tackle this issue, we propose a
complementary image prompt-induced weakly-supervised point cloud instance
segmentation (CIP-WPIS) method. CIP-WPIS leverages pretrained knowledge
embedded in the 2D foundation model SAM and a 3D geometric prior to derive
accurate point-wise instance labels from the bounding-box annotations.
Specifically, CIP-WPIS first selects image views in which the 3D candidate points of
an instance are fully visible. Then, we generate complementary background and
foreground prompts from projections to obtain SAM 2D instance mask predictions.
Based on these masks, we assign each point a confidence value indicating its
likelihood of belonging to the instance. Furthermore, we utilize the 3D
geometric homogeneity provided by superpoints to decide the final instance
label assignments. In this fashion, we achieve high-quality 3D point-wise
instance labels. Extensive experiments on both Scannet-v2 and S3DIS benchmarks
demonstrate that our method is robust against noisy 3D bounding-box annotations
and achieves state-of-the-art performance.
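As an illustration of the projection-and-voting idea, here is a minimal NumPy sketch that turns per-view 2D instance masks into point-wise labels via superpoint majority voting. The view selection, the SAM prompting step, the 0.5 threshold, and the caller-supplied project_to_view camera model are all assumptions rather than CIP-WPIS's exact procedure.

```python
import numpy as np

def label_points(points, superpoints, views, instance_id, labels):
    """points: (N, 3) coordinates; superpoints: (N,) superpoint ids;
    views: list of (project_to_view, mask) pairs, mask being an (H, W) bool
    array for one instance; labels: (N,) int array updated in place."""
    confidence = np.zeros(len(points))
    for project_to_view, mask in views:
        uv, visible = project_to_view(points)           # (N, 2) pixels, (N,) bool
        u, v = uv[:, 0].astype(int), uv[:, 1].astype(int)
        inside = visible & (u >= 0) & (u < mask.shape[1]) \
                         & (v >= 0) & (v < mask.shape[0])
        hit = np.zeros(len(points), dtype=bool)
        hit[inside] = mask[v[inside], u[inside]]        # point lands in the 2D mask
        confidence += hit / max(len(views), 1)          # average over views
    # Enforce 3D geometric homogeneity: every point inside a superpoint gets
    # the same decision, taken from that superpoint's mean confidence.
    for sp in np.unique(superpoints):
        idx = superpoints == sp
        if confidence[idx].mean() > 0.5:
            labels[idx] = instance_id
    return labels
```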
Anomaly and geochemistry of rare earth elements and yttrium in the late Permian coal from the Moxinpo mine, Chongqing, southwestern China
The rare earth elements and yttrium (REY) contents of the K2 coal from the Moxinpo mine, Chongqing, were determined using inductively coupled plasma mass spectrometry (ICP-MS). The results show that REY are enriched in the K2 coal, with an average content of up to 462 μg/g, much higher than the averages of most coals worldwide. The REY distribution patterns indicate that the light REY are enriched and show a well-pronounced Eu minimum. The fractionation of individual light REY is higher than that of the heavy REY. The REY distribution through the K2 coal seam shows that the top and bottom portions of the seam have lower REY contents than the middle portion. Goyazite and rhabdophane were identified with a scanning electron microscope equipped with an energy-dispersive X-ray spectrometer (SEM-EDX). The REY distribution through the coal seam, the SEM-EDX data, and the correlation analysis between ash yield and REY concentration reveal that the REY mainly occur in the organic matter. The K2 coal is a potential rare-metal resource due to its high REY contents, and the coal ash could be regarded as a new and promising raw material for the recovery of REY as a by-product.
Bi-Drop: Enhancing Fine-tuning Generalization via Synchronous sub-net Estimation and Optimization
Pretrained language models have achieved remarkable success in natural
language understanding. However, fine-tuning pretrained models on limited
training data tends to cause overfitting and thus degrade performance. This paper
presents Bi-Drop, a fine-tuning strategy that selectively updates model
parameters using gradients from various sub-nets dynamically generated by
dropout. The sub-net estimation of Bi-Drop is performed in an in-batch manner,
so it overcomes the hysteresis in sub-net updating that affects previous methods,
which perform asynchronous sub-net estimation. Moreover, Bi-Drop needs only one
mini-batch to estimate the sub-net, so it makes more efficient use of the
training data. Experiments on the GLUE benchmark demonstrate
that Bi-Drop consistently outperforms previous fine-tuning methods.
Furthermore, empirical results also show that Bi-Drop exhibits excellent
generalization ability and robustness for domain transfer, data imbalance, and
low-resource scenarios.
Comment: EMNLP 2023 Findings. Camera-ready version. Co-first authors with equal contribution.
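The in-batch idea can be sketched as follows: run several dropout forward passes on the same mini-batch, each pass sampling a different sub-net, and keep gradients only where the passes agree. The sign-agreement criterion, the number of passes, and the dict-style batch interface below are assumptions; the paper's actual sub-net estimation and update rule may differ.

```python
import torch

def bi_drop_step(model, loss_fn, batch, optimizer, passes: int = 2):
    model.train()                                   # keep dropout active: each pass samples a sub-net
    params = [p for p in model.parameters() if p.requires_grad]
    per_pass = []
    for _ in range(passes):
        optimizer.zero_grad()
        loss_fn(model(**batch)).backward()          # same mini-batch, different dropout mask
        per_pass.append([p.grad.detach().clone() if p.grad is not None
                         else torch.zeros_like(p) for p in params])
    optimizer.zero_grad()
    # Keep the averaged gradient only where every pass agrees on its sign,
    # i.e. update only the consistently selected sub-net of parameters.
    for p, grads in zip(params, zip(*per_pass)):
        stacked = torch.stack(grads)
        agree = (stacked.sign().sum(dim=0).abs() == passes).to(stacked.dtype)
        p.grad = stacked.mean(dim=0) * agree
    optimizer.step()
```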
Diffusion Conditional Expectation Model for Efficient and Robust Target Speech Extraction
Target Speech Extraction (TSE) is a crucial task in speech processing that
focuses on isolating the clean speech of a specific speaker from complex
mixtures. While discriminative methods are commonly used for TSE, they can
introduce distortions that degrade perceptual speech quality. On the other hand,
generative approaches, particularly diffusion-based methods, can improve
perceptual speech quality but suffer from slower inference. We propose an
efficient generative approach named Diffusion Conditional Expectation Model
(DCEM) for TSE. It can handle multi- and single-speaker scenarios in both noisy
and clean conditions. Additionally, we introduce Regenerate-DCEM (R-DCEM) that
can regenerate and optimize speech quality based on pre-processed speech from a
discriminative model. Our method outperforms conventional methods in terms of
both intrusive and non-intrusive metrics and demonstrates notable strengths in
inference efficiency and robustness to unseen tasks. Audio examples are
available online (https://vivian556123.github.io/dcem).
Comment: Submitted to ICASSP 202
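To make the "conditional expectation" idea concrete, below is a toy sampling loop in which a network predicts the clean-speech expectation E[x0 | x_t, mixture, speaker cue] and the estimate is re-noised to a smaller noise level at each step. The linear re-noising rule, the four-level schedule, and the denoiser interface are illustrative assumptions and do not reproduce DCEM's published formulation.

```python
import torch

@torch.no_grad()
def sample(denoiser, mixture, spk_emb, levels=(1.0, 0.7, 0.4, 0.1)):
    """Toy conditional-expectation sampler: at each noise level t the network
    returns an estimate of the clean target, which is then re-noised to the
    next (smaller) level; the final estimate is returned directly."""
    x = torch.randn_like(mixture)                       # start from pure noise
    for i, t in enumerate(levels):
        t_batch = torch.full((mixture.shape[0],), t, device=mixture.device)
        x0_hat = denoiser(x, t_batch, mixture, spk_emb)  # ~ E[x0 | x_t, cond]
        if i + 1 < len(levels):
            t_next = levels[i + 1]                       # re-noise to the next level
            x = (1.0 - t_next) * x0_hat + t_next * torch.randn_like(x0_hat)
        else:
            x = x0_hat                                   # last step: keep the expectation
    return x
```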