340 research outputs found

    Learned Image Compression with Mixed Transformer-CNN Architectures

    Full text link
    Learned image compression (LIC) methods have exhibited promising progress and superior rate-distortion performance compared with classical image compression standards. Most existing LIC methods are Convolutional Neural Networks-based (CNN-based) or Transformer-based, which have different advantages. Exploiting both advantages is a point worth exploring, which has two challenges: 1) how to effectively fuse the two methods? 2) how to achieve higher performance with a suitable complexity? In this paper, we propose an efficient parallel Transformer-CNN Mixture (TCM) block with a controllable complexity to incorporate the local modeling ability of CNN and the non-local modeling ability of transformers to improve the overall architecture of image compression models. Besides, inspired by the recent progress of entropy estimation models and attention modules, we propose a channel-wise entropy model with parameter-efficient swin-transformer-based attention (SWAtten) modules by using channel squeezing. Experimental results demonstrate our proposed method achieves state-of-the-art rate-distortion performances on three different resolution datasets (i.e., Kodak, Tecnick, CLIC Professional Validation) compared to existing LIC methods. The code is at https://github.com/jmliu206/LIC_TCM.Comment: Accepted by CVPR2023 (Highlight

    When 3D Bounding-Box Meets SAM: Point Cloud Instance Segmentation with Weak-and-Noisy Supervision

    Full text link
    Learning from bounding-boxes annotations has shown great potential in weakly-supervised 3D point cloud instance segmentation. However, we observed that existing methods would suffer severe performance degradation with perturbed bounding box annotations. To tackle this issue, we propose a complementary image prompt-induced weakly-supervised point cloud instance segmentation (CIP-WPIS) method. CIP-WPIS leverages pretrained knowledge embedded in the 2D foundation model SAM and 3D geometric prior to achieve accurate point-wise instance labels from the bounding box annotations. Specifically, CP-WPIS first selects image views in which 3D candidate points of an instance are fully visible. Then, we generate complementary background and foreground prompts from projections to obtain SAM 2D instance mask predictions. According to these, we assign the confidence values to points indicating the likelihood of points belonging to the instance. Furthermore, we utilize 3D geometric homogeneity provided by superpoints to decide the final instance label assignments. In this fashion, we achieve high-quality 3D point-wise instance labels. Extensive experiments on both Scannet-v2 and S3DIS benchmarks demonstrate that our method is robust against noisy 3D bounding-box annotations and achieves state-of-the-art performance

    Anomaly and geochemistry of rare earth elements and yttrium in the late Permian coal from the Moxinpo mine, Chongqing, southwestern China

    Get PDF
    Abstract The rare earth elements and yttrium (REY) of the K2 coal from the Moxinpo mine, Chongqing, were determined using inductively coupled plasma mass spectrometry (ICP-MS). The results show that REY are enriched in the K2 coal, with the average content up to 462 μg/g, much higher than average values of most coals in the world. The REY distribution patterns indicate that the light REY is enriched and show a well-pronounced Eu minimum. The fractionation of individual light-REY is higher than that of the heavy-REY. The REY distribution through the K2 coal seam shows that the top and bottom portion of the coal seam have a lower content of REY than the middle portion. Goyazite and rhabdophane were identified with a scanning electron microscope equipped with an energy-dispersed X-ray spectrometer (SEM-EDX). The REY distributions through the coal seam, SEM-EDX data and the correlation analysis between ash yields and the concentrations have revealed that the REY mainly occurs in the organic matter. The K2 coal is a potential rare-metal resource due to its high REY contents, and the coal ash could be regarded as a new and promising raw material for recovery of REY as a by-product

    Bi-Drop: Enhancing Fine-tuning Generalization via Synchronous sub-net Estimation and Optimization

    Full text link
    Pretrained language models have achieved remarkable success in natural language understanding. However, fine-tuning pretrained models on limited training data tends to overfit and thus diminish performance. This paper presents Bi-Drop, a fine-tuning strategy that selectively updates model parameters using gradients from various sub-nets dynamically generated by dropout. The sub-net estimation of Bi-Drop is performed in an in-batch manner, so it overcomes the problem of hysteresis in sub-net updating, which is possessed by previous methods that perform asynchronous sub-net estimation. Also, Bi-Drop needs only one mini-batch to estimate the sub-net so it achieves higher utility of training data. Experiments on the GLUE benchmark demonstrate that Bi-Drop consistently outperforms previous fine-tuning methods. Furthermore, empirical results also show that Bi-Drop exhibits excellent generalization ability and robustness for domain transfer, data imbalance, and low-resource scenarios.Comment: EMNLP 2023 Findings. Camera-ready version. Co-first authors with equal contribution

    Diffusion Conditional Expectation Model for Efficient and Robust Target Speech Extraction

    Full text link
    Target Speech Extraction (TSE) is a crucial task in speech processing that focuses on isolating the clean speech of a specific speaker from complex mixtures. While discriminative methods are commonly used for TSE, they can introduce distortion in terms of speech perception quality. On the other hand, generative approaches, particularly diffusion-based methods, can enhance speech quality perceptually but suffer from slower inference speed. We propose an efficient generative approach named Diffusion Conditional Expectation Model (DCEM) for TSE. It can handle multi- and single-speaker scenarios in both noisy and clean conditions. Additionally, we introduce Regenerate-DCEM (R-DCEM) that can regenerate and optimize speech quality based on pre-processed speech from a discriminative model. Our method outperforms conventional methods in terms of both intrusive and non-intrusive metrics and demonstrates notable strengths in inference efficiency and robustness to unseen tasks. Audio examples are available online (https://vivian556123.github.io/dcem).Comment: Submitted to ICASSP 202
    • …
    corecore