
    Balanced Audiovisual Dataset for Imbalance Analysis

    The imbalance problem is widespread in machine learning and also arises in multimodal learning, caused by the intrinsic discrepancy between the modalities of samples. Recent works have attempted to solve the modality imbalance problem from an algorithmic perspective; however, they do not fully analyze the influence of modality bias in datasets. Concretely, existing multimodal datasets are usually collected for specific tasks, where one modality tends to perform better than the others in most conditions. In this work, to comprehensively explore the influence of modality bias, we first split existing datasets into different subsets by estimating sample-wise modality discrepancy. We surprisingly find that multimodal models with existing imbalance algorithms consistently perform worse than unimodal ones on specific subsets, in accordance with the modality bias. To further explore the influence of modality bias and analyze the effectiveness of existing imbalance algorithms, we build a balanced audiovisual dataset with uniformly distributed modality discrepancy over the whole dataset. We then conduct extensive experiments to re-evaluate existing imbalance algorithms and draw some interesting findings: existing algorithms only provide a compromise between modalities and suffer from the large modality discrepancy of samples. We hope that these findings can facilitate future research on the modality imbalance problem. Comment: website: https://gewu-lab.github.io/Balanced-Audiovisual-Dataset

    All-solid-state asymmetric supercapacitor based on porous cobalt selenide thin films

    As a significant semiconductor material, cobalt selenide has enormous potential and broad application prospects in solar cells, photocatalysis, and supercapacitors. In this paper, porous CoSe thin films were successfully fabricated on stainless-steel sheets using a facile, effective electrodeposition technique. Electrochemical tests reveal that the specific capacitance reaches as high as 510 F g−1 at a current density of 1 A g−1, with capacitance retention of 91% over 5000 cycles. An asymmetric all-solid-state supercapacitor is fabricated using the CoSe thin film as the positive electrode and activated carbon as the negative electrode. The assembled solid-state device displays a high areal specific capacitance of 18.1 mF cm−2, accompanied by good cycling stability, outstanding flexibility, and satisfactory mechanical stability. Furthermore, solid-state devices connected in series can power red light-emitting diodes. These results show great potential for preparing large-scale, high-energy-density storage systems.
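For context, specific capacitance from a galvanostatic discharge curve is computed as C = I·Δt / (m·ΔV). The numbers below are hypothetical, chosen only to be consistent with the reported 510 F g−1 at 1 A g−1; they are not the paper's raw measurements.

```python
def specific_capacitance(current_a, discharge_time_s, mass_g, voltage_window_v):
    """Gravimetric capacitance C = I * dt / (m * dV), in F per gram."""
    return current_a * discharge_time_s / (mass_g * voltage_window_v)

# Assumed values: 1 mg electrode at 1 mA (i.e. 1 A g^-1),
# 0.5 V voltage window, 255 s discharge time.
c = specific_capacitance(1e-3, 255.0, 1e-3, 0.5)
# c -> 510.0 F g^-1
```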

    Fluent: Round-efficient Secure Aggregation for Private Federated Learning

    Federated learning (FL) facilitates collaborative training of machine learning models among a large number of clients while safeguarding the privacy of their local datasets. However, FL remains susceptible to vulnerabilities such as privacy inference and inversion attacks. Single-server secure aggregation schemes have been proposed to address these threats; nonetheless, they face practical constraints due to their round and communication complexities. This work introduces Fluent, a round- and communication-efficient secure aggregation scheme for private FL. Fluent offers several improvements over state-of-the-art solutions such as Bell et al. (CCS 2020) and Ma et al. (SP 2023): (1) it eliminates frequent handshakes and secret sharing operations by efficiently reusing shares across multiple training iterations without leaking any private information; (2) it accomplishes both the consistency check and gradient unmasking in one logical step, thereby saving another round of communication. With these innovations, Fluent achieves the fewest communication rounds (i.e., two in the collection phase) in the malicious-server setting, in contrast to at least three rounds in existing schemes, which significantly reduces latency for geographically distributed clients; (3) Fluent also introduces Fluent-Dynamic, with a participant selection algorithm and an alternative secret sharing scheme, which facilitates dynamic client joining and enhances system flexibility and scalability. We implemented Fluent and compared it with existing solutions. Experimental results show that Fluent reduces computational cost by at least 75% and communication overhead by at least 25% for normal clients. Fluent also reduces the communication overhead for the server at the expense of a marginal increase in computational cost.
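The core mechanism underlying single-server secure aggregation is pairwise masking: each pair of clients derives a shared mask that one adds and the other subtracts, so the masks cancel in the server's sum. The sketch below illustrates only that basic idea (as in Bell et al.-style schemes); Fluent's actual protocol, with share reuse and consistency checks, is considerably more involved, and the seed function here stands in for a real key-agreement step.

```python
import random

def mask_gradient(grad, my_id, peer_ids, seed_of):
    """Add a pairwise mask per peer; r_ij added by i cancels r_ij subtracted by j."""
    masked = list(grad)
    for peer in peer_ids:
        rng = random.Random(seed_of(my_id, peer))  # same stream for both ends
        sign = 1 if my_id < peer else -1           # opposite signs per pair
        for k in range(len(masked)):
            masked[k] += sign * rng.uniform(-1, 1)
    return masked

# Hypothetical shared seed per unordered pair (placeholder for key agreement).
seed = lambda a, b: hash(frozenset((a, b)))
clients = {1: [0.5, 1.0], 2: [0.1, -0.2], 3: [0.3, 0.4]}
masked = {i: mask_gradient(g, i, [j for j in clients if j != i], seed)
          for i, g in clients.items()}
# The server sums masked vectors; every pairwise mask cancels,
# revealing only the aggregate, not any individual gradient.
agg = [sum(v[k] for v in masked.values()) for k in range(2)]
```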

    DXVNet-ViT-Huge (JFT) Multimode Classification Network Based on Vision Transformer

    To address the problem that traditional CNNs are poor at extracting global image features, this paper extends the DXVNet network with a Conditional Random Field (CRF) component and a pre-trained ViT-Huge (Vision Transformer) model, building a new DXVNet-ViT-Huge (JFT) network. The CRF component helps the network learn the constraint conditions of the predicted label for each word, reduces the word-label prediction errors of the D-GRU-based method, and improves sequence-annotation accuracy. The Transformer architecture of the ViT-Huge model can extract global image feature information, while the CNN is better at extracting local image features. Therefore, the ViT-Huge pre-trained model and the CNN pre-trained model are combined through multimodal feature fusion: the two complementary sets of image features are fused by a Bi-GRU to improve classification performance. Experimental results show that the newly constructed DXVNet-ViT-Huge (JFT) model achieves good performance, with F1 scores on two real public datasets 6.03% and 7.11% higher, respectively, than those of the original DXVNet model.
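The complementary-feature fusion can be sketched in simplified form. This toy version concatenates a global (ViT-style) and a local (CNN-style) feature vector and projects them through a single linear layer; the paper's actual fusion uses a Bi-GRU, and all dimensions and weights below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def fuse(global_feat, local_feat, w, b):
    """Concatenate the two feature views and project to a joint representation."""
    x = np.concatenate([global_feat, local_feat])
    return np.tanh(w @ x + b)

g = rng.normal(size=8)            # stand-in for ViT-Huge global features
l = rng.normal(size=8)            # stand-in for CNN local features
w = rng.normal(size=(4, 16)) * 0.1  # hypothetical fusion weights
b = np.zeros(4)
joint = fuse(g, l, w, b)          # joint representation fed to the classifier
```

The point of the design is that neither view alone carries both the global layout and the fine local texture; the fused vector is what the downstream classifier sees.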

    Deep Radon Prior: A Fully Unsupervised Framework for Sparse-View CT Reconstruction

    Although sparse-view computed tomography (CT) significantly reduces radiation dose, it also introduces severe artifacts that degrade image quality. In recent years, deep learning-based methods for inverse problems have made remarkable progress and have become increasingly popular in CT reconstruction. However, most of these methods suffer from several limitations: dependence on high-quality training data, weak interpretability, etc. In this study, we propose a fully unsupervised framework called Deep Radon Prior (DRP), inspired by Deep Image Prior (DIP), to address the aforementioned limitations. DRP introduces a neural network as an implicit prior into the iterative method, thereby realizing cross-domain gradient feedback. During the reconstruction process, the neural network is progressively optimized in multiple stages to narrow the solution space in the Radon domain for the under-constrained imaging protocol, and the convergence of the proposed method is discussed in this work. Compared with popular pre-trained methods, the proposed framework requires no dataset and exhibits superior interpretability and generalization ability. Experimental results demonstrate that the proposed method can generate detailed images while effectively suppressing image artifacts; meanwhile, DRP achieves comparable or better performance than supervised methods. Comment: 11 pages, 12 figures, journal paper
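The DIP-style idea behind such frameworks can be illustrated in miniature: fit a per-scan network so that its output, pushed through a fixed forward operator, matches the measured data, with no training dataset involved. The sketch below is a drastic simplification and an assumption throughout: a single linear layer stands in for the CNN, and a small random matrix A stands in for the sparse-view Radon transform; DRP's multi-stage optimization and cross-domain feedback are not modeled.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 16, 8                      # image size, number of measurements (m < n)
A = rng.normal(size=(m, n))       # fixed "sparse-view" forward operator
x_true = rng.normal(size=n)
y = A @ x_true                    # measured sinogram (noise-free toy data)

z = rng.normal(size=n)            # fixed random network input, as in DIP
W = np.zeros((n, n))              # network weights, optimized for this scan only
lr = 1e-3
for _ in range(5000):
    x = W @ z                     # network output = current reconstruction
    r = A @ x - y                 # data-fidelity residual in the measurement domain
    W -= lr * np.outer(A.T @ r, z)  # gradient step on ||A W z - y||^2
x_rec = W @ z                     # final reconstruction; A @ x_rec matches y
```

Because m < n the problem is under-constrained; in the real method the network architecture itself regularizes which of the many data-consistent images is selected.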