Balanced Audiovisual Dataset for Imbalance Analysis
The imbalance problem is widespread in machine learning and also arises in
multimodal learning, caused by the intrinsic discrepancy between the
modalities of samples. Recent works have attempted to solve the modality
imbalance problem from an algorithmic perspective; however, they do not fully
analyze the influence of modality bias in datasets. Concretely, existing
multimodal datasets are usually collected for specific tasks, where one
modality tends to perform better than the others in most conditions. In this
work, to comprehensively explore the influence of modality bias, we first split
existing datasets into different subsets by estimating sample-wise modality
discrepancy. We surprisingly find that multimodal models with existing
imbalance algorithms consistently perform worse than the unimodal model on
specific subsets, in accordance with the modality bias. To further explore the
influence of modality bias and analyze the effectiveness of existing imbalance
algorithms, we build a balanced audiovisual dataset, with uniformly distributed
modality discrepancy over the whole dataset. We then conduct extensive
experiments to re-evaluate existing imbalance algorithms and draw some
interesting findings: existing algorithms only provide a compromise between
modalities and suffer from the large modality discrepancy of samples. We hope
that these findings could facilitate future research on the modality imbalance
problem.
Comment: website: https://gewu-lab.github.io/Balanced-Audiovisual-Dataset
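The sample-wise split described above can be sketched as follows. This is a minimal illustration under an assumption of ours: that per-sample modality discrepancy is estimated from the confidence gap between the two unimodal models on the true label (the paper's exact metric may differ), after which samples are sorted and binned into subsets.

```python
# Sketch: split a multimodal dataset into subsets by sample-wise modality
# discrepancy. The discrepancy estimate here (unimodal confidence gap) is an
# illustrative assumption, not necessarily the paper's exact metric.

def modality_discrepancy(audio_conf, visual_conf):
    """Positive: audio dominates this sample; negative: visual dominates."""
    return audio_conf - visual_conf

def split_by_discrepancy(samples, n_bins=3):
    """samples: list of (sample_id, audio_conf, visual_conf).
    Returns n_bins subsets ordered from visual-dominant to audio-dominant."""
    scored = sorted(samples, key=lambda s: modality_discrepancy(s[1], s[2]))
    size = (len(scored) + n_bins - 1) // n_bins  # ceiling division
    return [scored[i * size:(i + 1) * size] for i in range(n_bins)]

samples = [("a", 0.9, 0.2), ("b", 0.4, 0.8), ("c", 0.6, 0.6),
           ("d", 0.1, 0.9), ("e", 0.8, 0.5), ("f", 0.5, 0.4)]
subsets = split_by_discrepancy(samples, n_bins=3)
# subsets[0] is the most visual-dominant bin, subsets[-1] the most audio-dominant
```

Evaluating a multimodal model per bin, as the paper does, then exposes on which discrepancy regime it underperforms the unimodal baseline.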
All-solid-state asymmetric supercapacitor based on porous cobalt selenide thin films
As a significant semiconductor material, cobalt selenide has enormous potential and extensive application prospects in solar cells, photocatalysis, and supercapacitors. In this paper, porous CoSe thin films were successfully fabricated on stainless-steel sheets using a facile, effective electrodeposition technique. Electrochemical tests reveal that the specific capacitance reaches as high as 510 F g−1 at a current density of 1 A g−1, with capacitance retention of 91% over 5000 cycles. An asymmetric all-solid-state supercapacitor is fabricated using the CoSe thin film as the positive electrode and activated carbon as the negative electrode. The assembled solid-state device displays a high areal specific capacitance of 18.1 mF cm−2 along with good cycling stability, outstanding flexibility, and satisfactory mechanical stability. Furthermore, the solid-state devices connected in series can power red light-emitting diodes. The results show great potential for preparing large-scale, high-energy-density storage systems.
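The specific capacitance quoted above is conventionally computed from a galvanostatic discharge curve as C = I·Δt / (m·ΔV). A minimal sketch of that arithmetic, with illustrative numbers chosen by us (not taken from the paper's measurements):

```python
# Sketch: gravimetric specific capacitance from galvanostatic discharge,
#   C = I * dt / (m * dV)   [F/g]
# All numbers below are illustrative assumptions, not the paper's data.

def specific_capacitance(current_a, discharge_s, mass_g, window_v):
    """current_a: discharge current (A); discharge_s: discharge time (s);
    mass_g: active-material mass (g); window_v: voltage window (V)."""
    return current_a * discharge_s / (mass_g * window_v)

# 1 mA on 1 mg of active material corresponds to a 1 A/g current density.
# A 510 s discharge over a 1.0 V window then yields 510 F/g.
c = specific_capacitance(current_a=1e-3, discharge_s=510, mass_g=1e-3, window_v=1.0)
```

The same formula with electrode area in place of mass gives the areal capacitance (mF cm−2) reported for the assembled device.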
Fluent: Round-efficient Secure Aggregation for Private Federated Learning
Federated learning (FL) facilitates collaborative training of machine
learning models among a large number of clients while safeguarding the privacy
of their local datasets. However, FL remains susceptible to vulnerabilities
such as privacy inference and inversion attacks. Single-server secure
aggregation schemes were proposed to address these threats. Nonetheless, they
encounter practical constraints due to their round and communication
complexities. This work introduces Fluent, a round and communication-efficient
secure aggregation scheme for private FL. Fluent has several improvements
compared to state-of-the-art solutions like Bell et al. (CCS 2020) and Ma et
al. (SP 2023): (1) it eliminates frequent handshakes and secret sharing
operations by efficiently reusing the shares across multiple training
iterations without leaking any private information; (2) it accomplishes both
the consistency check and gradient unmasking in one logical step, thereby
reducing another round of communication; (3) it introduces Fluent-Dynamic,
with a participant selection algorithm and an alternative secret sharing
scheme, which facilitates dynamic client joining and enhances system
flexibility and scalability. With these innovations, Fluent achieves the
fewest communication rounds (i.e., two in the collection phase) in the
malicious-server setting, in contrast to at least three rounds in existing
schemes, significantly reducing latency for geographically distributed
clients. We implemented Fluent and compared it with existing solutions.
Experimental results show that Fluent reduces the computational cost by at
least 75% and the communication overhead by at least 25% for normal clients.
Fluent also reduces the communication overhead for the server, at the expense
of a marginal increase in computational cost.
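The gradient-masking idea that single-server secure aggregation schemes build on can be sketched as follows: each client pair derives a shared mask, one side adds it and the other subtracts it, so the masks cancel in the server's sum and only the aggregate gradient is revealed. This toy sketch (with a seeded PRNG standing in for a real key-derived PRG) shows only that cancellation property; Fluent's actual protocol, with share reuse across iterations and the combined consistency check, is more involved.

```python
# Sketch: pairwise-mask cancellation underlying single-server secure
# aggregation. Toy PRG via a seeded PRNG; a real scheme derives masks from
# agreed keys. This is NOT Fluent's full protocol, only the core idea.
import random

def pairwise_mask(i, j, dim, modulus):
    # Both endpoints of a pair derive the same mask from a shared seed.
    rng = random.Random(min(i, j) * 1_000_003 + max(i, j))
    return [rng.randrange(modulus) for _ in range(dim)]

def mask_gradient(cid, clients, grad, modulus):
    masked = list(grad)
    for other in clients:
        if other == cid:
            continue
        m = pairwise_mask(cid, other, len(grad), modulus)
        sign = 1 if cid < other else -1   # one side adds, the other subtracts
        masked = [(x + sign * y) % modulus for x, y in zip(masked, m)]
    return masked

modulus = 2 ** 16
clients = [0, 1, 2]
grads = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
server_sum = [0, 0]
for cid in clients:                       # server only ever sees masked inputs
    for k, v in enumerate(mask_gradient(cid, clients, grads[cid], modulus)):
        server_sum[k] = (server_sum[k] + v) % modulus
# server_sum now equals the plain sum of gradients: masks have cancelled
```

Dropout handling (recovering the masks of clients who vanish mid-round) is what drives the extra rounds and secret-sharing cost that Fluent targets.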
DXVNet-ViT-Huge (JFT) Multimode Classification Network Based on Vision Transformer
Traditional CNNs are not good at extracting the global features of images. To address this problem, this paper extends the DXVNet network with a Conditional Random Field (CRF) component and a pre-trained ViT-Huge (Vision Transformer) model, building a new DXVNet-ViT-Huge (JFT) network. The CRF component helps the network learn the constraints on the label predicted for each word, corrects the word-label prediction errors of the D-GRU-based method, and improves sequence-annotation accuracy. The Transformer architecture of the ViT-Huge model can extract the global feature information of an image, while the CNN is better at extracting its local features. The ViT-Huge and CNN pre-trained models are therefore combined through multimodal feature fusion: the two complementary sets of image features are fused by a Bi-GRU to improve classification performance. Experimental results show that the newly constructed DXVNet-ViT-Huge (JFT) model achieves good performance, with F1 scores on two real public datasets 6.03% and 7.11% higher than the original DXVNet model, respectively.
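The complementary-feature fusion described above can be illustrated schematically. The paper fuses the two feature streams with a learned Bi-GRU; the sketch below substitutes a fixed element-wise gate (a simplification of ours, with all dimensions and weights assumed) purely to show the shape of combining a global and a local feature vector into one representation.

```python
# Sketch: combining complementary feature vectors, standing in for the
# paper's Bi-GRU fusion of ViT-Huge global features and CNN local features.
# The fixed gate below is an illustrative assumption; the real model learns
# its fusion weights.

def gated_fusion(global_feat, local_feat, gate):
    """Element-wise convex combination: gate weights global vs. local."""
    assert len(global_feat) == len(local_feat) == len(gate)
    return [g * a + (1 - g) * b
            for g, a, b in zip(gate, global_feat, local_feat)]

vit_feat = [0.8, 0.1, 0.5]   # stand-in for ViT-Huge global features
cnn_feat = [0.2, 0.9, 0.5]   # stand-in for CNN local features
fused = gated_fusion(vit_feat, cnn_feat, gate=[0.5, 0.5, 0.5])
# fused feeds the downstream classifier in place of either stream alone
```

In the actual network, the fused sequence would pass through the Bi-GRU and classification head rather than a fixed gate.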
Deep Radon Prior: A Fully Unsupervised Framework for Sparse-View CT Reconstruction
Although sparse-view computed tomography (CT) has significantly reduced
radiation dose, it also introduces severe artifacts which degrade the image
quality. In recent years, deep learning-based methods for inverse problems have
made remarkable progress and have become increasingly popular in CT
reconstruction. However, most of these methods suffer several limitations:
dependence on high-quality training data, weak interpretability, etc. In this
study, we propose a fully unsupervised framework called Deep Radon Prior (DRP),
inspired by Deep Image Prior (DIP), to address the aforementioned limitations.
DRP introduces a neural network as an implicit prior into the iterative method,
thereby realizing cross-domain gradient feedback. During the reconstruction
process, the neural network is progressively optimized in multiple stages to
narrow the solution space in the Radon domain for the under-constrained
imaging protocol; the convergence of the proposed method is also discussed in
this work. Compared with popular pre-trained methods, the proposed framework
requires no dataset and exhibits superior interpretability and generalization
ability. The experimental results demonstrate that the proposed method can
generate detailed images while effectively suppressing image artifacts.
Meanwhile, DRP achieves performance comparable to or better than supervised
methods.
Comment: 11 pages, 12 figures, Journal paper