54 research outputs found

    Unimodal Training-Multimodal Prediction: Cross-modal Federated Learning with Hierarchical Aggregation

    Full text link
    Multimodal learning has seen great success mining data features from multiple modalities, with remarkable improvements in model performance. Meanwhile, federated learning (FL) addresses the data-sharing problem, enabling privacy-preserving collaborative training that supplies sufficient valuable data. Great potential therefore arises from their confluence, known as multimodal federated learning. However, predominant approaches are limited by the assumption that each local dataset records samples from all modalities. In this paper, we aim to bridge this gap by proposing an Unimodal Training - Multimodal Prediction (UTMP) framework in the context of multimodal federated learning. We design HA-Fedformer, a novel transformer-based model that enables unimodal training with only a unimodal dataset at each client and multimodal testing by aggregating multiple clients' knowledge for better accuracy. The key advantages are twofold. Firstly, to alleviate the impact of non-IID data, we develop an uncertainty-aware aggregation method for the local encoders with layer-wise Markov Chain Monte Carlo sampling. Secondly, to overcome the challenge of unaligned language sequences, we implement cross-modal decoder aggregation to capture the hidden signal correlation between decoders trained on data from different modalities. Our experiments on popular sentiment analysis benchmarks, CMU-MOSI and CMU-MOSEI, demonstrate that HA-Fedformer significantly outperforms state-of-the-art multimodal models under the UTMP federated learning framework, with a 15%-20% improvement on most attributes. Comment: 10 pages, 5 figures
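    As a loose illustration of the aggregation idea described above, the sketch below (plain NumPy; the function name aggregate_layerwise and the Gaussian sampling scheme are illustrative assumptions, not the authors' implementation) treats each layer's weights across clients as samples around a common mean, draws Monte Carlo samples around it, and averages them per layer.

```python
# A minimal sketch (not the authors' implementation) of uncertainty-aware,
# layer-wise aggregation: each layer's weights across clients are treated as
# samples around a common mean, Monte Carlo draws are taken around that mean,
# and their average becomes the aggregated layer.
import numpy as np

def aggregate_layerwise(client_layers, n_samples=10, rng=None):
    """client_layers: list over clients; each entry maps layer name -> np.ndarray."""
    rng = rng or np.random.default_rng(0)
    global_model = {}
    for name in client_layers[0]:
        stacked = np.stack([c[name] for c in client_layers])   # (num_clients, ...)
        mu = stacked.mean(axis=0)
        sigma = stacked.std(axis=0) + 1e-8                      # cross-client uncertainty
        draws = rng.normal(mu, sigma, size=(n_samples,) + mu.shape)
        global_model[name] = draws.mean(axis=0)                 # sample-averaged layer
    return global_model

# Toy usage: three clients sharing one encoder layer.
clients = [{"enc.w": np.random.default_rng(s).normal(size=(4, 4))} for s in range(3)]
print(aggregate_layerwise(clients)["enc.w"].shape)              # -> (4, 4)
```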

    High Confidence Level Inference is Almost Free using Parallel Stochastic Optimization

    Full text link
    Uncertainty quantification for estimators obtained from stochastic optimization in an online setting has gained popularity recently. This paper introduces a novel inference method focused on constructing confidence intervals with efficient computation and fast convergence to the nominal level. Specifically, we propose using a small number of independent multi-runs to acquire distributional information and construct a t-based confidence interval. Our method requires minimal additional computation and memory beyond the standard updating of estimates, making the inference process almost cost-free. We provide a rigorous theoretical guarantee for the confidence interval, demonstrating that the coverage is approximately exact with an explicit convergence rate, which also allows for high confidence level inference. In particular, a new Gaussian approximation result is developed for the online estimators to characterize the coverage properties of our confidence intervals in terms of relative errors. Additionally, our method can leverage parallel computing across multiple cores to further accelerate calculations. It is easy to implement and can be integrated with existing stochastic algorithms without complicated modifications.
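    The multi-run construction is simple enough to sketch. The toy example below (a one-dimensional quadratic loss with simulated gradient noise; all names and constants are illustrative, not from the paper) runs a handful of independent averaged-SGD chains and forms a t-based confidence interval from their final estimates.

```python
# Toy sketch of the multi-run t-interval: K independent averaged-SGD runs on a
# one-dimensional quadratic loss (illustrative setup, not the paper's code).
import numpy as np
from scipy import stats

def sgd_run(grad_fn, x0, n_steps, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    x, avg = x0, 0.0
    for t in range(1, n_steps + 1):
        g = grad_fn(x) + rng.normal(scale=1.0)   # noisy gradient oracle
        x -= lr / np.sqrt(t) * g                 # decaying step size
        avg += (x - avg) / t                     # Polyak-Ruppert averaging
    return avg

grad = lambda x: 2.0 * (x - 3.0)                 # true parameter is 3
K = 5                                            # small number of independent runs
estimates = np.array([sgd_run(grad, 0.0, 20_000, seed=s) for s in range(K)])
mean = estimates.mean()
half_width = stats.t.ppf(0.975, df=K - 1) * estimates.std(ddof=1) / np.sqrt(K)
print(f"95% CI: {mean:.3f} +/- {half_width:.3f}")
```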

    On the Stability Analysis of Open Federated Learning Systems

    Full text link
    We consider open federated learning (FL) systems, where clients may join and/or leave the system during the FL process. Given the variability in the number of clients present, convergence to a fixed model cannot be guaranteed in open systems. Instead, we resort to a new performance metric that we term the stability of open FL systems, which quantifies the magnitude of the learned model in open systems. Under the assumption that local clients' functions are strongly convex and smooth, we theoretically quantify the radius of stability for two FL algorithms, namely local SGD and local Adam. We observe that this radius depends on several key parameters, including the function condition number and the variance of the stochastic gradient. Our theoretical results are further verified by numerical simulations on both synthetic and real-world benchmark datasets.
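    A minimal simulation can illustrate what "stability" means here. The sketch below (scalar quadratic client objectives and a random participation pattern, both assumptions made for illustration) runs local SGD while the set of present clients changes every round and records how far the global model wanders.

```python
# Toy open-system simulation: scalar quadratic client objectives, a random
# subset of clients per round, local SGD on each, and simple averaging at the
# server. All constants are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
all_optima = rng.normal(size=20)            # client i minimizes (x - all_optima[i])^2
x_global, lr, local_steps = 0.0, 0.1, 5
trajectory = []

for rnd in range(200):
    present = rng.choice(20, size=rng.integers(3, 10), replace=False)  # who shows up
    updates = []
    for i in present:
        x = x_global
        for _ in range(local_steps):
            g = 2.0 * (x - all_optima[i]) + rng.normal(scale=0.5)      # stochastic gradient
            x -= lr * g
        updates.append(x)
    x_global = float(np.mean(updates))      # server averages the returned local models
    trajectory.append(x_global)

# The model never settles on one point; its magnitude stays within a radius.
print("empirical stability radius:", np.max(np.abs(trajectory[50:])))
```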

    Compositional Federated Learning: Applications in Distributionally Robust Averaging and Meta Learning

    Full text link
    In this paper, we propose an effective and efficient Compositional Federated Learning (ComFedL) algorithm for solving a new compositional Federated Learning (FL) framework, which frequently appears in many machine learning problems with a hierarchical structure, such as distributionally robust federated learning and model-agnostic meta learning (MAML). Moreover, we study the convergence of our ComFedL algorithm under some mild conditions and prove that it achieves a fast convergence rate of $O(\frac{1}{\sqrt{T}})$, where $T$ denotes the number of iterations. To the best of our knowledge, our algorithm is the first to bridge federated learning with compositional stochastic optimization. In particular, we first transform the distributionally robust FL problem (i.e., a minimax optimization problem) into a simple compositional optimization problem by using KL divergence regularization. At the same time, we also transform the distribution-agnostic MAML problem (i.e., a minimax optimization problem) into a simple compositional optimization problem. Finally, we apply two popular machine learning tasks, i.e., distributionally robust FL and MAML, to demonstrate the effectiveness of our algorithm. Comment: 21 pages, 8 figures
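    The KL-regularized reformulation admits a short worked sketch: with KL regularization the robust client weights become a softmax of the client losses, so the minimax objective collapses to the compositional objective $F(x) = \lambda \log\big(\frac{1}{n}\sum_i \exp(f_i(x)/\lambda)\big)$. The toy code below (quadratic client losses and plain SGD; an illustration of the reformulation, not the ComFedL algorithm itself) minimizes that objective directly.

```python
# Worked sketch of the KL-regularized reformulation with quadratic client
# losses f_i(x) = (x - targets[i])^2 and plain SGD (illustrative, not ComFedL).
import numpy as np

rng = np.random.default_rng(0)
targets = rng.normal(size=8)                      # one loss per client
lam = 0.5                                         # KL regularization strength

def robust_grad(x):
    losses = (x - targets) ** 2
    weights = np.exp(losses / lam)
    weights /= weights.sum()                      # softmax weights over clients
    return np.sum(weights * 2.0 * (x - targets))  # gradient of lam*log-mean-exp objective

x = 0.0
for t in range(1, 2000):
    x -= 0.05 / np.sqrt(t) * robust_grad(x)
print("robust solution:", x)                      # leans toward the worst-off clients
```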

    Decentralized convex optimization over time-varying graphs: a survey

    Full text link
    Decentralized optimization over time-varying networks has a wide range of applications in distributed learning, signal processing, and various distributed control problems. The agents of the distributed system locally hold optimization objectives and can communicate with their immediate neighbors over a network that changes from time to time. In this paper, we survey state-of-the-art results and describe the techniques for optimization over time-varying graphs. We also give an overview of open questions in the field and formulate hypotheses and directions for future work.
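    As a concrete instance of the setting surveyed here, the sketch below (randomized pairwise gossip as the time-varying graph and quadratic local objectives, both illustrative assumptions) runs decentralized gradient descent in which each agent mixes with a different neighbor at every step.

```python
# Decentralized gradient descent with a time-varying graph realized as random
# pairwise gossip; agents hold quadratic objectives (illustrative assumptions).
import numpy as np

n = 6
rng = np.random.default_rng(0)
targets = rng.normal(size=n)                 # agent i holds f_i(x) = (x - targets[i])^2
x = np.zeros(n)                              # one scalar estimate per agent

def mixing_matrix(rng):
    """Doubly stochastic matrix for a randomly chosen communicating pair."""
    i, j = rng.choice(n, size=2, replace=False)
    W = np.eye(n)
    W[i, i] = W[j, j] = 0.5
    W[i, j] = W[j, i] = 0.5
    return W

lr = 0.05
for t in range(1, 3000):
    W = mixing_matrix(rng)                   # the graph changes at every step
    grads = 2.0 * (x - targets)              # local gradients
    x = W @ x - lr / np.sqrt(t) * grads      # consensus step + gradient step

print("agents' estimates:", np.round(x, 3), "consensus target:", round(targets.mean(), 3))
```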

    Disaster classification net: A disaster classification algorithm on remote sensing imagery

    Get PDF
    As we all know, natural disasters have a great impact on people's lives and property, and it is very necessary to handle disaster categories in a timely and effective manner. In light of this, we propose using tandem stitching to create a new disaster classification network, D-Net (Disaster Classification Net), from the D-Conv, D-Linear, D-model, and D-Layer modules. In our experiments, we compared the proposed method with CNN and Transformer baselines. Compared to the CNN algorithms, D-Net reduces Params by 26–608 times and FLOPs by up to 21 times, while Precision increases by 1.6%–43.5%; compared to the Transformer algorithms, D-Net reduces Params by 23–149 times and FLOPs by 1.7–10 times, while Precision increases by 3.9%–25.9%. D-Net thus achieves state-of-the-art (SOTA) results on the disaster dataset. We then compared D-Net against MobileNet_v2 (the best-performing model on the classification dataset) and the CCT network on the fashion_mnist and CIFAR_100 public datasets, respectively, and the results show that D-Net still achieves state-of-the-art classification performance. Therefore, our proposed algorithm can be applied not only to disaster tasks, but also to other classification tasks.

    Similarity, Compression and Local Steps: Three Pillars of Efficient Communications for Distributed Variational Inequalities

    Full text link
    Variational inequalities are a broad and flexible class of problems that includes minimization, saddle point, and fixed point problems as special cases. Therefore, variational inequalities are used in a variety of applications ranging from equilibrium search to adversarial learning. The increasing size of data and models demands parallel and distributed computing for real-world machine learning problems, most of which can be represented as variational inequalities. Meanwhile, most distributed approaches have a significant bottleneck: the cost of communication. The three main techniques to reduce both the total number of communication rounds and the cost of each round are the use of similarity of local functions, compression of transmitted information, and local updates. In this paper, we combine all of these approaches. Such a triple synergy did not previously exist for variational inequalities and saddle point problems, nor even for minimization problems. The methods presented in this paper have the best theoretical guarantees on communication complexity and are significantly ahead of other methods for distributed variational inequalities. The theoretical results are confirmed by adversarial learning experiments on synthetic and real datasets. Comment: 19 pages, 2 algorithms, 1 table
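    One of the three ingredients, communication compression, is easy to sketch in isolation. The toy code below uses top-k sparsification with error feedback on a strongly monotone linear operator (the minimization-flavored special case of a variational inequality); the helper top_k, the operator choice, and the constants are illustrative assumptions, and the paper's actual methods additionally exploit similarity and local steps.

```python
# Top-k compression with error feedback on a strongly monotone linear operator
# F_i(x) = A_i x - b_i (the minimization-flavored special case of a VI).
# The helper top_k, the operator, and the constants are illustrative.
import numpy as np

def top_k(v, k):
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]          # keep the k largest-magnitude entries
    out[idx] = v[idx]
    return out

rng = np.random.default_rng(0)
d, n_workers, k = 50, 4, 5
A = [np.diag(rng.uniform(0.5, 2.0, size=d)) for _ in range(n_workers)]
b = [rng.normal(size=d) for _ in range(n_workers)]
x = np.zeros(d)
err = [np.zeros(d) for _ in range(n_workers)]  # error-feedback memory per worker

for t in range(500):
    msgs = []
    for i in range(n_workers):
        g = A[i] @ x - b[i] + err[i]          # local operator value plus unsent residual
        m = top_k(g, k)                       # compress before communicating
        err[i] = g - m                        # remember what was dropped
        msgs.append(m)
    x -= 0.05 * np.mean(msgs, axis=0)         # server step with the averaged messages

residual = np.mean([A[i] @ x - b[i] for i in range(n_workers)], axis=0)
print("residual norm:", np.linalg.norm(residual))
```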