Unimodal Training-Multimodal Prediction: Cross-modal Federated Learning with Hierarchical Aggregation
Multimodal learning has seen great success in mining features from multiple modalities, yielding remarkable improvements in model performance. Meanwhile, federated learning (FL) addresses the data-sharing problem, enabling privacy-preserving collaborative training that provides access to sufficient valuable data. Great potential therefore arises from their confluence, known as multimodal federated learning. However, the predominant approaches are limited in that they often assume each local dataset records samples from all modalities. In this paper, we aim to bridge this gap by proposing an Unimodal Training - Multimodal Prediction (UTMP) framework in the context of multimodal federated learning.
We design HA-Fedformer, a novel transformer-based model that enables unimodal training with only a unimodal dataset at each client and multimodal testing by aggregating multiple clients' knowledge for better accuracy. The key advantages
are twofold. Firstly, to alleviate the impact of non-IID data, we develop an
uncertainty-aware aggregation method for the local encoders with layer-wise
Markov Chain Monte Carlo sampling. Secondly, to overcome the challenge of unaligned language sequences, we implement cross-modal decoder aggregation to
capture the hidden signal correlation between decoders trained by data from
different modalities. Our experiments on popular sentiment analysis benchmarks,
CMU-MOSI and CMU-MOSEI, demonstrate that HA-Fedformer significantly outperforms
state-of-the-art multimodal models under the UTMP federated learning framework, with 15%-20% improvement on most attributes.
Comment: 10 pages, 5 figures
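To give a rough, hedged picture of what uncertainty-aware layer-wise aggregation can look like, the sketch below precision-weights per-client layer estimates by the inverse variance of their sampled parameters. The function name, the sampled inputs, and the weighting rule are illustrative assumptions; HA-Fedformer's actual MCMC-based scheme is not reproduced here.

```python
import numpy as np

def uncertainty_aware_aggregate(client_layer_samples):
    """Aggregate one layer across clients, weighting each client by the
    inverse variance of its parameter samples (a generic precision-weighted
    scheme; the exact HA-Fedformer rule may differ).

    client_layer_samples: list of arrays, each of shape (num_samples, dim),
        e.g. parameter snapshots collected per client for this layer.
    """
    means = np.array([s.mean(axis=0) for s in client_layer_samples])
    variances = np.array([s.var(axis=0) + 1e-8 for s in client_layer_samples])
    precisions = 1.0 / variances                      # lower variance -> higher weight
    weights = precisions / precisions.sum(axis=0)     # normalize per coordinate
    return (weights * means).sum(axis=0)

# toy usage: three clients, five parameter samples each for a 4-dim layer
rng = np.random.default_rng(0)
clients = [rng.normal(loc=c, scale=0.1 * (c + 1), size=(5, 4)) for c in range(3)]
print(uncertainty_aware_aggregate(clients))
```

Lower-variance (more certain) client estimates receive larger weight per coordinate, which is the basic intuition behind down-weighting clients whose local data make a layer's parameters noisy.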
High Confidence Level Inference is Almost Free using Parallel Stochastic Optimization
Uncertainty quantification for estimation through stochastic optimization
solutions in an online setting has gained popularity recently. This paper
introduces a novel inference method focused on constructing confidence
intervals with efficient computation and fast convergence to the nominal level.
Specifically, we propose to use a small number of independent multi-runs to
acquire distribution information and construct a t-based confidence interval.
Our method requires minimal additional computation and memory beyond the
standard updating of estimates, making the inference process almost cost-free.
We provide a rigorous theoretical guarantee for the confidence interval,
demonstrating that the coverage is approximately exact with an explicit
convergence rate and allowing for high confidence level inference. In
particular, a new Gaussian approximation result is developed for the online
estimators to characterize the coverage properties of our confidence intervals
in terms of relative errors. Additionally, our method allows for
leveraging parallel computing to further accelerate calculations using multiple
cores. It is easy to implement and can be integrated with existing stochastic
algorithms without the need for complicated modifications.
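A minimal sketch of the parallel multi-run idea, assuming averaged SGD as the underlying online estimator and a toy mean-estimation problem (both are placeholders, not the paper's setting): run a handful of independent estimators, then form a t-based interval from their final values.

```python
import numpy as np
from scipy import stats

def sgd_estimate(data, lr0=0.5, seed=0):
    """Averaged SGD estimate of the mean of `data` (a stand-in for any
    stochastic-optimization estimator)."""
    rng = np.random.default_rng(seed)
    x, x_bar = 0.0, 0.0
    for t, i in enumerate(rng.permutation(len(data)), start=1):
        lr = lr0 / t ** 0.6
        x -= lr * (x - data[i])          # gradient of 0.5 * (x - data[i])^2
        x_bar += (x - x_bar) / t         # Polyak-Ruppert averaging
    return x_bar

def t_confidence_interval(estimates, alpha=0.05):
    """t-based CI built from a small number of independent runs."""
    k = len(estimates)
    m, s = np.mean(estimates), np.std(estimates, ddof=1)
    half = stats.t.ppf(1 - alpha / 2, df=k - 1) * s / np.sqrt(k)
    return m - half, m + half

rng = np.random.default_rng(42)
data = rng.normal(loc=1.0, scale=2.0, size=10_000)
runs = [sgd_estimate(data, seed=s) for s in range(5)]   # 5 independent multi-runs
print(t_confidence_interval(runs, alpha=0.01))          # high-confidence interval
```

Because only the handful of final estimates is stored, the extra memory and computation beyond the runs themselves is negligible, and the independent runs can be dispatched to separate cores.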
On the Stability Analysis of Open Federated Learning Systems
We consider open federated learning (FL) systems, where clients may join
and/or leave the system during the FL process. Given the variability of the
number of present clients, convergence to a fixed model cannot be guaranteed in
open systems. Instead, we resort to a new performance metric that we term the
stability of open FL systems, which quantifies the magnitude of the learned
model in open systems. Under the assumption that local clients' functions are
strongly convex and smooth, we theoretically quantify the radius of stability
for two FL algorithms, namely local SGD and local Adam. We observe that this
radius relies on several key parameters, including the function condition
number as well as the variance of the stochastic gradient. Our theoretical
results are further verified by numerical simulations on both synthetic and
real-world benchmark datasets.
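The following toy simulation, not the paper's analysis, illustrates the setting: clients with strongly convex quadratic objectives join and leave each round, local SGD updates are averaged by the server, and the norm of the learned model stays within a bounded range rather than converging to a fixed point. The objectives, noise level, and participation rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_clients, rounds, local_steps, lr = 5, 20, 200, 5, 0.05
# client i holds a strongly convex quadratic f_i(x) = 0.5 * ||x - b_i||^2
b = rng.normal(size=(n_clients, dim))

x_global = np.zeros(dim)
radii = []
for r in range(rounds):
    # open system: a random subset of clients is present this round
    present = rng.choice(n_clients, size=rng.integers(3, n_clients), replace=False)
    updates = []
    for i in present:
        x = x_global.copy()
        for _ in range(local_steps):
            grad = (x - b[i]) + 0.1 * rng.normal(size=dim)  # stochastic gradient
            x -= lr * grad
        updates.append(x)
    x_global = np.mean(updates, axis=0)      # server averaging (local SGD)
    radii.append(np.linalg.norm(x_global))

print(f"model norm over last 50 rounds: min={min(radii[-50:]):.3f}, max={max(radii[-50:]):.3f}")
```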
Compositional Federated Learning: Applications in Distributionally Robust Averaging and Meta Learning
In the paper, we propose an effective and efficient Compositional Federated
Learning (ComFedL) algorithm for solving a new compositional Federated Learning
(FL) framework, which frequently appears in many machine learning problems with
a hierarchical structure such as distributionally robust federated learning and
model-agnostic meta learning (MAML). Moreover, we study the convergence
analysis of our ComFedL algorithm under some mild conditions, and prove that it
achieves a fast convergence rate of O(1/√T), where T denotes the number of iterations. To the best of our knowledge, our algorithm is the
first work to bridge federated learning with composition stochastic
optimization. In particular, we first transform the distributionally robust FL
(i.e., a minimax optimization problem) into a simple composition optimization
problem by using KL divergence regularization. At the same time, we also first
transform the distribution-agnostic MAML problem (i.e., a minimax optimization
problem) into a simple composition optimization problem. Finally, we apply two
popular machine learning tasks, i.e., distributionally robust FL and MAML, to demonstrate the effectiveness of our algorithm.
Comment: 21 pages, 8 figures
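For context, a standard derivation shows how KL regularization collapses the minimax distributionally robust objective into a single compositional objective (here with n clients, a uniform reference distribution, and regularization weight λ; ComFedL's exact notation and constants may differ):

```latex
% KL-regularized distributionally robust FL: the inner maximum over the
% probability simplex has a closed form (log-sum-exp), yielding a
% composition of a smooth outer map with an inner average.
\min_{x}\ \max_{p\in\Delta_n}\ \sum_{i=1}^{n} p_i f_i(x)
  \;-\;\lambda\,\mathrm{KL}\!\left(p\,\Big\|\,\tfrac{1}{n}\mathbf{1}\right)
\;=\;
\min_{x}\ \lambda \log\!\left(\frac{1}{n}\sum_{i=1}^{n}\exp\!\big(f_i(x)/\lambda\big)\right)
```

The right-hand side composes the smooth outer map λ log(·) with an inner average of exp(f_i(x)/λ), which is exactly the structure that compositional stochastic optimization methods handle.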
Decentralized convex optimization over time-varying graphs: a survey
Decentralized optimization over time-varying networks has a wide range of
applications in distributed learning, signal processing and various distributed
control problems. The agents of the distributed system locally hold
optimization objectives and can communicate to their immediate neighbors over a
network that changes from time to time. In this paper, we survey
state-of-the-art results and describe the techniques for optimization over
time-varying graphs. We also give an overview of open questions in the field
and formulate hypotheses and directions for future work.
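As a concrete, illustrative instance of the problem class surveyed (not a method from any particular paper), the sketch below runs decentralized gradient descent over a topology that changes every iteration, using Metropolis weights to build a doubly stochastic mixing matrix for each graph.

```python
import numpy as np

rng = np.random.default_rng(1)
n, dim, iters, lr = 8, 3, 300, 0.05
b = rng.normal(size=(n, dim))            # agent i minimizes f_i(x) = 0.5 * ||x - b_i||^2
x = np.zeros((n, dim))                   # one local iterate per agent

def metropolis_weights(edges, n):
    """Doubly stochastic mixing matrix for an undirected edge set."""
    deg = np.zeros(n)
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    W = np.zeros((n, n))
    for i, j in edges:
        w = 1.0 / (1 + max(deg[i], deg[j]))
        W[i, j] = W[j, i] = w
    np.fill_diagonal(W, 1 - W.sum(axis=1))
    return W

for t in range(iters):
    # time-varying topology: a random ring over a random agent ordering
    order = rng.permutation(n)
    edges = [(order[k], order[(k + 1) % n]) for k in range(n)]
    W = metropolis_weights(edges, n)
    grads = x - b                         # local gradients
    x = W @ x - lr * grads                # decentralized gradient step (DGD)

print("consensus error:", np.linalg.norm(x - x.mean(axis=0)))
print("distance to optimum:", np.linalg.norm(x.mean(axis=0) - b.mean(axis=0)))
```

Each agent mixes its iterate with its current neighbors and then takes a local gradient step; the consensus error and the distance of the averaged iterate to the global optimum are the two quantities most analyses of this setting track.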
Disaster cassification net: A disaster classification algorithm on remote sensing imagery
Natural disasters have a great impact on people's lives and property, so it is necessary to identify disaster categories in a timely and effective manner. In light of this, we propose a new disaster classification network, D-Net (Disaster Cassification Net), built by tandem stitching of the D-Conv, D-Linear, D-model, and D-Layer modules. In our experiments, we compared the proposed method with CNN and Transformer baselines: compared with CNN algorithms, D-Net reduces Params by 26–608 times and FLOPs by up to 21 times while increasing Precision by 1.6%–43.5%; compared with Transformer algorithms, D-Net reduces Params by 23–149 times and FLOPs by 1.7–10 times while increasing Precision by 3.9%–25.9%. D-Net thus achieves state-of-the-art (SOTA) results on the disaster dataset. We further compared D-Net with MobileNet_v2, the best-performing model on the classification dataset, and with the CCT network on the public fashion_mnist and CIFAR_100 datasets, respectively; the results show that D-Net still achieves state-of-the-art classification performance. Therefore, our proposed algorithm can be applied not only to disaster tasks, but also to other classification tasks.
Similarity, Compression and Local Steps: Three Pillars of Efficient Communications for Distributed Variational Inequalities
Variational inequalities are a broad and flexible class of problems that
includes minimization, saddle point, and fixed point problems as special cases.
Therefore, variational inequalities are used in a variety of applications
ranging from equilibrium search to adversarial learning. Today's realities with
the increasing size of data and models demand parallel and distributed
computing for real-world machine learning problems, most of which can be
represented as variational inequalities. Meanwhile, most distributed approaches
have a significant bottleneck: the cost of communication. The three main
techniques to reduce both the total number of communication rounds and the cost
of one such round are the use of similarity of local functions, compression of
transmitted information, and local updates. In this paper, we combine all these
approaches. Such a triple synergy did not exist before for variational
inequalities and saddle point problems, nor even for minimization problems. The
methods presented in this paper have the best theoretical guarantees of
communication complexity and are significantly ahead of other methods for
distributed variational inequalities. The theoretical results are confirmed by
adversarial learning experiments on synthetic and real datasets.
Comment: 19 pages, 2 algorithms, 1 table
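As a loose illustration of two of the three pillars (compressed communication and an extragradient-type method for a saddle-point/VI problem), here is a single-process toy; it is not the paper's algorithm, and the similarity and local-steps ingredients are omitted. The rand-k compressor, step size, and bilinear local objectives are assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m_workers, k, gamma, iters = 10, 4, 3, 0.1, 2000
# worker i holds a local bilinear saddle problem f_i(x, y) = x^T A_i y
A = rng.normal(size=(m_workers, d, d)) / np.sqrt(d)

def local_operator(i, x, y):
    """VI operator F_i(x, y) = (grad_x f_i, -grad_y f_i) for worker i."""
    return np.concatenate([A[i] @ y, -A[i].T @ x])

def rand_k(v, k):
    """Unbiased rand-k sparsification: keep k random coordinates, rescale."""
    mask = np.zeros_like(v)
    idx = rng.choice(v.size, size=k, replace=False)
    mask[idx] = v.size / k
    return v * mask

x, y = rng.normal(size=d), rng.normal(size=d)
for t in range(iters):
    # workers compress their operator values before "sending" to the server
    g = np.mean([rand_k(local_operator(i, x, y), k) for i in range(m_workers)], axis=0)
    xh, yh = x - gamma * g[:d], y - gamma * g[d:]          # extrapolation step
    gh = np.mean([rand_k(local_operator(i, xh, yh), k) for i in range(m_workers)], axis=0)
    x, y = x - gamma * gh[:d], y - gamma * gh[d:]          # update step

print("distance to the saddle point (0, 0):", np.linalg.norm(np.concatenate([x, y])))
```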