Multimodal Federated Learning via Contrastive Representation Ensemble
With the increasing amount of multimedia data on modern mobile systems and
IoT infrastructures, harnessing these rich multimodal data without breaching
user privacy becomes a critical issue. Federated learning (FL) serves as a
privacy-conscious alternative to centralized machine learning. However,
existing FL methods extended to multimodal data all rely on model aggregation
at the single-modality level, which forces the server and clients to share
identical model architectures for each modality. This limits the global model in
terms of both model complexity and data capacity, not to mention task
diversity. In this work, we propose Contrastive Representation Ensemble and
Aggregation for Multimodal FL (CreamFL), a multimodal federated learning
framework that enables training larger server models from clients with
heterogeneous model architectures and data modalities, while only communicating
knowledge on public dataset. To achieve better multimodal representation
fusion, we design a global-local cross-modal ensemble strategy to aggregate
client representations. To mitigate local model drift caused by two
unprecedented heterogeneous factors stemming from multimodal discrepancy (the
modality gap and the task gap), we further propose inter-modal and intra-modal
contrastive objectives to regularize local training: they supply information
about the absent modality for uni-modal clients and steer local clients toward
the global consensus. Thorough evaluations and ablation studies on
image-text retrieval and visual question answering tasks showcase the
superiority of CreamFL over state-of-the-art FL methods and its practical
value. Comment: ICLR 2023. Code is available at https://github.com/FLAIR-THU/CreamF
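The two contrastive regularizers described above can be illustrated with a minimal numpy sketch. The variable names, dimensions, and the plain InfoNCE form below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def info_nce(anchor, positive, negatives, tau=0.1):
    """Toy InfoNCE loss on L2-normalized vectors: pull `anchor` toward its
    `positive` representation, push it away from `negatives`."""
    norm = lambda x: x / np.linalg.norm(x, axis=-1, keepdims=True)
    a, p, n = norm(anchor), norm(positive), norm(negatives)
    pos = np.exp(np.dot(a, p) / tau)   # similarity to the positive
    neg = np.exp(n @ a / tau).sum()    # similarities to the negatives
    return float(-np.log(pos / (pos + neg)))

rng = np.random.default_rng(0)
d = 8                                   # hypothetical embedding dimension
local_img   = rng.standard_normal(d)    # local image rep (uni-modal client)
global_img  = rng.standard_normal(d)    # global aggregated image rep
global_txt  = rng.standard_normal(d)    # global aggregated text rep
distractors = rng.standard_normal((5, d))

# Inter-modal contrast: align the local image representation with the global
# representation of the *other* modality for the same public sample, which
# supplies information about the modality the client lacks.
inter = info_nce(local_img, global_txt, distractors)
# Intra-modal contrast: pull the local representation toward the global
# consensus of its *own* modality, countering local drift.
intra = info_nce(local_img, global_img, distractors)
reg = inter + intra  # added to the client's local training objective
```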
Multimodal Molecular Pretraining via Modality Blending
Self-supervised learning has recently gained growing interest in molecular
modeling for scientific tasks such as AI-assisted drug discovery. Current
studies consider leveraging both 2D and 3D molecular structures for
representation learning. However, relying on straightforward alignment
strategies that treat each modality separately, these methods fail to exploit
the intrinsic correlation between 2D and 3D representations that reflect the
underlying structural characteristics of molecules, and only perform
coarse-grained molecule-level alignment. To derive fine-grained alignment and
promote structural molecule understanding, we introduce an atomic-relation
level "blend-then-predict" self-supervised learning approach, MoleBLEND, which
first blends atom relations represented by different modalities into one
unified relation matrix for joint encoding, then recovers modality-specific
information for 2D and 3D structures individually. By treating atom
relationships as anchors, MoleBLEND organically aligns and integrates visually
dissimilar 2D and 3D modalities of the same molecule at fine-grained atomic
level, painting a more comprehensive depiction of each molecule. Extensive
experiments show that MoleBLEND achieves state-of-the-art performance across
major 2D/3D molecular benchmarks. We further provide theoretical insights from
the perspective of mutual-information maximization, demonstrating that our
method unifies contrastive, generative (cross-modality prediction) and
mask-then-predict (single-modality prediction) objectives into a single
cohesive framework.
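The "blend-then-predict" idea can be sketched in a few lines of numpy. Here `rel_2d` and `rel_3d` stand in for hypothetical atom-relation matrices (e.g. graph-hop counts and pairwise distances); the entry-wise masking scheme is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
n_atoms = 5  # hypothetical molecule size

rel_2d = rng.integers(0, 4, size=(n_atoms, n_atoms)).astype(float)  # e.g. shortest-path hops
rel_3d = rng.random((n_atoms, n_atoms)) * 5.0                       # e.g. pairwise distances

# Blend: each atom-pair entry carries the value from exactly one modality,
# producing a single unified relation matrix for joint encoding.
take_2d = rng.random((n_atoms, n_atoms)) < 0.5
blended = np.where(take_2d, rel_2d, rel_3d)

# Predict: the model is then trained to recover the modality-specific entries
# that the blend hid, separately for the 2D and the 3D structure.
targets_2d = rel_2d[~take_2d]  # 2D values that were replaced by 3D ones
targets_3d = rel_3d[take_2d]   # 3D values that were replaced by 2D ones
```

Treating atom pairs as shared anchors is what lets the two visually dissimilar modalities be aligned at this fine-grained level.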
Generative Pretraining in Multimodality
We present Emu, a Transformer-based multimodal foundation model, which can
seamlessly generate images and texts in multimodal context. This omnivore model
can take in any single-modality or multimodal data input indiscriminately
(e.g., interleaved image, text and video) through a one-model-for-all
autoregressive training process. First, visual signals are encoded into
embeddings, and together with text tokens form an interleaved input sequence.
Emu is then end-to-end trained with a unified objective of classifying the next
text token or regressing the next visual embedding in the multimodal sequence.
This versatile multimodality empowers the exploration of diverse pretraining
data sources at scale, such as videos with interleaved frames and text,
webpages with interleaved images and text, as well as web-scale image-text
pairs and video-text pairs. Emu can serve as a generalist multimodal interface
for both image-to-text and text-to-image tasks, and supports in-context image
and text generation. Across a broad range of zero-shot/few-shot tasks including
image captioning, visual question answering, video question answering and
text-to-image generation, Emu demonstrates superb performance compared to
state-of-the-art large multimodal models. Extended capabilities such as
multimodal assistants via instruction tuning are also demonstrated with
impressive performance. Comment: Code and Demo: https://github.com/baaivision/Em
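The unified objective — classifying the next text token while regressing the next visual embedding in the same sequence — can be sketched as follows. Using plain mean squared error for the regression term is an assumption for illustration, not necessarily Emu's exact loss:

```python
import numpy as np

def unified_ar_loss(text_logits, text_targets, vis_preds, vis_targets):
    """Sum a cross-entropy term over text positions with a regression term
    over visual-embedding positions of one multimodal sequence."""
    # classify the next text token: softmax cross-entropy
    z = text_logits - text_logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    ce = -log_probs[np.arange(len(text_targets)), text_targets].mean()
    # regress the next visual embedding: mean squared error
    mse = ((vis_preds - vis_targets) ** 2).mean()
    return float(ce + mse)

rng = np.random.default_rng(0)
loss = unified_ar_loss(
    text_logits=rng.standard_normal((4, 10)),  # 4 text positions, vocab of 10
    text_targets=np.array([1, 3, 5, 7]),
    vis_preds=rng.standard_normal((2, 8)),     # 2 visual positions, dim-8 embeddings
    vis_targets=rng.standard_normal((2, 8)),
)
```

Because both terms apply position-wise over one interleaved sequence, any mixture of image, text, and video inputs can be trained with this single objective.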
Relationship between Apelin/APJ Signaling, Oxidative Stress, and Diseases
Apelin, a peptide hormone, is an endogenous ligand of the G protein-coupled receptor APJ and is widely expressed in human and animal tissues, such as the central nervous system and adipose tissue. Recent studies indicate that the apelin/APJ system is involved in the regulation of multiple physiological and pathological processes and is associated with cardiovascular diseases, metabolic disorders, neurological diseases, ischemia-reperfusion injury, aging, eclampsia, deafness, and tumors. The occurrence and development of these diseases are closely related to the local inflammatory response. Oxidative stress arises when the balance between oxidation and antioxidant defenses is disrupted and reactive oxygen species are produced in large quantities, causing cellular or molecular damage that leads to vascular injury and a cascade of inflammatory reactions. Hence, this article reviews recent advances in the relationship between apelin/APJ signaling, oxidative stress, and inflammation-related diseases, and highlights them as potential therapeutic targets for oxidative stress-related inflammatory diseases.
Reliability Evaluation for Clustered WSNs under Malware Propagation
We consider a clustered wireless sensor network (WSN) under epidemic malware propagation and address the problem of evaluating its reliability so as to ensure efficient, continuous, and dependable transmission of sensed data from sensor nodes to the sink. To reconcile the intentional behavior of malware with the randomness of a continuous-time Markov chain (CTMC), we introduce a strategic game that predicts malware infection, allowing a successful infection to be modeled as a CTMC state transition. Next, we devise a novel measure to compute the Mean Time to Failure (MTTF) of a sensor node, which represents the reliability of a sensor node continuously performing tasks such as sensing, transmitting, and fusing data. Since clustered WSNs can be regarded as parallel-serial-parallel systems, the reliability of a clustered WSN can be evaluated via classical reliability theory. Numerical results show the influence of parameters such as the true positive rate and the false positive rate on a sensor node's MTTF. Furthermore, we validate the method of reliability evaluation for a clustered WSN according to the number of sensor nodes in a cluster, the number of clusters in a route, and the number of routes in the WSN.
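The parallel-serial-parallel evaluation described above follows directly from classical reliability theory and can be sketched in a few lines. The parameter values below are illustrative, not taken from the paper:

```python
def cluster_reliability(node_r: float, nodes_per_cluster: int) -> float:
    # nodes inside a cluster act in parallel: the cluster fails only if all fail
    return 1.0 - (1.0 - node_r) ** nodes_per_cluster

def route_reliability(cluster_r: float, clusters_per_route: int) -> float:
    # clusters along a route are in series: every hop must survive
    return cluster_r ** clusters_per_route

def wsn_reliability(node_r, nodes_per_cluster, clusters_per_route, n_routes):
    # routes toward the sink act in parallel: data arrives if any route works
    r_route = route_reliability(
        cluster_reliability(node_r, nodes_per_cluster), clusters_per_route)
    return 1.0 - (1.0 - r_route) ** n_routes

# Illustrative numbers: per-node reliability 0.9 (e.g. derived from its MTTF),
# 3 nodes per cluster, 4 clusters per route, 2 routes to the sink.
r = wsn_reliability(0.9, 3, 4, 2)
```

Raising the redundancy at either parallel level (nodes per cluster or routes per network) pushes the overall reliability toward 1, while lengthening the serial chain of clusters pulls it down.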