
    What Makes for Robust Multi-Modal Models in the Face of Missing Modalities?

    With the growing success of multi-modal learning, research on the robustness of multi-modal models, especially when facing situations with missing modalities, is receiving increased attention. Nevertheless, previous studies in this domain exhibit certain limitations, as they often lack theoretical insights or their methodologies are tied to specific network architectures or modalities. We model the scenarios of multi-modal models encountering missing modalities from an information-theoretic perspective and illustrate that the performance ceiling in such scenarios can be approached by efficiently utilizing the information inherent in non-missing modalities. In practice, there are two key aspects: (1) the encoder should be able to extract sufficiently good features from the non-missing modality; (2) the extracted features should be robust enough not to be influenced by noise during the fusion process across modalities. To this end, we introduce Uni-Modal Ensemble with Missing Modality Adaptation (UME-MMA). UME-MMA employs uni-modal pre-trained weights for the multi-modal model to enhance feature extraction and utilizes missing-modality data augmentation techniques to better adapt to situations with missing modalities. In addition, UME-MMA, built on a late-fusion learning framework, allows for the plug-and-play use of various encoders, making it suitable for a wide range of modalities and enabling seamless integration of large-scale pre-trained encoders to further enhance performance. We demonstrate UME-MMA's effectiveness on audio-visual datasets (e.g., AV-MNIST, Kinetics-Sound, AVE) and vision-language datasets (e.g., MM-IMDB, UPMC Food101).
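
    The late-fusion design and the missing-modality augmentation described above are simple to prototype. Below is a minimal sketch, not the authors' released code: the encoder objects, feature dimension, and drop probability are illustrative assumptions. The two encoders are meant to be initialized from uni-modal pre-trained weights, and training occasionally blanks one modality so the fused predictor adapts to missing inputs.

```python
# Illustrative sketch of UME-MMA-style late fusion; not the official implementation.
# `audio_encoder` / `visual_encoder` are assumed to output `feat_dim`-dim features.
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, audio_encoder, visual_encoder, feat_dim, num_classes):
        super().__init__()
        self.audio_encoder = audio_encoder    # plug-and-play uni-modal encoders,
        self.visual_encoder = visual_encoder  # e.g. loaded from uni-modal pre-training
        self.audio_head = nn.Linear(feat_dim, num_classes)
        self.visual_head = nn.Linear(feat_dim, num_classes)

    def forward(self, audio, visual, p_drop=0.3):
        fa = self.audio_encoder(audio)
        fv = self.visual_encoder(visual)
        if self.training:
            # Missing-modality data augmentation: occasionally blank one modality
            # so the fused predictor does not over-rely on either stream.
            r = torch.rand(2)
            if r[0] < p_drop:
                fa = torch.zeros_like(fa)
            elif r[1] < p_drop:
                fv = torch.zeros_like(fv)
        # Late fusion as a sum of per-modality logits.
        return self.audio_head(fa) + self.visual_head(fv)
```

    At test time, a genuinely missing modality can be handled by feeding the same zero placeholder used during augmentation.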

    Global burden of colistin-resistant bacteria: mobilized colistin resistance genes study (1980-2018)

    Colistin is considered to be an antimicrobial of last resort for the treatment of multidrug-resistant Gram-negative bacterial infections. The recent global dissemination of mobilized colistin resistance (mcr) genes is an urgent public health threat. An accurate estimate of the global prevalence of mcr genes, their reservoirs and the potential pathways for human transmission is required to implement control and prevention strategies, yet such data are lacking. Publications from four English-language (PubMed, Scopus, the Cochrane Database of Systematic Reviews and Web of Science) and two Chinese-language (CNKI and WANFANG) databases published between 18 November 2015 and 30 December 2018 were identified. In this systematic review and meta-analysis, the prevalence of mcr genes in bacteria isolated from humans, animals, the environment and food products was investigated. A total of 974 publications were identified; 202 observational studies were included in the systematic review and 71 in the meta-analysis. mcr genes were reported from 47 countries across six continents, and the overall average prevalence was 4.7% (0.1-9.3%). China reported the highest number of mcr-positive strains. Pathogenic Escherichia coli (54%), isolated from animals (52%) and harboring an IncI2 plasmid (34%), were the bacteria with the highest prevalence of mcr genes. The estimated prevalence of mcr-1-positive pathogenic E. coli was higher in food animals than in humans and food products, which suggests a role for foodborne transmission. This study provides a comprehensive assessment of the prevalence of mcr genes by source, organism, genotype and type of plasmid.
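
    As a rough illustration of the meta-analytic arithmetic behind a pooled prevalence figure such as the 4.7% reported above, here is a simple inverse-variance pooling of study-level proportions. This is only a sketch under simplifying assumptions (a fixed-effect model on raw proportions with invented example counts); the study itself would use more careful methods, typically random-effects models on transformed proportions.

```python
# Sketch of inverse-variance pooled prevalence; not the paper's statistical code.
def pooled_prevalence(studies):
    """studies: list of (mcr_positive_isolates, isolates_screened) tuples."""
    total_w, total_wp = 0.0, 0.0
    for k, n in studies:
        # Small continuity correction keeps the variance finite when k is 0 or n.
        p = (k + 0.5) / (n + 1.0)
        var = p * (1.0 - p) / n
        w = 1.0 / var                 # inverse-variance weight
        total_w += w
        total_wp += w * p
    return total_wp / total_w

# Hypothetical study counts, purely for illustration.
print(round(pooled_prevalence([(12, 300), (3, 150), (40, 520)]), 3))
```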

    All are Worth Words: A ViT Backbone for Diffusion Models

    Vision transformers (ViT) have shown promise in various vision tasks, while the U-Net based on a convolutional neural network (CNN) remains dominant in diffusion models. We design a simple and general ViT-based architecture (named U-ViT) for image generation with diffusion models. U-ViT is characterized by treating all inputs, including the time, condition and noisy image patches, as tokens and by employing long skip connections between shallow and deep layers. We evaluate U-ViT on unconditional and class-conditional image generation, as well as text-to-image generation tasks, where U-ViT is comparable, if not superior, to a CNN-based U-Net of a similar size. In particular, latent diffusion models with U-ViT achieve record-breaking FID scores of 2.29 in class-conditional image generation on ImageNet 256x256 and 5.48 in text-to-image generation on MS-COCO, among methods that do not access large external datasets during the training of generative models. Our results suggest that, for diffusion-based image modeling, the long skip connection is crucial while the down-sampling and up-sampling operators in CNN-based U-Net are not always necessary. We believe that U-ViT can provide insights for future research on backbones in diffusion models and benefit generative modeling on large-scale cross-modality datasets. (Comment: Accepted to CVPR 2023.)
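
    To make the two architectural points concrete (everything becomes a token, and shallow layers feed deep layers through long skips), here is a toy sketch rather than the official U-ViT code; the layer sizes, the patch-token interface, and the skip projection are all illustrative assumptions.

```python
# Toy U-ViT-style backbone: time, condition, and noisy patches all become tokens,
# and shallow-block activations reach mirrored deep blocks via long skip connections.
# Illustrative sketch only, not the released U-ViT implementation.
import torch
import torch.nn as nn

class TinyUViT(nn.Module):
    def __init__(self, dim=256, depth=6, num_classes=10):
        super().__init__()
        self.time_embed = nn.Linear(1, dim)
        self.cond_embed = nn.Embedding(num_classes, dim)
        self.blocks = nn.ModuleList([
            nn.TransformerEncoderLayer(dim, nhead=4, dim_feedforward=4 * dim,
                                       batch_first=True)
            for _ in range(depth)
        ])
        # One projection per deep block to merge the mirrored shallow activation.
        self.skip_proj = nn.ModuleList([nn.Linear(2 * dim, dim)
                                        for _ in range(depth // 2)])

    def forward(self, patch_tokens, t, cond):
        # patch_tokens: (B, N, dim) noisy image patches already embedded to `dim`.
        t_tok = self.time_embed(t.float().view(-1, 1, 1))   # (B, 1, dim)
        c_tok = self.cond_embed(cond).unsqueeze(1)          # (B, 1, dim)
        x = torch.cat([t_tok, c_tok, patch_tokens], dim=1)  # all inputs as tokens

        skips, half = [], len(self.blocks) // 2
        for i, blk in enumerate(self.blocks):
            if i >= half:  # deep half: concatenate the mirrored shallow output
                x = self.skip_proj[i - half](torch.cat([x, skips.pop()], dim=-1))
            x = blk(x)
            if i < half:   # shallow half: stash activations for the long skips
                skips.append(x)
        return x[:, 2:]    # predicted noise for the patch tokens only
```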

    2,3,4-Trihydroxybenzoic acid 0.25-hydrate

    The asymmetric unit of the title compound, C7H6O5·0.25H2O, contains two molecules of 2,3,4-trihydroxybenzoic acid, with similar conformations, and one water molecule which lies on a twofold rotation axis. Both acid molecules are essentially planar [maximum r.m.s. deviations = 0.0324 (2) and 0.0542 (3) Å for the two acid molecules]. The molecular conformations are stabilized by intramolecular O(phenol)—H⋯O(carboxyl/phenol) interactions. A cyclic intermolecular association is formed between the two acid molecules and the water molecule [graph set R₃³(12)], involving O—H⋯O hydrogen bonds. The two acid molecules are further linked through a cyclic R₂²(8) carboxylic acid hydrogen-bonding association, which, together with intermolecular O—H⋯O hydrogen-bonding interactions involving the phenol groups and the water molecule, and weak π–π interactions [minimum ring centroid separation = 3.731 (3) Å], gives a three-dimensional network.

    One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale

    This paper proposes a unified diffusion framework (dubbed UniDiffuser) to fit all distributions relevant to a set of multi-modal data in one model. Our key insight is that learning diffusion models for marginal, conditional, and joint distributions can be unified as predicting the noise in the perturbed data, where the perturbation levels (i.e., timesteps) can differ across modalities. Inspired by this unified view, UniDiffuser learns all distributions simultaneously with a minimal modification to the original diffusion model: it perturbs data in all modalities instead of a single modality, takes individual timesteps for the different modalities as input, and predicts the noise of all modalities instead of a single modality. UniDiffuser is parameterized by a transformer for diffusion models so that it can handle inputs of different modality types. Implemented on large-scale paired image-text data, UniDiffuser is able to perform image, text, text-to-image, image-to-text, and image-text pair generation by setting proper timesteps, without additional overhead. In particular, UniDiffuser produces perceptually realistic samples in all tasks, and its quantitative results (e.g., the FID and CLIP score) are not only superior to existing general-purpose models but also comparable to bespoke models (e.g., Stable Diffusion and DALL-E 2) on representative tasks (e.g., text-to-image generation). (Comment: Accepted to ICML 2023.)
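
    A minimal sketch of the unified training objective follows, under my own naming assumptions (the `add_noise` helper and the model's call signature are hypothetical, not the released UniDiffuser API): each modality draws its own timestep, and a single network predicts the noise for both modalities at once.

```python
# Sketch of the unified noise-prediction objective; not the released UniDiffuser code.
import torch

def unified_diffusion_loss(model, add_noise, x_img, x_txt, T=1000):
    # Independent timesteps per modality. At sampling time, fixing one modality's
    # timestep to 0 (clean data) yields conditional generation, while using T for
    # a modality effectively ignores it, recovering the other modality's marginal.
    t_img = torch.randint(0, T, (x_img.size(0),))
    t_txt = torch.randint(0, T, (x_txt.size(0),))
    eps_img, eps_txt = torch.randn_like(x_img), torch.randn_like(x_txt)
    z_img = add_noise(x_img, eps_img, t_img)  # hypothetical forward-process sampler
    z_txt = add_noise(x_txt, eps_txt, t_txt)
    pred_img, pred_txt = model(z_img, z_txt, t_img, t_txt)
    # One objective: predict the noise of all modalities at their own noise levels.
    return ((pred_img - eps_img) ** 2).mean() + ((pred_txt - eps_txt) ** 2).mean()
```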

    On Uni-Modal Feature Learning in Supervised Multi-Modal Learning

    We abstract the features (i.e., learned representations) of multi-modal data into 1) uni-modal features, which can be learned from uni-modal training, and 2) paired features, which can only be learned from cross-modal interactions. Multi-modal models are expected to benefit from cross-modal interactions on the basis of ensuring uni-modal feature learning. However, recent supervised multi-modal late-fusion training approaches still suffer from insufficient learning of uni-modal features on each modality. We prove that this phenomenon indeed hurts the model's generalization ability. To this end, we propose choosing a targeted late-fusion learning method for a given supervised multi-modal task from Uni-Modal Ensemble (UME) and the proposed Uni-Modal Teacher (UMT), according to the distribution of uni-modal and paired features. We demonstrate that, under a simple guiding strategy, we can achieve results comparable to other complex late-fusion or intermediate-fusion methods on various multi-modal datasets, including VGG-Sound, Kinetics-400, UCF101, and ModelNet40.
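
    As a concrete (and assumed, not the authors' code) illustration of the Uni-Modal Ensemble option: each modality's model is trained on its own data only, so uni-modal feature learning cannot be degraded by joint-training dynamics, and predictions are combined only at inference. Uni-Modal Teacher would instead distill such uni-modal features into a jointly trained multi-modal model.

```python
# Minimal Uni-Modal Ensemble (UME) inference sketch; illustrative, not the paper's code.
import torch

@torch.no_grad()
def ume_predict(audio_model, visual_model, audio, visual):
    # Each model was trained separately on a single modality; late fusion here is
    # simply a sum of the uni-modal logits.
    logits = audio_model(audio) + visual_model(visual)
    return logits.argmax(dim=-1)
```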

    LSDP5 Enhances Triglyceride Storage in Hepatocytes by Influencing Lipolysis and Fatty Acid β-Oxidation of Lipid Droplets

    Lipid storage droplet protein 5 (LSDP5) is a lipid droplet-associated protein of the PAT (perilipin, adipophilin, and TIP47) family that is expressed in the liver in a peroxisome proliferator-activated receptor alpha (PPARα)-dependent manner; however, its exact function has not been elucidated. We observed that LSDP5 was localized to the surface of lipid droplets in hepatocytes. Overexpression of LSDP5 enhanced lipid accumulation in the hepatic cell line AML12 and in primary hepatocytes. Knockdown of LSDP5 significantly decreased the triglyceride content of lipid droplets, stimulated lipolysis, and modestly increased the mitochondrial content and the level of fatty acid β-oxidation in the mitochondria. PPARα expression was increased in LSDP5-deficient cells and was required for the elevated fatty acid β-oxidation in these cells. Using serial deletions of LSDP5, we determined that the lipid droplet-targeting domain and the domain directing lipid droplet clustering overlapped and were localized to the 188 amino acid residues at the N-terminus of LSDP5. Our findings suggest that LSDP5, a novel lipid droplet protein, may contribute to triglyceride accumulation by negatively regulating lipolysis and fatty acid oxidation in hepatocytes.