952 research outputs found
What Makes for Robust Multi-Modal Models in the Face of Missing Modalities?
With the growing success of multi-modal learning, research on the robustness
of multi-modal models, especially when facing situations with missing
modalities, is receiving increased attention. Nevertheless, previous studies in
this domain exhibit certain limitations, as they often lack theoretical
insights or their methodologies are tied to specific network architectures or
modalities. We model the scenarios of multi-modal models encountering missing
modalities from an information-theoretic perspective and illustrate that the
performance ceiling in such scenarios can be approached by efficiently
utilizing the information inherent in non-missing modalities. In practice,
there are two key aspects: (1) The encoder should be able to extract
sufficiently good features from the non-missing modality; (2) The extracted
features should be robust enough not to be influenced by noise during the
fusion process across modalities. To this end, we introduce Uni-Modal Ensemble
with Missing Modality Adaptation (UME-MMA). UME-MMA employs uni-modal
pre-trained weights for the multi-modal model to enhance feature extraction and
utilizes missing modality data augmentation techniques to better adapt to
situations with missing modalities. Apart from that, UME-MMA, built on a
late-fusion learning framework, allows for the plug-and-play use of various
encoders, making it suitable for a wide range of modalities and enabling
seamless integration of large-scale pre-trained encoders to further enhance
performance. We demonstrate UME-MMA's effectiveness on audio-visual
datasets (e.g., AV-MNIST, Kinetics-Sound, AVE) and vision-language
datasets (e.g., MM-IMDB, UPMC Food101).
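The augmentation half of the recipe above can be sketched as follows: during training, one modality per sample is randomly zeroed out so the fusion head learns not to over-rely on any single input stream. The function name, shapes, drop probability, and zeroing strategy are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def drop_modality(audio, visual, p_drop=0.3, rng=None):
    """Missing-modality data augmentation (sketch): with probability
    p_drop, zero out one randomly chosen modality per sample, so a
    late-fusion head sees the missing-modality condition at train time."""
    rng = rng or np.random.default_rng()
    audio, visual = audio.copy(), visual.copy()  # do not mutate inputs
    for i in range(audio.shape[0]):
        if rng.random() < p_drop:
            if rng.random() < 0.5:
                audio[i] = 0.0   # simulate a missing audio stream
            else:
                visual[i] = 0.0  # simulate a missing visual stream
    return audio, visual
```

Applied to feature batches before fusion, this exposes the model to exactly the test-time condition the abstract targets.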
Global burden of colistin-resistant bacteria: mobilized colistin resistance genes study (1980-2018)
Colistin is considered an antimicrobial of last resort for the treatment of multidrug-resistant Gram-negative bacterial infections. The recent global dissemination of mobilized colistin resistance (mcr) genes is an urgent public health threat. An accurate estimate of the global prevalence of mcr genes, their reservoirs and the potential pathways for human transmission is required to implement control and prevention strategies, yet such data are lacking. Publications from four English (PubMed, Scopus, the Cochrane Database of Systematic Reviews and Web of Science) and two Chinese (CNKI and WANFANG) databases published between 18 November 2015 and 30 December 2018 were identified. In this systematic review and meta-analysis, the prevalence of mcr genes in bacteria isolated from humans, animals, the environment and food products was investigated. A total of 974 publications were identified; 202 observational studies were included in the systematic review and 71 in the meta-analysis. mcr genes were reported from 47 countries across six continents, and the overall average prevalence was 4.7% (0.1-9.3%). China reported the highest number of mcr-positive strains. Pathogenic Escherichia coli (54%), isolated from animals (52%) and harboring an IncI2 plasmid (34%), showed the highest prevalence of mcr genes. The estimated prevalence of mcr-1-positive pathogenic E. coli was higher in food animals than in humans and food products, which suggests a role for foodborne transmission. This study provides a comprehensive assessment of the prevalence of mcr genes by source, organism, genotype and plasmid type.
All are Worth Words: A ViT Backbone for Diffusion Models
Vision transformers (ViT) have shown promise in various vision tasks while
the U-Net based on a convolutional neural network (CNN) remains dominant in
diffusion models. We design a simple and general ViT-based architecture (named
U-ViT) for image generation with diffusion models. U-ViT is characterized by
treating all inputs including the time, condition and noisy image patches as
tokens and employing long skip connections between shallow and deep layers. We
evaluate U-ViT in unconditional and class-conditional image generation, as well
as text-to-image generation tasks, where U-ViT is comparable if not superior to
a CNN-based U-Net of a similar size. In particular, latent diffusion models
with U-ViT achieve record-breaking FID scores of 2.29 in class-conditional
image generation on ImageNet 256x256, and 5.48 in text-to-image generation on
MS-COCO, among methods without accessing large external datasets during the
training of generative models. Our results suggest that, for diffusion-based
image modeling, the long skip connection is crucial while the down-sampling and
up-sampling operators in CNN-based U-Net are not always necessary. We believe
that U-ViT can provide insights for future research on backbones in diffusion
models and benefit generative modeling on large-scale cross-modality datasets. Comment: Accepted to CVPR 2023.
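The two architectural ingredients named in the abstract, everything-as-tokens and long skip connections, can be sketched as below. The concatenate-then-project skip and all shapes are illustrative assumptions for exposition, not the exact U-ViT implementation.

```python
import numpy as np

def build_tokens(patches, t_emb, c_emb):
    """U-ViT idea 1: time, condition, and noisy image patches are
    all treated uniformly as tokens in one sequence.
    patches: (N, D) patch embeddings; t_emb, c_emb: (D,) vectors."""
    return np.concatenate([t_emb[None, :], c_emb[None, :], patches], axis=0)

def long_skip(shallow, deep, proj):
    """U-ViT idea 2: a long skip connection feeds a shallow block's
    output into a deep block, here by concatenating features and
    projecting back to width D with proj of shape (2*D, D)."""
    return np.concatenate([shallow, deep], axis=-1) @ proj
```

A transformer stack would interleave `long_skip` calls between mirrored shallow/deep blocks, replacing the down/up-sampling path of a CNN U-Net.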
2,3,4-Trihydroxybenzoic acid 0.25-hydrate
The asymmetric unit of the title compound, C7H6O5·0.25H2O, contains two molecules of 2,3,4-trihydroxybenzoic acid with similar conformations, and one water molecule which lies on a twofold rotation axis. Both acid molecules are essentially planar [maximum r.m.s. deviations = 0.0324 (2) and 0.0542 (3) Å for the two acid molecules]. The molecular conformations are stabilized by intramolecular O(phenol)—H⋯O(carboxyl/phenol) interactions. A cyclic intermolecular association is formed between the two acid molecules and the water molecule [graph set R₃³(12)] involving O—H⋯O hydrogen bonds. The two acid molecules are further linked through a cyclic R₂²(8) carboxylic acid hydrogen-bonding association, which, together with intermolecular O—H⋯O hydrogen-bonding interactions involving the phenol groups and the water molecule, and weak π–π interactions [minimum ring centroid separation = 3.731 (3) Å], gives a three-dimensional network.
One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale
This paper proposes a unified diffusion framework (dubbed UniDiffuser) to fit
all distributions relevant to a set of multi-modal data in one model. Our key
insight is -- learning diffusion models for marginal, conditional, and joint
distributions can be unified as predicting the noise in the perturbed data,
where the perturbation levels (i.e. timesteps) can be different for different
modalities. Inspired by the unified view, UniDiffuser learns all distributions
simultaneously with a minimal modification to the original diffusion model --
perturbs data in all modalities instead of a single modality, inputs individual
timesteps in different modalities, and predicts the noise of all modalities
instead of a single modality. UniDiffuser is parameterized by a transformer for
diffusion models to handle input types of different modalities. Implemented on
large-scale paired image-text data, UniDiffuser is able to perform image, text,
text-to-image, image-to-text, and image-text pair generation by setting proper
timesteps without additional overhead. In particular, UniDiffuser is able to
produce perceptually realistic samples in all tasks and its quantitative
results (e.g., the FID and CLIP score) are not only superior to existing
general-purpose models but also comparable to bespoke models (e.g., Stable
Diffusion and DALL-E 2) on representative tasks (e.g., text-to-image
generation). Comment: Accepted to ICML 2023.
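UniDiffuser's key move, one shared noise-prediction network with a separate timestep per modality, can be sketched with a toy linear noise schedule. The schedule, seed, and shapes are assumptions for illustration only.

```python
import numpy as np

def perturb(x, t, T=1000, rng=None):
    """Diffuse x to noise level t in [0, T] with a toy linear
    schedule: a fraction alpha = 1 - t/T of the signal is kept."""
    rng = rng or np.random.default_rng(0)
    alpha = 1.0 - t / T
    noise = rng.standard_normal(x.shape)
    return np.sqrt(alpha) * x + np.sqrt(1.0 - alpha) * noise

# One network sees (x_img at t_img, x_txt at t_txt) and predicts the
# noise in both; the choice of timesteps selects the distribution:
#   t_img = t_txt             -> joint p(image, text)
#   t_txt = 0 (clean text)    -> conditional p(image | text)
#   t_txt = T (pure noise)    -> marginal p(image)
```

This is why the abstract says all marginal, conditional, and joint distributions are fit "by setting proper timesteps without additional overhead".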
On Uni-Modal Feature Learning in Supervised Multi-Modal Learning
We abstract the features (i.e. learned representations) of multi-modal data
into 1) uni-modal features, which can be learned from uni-modal training, and
2) paired features, which can only be learned from cross-modal interactions.
Multi-modal models are expected to benefit from cross-modal interactions on the
basis of ensuring uni-modal feature learning. However, recent supervised
multi-modal late-fusion training approaches still suffer from insufficient
learning of uni-modal features on each modality. We prove that this phenomenon
does hurt the model's generalization ability. To this end, we propose to choose
a targeted late-fusion learning method for the given supervised multi-modal
task from Uni-Modal Ensemble (UME) and the proposed Uni-Modal Teacher (UMT),
according to the distribution of uni-modal and paired features. We demonstrate
that, under a simple guiding strategy, we can achieve comparable results to
other complex late-fusion or intermediate-fusion methods on various multi-modal
datasets, including VGG-Sound, Kinetics-400, UCF101, and ModelNet40.
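As a minimal sketch of the Uni-Modal Ensemble (UME) side of this choice: each modality's classifier is trained independently, and predictions are fused late. Averaging logits is one common fusion rule, assumed here for illustration rather than taken from the paper.

```python
import numpy as np

def ume_predict(logits_per_modality):
    """Uni-Modal Ensemble inference (sketch): average the class logits
    produced by independently trained uni-modal classifiers.
    logits_per_modality: list of (batch, classes) arrays."""
    return np.mean(np.stack(logits_per_modality), axis=0)
```

Because each branch is trained alone, uni-modal features are learned fully, which is exactly the property the abstract argues joint late-fusion training can fail to deliver.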
LSDP5 Enhances Triglyceride Storage in Hepatocytes by Influencing Lipolysis and Fatty Acid β-Oxidation of Lipid Droplets
Lipid storage droplet protein 5 (LSDP5) is a lipid droplet-associated protein of the PAT (perilipin, adipophilin, and TIP47) family that is expressed in the liver in a peroxisome proliferator-activated receptor alpha (PPARα)-dependent manner; however, its exact function has not been elucidated. We noticed that LSDP5 was localized to the surface of lipid droplets in hepatocytes. Overexpression of LSDP5 enhanced lipid accumulation in the hepatic cell line AML12 and in primary hepatocytes. Knock-down of LSDP5 significantly decreased the triglyceride content of lipid droplets, stimulated lipolysis, and modestly increased the mitochondrial content and the level of fatty acid β-oxidation in the mitochondria. The expression of PPARα was increased in LSDP5-deficient cells and was required for the increased level of fatty acid β-oxidation in these cells. Using serial deletions of LSDP5, we determined that the lipid droplet-targeting domain and the domain directing lipid droplet clustering overlapped and were localized to the 188 amino acid residues at the N-terminus of LSDP5. Our findings suggest that LSDP5, a novel lipid droplet protein, may contribute to triglyceride accumulation by negatively regulating lipolysis and fatty acid oxidation in hepatocytes.