22 research outputs found

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Author: Cui Yufeng
Sun Quan
Wang Jinsheng
Wang Xinlong
Yu Qiying
Zhang Fan
Zhang Xiaosong
Publication date: 06/02/2024

Scaling up contrastive language-image pretraining (CLIP) is critical for empowering both vision and multimodal models. We present EVA-CLIP-18B, the largest and most powerful open-source CLIP model to date, with 18-billion parameters. With only 6-billion training samples seen, EVA-CLIP-18B achieves an exceptional 80.7% zero-shot top-1 accuracy averaged across 27 widely recognized image classification benchmarks, outperforming its forerunner EVA-CLIP (5-billion parameters) and other open-source CLIP models by a large margin. Remarkably, we observe a consistent performance improvement with the model size scaling of EVA-CLIP, despite maintaining a constant training dataset of 2-billion image-text pairs from LAION-2B and COYO-700M. This dataset is openly available and much smaller than the in-house datasets (e.g., DFN-5B, WebLI-10B) employed in other state-of-the-art CLIP models. EVA-CLIP-18B demonstrates the potential of EVA-style weak-to-strong visual model scaling. With our model weights made publicly available, we hope to facilitate future research in vision and multimodal foundation models.

arXiv.org e-Print Archive

CapsFusion: Rethinking Image-Text Data at Scale

Author: Cao Yue
Cui Yufeng
Liu Jingjing
Sun Quan
Wang Xinlong
Yu Qiying
Zhang Fan
Zhang Xiaosong
Publication date: 05/04/2024

Large multimodal models demonstrate remarkable generalist ability to perform diverse multimodal tasks in a zero-shot manner. Large-scale web-based image-text pairs contribute fundamentally to this success, but suffer from excessive noise. Recent studies use alternative captions synthesized by captioning models and have achieved notable benchmark performance. However, our experiments reveal significant Scalability Deficiency and World Knowledge Loss issues in models trained with synthetic captions, which have been largely obscured by their initial benchmark success. Upon closer examination, we identify the root cause as the overly-simplified language structure and lack of knowledge details in existing synthetic captions. To provide higher-quality and more scalable multimodal pretraining data, we propose CapsFusion, an advanced framework that leverages large language models to consolidate and refine information from both web-based image-text pairs and synthetic captions. Extensive experiments show that CapsFusion captions exhibit remarkable all-round superiority over existing captions in terms of model performance (e.g., 18.8 and 18.3 improvements in CIDEr score on COCO and NoCaps), sample efficiency (requiring 11-16 times less computation than baselines), world knowledge depth, and scalability. These effectiveness, efficiency and scalability advantages position CapsFusion as a promising candidate for future scaling of LMM training.
Comment: CVPR 2024. Code & Dataset: https://github.com/baaivision/CapsFusio

arXiv.org e-Print Archive

Trust Dynamics in WSNs: An Evolutionary Game-Theoretic Approach

Author: En Fan
Jianhua Liu
Keli Hu
Longjun Huang
Qiying Cao
Shigen Shen
Publication venue: Hindawi Limited
Publication date: 01/01/2016

A sensor node (SN) in Wireless Sensor Networks (WSNs) can decide whether to collaborate with others by making a trust decision based on a trust management system (TMS). In this paper, we use evolutionary game theory to study the trust decision and its dynamics, which play a key role in stabilizing the whole network. When SNs decide whether to select the action Trust or Mistrust, a WSN trust game is created to reflect their utilities. An incentive mechanism tied to an SN’s trust degree is incorporated into this trust game and effectively encourages SNs to select the action Trust. The replicator dynamics of SNs’ trust evolution, illustrating the evolutionary process of SNs selecting their actions, are given. We then propose and prove theorems indicating that evolutionarily stable strategies can be attained under different parameter values, which supply theoretical foundations for devising a TMS for WSNs. Moreover, we identify the conditions that lead SNs to choose the action Trust as their final behavior. In this manner, we can assure WSNs’ security and stability by introducing a trust mechanism that satisfies these conditions. Experimental results confirm the proposed theorems and the effects of the incentive mechanism.

Crossref

Directory of Open Access Journals

Generative Pretraining in Multimodality

Author: Cui Yufeng
Gao Hongcheng
Huang Tiejun
Liu Jingjing
Sun Quan
Wang Xinlong
Wang Yueze
Yu Qiying
Zhang Fan
Zhang Xiaosong
Publication date: 11/07/2023

We present Emu, a Transformer-based multimodal foundation model, which can seamlessly generate images and texts in multimodal context. This omnivore model can take in any single-modality or multimodal data input indiscriminately (e.g., interleaved image, text and video) through a one-model-for-all autoregressive training process. First, visual signals are encoded into embeddings, and together with text tokens form an interleaved input sequence. Emu is then end-to-end trained with a unified objective of classifying the next text token or regressing the next visual embedding in the multimodal sequence. This versatile multimodality empowers the exploration of diverse pretraining data sources at scale, such as videos with interleaved frames and text, webpages with interleaved images and text, as well as web-scale image-text pairs and video-text pairs. Emu can serve as a generalist multimodal interface for both image-to-text and text-to-image tasks, and supports in-context image and text generation. Across a broad range of zero-shot/few-shot tasks including image captioning, visual question answering, video question answering and text-to-image generation, Emu demonstrates superb performance compared to state-of-the-art large multimodal models. Extended capabilities such as multimodal assistants via instruction tuning are also demonstrated with impressive performance.
Comment: Code and Demo: https://github.com/baaivision/Em

arXiv.org e-Print Archive

Gravin Scaffolding Protein Mediates Signaling during Atherosclerosis

Author: Fan Qiying
Publication date: 08/01/2020

Atherosclerosis is an inflammatory response, principally in the walls of arteries, that can lead to over-release of growth factors and contributes to cardiovascular mortality. Gravin, an A-Kinase Anchoring Protein, targets Protein Kinase A (PKA), Protein Kinase C (PKC), and Ca2+/Calmodulin-dependent Protein Kinase (CaMKII) and mediates intracellular signaling. This gravin-coordinated, kinase-dependent substrate phosphorylation leads to changes in intracellular Ca2+, which are associated with cell proliferation and migration. Additionally, recent studies show that gravin can regulate lipid metabolism in the liver and affects cell proliferation and migration. To study the role of gravin in atherosclerosis, five-week-old wild-type (WT) and gravin-truncated (gravin-t/t) mice were subjected to a high-fat diet (HFD) or a normal diet (ND) for 16 weeks. Serum cholesterol, triglyceride, and VLDL levels were significantly decreased in gravin-t/t HFD mice compared to WT HFD mice. Furthermore, gravin-t/t (HFD) mice showed a lower liver-to-body weight ratio as well as decreased lipid accumulation and decreased liver damage. In addition, gravin-t/t (HFD) mice showed lower expression of genes related to cholesterol biosynthesis via less activation of the sterol-regulatory-element-binding protein 2 (SREBP2). We also observed less aortic lipid accumulation and lower blood pressure in gravin-t/t (HFD) mice. Vascular smooth muscle cells (VSMCs) play a crucial role in the progression of atherosclerosis. Gravin-t/t VSMCs showed decreased Ang II-induced migration and proliferation when compared to WT VSMCs. We also observed less migration in gravin-t/t VSMCs after PDGF treatment. Furthermore, gravin-t/t VSMCs exhibited decreased PKC activity and lower intracellular Ca2+ transients after either Ang II or PDGF treatment. These changes were accompanied by significant differences in PKC phosphorylation and PKC-dependent substrate phosphorylation, which involved the ERK1/2 signaling pathway.
In conclusion, these findings indicate that the absence of gravin-mediated signaling decreased lipid metabolism and accumulation in the liver, decreased lipid accumulation in the aorta, and lowered blood pressure in response to HFD. In addition, our findings indicate that, in the absence of gravin-mediated signaling, Ang II- and PDGF-induced VSMC proliferation and migration were suppressed. Taken together, our data indicate that gravin is involved in the initiation and progression of atherosclerosis and/or vascular remodeling.
Pharmacological and Pharmaceutical Sciences, Department o

University of Houston Institutional Repository (UHIR)

Workflow and Issue Management

Author: Fan Qiying
Simoneau Catherine Morgan
Publication venue: Worcester Polytechnic Institute - Gordon Library
Publication date: 16/01/2012

This project, completed at Deutsche Bank, aims to improve workflow and issue management among group divisions and across borders. The project goals are to create a best practices document (BPD) compiling suggestions for enhancements to the existing issue management module, and to develop mock-up functional screens illustrating the best practices. The WPI MQP Team conducted research and interviews to identify functionality gaps needing improvement. The BPD proposes options for closing these gaps, thereby ensuring financial and managerial transparency and efficiency.

Digital WPI

Winning strategy for the Littlefield simulation game -- a system dynamics approach

Author: Fan Qiying
Liu Mengjie
Thahirally Murtaza Turab
Wang Siqi
Publication venue: Worcester Polytechnic Institute - Gordon Library
Publication date: 01/01/2011

The Littlefield simulation game is an important learning tool for understanding operations principles in production environments, and it is therefore widely used by many leading business schools. This project attempts to model the game using a system dynamics approach, which allows a realistic representation of the production system in the Littlefield simulation. It also creates an environment for comparing the consequences of various policies.

Digital WPI

NONPARAMETRIC COINTEGRATING REGRESSION WITH ENDOGENEITY AND LONG MEMORY

Author: Chan
Fan
Kasparis
Peter C. B. Phillips
Qiying Wang
Wang
Publication venue: Cambridge University Press (CUP)

Crossref

ASYMPTOTIC THEORY FOR ZERO ENERGY FUNCTIONALS WITH NONPARAMETRIC REGRESSION APPLICATIONS

Author: Akonom
Billingsley
Borodin
Fan
Luklacs
Peter C.B. Phillips
Petrov
Qiying Wang
Wang
Publication venue: Cambridge University Press (CUP)

Crossref

A non-cooperative non-zero-sum game-based dependability assessment of heterogeneous WSNs with malware diffusion

Author: Cao Qiying
Fan En
Hu Keli
Liu Jianhua
Ma Haiping
Shen Shigen
Yu Shui
Publication venue: Elsevier BV
Publication date: 01/08/2017

Deakin Research Online