
203 research outputs found

Exploiting Category Names for Few-Shot Classification with Vision-Language Models

Author: Cao Liangliang
Dai Shengyang
Wang Zirui
Xiao Taihong
Yang Ming-Hsuan
Yu Jiahui
Publication date: 18/04/2023

Vision-language foundation models pretrained on large-scale data provide a powerful tool for many visual understanding tasks. Notably, many vision-language models build two encoders (visual and textual) that map the two modalities into the same embedding space. As a result, the learned representations achieve good zero-shot performance on tasks like image classification. However, when there are only a few examples per category, the potential of large vision-language models often goes unrealized, mainly because of the gap between the large number of parameters and the relatively small amount of training data. This paper shows that we can significantly improve the performance of few-shot classification by using the category names to initialize the classification head. With the proposed category name initialization method, our model obtains state-of-the-art performance on a number of few-shot image classification benchmarks (e.g., 87.37% on ImageNet and 96.08% on Stanford Cars, both using five-shot learning).
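
The idea of initializing the classification head from category names can be illustrated with a small sketch. This is not the authors' implementation: `toy_text_embedding` is a hypothetical stand-in for a pretrained text encoder (it only produces a deterministic pseudo-random unit vector per name), and the helper names and the embedding size of 8 are assumptions made for illustration.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def toy_text_embedding(name, dim=8):
    """Hypothetical stand-in for a pretrained text encoder.

    Returns a deterministic pseudo-random unit vector seeded by the name.
    A real system would call the text tower of a vision-language model here.
    """
    rng = random.Random(name)
    return l2_normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])


def init_head_from_names(category_names, dim=8):
    """Build the head weights: one row per category, set to the name embedding."""
    return [toy_text_embedding(name, dim) for name in category_names]


def classify(image_embedding, head):
    """Return (predicted index, logits) using dot products with the head rows."""
    feat = l2_normalize(image_embedding)
    logits = [sum(w * f for w, f in zip(row, feat)) for row in head]
    best = max(range(len(logits)), key=logits.__getitem__)
    return best, logits


if __name__ == "__main__":
    head = init_head_from_names(["cat", "dog", "car"])
    # An image embedding that coincides with the "dog" name embedding
    # is assigned to index 1 before any few-shot training happens.
    print(classify(head[1], head)[0])
```

In a few-shot setting, the head initialized this way would then be finetuned on the handful of labelled examples rather than starting from random weights.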

arXiv.org e-Print Archive

IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers

Author: Cao Yuan
Qiao Siyuan
Yang Chenglin
Yu Jiahui
Yuille Alan
Zhang Yu
Zhu Tao
Publication date: 27/11/2023

Generative training has been demonstrated to be powerful for building visual-language models. However, on zero-shot discriminative benchmarks, there is still a performance gap between models trained with generative and discriminative objectives. In this paper, we aim to narrow this gap by improving the efficacy of generative training on classification tasks, without any finetuning processes or additional modules. Specifically, we focus on narrowing the gap between the generative captioner and the CLIP classifier. We begin by analysing the predictions made by the captioner and classifier and observe that the caption generation inherits the distribution bias from the language model trained with pure text modality, making it less grounded on the visual signal. To tackle this problem, we redesign the scoring objective for the captioner to alleviate the distributional bias and focus on measuring the gain of information brought by the visual inputs. We further design a generative training objective to match the evaluation objective. We name our model, trained and evaluated with these novel procedures, the Information Gain (IG) captioner. We pretrain the models on the public Laion-5B dataset and perform a series of discriminative evaluations. For zero-shot classification on ImageNet, IG captioner achieves improvements of $> 18\%$ over the standard captioner, achieving performance comparable to the CLIP classifier. IG captioner also demonstrates strong performance on zero-shot image-text retrieval tasks on MSCOCO and Flickr30K. We hope this paper inspires further research towards unifying generative and discriminative training procedures for visual-language models.
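
One way to read "measuring the gain of information brought by the visual inputs" is as a difference of log-likelihoods: the caption's log-probability given the image minus its log-probability under text alone. The abstract does not state the exact objective, so the form below is an assumption, and the function names and toy probabilities are invented for illustration.

```python
import math


def information_gain_scores(logp_text_given_image, logp_text):
    """Score each candidate caption by log p(text | image) - log p(text).

    Both arguments map caption strings to log-probabilities. Subtracting the
    text-only term discounts captions that are likely merely because the
    language model favours them.
    """
    return {
        caption: logp_text_given_image[caption] - logp_text[caption]
        for caption in logp_text_given_image
    }


def predict(scores):
    """Return the caption with the highest score."""
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy numbers: "dog" is a common caption in text, "axolotl" is rare.
    conditional = {
        "a photo of a dog": math.log(0.30),
        "a photo of an axolotl": math.log(0.20),
    }
    text_only = {
        "a photo of a dog": math.log(0.25),
        "a photo of an axolotl": math.log(0.01),
    }
    print(predict(conditional))
    print(predict(information_gain_scores(conditional, text_only)))
```

With these toy numbers, the plain conditional likelihood prefers the frequent caption, while the difference score prefers the caption whose probability rises most once the image is seen.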

arXiv.org e-Print Archive

Human-System Integration

Author: Cai Jiahui
Cao Xinyue
Liu Siyu
Shi Xiao
Wang Xianpeng
Wu Wanyu
Xu Yutong
Publication venue: 'Purdue University (bepress)'
Publication date: 14/03/2016

Purdue E-Pubs

Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners

Author: Cao Yuan
Ghosh Soham
Wang Zirui
Wu Yonghui
Yan Shen
Yu Jiahui
Zhang Mi
Zhu Tao
Publication date: 09/12/2022

This work explores an efficient approach to establish a foundational video-text model for tasks including open-vocabulary video classification, text-to-video retrieval, video captioning and video question-answering. We present VideoCoCa, which reuses a pretrained image-text contrastive captioner (CoCa) model and adapts it to video-text tasks with minimal extra training. While previous works adapt image-text models with various cross-frame fusion modules (for example, a cross-frame attention layer or perceiver resampler) and finetune the modified architecture on video-text data, we surprisingly find that the generative attentional pooling and contrastive attentional pooling layers in the image-text CoCa design are instantly adaptable to "flattened frame embeddings", yielding a strong zero-shot transfer baseline for many video-text tasks. Specifically, the frozen image encoder of a pretrained image-text CoCa takes each video frame as input and generates $N$ token embeddings per frame for a total of $T$ video frames. We flatten the $N \times T$ token embeddings into a long sequence of frozen video representation and apply CoCa's generative attentional pooling and contrastive attentional pooling on top. All model weights, including the pooling layers, are directly loaded from an image-text CoCa pretrained model. Without any video or video-text data, VideoCoCa's zero-shot transfer baseline already achieves state-of-the-art results on zero-shot video classification on Kinetics 400/600/700, UCF101, HMDB51, and Charades, as well as zero-shot text-to-video retrieval on MSR-VTT and ActivityNet Captions. We also explore lightweight finetuning on top of VideoCoCa, and achieve strong results on video question-answering (iVQA, MSRVTT-QA, MSVD-QA) and video captioning (MSR-VTT, ActivityNet, Youcook2). Our approach establishes a simple and effective video-text baseline for future research. Comment: Technical report
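
The flattening step described in this abstract can be sketched in a few lines. This is a simplified illustration, not the CoCa pooling layer: `attentional_pool` uses a single query vector with softmax attention and no learned key or value projections, and both function names are assumptions.

```python
import math


def flatten_frames(frame_tokens):
    """Turn T frames of N token embeddings each into one sequence of N * T tokens."""
    return [token for frame in frame_tokens for token in frame]


def attentional_pool(tokens, query):
    """Pool a token sequence into one vector with single-query softmax attention.

    Simplification: no learned projections and no multi-head structure.
    """
    dim = len(query)
    scores = [
        sum(q * t for q, t in zip(query, token)) / math.sqrt(dim)
        for token in tokens
    ]
    peak = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [
        sum(w * token[i] for w, token in zip(weights, tokens))
        for i in range(dim)
    ]


if __name__ == "__main__":
    # T = 3 frames, N = 2 tokens per frame, embedding size 4.
    video = [
        [[float(t + n + d) for d in range(4)] for n in range(2)]
        for t in range(3)
    ]
    flat = flatten_frames(video)
    pooled = attentional_pool(flat, [0.0] * 4)
    print(len(flat), len(pooled))
```

Because the pooling operates on a sequence of arbitrary length, the same pooling weights can be applied to the N tokens of a single image or to the N × T tokens of a video, which is the property the abstract relies on.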

arXiv.org e-Print Archive

Improved Road-Network-Flow Control Strategy Based on Macroscopic Fundamental Diagrams and Queuing Length in Connected-Vehicle Network

Author: Chengtao Cao
Jiahui Liu
Jianmin Xu
Peiqun Lin
Xiaohui Lin
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2017

Crossref

Coronin 1B Controls Endothelial Actin Dynamics at Cell-Cell Junctions and Is Required for Endothelial Network Assembly

Author: Cao Jiahui
Forne Ignasi
Maier-Begandt Daniela
Montañez Eloi
Pitter Bettina
Salvermoser Melanie
Schnittler Hans-Joachim
Walzog Barbara
Weckbach Ludwig T.
Werner Ann-Cathrin
Publication venue: 'Frontiers Media SA'
Publication date: 31/07/2020

Development and homeostasis of blood vessels critically depend on the regulation of endothelial cell-cell junctions. VE-cadherin (VEcad)-based cell-cell junctions are connected to the actin cytoskeleton and regulated by actin-binding proteins. Coronin 1B (Coro1B) is an actin-binding protein that controls actin networks at classical lamellipodia. The role of Coro1B in endothelial cells (ECs) is not fully understood and is investigated in this study. Here, we demonstrate that Coro1B is a novel component and regulator of cell-cell junctions in ECs. Immunofluorescence studies show that Coro1B colocalizes with VEcad at cell-cell junctions in monolayers of ECs. Live-cell imaging reveals that Coro1B is recruited to, and operates at, actin-driven membrane protrusions at cell-cell junctions. Coro1B is recruited to cell-cell junctions via a mechanism that requires the relaxation of the actomyosin cytoskeleton. By analyzing the Coro1B interactome, we identify integrin-linked kinase (ILK) as a new Coro1B-associated protein. Coro1B colocalizes with α-parvin, an interactor of ILK, at the leading edge of lamellipodia protrusions. Functional experiments reveal that depletion of Coro1B causes defects in the actin cytoskeleton and cell-cell junctions. Finally, in matrigel tube network assays, depletion of Coro1B results in reduced network complexity, tube number and tube length. Together, our findings point toward a critical role for Coro1B in the dynamic remodeling of endothelial cell-cell junctions and the assembly of endothelial networks.

Open Access LMU

Diposit Digital de la Universitat de Barcelona

Enhancing heat stress tolerance in Lanzhou lily (Lilium davidii var. unicolor) with Trichokonins isolated from Trichoderma longibrachiatum SMF2

Author: Dong Hou
Haiyan Li
Jiahui Liang
Juanjuan Sui
Tao Liu
Wenxiu Yue
Xing Cao
Ze Wu
Publication venue: 'Frontiers Media SA'
Publication date: 01/06/2023

Lanzhou lily (Lilium davidii var. unicolor) is a renowned edible crop produced in China and is relatively sensitive to high temperature (HT). Trichokonins (TKs) are antimicrobial peptaibols secreted from Trichoderma longibrachiatum strain SMF2. Here, we report that TKs application improves the thermotolerance of Lanzhou lily. The activity of the antioxidant enzyme system (SOD, CAT, and POD), the level of heat-resistance-associated phytohormones (ABA, SA, and JA), the relative water content (RWC), the content of chlorophyll (Chl), and the net photosynthetic rate (Pn) were promoted by TKs treatment in Lanzhou lily plants subjected to heat stress (HS). TKs treatment also mitigated cell injury, as shown by a lower accumulation of malondialdehyde (MDA) and relative electrolyte leakage (REL) under HS conditions. RNA-seq data analysis showed that more than 4.5 times as many differentially expressed genes (DEGs) responded to TKs treatment under HS as under non-HS conditions, and TKs treatment reduced protein folding and enhanced cellular repair function under HS conditions. The analyses of DEGs involved in hormone (ABA, SA and JA) synthesis and signaling pathways suggested that TKs might improve Lanzhou lily heat tolerance by promoting ABA synthesis and signal transduction. TKs highly induced DEGs of the HSF-HSP pathway under HS, in which HSFA2 accounted for most of the HSF family. Furthermore, TKs treatment resulted in the upregulation of the heat-protective genes LzDREB2B, LzHsfA2a, LzMBF1c, LzHsp90, and LzHsp70 involved in the HSF-HSP signaling pathway after long-term HS. LzHsfA2a-1 likely plays a key role in the acquisition of TKs-induced thermotolerance of Lanzhou lily, as evidenced by the sustained response to HS, the enhanced response to TKs treatment under long-term HS, and the high sequence similarity to LlHsfA2a, which is a key regulator for the improvement of heat tolerance in Lilium longiflorum.
Our results reveal the underlying mechanisms of TKs-mediated thermotolerance in Lanzhou lily and highlight an attractive approach to protecting crop plants from damage caused by HS under future global warming.

Directory of Open Access Journals

Effects of habitat differences on the scatter-hoarding behaviour of rodents (Mammalia, Rodentia) in temperate forests

Author: Chengzhi Zhang
Dianwei Li
Hongjia Shan
Hongwei Ni
Jiahui Liu
Ming Gao
Yuwei Cao
Zhimin Jin
Publication venue: 'Pensoft Publishers'
Publication date: 01/01/2023

To discover the differences in hoarding strategies of rodents for different seeds in different habitats, we labelled and released three different types of seeds, including Pinus koraiensis, Corylus mandshurica, and Quercus mongolica, in temperate forests of northeastern China and investigated the fate of seeds in four different habitats that included a broad-leaved forest, mixed-forest edge, mixed forest, and artificial larch forest. Our research showed that the hoarding strategy of rodents varied substantially among habitats. The survival curves of seeds from different habitats showed the same trend, but the rates of consumption in different habitats varied. More than 50% of the seeds in the four habitats were consumed by the tenth day. It took 20 days to consume more than 70% of the seeds. The rate of consumption of P. koraiensis seeds reached 96.70%; 99.09% of the C. mandshurica seeds were consumed, and 93.07% of the Q. mongolica seeds were consumed. The seeds were consumed most quickly in the artificial larch forest. In general, most of the early seeds were quickly devoured. After day 20, the consumption gradually decreased. Rodents found the seeds in the artificial larch forest in a shorter average time than those in the other types of forests. The average earliest discovery time was 1.4 ± 0.9 d (1–3 d). The average earliest discovery time in all the other three habitats exceeded 7 d. The median removal time (MRT) of the seeds was 14.24 ± 10.53 d (1–60 d). There were significant differences in the MRT among different habitats. It was shortest in the artificial larch forest at 7.67 ± 6.80 d (1–28 d). In contrast, the MRT in the broad-leaved forest was the longest at 17.52 ± 12.91 d (4–60 d). There were significant differences in the MRT between the artificial larch forest and the other habitats. There was less predation of the three types of seeds at the mixed-forest edge, and the most seeds were dispersed there.
The rates of predation of the P. koraiensis, C. mandshurica, and Q. mongolica seeds were 28.33%, 15.83%, and 44.0%, respectively, and 59.17%, 84.17%, and 48.0% of the seeds, respectively, were dispersed. The average dispersal distances of all the seeds were less than 6 m, and the longest distance recorded was 18.66 m. The dispersal distances and burial depths differed significantly among the four types of habitats. Most seeds were dispersed over distances of 1–6 m.

ZENODO

Directory of Open Access Journals

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

ARPHA Preprints